Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers (doi:10.7910/DVN/TJ6RMQ)
(Innovative Data Approaches for Assessing Credit Risk in Limited Credit History Consumers)

Document Description

Citation

Title:

Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers

Identification Number:

doi:10.7910/DVN/TJ6RMQ

Distributor:

Harvard Dataverse

Date of Distribution:

2024-07-26

Version:

1

Bibliographic Citation:

Shukla, Deepa, 2024, "Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers", https://doi.org/10.7910/DVN/TJ6RMQ, Harvard Dataverse, V1

Study Description

Citation

Title:

Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers

Subtitle:

Non-Traditional Data Sources to Enhance Creditworthiness

Alternative Title:

Innovative Data Approaches for Assessing Credit Risk in Limited Credit History Consumers

Identification Number:

doi:10.7910/DVN/TJ6RMQ

Identification Number:

/kaggle/input/review-of-alternative-datasets-for-credit-scoring

Authoring Entity:

Shukla, Deepa (Jaipur National University)

Other identifications and acknowledgements:

Shukla, Deepa

Other identifications and acknowledgements:

Gupta, Sunil

Producer:

Sunil Gupta

Date of Production:

2024-07-27

Software used in Production:

Python

Distributor:

Harvard Dataverse

Access Authority:

Gupta, Sunil

Depositor:

Shukla, Deepa

Date of Deposit:

2024-07-27

Holdings Information:

https://doi.org/10.7910/DVN/TJ6RMQ

Study Scope

Keywords:

Business and Management, Computer and Information Science, Machine Learning Algorithms, Alternative Dataset, Credit Score, Behavioural Finance, Thin File

Topic Classification:

Digital Credit Scoring

Abstract:

Credit scoring is essential in financial services, allowing institutions to assess consumers' creditworthiness. Traditional credit scoring models rely heavily on extensive transaction history, which poses a significant challenge for thin-file consumers (individuals with limited credit history). This comprehensive review explores and evaluates alternative datasets that can be used to improve credit scoring for thin-file consumers. By moving beyond traditional transaction profiles, alternative datasets such as social media data, web browsing behaviour, digital footprints, and telecom data offer new dimensions for assessing consumer credit risk. The review also compares the effectiveness of machine learning algorithms, including support vector machines (SVM), neural networks, decision trees, random forests, and hybrid models, in leveraging these datasets for credit scoring. The findings indicate that integrating multiple alternative data sources with advanced machine learning algorithms can significantly improve the accuracy and reliability of credit risk assessments. The comparative analysis highlights the strengths and limitations of the different approaches, all of which have shown varying degrees of success with alternative datasets. Hybrid models, which combine multiple machine learning techniques, are particularly effective in leveraging diverse data sources to provide a robust credit risk assessment. This review underscores the potential of alternative datasets to revolutionise credit scoring for thin-file consumers. By incorporating new data dimensions and advanced machine learning algorithms, researchers can improve their ability to assess credit risk accurately. Future researchers may continue to refine these models and explore new alternative datasets to enhance credit scoring models using machine learning algorithms.

Time Period:

2024-07-26 to 2024-07-27

Date of Collection:

2024-02-18 to 2024-07-20

Kind of Data:

https://github.com/Deezpa/Datasets_Review

Notes:

Dataset Overview: The dataset contains information about various research papers related to alternative data sources for credit scoring.

1. Content of the Dataset:
   Authors and Year: Lists the authors of the papers and the year of publication.
   Title: Title of each research paper.
   Research Objectives: The goals or aims of the research described in each paper.
   Datasets Used: Types of datasets utilized in the research.
   Machine Learning Algorithms Employed: The algorithms used in the study.
   Key Findings: Major conclusions or results of the research.
   Citations: Links or references to the research papers.

2. Data Categories: The dataset includes a variety of alternative data categories used in credit scoring, such as social media data, web browsing behavior, digital footprints, telecom data, and hybrid data approaches.

3. Purpose of the Data: To categorize and assess the usage of different alternative datasets in credit scoring research for thin-file consumers. This data is used to analyze trends and prevalence of specific data sources in the context of credit scoring.

4. Visualization: The bar chart visualizes the distribution of these alternative data categories across the research papers. It helps to understand which data types are most frequently explored and their relevance in credit scoring models.

5. Analysis of Data: The dataset is organized to show the frequency of different categories of alternative data used in various studies. This analysis helps identify popular data sources and trends in research related to credit scoring.

6. Structure of Data: The data is structured into a table format with columns for serial number, authors and year, title, research objectives, datasets used, machine learning algorithms employed, key findings, and citations.

7. Potential Uses: This dataset can be used for literature reviews, identifying research gaps, and understanding the impact of various alternative datasets on credit scoring models.

8. Insights: The dataset provides insights into how different types of alternative data are used and evaluated in credit scoring research. It highlights the increasing interest in integrating diverse data sources to improve the accuracy and effectiveness of credit scoring models.
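
The frequency analysis and bar chart described in the notes above can be sketched in Python, the software listed for this study. This is only an illustrative sketch, not the depositor's actual code: the rows, the `datasets_used` key, the semicolon separator, and both function names are assumptions made for the example, and the values are placeholders rather than entries from the deposited spreadsheet.

```python
from collections import Counter

# Hypothetical rows standing in for the "Datasets Used" column of the review table.
# The values are illustrative placeholders, not taken from the deposited file.
rows = [
    {"serial": 1, "datasets_used": "Social media data"},
    {"serial": 2, "datasets_used": "Telecom data; Digital footprints"},
    {"serial": 3, "datasets_used": "Web browsing behaviour"},
    {"serial": 4, "datasets_used": "Telecom data"},
]


def category_frequencies(records, column="datasets_used", sep=";"):
    """Count how many papers mention each alternative data category."""
    counts = Counter()
    for record in records:
        # A paper may use several categories; split them and normalise the labels.
        for part in record.get(column, "").split(sep):
            label = part.strip().lower()
            if label:
                counts[label] += 1
    return counts


def text_bar_chart(counts):
    """Render the counts as a plain-text bar chart, most frequent category first."""
    return "\n".join(
        f"{label:<25} {'#' * n} ({n})" for label, n in counts.most_common()
    )


freqs = category_frequencies(rows)
print(text_bar_chart(freqs))
```

With the real spreadsheet, the same counting step would run over the rows of its datasets-used column; the column name and the way multiple categories are separated in a cell would need to be checked against the file first.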

Methodology and Processing

Sources Statement

Data Sources:

Research Papers

Data Access

Other Study Description Materials

Related Materials

https://github.com/Deezpa/Datasets_Review

Related Publications

Citation

Title:

@misc{deepa_shukla_2024, title={Review of Alternative Datasets for Credit Scoring}, url={https://www.kaggle.com/dsv/9041453}, DOI={10.34740/KAGGLE/DSV/9041453}, publisher={Kaggle}, author={Deepa Shukla}, year={2024} }

Identification Number:

10.34740/KAGGLE/DSV/9041453

Bibliographic Citation:

@misc{deepa_shukla_2024, title={Review of Alternative Datasets for Credit Scoring}, url={https://www.kaggle.com/dsv/9041453}, DOI={10.34740/KAGGLE/DSV/9041453}, publisher={Kaggle}, author={Deepa Shukla}, year={2024} }

Citation

Title:

Smith, M., & Henderson, C. (2018). Beyond Thin Credit Files. Social Science Quarterly, 99, 24-42.

Identification Number:

10.1111/SSQU.12389

Bibliographic Citation:

Smith, M., & Henderson, C. (2018). Beyond Thin Credit Files. Social Science Quarterly, 99, 24-42.

Citation

Title:

Cheney, J. (2008). Alternative Data and its Use in Credit Scoring Thin- and No-File Consumers. Banking & Insurance.

Identification Number:

10.2139/ssrn.1160283

Bibliographic Citation:

Cheney, J. (2008). Alternative Data and its Use in Credit Scoring Thin- and No-File Consumers. Banking & Insurance.

Citation

Title:

Rozo, B., Crook, J., & Andreeva, G. (2021). The Role of Web Browsing in Credit Risk Prediction. Econometrics: Econometric & Statistical Methods - Special Topics eJournal.

Identification Number:

10.1016/j.dss.2022.113879

Bibliographic Citation:

Rozo, B., Crook, J., & Andreeva, G. (2021). The Role of Web Browsing in Credit Risk Prediction. Econometrics: Econometric & Statistical Methods - Special Topics eJournal.

Citation

Title:

Fu, G., Sun, M., & Xu, Q. (2020). An Alternative Credit Scoring System in China's Consumer Lending Market: A System Based on Digital Footprint Data. Decision-Making in Economics eJournal.

Identification Number:

10.2139/ssrn.3638710

Bibliographic Citation:

Fu, G., Sun, M., & Xu, Q. (2020). An Alternative Credit Scoring System in China's Consumer Lending Market: A System Based on Digital Footprint Data. Decision-Making in Economics eJournal.

Citation

Title:

Zhou, J., Wang, C., Ren, F., & Chen, G. (2021). Inferring multi-stage risk for online consumer credit services: An integrated scheme using data augmentation and model enhancement. Decis. Support Syst., 149, 113611.

Identification Number:

10.1016/J.DSS.2021.113611

Bibliographic Citation:

Zhou, J., Wang, C., Ren, F., & Chen, G. (2021). Inferring multi-stage risk for online consumer credit services: An integrated scheme using data augmentation and model enhancement. Decis. Support Syst., 149, 113611.

Citation

Title:

Djeundje, V., Crook, J., Calabrese, R., & Hamid, M. (2021). Enhancing credit scoring with alternative data. Expert Syst. Appl., 163, 113766.

Identification Number:

10.1016/j.eswa.2020.113766

Bibliographic Citation:

Djeundje, V., Crook, J., Calabrese, R., & Hamid, M. (2021). Enhancing credit scoring with alternative data. Expert Syst. Appl., 163, 113766.

Citation

Title:

Sustersic, M., Mramor, D., & Zupan, J. (2007). Consumer Credit Scoring Models with Limited Data. Banking & Financial Institutions eJournal.

Identification Number:

10.2139/ssrn.967384

Bibliographic Citation:

Sustersic, M., Mramor, D., & Zupan, J. (2007). Consumer Credit Scoring Models with Limited Data. Banking & Financial Institutions eJournal.

Citation

Title:

Huang, C., Chen, M., & Wang, C. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Syst. Appl., 33, 847-856.

Identification Number:

10.1016/j.eswa.2006.07.007

Bibliographic Citation:

Huang, C., Chen, M., & Wang, C. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Syst. Appl., 33, 847-856.

Citation

Title:

Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2020). Deciphering Big Data in Consumer Credit Evaluation. International Political Economy: Investment & Finance eJournal.

Identification Number:

10.2139/ssrn.3312163

Bibliographic Citation:

Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2020). Deciphering Big Data in Consumer Credit Evaluation. International Political Economy: Investment & Finance eJournal.

Citation

Title:

Dastile, X., Çelik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring: A systematic literature survey. Appl. Soft Comput., 91, 106263.

Identification Number:

10.1016/j.asoc.2020.106263

Bibliographic Citation:

Dastile, X., Çelik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring: A systematic literature survey. Appl. Soft Comput., 91, 106263.

Citation

Title:

Bequé, A., & Lessmann, S. (2017). Extreme learning machines for credit scoring: An empirical evaluation. Expert Syst. Appl., 86, 42-53.

Identification Number:

10.1016/j.eswa.2017.05.050

Bibliographic Citation:

Bequé, A., & Lessmann, S. (2017). Extreme learning machines for credit scoring: An empirical evaluation. Expert Syst. Appl., 86, 42-53.

Citation

Title:

Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2021). Deciphering big data in consumer credit evaluation. Journal of Empirical Finance.

Identification Number:

10.1016/J.JEMPFIN.2021.01.009

Bibliographic Citation:

Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2021). Deciphering big data in consumer credit evaluation. Journal of Empirical Finance.

Citation

Title:

He, H., Zhang, W., & Zhang, S. (2018). A novel ensemble method for credit scoring: Adaption of different imbalance ratios. Expert Syst. Appl., 98, 105-117.

Identification Number:

10.1016/j.eswa.2018.01.012

Bibliographic Citation:

He, H., Zhang, W., & Zhang, S. (2018). A novel ensemble method for credit scoring: Adaption of different imbalance ratios. Expert Syst. Appl., 98, 105-117.

Citation

Title:

Munkhdalai, L., Munkhdalai, T., Namsrai, O., Lee, J., & Ryu, K. (2019). An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments. Sustainability.

Identification Number:

10.3390/SU11030699

Bibliographic Citation:

Munkhdalai, L., Munkhdalai, T., Namsrai, O., Lee, J., & Ryu, K. (2019). An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments. Sustainability.

Citation

Title:

Aggarwal, N. (2018). Machine Learning, Big Data and the Regulation of Consumer Credit Markets: The Case of Algorithmic Credit Scoring. Discrimination.

Identification Number:

10.2139/ssrn.3309244

Bibliographic Citation:

Aggarwal, N. (2018). Machine Learning, Big Data and the Regulation of Consumer Credit Markets: The Case of Algorithmic Credit Scoring. Discrimination.

Citation

Title:

Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for consumer credit scoring. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), 205-208.

Identification Number:

10.1109/ICAIBD.2018.8396195

Bibliographic Citation:

Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for consumer credit scoring. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), 205-208.

Citation

Title:

McCanless, M. (2023). Banking on alternative credit scores: Auditing the calculative infrastructure of U.S. consumer lending. Environment and Planning A: Economy and Space, 55, 2128-2146.

Identification Number:

10.1177/0308518X231174026

Bibliographic Citation:

McCanless, M. (2023). Banking on alternative credit scores: Auditing the calculative infrastructure of U.S. consumer lending. Environment and Planning A: Economy and Space, 55, 2128-2146.

Citation

Title:

Wiginton, J. (1980). A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior. Journal of Financial and Quantitative Analysis, 15, 757-770.

Identification Number:

10.2307/2330408

Bibliographic Citation:

Wiginton, J. (1980). A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior. Journal of Financial and Quantitative Analysis, 15, 757-770.

Citation

Title:

Ala’raj, M., Abbod, M., & Majdalawieh, M. (2021). Modelling customers credit card behaviour using bidirectional LSTM neural networks. Journal of Big Data, 8, 1-27.

Identification Number:

10.1186/s40537-021-00461-7

Bibliographic Citation:

Ala’raj, M., Abbod, M., & Majdalawieh, M. (2021). Modelling customers credit card behaviour using bidirectional LSTM neural networks. Journal of Big Data, 8, 1-27.

Citation

Title:

Saberi, M., Mirtalaei, M., Hussain, F., Azadeh, A., Hussain, O., & Ashjari, B. (2013). A granular computing-based approach to credit scoring modeling. Neurocomputing, 122, 100-115.

Identification Number:

10.1016/j.neucom.2013.05.020

Bibliographic Citation:

Saberi, M., Mirtalaei, M., Hussain, F., Azadeh, A., Hussain, O., & Ashjari, B. (2013). A granular computing-based approach to credit scoring modeling. Neurocomputing, 122, 100-115.

Citation

Title:

Wei, Y., Yildirim, P., Bulte, C., & Dellarocas, C. (2014). Credit Scoring with Social Network Data. Economics of Networks eJournal.

Identification Number:

10.2139/ssrn.2475265

Bibliographic Citation:

Wei, Y., Yildirim, P., Bulte, C., & Dellarocas, C. (2014). Credit Scoring with Social Network Data. Economics of Networks eJournal.

Citation

Title:

Wang, C., Han, D., Liu, Q., & Luo, S. (2019). A Deep Learning Approach for Credit Scoring of Peer-to-Peer Lending Using Attention Mechanism LSTM. IEEE Access, 7, 2161-2168.

Identification Number:

10.1109/ACCESS.2018.2887138

Bibliographic Citation:

Wang, C., Han, D., Liu, Q., & Luo, S. (2019). A Deep Learning Approach for Credit Scoring of Peer-to-Peer Lending Using Attention Mechanism LSTM. IEEE Access, 7, 2161-2168.

Citation

Title:

West, D. (2000). Neural network credit scoring models. Comput. Oper. Res., 27, 1131-1152.

Identification Number:

10.1016/S0305-0548(99)00149-5

Bibliographic Citation:

West, D. (2000). Neural network credit scoring models. Comput. Oper. Res., 27, 1131-1152.

Citation

Title:

Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl., 39, 3446-3453.

Identification Number:

10.1016/j.eswa.2011.09.033

Bibliographic Citation:

Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl., 39, 3446-3453.

Citation

Title:

Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Syst. Appl., 28, 743-752.

Identification Number:

10.1016/j.eswa.2004.12.031

Bibliographic Citation:

Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Syst. Appl., 28, 743-752.

Citation

Title:

Mahjoub, R., & Afsar, A. (2019). A hybrid model for customer credit scoring in stock brokerages using data mining approach. Int. J. Bus. Inf. Syst., 31, 195-214.

Bibliographic Citation:

Mahjoub, R., & Afsar, A. (2019). A hybrid model for customer credit scoring in stock brokerages using data mining approach. Int. J. Bus. Inf. Syst., 31, 195-214.

Citation

Title:

Arram, A., Ayob, M., Albadr, M., Sulaiman, A., & Albashish, D. (2023). Credit card score prediction using machine learning models: A new dataset. ArXiv, abs/2310.02956.

Identification Number:

10.48550/arXiv.2310.02956

Bibliographic Citation:

Arram, A., Ayob, M., Albadr, M., Sulaiman, A., & Albashish, D. (2023). Credit card score prediction using machine learning models: A new dataset. ArXiv, abs/2310.02956.

Citation

Title:

Junior, L., Nardini, F., Renso, C., Trani, R., & Macêdo, J. (2020). A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems. Expert Syst. Appl., 152, 113351.

Identification Number:

10.1016/j.eswa.2020.113351

Bibliographic Citation:

Junior, L., Nardini, F., Renso, C., Trani, R., & Macêdo, J. (2020). A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems. Expert Syst. Appl., 152, 113351.

Other Study-Related Materials

Label:

Systematic Alternative Dataset Review.xlsx

Text:

The findings indicate that social media data provides valuable insights into consumer behaviour and financial reliability, while web browsing patterns and digital footprints offer additional dimensions to assess creditworthiness. Telecom data, including call records and mobile payment history, has proven to be a reliable indicator of financial behaviour. Hybrid approaches, which combine these various data sources, provide a more comprehensive assessment of credit risk, addressing the limitations of relying on a single type of alternative dataset.

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet