Part 1: Document Description
|
Citation |
|
Title: |
Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers |
Identification Number: |
doi:10.7910/DVN/TJ6RMQ |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2024-07-26 |
Version: |
1 |
Bibliographic Citation: |
Shukla, Deepa, 2024, "Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers", https://doi.org/10.7910/DVN/TJ6RMQ, Harvard Dataverse, V1 |
Citation |
|
Title: |
Replication Data for: Alternative Datasets for Credit Scoring of Thin File Consumers |
Subtitle: |
Non-Traditional Data Sources to Enhance Creditworthiness |
Alternative Title: |
Innovative Data Approaches for Assessing Credit Risk in Limited Credit History Consumers |
Identification Number: |
doi:10.7910/DVN/TJ6RMQ |
Identification Number: |
/kaggle/input/review-of-alternative-datasets-for-credit-scoring |
Authoring Entity: |
Shukla, Deepa (Jaipur National University) |
Other identifications and acknowledgements: |
Shukla, Deepa |
Other identifications and acknowledgements: |
Gupta, Sunil |
Producer: |
Sunil Gupta |
Date of Production: |
2024-07-27 |
Software used in Production: |
Python |
Distributor: |
Harvard Dataverse |
Access Authority: |
Gupta, Sunil |
Depositor: |
Shukla, Deepa |
Date of Deposit: |
2024-07-27 |
Holdings Information: |
https://doi.org/10.7910/DVN/TJ6RMQ |
Study Scope |
|
Keywords: |
Business and Management, Computer and Information Science, Machine Learning Algorithms, Alternative Dataset, Credit Score, Behavioural Finance, Thin File |
Topic Classification: |
Digital Credit Scoring |
Abstract: |
Credit scoring is essential in financial services, allowing institutions to assess consumers' creditworthiness. Traditional credit scoring models rely heavily on extensive transaction history, which poses a significant challenge for thin-file consumers, that is, individuals with limited credit history. This review explores and evaluates alternative datasets that can be used to improve credit scoring for thin-file consumers. Moving beyond traditional transaction profiles, alternative datasets such as social media data, web browsing behaviour, digital footprints, and telecom data offer new dimensions for assessing consumer credit risk. The review also compares the effectiveness of machine learning algorithms, including support vector machines (SVM), neural networks, decision trees, random forests, and hybrid models, in leveraging these datasets for credit scoring. The findings indicate that integrating multiple alternative data sources with advanced machine learning algorithms can significantly improve the accuracy and reliability of credit risk assessments. The comparative analysis highlights the strengths and limitations of each approach: all of these algorithms have shown varying degrees of success with alternative datasets, and hybrid models, which combine multiple machine learning techniques, are particularly effective at leveraging diverse data sources to provide a robust credit risk assessment. This review underscores the potential of alternative datasets to transform credit scoring for thin-file consumers. By incorporating new data dimensions and advanced machine learning algorithms, researchers can assess credit risk more accurately. Future research could further refine these models and explore new alternative datasets to enhance machine-learning-based credit scoring. |
Time Period: |
2024-07-26 to 2024-07-27 |
Date of Collection: |
2024-02-18 to 2024-07-20 |
Kind of Data: |
https://github.com/Deezpa/Datasets_Review |
Notes: |
Dataset Overview: The dataset contains information about research papers on alternative data sources for credit scoring.
1. Content of the Dataset: Authors and Year (the authors of each paper and its year of publication); Title (the title of each research paper); Research Objectives (the goals or aims of the research); Datasets Used (the types of datasets utilised in the research); Machine Learning Algorithms Employed (the algorithms used in the study); Key Findings (the major conclusions or results); Citations (links or references to the research papers).
2. Data Categories: The dataset covers several categories of alternative data used in credit scoring: social media data, web browsing behaviour, digital footprints, telecom data, and hybrid data approaches.
3. Purpose of the Data: To categorise and assess the use of different alternative datasets in credit scoring research for thin-file consumers. The data is used to analyse trends in, and the prevalence of, specific data sources in credit scoring.
4. Visualisation: The bar chart shows the distribution of these alternative data categories across the research papers, indicating which data types are explored most frequently and how relevant they are to credit scoring models.
5. Analysis of Data: The dataset is organised to show how frequently each category of alternative data is used across studies, which helps identify popular data sources and research trends in credit scoring.
6. Structure of Data: The data is structured as a table with columns for serial number, authors and year, title, research objectives, datasets used, machine learning algorithms employed, key findings, and citations.
7. Potential Uses: Literature reviews, identifying research gaps, and understanding the impact of different alternative datasets on credit scoring models.
8. Insights: The dataset shows how different types of alternative data are used and evaluated in credit scoring research, and highlights growing interest in integrating diverse data sources to improve the accuracy and effectiveness of credit scoring models. |
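The category-frequency analysis behind the bar chart can be sketched in Python, the production software listed above. This is a minimal sketch under stated assumptions, not the author's code. It assumes each spreadsheet row has been read into a dict keyed by the column names described in the notes (here "Datasets Used"). The keyword-to-category mapping `CATEGORY_KEYWORDS`, the rule that a paper matching two or more source types also counts as "Hybrid", and the sample rows are all hypothetical and are included for illustration only.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical keyword mapping from free-text "Datasets Used" cells to the
# alternative-data categories named in the dataset notes (not the author's scheme).
CATEGORY_KEYWORDS = {
    "Social Media": ("social media", "social network"),
    "Web Browsing": ("web browsing", "browsing"),
    "Digital Footprint": ("digital footprint",),
    "Telecom": ("telecom", "call record", "mobile payment"),
}


def categorise(datasets_used: str) -> set[str]:
    """Return the categories whose keywords appear in a 'Datasets Used' cell."""
    text = datasets_used.lower()
    found = {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }
    # Assumption: a paper drawing on two or more source types is also counted as hybrid.
    if len(found) >= 2:
        found.add("Hybrid")
    return found


def category_frequencies(rows: list[dict]) -> Counter:
    """Count papers per category; each paper is counted at most once per category."""
    counts: Counter = Counter()
    for row in rows:
        counts.update(categorise(row.get("Datasets Used", "")))
    return counts


if __name__ == "__main__":
    # Made-up rows for illustration; real rows would come from the .xlsx file.
    sample_rows = [
        {"Title": "Paper A", "Datasets Used": "Social network data"},
        {"Title": "Paper B", "Datasets Used": "Web browsing history and digital footprint"},
        {"Title": "Paper C", "Datasets Used": "Telecom call records"},
    ]
    for category, count in category_frequencies(sample_rows).most_common():
        print(f"{category}: {count}")
```

To apply it to the deposited spreadsheet, rows could be loaded with a reader such as pandas (`pandas.read_excel(path).to_dict("records")`), provided the column header actually matches "Datasets Used". Keyword matching is deliberately simple; a manual coding of each paper would be more reliable for a published review.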
Methodology and Processing |
|
Sources Statement |
|
Data Sources: |
Research Papers |
Data Access |
|
Other Study Description Materials |
|
Related Materials |
|
https://github.com/Deezpa/Datasets_Review |
|
Related Publications |
|
Citation |
|
Title: |
Shukla, D. (2024). Review of Alternative Datasets for Credit Scoring. Kaggle. https://www.kaggle.com/dsv/9041453 |
Identification Number: |
10.34740/KAGGLE/DSV/9041453 |
Bibliographic Citation: |
Shukla, D. (2024). Review of Alternative Datasets for Credit Scoring. Kaggle. https://www.kaggle.com/dsv/9041453 |
Citation |
|
Title: |
Smith, M., & Henderson, C. (2018). Beyond Thin Credit Files. Social Science Quarterly, 99, 24-42. |
Identification Number: |
10.1111/SSQU.12389 |
Bibliographic Citation: |
Smith, M., & Henderson, C. (2018). Beyond Thin Credit Files. Social Science Quarterly, 99, 24-42. |
Citation |
|
Title: |
Cheney, J. (2008). Alternative Data and its Use in Credit Scoring Thin- and No-File Consumers. Banking & Insurance. |
Identification Number: |
10.2139/ssrn.1160283 |
Bibliographic Citation: |
Cheney, J. (2008). Alternative Data and its Use in Credit Scoring Thin- and No-File Consumers. Banking & Insurance. |
Citation |
|
Title: |
Rozo, B., Crook, J., & Andreeva, G. (2021). The Role of Web Browsing in Credit Risk Prediction. Econometrics: Econometric & Statistical Methods - Special Topics eJournal. |
Identification Number: |
10.1016/j.dss.2022.113879 |
Bibliographic Citation: |
Rozo, B., Crook, J., & Andreeva, G. (2021). The Role of Web Browsing in Credit Risk Prediction. Econometrics: Econometric & Statistical Methods - Special Topics eJournal. |
Citation |
|
Title: |
Fu, G., Sun, M., & Xu, Q. (2020). An Alternative Credit Scoring System in China's Consumer Lending Market: A System Based on Digital Footprint Data. Decision-Making in Economics eJournal. |
Identification Number: |
10.2139/ssrn.3638710 |
Bibliographic Citation: |
Fu, G., Sun, M., & Xu, Q. (2020). An Alternative Credit Scoring System in China's Consumer Lending Market: A System Based on Digital Footprint Data. Decision-Making in Economics eJournal. |
Citation |
|
Title: |
Zhou, J., Wang, C., Ren, F., & Chen, G. (2021). Inferring multi-stage risk for online consumer credit services: An integrated scheme using data augmentation and model enhancement. Decis. Support Syst., 149, 113611. |
Identification Number: |
10.1016/J.DSS.2021.113611 |
Bibliographic Citation: |
Zhou, J., Wang, C., Ren, F., & Chen, G. (2021). Inferring multi-stage risk for online consumer credit services: An integrated scheme using data augmentation and model enhancement. Decis. Support Syst., 149, 113611. |
Citation |
|
Title: |
Djeundje, V., Crook, J., Calabrese, R., & Hamid, M. (2021). Enhancing credit scoring with alternative data. Expert Syst. Appl., 163, 113766. |
Identification Number: |
10.1016/j.eswa.2020.113766 |
Bibliographic Citation: |
Djeundje, V., Crook, J., Calabrese, R., & Hamid, M. (2021). Enhancing credit scoring with alternative data. Expert Syst. Appl., 163, 113766. |
Citation |
|
Title: |
Sustersic, M., Mramor, D., & Zupan, J. (2007). Consumer Credit Scoring Models with Limited Data. Banking & Financial Institutions eJournal. |
Identification Number: |
10.2139/ssrn.967384 |
Bibliographic Citation: |
Sustersic, M., Mramor, D., & Zupan, J. (2007). Consumer Credit Scoring Models with Limited Data. Banking & Financial Institutions eJournal. |
Citation |
|
Title: |
Huang, C., Chen, M., & Wang, C. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Syst. Appl., 33, 847-856. |
Identification Number: |
10.1016/j.eswa.2006.07.007 |
Bibliographic Citation: |
Huang, C., Chen, M., & Wang, C. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Syst. Appl., 33, 847-856. |
Citation |
|
Title: |
Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2020). Deciphering Big Data in Consumer Credit Evaluation. International Political Economy: Investment & Finance eJournal. |
Identification Number: |
10.2139/ssrn.3312163 |
Bibliographic Citation: |
Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2020). Deciphering Big Data in Consumer Credit Evaluation. International Political Economy: Investment & Finance eJournal. |
Citation |
|
Title: |
Dastile, X., Çelik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring: A systematic literature survey. Appl. Soft Comput., 91, 106263. |
Identification Number: |
10.1016/j.asoc.2020.106263 |
Bibliographic Citation: |
Dastile, X., Çelik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring: A systematic literature survey. Appl. Soft Comput., 91, 106263. |
Citation |
|
Title: |
Bequé, A., & Lessmann, S. (2017). Extreme learning machines for credit scoring: An empirical evaluation. Expert Syst. Appl., 86, 42-53. |
Identification Number: |
10.1016/j.eswa.2017.05.050 |
Bibliographic Citation: |
Bequé, A., & Lessmann, S. (2017). Extreme learning machines for credit scoring: An empirical evaluation. Expert Syst. Appl., 86, 42-53. |
Citation |
|
Title: |
Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2021). Deciphering big data in consumer credit evaluation. Journal of Empirical Finance. |
Identification Number: |
10.1016/J.JEMPFIN.2021.01.009 |
Bibliographic Citation: |
Jiang, J., Liao, L., Lu, X., Wang, Z., & Xiang, H. (2021). Deciphering big data in consumer credit evaluation. Journal of Empirical Finance. |
Citation |
|
Title: |
He, H., Zhang, W., & Zhang, S. (2018). A novel ensemble method for credit scoring: Adaption of different imbalance ratios. Expert Syst. Appl., 98, 105-117. |
Identification Number: |
10.1016/j.eswa.2018.01.012 |
Bibliographic Citation: |
He, H., Zhang, W., & Zhang, S. (2018). A novel ensemble method for credit scoring: Adaption of different imbalance ratios. Expert Syst. Appl., 98, 105-117. |
Citation |
|
Title: |
Munkhdalai, L., Munkhdalai, T., Namsrai, O., Lee, J., & Ryu, K. (2019). An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments. Sustainability. |
Identification Number: |
10.3390/SU11030699 |
Bibliographic Citation: |
Munkhdalai, L., Munkhdalai, T., Namsrai, O., Lee, J., & Ryu, K. (2019). An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments. Sustainability. |
Citation |
|
Title: |
Aggarwal, N. (2018). Machine Learning, Big Data and the Regulation of Consumer Credit Markets: The Case of Algorithmic Credit Scoring. Discrimination. |
Identification Number: |
10.2139/ssrn.3309244 |
Bibliographic Citation: |
Aggarwal, N. (2018). Machine Learning, Big Data and the Regulation of Consumer Credit Markets: The Case of Algorithmic Credit Scoring. Discrimination. |
Citation |
|
Title: |
Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for consumer credit scoring. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), 205-208. |
Identification Number: |
10.1109/ICAIBD.2018.8396195 |
Bibliographic Citation: |
Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for consumer credit scoring. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), 205-208. |
Citation |
|
Title: |
McCanless, M. (2023). Banking on alternative credit scores: Auditing the calculative infrastructure of U.S. consumer lending. Environment and Planning A: Economy and Space, 55, 2128-2146. |
Identification Number: |
10.1177/0308518X231174026 |
Bibliographic Citation: |
McCanless, M. (2023). Banking on alternative credit scores: Auditing the calculative infrastructure of U.S. consumer lending. Environment and Planning A: Economy and Space, 55, 2128-2146. |
Citation |
|
Title: |
Wiginton, J. (1980). A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior. Journal of Financial and Quantitative Analysis, 15, 757-770. |
Identification Number: |
10.2307/2330408 |
Bibliographic Citation: |
Wiginton, J. (1980). A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior. Journal of Financial and Quantitative Analysis, 15, 757-770. |
Citation |
|
Title: |
Ala’raj, M., Abbod, M., & Majdalawieh, M. (2021). Modelling customers credit card behaviour using bidirectional LSTM neural networks. Journal of Big Data, 8, 1-27. |
Identification Number: |
10.1186/s40537-021-00461-7 |
Bibliographic Citation: |
Ala’raj, M., Abbod, M., & Majdalawieh, M. (2021). Modelling customers credit card behaviour using bidirectional LSTM neural networks. Journal of Big Data, 8, 1-27. |
Citation |
|
Title: |
Saberi, M., Mirtalaei, M., Hussain, F., Azadeh, A., Hussain, O., & Ashjari, B. (2013). A granular computing-based approach to credit scoring modeling. Neurocomputing, 122, 100-115. |
Identification Number: |
10.1016/j.neucom.2013.05.020 |
Bibliographic Citation: |
Saberi, M., Mirtalaei, M., Hussain, F., Azadeh, A., Hussain, O., & Ashjari, B. (2013). A granular computing-based approach to credit scoring modeling. Neurocomputing, 122, 100-115. |
Citation |
|
Title: |
Wei, Y., Yildirim, P., Bulte, C., & Dellarocas, C. (2014). Credit Scoring with Social Network Data. Economics of Networks eJournal. |
Identification Number: |
10.2139/ssrn.2475265 |
Bibliographic Citation: |
Wei, Y., Yildirim, P., Bulte, C., & Dellarocas, C. (2014). Credit Scoring with Social Network Data. Economics of Networks eJournal. |
Citation |
|
Title: |
Wang, C., Han, D., Liu, Q., & Luo, S. (2019). A Deep Learning Approach for Credit Scoring of Peer-to-Peer Lending Using Attention Mechanism LSTM. IEEE Access, 7, 2161-2168. |
Identification Number: |
10.1109/ACCESS.2018.2887138 |
Bibliographic Citation: |
Wang, C., Han, D., Liu, Q., & Luo, S. (2019). A Deep Learning Approach for Credit Scoring of Peer-to-Peer Lending Using Attention Mechanism LSTM. IEEE Access, 7, 2161-2168. |
Citation |
|
Title: |
West, D. (2000). Neural network credit scoring models. Comput. Oper. Res., 27, 1131-1152. |
Identification Number: |
10.1016/S0305-0548(99)00149-5 |
Bibliographic Citation: |
West, D. (2000). Neural network credit scoring models. Comput. Oper. Res., 27, 1131-1152. |
Citation |
|
Title: |
Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl., 39, 3446-3453. |
Identification Number: |
10.1016/j.eswa.2011.09.033 |
Bibliographic Citation: |
Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl., 39, 3446-3453. |
Citation |
|
Title: |
Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Syst. Appl., 28, 743-752. |
Identification Number: |
10.1016/j.eswa.2004.12.031 |
Bibliographic Citation: |
Lee, T., & Chen, I. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Syst. Appl., 28, 743-752. |
Citation |
|
Title: |
Mahjoub, R., & Afsar, A. (2019). A hybrid model for customer credit scoring in stock brokerages using data mining approach. Int. J. Bus. Inf. Syst., 31, 195-214. |
Bibliographic Citation: |
Mahjoub, R., & Afsar, A. (2019). A hybrid model for customer credit scoring in stock brokerages using data mining approach. Int. J. Bus. Inf. Syst., 31, 195-214. |
Citation |
|
Title: |
Arram, A., Ayob, M., Albadr, M., Sulaiman, A., & Albashish, D. (2023). Credit card score prediction using machine learning models: A new dataset. ArXiv, abs/2310.02956. |
Identification Number: |
10.48550/arXiv.2310.02956 |
Bibliographic Citation: |
Arram, A., Ayob, M., Albadr, M., Sulaiman, A., & Albashish, D. (2023). Credit card score prediction using machine learning models: A new dataset. ArXiv, abs/2310.02956. |
Citation |
|
Title: |
Junior, L., Nardini, F., Renso, C., Trani, R., & Macêdo, J. (2020). A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems. Expert Syst. Appl., 152, 113351. |
Identification Number: |
10.1016/j.eswa.2020.113351 |
Bibliographic Citation: |
Junior, L., Nardini, F., Renso, C., Trani, R., & Macêdo, J. (2020). A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems. Expert Syst. Appl., 152, 113351. |
Label: |
Systematic Alternative Dataset Review.xlsx |
Text: |
The findings indicate that social media data provides valuable insights into consumer behaviour and financial reliability, while web browsing patterns and digital footprints offer additional dimensions to assess creditworthiness. Telecom data, including call records and mobile payment history, has proven to be a reliable indicator of financial behaviour. Hybrid approaches, which combine these various data sources, provide a more comprehensive assessment of credit risk, addressing the limitations of relying on a single type of alternative dataset. |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |