Replication Data for: Credit scoring of thin file consumers (doi:10.7910/DVN/6MLVVI)
(Innovative Data Approaches for Assessing Credit Risk in Limited Credit History Consumers)

Document Description

Citation

Title:

Replication Data for: Credit scoring of thin file consumers

Identification Number:

doi:10.7910/DVN/6MLVVI

Distributor:

Harvard Dataverse

Date of Distribution:

2024-05-12

Version:

1

Bibliographic Citation:

Deepa Shukla, 2024, "Replication Data for: Credit scoring of thin file consumers", https://doi.org/10.7910/DVN/6MLVVI, Harvard Dataverse, V1, UNF:6:tIIKPwlCPuBzLVc1RbTvlQ== [fileUNF]

Study Description

Citation

Title:

Replication Data for: Credit scoring of thin file consumers

Subtitle:

Non-Traditional Data Sources to Enhance Creditworthiness

Alternative Title:

Innovative Data Approaches for Assessing Credit Risk in Limited Credit History Consumers

Identification Number:

doi:10.7910/DVN/6MLVVI

Identification Number:

API

Authoring Entity:

Deepa Shukla (Jaipur National University)

Other identifications and acknowledgements:

Shukla, Deepa

Other identifications and acknowledgements:

Gupta, Sunil

Producer:

Gupta, Sunil

Date of Production:

2024-05-13

Software used in Production:

Python

Distributor:

Harvard Dataverse

Access Authority:

Sunil Gupta

Depositor:

Shukla, Deepa

Date of Deposit:

2024-05-12

Holdings Information:

https://doi.org/10.7910/DVN/6MLVVI

Study Scope

Keywords:

Business and Management, Computer and Information Science, Machine Learning Algorithms, Credit Score, Thin File, Behavioural Finance

Topic Classification:

Digital Credit Scoring

Abstract:

Machine learning (ML) gives the credit scoring industry new ways to assess "thin-file" consumers, who lack substantial credit histories. Traditional credit scoring models often cannot assess these consumers accurately because too little data is available, which can exclude them from essential credit services. This research uses a synthetic dataset, generated with the Python libraries pandas, NumPy, and Faker, to develop and refine ML algorithms for evaluating this underserved segment. Because the data are synthetic, the dataset complies with privacy norms while still simulating the range of consumer behaviors, from stable to erratic financial activity, typical of thin-file profiles. The work supports financial inclusion by aiming to give the financial industry tools to evaluate creditworthiness fairly across all consumer segments, and the dataset is intended as a foundation for further research on algorithmic credit scoring.

Kind of Data:

Synthetic Data

Notes:

The dataset is designed to support the development of machine learning algorithms tailored to credit scoring of "thin-file" consumers: individuals with little or no credit history, for whom traditional credit scoring models are less effective or entirely inapplicable. These consumers often have difficulty accessing credit products because standard credit risk evaluation methods cannot readily assess them.

Methodology and Processing

Sources Statement

Data Sources:

https://github.com/Deezpa/credit-score

Documentation and Access to Sources:

https://github.com/Deezpa/credit-score

Data Access

Other Study Description Materials

Related Publications

Citation

Title:

synthetic credit score of thin-file consumers

Identification Number:

10.34740/KAGGLE/DSV/8378342

Bibliographic Citation:

Citation Type: BibTeX

@misc{deepa_shukla_2024,
  title={synthetic credit score of thin-file consumers},
  url={https://www.kaggle.com/dsv/8378342},
  DOI={10.34740/KAGGLE/DSV/8378342},
  publisher={Kaggle},
  author={Deepa Shukla},
  year={2024}
}

File Description--f10198032

File: 500Credit_Score_Dataset.tab

  • Number of cases: 500

  • No. of variables per record: 12

  • Type of File: text/tab-separated-values

Notes:

UNF:6:tIIKPwlCPuBzLVc1RbTvlQ==
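As a rough illustration of how the file described above might be read and checked against this codebook, the sketch below parses tab-separated text with Python's standard `csv` module. It assumes the file starts with a header row carrying the twelve variable names listed under Variable Description; the helper name `read_tab` and the choice to convert the four numeric variables to `float` are illustrative assumptions, not part of the deposited materials.

```python
import csv
import io

# Variable names as listed in this codebook's Variable Description.
EXPECTED_COLUMNS = [
    "Profile_ID", "Age", "Employment_Status", "Residential_Stability",
    "Bank_Account_Tenure", "Utility_Payment_History", "Rent_Payment_History",
    "Telecommunications_Payment_History", "Educational_Background",
    "Online_Shopping_Behavior", "Social_Media_Footprint",
    "Gig_Economy_Participation",
]

# Variables the codebook reports with "Variable Format: numeric".
NUMERIC_COLUMNS = [
    "Profile_ID", "Age", "Residential_Stability", "Bank_Account_Tenure",
]


def read_tab(text):
    """Parse tab-separated text into a list of dicts (hypothetical helper).

    Assumes a header row; raises ValueError if the header does not contain
    exactly the twelve variable names from the codebook.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if set(reader.fieldnames or []) != set(EXPECTED_COLUMNS):
        raise ValueError("header does not match the codebook variable list")
    rows = []
    for row in reader:
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])  # numeric variables become floats
        rows.append(row)
    return rows
```

With a local copy of the file, one could call `read_tab(open("500Credit_Score_Dataset.tab", newline="", encoding="utf-8").read())` and compare `len(rows)` with the 500 cases reported above.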

Variable Description

List of Variables:

Variables

Profile_ID

f10198032 Location:

Summary Statistics: Valid 500; Min. 85.0; Max. 997174.0; Mean 499567.554; StDev 285229.799921329

Variable Format: numeric

Notes: UNF:6:/9UBTIvgNc+VEsmfHjxBxw==

Age

f10198032 Location:

Summary Statistics: Valid 500; Min. 18.0; Max. 64.0; Mean 40.94; StDev 13.561571902435569

Variable Format: numeric

Notes: UNF:6:LLZbWCypgm9Xtxk3GQkrBg==

Employment_Status

f10198032 Location:

Variable Format: character

Notes: UNF:6:kUOD9uw3QuK64xUz45mY6Q==

Residential_Stability

f10198032 Location:

Summary Statistics: Valid 500; Min. 0.0; Max. 29.0; Mean 14.942; StDev 8.702994900635124

Variable Format: numeric

Notes: UNF:6:JmpVvSopeTUW88TzNg9VhA==

Bank_Account_Tenure

f10198032 Location:

Summary Statistics: Valid 500; Min. 0.0; Max. 29.0; Mean 14.394; StDev 8.700008292443012

Variable Format: numeric

Notes: UNF:6:KRHpC8XFv8ViypfiKa9M6w==

Utility_Payment_History

f10198032 Location:

Variable Format: character

Notes: UNF:6:KOhZ6cvB2pYirYjIVI/ZlQ==

Rent_Payment_History

f10198032 Location:

Variable Format: character

Notes: UNF:6:vSON07EeW2tn4tY7/2rJYg==

Telecommunications_Payment_History

f10198032 Location:

Variable Format: character

Notes: UNF:6:MmpnqfR1Q/YLzXo3AGsOdA==

Educational_Background

f10198032 Location:

Variable Format: character

Notes: UNF:6:dDlk1xOYJfzCNzMDkxD63w==

Online_Shopping_Behavior

f10198032 Location:

Variable Format: character

Notes: UNF:6:iOdGFM3Ve5UURmNN/PStZg==

Social_Media_Footprint

f10198032 Location:

Variable Format: character

Notes: UNF:6:P/nz2LEbqBombIOZ384Wqg==

Gig_Economy_Participation

f10198032 Location:

Variable Format: character

Notes: UNF:6:hrUi0qsPXCQ+Z2c6fX1WTA==
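
The variable list above can be turned into a small generator for experimenting with records of the same shape. The sketch below is not the author's generation script (the record states that pandas, NumPy, and Faker were used; this version substitutes Python's standard `random` module). The numeric ranges come from the Min. and Max. values in the summary statistics; everything else is an assumption made for illustration: uniform sampling, unique `Profile_ID` values drawn from 1 to 999999, and the placeholder category levels, since the codebook does not list the actual values of the character variables.

```python
import random
import statistics

# Ranges taken from the Min./Max. values reported in the summary statistics.
NUMERIC_RANGES = {
    "Age": (18, 64),
    "Residential_Stability": (0, 29),
    "Bank_Account_Tenure": (0, 29),
}

# Variables the codebook reports with "Variable Format: character".
CHARACTER_VARIABLES = [
    "Employment_Status", "Utility_Payment_History", "Rent_Payment_History",
    "Telecommunications_Payment_History", "Educational_Background",
    "Online_Shopping_Behavior", "Social_Media_Footprint",
    "Gig_Economy_Participation",
]

# Hypothetical levels: the codebook does not state the real category values.
PLACEHOLDER_LEVELS = ["level_a", "level_b", "level_c"]


def make_profiles(n, seed=0):
    """Return n synthetic records with the codebook's twelve variables."""
    rng = random.Random(seed)
    # Assumption: IDs are unique integers below one million.
    ids = rng.sample(range(1, 1_000_000), n)
    profiles = []
    for profile_id in ids:
        row = {"Profile_ID": profile_id}
        for name, (low, high) in NUMERIC_RANGES.items():
            row[name] = rng.randint(low, high)  # uniform draw, an assumption
        for name in CHARACTER_VARIABLES:
            row[name] = rng.choice(PLACEHOLDER_LEVELS)
        profiles.append(row)
    return profiles


def summarize(profiles, name):
    """Compute the same fields the codebook reports for a numeric variable."""
    values = [p[name] for p in profiles]
    return {
        "Valid": len(values),
        "Min": min(values),
        "Max": max(values),
        "Mean": statistics.mean(values),
        "StDev": statistics.stdev(values),  # sample standard deviation
    }
```

Calling `summarize(make_profiles(500), "Age")` yields a dictionary laid out like the Summary Statistics lines above; the values will differ from the deposited file because the sampling scheme here is assumed.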