Statistical classification, an application to credit default
- Authors: Sikhakhane, Anele Gcina
- Date: 2024-10-11
- Subjects: Uncatalogued
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/465069 , vital:76570
- Description: Statistical learning has been used in both industry and academia to create credit scoring models. These models predict which borrowers are likely to default on their loan repayments, thereby reducing the risk that financial institutions face. In this study, six traditional classifiers and one more recent classifier, namely kNN, LDA, CART, RF, AdaBoost, XGBoost and SynBoost, were used to predict loan default. The data set used in this study was imbalanced, so sampling and performance evaluation techniques were investigated and used to balance the class distribution and to assess the classifiers' performance, respectively. In addition to the standard variables and data set, new synthetic variables and synthetic data sets were produced, investigated and used to predict loan default. This study found that the synthetic data set had strong predictive power and that sampling methods negatively affected the classifiers' performance. The best-performing classifier was XGBoost, with an AUC score of 0.7732. , Thesis (MSc) -- Faculty of Science, Statistics, 2024
- Date Issued: 2024-10-11
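The abstract reports classifier performance as an AUC score. As a minimal illustration (not code from the thesis), ROC AUC can be computed as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this quadratic-time version.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation (hypothetical helper).

    labels: 1 = default, 0 = no default; scores: higher means riskier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")

    # Count defaulter/non-defaulter pairs ranked correctly; ties count as 0.5.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores for four borrowers.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On the toy data, three of the four defaulter/non-defaulter pairs are ranked correctly, giving 0.75. Because the statistic depends only on how scores are ordered across the two classes, it does not depend directly on the proportion of defaulters, which is one reason it is widely used for imbalanced credit data.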
The application of statistical classification to predict sovereign default
- Authors: Vele, Rendani
- Date: 2023-10-13
- Subjects: Uncatalogued
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/424563 , vital:72164
- Description: When considering sovereign loans, it is imperative for a financial institution to have a good understanding of the sovereign it is transacting with. Defaults can occur if proper evaluation steps are not taken. To aid in the prediction of potential sovereign defaults, financial institutions, together with credit rating agencies, quantify the risk associated with issuing a loan to a sovereign by developing sovereign default early warning systems (EWS). Several classification models are considered in this study for developing sovereign default EWS: the binary logit, probit, Bayesian additive regression trees (BART) and artificial neural networks. This study investigates the predictive performance of these classification techniques. Because sovereign information is not readily available, missing data techniques are considered to counter the data availability issue. Sovereign defaults are rare, which results in an imbalanced distribution of the binary dependent variable; to assess data sets with this characteristic, metrics for imbalanced data are used to compare model performance. The findings show that BART generated better results than the other techniques in the basic data analysis, whereas the neural network performed best when cross-validation was considered. In addition, regional models had better predictive capability than the global model. This study contributes sovereign default prediction models built with various classification techniques, extending previous literature and analysis through the application of Bayesian additive regression trees. , Thesis (MSc) -- Faculty of Science, Statistics, 2023
- Date Issued: 2023-10-13
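The abstract does not name the imbalanced-data metrics that were used. As a hedged sketch, the function below computes a few metrics commonly used in this setting (sensitivity, specificity, balanced accuracy and the geometric mean of sensitivity and specificity) from binary labels and predictions. The function name `imbalance_metrics` and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, Sequence


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary default (1) / no-default (0) problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaults cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    # Guard against an absent class so the ratios stay defined.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on defaulters
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-defaulters
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Toy example: one default among ten cases, and a model that never predicts default.
print(imbalance_metrics([1] + [0] * 9, [0] * 10))
# accuracy 0.9, sensitivity 0.0, specificity 1.0, balanced_accuracy 0.5, g_mean 0.0
```

With one default in ten cases, a model that never predicts default reaches 90% accuracy yet has zero sensitivity and a G-mean of 0, which is the kind of failure that imbalance-aware metrics are meant to expose.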