Statistical classification, an application to credit default
- Authors: Sikhakhane, Anele Gcina
- Date: 2024-10-11
- Subjects: Binary classification , Default (Finance) , Credit cards , Credit risk , Machine learning , Variables (Mathematics)
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/465069 , vital:76570
- Description: Statistical learning has been used in both industry and academia to create credit scoring models. These models are used to predict who might default on their loan repayments, thus minimizing the risk financial institutions face. In this study, six traditional classifiers and one more recent classifier, namely kNN, LDA, CART, RF, AdaBoost, XGBoost and SynBoost, were used to predict who might default on their loans. The data set used in this study was imbalanced, so sampling and performance evaluation techniques were investigated and used to balance the class distribution and assess the classifiers' performance. In addition to the standard variables and data set, new variables called synthetic variables and synthetic data sets were produced, investigated and used to predict who might default on their loans. This study found that the synthetic data set had strong predictive power and that sampling methods negatively affected the classifiers' performance. The best-performing classifier was XGBoost, with an AUC score of 0.7732. , Thesis (MSc) -- Faculty of Science, Statistics, 2024
- Date Issued: 2024-10-11
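The description above reports performance as an AUC score. As a minimal sketch of what that number measures, AUC can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as half. The function name `auc_score` and the toy labels and scores are illustrative assumptions, not data or code from the thesis.

```python
def auc_score(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. default), 0 otherwise.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 defaulters among 8 borrowers.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.9]
print(round(auc_score(labels, scores), 4))
```

Because the measure depends only on how cases are ranked, it does not change with the classification threshold, which is one reason it is often reported for imbalanced credit data.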
Suspicious activity reports: Enhancing the detection of terrorist financing and suspicious transactions in migrant remittances
- Authors: Mbiva, Stanley Munamato
- Date: 2024-10-11
- Subjects: Migrant remittances , Terrorism financing , Machine learning , Outliers (Statistics) , Anomaly detection (Computer security) , Unsupervised learning
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/465058 , vital:76569
- Description: Migrant remittances have become an important factor in poverty alleviation and microeconomic development in low-income nations. Global migrant remittances are expected to exceed US$630 billion by 2023, according to the World Bank. In addition to offering an alternate source of income that supplements the recipient’s household earnings, they are less likely to be affected by global economic downturns, ensuring stability and a consistent stream of revenue. However, the ease of global migrant remittance transfers has created a risk of abuse by terrorist organizations seeking to quickly move and conceal operating cash, hence facilitating terrorist financing. This study aims to develop an unsupervised machine-learning model capable of detecting suspicious financial transactions associated with terrorist financing in migrant remittances. The data used in this study came from a World Bank survey of migrant remitters in Belgium. To understand the natural structures and groupings in the dataset, agglomerative hierarchical clustering and k-prototype clustering techniques were employed. This established the number of clusters present in the dataset, making it possible to compare individual migrant remittances in the dataset with their peers. A Structural Equation Model (SEM) and a Local Outlier Factor - Isolation Forest (LOF-IF) algorithm were applied to analyze and detect suspicious transactions in the dataset. A traditional Rule-Based Method (RBM) was also created as a benchmark against which model performance was evaluated. The results show that the SEM classifies a very high number of transactions as suspicious, making it prone to false positives. Finally, the study applied the proposed ensemble outlier detection model to detect suspicious transactions in the same data set. The proposed ensemble model used an Isolation Forest (IF) for pruning and a Local Outlier Factor (LOF) to detect local outliers. The model performed exceptionally well, detecting over 90% of suspicious transactions in the testing data set during model cross-validation. , Thesis (MSc) -- Faculty of Science, Statistics, 2024
- Date Issued: 2024-10-11
Statistical and Mathematical Learning: an application to fraud detection and prevention
- Authors: Hamlomo, Sisipho
- Date: 2022-04-06
- Subjects: Credit card fraud , Bootstrap (Statistics) , Support vector machines , Neural networks (Computer science) , Decision trees , Machine learning , Cross-validation , Imbalanced data
- Language: English
- Type: Master's thesis , text
- Identifier: http://hdl.handle.net/10962/233795 , vital:50128
- Description: Credit card fraud is an ever-growing problem. There has been a rapid increase in the rate of fraudulent activities in recent years, resulting in considerable losses to organizations, companies, and government agencies. Many researchers have focused on detecting fraudulent behaviours early using advanced machine learning techniques. However, credit card fraud detection is not a straightforward task, since fraudulent behaviours usually differ for each attempt and the dataset is highly imbalanced, that is, non-fraudulent cases far outnumber fraudulent cases. In the case of the European credit card dataset, we have a ratio of approximately one fraudulent case to five hundred and seventy-eight non-fraudulent cases. Different methods were implemented to overcome this problem, namely random undersampling, one-sided sampling, SMOTE combined with Tomek links and parameter tuning. Predictive classifiers, namely logistic regression, decision trees, k-nearest neighbour, support vector machine and multilayer perceptrons, were applied to predict whether a transaction is fraudulent or non-fraudulent. The models' performance was evaluated based on recall, precision, F1-score, the area under the receiver operating characteristic curve, geometric mean and Matthews correlation coefficient. The results showed that the logistic regression classifier performed better than the other classifiers except when the dataset was oversampled. , Thesis (MSc) -- Faculty of Science, Statistics, 2022
- Date Issued: 2022-04-06
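The description above lists recall, precision, F1-score, geometric mean and the Matthews correlation coefficient as evaluation measures for a data set with roughly one fraudulent case per 578 non-fraudulent cases. The sketch below shows, under stated assumptions, why such measures are preferred to plain accuracy at that ratio. The function `imbalance_metrics` and the confusion-matrix counts are hypothetical illustrations built from the quoted ratio, not results from the thesis.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute common metrics from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = math.sqrt(recall * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Hypothetical classifier that labels every transaction as non-fraudulent,
# evaluated on 1 fraudulent and 578 non-fraudulent cases.
always_negative = imbalance_metrics(tp=0, fp=0, tn=578, fn=1)
for name, value in always_negative.items():
    print(f"{name}: {value:.4f}")
```

In this illustration, accuracy is above 99% even though no fraud is caught, while recall, F1-score, geometric mean and the Matthews correlation coefficient all fall to zero.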