Statistical and Mathematical Learning: an application to fraud detection and prevention
- Authors: Hamlomo, Sisipho
- Date: 2022-04-06
- Subjects: Credit card fraud , Bootstrap (Statistics) , Support vector machines , Neural networks (Computer science) , Decision trees , Machine learning , Cross-validation , Imbalanced data
- Language: English
- Type: Master's thesis , text
- Identifier: http://hdl.handle.net/10962/233795 , vital:50128
- Description: Credit card fraud is an ever-growing problem. There has been a rapid increase in the rate of fraudulent activities in recent years, resulting in considerable losses to several organizations, companies, and government agencies. Many researchers have focused on detecting fraudulent behaviours early using advanced machine learning techniques. However, credit card fraud detection is not a straightforward task, since fraudulent behaviours usually differ from one attempt to the next and the dataset is highly imbalanced; that is, non-fraudulent cases far outnumber fraudulent cases. In the European credit card dataset, the ratio is approximately one fraudulent case to five hundred and seventy-eight non-fraudulent cases. Several methods were implemented to overcome this problem, namely random undersampling, one-sided sampling, SMOTE combined with Tomek links, and parameter tuning. Predictive classifiers, namely logistic regression, decision trees, k-nearest neighbour, support vector machine and multilayer perceptrons, are applied to predict whether a transaction is fraudulent or non-fraudulent. The models' performance is evaluated using recall, precision, F1-score, the area under the receiver operating characteristic curve, the geometric mean and the Matthews correlation coefficient. The results showed that the logistic regression classifier performed better than the other classifiers, except when the dataset was oversampled. , Thesis (MSc) -- Faculty of Science, Statistics, 2022
- Date Issued: 2022-04-06
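The abstract evaluates the classifiers with imbalance-aware metrics rather than plain accuracy. The following minimal Python sketch is not code from the thesis. It computes those metrics from the four counts of a binary confusion matrix, with fraud as the positive class. The ROC AUC is left out because it needs predicted scores rather than hard labels. The function name `imbalance_metrics` and the convention of reporting 0.0 whenever a ratio's denominator is zero are assumptions made for illustration.

```python
import math


def imbalance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Imbalance-aware metrics from binary confusion-matrix counts.

    Positive class = fraudulent. ASSUMPTION: any ratio whose
    denominator is zero is reported as 0.0.
    """

    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    recall = safe_div(tp, tp + fn)        # sensitivity: share of frauds caught
    specificity = safe_div(tn, tn + fp)   # share of genuine transactions passed
    precision = safe_div(tp, tp + fp)     # share of fraud alerts that are real
    f1 = safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)  # geometric mean of both class recalls
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)  # Matthews correlation coefficient
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
        "mcc": mcc,
    }
```

For example, on data with one fraud per 578 non-fraudulent cases, a classifier that never flags fraud (`tp=0, fp=0, fn=1, tn=578`) reaches an accuracy of about 0.998, while its recall, F1-score, geometric mean and Matthews correlation coefficient are all 0. This is why accuracy alone is uninformative at this level of imbalance.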
Bootstrap-based tolerance intervals for photovoltaic energy yield assessments
- Authors: Deyzel, Jani Igna
- Date: 2019
- Subjects: Bootstrap (Statistics) , Mathematical statistics , Photovoltaic power systems
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10948/39469 , vital:35256
- Description: The assessment of the energy yield of a photovoltaic (PV) system is one of the key assessments required by investors and developers. Currently available methods for this assessment provide only a point estimate as the final assessment. This study proposes a statistical technique that provides an additional energy yield assessment method using tolerance intervals. Variance component models are used to better account for the variability in the daily and hourly energy yields of three different PV modules. A bootstrap-based technique is used to obtain β-expectation and (α, β) two-sided tolerance intervals. These tolerance intervals, constructed with a specified content and confidence level, provided more information for seasonal and yearly time periods. In addition, the comparisons of the PV modules provide valuable information to investors and developers.
- Date Issued: 2019
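The following minimal Python sketch illustrates bootstrap-calibrated two-sided tolerance intervals. It makes a strong simplifying assumption: the yields are treated as an i.i.d. normal sample, so the variance component models used in the thesis are not reproduced. Each interval has the form x̄ ± k·s, and a parametric bootstrap chooses k in one of two ways. For the β-expectation interval, k is set so that the average content equals β. For the (α, β) interval, k is set so that the content is at least β in a proportion 1 − α of replicates; (α, β) is read here as content β at confidence 1 − α, and the thesis may use a different convention. The function names, defaults and the bisection search are hypothetical choices for illustration, not the author's method.

```python
import math
import random
import statistics
from statistics import NormalDist


def _content(centre: float, half_width: float, dist: NormalDist) -> float:
    """Probability mass of `dist` inside [centre - half_width, centre + half_width]."""
    return dist.cdf(centre + half_width) - dist.cdf(centre - half_width)


def _bisect_k(predicate, iters: int = 60) -> float:
    """Approximately the smallest k >= 0 with predicate(k) True (predicate monotone in k)."""
    lo, hi = 0.0, 1.0
    while not predicate(hi):  # grow the bracket until the predicate holds
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if predicate(mid):
            hi = mid
        else:
            lo = mid
    return hi


def bootstrap_tolerance_intervals(data, beta=0.90, alpha=0.05, n_boot=2000, seed=0):
    """Parametric-bootstrap tolerance intervals xbar +/- k*s.

    ASSUMPTION: data are i.i.d. normal; no variance components are modelled.
    """
    n = len(data)
    xbar, s = statistics.fmean(data), statistics.stdev(data)
    fitted = NormalDist(xbar, s)  # plug-in estimate of the yield distribution
    rng = random.Random(seed)

    # Bootstrap replicates of the sample mean and standard deviation.
    replicates = []
    for _ in range(n_boot):
        sample = [rng.gauss(xbar, s) for _ in range(n)]
        replicates.append((statistics.fmean(sample), statistics.stdev(sample)))

    # beta-expectation: average content over the replicates reaches beta.
    k_exp = _bisect_k(
        lambda k: statistics.fmean(_content(m, k * sd, fitted) for m, sd in replicates) >= beta
    )

    # (alpha, beta): per-replicate minimal k with content >= beta,
    # then the (1 - alpha) empirical quantile of those k values.
    k_each = sorted(
        _bisect_k(lambda k: _content(m, k * sd, fitted) >= beta) for m, sd in replicates
    )
    k_ab = k_each[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]

    return {
        "k_expectation": k_exp,
        "k_alpha_beta": k_ab,
        "beta_expectation": (xbar - k_exp * s, xbar + k_exp * s),
        "alpha_beta": (xbar - k_ab * s, xbar + k_ab * s),
    }
```

Because this bootstrap is parametric and normal, `k_expectation` should approximate the classical normal factor t_{(1+β)/2, n−1}·√(1 + 1/n), and the (α, β) interval should come out wider than the β-expectation interval. A bootstrap calibration is mainly worthwhile where no closed-form factor is available, as with the variance component models described above.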