Evaluation of the effectiveness of small aperture network telescopes as IBR data sources
- Authors: Chindipha, Stones Dalitso
- Date: 2023-03-31
- Subjects: Computer networks Monitoring , Computer networks Security measures , Computer bootstrapping , Time-series analysis , Regression analysis , Mathematical models
- Language: English
- Type: Academic theses , Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/366264 , vital:65849 , DOI https://doi.org/10.21504/10962/366264
- Description: The use of network telescopes to collect unsolicited network traffic by monitoring unallocated address space has been in existence for over two decades. Past research has shown that there is a lot of activity happening in this unallocated space that needs monitoring as it carries threat intelligence data that has proven to be very useful in the security field. Prior to the emergence of the Internet of Things (IoT), commercialisation of IP addresses and widespread of mobile devices, there was a large pool of IPv4 addresses and thus reserving IPv4 addresses to be used for monitoring unsolicited activities going in the unallocated space was not a problem. Now, preservation of such IPv4 addresses just for monitoring is increasingly difficult as there is not enough free addresses in the IPv4 address space to be used for just monitoring. This is the case because such monitoring is seen as a ’non-productive’ use of the IP addresses. This research addresses the problem brought forth by this IPv4 address space exhaustion in relation to Internet Background Radiation (IBR) monitoring. In order to address the research questions, this research developed four mathematical models: Absolute Mean Accuracy Percentage Score (AMAPS), Symmetric Absolute Mean Accuracy Percentage Score (SAMAPS), Standardised Mean Absolute Error (SMAE), and Standardised Mean Absolute Scaled Error (SMASE). These models are used to evaluate the research objectives and quantify the variations that exist between different samples. The sample sizes represent different lens sizes of the telescopes. The study has brought to light a time series plot that shows the expected proportion of unique source IP addresses collected over time. The study also imputed data using the smaller /24 IPv4 net-block subnets to regenerate the missing data points using bootstrapping to create confidence intervals (CI). The findings from the simulated data supports the findings computed from the models. The CI offers a boost to decision making. Through a series of experiments with monthly and quarterly datasets, the study proposed a 95% - 99% confidence level to be used. It was known that large network telescopes collect more threat intelligence data than small-sized network telescopes, however, no study, to the best of our knowledge, has ever quantified such a knowledge gap. With the findings from the study, small-sized network telescope users can now use their network telescopes with full knowledge of gap that exists in the data collected between different network telescopes. , Thesis (PhD) -- Faculty of Science, Computer Science, 2023
- Full Text:
- Date Issued: 2023-03-31
- Authors: Chindipha, Stones Dalitso
- Date: 2023-03-31
- Subjects: Computer networks Monitoring , Computer networks Security measures , Computer bootstrapping , Time-series analysis , Regression analysis , Mathematical models
- Language: English
- Type: Academic theses , Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/366264 , vital:65849 , DOI https://doi.org/10.21504/10962/366264
- Description: The use of network telescopes to collect unsolicited network traffic by monitoring unallocated address space has been in existence for over two decades. Past research has shown that there is a lot of activity happening in this unallocated space that needs monitoring as it carries threat intelligence data that has proven to be very useful in the security field. Prior to the emergence of the Internet of Things (IoT), commercialisation of IP addresses and widespread of mobile devices, there was a large pool of IPv4 addresses and thus reserving IPv4 addresses to be used for monitoring unsolicited activities going in the unallocated space was not a problem. Now, preservation of such IPv4 addresses just for monitoring is increasingly difficult as there is not enough free addresses in the IPv4 address space to be used for just monitoring. This is the case because such monitoring is seen as a ’non-productive’ use of the IP addresses. This research addresses the problem brought forth by this IPv4 address space exhaustion in relation to Internet Background Radiation (IBR) monitoring. In order to address the research questions, this research developed four mathematical models: Absolute Mean Accuracy Percentage Score (AMAPS), Symmetric Absolute Mean Accuracy Percentage Score (SAMAPS), Standardised Mean Absolute Error (SMAE), and Standardised Mean Absolute Scaled Error (SMASE). These models are used to evaluate the research objectives and quantify the variations that exist between different samples. The sample sizes represent different lens sizes of the telescopes. The study has brought to light a time series plot that shows the expected proportion of unique source IP addresses collected over time. The study also imputed data using the smaller /24 IPv4 net-block subnets to regenerate the missing data points using bootstrapping to create confidence intervals (CI). The findings from the simulated data supports the findings computed from the models. The CI offers a boost to decision making. Through a series of experiments with monthly and quarterly datasets, the study proposed a 95% - 99% confidence level to be used. It was known that large network telescopes collect more threat intelligence data than small-sized network telescopes, however, no study, to the best of our knowledge, has ever quantified such a knowledge gap. With the findings from the study, small-sized network telescope users can now use their network telescopes with full knowledge of gap that exists in the data collected between different network telescopes. , Thesis (PhD) -- Faculty of Science, Computer Science, 2023
- Full Text:
- Date Issued: 2023-03-31
A systematic methodology to evaluating optimised machine learning based network intrusion detection systems
- Authors: Chindove, Hatitye Ethridge
- Date: 2022-10-14
- Subjects: Intrusion detection systems (Computer security) , Machine learning , Computer networks Security measures , Principal components analysis
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/362774 , vital:65361
- Description: A network intrusion detection system (NIDS) is essential for mitigating computer network attacks in various scenarios. However, the increasing complexity of computer networks and attacks makes classifying unseen or novel network traffic challenging. Supervised machine learning techniques (ML) used in a NIDS can be affected by different scenarios. Thus, dataset recency, size, and applicability are essential factors when selecting and tuning a machine learning classifier. This thesis explores developing and optimising several supervised ML algorithms with relatively new datasets constructed to depict real-world scenarios. The methodology includes empirical analyses of systematic ML-based NIDS for a near real-world network system to improve intrusion detection. The thesis is experimental heavy for model assessment. Data preparation methods are explored, followed by feature engineering techniques. The model evaluation process involves three experiments testing against a validation, un-trained, and retrained set. They compare several traditional machine learning and deep learning classifiers to identify the best NIDS model. Results show that the focus on feature scaling, feature selection methods and ML algo- rithm hyper-parameter tuning per model is an essential optimisation component. Distance based ML algorithm performed much better with quantile transformation whilst the tree based algorithms performed better without scaling. Permutation importance performs as a feature selection method compared to feature extraction using Principal Component Analysis (PCA) when applied against all ML algorithms explored. Random forests, Sup- port Vector Machines and recurrent neural networks consistently achieved the best results with high macro f1-score results of 90% 81% and 73% for the CICIDS 2017 dataset; and 72% 68% and 73% against the CICIDS 2018 dataset. , Thesis (MSc) -- Faculty of Science, Computer Science, 2022
- Full Text:
- Date Issued: 2022-10-14
- Authors: Chindove, Hatitye Ethridge
- Date: 2022-10-14
- Subjects: Intrusion detection systems (Computer security) , Machine learning , Computer networks Security measures , Principal components analysis
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/362774 , vital:65361
- Description: A network intrusion detection system (NIDS) is essential for mitigating computer network attacks in various scenarios. However, the increasing complexity of computer networks and attacks makes classifying unseen or novel network traffic challenging. Supervised machine learning techniques (ML) used in a NIDS can be affected by different scenarios. Thus, dataset recency, size, and applicability are essential factors when selecting and tuning a machine learning classifier. This thesis explores developing and optimising several supervised ML algorithms with relatively new datasets constructed to depict real-world scenarios. The methodology includes empirical analyses of systematic ML-based NIDS for a near real-world network system to improve intrusion detection. The thesis is experimental heavy for model assessment. Data preparation methods are explored, followed by feature engineering techniques. The model evaluation process involves three experiments testing against a validation, un-trained, and retrained set. They compare several traditional machine learning and deep learning classifiers to identify the best NIDS model. Results show that the focus on feature scaling, feature selection methods and ML algo- rithm hyper-parameter tuning per model is an essential optimisation component. Distance based ML algorithm performed much better with quantile transformation whilst the tree based algorithms performed better without scaling. Permutation importance performs as a feature selection method compared to feature extraction using Principal Component Analysis (PCA) when applied against all ML algorithms explored. Random forests, Sup- port Vector Machines and recurrent neural networks consistently achieved the best results with high macro f1-score results of 90% 81% and 73% for the CICIDS 2017 dataset; and 72% 68% and 73% against the CICIDS 2018 dataset. , Thesis (MSc) -- Faculty of Science, Computer Science, 2022
- Full Text:
- Date Issued: 2022-10-14
- «
- ‹
- 1
- ›
- »