Comparative analysis of YOLOV5 and YOLOV8 for automated fish detection and classification in underwater environments
- Authors: Kuhlane, Luxolo
- Date: 2024-10-11
- Subjects: Artificial intelligence , Deep learning (Machine learning) , Machine learning , Neural networks (Computer science) , You Only Look Once , YOLOv5 , YOLOv8
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/464333 , vital:76502
- Description: The application of traditional manual techniques for fish detection and classification faces significant challenges, primarily stemming from their labour-intensive nature and limited scalability. Automating these processes through computer vision and machine learning techniques has emerged as a potential solution in recent years. As new technology has become more capable and accessible, the deep learning object detector known as YOLO (You Only Look Once) has become notably popular for detecting and classifying fish. This thesis therefore explores suitable YOLO architectures for detecting and classifying fish. The YOLOv5 and YOLOv8 models were evaluated specifically for detecting and classifying fish in underwater environments. These models were selected based on a literature review highlighting their success in similar applications, although they remain largely understudied in underwater environments. The effectiveness of these models was therefore evaluated through comprehensive experimentation on collected and publicly available underwater fish datasets. In collaboration with the South African Institute for Aquatic Biodiversity (SAIAB), five datasets were collected and manually annotated with labels for supervised machine learning. Moreover, two publicly available datasets were sourced for comparison with the literature. After determining that the smallest YOLO architectures are better suited to these imbalanced datasets, hyperparameter tuning tailored the models to the characteristics of the various underwater environments used in the research. The popular DeepFish dataset was evaluated to establish a baseline and the feasibility of these models in this understudied domain. The results demonstrated high detection accuracy for both YOLOv5 and YOLOv8. However, YOLOv8 outperformed YOLOv5, achieving 97.43% accuracy compared to 94.53%. 
Across experiments on the seven datasets, trends revealed YOLOv8's improved generalisation accuracy, attributable to its architectural improvements, particularly in detecting smaller fish. Overall, YOLOv8 demonstrated that it is the better fish detection and classification model on diverse data. , Thesis (MSc) -- Faculty of Science, Computer Science, 2024
- Full Text:
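The detection accuracies quoted in the abstract above are conventionally derived by matching predicted boxes to ground-truth annotations via intersection-over-union (IoU), the standard overlap measure used when scoring detectors such as YOLOv5 and YOLOv8. A minimal sketch of that measure follows; the function name and corner-coordinate box convention are illustrative, not taken from the thesis:

```python
def iou(box_a, box_b):
    """Overlap of two axis-aligned boxes given as (x1, y1, x2, y2).

    Returns a value in [0, 1]: 0 for disjoint boxes, 1 for identical ones.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common default).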
Enhancing licence plate recognition for a robust vehicle re-identification system
- Authors: Boby, Alden Zachary
- Date: 2024-10-11
- Subjects: Automobile theft South Africa , Deep learning (Machine learning) , Object detection , YOLOv7 , YOLO , Pattern recognition systems , Image processing Digital techniques , Automobile license plates
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/464322 , vital:76501
- Description: Vehicle security is a growing concern for citizens of South Africa. Law enforcement relies on reports and security camera footage for vehicle identification but struggles to match the increasing number of carjacking incidents and low vehicle recovery rates. Security camera footage offers an accessible means to identify stolen vehicles, yet it often poses hurdles such as anamorphic plates and low resolution. Furthermore, relying on human operators proves inefficient, requiring faster processes to improve vehicle recovery rates and trust in law enforcement. The integration of deep learning has revolutionised object detection algorithms, increasing the popularity of vehicle tracking for security purposes. This thesis investigates advanced deep-learning methods for a comprehensive vehicle search and re-identification system. It enhances YOLOv7's algorithmic capabilities and employs preprocessing techniques such as super-resolution and perspective correction via the Improved Warped Planar Object Detection network for more effective licence plate optical character recognition. Key contributions include a specifically annotated dataset for training object detection models, an optical character recognition model based on YOLOv7, and a method for identifying vehicles in unrestricted data. The system detected rectangular and square licence plates without prior shape knowledge, achieving a 98.7% character recognition rate compared to 95.31% in related work. Moreover, it outperformed traditional optical character recognition by 28.25% and deep-learning EasyOCR by 14.18%. Its potential applications in law enforcement, traffic management, and parking systems can improve surveillance and security through automation. , Thesis (MSc) -- Faculty of Science, Computer Science, 2024
- Full Text:
Investigating unimodal isolated signer-independent sign language recognition
- Authors: Marais, Marc Jason
- Date: 2024-04-04
- Subjects: Convolutional neural network , Sign language recognition , Human activity recognition , Pattern recognition systems , Neural networks (Computer science)
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/435343 , vital:73149
- Description: Sign language serves as the mode of communication for the Deaf and Hard of Hearing community, embodying a rich linguistic and cultural heritage. Recent Sign Language Recognition (SLR) system developments aim to facilitate seamless communication between the Deaf community and the broader society. However, most existing systems are limited by signer-dependent models, hindering their adaptability to diverse signing styles and signers, thus impeding their practical implementation in real-world scenarios. This research explores various unimodal approaches, both pose-based and vision-based, for isolated signer-independent SLR using RGB video input on the LSA64 and AUTSL datasets. The unimodal RGB-only input strategy provides a realistic SLR setting where alternative data sources are either unavailable or necessitate specialised equipment. Through systematic testing scenarios, isolated signer-independent SLR experiments are conducted on both datasets, primarily focusing on AUTSL, a signer-independent dataset. The vision-based R(2+1)D-18 model emerged as the top performer, achieving 90.64% accuracy on the unseen AUTSL dataset test split, closely followed by the pose-based Spatio-Temporal Graph Convolutional Network (ST-GCN) model with an accuracy of 89.95%. Notably, the pose-based approach demonstrated robust generalisation to substantial background and signer variation while demanding significantly less computational power and training time than the vision-based approaches. Both the proposed unimodal pose-based and vision-based systems were concluded to be effective at classifying sign classes in the LSA64 and AUTSL datasets. , Thesis (MSc) -- Faculty of Science, Ichthyology and Fisheries Science, 2024
- Full Text: