- Title
- Protein secondary structure prediction using neural networks and support vector machines
- Creator
- Tsilo, Lipontseng Cecilia
- ThesisAdvisor
- Jäger, Gunther
- Subject
- Neural networks (Computer science)
- Subject
- Support vector machines
- Subject
- Proteins -- Structure -- Mathematical models
- Date
- 2009
- Type
- Thesis
- Type
- Masters
- Type
- MSc
- Identifier
- vital:5569
- Identifier
- http://hdl.handle.net/10962/d1002809
- Description
- Predicting the secondary structure of proteins is important in biochemistry because the 3D structure can be determined from the local folds found in secondary structures. Moreover, knowing the tertiary structure of a protein can help determine its function. The objective of this thesis is to compare the performance of Neural Networks (NN) and Support Vector Machines (SVM) in predicting the secondary structure of 62 globular proteins from their primary sequence. For each of NN and SVM, we created six binary classifiers to distinguish between the classes helix (H), strand (E), and coil (C). For NN, we use Resilient Backpropagation training with and without early stopping, with either no hidden layer or one hidden layer of 1, 2, ..., 40 hidden neurons. For SVM, we use a Gaussian kernel with its kernel parameter fixed at 0.1 and the cost parameter C varied over the range [0.1, 5]. 10-fold cross-validation is used to obtain overall estimates of the probability of making a correct prediction. Our experiments indicate, for both NN and SVM, that the different binary classifiers have varying accuracies: from 69% correct predictions for coil vs. non-coil up to 80% correct predictions for strand vs. non-strand. It is further demonstrated that NNs with no hidden layer, or with at most two neurons in the hidden layer, are sufficient for good predictions. For SVM, we show that the estimated accuracies do not depend on the value of the cost parameter. As a major result, we demonstrate that the accuracy estimates of the NN and SVM binary classifiers cannot be distinguished. This contradicts a current belief in bioinformatics that SVMs outperform other predictors.
- Format
- 185 leaves, pdf
- Publisher
- Rhodes University, Faculty of Science, Statistics
- Language
- English
- Rights
- Tsilo, Lipontseng Cecilia