Prediction of protein secondary structure using binary classification trees, naive Bayes classifiers and the Logistic Regression Classifier
- Authors: Eldud Omer, Ahmed Abdelkarim
- Date: 2016
- Subjects: Bayesian statistical decision theory , Logistic regression analysis , Biostatistics , Proteins -- Structure
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5581 , http://hdl.handle.net/10962/d1019985
- Description: The secondary structure of proteins is predicted using various binary classifiers. The data are taken from the RS126 database and consist of protein primary and secondary structure sequences, originally encoded as alphabetic letters. These letters are re-encoded as unary vectors comprising only ones and zeros. Three binary classifiers, namely naive Bayes, logistic regression and classification trees, are trained on the encoded data and evaluated using hold-out and 5-fold cross-validation. For each classifier, three classification tasks are considered: helix against not helix (H/∼H), sheet against not sheet (S/∼S) and coil against not coil (C/∼C). The performance of these binary classifiers is compared using the overall accuracy in predicting protein secondary structure for various window sizes. Our results indicate that hold-out cross-validation achieved higher accuracy than 5-fold cross-validation. The naive Bayes classifier using 5-fold cross-validation achieved the lowest accuracy for predicting helix against not helix. The classification tree classifiers using 5-fold cross-validation achieved the lowest accuracies for both coil against not coil and sheet against not sheet classification. The accuracy of the logistic regression classifier depends on the window size: there is a positive relationship between accuracy and window size. The logistic regression classifier achieved the highest accuracy of the three approaches for each classification task: 77.74 percent for helix against not helix, 81.22 percent for sheet against not sheet and 73.39 percent for coil against not coil. It is noted that comparing the classifiers would be easier if the entire classification process could be carried out in R.
Alternatively, assessing the logistic regression classifiers would be easier if SPSS provided a function for determining their classification accuracy.
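The description compresses several mechanical steps into a few sentences. The Python sketch below illustrates them under stated assumptions: each residue is represented by a centred window of neighbouring residues, each residue in the window is written as a unary (one-hot) vector of 0s and 1s, the structure labels are collapsed into one binary task at a time, and accuracy is estimated with a hold-out split or k-fold cross-validation. The 21-symbol alphabet with a "-" padding symbol, the contiguous unshuffled folds and all function names are hypothetical rather than taken from the thesis, which appears to have done its classification in R and SPSS. `majority_class` is only a stand-in to exercise the evaluation loop; a naive Bayes, logistic regression or classification-tree learner with the same call shape would take its place.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical alphabet: the 20 standard amino acids plus "-", which marks
# window positions that fall beyond either end of the chain.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

Vector = List[int]
Predictor = Callable[[Vector], int]
Fit = Callable[[List[Vector], List[int]], Predictor]


def unary(residue: str) -> Vector:
    """Unary (one-hot) code: a single 1 at the residue's alphabet position."""
    vec = [0] * len(ALPHABET)
    idx = ALPHABET.find(residue)
    # Assumption: letters outside the alphabet (e.g. "X") share the padding slot.
    vec[idx if idx >= 0 else len(ALPHABET) - 1] = 1
    return vec


def encode_windows(sequence: str, window: int) -> List[Vector]:
    """One 0/1 feature vector per residue: the concatenated unary codes of
    the `window` residues centred on it (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    return [
        [bit for residue in padded[i:i + window] for bit in unary(residue)]
        for i in range(len(sequence))
    ]


def binary_labels(structure: str, target: str) -> List[int]:
    """One binary task, e.g. target="H" gives helix (1) against not helix (0)."""
    return [1 if state == target else 0 for state in structure]


def hold_out_split(n: int, test_fraction: float) -> Tuple[List[int], List[int]]:
    """Single train/test split: the last `test_fraction` of examples is held out."""
    cut = n - round(n * test_fraction)
    return list(range(cut)), list(range(cut, n))


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Contiguous, unshuffled k-fold split: each fold is the test set exactly once."""
    splits, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy: the fraction of residues given the correct binary label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class(X: List[Vector], y: List[int]) -> Predictor:
    """Placeholder learner (NOT one of the thesis's classifiers): always
    predicts the most frequent training label."""
    label = 1 if 2 * sum(y) > len(y) else 0
    return lambda x: label


def evaluate(X: List[Vector], y: List[int],
             splits: List[Tuple[List[int], List[int]]], fit: Fit) -> float:
    """Mean test-set accuracy over the given train/test splits."""
    scores = []
    for train, test in splits:
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy([y[i] for i in test], [predict(X[i]) for i in test]))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up toy chain and structure string, not RS126 data.
    X = encode_windows("MKTAYIAKQR", window=5)
    y = binary_labels("CCHHHHHHCC", target="H")
    print(len(X), len(X[0]))  # 10 105
    print(evaluate(X, y, k_fold_splits(len(X), 5), majority_class))
    print(evaluate(X, y, [hold_out_split(len(X), 0.3)], majority_class))
```

Under this alphabet each window position contributes 21 bits, so a window of w residues yields a 21w-element feature vector.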
Characterisation of Human Hsj1a : an HSP40 molecular chaperone similar to Malarial Pfj4
- Authors: McNamara, Caryn
- Date: 2007
- Subjects: Heat shock proteins , Protein folding , Proteins -- Analysis , Proteins -- Structure , Plasmodium , Malaria , Molecular chaperones
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4083 , http://hdl.handle.net/10962/d1007603
- Description: Protein folding, translocation, oligomeric rearrangement and degradation are vital processes for obtaining correctly folded proteins in any cell. The constitutive and stress-induced members of two heat shock protein (Hsp) families, Hsp70 and Hsp40, make up the Hsp70/Hsp40 chaperone system. The Hsp40 J-domain is important for the Hsp70-Hsp40 interaction and hence for chaperone function. The type-II Hsp40 proteins Homo sapiens DnaJ 1a (Hsj1a) and Plasmodium falciparum DnaJ 4 (Pfj4) are structurally similar, suggesting that they may play similar roles during malarial infection. This thesis focussed on determining whether Hsj1a and Pfj4 are functionally similar in their interactions with potential Hsp70 partner chaperones. In silico analysis also showed that Pfj4 has a potential chaperone domain and a region resembling a ubiquitin-interacting motif (UIM) corresponding to UIM1 of Hsj1a; a further highly conserved region was noted between residues 232 and 241. The highly conserved regions within the Hsp40 J-domains, and the amino acids therein, are thought to mediate the Hsp70-Hsp40 partner interaction. The thermosensitive dnaJ cbpA mutant Escherichia coli strain OD259, producing type-I Agrobacterium tumefaciens DnaJ (AgtDnaJ), was used as a heterologous model expression system in this study. AgtDnaJ was able to compensate in vivo for the absence of two E. coli Hsp40s, DnaJ and CbpA, whereas AgtDnaJ(H33Q) was not. AgtDnaJ-based chimeras in which the J-domain was swapped for that of the type-II Hsp40 Hsj1a or Pfj4 (Hsj1Agt and Pfj4Agt, respectively) were also able to replace these Hsp40s in E. coli OD259. Conserved J-domain amino acids were identified and substituted in these chimeras. Of the resulting mutant proteins, Hsj1Agt(L8A), Hsj1Agt(R24A), Hsj1Agt(H31Q), Pfj4Agt(L11A) and Pfj4Agt(H34Q) were unable to replace the E. coli Hsp40s, whilst Pfj4Agt(Y8A) and Pfj4Agt(R27A) replaced them only partially.
This shows that the leucine of helix I and the histidine of the loop region are key to the in vivo function of both proteins, and that the arginine of helix II is key for Hsj1a. The histidine-tagged Hsj1a protein was also successfully purified from the heterologous system. The Hsj1a-stimulated in vitro ATPase activity of human Hsp70 was approximately 14 nmol Pi/min/mg, whereas Pfj4 did not stimulate this activity, suggesting a possible species-specific interaction.
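The ATPase figure above is a specific activity: inorganic phosphate (Pi) released per minute per milligram of protein. A minimal Python sketch of that arithmetic follows, assuming the milligram refers to Hsp70 and that a no-enzyme blank is subtracted (the description states neither). The function name and the example readings are hypothetical and are not values from the thesis.

```python
def specific_activity(pi_nmol: float, minutes: float, protein_mg: float,
                      blank_pi_nmol: float = 0.0) -> float:
    """ATPase specific activity in nmol Pi/min/mg.

    pi_nmol: inorganic phosphate measured at the end of the assay.
    blank_pi_nmol: phosphate in a control without enzyme (assumed correction).
    protein_mg: mass of Hsp70 in the reaction (assumed normaliser).
    """
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("reaction time and protein mass must be positive")
    # Net phosphate released, divided by time and by enzyme mass.
    return (pi_nmol - blank_pi_nmol) / (minutes * protein_mg)


if __name__ == "__main__":
    # Hypothetical readings: 3.0 nmol Pi after 30 min with 4 µg (0.004 mg)
    # of Hsp70, and 0.6 nmol Pi in the blank.
    print(specific_activity(3.0, 30, 0.004, blank_pi_nmol=0.6))  # about 20
```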