Bioinformatics tool and web server development focusing on structural bioinformatics applications
- Authors: Nabatanzi, Margaret
- Date: 2022-10-14
- Subjects: Structural bioinformatics , Proteins -- Structure , Protein structure prediction , Proteins -- Conformation , Protein complex
- Language: English
- Type: Academic theses , Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/365700 , vital:65777 , DOI https://doi.org/10.21504/10962/365700
- Description: This thesis is divided into two main sections. Part 1 describes the design and accuracy evaluation of a new web server, PRotein Interactive MOdeling (PRIMO-Complexes), for modeling protein complexes and biological assemblies. Part 2 describes the development of bioinformatics tools to predict HIV-1 drug resistance and to support bioinformatics research and education. Recent technological advances have resulted in a tremendous increase in the number of sequences and protein structures deposited in the Universal Protein Resource Knowledgebase (UniProtKB) and the Protein Data Bank (PDB). However, the number of sequences has increased at a higher rate than the number of experimentally solved multimeric protein structures, partly because of advances in high-throughput sequencing technology. To fill this protein sequence-structure gap, computational approaches have been developed to predict protein structures from available sequences. These include template-based and ab initio modeling, of which template-based modeling is the more reliable. Template-based modeling can be carried out using either standalone software or automated modeling web servers. However, standalone software requires familiarity with command-line interfaces as well as the use of other intermediate programs, which can be daunting to novice users. To alleviate some of these problems, the modeling process has been automated; however, automated modeling still presents numerous challenges. To date, only a few web servers that support multimeric protein modeling have been developed, and even these allow little, if any, user involvement in the process. To address some of these issues, a new web server, PRIMO-Complexes, was developed to model protein complexes and biological assemblies. The existing PRIMO web server could only model monomeric proteins. Part 1 of this thesis provides a detailed account of the development and evaluation of PRIMO-Complexes. 
The rationale for developing this new web server was based on the understanding that most proteins function as multimers and that ligand-binding sites and enzyme active sites are often located at protein-protein interfaces. This necessitated the development of capabilities for modeling multimeric proteins. The PRIMO-Complexes web server was developed using the Waterfall system development life cycle model, is based on the Django web framework, and makes use of high-performance computing resources to execute jobs. The accuracy of the algorithms embedded in PRIMO-Complexes was evaluated, and the results were promising. Additionally, PRIMO-Complexes performs well in comparison with other web servers that offer multimeric protein modeling. Another unique feature of PRIMO-Complexes is its interactivity: the web server allows users to model multimeric proteins with an appreciable degree of control over the process. The second part of the thesis describes several other bioinformatics tools, for example, a web server for predicting HIV-1 drug resistance, the RUBi protein model repository, and a bioinformatics web portal for education and research resources. The RUBi protein model repository, described in chapter 5, stores verified theoretical models built using various modeling approaches, enabling users to easily access models to reproduce and/or build on the research. Chapter 6 describes the design and development of the Human Immunodeficiency Virus type 1 Resistance Predictor (HIV-1 ResPredictor), a web application that employs artificial neural networks (ANNs) to predict drug resistance in patients infected with HIV-1 subtype B. The ANNs and subtype classifiers performed well, making this web application potentially useful to both clinicians and researchers in this era of personalised medicine. 
Finally, chapter 7 describes a bioinformatics education web portal that equips students with information on how to use online bioinformatics resources. Awareness of these resources is not enough without a deeper understanding of, and guidance on, how to apply bioinformatics methods to solve practical problems. The web portal was aimed at familiarising students with the basic terminology and approaches of structural bioinformatics, so that they can potentially gain the skills needed to conduct real-life bioinformatics research and obtain biological insights. , Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 2022
- Date Issued: 2022-10-14
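Template-based modeling, which the PRIMO-Complexes abstract above identifies as the more reliable approach, depends on finding template structures whose sequences resemble the target. The following is a minimal sketch, not PRIMO-Complexes code: it assumes a toy match/mismatch/linear-gap scoring scheme (rather than a substitution matrix such as BLOSUM62 or the profile-based searches that real pipelines often use) and ranks hypothetical candidate templates by percentage identity over a Needleman-Wunsch global alignment. The names `global_align`, `percent_identity`, and `rank_templates` are illustrative.

```python
from typing import Dict, List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scoring scheme, not a real matrix


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell, preferring diagonal moves
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical positions divided by positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def rank_templates(target: str, templates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (template name, % identity) pairs, highest identity first."""
    scored = [(name, percent_identity(*global_align(target, seq)))
              for name, seq in templates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Identity is computed here over ungapped aligned positions; other conventions divide by the shorter sequence length or the full alignment length, so identity values are only comparable under a single definition.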
Sequence, structure, dynamics, and substrate specificity analyses of bacterial Glycoside Hydrolase 1 enzymes from several activities
- Authors: Veldman, Wayde Michael
- Date: 2022-04-08
- Subjects: Glycosidases , Bioinformatics , Molecular dynamics , Ligands (Biochemistry) , Enzymes , Ligand binding (Biochemistry) , Sequence alignment (Bioinformatics) , Structural bioinformatics
- Language: English
- Type: Doctoral thesis , text
- Identifier: http://hdl.handle.net/10962/233805 , vital:50129 , DOI 10.21504/10962/233810
- Description: Glycoside hydrolase 1 (GH1) enzymes are a ubiquitous family of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. Despite their conserved catalytic domain, these enzymes display many different activities and/or substrate specificities, as a change of only a few active-site residues can alter their function. Most GH1 active site residues are situated in loop regions, and enzymes whose active sites contain a high proportion of loops are known to be more likely to develop new functions (broad specificity). Furthermore, the GH1 active site consists of several subsites, and cooperative binding makes the binding affinity of individual subsites difficult to measure, because the properties of one subsite are influenced by binding at the other subsites. Extensive knowledge of protein-ligand interactions is critical to understanding biology at the molecular level. However, the structural determinants and molecular details of GH1 ligand specificity and affinity are broad, highly complex, and not well understood, and therefore still need to be clarified. The aim of this study was to computationally characterise the activity of three newly solved GH1 crystallographic structures provided by our collaborators, and to provide evidence for their ligand-binding specificities. In addition, the structural and biochemical contributions to enzyme specificity and/or function were compared between different GH1 activities/enzymes, and the sequence-structure-function relationships of several GH1 activities were analysed and compared. To accomplish these aims, sequence analyses involving sequence identity, phylogenetics, and motif discovery were performed. As protein structure is more conserved than sequence, the discovered motifs were mapped to 3D structures for structural analysis and comparison. 
To obtain information on enzyme mechanism or mode of action, as well as structure-function relationships, computational methods such as docking, molecular dynamics (MD), binding free energy calculations, and essential dynamics were applied. These approaches can provide information on the active site, binding residues, protein-ligand interactions, binding affinity, conformational change, and most structural or dynamic elements that play a role in enzyme function. The three new structures received from our collaborators are the first GH1 crystallographic structures ever determined from Bacillus licheniformis. As phospho-glycoside compounds were unavailable for purchase for use in activity assays, and as the active sites of the structures contained no ligand, in silico docking and MD simulations were performed to provide evidence for their GH1 activities and substrate specificities. First, however, the amino acid sequences of all known characterised bacterial GH1 enzymes were retrieved from the CAZy database and compared with the sequences of the three new B. licheniformis structures. This comparison provided evidence of putative 6Pβ-glucosidase activity for enzyme BlBglH, and of dual 6Pβ-glucosidase/6Pβ-galactosidase (dual-phospho) activity for enzymes BlBglB and BlBglC. As all three enzymes were determined to be putative 6Pβ-glycosidases, much of the thesis focused on the analysis and comparison of the 6Pβ-glucosidase, 6Pβ-galactosidase, and dual-phospho activities that make up the 6Pβ-glycosidases. The 6Pβ-glycosidase active site residues were identified through a consensus of binding interactions across all known 6Pβ-glycosidase PDB structures complexed with complete ligand substrates. 
With regard to 6Pβ-glucosidase activity, the L8b loop was found to be longer and to form extra interactions with the L8a loop, likely increasing L8 loop rigidity. This would prevent displacement of residue Ala423, ensuring a steric clash with galacto-configured ligands, and may engender specificity for gluco-configured ligands only. MD simulations of enzyme BlBglH (6Pβ-glucosidase activity) also revealed that favourable substrate binding stabilises the loops that surround and make up the enzyme active site. Triplicate MD simulations of the BlBglC (dual-phospho activity) enzyme structure with either galacto-configured (PNP6Pgal) or gluco-configured (PNP6Pglc) ligands revealed important details of the broad specificity of dual-phospho activity enzymes. The orientation of the O4 hydroxyl group is the only difference between PNP6Pgal and PNP6Pglc, and residues Gln23 and Trp433 were found to bind strongly to the ligand O3 hydroxyl group in the PNP6Pgal-enzyme complex, but to the ligand O4 hydroxyl group in the PNP6Pglc-enzyme complex. His124 also formed many hydrogen bonds with the PNP6Pgal O3 hydroxyl group but none with PNP6Pglc. Conversely, residues Tyr173, Tyr301, Gln302, and Thr321 formed hydrogen bonds with PNP6Pglc but not with PNP6Pgal. Lastly, analysis of multiple 3D structures from various GH1 activities uncovered a large network of conserved interactions between active site residues (and other important residues), which most likely stabilise the loop regions containing these residues and help retain the positions needed for binding molecules. In contrast, several residue-residue interactions differ between the activities, and these could contribute to activity-specific substrate specificity by causing slightly different overall structure and malleability of the active site. 
Altogether, the findings in this thesis shed light on the function, mechanisms, dynamics, and ligand-binding of GH1 enzymes – particularly of the 6Pβ-glycosidase activities. , Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 2022
- Date Issued: 2022-04-08
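The BlBglC results above are expressed as hydrogen bonds that a residue forms (or does not form) with a ligand hydroxyl group during MD simulations. One common way to quantify this is occupancy: the percentage of trajectory frames in which a donor-hydrogen-acceptor triple satisfies geometric criteria. The sketch below assumes a donor-acceptor distance cutoff of 3.5 Å and a minimum donor-H...acceptor angle of 120°; these thresholds are illustrative assumptions, not the settings used in the thesis, and MD analysis packages apply their own defaults. Coordinates are in ångströms, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Assumed geometric criteria (illustrative only, not the thesis settings)
MAX_DONOR_ACCEPTOR = 3.5  # ångström
MIN_DHA_ANGLE = 120.0     # degrees


def _sub(p: Vec3, q: Vec3) -> Vec3:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _norm(v: Vec3) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def dha_angle(donor: Vec3, hydrogen: Vec3, acceptor: Vec3) -> float:
    """Donor-H...acceptor angle in degrees, measured at the hydrogen atom."""
    u = _sub(donor, hydrogen)
    w = _sub(acceptor, hydrogen)
    cos_t = (u[0] * w[0] + u[1] * w[1] + u[2] * w[2]) / (_norm(u) * _norm(w))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor: Vec3, hydrogen: Vec3, acceptor: Vec3) -> bool:
    """True if both the distance and the angle criteria are met."""
    return (_norm(_sub(donor, acceptor)) <= MAX_DONOR_ACCEPTOR
            and dha_angle(donor, hydrogen, acceptor) >= MIN_DHA_ANGLE)


def hbond_occupancy(frames: Sequence[Tuple[Vec3, Vec3, Vec3]]) -> float:
    """Percentage of frames in which one donor-H-acceptor triple forms an H-bond."""
    if not frames:
        return 0.0
    return 100.0 * sum(is_hbond(d, h, a) for d, h, a in frames) / len(frames)
```

Computing such occupancies for every donor/acceptor pair between a residue and the ligand hydroxyls, in each complex, would give a per-residue summary that can be compared between, for example, the PNP6Pgal and PNP6Pglc simulations.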
Structural bioinformatics studies and tool development related to drug discovery
- Authors: Hatherley, Rowan
- Date: 2016
- Subjects: Structural bioinformatics , Drug development , Natural products -- Databases , Natural products -- Biotechnology , Sequence alignment (Bioinformatics) , Malaria -- Chemotherapy , Heat shock proteins , Plasmodium falciparum
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: vital:4164 , http://hdl.handle.net/10962/d1020021
- Description: This thesis is divided into two distinct sections, which can be combined under the broad umbrella of structural bioinformatics studies related to drug discovery. The first section involves the establishment of an online South African natural products database. Natural products (NPs) are chemical entities synthesised in nature and are unrivalled in their structural complexity, chemical diversity, and biological specificity, which has long made them crucial to the drug discovery process. South Africa is rich in both plant and marine biodiversity, and a great deal of research has gone into isolating compounds from organisms found in the country. However, there is no official database containing this information, making it difficult to access for research purposes. This information was therefore extracted manually from the literature to create a database of South African natural products. To make the information accessible to the general research community, a website named “SANCDB” was built, enabling compounds to be searched quickly and easily and downloaded in a number of different chemical formats. The content of the database was assessed and compared with other established natural product databases. Currently, SANCDB is the only database of natural products in Africa with an online interface. The second section of the thesis was aimed at the structural characterisation of proteins with potential as targets for antimalarial drug therapy. This looked specifically at 1) the interactions between an exported heat shock protein (Hsp) from Plasmodium falciparum (P. falciparum), PfHsp70-x, and various host and exported parasite J proteins, and 2) the interface between PfHsp90 and the Hsp70-Hsp90 organising protein (PfHop). The PfHsp70-x:J protein study provided additional insight into how these two proteins potentially interact. 
Analysis of the PfHsp90:PfHop complex also provided structural insight into the interaction interface between these two proteins and identified residues that could be targeted, owing to their contribution to the stability of the Hsp90:Hop binding complex and to differences between the parasite and human proteins. These studies inspired the development of a homology modelling tool that assists researchers with homology modelling while providing them with step-by-step control over the entire process. This thesis thus presents the establishment of a South African NP database and the development of a homology modelling tool inspired by protein structural studies. Combined, these two applications have the potential to contribute substantially to in silico drug discovery research.
- Date Issued: 2016
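SANCDB is described above as a database whose compounds can be searched and downloaded in several chemical formats. The sketch below is a hypothetical illustration, not SANCDB's actual schema or code: it stores compounds in an in-memory SQLite table using Python's standard `sqlite3` module, runs a parameterised substring search over compound names and source organisms, and exports hits as a SMILES file with one `SMILES name` entry per line. The table layout, function names, and placeholder compounds are all assumptions; conversion to other formats such as SDF would normally be delegated to a cheminformatics toolkit.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (compound name, source organism, SMILES)


def build_db(records: Iterable[Row]) -> sqlite3.Connection:
    """Create an in-memory compound table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compound ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " organism TEXT,"
        " smiles TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO compound (name, organism, smiles) VALUES (?, ?, ?)", records
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> List[Row]:
    """Substring match on name or organism (SQLite LIKE ignores ASCII case)."""
    pattern = f"%{term}%"  # placeholders keep user input out of the SQL text
    cur = conn.execute(
        "SELECT name, organism, smiles FROM compound"
        " WHERE name LIKE ? OR organism LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return cur.fetchall()


def to_smi(rows: List[Row]) -> str:
    """Serialise hits in SMILES-file layout: 'SMILES name' on each line."""
    return "".join(f"{smiles} {name}\n" for name, _organism, smiles in rows)


# Usage with placeholder records (not real SANCDB entries)
if __name__ == "__main__":
    db = build_db([("compound_A", "Example plant", "CCO"),
                   ("compound_B", "Example sponge", "c1ccccc1O")])
    print(to_smi(search(db, "sponge")), end="")
```

Because `%` and `_` are LIKE wildcards, a production search would also escape them in user input; the sketch omits this for brevity.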
Structural bioinformatics analysis of the Hsp40 and Hsp70 molecular chaperones from humans
- Authors: Adeyemi, Samson Adebowale
- Date: 2014
- Subjects: Structural bioinformatics , Molecular chaperones , Heat shock proteins , Protein-protein interactions , Biomolecules
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4171 , http://hdl.handle.net/10962/d1020962
- Description: HSP70 is one of the most important families of molecular chaperones, which regulate the folding and transport of client proteins in an ATP-dependent manner. The ATPase activity of HSP70 is stimulated through interaction with its HSP40 family of co-chaperones. There is evidence to suggest that specific partnerships occur between the different HSP40 and HSP70 isoforms. While some of the residues involved in the interaction are known, many of the residues governing the specificity of HSP40-HSP70 partnerships are not precisely defined, and it is not currently possible to predict which HSP40 and HSP70 isoforms will interact. We attempted to use bioinformatics to identify residues involved in the specificity of the interaction between the J domain of HSP40 and the ATPase domain of the human HSP70 isoforms. A total of 49 HSP40 and 13 HSP70 human sequences were retrieved and used for subsequent analyses. The HSP40 J domains and HSP70 ATPase domains were extracted using Python scripts and classified according to the subcellular localization of the proteins, as predicted by localization prediction programs. Motif analysis was carried out on the full-length HSP40 proteins, and Multiple Sequence Alignment (MSA) was performed to identify conserved residues that may contribute to interactions between the J domain and the ATPase domain. Phylogenetic inference was also performed to study the evolutionary relationships of the proteins. Homology models of the J domains and ATPase domains were generated, and the corresponding models were docked using the HADDOCK server; putative interactions between the partner proteins were then analyzed with the Protein Interactions Calculator (PIC). The level of residue conservation was found to be higher in Type I and II HSP40s than in Type III J proteins. 
While highly conserved residues on helices II and III could play critical roles in J domain interactions with the corresponding HSP70s, conserved residues on helices I and IV appeared to be important for keeping the J domain in the correct orientation for functional interactions with HSP70s. Our results also showed that helices II and III, together with the linker residues, formed the interaction interface for binding to the HSP70 ATPase domain. Finally, data-driven docking procedures, such as the one applied in this study, could be an effective method for investigating protein-protein interactions between biomolecules.
- Date Issued: 2014
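The Adeyemi abstract above reports that residue conservation is higher in Type I and II HSP40s than in Type III J proteins. One standard way to quantify per-position conservation in an MSA is Shannon entropy: a column containing a single residue type scores 0 bits, and more variable columns score higher. The sketch below assumes that gaps are ignored when computing column frequencies (one of several conventions) and uses a hypothetical four-sequence alignment built around the His-Pro-Asp (HPD) tripeptide found in J domains; it is not the analysis performed in the thesis, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring '-' gaps.

    Returns NaN for a gap-only column, where conservation is undefined.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return float("nan")
    total = len(residues)
    freqs = [n / total for n in Counter(residues).values()]
    return sum(-p * math.log2(p) for p in freqs)


def conservation_profile(msa: List[str]) -> List[float]:
    """Per-column entropies for a list of equal-length aligned sequences."""
    if not msa:
        raise ValueError("empty alignment")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in msa)) for i in range(length)]


def most_conserved(msa: List[str], top: int = 3) -> List[int]:
    """0-based column indices with the lowest entropy (gap-only columns skipped)."""
    profile = conservation_profile(msa)
    valid = [i for i, h in enumerate(profile) if not math.isnan(h)]
    return sorted(valid, key=lambda i: profile[i])[:top]


# Hypothetical toy alignment (not real HSP40 sequences)
if __name__ == "__main__":
    toy_msa = ["HPDK", "HPDR", "HPDE", "HPNK"]
    print(conservation_profile(toy_msa))
    print(most_conserved(toy_msa, top=2))
```

Averaging such entropies over the J-domain columns of each HSP40 type would give one simple way to compare conservation between types, although substitution-aware scores are often preferred because they treat conservative replacements such as D/N as partially conserved.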