Application of machine learning, molecular modelling and structural data mining against antiretroviral drug resistance in HIV-1
- Authors: Sheik Amamuddy, Olivier Serge André
- Date: 2020
- Subjects: Machine learning , Molecules -- Models , Data mining , Neural networks (Computer science) , Antiretroviral agents , Protease inhibitors , Drug resistance , Multidrug resistance , Molecular dynamics , Renin-angiotensin system , HIV (Viruses) -- South Africa , HIV (Viruses) -- Social aspects -- South Africa , South African Natural Compounds Database
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115964 , vital:34282
- Description: Millions of people worldwide are infected with the Human Immunodeficiency Virus (HIV), even though the death toll is declining. Antiretrovirals (ARVs), and protease inhibitors in particular, have been tremendously successful since their introduction into therapy in the mid-1990s, slowing progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for through viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs, each with a different level of toxicity, in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) can manifest class-wide resistance owing to similarities in drug design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription, together with treatment adherence, are very important, as drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell counts and viral load measurements from patients in resource-limited settings are very helpful for tracking how well a patient's immune system keeps the virus in check, they can take a long time to show whether an ARV is effective. PhenoSense assay kits address this problem by engineering viruses to contain the patient sequences and measuring their growth in the presence of different ARVs, but this can be too expensive and involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy from drug resistance mutation patterns.
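Genotypic interpretation begins by listing amino-acid differences between a patient sequence and a reference. The sketch below assumes the two protease sequences are already aligned position by position; `call_mutations`, `flag_drms` and `EXAMPLE_PI_DRMS` are hypothetical illustrations, and the mutation set is a small subset chosen for demonstration, not the Stanford HIVdb list or its scoring rules.

```python
from typing import Iterable, List, Set

# Hypothetical, illustrative subset of protease-inhibitor resistance mutations.
# NOT a complete or current DRM list; real interpreters use curated, scored lists.
EXAMPLE_PI_DRMS: Set[str] = {
    "D30N", "V32I", "I47V", "G48V", "I50V", "I54M",
    "L76V", "V82A", "I84V", "N88S", "L90M",
}


def call_mutations(reference: str, sample: str) -> List[str]:
    """List substitutions as 'WtPosMut' (1-based) for two pre-aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for pos, (wt, aa) in enumerate(zip(reference, sample), start=1):
        # Skip identical residues, alignment gaps ('-') and ambiguous calls ('X').
        if wt != aa and aa not in "-X":
            mutations.append(f"{wt}{pos}{aa}")
    return mutations


def flag_drms(mutations: Iterable[str], drm_set: Set[str] = EXAMPLE_PI_DRMS) -> List[str]:
    """Return the observed mutations that appear in the given DRM set."""
    return sorted(m for m in mutations if m in drm_set)
```

For example, `call_mutations("PQITL", "PQIAL")` returns `['T4A']`. Real genotypic tools go further, weighting mutations and their combinations per drug, which is where the disagreements between interpreters arise.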
However, these patterns are inherently complex, and the various in silico prediction methods, such as Geno2pheno, REGA and Stanford HIVdb, do not always agree, even though this gap narrows as the lists of resistance mutations are updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data dominated by subtype B HIV, whereas this subtype accounts for only about 12% of HIV infections worldwide. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that, since subtyping is a phylogenetic classification, the more a subtype diverges from the strains used to train prediction models, the less its resistance profile will correlate with theirs. For these reasons, we used a multi-faceted approach to attack the virus in multiple ways. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease and (3) screen a database of natural compounds [the South African Natural Compounds Database (SANCDB)] for protease inhibitors, to find molecules or molecular properties that could yield improved inhibition of the drug target. Structural information was mined using the Anisotropic Network Model, Dynamic Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to reveal any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction, respectively.
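Two of the structural measures named above have compact definitions. The sketch below computes a dynamic cross-correlation matrix, C_ij = <Δr_i·Δr_j> / sqrt(<Δr_i²><Δr_j²>), and an optionally mass-weighted radius of gyration with NumPy. It assumes a trajectory array of shape (frames, atoms, 3) that has already been superposed onto a reference; it is an illustration of the definitions, not the code used in the thesis.

```python
import numpy as np


def dynamic_cross_correlation(traj):
    """DCC matrix for a trajectory of shape (n_frames, n_atoms, 3).

    Assumes frames are already superposed on a common reference so that
    fluctuations reflect internal motion rather than rigid-body drift.
    """
    traj = np.asarray(traj, dtype=float)
    disp = traj - traj.mean(axis=0)  # fluctuation of each atom about its mean position
    # <dr_i . dr_j> averaged over frames -> (n_atoms, n_atoms)
    cov = np.einsum("fik,fjk->ij", disp, disp) / traj.shape[0]
    scale = np.sqrt(np.diag(cov))
    # Values lie in [-1, 1]: +1 correlated, -1 anti-correlated (atoms that never move give NaN).
    return cov / np.outer(scale, scale)


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of one frame (n_atoms, 3); unweighted if masses is None."""
    coords = np.asarray(coords, dtype=float)
    w = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    centre = (w[:, None] * coords).sum(axis=0) / w.sum()
    sq_dist = ((coords - centre) ** 2).sum(axis=1)
    return float(np.sqrt((w * sq_dist).sum() / w.sum()))
```

Applying `radius_of_gyration` to every frame gives a compaction time series, while `dynamic_cross_correlation` summarises correlated motion over the whole trajectory.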
Applying drug docking, homology modelling and energy minimization to generate features for machine learning was not very promising, and instead suggested that Vina binding energies on their own may not be quantitatively reliable. These failures led to a refinement: a highly sensitive, statistically guided network construction and analysis, which yielded key findings on the early dynamics associated with resistance across all PI drugs. This experiment uncovered a conserved lateral expansion motion at the flap elbows and an associated contraction that, in drug resistance, drives the base of the dimerization domain towards the floor of the catalytic site. Interestingly, despite the conserved movement, bond angles were degenerate. In parallel, 16 Artificial Neural Network models were optimised for HIV protease and reverse transcriptase inhibitors, with performance on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD), and additionally suggested a promising modification to one of them. This yielded another molecule that inhibits both the open and closed receptor conformations equally well; each compound had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was the lack of non-B subtype data, our findings, especially those from the statistically guided network analysis, may extrapolate to some extent to other subtypes, since conservation was very high within subtype B despite all the variation present. This network construction method provides a sensitive approach for analysing a pair of alternate phenotypes in which complex patterns prevail, given a sufficient number of experimental units.
During the course of this research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. Together, these techniques and tools add to the conventional means of MD analysis.
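Normal modes of a single structure are commonly obtained from an Anisotropic Network Model (ANM), in which residues within a cutoff are joined by identical springs. The sketch below builds the ANM Hessian from Cα coordinates and diagonalises it with NumPy; the 15 Å cutoff and uniform spring constant `gamma` are assumed, commonly used defaults for illustration, not parameters taken from the thesis or from MODE-TASK.

```python
import numpy as np


def anm_modes(coords, cutoff=15.0, gamma=1.0):
    """Eigenvalues/eigenvectors of an Anisotropic Network Model Hessian.

    coords : (N, 3) C-alpha coordinates. cutoff (Å) and gamma are assumed
    defaults for illustration, not parameters taken from MODE-TASK.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # no spring between distant residues
            block = -gamma * np.outer(d, d) / dist2  # off-diagonal 3x3 super-element
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block  # diagonal = -sum of off-diagonals
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    evals, evecs = np.linalg.eigh(hess)  # ascending; first six ~0 are rigid-body modes
    return evals, evecs
```

For a connected, non-degenerate network the six lowest eigenvalues are numerically zero (rigid-body translations and rotations); the following eigenvectors are the internal modes, with the lowest-frequency ones usually describing the largest collective motions.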
Computer aided approaches against Human African Trypanosomiasis
- Authors: Kimuda, Magambo Phillip
- Date: 2020
- Subjects: African trypanosomiasis , African trypanosomiasis -- Chemotherapy , Genomics , Macrophage migration inhibitory factor , Trypanosoma brucei , Pteridines , Tetrahydrofolate dehydrogenase , Adenylic acid , Molecular dynamics , Principal components analysis , Bioinformatics , Single nucleotide polymorphisms , Single Nucleotide Variants , Candidate Gene Association Study (CGAS)
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/142542 , vital:38089
- Description: The thesis presented here is divided into two parts under a common theme: the use of computer-based tools, genomics and in vitro experiments to develop innovative ways of tackling Human African Trypanosomiasis (HAT). Part I focuses on human host genetic determinants, while Part II focuses on the discovery of novel chemotherapeutics against the parasite. Part I is further subdivided into two parts. The first involves a Candidate Gene Association Study (CGAS) on an African population to identify genetic determinants associated with disease and/or susceptibility to HAT. The second involves studying the effects of missense Single Nucleotide Variants (SNVs) on protein structure, dynamics and function, using Macrophage Migration Inhibitory Factor (MIF) as a case study. Part II is also subdivided into two parts. The first involves computer-based rational drug discovery of potential inhibitors of the Trypanosoma folate pathway, in particular by targeting Trypanosoma brucei Pteridine Reductase (TbPTR1), an enzyme that trypanosomes use to overcome inhibition of T. brucei Dihydrofolate Reductase (TbDHFR). The second involves deriving CHARMM force-field parameters that accurately model the geometry and dynamics of the bimetallic active-site center of the T. brucei Phosphodiesterase B1 enzyme (TbrPDEB1). The derived parameters were then used in MD simulations to characterise protein-ligand residue interactions that are important for TbrPDEB1 inhibition, with the goal of targeting the cyclic Adenosine Monophosphate (cAMP) signalling pathway. In the CGAS, we were unable to detect any genetic association in the Ugandan cohort that survived correction for multiple testing, even though the study was sufficiently powered. Additionally, our study did not replicate the previously reported association of the Apolipoprotein L1 (APOL1) G2 allele with protection against acute HAT.
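A candidate-gene study tests many variants, so each p-value must be judged against a corrected threshold. The sketch below shows one common combination, an allelic 2×2 chi-square test per SNP followed by Bonferroni correction; the abstract does not state which test, genetic model or correction the thesis used, so these choices and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def allelic_chi2_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square test on a 2x2 allele-count table (1 d.f., no continuity correction)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][k] + table[1][k] for k in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests that remain significant after Bonferroni correction (alpha / number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

With toy counts of 30 alternate alleles out of 100 in cases and 10 out of 100 in controls, the statistic is 12.5 and p ≈ 4.1 × 10^-4, which would still pass a Bonferroni threshold at α = 0.05 for up to about 120 tests.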
Future investigations, for example Genome Wide Association Studies with larger sample sizes (>3000 cases and controls), are required. Macrophage migration inhibitory factor (MIF) is a cytokine that is important in both innate and adaptive immunity and has been shown to play a role in T. brucei pathogenicity in murine models. Homology modelling was used to create MIF protein mutants carrying a total of 27 missense SNVs, which were investigated using in silico effect prediction tools, molecular dynamics (MD), Principal Component Analysis (PCA) and Dynamic Residue Network (DRN) analysis. Our results show that mutations P2Q, I5M, P16Q, L23F, T24S, T31I, Y37H, H41P, M48V, P44L, G52C, S54R, I65M, I68T, S75F, N106S and T113S caused significant conformational changes. Further, DRN analysis showed that residues P2, T31, Y37, G52, I65, I68, S75, N106 and T113 are part of a similar local residue interaction network with functional significance. These results show how polymorphisms such as missense SNVs can affect protein conformation, dynamics and function. Trypanosomes are auxotrophic for folates and pterins: they cannot synthesise them but require them for survival, so they scavenge them from their hosts. PTR1 is a multifunctional enzyme, unique to trypanosomatids, that reduces both pterins and folates. In the presence of DHFR inhibitors, PTR1 is over-expressed, providing an escape from the effects of DHFR inhibition. Both TbPTR1 and TbDHFR are pharmacologically and genetically validated drug targets. In this study, 5742 compounds were screened using molecular docking, and 13 promising binding modes were further analysed using MD simulations. The trajectories were analysed using RMSD, Rg, RMSF, PCA, Essential Dynamics Analysis (EDA), Molecular Mechanics Poisson–Boltzmann Surface Area (MM-PBSA) binding free energy calculations and DRN analysis.
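RMSF and PCA (the basis of essential dynamics analysis) are both computed from positional fluctuations about the mean structure. A minimal NumPy sketch, assuming a trajectory array of shape (frames, atoms, 3) that has already been superposed onto a reference; dedicated MD analysis packages offer equivalent, better-tested routines, and these function names are illustrative.

```python
import numpy as np


def rmsf(traj):
    """Per-atom root-mean-square fluctuation for a (n_frames, n_atoms, 3) trajectory."""
    traj = np.asarray(traj, dtype=float)
    disp = traj - traj.mean(axis=0)  # deviation from the mean position
    return np.sqrt((disp ** 2).sum(axis=2).mean(axis=0))


def trajectory_pca(traj, n_components=2):
    """PCA of Cartesian fluctuations; returns frame projections and variance fractions."""
    traj = np.asarray(traj, dtype=float)
    x = traj.reshape(traj.shape[0], -1)  # one row of 3N coordinates per frame
    x = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(evals)[::-1]     # largest-variance components first
    evals, evecs = evals[order], evecs[:, order]
    projections = x @ evecs[:, :n_components]  # per-frame coordinates on PC1, PC2, ...
    explained = evals[:n_components] / evals.sum()
    return projections, explained
```

Plotting the first two columns of `projections` gives the familiar free-energy-style PC1/PC2 scatter used to compare conformational sampling between systems.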
The computational screening approach allowed us to identify five compounds, named RUBi004, RUBi007, RUBi014, RUBi016 and RUBi018, that inhibited the growth of trypanosomes in culture with IC50 values of 12.5 ± 4.8 μM, 32.4 ± 4.2 μM, 5.9 ± 1.4 μM, 28.2 ± 3.3 μM and 9.7 ± 2.1 μM, respectively. Further, when used in combination with WR99210, a known TbDHFR inhibitor, RUBi004, RUBi007, RUBi014 and RUBi018 showed antagonism, while RUBi016 showed an additive effect. These results indicate that the four compounds might be competing with TbDHFR, while RUBi016 might be more specific for TbPTR1. These compounds provide scaffolds that can be further optimised to improve their potency and specificity. Lastly, using a systematic approach, we derived CHARMM force-field parameters to accurately describe the TbrPDEB1 bimetallic catalytic center. For dynamics, we employed a mixed bonded and non-bonded approach. We optimised the structure using a two-layer QM/MM ONIOM (B3LYP/6-31(g):UFF) scheme. The bonds, angles and dihedrals of the TbrPDEB1 bimetallic center were parameterized by fitting the energy profiles from Potential Energy Surface (PES) scans to the CHARMM potential energy function. The parameters were validated by means of MD simulations, analysed using RMSD, Rg, RMSF, hydrogen bonding, bond/angle/dihedral evaluations, EDA, PCA and DRN analysis. The force-field parameters accurately reproduced the geometry and dynamics of the TbrPDEB1 bimetallic catalytic center during MD simulations. Molecular docking was used to identify six potential hits that inhibited trypanosome growth in vitro. The derived force-field parameters were used to simulate the six protein-ligand complexes in order to elucidate crucial protein-ligand residue interactions. Using the most potent ligand, RUBi022 (IC50 of 14.96 μM), we identified key residue interactions that may be useful for in silico prediction of potential TbrPDEB1 inhibitors.
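Fitting PES scans to the CHARMM potential means choosing force constants and equilibrium values that reproduce the QM energy profile. For a bond term, E(b) = K_b (b - b_0)^2 plus a constant offset, this reduces to a quadratic least-squares fit. The sketch assumes a one-dimensional scan of a single bond, with energies in kcal/mol and distances in Å; the angle and dihedral terms, and any iterative refinement used in the thesis, are not shown, and `fit_charmm_bond` is a hypothetical helper.

```python
import numpy as np


def fit_charmm_bond(distances, energies):
    """Fit E(b) = Kb * (b - b0)**2 + E0 to a 1-D PES scan by linear least squares.

    CHARMM's bond term has no 1/2 prefactor, so Kb is read directly off the
    quadratic coefficient. E0 absorbs the arbitrary zero of the QM energies.
    """
    a, c, d = np.polyfit(np.asarray(distances, float), np.asarray(energies, float), 2)
    if a <= 0:
        raise ValueError("scan is not locally harmonic around a minimum")
    kb = a                  # force constant
    b0 = -c / (2.0 * a)     # position of the parabola's minimum
    e0 = d - a * b0 ** 2    # energy at the minimum
    return kb, b0, e0
```

The scan should be restricted to the region near the minimum; strongly anharmonic points at large displacements would bias the fitted force constant.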
Overall, we demonstrate how bioinformatics tools can complement current disease eradication strategies. Future work will focus on characterising variants identified in Genome Wide Association Studies and partnering with wet labs to carry out further enzyme-ligand activity relationship studies, structure determination or characterisation of appropriate protein-ligand complexes by crystallography, and site-specific mutation studies.
A dynamics based analysis of allosteric modulation in heat shock proteins
- Authors: Penkler, David Lawrence
- Date: 2019
- Subjects: Heat shock proteins , Molecular chaperones , Allosteric regulation , Homeostasis , Protein kinases , Transcription factors , Adenosine triphosphatase , Cancer -- Chemotherapy , Molecular dynamics , High throughput screening (Drug development)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115948 , vital:34273
- Description: The 70 kDa and 90 kDa heat shock proteins (Hsp70 and Hsp90) are molecular chaperones that play central roles in maintaining cellular homeostasis in all organisms, with the exception of archaea. In addition to their general chaperone function in protein quality control, Hsp70 and Hsp90 cooperate in regulating the activity of some 200 known natively folded protein clients, including protein kinases, transcription factors and receptors, many of which are key regulators of essential signal transduction pathways. Both chaperones are large multi-domain proteins that rely on ATPase activity and co-chaperone interactions to regulate their conformational cycles of peptide binding and release. The unique position of Hsp90 at the crossroads of several fundamental cellular pathways, coupled with its known association with diverse oncogenic client proteins, has drawn increasing interest to the chaperone as a potential anti-cancer target implicated in all eight hallmarks of the disease. Current orthosteric drug discovery efforts aimed at inhibiting the ATPase domain of Hsp90 have been limited by high levels of associated toxicity. To circumvent this, research efforts are shifting toward alternative approaches such as interfering with co-chaperone binding and the allosteric inhibition or activation of the chaperone. The overriding aim of this thesis was to demonstrate how the computational technique of Perturbation Response Scanning (PRS), coupled with all-atom molecular dynamics (MD) simulations and dynamic residue interaction network (DRN) analysis, can be used as a viable strategy to efficiently scan for and accurately identify allosteric control elements capable of modulating the functional dynamics of a protein.
In pursuit of this goal, this thesis also contributes to the current understanding of the nucleotide-dependent allosteric mechanisms at play in the cellular function of both Hsp70 and Hsp90. All-atom MD simulations of E. coli DnaK provided evidence of nucleotide-driven modulation of conformational dynamics in both the catalytically active and inactive states. PRS analysis of these trajectories demonstrated sensitivity toward the bound nucleotide and peptide substrate, and provided evidence of a putative allosterically active intermediate state between the ATPase-active and inactive conformational states. Simultaneous binding of ATP and peptide substrate was found to allosterically prime the chaperone for interstate conversion, regardless of the transition direction. Detailed analysis of these allosterically primed states revealed select residue sites capable of inducing a coordinate shift towards the opposite conformational state. To validate these results, the predicted allosteric hot spots were cross-checked against known experimental work and found to overlap with functional sites implicated in allosteric signal propagation and ATPase activation in Hsp70. This study presented, for the first time, the application of PRS as a diagnostic tool for elucidating and quantifying the allosteric potential of selected residues to effect functionally relevant global conformational rearrangements. The PRS methodology described in this study was implemented in Python within the MD-TASK software suite for ease of use from the command line, and made freely available. Homology modelling techniques were used to address the lack of experimental structural data for the human cytosolic isoform of Hsp90, providing for the first time accurate full-length structural models of human Hsp90α in fully closed and partially open conformations.
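In PRS, a unit force F applied at one residue produces a linear response ΔR = C F, where C is the positional covariance matrix from an MD trajectory (or the inverse Hessian in network models). Each residue is then scored by how well its best response reproduces a target conformational change. The sketch below scores responses by the Pearson correlation of per-residue displacement magnitudes; this scoring choice, the number of random forces and the name `prs_scan` are assumptions, and MD-TASK's implementation may differ in detail.

```python
import numpy as np


def prs_scan(cov, target_disp, n_forces=20, seed=0):
    """Score each residue by how well forces applied there reproduce a target change.

    cov         : (3N, 3N) positional covariance matrix from an MD trajectory
    target_disp : (N, 3) displacement between two conformational states
    Returns an (N,) array of the best Pearson correlation found for each residue.
    """
    cov = np.asarray(cov, dtype=float)
    target_disp = np.asarray(target_disp, dtype=float)
    rng = np.random.default_rng(seed)
    n_res = target_disp.shape[0]
    target_mag = np.linalg.norm(target_disp, axis=1)
    best = np.full(n_res, -np.inf)
    for i in range(n_res):
        for _ in range(n_forces):
            direction = rng.normal(size=3)
            force = np.zeros(3 * n_res)
            force[3 * i:3 * i + 3] = direction / np.linalg.norm(direction)  # unit force on residue i
            response = (cov @ force).reshape(n_res, 3)  # linear response dR = C F
            r = np.corrcoef(np.linalg.norm(response, axis=1), target_mag)[0, 1]
            best[i] = max(best[i], r)
    return best
```

Residues with the highest scores are candidate allosteric control sites: perturbing them alone drives the structure most closely towards the target state.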
Long all-atom MD simulations of these structures revealed nucleotide-driven modulation of conformational dynamics in Hsp90. Subsequent DRN and PRS analysis of these MD trajectories allowed nucleotide-driven allosteric modulation in the chaperone to be elucidated and quantified. A detailed PRS analysis revealed allosteric inter-domain coupling between the extreme termini of the chaperone in response to external force perturbations at either domain. Furthermore, PRS identified several individual residue sites capable of selecting conformational rearrangements towards functionally relevant states, which may be considered putative allosteric target sites for future drug discovery efforts. Molecular docking techniques were employed to investigate the modulation of the conformational dynamics of human Hsp90α in response to ligand binding at two identified allosteric sites in the C-terminal domain. High-throughput screening of a small library of natural compounds indigenous to South Africa revealed three hit compounds at these sites: Cephalostatin 17, 20(29)-Lupene-3β isoferulate and 3'-Bromorubrolide F. All-atom MD simulations of these protein-ligand complexes, coupled with DRN analysis and several advanced trajectory-based analysis techniques, provided evidence of selective allosteric modulation of Hsp90α conformational dynamics depending on the identity and location of the bound ligands. Ligands bound at the four-helix bundle presented as putative allosteric inhibitors of Hsp90α, driving conformational dynamics in favour of dimer opening and possibly dimer separation. Meanwhile, ligand interactions at an adjacent sub-pocket near the interface between the middle and C-terminal domains demonstrated allosteric activation of the chaperone, modulating conformational dynamics in favour of the fully closed, catalytically active conformational state.
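DRN analysis treats each residue as a node, connects residues that are in contact in each frame, and tracks network metrics across the trajectory, such as each residue's average shortest path length. The sketch assumes one coordinate per residue (for example Cβ) and a 6.7 Å contact cutoff, a value often used for residue contact networks; both are assumptions, the helper names are hypothetical, and other metrics such as betweenness centrality are omitted.

```python
from collections import deque

import numpy as np


def contact_network(coords, cutoff=6.7):
    """Boolean adjacency matrix: residues i and j are linked if within `cutoff` Å."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)


def average_shortest_path(adj):
    """Mean unweighted shortest-path length from each node to all nodes it can reach."""
    n = len(adj)
    neighbours = [np.flatnonzero(row) for row in adj]
    avg = np.full(n, np.inf)
    for src in range(n):
        hops = np.full(n, -1)
        hops[src] = 0
        queue = deque([src])
        while queue:  # breadth-first search gives hop counts in an unweighted graph
            u = queue.popleft()
            for v in neighbours[u]:
                if hops[v] < 0:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        reached = hops > 0
        if reached.any():
            avg[src] = hops[reached].mean()
    return avg


def drn_average_path(traj, cutoff=6.7):
    """Average each residue's shortest-path length over all frames of a trajectory."""
    return np.mean([average_shortest_path(contact_network(frame, cutoff)) for frame in traj], axis=0)
```

Comparing these per-residue averages between, for example, apo and ligand-bound simulations highlights residues whose communication role in the network changes.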
Taken together, the data presented in this thesis contribute to the understanding of allosteric modulation of conformational dynamics in Hsp70 and Hsp90, and provide a suitable platform for future biochemical and drug discovery studies. Furthermore, the molecular docking and computational identification of allosteric compounds with suitable binding affinity for allosteric sites in the CTD of human Hsp90α provide, for the first time, “proof-of-principle” for the use of PRS in conjunction with MD simulations and DRN analysis as a method for rapidly identifying allosteric sites in proteins that can be probed by small molecules. These data could pave the way for future allosteric drug discovery studies for the treatment of Hsp90-associated pathologies.
Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans
- Authors: Brown, David K
- Date: 2018
- Subjects: Bioinformatics , Human genetics -- Variation , High performance computing , Workflow management systems , Molecular dynamics , Next generation sequencing , Human Mutation Analysis (HUMA)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/60708 , vital:27820
- Description: This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, using these HPC clusters requires familiarity with command-line interfaces, which excludes a large number of researchers from taking advantage of these resources. JMS was developed to make it easier for researchers without a computer science background to use HPC. Additionally, JMS can host computational tools and pipelines and generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that need to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of Part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities it provides. HUMA includes protein, gene, disease and variation data, and can be searched from the angle of any one of these categories.
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other metadata). However, because the database is relational, genes, diseases, variation and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original data sources, allowing users to follow them for additional details. As a platform for the analysis of genetic variation, HUMA also provides tools to visualize and analyse the data (several of which run on the underlying cluster via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, Part 3 of this thesis focuses on the development of MD-TASK, a suite of tools to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased through the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.
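A connected, relational design lets a single query travel from a protein to its gene, variants and associated diseases. The toy sketch below illustrates the idea with Python's built-in `sqlite3` module; the schema, table names, records and helper functions are hypothetical placeholders and do not describe HUMA's actual schema or data.

```python
import sqlite3

# Hypothetical toy schema for illustration; NOT HUMA's actual schema.
SCHEMA = """
CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);
CREATE TABLE protein (id INTEGER PRIMARY KEY, accession TEXT UNIQUE,
                      gene_id INTEGER REFERENCES gene(id));
CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE variant (id INTEGER PRIMARY KEY, protein_id INTEGER REFERENCES protein(id),
                      change TEXT);
CREATE TABLE variant_disease (variant_id INTEGER REFERENCES variant(id),
                              disease_id INTEGER REFERENCES disease(id));
"""


def build_demo_db():
    """In-memory database populated with placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (1, 'GENE1')")
    conn.execute("INSERT INTO protein VALUES (1, 'PROT1', 1)")
    conn.execute("INSERT INTO disease VALUES (1, 'Disease A')")
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?)", [(1, 1, "M1T"), (2, 1, "R2K")])
    conn.execute("INSERT INTO variant_disease VALUES (1, 1)")
    return conn


def protein_view(conn, accession):
    """Return (gene, variant, disease) rows linked to one protein accession."""
    return conn.execute(
        """
        SELECT g.symbol, v.change, d.name
        FROM protein AS p
        JOIN gene AS g ON g.id = p.gene_id
        LEFT JOIN variant AS v ON v.protein_id = p.id
        LEFT JOIN variant_disease AS vd ON vd.variant_id = v.id
        LEFT JOIN disease AS d ON d.id = vd.disease_id
        WHERE p.accession = ?
        ORDER BY v.change
        """,
        (accession,),
    ).fetchall()
```

`LEFT JOIN` keeps variants that have no disease annotation, so `protein_view(build_demo_db(), "PROT1")` returns both placeholder variants, with `None` as the disease for the unannotated one.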
The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies
- Authors: Moses, Vuyani
- Date: 2018
- Subjects: Copper proteins , Cellulose , Molecular dynamics , Cellulose -- Biodegradation , Bioinformatics
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/58327 , vital:27230
- Description: AA9 proteins are metalloenzymes that are crucial for the early stages of cellulose degradation. They have been suggested to cleave the glycosidic bonds in cellulose using their Cu2+-coordinating active site. AA9 proteins differ in regioselectivity depending on the cleavage they produce and, as a result, are grouped accordingly: Type 1 AA9 proteins cleave at the C1 carbon of cellulose, Type 2 AA9 proteins at the C4 carbon, and Type 3 AA9 proteins at either C1 or C4. The steric congestion of the AA9 active site has been proposed to contribute to the observed regioselectivity. As such, a bioinformatics characterisation of type-specific sequence and structural features was performed. First, AA9 protein sequences were obtained from the Pfam database and a multiple sequence alignment was performed. The sequences were phylogenetically characterised and grouped into their respective types, and sub-groups were identified. A selection analysis was performed on the AA9 LPMO types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess their possible effects on substrate interaction. Physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to assess the consequences of the discovered type-specific features on AA9-cellulose interaction dynamically. Because AA9-specific force field parameters were lacking, MD simulations were not readily applicable. Potential Energy Surface (PES) scans were therefore performed to evaluate force field parameters for the AA9 active site using the PM6 semi-empirical approach and least-squares fitting.
A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+-coordinating residues, the Cu2+ ion and two water molecules. Owing to the similarity of AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations were conducted for each AA9 LPMO type using two separate Lennard-Jones parameter sets. The MD trajectories were then analysed for features including the RMSD, RMSF, radius of gyration, Cu2+ coordination during simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins revealed unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by unique type-specific loops that are present in Type 2 and 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to cause steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 but not for Type 2 and 3 AA9 proteins. In this study, force field parameters were evaluated for the Type 1 active site of AA9 proteins, validated on all three types and applied to study their binding to cellulose. Future work will focus on identifying the nature of the reactive oxygen species and performing QM/MM calculations to elucidate the reaction mechanism of all three AA9 LPMO types.
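RMSD between a trajectory frame and a reference structure is normally reported after optimal rigid-body superposition. A minimal NumPy sketch of the Kabsch algorithm, assuming both arrays contain the same atoms in the same order; MD packages provide their own tested RMSD utilities, and `kabsch_rmsd` is a hypothetical helper for illustration.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """RMSD between two (n_atoms, 3) arrays after optimal rigid-body superposition."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the covariance matrix H = P^T Q
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations (reflections)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q  # rotate mobile onto reference, then compare
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Calling this for every frame against the starting structure gives the RMSD time series used to judge whether a simulation, and the parameterised active site, has stabilised.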