Application of machine learning, molecular modelling and structural data mining against antiretroviral drug resistance in HIV-1
- Authors: Sheik Amamuddy, Olivier Serge André
- Date: 2020
- Subjects: Machine learning , Molecules -- Models , Data mining , Neural networks (Computer science) , Antiretroviral agents , Protease inhibitors , Drug resistance , Multidrug resistance , Molecular dynamics , Renin-angiotensin system , HIV (Viruses) -- South Africa , HIV (Viruses) -- Social aspects -- South Africa , South African Natural Compounds Database
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115964 , vital:34282
- Description: Millions of people worldwide are affected by the Human Immunodeficiency Virus (HIV), even though the death toll is on the decline. Antiretrovirals (ARVs), and protease inhibitors in particular, have shown tremendous success since their introduction into therapy in the mid-1990s by slowing progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for through viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs of differing toxicity, in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) can manifest class-wide resistance due to similarities in drug design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription, together with treatment adherence, are very important because drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell counts and viral load measurements from patients in resource-limited settings are very helpful for tracking how well a patient’s immune system keeps the virus in check, they can be slow to show whether an ARV is effective. Phenosense assay kits address this problem by using viruses engineered to contain the patient sequences and evaluating their growth in the presence of different ARVs, but this can be expensive and too involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy on the basis of drug resistance mutation patterns.
However, these patterns are inherently complex, and the various in silico prediction methods, such as Geno2pheno, REGA and Stanford HIVdb, do not always agree, even though this gap decreases as the list of resistance mutations is updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data containing an overwhelming majority of subtype B HIV, although this subtype accounts for only about 12% of worldwide HIV infections. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that, as subtyping is a phylogenetic classification, the more divergent a subtype is from the strains used in training prediction models, the less their resistance profiles would correlate. For these reasons, we used a multi-faceted approach to attack the virus. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease, and (3) screen for protease inhibitors in a database of natural compounds, the South African Natural Compounds Database (SANCDB), to find molecules or molecular properties that could be used to improve inhibition of the drug target. In this work, structural information was mined using the Anisotropic Network Model, Dynamic Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to give any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction, respectively.
Applications of drug docking, homology modelling and energy minimization for generating features suitable for machine learning were not very promising, and rather suggest that binding energies from Vina may not, by themselves, be quantitatively reliable. These failures led to a refinement that resulted in a highly sensitive, statistically guided network construction and analysis, which in turn led to key findings on the early dynamics associated with resistance across all PI drugs. This experiment revealed a conserved lateral expansion motion occurring at the flap elbows, and an associated contraction that drives the base of the dimerization domain towards the floor of the catalytic site in the case of drug resistance. Interestingly, we found that despite the conserved movement, bond angles were degenerate. In parallel, 16 Artificial Neural Network models were optimised for HIV protease and reverse transcriptase inhibitors, with performances on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD), and additionally suggest a promising modification to one of the compounds. This yielded another molecule that inhibits open and closed receptor target conformations equally well; each of the compounds had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was the lack of non-B subtype data, our findings, especially from the statistically guided network analysis, may extrapolate to non-B subtypes to a certain extent, as the level of conservation was very high within subtype B despite all the variations present. This network construction method lays down a sensitive approach for analysing a pair of alternate phenotypes for which complex patterns prevail, given a sufficient number of experimental units.
During the course of the research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and was packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. These techniques and tools collectively add to the conventional means of MD analysis.
- Full Text:
- Date Issued: 2020
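The dynamic cross-correlation analysis named in the description above can be illustrated with a minimal sketch. It assumes a trajectory that is already aligned to a reference and reduced to one point per residue (for example the C-alpha atom). The function name `dynamic_cross_correlation` and the toy data are hypothetical and are not taken from the thesis or from MD-TASK.

```python
from math import sqrt


def dynamic_cross_correlation(trajectory):
    """Return the N x N matrix C[i][j] = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>).

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue, assumed to be pre-aligned to a common reference.
    """
    n_frames = len(trajectory)
    n_res = len(trajectory[0])

    # Mean position of each residue over all frames.
    mean = [
        tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
        for i in range(n_res)
    ]

    # Displacement of each residue from its mean position, per frame.
    disp = [
        [tuple(frame[i][k] - mean[i][k] for k in range(3)) for i in range(n_res)]
        for frame in trajectory
    ]

    def avg_dot(i, j):
        # Time average of the dot product of the displacements of i and j.
        return sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        ) / n_frames

    corr = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            norm = sqrt(avg_dot(i, i) * avg_dot(j, j))
            # Residues that never move get a correlation of 0 by convention here.
            corr[i][j] = avg_dot(i, j) / norm if norm > 0 else 0.0
    return corr


# Toy trajectory (assumed data): residues 0 and 1 move together along x,
# residue 2 moves in the opposite direction.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_corr = dynamic_cross_correlation(toy_trajectory)
```

Values near +1 indicate residues that move together, and values near -1 indicate residues that move in opposite directions. This pure-Python version favours readability; a real trajectory would normally be handled with an array library.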
Mechanism of action of non-synonymous single nucleotide variations associated with α-carbonic anhydrases II, IV and VIII
- Authors: Sanyanga, T. Allan
- Date: 2020
- Subjects: Carbonic anhydrase , Carbonic anhydrase -- Therapeutic use , Nucleotides
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/167346 , vital:41470
- Description: The carbonic anhydrase (CA) group of enzymes are zinc (Zn2+) metalloproteins responsible for the reversible hydration of CO2 to bicarbonate (BCT or HCO3−) and protons (H+), facilitating acid-base balance and homeostasis within the body. Across all organisms, a minimum of six CA families exist: α (alpha), β (beta), γ (gamma), δ (delta), η (eta) and ζ (zeta). Some organisms can have more than one family, with the exception of humans, which contain only the α family. The α-CA family comprises 16 isoforms (CA-I to CA-XV), including the acatalytic isoforms CA-VIII, CA-X and CA-XI. Of the catalytic isoforms, CA-II and CA-IV have among the fastest rates of reaction, and any disturbance to the function of these enzymes results in CA deficiencies and undesirable phenotypes. CA-II deficiencies result in osteopetrosis with renal tubular acidosis and cerebral calcification, whereas CA-IV deficiencies result in retinitis pigmentosa 17 (RP17). Phenotypic effects generally manifest as a result of poor protein folding and function due to the presence of non-synonymous single nucleotide variations (nsSNVs). Even within the acatalytic isoforms, such as CA-VIII, which allosterically regulates the affinity of inositol triphosphate (IP3) for the IP3 receptor type 1 (ITPR1) and regulates calcium (Ca2+) signalling, the presence of SNVs causes the phenotype cerebellar ataxia, mental retardation, and dysequilibrium syndrome 3 (CAMRQ3). Currently, the majority of research into the CAs is focused on the inhibition of these proteins to achieve therapeutic effects in patients via the control of HCO3− production or reabsorption, as observed in glaucoma and diuretic medications. Little research has therefore been devoted to the identification of stabilising or activating compounds that could rescue protein function in the case of deficiencies.
The main aim of this research was to identify and characterise the effects of nsSNVs on the structure and function of CA-II, CA-IV and CA-VIII to set a foundation for rare disease studies into the CA group of proteins. Combined bioinformatics approaches divided into four main objectives were implemented. These included variant identification, sequence analysis and protein characterisation, force field (FF) parameter generation, molecular dynamics (MD) simulation and dynamic residue network (DRN) analysis. Six variants for each of the CA-II, CA-IV and CA-VIII proteins with pathogenic annotations were identified from the HUMA and Ensembl databases. These included the pathogenic variants K18E, K18Q, H107Y, P236H, P236R and N252D for CA-II. CA-IV included the pathogenic R69H, R219C and R219S, and benign N86K, N177K and V234I variants. CA-VIII included pathogenic S100A, S100P, G162R and R237Q, and benign S100L and E109D variants. CA-II has been more extensively studied than CA-IV and CA-VIII, therefore residues essential to its function and stability are known. To discover important residues and regions within the CA-IV and CA-VIII proteins, sequence and motif analysis was performed across the α-CA family, using CA-II as a reference. Sequence analysis identified multiple residues conserved between the two catalytic isoforms, CA-II and CA-IV, and the acatalytic CA-VIII isoform that were proposed to be essential for protein stability. With the exception of the benign N86K CA-IV variant, none of the other pathogenic or benign CA-II, CA-IV and CA-VIII SNVs were located at functionally or structurally important residues. Motif analysis identified 11 conserved and important motifs within the α-CA family. Several of the identified variants were located on these motifs, including K18E, K18Q, H107Y and N252D (CA-II); N86K, R219C, R219S and V234I (CA-IV); and E109D, G162R and R237Q (CA-VIII).
As there were no X-ray crystal structures of the variant proteins, homology modelling was performed to calculate the protein structures for characterisation. In CA-VIII, the substitution of Ser by Pro at position 100 (variant S100P) resulted in destruction of the β-sheet on which the SNV was located. Little is known about the mechanism of interaction between CA-VIII and ITPR1, or the residues involved. SiteMap and CPORT were used to identify binding site residues for CA-VIII, and the results identified 38 potential residues. Traditional FFs cannot be used directly for MD simulations of metalloproteins. The AMBER ff14SB FF was therefore extended, and Zn2+ FF parameters were calculated to add support for metalloprotein MD simulations. In the protein, Zn2+ was noted to have a charge less than +1. Variant effects on protein structure were then investigated using MD simulations. Root mean square deviation (RMSD) and radius of gyration (Rg) results indicated subtle SNV effects on the global structure of the variants in CA-II and CA-IV. However, for CA-VIII, RMSD analysis highlighted that variant presence was associated with increases in the structural rigidity of the protein. Principal component analysis (PCA) in conjunction with free energy analysis was performed to observe variant effects on protein conformational sampling in 3D space. The binding of BCT to CA-II induced greater protein conformational sampling and was associated with higher free energy. In CA-IV and CA-VIII, PCA revealed key differences in the mechanism of action of pathogenic and benign SNVs. In CA-IV, wild-type (WT) and benign variant protein structures clustered into a single low-energy well, hinting at the presence of more stable structures. Pathogenic variants were associated with higher free energy, and the proteins sampled more conformations without settling into a low-energy well. PCA of CA-VIII indicated the opposite of CA-IV.
Pathogenic variants clustered into low-energy wells, while the WT and benign variants showed greater conformational sampling. Dynamic cross correlation (DCC) analysis was performed using the MD-TASK suite to determine variant effects on residue movement. In the CA-II WT protein, BCT and CO2 were associated with anti-correlated and correlated residue movement, respectively, hinting at opposite mechanisms. In CA-IV and CA-VIII, variant presence resulted in a change in residue correlation compared to the WT proteins. DRN analysis was performed to investigate SNV effects on residue accessibility and communication. Results demonstrated that SNVs are associated with allosteric effects on the CA protein structures, and that these effects are located on the stability-assisting residues of the aromatic clusters and the active site of the proteins. CA-II studies discovered that Glu117 is the most important residue for communication, and that variant presence results in a decrease in the usage of the residue. This effect was greatest in the CA-II H107Y SNV, and suggests that variants could have an effect on Zn2+ dissociation from the active site. Decreases in the usage of Zn2+ coordinating residues were also noted. Where this occurred, compensatory increases in the usage of other primary and secondary coordination residues were observed, which could possibly assist with the maintenance of Zn2+ within the active site. The CA-IV variants R69H and R219C highlighted potentially similar pathogenic mechanisms, whereas N86K and N177K hinted at potentially similar benign mechanisms. Within CA-VIII, variant presence was associated with changes in the accessibility of the N-terminal binding site residues. The benign CA-VIII variants highlighted possible compensatory mechanisms, whereby as one group of N-terminal residues lost accessibility, there was an increase in the accessibility of other binding site residues to possibly balance the effect.
Catalytically, the proton shuttle residue His64 in CA-II was found to occupy a novel conformation named the “faux in” that brought the imidazole group even closer to the Zn2+ compared to the “in” conformation. Overall, compared to traditional MD simulations, the incorporation of DRN allowed more detailed investigations into the variant mechanisms of action. This highlights the importance of network analysis in the study of the effects of missense mutations on the structure and function of proteins. Investigation of diseases at the molecular level is essential in the identification of disease pathogenesis and assists with the development of specifically tailored and better treatment options, especially in the case of genetically associated rare diseases.
- Full Text:
- Date Issued: 2020
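The two global structure metrics named in the description above, RMSD and radius of gyration (Rg), can be written down in a few lines. The sketch below assumes coordinates that are already superimposed on the reference (no fitting step is included) and uses equal masses unless told otherwise. The function names `rmsd` and `radius_of_gyration` are illustrative and do not come from the thesis.

```python
from math import sqrt


def rmsd(coords, reference):
    """Root mean square deviation between two pre-aligned coordinate sets.

    Both arguments are lists of (x, y, z) tuples of equal length.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2 for p, q in zip(coords, reference) for a, b in zip(p, q)
    )
    return sqrt(squared / len(coords))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses are assumed by default."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Centre of mass of the structure.
    com = tuple(
        sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)
    )
    # Mass-weighted mean squared distance from the centre of mass.
    squared = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for m, p in zip(masses, coords)
    )
    return sqrt(squared / total)


# Toy structure (assumed data): four points on the unit circle in the xy-plane.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
shifted = [(x, y, z + 2.0) for x, y, z in square]
```

Tracking both values per frame along a trajectory gives the kind of time series from which changes in global structure or compaction can be read. A production analysis would first superimpose each frame on the reference.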
A dynamics based analysis of allosteric modulation in heat shock proteins
- Authors: Penkler, David Lawrence
- Date: 2019
- Subjects: Heat shock proteins , Molecular chaperones , Allosteric regulation , Homeostasis , Protein kinases , Transcription factors , Adenosine triphosphatase , Cancer -- Chemotherapy , Molecular dynamics , High throughput screening (Drug development)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115948 , vital:34273
- Description: The 70 kDa and 90 kDa heat shock proteins (Hsp70 and Hsp90) are molecular chaperones that play central roles in maintaining cellular homeostasis in all organisms with the exception of archaea. In addition to their general chaperone function in protein quality control, Hsp70 and Hsp90 cooperate in the regulation and activity of some 200 known natively folded protein clients, which include protein kinases, transcription factors and receptors, many of which are implicated as key regulators of essential signal transduction pathways. Both chaperones are large multi-domain proteins that rely on ATPase activity and co-chaperone interactions to regulate their conformational cycles for peptide binding and release. The unique positioning of Hsp90 at the crossroads of several fundamental cellular pathways, coupled with its known association with diverse oncogenic peptide clients, has brought the molecular chaperone under increasing interest as a potential anti-cancer target that is crucially implicated in all eight hallmarks of the disease. Current orthosteric drug discovery efforts aimed at the inhibition of the ATPase domain of Hsp90 have been limited by high levels of associated toxicity. In an effort to circumvent this, the focus of research efforts is shifting toward alternative approaches such as interference with co-chaperone binding and the allosteric inhibition or activation of the molecular chaperone. The overriding aim of this thesis was to demonstrate how the computational technique of perturbation response scanning (PRS), coupled with all-atom molecular dynamics (MD) simulations and dynamic residue interaction network (DRN) analysis, can be used as a viable strategy to efficiently scan for and accurately identify allosteric control elements capable of modulating the functional dynamics of a protein.
In pursuit of this goal, this thesis also contributes to the current understanding of the nucleotide-dependent allosteric mechanisms at play in the cellular functionality of both Hsp70 and Hsp90. All-atom MD simulations of E. coli DnaK provided evidence of nucleotide-driven modulation of conformational dynamics in both the catalytically active and inactive states. PRS analysis employed on these trajectories demonstrated sensitivity toward bound nucleotide and peptide substrate, and provided evidence of a putative allosterically active intermediate state between the ATPase active and inactive conformational states. Simultaneous binding of ATP and peptide substrate was found to allosterically prime the chaperone for interstate conversion regardless of the transition direction. Detailed analysis of these allosterically primed states revealed select residue sites capable of selecting a coordinate shift towards the opposite conformational state. In an effort to validate these results, the predicted allosteric hot spot sites were cross-validated with known experimental works and found to overlap with functional sites implicated in allosteric signal propagation and ATPase activation in Hsp70. This study presented, for the first time, the application of PRS as a suitable diagnostic tool for the elucidation and quantification of the allosteric potential of select residues to effect functionally relevant global conformational rearrangements. The PRS methodology described in this study was packaged within the Python programming environment in the MD-TASK software suite for command-line ease of use and made freely available. Homology modelling techniques were used to address the lack of experimental structural data for the human cytosolic isoform of Hsp90 and, for the first time, provided accurate full-length structural models of human Hsp90α in fully-closed and partially-open conformations. 
Long-range all-atom MD simulations of these structures revealed nucleotide-driven modulation of conformational dynamics in Hsp90. Subsequent DRN and PRS analysis of these MD trajectories allowed for the quantification and elucidation of nucleotide-driven allosteric modulation in the molecular chaperone. A detailed PRS analysis revealed allosteric inter-domain coupling between the extreme termini of the chaperone in response to external force perturbations at either domain. Furthermore, PRS identified several individual residue sites that are capable of selecting conformational rearrangements towards functionally relevant states, and which may be considered putative allosteric target sites for future drug discovery efforts. Molecular docking techniques were employed to investigate the modulation of conformational dynamics of human Hsp90α in response to ligand binding interactions at two identified allosteric sites at the C-terminal domain. High-throughput screening of a small library of natural compounds indigenous to South Africa revealed three hit compounds at these sites: Cephalostatin 17, 20(29)-Lupene-3β isoferulate and 3'-Bromorubrolide F. All-atom MD simulations on these protein-ligand complexes, coupled with DRN analysis and several advanced trajectory-based analysis techniques, provided evidence of selective allosteric modulation of Hsp90α conformational dynamics in response to the identity and location of the bound ligands. Ligands bound at the four-helix bundle presented as putative allosteric inhibitors of Hsp90α, driving conformational dynamics in favour of dimer opening and possibly dimer separation. Meanwhile, ligand interactions at an adjacent sub-pocket located near the interface between the middle and C-terminal domains demonstrated allosteric activation of the chaperone, modulating conformational dynamics in favour of the fully-closed catalytically active conformational state. 
Taken together, the data presented in this thesis contribute to the understanding of allosteric modulation of conformational dynamics in Hsp70 and Hsp90, and provide a suitable platform for future biochemical and drug discovery studies. Furthermore, the molecular docking and computational identification of allosteric compounds with suitable binding affinity for allosteric sites at the CTD of human Hsp90α provide, for the first time, “proof-of-principle” for the use of PRS in conjunction with MD simulations and DRN analysis as a suitable method for the rapid identification of allosteric sites in proteins that can be probed by small-molecule interactions. The data presented in this section could pave the way for future allosteric drug discovery studies for the treatment of Hsp90-associated pathologies.
- Date Issued: 2019
Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans
- Authors: Brown, David K
- Date: 2018
- Subjects: Bioinformatics , Human genetics -- Variation , High performance computing , Workflow management systems , Molecular dynamics , Next generation sequencing , Human Mutation Analysis (HUMA)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/60708 , vital:27820
- Description: This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, making use of these HPC clusters requires familiarity with command line interfaces. This excludes a large number of researchers from taking advantage of these resources. JMS was developed as a tool to make it easier for researchers without a computer science background to make use of HPC. Additionally, JMS can be used to host computational tools and pipelines, and it generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that want to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities that it provides. HUMA includes protein, gene, disease, and variation data and can be searched from the angle of any one of these categories. 
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other meta-data). However, the related nature of the database means that genes, diseases, variation, and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original sources of the data, allowing users to follow the links to find additional details. HUMA aims to be a platform for the analysis of genetic variation. As such, it also provides tools to visualize and analyse the data (several of which run on the underlying cluster, via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, part 3 of this thesis focused on the development of a suite of tools, MD-TASK, to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased via the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.
- Date Issued: 2018