Application of machine learning, molecular modelling and structural data mining against antiretroviral drug resistance in HIV-1
- Authors: Sheik Amamuddy, Olivier Serge André
- Date: 2020
- Subjects: Machine learning , Molecules -- Models , Data mining , Neural networks (Computer science) , Antiretroviral agents , Protease inhibitors , Drug resistance , Multidrug resistance , Molecular dynamics , Renin-angiotensin system , HIV (Viruses) -- South Africa , HIV (Viruses) -- Social aspects -- South Africa , South African Natural Compounds Database
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115964 , vital:34282
- Description: Millions are affected by the Human Immunodeficiency Virus (HIV) worldwide, even though the death toll is on the decline. Antiretrovirals (ARVs), more specifically protease inhibitors, have shown tremendous success since their introduction into therapy in the mid-1990s by slowing down progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for due to viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs, with differing associated levels of toxicity, in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) can manifest class-wide resistance due to similarities in design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription, together with treatment adherence, are very important, as drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell count and the determination of viral load from patients in resource-limited settings are very helpful to track how well a patient’s immune system is able to keep the virus in check, they can be slow to show whether an ARV is effective. PhenoSense assay kits address this problem by using viruses engineered to contain the patient sequences and evaluating their growth in the presence of different ARVs, but this can be expensive and too involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy on the basis of drug resistance mutation patterns. 
However, these are inherently complex and the various methods of in silico prediction, such as Geno2pheno, REGA and Stanford HIVdb, do not always agree, even though this gap decreases as the list of resistance mutations is updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data containing an overwhelming majority of subtype B HIV, even though subtype B accounts for only about 12% of worldwide HIV infections. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that, as subtyping is a phylogenetic classification, the more divergent a subtype is from the strains used in training prediction models, the less their resistance profiles would correlate. For the aforementioned reasons, we used a multi-faceted approach against the virus. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease, and (3) screen for protease inhibitors amongst a database of natural compounds [the South African Natural Compounds Database (SANCDB)] to find molecules or molecular properties that could be used to improve inhibition of the drug target. In this work, structural information was mined using the Anisotropic Network Model, Dynamics Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to give any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction, respectively. 
Applications of drug docking, homology modelling and energy minimization for generating features suitable for machine learning were not very promising, and rather suggest that binding energies from Vina may not, by themselves, be very reliable quantitatively. All these failures led to a refinement that resulted in a highly sensitive, statistically-guided network construction and analysis, which led to key findings in the early dynamics associated with resistance across all PI drugs. The latter experiment revealed a conserved lateral expansion motion occurring at the flap elbows, and an associated contraction that drives the base of the dimerization domain towards the catalytic site’s floor in the case of drug resistance. Interestingly, we found that despite the conserved movement, bond angles were degenerate. Alongside, 16 Artificial Neural Network models were optimised for HIV protease and reverse transcriptase inhibitors, with performances on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD), and additionally suggested a promising modification to one of the compounds. This yielded another molecule that inhibited both open and closed receptor target conformations equally well, whereby each of the compounds had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was a lack of non-B subtype data, our findings, especially from the statistically-guided network analysis, may extrapolate to non-B subtypes to a certain extent, as the level of conservation was very high within subtype B despite all the variations present. This network construction method lays down a sensitive approach for analysing a pair of alternate phenotypes for which complex patterns prevail, given a sufficient number of experimental units. 
During the course of research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. These techniques and tools collectively add to the conventional means of MD analysis.
- Full Text:
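The description above names the radius of gyration as its measure of global compaction. As a minimal sketch of that quantity (not the thesis's own code), the standard mass-weighted definition can be written in plain Python. The function name `radius_of_gyration` and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D points.

    coords: sequence of (x, y, z) tuples; masses: optional per-point weights.
    Hypothetical helper for illustration only.
    """
    if masses is None:
        masses = [1.0] * len(coords)  # assume equal masses if none are given
    total = sum(masses)
    # Centre of mass, one component at a time
    centre = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    # Mass-weighted sum of squared distances from the centre of mass
    weighted_sq = sum(m * sum((c[k] - centre[k]) ** 2 for k in range(3))
                      for m, c in zip(masses, coords))
    return math.sqrt(weighted_sq / total)

# Toy example (made-up coordinates): four points on a unit square
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(radius_of_gyration(square))  # approximately 0.707, i.e. sqrt(0.5)
```

In an MD setting, a value like this would typically be computed per trajectory frame, so that a more compact structure shows a smaller radius over time.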
Modelling Ionospheric vertical drifts over the African low latitude region
- Authors: Dubazane, Makhosonke Berthwell
- Date: 2018
- Subjects: Ionospheric drift , Magnetometers , Functions, Orthogonal , Neural networks (Computer science) , Ionospheric electron density -- Africa , Communication and Navigation Outage Forecasting Systems (C/NOFS)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/63356 , vital:28396
- Description: Low/equatorial latitude vertical plasma drifts and electric fields govern the formation and changes of ionospheric density structures, which affect space-based systems such as communications, navigation and positioning. Dynamical and electrodynamical processes play important roles in plasma distribution at different altitudes. Because of the high variability of E × B drift in low latitude regions, coupled with various processes that sometimes originate from high latitudes, especially during geomagnetic storm conditions, it is challenging to develop accurate vertical drift models. This difficulty is compounded by the fact that there are very few instruments dedicated to providing electric field, and hence E × B drift, data in low/equatorial latitude regions. In particular, there exists no ground-based instrument for direct measurements of E × B drift in the African sector. This study presents the first investigation aimed at modelling the long-term variability of low latitude vertical E × B drift over the African sector using a combination of Communication and Navigation Outage Forecasting Systems (C/NOFS) and ground-based magnetometer observations during 2008-2013. Because the approach is based on the estimation of the equatorial electrojet from ground-based magnetometer observations, the developed models are only valid for local daytime. Three modelling techniques have been considered. Empirical Orthogonal Functions and partial least squares have been applied to vertical E × B drift modelling for the first time. Artificial neural networks, which have the advantage of learning underlying relationships between a set of inputs and a known output, were also used in vertical E × B drift modelling. Owing to the lack of E × B drift data over the African sector, the developed models were validated using satellite data and the climatological Scherliess-Fejer model incorporated within the International Reference Ionosphere model. 
A maximum correlation coefficient of ∼0.8 was achieved when validating the developed models with C/NOFS E × B drift observations that were not used in any model development. For most of the time, the climatological model overestimates the local daytime vertical E × B drift velocities. The methods and approach presented in this study provide a basis for constructing vertical E × B drift databases in longitude sectors that do not have radar instrumentation. This will in turn make it possible to study day-to-day variability of vertical E × B drift and hopefully lead to the development of regional and global models that will incorporate local time information in different longitude sectors.
- Full Text:
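The quantity modelled in the record above is the E × B plasma drift, whose velocity follows the standard relation v = (E × B) / |B|². The sketch below only illustrates that textbook formula and is not the thesis's modelling method. The function name `exb_drift`, the east/north/up axis convention, and the field values are assumptions chosen for illustration.

```python
def exb_drift(e_field, b_field):
    """Drift velocity v = (E x B) / |B|^2 in m/s.

    e_field in V/m, b_field in tesla, both as (east, north, up) tuples.
    Hypothetical helper for illustration only.
    """
    ex, ey, ez = e_field
    bx, by, bz = b_field
    b_sq = bx * bx + by * by + bz * bz
    # Components of the cross product E x B
    cross = (ey * bz - ez * by,
             ez * bx - ex * bz,
             ex * by - ey * bx)
    return tuple(c / b_sq for c in cross)

# Made-up equatorial daytime values: 0.5 mV/m eastward electric field,
# 30000 nT horizontal northward magnetic field
velocity = exb_drift((5e-4, 0.0, 0.0), (0.0, 3e-5, 0.0))
print(velocity[2])  # upward drift of roughly 16.7 m/s
```

With an eastward electric field and a northward horizontal magnetic field, the drift points upward, which is why the daytime equatorial drift is described as vertical.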
Tomographic imaging of East African equatorial ionosphere and study of equatorial plasma bubbles
- Authors: Giday, Nigussie Mezgebe
- Date: 2018
- Subjects: Ionosphere -- Africa, Central , Tomography -- Africa, Central , Global Positioning System , Neural networks (Computer science) , Space environment , Multi-Instrument Data Analysis System (MIDAS) , Equatorial plasma bubbles
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/63980 , vital:28516
- Description: Although the African ionospheric equatorial region has the largest ground footprint along the geomagnetic equator, it has not been well studied due to the absence of adequate ground-based instruments. This thesis presents research on both tomographic imaging of the African equatorial ionosphere and the study of ionospheric irregularities/equatorial plasma bubbles (EPBs) under varying geomagnetic conditions. The Multi-Instrument Data Analysis System (MIDAS), an inversion algorithm, was investigated for its validity and ability as a tool to reconstruct multi-scaled ionospheric structures for different geomagnetic conditions. This was done for the narrow East African longitude sector with data from the available ground Global Positioning System (GPS) receivers. The MIDAS results were compared to the results of two models, namely the IRI and GIM. MIDAS results compared more favourably with the observed vertical total electron content (VTEC), with a computed maximum correlation coefficient (r) of 0.99 and minimum root-mean-square error (RMSE) of 2.91 TECU, than did the results of the IRI-2012 and GIM models, with maximum r of 0.93 and 0.99, and minimum RMSE of 13.03 TECU and 6.52 TECU, respectively, over all the test stations and validation days. The ability of MIDAS to reconstruct storm-time TEC was also compared with the results produced by an Artificial Neural Network (ANN) for the African low- and mid-latitude regions. In terms of latitude, on average, MIDAS performed 13.44% better than the ANN in the African mid-latitudes, while MIDAS underperformed in the low latitudes. This thesis also reports on the effects of moderate geomagnetic conditions on the evolution of EPBs and/or ionospheric irregularities during their season of occurrence, using data from space- and ground-based instruments for the East African equatorial sector. 
The study showed that the strength of the daytime equatorial electrojet (EEJ), the steepness of the TEC peak-to-trough gradient and/or the meridional/transequatorial thermospheric winds sometimes have collective/interwoven effects, while at other times one mechanism dominates. In summary, this research offered tomographic results that outperform the results of the commonly used (“standard”) global models (i.e. IRI and GIM) for a longitude sector of importance to space weather, which has not been adequately studied due to a lack of sufficient instrumentation.
- Full Text:
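The comparison in the record above rests on two statistics, the correlation coefficient (r) and the root-mean-square error (RMSE) between modelled and observed VTEC. As a minimal sketch of how these two metrics are defined (not the thesis's validation code), they can be computed in plain Python. The function names and the sample values below are hypothetical and are not taken from the thesis.

```python
import math

def rmse(observed, modelled):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(observed, modelled)) / n)

def pearson_r(observed, modelled):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov / math.sqrt(var_o * var_m)

# Made-up VTEC values in TECU: the model tracks the observations
# perfectly in shape but carries a constant 2 TECU offset
observed = [10.0, 20.0, 30.0, 40.0]
modelled = [12.0, 22.0, 32.0, 42.0]
print(rmse(observed, modelled))       # 2.0
print(pearson_r(observed, modelled))  # approximately 1.0
```

The toy case shows why both metrics are reported together: a constant bias leaves r at 1 while the RMSE still registers the offset.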