Computer aided approaches against Human African Trypanosomiasis
- Authors: Kimuda, Magambo Phillip
- Date: 2020
- Subjects: African trypanosomiasis , African trypanosomiasis -- Chemotherapy , Genomics , Macrophage migration inhibitory factor , Trypanosoma brucei , Pteridines , Tetrahydrofolate dehydrogenase , Adenylic acid , Molecular dynamics , Principal components analysis , Bioinformatics , Single nucleotide polymorphisms , Single Nucleotide Variants , Candidate Gene Association Study (CGAS)
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/142542 , vital:38089
- Description: This thesis is divided into two parts under a common theme: the use of computer-based tools, genomics, and in vitro experiments to develop innovative ways of tackling Human African Trypanosomiasis (HAT). Part I focused on the human host genetic determinants, while Part II focused on the discovery of novel chemotherapeutics against the parasite. Part I is further sub-divided into two parts. The first involves a Candidate Gene Association Study (CGAS) on an African population to identify genetic determinants associated with disease and/or susceptibility to HAT. The second involves studying the effects of missense Single Nucleotide Variants (SNVs) on protein structure, dynamics, and function using Macrophage Migration Inhibitory Factor (MIF) as a case study. Part II is also sub-divided into two parts. The first involves computer-based rational drug discovery of potential inhibitors of the Trypanosoma folate pathway, particularly by targeting Trypanosoma brucei Pteridine Reductase (TbPTR1), an enzyme used by trypanosomes to overcome T. brucei Dihydrofolate Reductase (TbDHFR) inhibition. The second involves the derivation of CHARMM force-field parameters that can be used to accurately model the geometry and dynamics of the T. brucei Phosphodiesterase B1 enzyme (TbrPDEB1) bimetallic active site center. The derived parameters were then used in MD simulations to characterise protein-ligand residue interactions that are important in TbrPDEB1 inhibition, with the goal of targeting the cyclic Adenosine Monophosphate (cAMP) signalling pathway. In the CGAS we were unable to detect any genetic associations in the Ugandan cohort analysed that passed correction for multiple testing, in spite of the study being sufficiently powered. Additionally, our study did not replicate the previously reported association of the Apolipoprotein L1 (APOL1) G2 allele with protection against acute HAT.
Future investigations, for example Genome Wide Association Studies using larger sample sizes (>3000 cases and controls), are required. Macrophage migration inhibitory factor (MIF) is a cytokine important in both innate and adaptive immunity that has been shown, using murine models, to play a role in T. brucei pathogenicity. A total of 27 missense SNVs were modelled using homology modelling to create MIF protein mutants that were investigated using in silico effect prediction tools, molecular dynamics (MD), Principal Component Analysis (PCA), and Dynamic Residue Network (DRN) analysis. Our results demonstrate that mutations P2Q, I5M, P16Q, L23F, T24S, T31I, Y37H, H41P, M48V, P44L, G52C, S54R, I65M, I68T, S75F, N106S, and T113S caused significant conformational changes. Further, DRN analysis showed that residues P2, T31, Y37, G52, I65, I68, S75, N106, and T113 are part of a similar local residue interaction network with functional significance. These results show how polymorphisms such as missense SNVs can affect protein conformation, dynamics, and function. Trypanosomes are auxotrophic for folates and pterins: they require them for survival but must scavenge them from their hosts. PTR1 is a multifunctional enzyme, unique to trypanosomatids, that reduces both pterins and folates. In the presence of DHFR inhibitors, PTR1 is over-expressed, thus providing an escape from the effects of DHFR inhibition. Both TbPTR1 and TbDHFR are pharmacologically and genetically validated drug targets. In this study 5742 compounds were screened using molecular docking, and 13 promising binding modes were further analysed using MD simulations. The trajectories were analysed using root mean square deviation (RMSD), radius of gyration (Rg), root mean square fluctuation (RMSF), PCA, Essential Dynamics Analysis (EDA), Molecular Mechanics Poisson–Boltzmann surface area (MM-PBSA) binding free energy calculations, and DRN analysis.
The computational screening approach allowed us to identify five compounds, named RUBi004, RUBi007, RUBi014, RUBi016 and RUBi018, that exhibited antitrypanosomal activity against trypanosomes in culture with IC50 values of 12.5 ± 4.8 μM, 32.4 ± 4.2 μM, 5.9 ± 1.4 μM, 28.2 ± 3.3 μM, and 9.7 ± 2.1 μM, respectively. Further, when used in combination with WR99210, a known TbDHFR inhibitor, RUBi004, RUBi007, RUBi014 and RUBi018 showed antagonism while RUBi016 showed an additive effect. These results indicate that the four compounds might be competing with TbDHFR while RUBi016 might be more specific for TbPTR1. These compounds provide scaffolds that can be further optimised to improve their potency and specificity. Lastly, using a systematic approach we derived CHARMM force-field parameters to accurately describe the TbrPDEB1 bi-metal catalytic center. For dynamics, we employed a mixed bonded and non-bonded approach. We optimised the structure using a two-layer QM/MM ONIOM (B3LYP/6-31(g): UFF). The TbrPDEB1 bi-metallic center bonds, angles, and dihedrals were parameterized by fitting the energy profiles from Potential Energy Surface (PES) scans to the CHARMM potential energy function. The parameters were validated by means of MD simulations and analysed using RMSD, Rg, RMSF, hydrogen bonding, bond/angle/dihedral evaluations, EDA, PCA, and DRN analysis. The force-field parameters were able to accurately reproduce the geometry and dynamics of the TbrPDEB1 bi-metal catalytic center during MD simulations. Molecular docking was used to identify 6 potential hits that inhibited trypanosome growth in vitro. The derived force-field parameters were used to simulate the 6 protein-ligand complexes with the aim of elucidating crucial protein-ligand residue interactions. Using the most potent ligand, RUBi022, which had an IC50 of 14.96 μM, we were able to identify key residue interactions that can be of use in in silico prediction of potential TbrPDEB1 inhibitors.
Overall, we demonstrate how bioinformatics tools can complement current disease eradication strategies. Future work will focus on identifying variants identified in Genome Wide Association Studies and partnering with wet labs to carry out further enzyme-ligand activity relationship studies, structure determination or characterisation of appropriate protein-ligand complexes by crystallography, and site-specific mutation studies.
- Date Issued: 2020
Generation of a virtual library of terpenes using graph theory, and its application in exploration of the mechanisms of terpene biosynthesis
- Authors: Dendera, Washington
- Date: 2020
- Subjects: Terpenes , Plants -- Metabolism , Computational biology , Bioinformatics , Organic compounds -- Synthesis , Monoterpenes , Molecular biology -- Computer simulation
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/123453 , vital:35439
- Description: Terpenes form a large group of organic compounds that are of use to many living organisms: plants use them for metabolism (Pichersky and Gershenzon, 1934; McGarvey and Croteau, 1995; Gershenzon and Dudareva, 2007), for defence, or as a means to attract pollinators, and humans use them in the medical, pharmaceutical and food industries (Bicas, Dionísio and Pastore, 2009; Marmulla and Harder, 2014; Kandi et al., 2015). Following literature methods for generating chemical libraries using graph-theoretic techniques, complete libraries of all possible terpene isomers have been constructed, with the goal of constructing derivative libraries of possible carbocation intermediates, which are important in elucidating the mechanisms of terpene biosynthesis. Virtual library generation of monoterpenes was first achieved by generating graphs of order 7, 8, 9 and 10 using the Nauty and Traces suite. These were screened and processed with a set of collated Python scripts written to recognize the graphs in text format and translate them to molecules, minimizing them through Tinker whilst discarding graphs that violate the laws of chemistry. Because of the computational time required, only order 7 and order 10 graphs were processed. Of the 873 graphs generated for order 7, 353 were converted to molecules, and of the 11,7 million produced for order 10, half were processed, resulting in 442928 compounds (repeats included). For screening, 55 366 compounds were docked in the active site of limonene synthase; of these, 2355 ligands had a good Vina docking score, with a binding energy of between -7.0 and -7.4 kcal.mol-1. When these best-docked molecules were overlaid in the active site, a map of possible ligand positions within the active site of limonene synthase was traced out.
- Date Issued: 2020
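The abstract above mentions discarding graphs that cannot correspond to real molecules. The thesis scripts are not reproduced in this record, so the following is only a minimal sketch of one check such a filter might plausibly include, assuming each graph vertex stands for a carbon atom: the skeleton must be connected and no vertex may have more than four neighbours. The function name and the edge-list input format are hypothetical.

```python
from collections import defaultdict

MAX_CARBON_VALENCE = 4  # a carbon atom bonds to at most four neighbours


def is_plausible_carbon_skeleton(n_atoms, edges):
    """Return True if the graph is connected and no vertex exceeds degree 4.

    n_atoms: number of vertices, labelled 0 .. n_atoms - 1 (hypothetical format)
    edges:   iterable of (a, b) vertex pairs
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            return False  # a self-loop has no chemical meaning here
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Valence check: reject any vertex with more than four neighbours.
    if any(len(neighbours[v]) > MAX_CARBON_VALENCE for v in range(n_atoms)):
        return False

    # Connectivity check: depth-first search starting from vertex 0.
    seen = {0}
    stack = [0]
    while stack:
        v = stack.pop()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n_atoms
```

A real pipeline would need further checks (for example, strained ring systems that fail energy minimisation), which this sketch does not attempt.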
Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans
- Authors: Brown, David K
- Date: 2018
- Subjects: Bioinformatics , Human genetics -- Variation , High performance computing , Workflow management systems , Molecular dynamics , Next generation sequencing , Human Mutation Analysis (HUMA)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/60708 , vital:27820
- Description: This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, making use of these HPC clusters requires familiarity with command line interfaces. This excludes a large number of researchers from taking advantage of these resources. JMS was developed as a tool to make it easier for researchers without a computer science background to make use of HPC. Additionally, JMS can be used to host computational tools and pipelines and generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that want to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities that it provides. HUMA includes protein, gene, disease, and variation data and can be searched from the angle of any one of these categories.
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other meta-data). However, the related nature of the database means that genes, diseases, variation, and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original sources of the data, allowing users to follow the links to find additional details. HUMA aims to be a platform for the analysis of genetic variation. As such, it also provides tools to visualize and analyse the data (several of which run on the underlying cluster, via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, part 3 of this thesis focused on the development of a suite of tools, MD-TASK, to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased via the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.
- Date Issued: 2018
The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies
- Authors: Moses, Vuyani
- Date: 2018
- Subjects: Copper proteins , Cellulose , Molecular dynamics , Cellulose -- Biodegradation , Bioinformatics
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/58327 , vital:27230
- Description: AA9 proteins are metallo-enzymes which are crucial for the early stages of cellulose degradation. AA9 proteins have been suggested to cleave the glycosidic bonds linking cellulose through the use of their Cu2+ coordinating active site. AA9 proteins possess different regioselectivities depending on the cleavage they produce and, as a result, are grouped accordingly. Type 1 AA9 proteins cleave at the C1 carbon of cellulose, Type 2 AA9 proteins cleave at the C4 carbon, and Type 3 AA9 proteins cleave at either the C1 or C4 carbon. The steric congestion of the AA9 active site has been proposed to be a contributor to the observed regioselectivity. As such, a bioinformatics characterisation of type-specific sequence and structural features was performed. Initially, AA9 protein sequences were obtained from the Pfam database and multiple sequence alignment was performed. The sequences were phylogenetically characterised, grouped into their respective types, and sub-groups were identified. A selection analysis was performed on AA9 LPMO types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess possible effects on substrate interaction. Physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to dynamically assess the consequences of the discovered type-specific features on AA9-cellulose interaction. Due to the absence of AA9-specific force field parameters, MD simulations were not readily applicable. As a result, Potential Energy Surface (PES) scans were performed to evaluate the force field parameters for the AA9 active site using the PM6 semi-empirical approach and least squares fitting.
A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+ coordinating residues, the Cu2+ ion and two water residues. Due to the similarity in AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations for each AA9 LPMO type were conducted using two separate Lennard-Jones parameter sets. Once completed, the MD trajectories were analysed for various features including the RMSD, RMSF, radius of gyration, coordination during simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins were able to reveal the presence of unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by the presence of unique type-specific loops which were present in Type 2 and 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to result in steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 and not Type 2 and 3 AA9 proteins. In this study force field parameters have been evaluated for the Type 1 active site of AA9 proteins, and these parameters were evaluated on all three types and on their binding to cellulose. Future work will focus on identifying the nature of the reactive oxygen species and performing QM/MM calculations to elucidate the reactive mechanism of all three AA9 LPMO types.
- Date Issued: 2018
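The abstract above describes deriving force field parameters by least squares fitting to Potential Energy Surface (PES) scans. As a rough illustration of that idea only, and not the procedure used in the thesis, the sketch below fits the CHARMM-style harmonic bond term E(b) = K_b (b - b0)^2 plus a constant offset to a set of scan points. The function names are hypothetical, and the example assumes scan energies are supplied relative to any convenient reference.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_harmonic_bond(lengths, energies):
    """Least-squares fit of E(b) = k * (b - b0)**2 + c to PES scan points.

    Returns (k, b0, c). Lengths are centred on their mean before fitting
    to keep the normal equations well conditioned.
    """
    if len(lengths) < 3:
        raise ValueError("need at least three scan points")
    mean = sum(lengths) / len(lengths)
    xs = [b - mean for b in lengths]
    # Fit the equivalent quadratic E = a0 + a1*x + a2*x**2 via normal equations.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(e * x ** p for x, e in zip(xs, energies)) for p in range(3)]
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    a0, a1, a2 = solve_linear(normal, t)
    k = a2
    x0 = -a1 / (2.0 * a2)      # minimum of the parabola in centred coordinates
    c = a0 - k * x0 ** 2
    return k, mean + x0, c
```

Angle terms can be fitted the same way, whereas dihedral terms use a cosine series and need a different basis.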
Comparative study of clan CA cysteine proteases: an insight into the protozoan parasites
- Authors: Moyo, Sipho Dugunye
- Date: 2015
- Subjects: Cysteine proteinases , Proteolytic enzymes , Protozoan diseases , Parasites , Protozoan diseases -- Chemotherapy , Bioinformatics , Plasmodium , Antiprotozoal agents
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4165 , http://hdl.handle.net/10962/d1020309
- Description: Protozoan infections such as malaria, leishmaniasis, toxoplasmosis, Chagas disease and African trypanosomiasis, caused by the genera Plasmodium, Leishmania, Toxoplasma and Trypanosoma, inflict a huge economic, health and social impact in endemic regions, particularly tropical and sub-tropical regions. Combined, these infections are estimated at over a billion cases and approximately 1.1 million deaths annually. The global burden of the protozoan infections is worsened by increased drug resistance, toxicity and the relatively high cost of treatment and prophylaxis. Therefore there has been a high demand for new drugs and drug targets that play a role in parasite virulence. Cysteine proteases have been validated as viable drug targets due to their role in the infectivity stage of the parasites within the human host. There is a variety of cysteine proteases, hence they are subdivided into families, and in this study we focus on the clan CA, papain family C1 proteases. The current inhibitors for the protozoan cysteine proteases lack selectivity and specificity, which contributes to drug toxicity. Therefore there is a need to identify the differences and similarities between the host, vector and protozoan proteases. This study uses a variety of bioinformatics tools to assess these differences and similarities. The Plasmodium cysteine protease FP-2 is the best-characterized protease; hence it was used as a reference for all the other proteases. Its homologs were retrieved and aligned, and their evolutionary relationships were established. The homologs were also analysed for common motifs, and their physicochemical properties were determined and validated using the Kruskal-Wallis test. These analyses revealed that the host and vector cathepsins share similar properties while the parasite cathepsins differ.
At sub-site level sub-site 2 showed greater variations suggesting diverse ligand specificity within the proteases, a revelation that is vital in the design of antiprotozoan inhibitors.
- Date Issued: 2015
A central enrichment-based comparison of two alternative methods of generating transcription factor binding motifs from protein binding microarray data
- Authors: Mahaye, Ntombikayise
- Date: 2013 , 2013-03-13
- Subjects: Transcription factors , Bioinformatics , Protein binding , Protein microarrays , Cell lines
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3890 , http://hdl.handle.net/10962/d1003049
- Description: Characterising transcription factor binding sites (TFBS) is an important problem in bioinformatics, since predicting binding sites has many applications such as predicting gene regulation. ChIP-seq is a powerful in vivo method for generating genome-wide putative binding regions for transcription factors (TFs). CentriMo is an algorithm that measures central enrichment of a motif and has previously been used as a motif enrichment analysis (MEA) tool. CentriMo uses the fact that ChIP-seq peak calling methods are likely to be biased towards the centre of the putative binding region, at least in cases where there is direct binding. CentriMo calculates a binomial p-value representing central enrichment, based on the central bias of the binding site with the highest likelihood ratio. In cases where binding is indirect or involves cofactors, a more complex distribution of preferred binding sites may occur but, in many cases, a low CentriMo p-value and low width of maximum enrichment (about 100bp) are strong evidence that the motif in question is the true binding motif. Several other MEA tools have been developed, but they do not consider motif central enrichment. The study investigates the claim made by Zhao and Stormo (2011) that they have identified a simpler method than that used to derive the UniPROBE motif database for creating motifs from protein binding microarray (PBM) data, which they call BEEML-PBM (Binding Energy Estimation by Maximum Likelihood-PBM). To accomplish this, CentriMo is employed on 13 motifs from both motif databases. The results indicate that there is no conclusive difference in the quality of motifs from the original PBM and BEEML-PBM approaches. CentriMo provides an understanding of the mechanisms by which TFs bind to DNA. Out of 13 TFs for which ChIP-seq data is used, BEEML-PBM reports five better motifs, and twice it shows no central enrichment when the best PBM motif does.
The PBM approach finds seven motifs with better central enrichment. On the other hand, across all variations, the number of examples where PBM is better is not high enough to conclude that it is overall the better approach. Some TFs bind directly to DNA, while others bind indirectly or in combination with other TFs. Some of the predicted mechanisms are supported by literature evidence. This study further revealed that the binding specificity of a TF is different in different cell types and development stages. A TF is up-regulated in a cell line where it performs its biological function. The discovery of cell line differences, which has not been done before in any CentriMo study, is interesting and provides reasons to study this further.
- Date Issued: 2013
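The abstract above refers to a binomial p-value for central enrichment. The sketch below is a simplified illustration of that idea, not a reimplementation of CentriMo: it assumes equal-length sequences, one best site per sequence, a single fixed central window, and a null model in which the best site is equally likely to start at any position. The function names and parameters are hypothetical, and the tail sum is only suitable for modest numbers of sequences.

```python
from math import comb


def binomial_tail(successes, trials, p):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, i) * p ** i * (1.0 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


def central_enrichment_pvalue(best_site_starts, seq_len, motif_width, window):
    """Binomial p-value that best sites cluster in a central window.

    best_site_starts: start position of the best-scoring site in each sequence
    seq_len:          common sequence length
    motif_width:      motif width, which fixes the number of possible starts
    window:           width of the central window, in start positions
    """
    n_positions = seq_len - motif_width + 1   # possible site start positions
    centre = (n_positions - 1) / 2.0
    half = window / 2.0

    def in_window(pos):
        return abs(pos - centre) <= half

    # Null probability: fraction of start positions inside the window.
    p_null = sum(1 for pos in range(n_positions) if in_window(pos)) / n_positions
    hits = sum(1 for pos in best_site_starts if in_window(pos))
    return binomial_tail(hits, len(best_site_starts), p_null)
```

The published tool additionally searches over window widths and adjusts for multiple testing, which this sketch omits.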
A comparative bioinformatic analysis of zinc binuclear cluster proteins
- Authors: Mthombeni, Jabulani S
- Date: 2005
- Subjects: Bioinformatics , Zinc proteins , GABA
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4004 , http://hdl.handle.net/10962/d1004064
- Description: Members of the zinc binuclear cluster family are important fungal transcriptional regulators sharing a common DNA binding domain. Dal81p is a pleiotropic zinc binuclear cluster protein involved in the induction of the UGA genes required for the γ-aminobutyrate nitrogen catabolic pathway in Saccharomyces cerevisiae. The zinc binuclear cluster domain is indispensable for function in Dal81p, and little is known about other domains in this protein. The aim of the study was to explore the zinc binuclear cluster protein family using comparative bioinformatics as a complement to biochemical and structural approaches. A database of all zinc binuclear cluster proteins was composed. A total of 118 zinc binuclear proteins are reported in this work. Thirty-nine previously unidentified zinc binuclear cluster proteins were found. Four homologues of Dal81p were identified by homology searching. Important sequence motifs were identified in the aligned sequences of Dal81p and its homologues. The coiled coil motif found in the Gal4p zinc binuclear cluster protein could not be identified in Dal81p and its homologues. This suggested that Dal81p did not dimerise through this structural motif as other zinc binuclear cluster proteins do. Solvent-accessible sites that could be phosphorylated by protein kinase C or casein kinase II, and the role of such sites in the possible regulation of Dal81p function, were discussed.
- Date Issued: 2005
Identification of cis-elements and transacting factors involved in the abiotic stress responses of plants
- Authors: Maclear, Athlee
- Date: 2005 , 2013-06-10
- Subjects: Plants -- Effect of stress on , Proteins -- Analysis , Bioinformatics , DNA , Plant genetics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4074 , http://hdl.handle.net/10962/d1007236
- Description: Many stress situations limit plant growth, resulting in crop production difficulties. Population growth, limited availability and over-utilization of arable land, and intolerant crop species have resulted in tremendous strain being placed on agriculturalists to produce enough to sustain the world's population. An understanding of the principles involved in plant resistance to environmental stress will enable scientists to harness these mechanisms to create stress-tolerant crop species, thus increasing crop production, and enabling the farming of previously unproductive land. This research project uses computational and bioinformatics techniques to explore the promoter regions of genes encoding proteins that are up- or down-regulated in response to specific abiotic stresses, with the aim of identifying common patterns in the cis-elements governing the regulation of these abiotic stress responsive genes. An initial dataset of fifty known genes encoding proteins reported to be up- or down-regulated in response to plant stresses that result in water-deficit at the cellular level, viz. drought, low temperature, and salinity, was identified, and a PostgreSQL database was created to store relevant information pertaining to these genes and the proteins encoded by them. The genomic DNA was obtained where possible, and the promoter and intron regions identified. The Neural Network Promoter Prediction (NNPP) software package was used to predict the transcription start signal (TSS), and the promoter searching software tool TESS (Transcription Element Search Software) was used to identify known and user-defined cis-elements within the promoter regions of these genes. Currently available promoter prediction software analysis tools are reported to predict one promoter per kilobase of DNA, whilst functional promoters are thought to occur only once in 30-40 kilobases, which indicates that a large percentage of predictions are likely to be false positives (Pedersen et al., 1999).
NNPP was chosen as it was rated as the highest performing promoter prediction software tool by Fickett and Hatzigeorgiou (1997) in a thorough review of eukaryotic promoter prediction algorithms; however, results were less than promising, as very few predicted TSS were identified in the area 50 bps up- and downstream of the gene start site, where biologically functional TSSs are known to occur (Reese, 2000; Fickett and Hatzigeorgiou, 1997). TESS results seemed to support the hypothesis that drought, low-temperature and high salinity plant stress response proteins have similar cis-elements in their promoter regions, and suggested links to various other gene regulation mechanisms, viz. gibberellin-, light-, auxin- and development-regulated gene expression, highlighting the vast complexity of plant stress response processes. Although far from conclusive, results provide a valuable basis for future comparative promoter studies that will attempt to deduce possible common transcriptional initiation of abiotic stress response genes.
- Date Issued: 2005
Stress-inducible protein 1: a bioinformatic analysis of the human, mouse and yeast STI1 gene structure
- Authors: Aken, Bronwen Louise
- Date: 2005
- Subjects: Molecular chaperones , Proteins -- Analysis , Heat shock proteins , Bioinformatics , Genetics -- Data processing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3990 , http://hdl.handle.net/10962/d1004049
- Description: Stress-inducible protein 1 (Sti1) is a 60 kDa eukaryotic protein that is important under stress and non-stress conditions. Human Sti1 is also known as the Hsp70/Hsp90 organising protein (Hop) that coordinates the functional cooperation of heat shock protein 70 (Hsp70) and heat shock protein 90 (Hsp90) during the folding of various transcription factors and kinases, including certain oncogenic proteins and prion proteins. Limited studies have been conducted on the STI1 gene structure. Thus, the aim of this study was to develop a comprehensive description of human STI1 (hSTI1), mouse STI1 (mSTI1), and yeast STI1 (ySTI1) genes, using a bioinformatic approach. Genes encoded near the STI1 loci were identified for the three organisms using National Centre for Biotechnology Information (NCBI) MapViewer and the Saccharomyces Genome Database. Exon/intron boundaries were predicted using Hidden Markov model gene prediction software (HMMGene) and Genscan, and by alignment of the mRNA sequence with the genomic DNA sequence. Transcription factor binding sites (TFBS) were predicted by scanning the region 1000 base pairs (bp) upstream of the STI1 orthologues’ transcription start site (TSS) with Alibaba, Transcription element search software (TESS) and Transcription factor search (TFSearch). The promoter region was defined by comparing the number, type and position of TFBS across the orthologous STI1 genes. Additional putative TFBS were identified for ySTI1 by searching with software that aligns nucleic acid conserved elements (AlignACE) for over-represented motifs in the region upstream of the TSS of genes thought to be co-regulated with ySTI1. This study showed that hSTI1 and mSTI1 occur in a region of synteny with a number of genes of related function. Both hSTI1 and mSTI1 comprised 14 putative exons, while ySTI1 was encoded on a single exon. 
Human and mouse STI1 shared a perfectly conserved 55 bp region spanning their predicted TSS, although their TATA boxes were not conserved. A putative CpG island was identified in the region from -500 to +100 bp relative to the hSTI1 and mSTI1 TSS. This region overlapped with a region of high TFBS density, suggesting that the core promoter region was located in the region approximately 100 to 200 bp upstream of the TSS. Several conserved clusters of TFBS were also identified upstream of this promoter region, including binding sites for stimulatory protein 1 (Sp1), heat shock factor (HSF), nuclear factor kappa B (NF-kappaB), and the CCAAT/enhancer binding protein (C/EBP). Microarray data suggested that ySTI1 was co-regulated with several heat shock proteins and substrates of the Hsp70/Hsp90 heterocomplex, and several putative regulatory elements were identified in the upstream region of these co-regulated genes, including a motif for HSF binding. The results of this research suggest several avenues of future experimental work, including the confirmation of the proposed core promoter, upstream regulatory elements, and CpG island, and the investigation into the co-regulation of mammalian STI1 with its surrounding genes. These results could also be used to inform STI1 gene knockout experiments in mice, to assess the biological importance of mammalian STI1.
- Date Issued: 2005
The role of parallel computing in bioinformatics
- Authors: Akhurst, Timothy John
- Date: 2005
- Subjects: Bioinformatics , Parallel programming (Computer science) , LINDA (Computer system) , Java (Computer program language) , Parallel processing (Electronic computers) , Genomics -- Data processing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3986 , http://hdl.handle.net/10962/d1004045
- Description: The need to intelligibly capture, manage and analyse the ever-increasing amount of publicly available genomic data is one of the challenges facing bioinformaticians today. Such analyses are in fact impractical using uniprocessor machines, which has led to an increasing reliance on clusters of commodity-priced computers. An existing network of cheap, commodity PCs was utilised as a single computational resource for parallel computing. The performance of the cluster was investigated using a whole genome-scanning program written in the Java programming language. The TSpaces framework, based on the Linda parallel programming model, was used to parallelise the application. Maximum speedup was achieved at between 30 and 50 processors, depending on the size of the genome being scanned. Together with this, the associated significant reductions in wall-clock time suggest that both parallel computing and Java have a significant role to play in the field of bioinformatics.
- Date Issued: 2005
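The abstract above reports that maximum speedup was reached at between 30 and 50 processors. As a small illustration of how such a figure is typically derived from wall-clock measurements, the sketch below computes speedup (single-processor time divided by p-processor time) and parallel efficiency (speedup divided by p). The function names are hypothetical, and the timings in the example are invented for illustration only; they are not results from the thesis.

```python
def speedup_table(wall_times):
    """Return (processors, speedup, efficiency) rows from wall-clock times.

    wall_times: dict mapping processor count -> wall-clock seconds;
                must include an entry for 1 processor as the baseline.
    """
    baseline = wall_times[1]
    rows = []
    for procs in sorted(wall_times):
        speedup = baseline / wall_times[procs]
        rows.append((procs, speedup, speedup / procs))
    return rows


def best_processor_count(wall_times):
    """Return the processor count that gave the highest speedup."""
    return max(speedup_table(wall_times), key=lambda row: row[1])[0]


# Hypothetical timings (seconds), for illustration only.
EXAMPLE_TIMES = {1: 1200.0, 10: 150.0, 30: 60.0, 50: 75.0}

if __name__ == "__main__":
    for procs, speedup, efficiency in speedup_table(EXAMPLE_TIMES):
        print(f"{procs:3d} processors: speedup {speedup:5.1f}, "
              f"efficiency {efficiency:4.2f}")
```

In the invented data, wall-clock time rises again at 50 processors, which is the usual sign that communication overhead has begun to outweigh the extra compute.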