A dynamics based analysis of allosteric modulation in heat shock proteins
- Authors: Penkler, David Lawrence
- Date: 2019
- Subjects: Heat shock proteins , Molecular chaperones , Allosteric regulation , Homeostasis , Protein kinases , Transcription factors , Adenosine triphosphatase , Cancer -- Chemotherapy , Molecular dynamics , High throughput screening (Drug development)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115948 , vital:34273
- Description: The 70 kDa and 90 kDa heat shock proteins (Hsp70 and Hsp90) are molecular chaperones that play central roles in maintaining cellular homeostasis in all organisms with the exception of archaea. In addition to their general chaperone function in protein quality control, Hsp70 and Hsp90 cooperate in the regulation and activity of some 200 known natively folded protein clients, which include protein kinases, transcription factors and receptors, many of which are implicated as key regulators of essential signal transduction pathways. Both chaperones are large multi-domain proteins that rely on ATPase activity and co-chaperone interactions to regulate their conformational cycles for peptide binding and release. The unique positioning of Hsp90 at the crossroads of several fundamental cellular pathways, coupled with its known association with diverse oncogenic peptide clients, has brought the molecular chaperone under increasing interest as a potential anti-cancer target that is implicated in all eight hallmarks of the disease. Current orthosteric drug discovery efforts aimed at the inhibition of the ATPase domain of Hsp90 have been limited by high levels of associated toxicity. In an effort to circumvent this, the focus of research efforts is shifting toward alternative approaches such as interference with co-chaperone binding and the allosteric inhibition/activation of the molecular chaperone. The overriding aim of this thesis was to demonstrate how the computational technique of perturbation response scanning (PRS), coupled with all-atom molecular dynamics (MD) simulations and dynamic residue interaction network (DRN) analysis, can be used as a viable strategy to efficiently scan and accurately identify allosteric control elements capable of modulating the functional dynamics of a protein.
In pursuit of this goal, this thesis also contributes to the current understanding of the nucleotide-dependent allosteric mechanisms at play in the cellular functionality of both Hsp70 and Hsp90. All-atom MD simulations of E. coli DnaK provided evidence of nucleotide-driven modulation of conformational dynamics in both the catalytically active and inactive states. PRS analysis of these trajectories demonstrated sensitivity toward bound nucleotide and peptide substrate, and provided evidence of a putative allosterically active intermediate state between the ATPase active and inactive conformational states. Simultaneous binding of ATP and peptide substrate was found to allosterically prime the chaperone for interstate conversion regardless of the transition direction. Detailed analysis of these allosterically primed states revealed select residue sites capable of selecting a coordinate shift towards the opposite conformational state. To validate these results, the predicted allosteric hot spot sites were compared with known experimental works and found to overlap with functional sites implicated in allosteric signal propagation and ATPase activation in Hsp70. This study presented, for the first time, the application of PRS as a suitable diagnostic tool for the elucidation and quantification of the allosteric potential of select residues to effect functionally relevant global conformational rearrangements. The PRS methodology described in this study was packaged within the Python programming environment in the MD-TASK software suite for command-line ease of use and made freely available. Homology modelling techniques were used to address the lack of experimental structural data for the human cytosolic isoform of Hsp90 and, for the first time, provided accurate full-length structural models of human Hsp90α in fully-closed and partially-open conformations.
Long-range all-atom MD simulations of these structures revealed nucleotide-driven modulation of conformational dynamics in Hsp90. Subsequent DRN and PRS analysis of these MD trajectories allowed for the quantification and elucidation of nucleotide-driven allosteric modulation in the molecular chaperone. A detailed PRS analysis revealed allosteric inter-domain coupling between the extreme termini of the chaperone in response to external force perturbations at either domain. Furthermore, PRS also identified several individual residue sites that are capable of selecting conformational rearrangements towards functionally relevant states, and which may be considered putative allosteric target sites for future drug discovery efforts. Molecular docking techniques were employed to investigate the modulation of conformational dynamics of human Hsp90α in response to ligand binding interactions at two identified allosteric sites at the C-terminal domain. High throughput screening of a small library of natural compounds indigenous to South Africa revealed three hit compounds at these sites: Cephalostatin 17, 20(29)-Lupene-3β isoferulate and 3'-Bromorubrolide F. All-atom MD simulations of these protein-ligand complexes, coupled with DRN analysis and several advanced trajectory-based analysis techniques, provided evidence of selective allosteric modulation of Hsp90α conformational dynamics in response to the identity and location of the bound ligands. Ligands bound at the four-helix bundle presented as putative allosteric inhibitors of Hsp90α, driving conformational dynamics in favour of dimer opening and possibly dimer separation. Meanwhile, ligand interactions at an adjacent sub-pocket located near the interface between the middle and C-terminal domains demonstrated allosteric activation of the chaperone, modulating conformational dynamics in favour of the fully-closed catalytically active conformational state.
Taken together, the data presented in this thesis contribute to the understanding of allosteric modulation of conformational dynamics in Hsp70 and Hsp90, and provide a suitable platform for future biochemical and drug discovery studies. Furthermore, the molecular docking and computational identification of allosteric compounds with suitable binding affinity for allosteric sites at the CTD of human Hsp90α provide, for the first time, “proof-of-principle” for the use of PRS in conjunction with MD simulations and DRN analysis as a suitable method for the rapid identification of allosteric sites in proteins that can be probed by small molecule interaction. The data presented in this section could pave the way for future allosteric drug discovery studies for the treatment of Hsp90-associated pathologies.
- Full Text:
- Date Issued: 2019
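
The abstract above does not spell out the PRS calculation. As it is generally described in the literature, PRS applies a force to one residue at a time, uses a covariance matrix from the simulation to predict the linear response of every coordinate, and scores the residue by how well that response correlates with a known displacement between two conformations. The following is a minimal Python sketch of that idea, not the MD-TASK code: the function names `pearson` and `prs_scan`, the number of random forces, and the use of a plain Pearson correlation over all coordinates are assumptions made for illustration.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def prs_scan(cov, target, n_forces=50, seed=0):
    """Score each residue by its best response-to-target correlation.

    cov    : 3N x 3N covariance matrix of atomic fluctuations (list of lists)
    target : 3N displacement vector between two conformations
    Both inputs and the default n_forces are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_res = len(target) // 3
    scores = []
    for r in range(n_res):
        best = -1.0
        for _ in range(n_forces):
            # Random unit force applied to residue r only.
            f = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in f))
            f = [c / norm for c in f]
            # Linear response: dR = C * F, where F is zero outside residue r.
            response = [sum(row[3 * r + k] * f[k] for k in range(3)) for row in cov]
            best = max(best, pearson(response, target))
        scores.append(best)
    return scores
```

Residues with scores close to 1 are those whose perturbation best reproduces the target displacement; in a real analysis the covariance matrix would come from an aligned MD trajectory rather than a hand-built matrix.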
A case-control approach to assess variability in distribution of distance between transcription factor binding site and transcription start site
- Authors: Moos, Abdul Ragmaan
- Date: 2017
- Subjects: Transcription factors , Proteomics , Chromatin , Chromatin immunoprecipitation
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/5315 , vital:20808
- Description: Using an in silico approach with ENCODE ChIP-seq data for various transcription factors and different cell types, we systematically compared the distance between the transcription factor binding site (TFBS) and the transcription start site (TSS). Our aim was to determine if the same transcription factor binds at a different position relative to the TSS in a normal and an abnormal cell type. We compare the distribution of distances of binding sites from the TSS; for brevity we call this “distance” where there is no possibility of confusion. We used a case-control methodology in which the distance between the TFBS and the TSS in the normal, non-cancerous or untreated cell type is the control, and the distance in the cancerous or treated cell type is the case. The distance in the control serves as the standard against which the case is compared. If the distance in the control is greater than the distance in the case, we infer that the transcription factor in the case binds closer to the TSS than in the control. If the distance in the control is smaller than the distance in the case, we infer that the transcription factor in the case binds further away from the TSS than in the control. Our method is a screening method whereby we compare ChIP-seq data to determine if there is a difference in the distribution of distance between the TFBS and the TSS for normal and abnormal cell types. We used the R package ChIP-Enrich to compare the distribution of distance between each ChIP-seq peak and the nearest TSS. ChIP-Enrich produces a histogram with the number of ChIP-seq peaks at a certain distance from the TSS.
The results indicate that for some transcription factors, such as cMyc in GM12878 and K562, there is a difference in the distribution of distance between the TFBS and the nearest TSS. cMyc has more binding sites within 1 kb of the TSS in GM12878 than in K562. GM12878-CTCF and K562-CTCF show only slight differences in their distribution of distance from the TSS, which means CTCF binds at almost the same distance from the TSS in both GM12878 and K562. A549-gr treated with dexamethasone is interesting because the distribution of distance from the TSS changes with increasing dose of dexamethasone.
- Full Text:
- Date Issued: 2017
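
The case-control comparison described in this abstract can be illustrated with a small Python sketch. It is a stand-in for the idea only and does not reproduce the ChIP-Enrich R package: the function names, the single-chromosome coordinates, and the bin edges (1 kb, 5 kb, 10 kb) are all assumptions chosen for illustration.

```python
from bisect import bisect_left
from collections import Counter


def nearest_tss_distance(peak_pos, sorted_tss):
    """Absolute distance (bp) from one peak midpoint to the closest TSS."""
    i = bisect_left(sorted_tss, peak_pos)
    # Only the TSS just before and just after the peak can be the nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(peak_pos - t) for t in candidates)


def distance_histogram(peaks, tss, edges=(1000, 5000, 10000)):
    """Fraction of peaks in each distance bin; the last bin is open-ended."""
    if not peaks or not tss:
        raise ValueError("need at least one peak and one TSS")
    sorted_tss = sorted(tss)
    counts = Counter()
    for p in peaks:
        d = nearest_tss_distance(p, sorted_tss)
        bin_index = sum(d >= e for e in edges)  # 0 means closer than edges[0]
        counts[bin_index] += 1
    return [counts[b] / len(peaks) for b in range(len(edges) + 1)]


# Toy coordinates on one hypothetical chromosome (not real ENCODE data).
tss = [1000, 50000, 120000]
control_peaks = [1200, 49500, 60000, 121000]
case_peaks = [3000, 45000, 70000, 130000]

control_hist = distance_histogram(control_peaks, tss)
case_hist = distance_histogram(case_peaks, tss)
```

In this toy example half of the control peaks fall within 1 kb of a TSS and none of the case peaks do, which under the abstract's reading would mean the factor binds further from the TSS in the case.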
Transcription factor binding specificity and occupancy : elucidation, modelling and evaluation
- Authors: Kibet, Caleb Kipkurui
- Date: 2017
- Subjects: Transcription factors , Transcription factors -- Data processing , Motif Assessment and Ranking Suite
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: vital:21185 , http://hdl.handle.net/10962/6838
- Description: The major contributions of this thesis are addressing the need for an objective quality evaluation of a transcription factor (TF) binding model, demonstrating the value of the tools developed to this end, and elucidating how in vitro and in vivo information can be utilized to improve TF binding specificity models. Accurate elucidation of TF binding specificity remains an ongoing challenge in gene regulatory research. Several in vitro and in vivo experimental techniques have been developed, followed by a proliferation of algorithms and, ultimately, binding models. This increase led to a choice problem for end users: which tools to use, and which is the most accurate model for a given TF? Therefore, the first section of this thesis investigates the motif assessment problem: how scoring functions, choice and processing of benchmark data, and statistics used in evaluation affect motif ranking. This analysis revealed that TF motif quality assessment requires a systematic comparative analysis, and that the scoring functions used have a TF-specific effect on motif ranking. These results informed the design of a Motif Assessment and Ranking Suite (MARS), supported by PBM and ChIP-seq benchmark data and an extensive collection of PWM motifs. MARS implements consistency, enrichment, and scoring and classification-based motif evaluation algorithms. Transcription factor binding is also influenced and determined by contextual factors: chromatin accessibility, competition or cooperation with other TFs, cell line or condition specificity, binding locality (e.g. proximity to transcription start sites) and the shape of the binding site (DNA-shape). In vitro techniques do not capture such context; therefore, this thesis also combines PBM and DNase-seq data using a comparative k-mer enrichment approach that compares open chromatin with genome-wide prevalence, achieving a modest performance improvement when benchmarked on ChIP-seq data.
Finally, since statistical and probabilistic methods cannot capture all the information that determines binding, a machine learning approach (XGBoost) was implemented to investigate how the features contribute to TF specificity and occupancy. This combinatorial approach improves the predictive ability of TF specificity models, with the most predictive feature being chromatin accessibility, while the DNA-shape and conservation information all significantly improve on the baseline model of k-mer and DNase data. The results and the tools introduced in this thesis are useful for systematic comparative analysis (via MARS) and a combinatorial approach to modelling TF binding specificity, including appropriate feature engineering practices for machine learning modelling.
- Full Text:
- Date Issued: 2017
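
The comparative k-mer enrichment mentioned in this abstract, which contrasts open chromatin with genome-wide prevalence, can be sketched roughly as a ratio of k-mer frequencies. The Python below is an illustrative assumption and not the thesis method: the function names, the pseudocount, and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_frequencies(seqs, k):
    """Relative frequency of every k-mer observed in a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def kmer_enrichment(open_seqs, background_seqs, k, pseudo=1e-6):
    """Ratio of k-mer frequency in open chromatin to background frequency.

    The pseudocount avoids division by zero for k-mers absent from the
    background; such k-mers receive very large ratios, so its value matters.
    """
    fg = kmer_frequencies(open_seqs, k)
    bg = kmer_frequencies(background_seqs, k)
    return {kmer: f / (bg.get(kmer, 0.0) + pseudo) for kmer, f in fg.items()}


# Toy sequences, purely illustrative.
scores = kmer_enrichment(["ACGTACGT"], ["ACGTTTTT"], k=4)
```

Here `ACGT` makes up 2 of 5 k-mers in the open-chromatin sequence and 1 of 5 in the background, giving a ratio close to 2. A real analysis would draw the foreground from DNase-seq accessible regions and the background from the whole genome.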
Analysis of predictive power of binding affinity of PBM-derived sequences
- Authors: Matereke, Lavious Tapiwa
- Date: 2015
- Subjects: Transcription factors , Protein binding , DNA-binding proteins , Chromatin , Protein microarrays
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4161 , http://hdl.handle.net/10962/d1018666
- Description: A transcription factor (TF) is a protein that binds to specific DNA sequences as part of the initiation stage of transcription. Various methods of finding these transcription factor binding sites (TFBS) have been developed. In vivo technologies analyze DNA regions known to have bound a TF in a living cell. The most widely used in vivo methods at the moment are chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) and DNase I hypersensitive sites sequencing. In vitro methods derive TFBS from experiments with TFs and DNA, usually in artificial settings, or computationally. An example is the Protein Binding Microarray (PBM), which uses artificially constructed DNA sequences to determine the short sequences that are most likely to bind to a TF. The major drawback of this approach is that binding of TFs in vivo also depends on other factors such as chromatin accessibility and the presence of cofactors. Therefore, TFBS derived from the PBM technique might not resemble the true DNA binding sequences. In this work, we use PBM data from the UniPROBE motif database, ChIP-seq data and DNase I hypersensitive sites data. Using Spearman’s rank correlation and the area under the receiver operating characteristic curve, we compare the enrichment scores that the PBM approach assigns to its identified sequences with the frequency of these sequences in likely binding regions and in the human genome as a whole. We also use central motif enrichment analysis (CentriMo) to compare the enrichment of UniPROBE motifs with in vivo derived motifs (from the JASPAR CORE database) in their respective TF ChIP-seq peak regions. CentriMo is applied to 14 TF ChIP-seq peak regions from different cell lines. We aim to establish whether there is a relationship between the occurrences of UniPROBE 8-mer patterns in likely binding regions and their enrichment score, and how well the in vitro derived motifs match in vivo binding specificity.
We did not find a particular trend showing failure of the PBM approach to predict in vivo binding specificity. Our results show prediction failure by the PBM technique for Ets1, Hnf4a and Tcf3 in terms of our Spearman’s rank correlation for ChIP-seq data and central motif enrichment analysis. However, the PBM technique matched the in vivo binding specificities of FoxA2, Pou2f2 and Mafk. Failure of the PBM approach was found to be a result of variability in the TF’s binding specificity, the presence of cofactors, narrow binding specificity and the presence of ubiquitous binding patterns.
- Full Text:
- Date Issued: 2015
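
The Spearman comparison described in this abstract, between PBM enrichment scores and how often the same 8-mers occur in likely binding regions, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the toy scores and counts are hypothetical, and ties are handled with average ranks.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values: PBM enrichment scores for four 8-mers and the number
# of times each 8-mer occurs in ChIP-seq peak regions.
pbm_scores = [0.49, 0.45, 0.40, 0.30]
peak_counts = [120, 90, 95, 10]
rho = spearman(pbm_scores, peak_counts)
```

A rho near 1 would indicate that 8-mers ranked highly by the PBM also occur most often in bound regions; a rho near 0 would correspond to the prediction failures the abstract reports for some TFs.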
Comparison of protein binding microarray derived and ChIP-seq derived transcription factor binding DNA motifs
- Authors: Hlatshwayo, Nkosikhona Rejoyce
- Date: 2015
- Subjects: Protein binding , DNA , DNA microarrays , Transcription factors , DNA-protein interactions , Gene regulatory networks
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4146 , http://hdl.handle.net/10962/d1017907
- Description: Transcription factors (TFs) are biologically important proteins that interact with the transcription machinery and bind DNA regulatory sequences to regulate gene expression by modulating the synthesis of messenger RNA. The regulatory sequences comprise short conserved regions of a specific length called motifs. TFs have very diverse roles in different cells and play a significant role in development. TFs have been associated with carcinogenesis in various tissue types, as well as developmental and hormone response disorders. They may be responsible for the regulation of oncogenes and can be oncogenic. Consequently, understanding TF binding and knowing the motifs to which they bind is worthy of attention and research focus. Various projects have made the study of TF binding their main focus; nevertheless, much about TF binding remains confounding. Chromatin immunoprecipitation in conjunction with deep sequencing (ChIP-seq) is a popular method used to investigate DNA-TF interactions in vivo. This procedure is followed by motif discovery and motif enrichment analysis using relevant tools. Protein Binding Microarrays (PBMs) are an in vitro method for investigating DNA-TF interactions. We use motif enrichment analysis tools (CentriMo and AME) and an empirical quality assessment tool (area under the ROC curve) to investigate which method yields motifs that are a true representation of in vivo binding. Motif enrichment analysis: On average, ChIP-seq derived motifs from the JASPAR Core database outperformed PBM derived ones from the UniPROBE mouse database. However, the performance of motifs derived using these two methods is not much different when using CentriMo and AME. The E-values from motif enrichment analysis were not very different from each other or from 0. 
CentriMo showed that in 35 cases JASPAR Core ChIP-seq derived motifs outperformed UniPROBE mouse PBM derived motifs, while in only 11 cases PBM derived motifs outperformed ChIP-seq derived motifs. AME showed that in 18 cases JASPAR Core ChIP-seq derived motifs did better, while in only 3 cases UniPROBE motifs outperformed ChIP-seq derived motifs. We could not distinguish the performance in 25 cases. Empirical quality assessment: Area under the ROC curve computations followed by a two-sided t-test showed that there is no significant difference in the average performance of the motifs from the two databases (at 95% confidence: mean of differences = 0.0088125, p-value = 0.4874, DF = 47).
- Full Text:
- Date Issued: 2015
Establishment of human OCT4 as a putative HSP90 client protein: a case for HSP90 chaperoning pluripotency
- Authors: Sterrenberg, Jason Neville
- Date: 2015
- Subjects: Induced pluripotent stem cells , Heat shock proteins , Stem cells , Transcription factors , Molecular chaperones
- Language: English
- Type: Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/194010 , vital:45415 , 10.21504/10962/194010
- Description: The therapeutic potential of stem cells is already being harnessed in clinical trials. Of even greater therapeutic potential has been the discovery of mechanisms to reprogram differentiated cells into a pluripotent stem cell-like state known as induced pluripotent stem cells (iPSCs). Stem cell nature is governed and maintained by a hierarchy of transcription factors, the apex of which is OCT4. Although much research has elucidated the transcriptional regulation of OCT4, OCT4 regulated gene expression profiles and OCT4 transcriptional activation mechanisms in both stem cell biology and cellular reprogramming to iPSCs, the fundamental biochemistry surrounding the OCT4 transcription factor remains largely unknown. In order to analyze the biochemical relationship between HSP90 and human OCT4, we developed an exogenous active human OCT4 expression model with human OCT4 under transcriptional control of a constitutive promoter. We identified a direct interaction between HSP90 and human OCT4 despite the fact that the proteins predominantly display differential subcellular localizations. We show that HSP90 inhibition resulted in degradation of human OCT4 via the ubiquitin proteasome degradation pathway. As human OCT4 and HSP90 did not interact in the nucleus, we suggest that HSP90 functions in the cytoplasmic stabilization of human OCT4. Our analysis suggests that HSP90 inhibition inhibits the transcriptional activity of human OCT4 dimers without affecting monomeric OCT4 activity. Additionally, our data suggest that the HSP90 and human OCT4 complex is modulated by phosphorylation events that either promote or abrogate the interaction between HSP90 and human OCT4. Our data suggest that human OCT4 displays the characteristics describing HSP90 client proteins; therefore, we identify human OCT4 as a putative HSP90 client protein. The regulation of the transcription factor OCT4 by HSP90 provides fundamental insights into the complex biochemistry of stem cell biology. 
This may also suggest that HSP90 regulates stem cell biology not only by maintaining routine cellular homeostasis but also through the direct regulation of pluripotency factors. , Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 2015
- Full Text:
- Date Issued: 2015
Analysis of transcription factor binding specificity using ChIP-seq data.
- Authors: Kibet, Caleb Kipkurui
- Date: 2014
- Subjects: Transcription factors , Chronic myeloid leukemia , Antioncogenes , Cancer cells -- Growth -- Regulation
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4115 , http://hdl.handle.net/10962/d1013131
- Description: Transcription factors (TFs) are key regulators of gene expression whose failure has been implicated in many diseases, including cancer. They bind at various sites with different specificity depending on the prevailing cellular conditions, disease, development stage or environmental conditions of the cell. TF binding specificity is how well a TF distinguishes functional sites from potential non-functional sites to form a useful regulatory network. Owing to its role in diseases, various techniques have been used to determine TF binding specificity in vitro and in vivo, including chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq). ChIP-seq is an in vivo technique that considers how the chromatin landscape affects TF binding. Motif enrichment analysis (MEA) tools are used to identify motifs that are over-represented in ChIP-seq peak regions. One such tool, CentriMo, finds motifs that are over-represented at the center, since peak calling software is biased towards declaring binding regions centered at the TF binding site. In this study, we investigate the use of CentriMo and other MEA tools to determine the difference in motif enrichment attributed to the presence of chronic myeloid leukemia (CML) and to treatment with interferon (IFN) and dexamethasone (DEX), compared to control, based on Fisher’s exact test and using uniform peaks ChIP-seq data generated by the ENCODE consortium. CentriMo proved to be capable. We observed differential motif enrichment of TFs with tumor promoter activity: YY1, CEBPA, Egr1, Cmyc family, Gata1 and JunD in K562, and Stat1, Irf1 and Runx1 in Gm12878. Enrichment of CTCF in Gm12878 with YY1 as the immunoprecipitated (ChIP-ed) factor, and the presence of significant spacing (SpaMo analysis) of CTCF and YY1 in Gm12878 but not in K562, could show that CTCF, as a repressor, helps in maintaining the required YY1 level in a normal cell line. 
IFN might reduce binding of Cmyc and the Jun family of TFs via the repressive action of CTCF and E2f2. We also show that the concentration of DEX treatment affects motif enrichment, with 50 nM being an optimum concentration for Gr binding by maintaining open chromatin via the AP1 TF. This study has demonstrated the usefulness of CentriMo for TF binding specificity analysis.
- Full Text:
- Date Issued: 2014
A central enrichment-based comparison of two alternative methods of generating transcription factor binding motifs from protein binding microarray data
- Authors: Mahaye, Ntombikayise
- Date: 2013 , 2013-03-13
- Subjects: Transcription factors , Bioinformatics , Protein binding , Protein microarrays , Cell lines
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3890 , http://hdl.handle.net/10962/d1003049
- Description: Characterising transcription factor binding sites (TFBS) is an important problem in bioinformatics, since predicting binding sites has many applications such as predicting gene regulation. ChIP-seq is a powerful in vivo method for generating genome-wide putative binding regions for transcription factors (TFs). CentriMo is an algorithm that measures central enrichment of a motif and has previously been used as a motif enrichment analysis (MEA) tool. CentriMo uses the fact that ChIP-seq peak calling methods are likely to be biased towards the centre of the putative binding region, at least in cases where there is direct binding. CentriMo calculates a binomial p-value representing central enrichment, based on the central bias of the binding site with the highest likelihood ratio. In cases where binding is indirect or involves cofactors, a more complex distribution of preferred binding sites may occur but, in many cases, a low CentriMo p-value and a low width of maximum enrichment (about 100bp) are strong evidence that the motif in question is the true binding motif. Several other MEA tools have been developed, but they do not consider motif central enrichment. The study investigates the claim made by Zhao and Stormo (2011) that they have identified a simpler method than that used to derive the UniPROBE motif database for creating motifs from protein binding microarray (PBM) data, which they call BEEML-PBM (Binding Energy Estimation by Maximum Likelihood-PBM). To accomplish this, CentriMo is employed on 13 motifs from both motif databases. The results indicate that there is no conclusive difference in the quality of motifs from the original PBM and BEEML-PBM approaches. CentriMo provides an understanding of the mechanisms by which TFs bind to DNA. Of the 13 TFs for which ChIP-seq data is used, BEEML-PBM reports five better motifs, and in two cases it shows no central enrichment when the best PBM motif does. 
The PBM approach finds seven motifs with better central enrichment. On the other hand, across all variations, the number of examples where PBM is better is not high enough to conclude that it is overall the better approach. Some TFs bind directly to DNA, while others bind indirectly or in combination with other TFs. Some of the predicted mechanisms are supported by literature evidence. This study further revealed that the binding specificity of a TF is different in different cell types and development stages. A TF is up-regulated in a cell line where it performs its biological function. The discovery of cell line differences, which has not been done before in any CentriMo study, is interesting and provides reasons to study this further.
- Full Text:
- Date Issued: 2013
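The idea of a binomial p-value for central enrichment, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not CentriMo's actual implementation: it assumes one best site position per sequence, a uniform null distribution of site positions, and it ignores motif-width edge effects. The function names and the toy positions are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def central_enrichment(best_site_positions, seq_len, window):
    """Count best sites inside the central window and return (count, p-value).

    Assumption: under the null, a best site is equally likely anywhere in the
    sequence, so it lands in the central window with probability window/seq_len.
    """
    centre = seq_len / 2
    n = len(best_site_positions)
    k = sum(1 for pos in best_site_positions if abs(pos - centre) <= window / 2)
    return k, binomial_tail(k, n, window / seq_len)


# Hypothetical toy data: best-site position in each of eight 500 bp peak regions.
positions = [250, 240, 262, 255, 30, 248, 470, 251]
count, pval = central_enrichment(positions, seq_len=500, window=100)
print(f"{count} of {len(positions)} best sites are central, p = {pval:.3g}")
```

A small p-value here means that more best sites fall in the central window than a uniform placement would predict, which is the intuition behind treating a low p-value and a narrow enrichment window as evidence of direct binding.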