Bayesian spatio-temporal zero-inflated mixed models for overdispersion on chronic disease mapping
- Authors: Osuji, Georgeleen O https://orcid.org/0000-0002-8408-3928
- Date: 2021-12
- Subjects: Medical mapping , Bayesian statistical decision theory
- Language: English
- Type: Doctoral theses , text
- Identifier: http://hdl.handle.net/10353/23644 , vital:58230
- Description: Background: Life expectancy in most developing countries has increased remarkably and overall mortality has declined, but under-five mortality has risen owing to the incidence of HIV and tuberculosis. Many factors are known to influence the mortality rate among HIV patients, and understanding their contribution to the risk of under-five mortality is important for designing appropriate health interventions. Such HIV mortality count data usually contain excess zeros, and mixed models consisting of a count part and a zero part are often used to describe them. Poisson models are a popular basis for inference, but Negative-Binomial models are more flexible for analysing count data and handling overdispersion. Method: This research developed two-part hurdle models for analysing areal count data with excess zeros. A spatial Bayesian lognormal-logit hurdle model (BLLHM) with random effects characterising cross-spatial dependencies was introduced. Parameter inference and prediction were carried out using a Markov chain Monte Carlo algorithm. The proposed model was applied to HIV-positive under-five mortality data collected from the Eastern Cape Department of Health. Results: The Bayesian lognormal-logit hurdle model was selected as the best-fitting model. The total number of HIV patients not on ART (HIVnotTB) was positively and statistically significantly associated with HIV-positive under-five mortality (0.000612, p < 0.000). Both the rate of CD4 counts performed on newly diagnosed HIV patients (CD4count) and the rate of HIV-positive new patients screened for TB (HIVTBrate) were negatively and statistically significantly associated with HIV-positive under-five mortality (-0.6294, p = 0.000 and -0.00056, p = 0.0052). However, the HIV-positive tuberculosis preventive therapy (TPT) uptake rate (HIVandTB) was not statistically significantly associated with HIV-positive under-five mortality (-0.00155, p = 0.5392). Conclusion: The model is flexible enough to deal with zero-inflated and over-dispersed count data. The risk of cause-specific under-five mortality should be considered in terms of spatial effects. , Thesis (PhD) -- Faculty of Science and Agriculture, 2021. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2021-12
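The sketch below is a minimal, non-authoritative illustration of the two-part hurdle idea described in the abstract above: a logit component models whether an areal count is zero, and a lognormal component models the positive part. The covariates, data and simple maximum-likelihood fit are invented for illustration; the spatial random effects and MCMC inference used in the thesis are omitted, and positive values are treated as continuous for simplicity.

```python
import numpy as np
from scipy import optimize, stats

def hurdle_negloglik(params, y, X):
    """Negative log-likelihood of a simple lognormal-logit hurdle model.

    Zero part: P(y = 0) follows a logistic regression on X.
    Positive part: log(y) | y > 0 is Normal(X @ beta, sigma^2).
    """
    k = X.shape[1]
    gamma = params[:k]            # logit coefficients (zero part)
    beta = params[k:2 * k]        # lognormal mean coefficients (positive part)
    sigma = np.exp(params[-1])    # lognormal scale, parameterised on the log scale

    p_zero = 1.0 / (1.0 + np.exp(-(X @ gamma)))   # probability of a zero
    is_zero = (y == 0)
    pos = ~is_zero

    ll = np.sum(np.log(p_zero[is_zero] + 1e-12))
    ll += np.sum(np.log(1.0 - p_zero[pos] + 1e-12)
                 + stats.norm.logpdf(np.log(y[pos]), loc=X[pos] @ beta, scale=sigma)
                 - np.log(y[pos]))                 # Jacobian of the log transform
    return -ll

# Simulated example data (purely illustrative).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_gamma, true_beta, true_sigma = np.array([-0.5, 1.0]), np.array([1.0, 0.3]), 0.4
zero = rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_gamma)))
y = np.where(zero, 0.0, np.exp(X @ true_beta + true_sigma * rng.normal(size=n)))

start = np.zeros(2 * X.shape[1] + 1)
fit = optimize.minimize(hurdle_negloglik, start, args=(y, X), method="BFGS")
print(fit.x)   # estimated (gamma, beta, log sigma)
```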
A Bayesian approach to tilted-ring modelling of galaxies
- Authors: Maina, Eric Kamau
- Date: 2020
- Subjects: Bayesian statistical decision theory , Galaxies , Radio astronomy , TiRiFiC (Tilted Ring Fitting Code) , Neutral hydrogen , Spectroscopic data cubes , Galaxy parametrisation
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/145783 , vital:38466
- Description: The orbits of neutral hydrogen (H I) gas found in most disk galaxies are circular and also exhibit long-lived warps at large radii, where the restoring gravitational forces of the inner disk become weak (Spekkens and Giovanelli 2006). These warps make the tilted-ring model an ideal choice for galaxy parametrisation. Analysis software utilising the tilted-ring model can be grouped into two- and three-dimensional approaches. Józsa et al. (2007b) demonstrated that three-dimensional software is better suited to galaxy parametrisation, because beam smearing affects it only by increasing the uncertainty of the parameters, without the notorious systematic effects observed for two-dimensional fitting techniques. TiRiFiC, the Tilted Ring Fitting Code (Józsa et al. 2007b), is a software package that constructs parameterised models of high-resolution data cubes of rotating galaxies. It uses the tilted-ring model, and with that a combination of parameters such as surface brightness, position angle, rotation velocity and inclination, to describe galaxies. TiRiFiC works by directly fitting tilted-ring models to spectroscopic data cubes and hence is not affected by beam smearing or line-of-sight effects, e.g. strong warps. Because of that, the method is unavoidable as an analytic method in future H I surveys. In the current implementation, though, there are several drawbacks: the implemented optimisers search only for local solutions in parameter space, do not quantify correlations between parameters, and cannot find errors of single parameters. In theory, these drawbacks can be overcome by using Bayesian statistics, implemented in MultiNest (Feroz et al. 2008), as it allows sampling of a posterior distribution irrespective of its multimodal nature, yielding parameter samples that correspond to the maximum of the posterior distribution. These parameter samples can also be used to quantify correlations and find errors of single parameters. Since this method employs Bayesian statistics, it also allows the user to leverage any prior information they may have on parameter values. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2020
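As a rough, hedged illustration of the tilted-ring parametrisation discussed above (this is not TiRiFiC), the sketch below evaluates the line-of-sight velocity field predicted by a single tilted ring from its inclination, position angle, systemic velocity and a constant rotation velocity. The function name, the flat rotation curve and the evaluation grid are assumptions for illustration only.

```python
import numpy as np

def ring_los_velocity(x, y, inc_deg, pa_deg, v_sys, v_rot, x0=0.0, y0=0.0):
    """Line-of-sight velocity predicted by a single tilted ring.

    Standard tilted-ring geometry: v_los = v_sys + v_rot * cos(theta) * sin(i),
    where theta is the azimuthal angle in the plane of the ring and i the
    inclination. The position angle pa_deg orients the major axis on the sky.
    """
    inc = np.radians(inc_deg)
    pa = np.radians(pa_deg)

    # Rotate sky coordinates so the ring's major axis lies along x'.
    xp = -(x - x0) * np.sin(pa) + (y - y0) * np.cos(pa)
    yp = -(x - x0) * np.cos(pa) - (y - y0) * np.sin(pa)

    # De-project onto the ring plane and obtain cos(theta).
    r = np.sqrt(xp**2 + (yp / np.cos(inc))**2)
    cos_theta = np.divide(xp, r, out=np.zeros_like(r), where=r > 0)

    return v_sys + v_rot * cos_theta * np.sin(inc)

# Evaluate the model on a small grid of sky positions (arbitrary units).
xx, yy = np.meshgrid(np.linspace(-10, 10, 5), np.linspace(-10, 10, 5))
v_model = ring_los_velocity(xx, yy, inc_deg=60.0, pa_deg=45.0,
                            v_sys=1000.0, v_rot=150.0)
print(np.round(v_model, 1))
```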
Bayesian accelerated life tests for the Weibull distribution under non-informative priors
- Authors: Mostert, Philip
- Date: 2020
- Subjects: Accelerated life testing -- Statistical methods , Accelerated life testing -- Mathematical models , Failure time data analysis , Bayesian statistical decision theory , Monte Carlo method , Weibull distribution
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/172181 , vital:42173
- Description: In a competitive world where products are designed to last for long periods of time, obtaining time-to-failure data is both difficult and costly. Hence, for products with high reliability, accelerated life testing is required to obtain relevant life-data quickly. This is done by placing the products under higher-than-use stress levels, thereby causing them to fail prematurely. Part of the analysis of accelerated life-data requires a life distribution that describes the lifetime of a product at a given stress level, and a life-stress relationship – some function that describes the way in which the life distribution changes across different stress levels. In this thesis it is assumed that the underlying life distribution is the well-known Weibull distribution, with the shape parameter constant over all stress levels and the scale parameter a log-linear function of stress. The primary objective of this thesis is to obtain estimates from Bayesian analysis, and five types of non-informative prior distributions are considered: Jeffreys' prior, reference priors, the maximal data information prior, the uniform prior and probability matching priors. Since the associated posterior distributions under all the derived non-informative priors are of an unknown form, the propriety of the posterior distributions is assessed to ensure admissible results. For comparison purposes, estimates obtained via the method of maximum likelihood are also considered. Finding these estimates requires solving non-linear equations, hence the Newton-Raphson algorithm is used. A simulation study based on accelerated time-to-failure data is conducted to compare maximum likelihood and Bayesian estimates. Because the Bayesian posterior distributions are analytically intractable, two methods of obtaining Bayesian estimates are considered: Markov chain Monte Carlo methods and Lindley's approximation technique. In the simulation study the posterior means and root mean squared error values of the estimates are considered under the symmetric squared error loss function and two asymmetric loss functions, the LINEX and general entropy loss functions. Furthermore, the coverage rates for the Bayesian Markov chain Monte Carlo and maximum likelihood estimates are found and compared by their average interval lengths. A case study using a dataset of accelerated time-to-failure of an insulating fluid is considered. The fit of these data to the Weibull distribution is studied and compared to that of other popular life distributions. A full simulation study is conducted to illustrate convergence of the proper posterior distributions. Both maximum likelihood and Bayesian estimates are found for these data. The deviance information criterion is used to compare Bayesian estimates between the prior distributions. The case study is concluded by finding reliability estimates of the data at use-stress levels. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2020
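A minimal sketch of the accelerated-life-test likelihood described above, assuming Weibull lifetimes with a shape parameter common to all stress levels and a log-linear life-stress relationship for the scale parameter. The simulated data and the generic quasi-Newton optimiser (standing in for the thesis's Newton-Raphson step and its Bayesian machinery) are illustrative assumptions.

```python
import numpy as np
from scipy import optimize

def weibull_alt_negloglik(params, t, s):
    """Negative log-likelihood for Weibull accelerated-life-test data.

    Shape k is constant over stress levels; the scale is log-linear in stress:
    log(scale_i) = b0 + b1 * s_i.
    """
    log_k, b0, b1 = params
    k = np.exp(log_k)                      # keep the shape parameter positive
    scale = np.exp(b0 + b1 * s)
    z = t / scale
    return -np.sum(np.log(k / scale) + (k - 1.0) * np.log(z) - z**k)

# Simulated failure times at three stress levels (illustrative only).
rng = np.random.default_rng(1)
stress = np.repeat([1.0, 1.5, 2.0], 30)
true_k, true_b0, true_b1 = 1.8, 5.0, -1.2
t = rng.weibull(true_k, size=stress.size) * np.exp(true_b0 + true_b1 * stress)

fit = optimize.minimize(weibull_alt_negloglik, x0=np.array([0.0, 4.0, -1.0]),
                        args=(t, stress), method="BFGS")
log_k_hat, b0_hat, b1_hat = fit.x
print(np.exp(log_k_hat), b0_hat, b1_hat)   # estimated shape and life-stress coefficients
```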
Bayesian hierarchical modelling with application in spatial epidemiology
- Authors: Southey, Richard Robert
- Date: 2018
- Subjects: Bayesian statistical decision theory , Spatial analysis (Statistics) , Medical mapping , Pericarditis , Mortality Statistics
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/59489 , vital:27617
- Description: Disease mapping and spatial statistics have become an important part of modern-day statistics and have increased in popularity as the methods and techniques have evolved. The application of disease mapping is not confined to the analysis of diseases, as other applications can be found in the econometric and financial disciplines. This thesis will consider two data sets: the Georgia oral cancer 2004 data set and the South African acute pericarditis 2014 data set. The Georgia data set will be used to assess the sensitivity of the hyperprior on the precision of the uncorrelated heterogeneity and correlated heterogeneity components in a convolution model. The correlated heterogeneity will be modelled by a conditional autoregressive prior distribution and the uncorrelated heterogeneity by a zero-mean Gaussian prior distribution. The sensitivity analysis will be performed using three models with a conjugate, Jeffreys' and a fixed-parameter prior for the hyperprior distribution of the precision of the uncorrelated heterogeneity component. A simulation study will be done to compare four prior distributions: the conjugate, Jeffreys', probability matching and divergence priors. The three models will be fitted in WinBUGS® using a Bayesian approach, and the results will be presented in the form of disease maps, figures and tables. The results show that the hyperprior on the precision of the uncorrelated and correlated heterogeneity components is sensitive to changes and will yield different results depending on how the hyperprior distribution of the precision for the two components is specified. The South African data set will be used to examine whether there is a difference between the proper conditional autoregressive prior and the intrinsic conditional autoregressive prior for the correlated heterogeneity component in a convolution model. Two models will be fitted in WinBUGS® for this comparison, with the hyperpriors on the precision of both the uncorrelated and correlated heterogeneity components modelled using a Jeffreys' prior distribution. The results show no significant difference between the results of the model with a proper conditional autoregressive prior and those of the model with an intrinsic conditional autoregressive prior for the South African data, although there are a few disadvantages of using a proper conditional autoregressive prior for the correlated heterogeneity, which are stated in the conclusion. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2018
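A minimal sketch, with a made-up adjacency structure and counts, of the convolution (BYM-type) model structure described above: Poisson counts whose log relative risk is the sum of an uncorrelated heterogeneity term and a correlated term given an intrinsic CAR prior. The thesis fits such models in WinBUGS®; the fixed precisions and the plain log-posterior evaluation here are simplifying assumptions only.

```python
import numpy as np

def bym_log_posterior(u, v, y, expected, W, tau_u, tau_v):
    """Unnormalised log-posterior for a simple convolution (BYM) model.

    y_i ~ Poisson(expected_i * exp(u_i + v_i))
    v_i ~ Normal(0, 1/tau_v)                       (uncorrelated heterogeneity)
    u   ~ intrinsic CAR with adjacency W:          (correlated heterogeneity)
          log p(u) = -0.5 * tau_u * sum_{i~j} (u_i - u_j)^2   (up to a constant)
    The precisions tau_u and tau_v are held fixed here for simplicity.
    """
    eta = np.log(expected) + u + v
    log_lik = np.sum(y * eta - np.exp(eta))                        # Poisson kernel
    pair_sum = np.sum(W * (u[:, None] - u[None, :])**2) / 2.0      # each pair counted once
    log_icar = -0.5 * tau_u * pair_sum
    log_uncorr = -0.5 * tau_v * np.sum(v**2)
    return log_lik + log_icar + log_uncorr

# Tiny illustrative example: four areas on a line (adjacency matrix W).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([3, 7, 2, 5])
expected = np.array([4.0, 5.0, 3.0, 4.5])
u = np.zeros(4)
v = np.zeros(4)
print(bym_log_posterior(u, v, y, expected, W, tau_u=1.0, tau_v=1.0))
```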
Reliability analysis: assessment of hardware and human reliability
- Authors: Mafu, Masakheke
- Date: 2017
- Subjects: Bayesian statistical decision theory , Reliability (Engineering) , Human machine systems , Probabilities , Markov processes
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/6280 , vital:21077
- Description: Most reliability analyses involve the analysis of binary data. Practitioners in the field of reliability place great emphasis on analysing the time periods over which items or systems function (failure time analyses), which make use of different statistical models. This study introduces, reviews and investigates four statistical models for modelling the failure times of non-repairable items, using a Bayesian methodology. The exponential, Rayleigh, gamma and Weibull distributions are considered, and the performance of two non-informative priors is investigated. An application to two failure time distributions is carried out. To meet these objectives, the failure rate and reliability functions of the failure time distributions are calculated. Two non-informative priors, the Jeffreys prior and the general divergence prior, and the corresponding posteriors are derived for each distribution. Simulation studies for each distribution are carried out, in which coverage rates and credible interval lengths are calculated and the results discussed. The gamma and Weibull distributions are applied to failure time data. The Jeffreys prior is found to have a better coverage rate than the general divergence prior. The general divergence prior shows undercoverage when used with the Rayleigh distribution, while the Jeffreys prior produces conservative coverage rates when used with the exponential distribution. Both priors give, on average, the same average interval lengths, which increase as the value of the parameter increases, and the two priors perform similarly when used with the gamma and Weibull distributions. A thorough discussion and review of human reliability analysis (HRA) techniques is also given: twenty HRA techniques are discussed, providing a background, description, and advantages and disadvantages for each. Case studies in the nuclear, railway and aviation industries are presented to show the importance and applications of HRA. Human error has been shown to be the major contributor to system failure. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2017
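As a small sketch of the kind of coverage-rate simulation described above, under strong simplifying assumptions: exponential lifetimes, the Jeffreys prior π(λ) ∝ 1/λ (which gives a Gamma(n, Σt) posterior for the rate), and equal-tailed credible intervals. The thesis covers more distributions, the general divergence prior and further summaries that are not reproduced here.

```python
import numpy as np
from scipy import stats

def jeffreys_coverage_exponential(lam_true, n, n_rep=2000, level=0.95, seed=0):
    """Coverage rate and average length of equal-tailed credible intervals
    for the exponential rate under the Jeffreys prior pi(lambda) ∝ 1/lambda,
    for which the posterior is Gamma(n, rate = sum of failure times)."""
    rng = np.random.default_rng(seed)
    alpha = 1.0 - level
    hits, lengths = 0, []
    for _ in range(n_rep):
        t = rng.exponential(scale=1.0 / lam_true, size=n)
        post = stats.gamma(a=n, scale=1.0 / t.sum())    # Gamma(n, rate = sum t)
        lo, hi = post.ppf(alpha / 2), post.ppf(1 - alpha / 2)
        hits += (lo <= lam_true <= hi)                  # does the interval cover the truth?
        lengths.append(hi - lo)
    return hits / n_rep, float(np.mean(lengths))

coverage, avg_len = jeffreys_coverage_exponential(lam_true=2.0, n=20)
print(coverage, avg_len)
```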
A review of generalized linear models for count data with emphasis on current geospatial procedures
- Authors: Michell, Justin Walter
- Date: 2016
- Subjects: Spatial analysis (Statistics) , Bayesian statistical decision theory , Geospatial data , Malaria -- Botswana -- Statistics , Malaria -- Botswana -- Research -- Statistical methods
- Language: English
- Type: Thesis , Masters , MCom
- Identifier: vital:5582 , http://hdl.handle.net/10962/d1019989
- Description: Analytical problems caused by over-fitting, confounding and non-independence in the data are a major challenge for variable selection. As more variables are tested against a certain data set, there is a greater risk that some will explain the data merely by chance but will fail to explain new data. The main aim of this study is to employ a systematic and practicable variable selection process for the spatial analysis and mapping of historical malaria risk in Botswana, using data collected from the MARA (Mapping Malaria Risk in Africa) project and environmental and climatic datasets from various sources. Details of how a spatial database is compiled for a statistical analysis to proceed are provided, and the automation of the entire process is also explored. The final Bayesian spatial model derived from the non-spatial variable selection procedure was fitted to the data using Markov chain Monte Carlo simulation. Winter temperature had the greatest effect on malaria prevalence in Botswana. Summer rainfall, maximum temperature of the warmest month, annual range of temperature, altitude and distance to the closest water source were also significantly associated with malaria prevalence in the final spatial model after accounting for spatial correlation. Using this spatial model, malaria prevalence at unobserved locations was predicted, producing a smooth risk map covering Botswana. Automating both the compilation of the spatial database and the variable selection procedure proved challenging and could only be achieved for parts of the process. The non-spatial selection procedure proved practical, identified stable explanatory variables and provided an objective means for selecting one variable over another; ultimately, however, it was not entirely successful because a unique set of spatial variables could not be selected. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2016
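As a small, hedged sketch of the non-spatial GLM stage described in the abstract above: a binomial (logit-link) GLM of prevalence on a few covariates, fitted with Python's statsmodels. The covariate names (winter_temp, summer_rain, altitude) and the data are invented stand-ins; the Bayesian geostatistical model fitted via MCMC in the study is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_sites = 120

# Invented standardised covariates (stand-ins for climatic/environmental layers).
winter_temp = rng.normal(size=n_sites)
summer_rain = rng.normal(size=n_sites)
altitude = rng.normal(size=n_sites)
X = sm.add_constant(np.column_stack([winter_temp, summer_rain, altitude]))

# Simulated prevalence surveys: positives out of children examined at each site.
examined = rng.integers(30, 200, size=n_sites)
logit_p = -1.0 + 0.8 * winter_temp + 0.4 * summer_rain - 0.3 * altitude
p = 1.0 / (1.0 + np.exp(-logit_p))
positive = rng.binomial(examined, p)

# Binomial GLM with a logit link; endog is a (successes, failures) matrix.
endog = np.column_stack([positive, examined - positive])
result = sm.GLM(endog, X, family=sm.families.Binomial()).fit()
print(result.params)   # intercept and covariate effects on the logit scale
```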
Prediction of protein secondary structure using binary classification trees, naive Bayes classifiers and the Logistic Regression Classifier
- Authors: Eldud Omer, Ahmed Abdelkarim
- Date: 2016
- Subjects: Bayesian statistical decision theory , Logistic regression analysis , Biostatistics , Proteins -- Structure
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5581 , http://hdl.handle.net/10962/d1019985
- Description: The secondary structure of proteins is predicted using various binary classifiers. The data are adopted from the RS126 database. The original data consist of protein primary and secondary structure sequences encoded using alphabetic letters; these are re-encoded into unary vectors comprising only ones and zeros. Different binary classifiers, namely naive Bayes, logistic regression and classification trees, using hold-out and 5-fold cross validation, are trained on the encoded data. For each classifier three classification tasks are considered: helix against not helix (H/∼H), sheet against not sheet (S/∼S) and coil against not coil (C/∼C). The performance of these binary classifiers is compared using the overall accuracy in predicting the protein secondary structure for various window sizes. Our results indicate that hold-out validation achieved higher accuracy than 5-fold cross validation. The naive Bayes classifier, using 5-fold cross validation, achieved the lowest accuracy for predicting helix against not helix, while the classification tree classifiers, using 5-fold cross validation, achieved the lowest accuracies for both coil against not coil and sheet against not sheet. The accuracy of the logistic regression classifier depends on the window size, with a positive relationship between accuracy and window size. The logistic regression approach achieved the highest accuracy of the three classifiers on each classification task: 77.74 percent for helix against not helix, 81.22 percent for sheet against not sheet and 73.39 percent for coil against not coil. It is noted that it would be easier to compare classifiers if the classification process could be facilitated entirely in R; alternatively, it would be easier to assess the logistic regression classifiers if SPSS had a function to determine the accuracy of the logistic regression classifier. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2016
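A brief sketch, using scikit-learn, of the classifier comparison described in the abstract above. The binary feature matrix is an invented stand-in for the unary (one-hot) window encoding of the RS126 sequences, and the helix/not-helix labels are simulated; the sketch only illustrates training naive Bayes, logistic regression and a classification tree and reporting both hold-out and 5-fold cross-validation accuracy.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# Invented binary feature matrix standing in for a unary-encoded sequence window
# (rows: residues, columns: window positions x amino-acid indicators).
rng = np.random.default_rng(3)
n, p = 1000, 17 * 20
X = rng.integers(0, 2, size=(n, p))
y = (X[:, 0] + X[:, 1] + rng.random(n) > 1.5).astype(int)   # e.g. helix vs not helix (H/~H)

classifiers = {
    "naive Bayes": BernoulliNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "classification tree": DecisionTreeClassifier(max_depth=10, random_state=0),
}

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for name, clf in classifiers.items():
    holdout = clf.fit(X_tr, y_tr).score(X_te, y_te)   # hold-out accuracy
    cv = cross_val_score(clf, X, y, cv=5).mean()      # 5-fold cross-validation accuracy
    print(f"{name}: hold-out {holdout:.3f}, 5-fold CV {cv:.3f}")
```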
Eliciting and combining expert opinion : an overview and comparison of methods
- Authors: Chinyamakobvu, Mutsa Carole
- Date: 2015
- Subjects: Decision making -- Statistical methods , Expertise , Bayesian statistical decision theory , Statistical decision , Delphi method , Paired comparisons (Statistics)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5579 , http://hdl.handle.net/10962/d1017827
- Description: Decision makers have long relied on experts to inform their decision making. Expert judgment analysis is a way to elicit and combine the opinions of a group of experts to facilitate decision making. The use of expert judgment is most appropriate when there is a lack of data for obtaining reasonable statistical results. The experts are asked for advice by one or more decision makers who face a specific real decision problem. The decision makers are outside the group of experts and are jointly responsible and accountable for the decision, and committed to finding solutions that everyone can live with; the emphasis is on the decision makers learning from the experts. The focus of this thesis is an overview and comparison of the various elicitation and combination methods available. These include the traditional committee method, the Delphi method, the paired comparisons method, the negative exponential model, Cooke's classical model, the histogram technique, the use of the Dirichlet distribution in the case of a set of uncertain proportions that must sum to one, and the employment of overfitting. The supra-Bayesian approach, the determination of weights for the experts, and the combination of expert opinions where each opinion is associated with a confidence level representing the expert's conviction in his own judgment are also considered. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2015
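As a minimal sketch of one combination rule touched on above: a weighted linear opinion pool, in which each expert supplies a probability distribution over the same set of outcomes and the decision maker averages them with non-negative weights. The experts, outcomes and weights below are invented; Cooke's classical model, the Delphi method and the other techniques reviewed in the thesis are not reproduced.

```python
import numpy as np

def linear_opinion_pool(expert_probs, weights):
    """Combine expert probability distributions with a weighted linear pool.

    expert_probs: (n_experts, n_outcomes) array, each row a distribution.
    weights:      (n_experts,) non-negative weights (normalised internally).
    Returns the pooled distribution over outcomes.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ expert_probs

# Three experts assess the probability of three outcomes (illustrative numbers).
experts = [[0.7, 0.2, 0.1],
           [0.5, 0.3, 0.2],
           [0.6, 0.1, 0.3]]
weights = [0.5, 0.3, 0.2]    # e.g. performance-based weights from calibration questions
print(linear_opinion_pool(experts, weights))   # pooled distribution sums to 1
```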
Tolerance intervals for variance component models using a Bayesian simulation procedure
- Authors: Sarpong, Abeam Danso
- Date: 2013
- Subjects: Bayesian statistical decision theory , Multilevel models (Statistics)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:10583 , http://hdl.handle.net/10948/d1021025
- Description: The estimation of variance components serves as an integral part of the evaluation of variation, and is of interest and required in a variety of applications (Hugo, 2012). Estimation of the among-group variance components is often desired for quantifying the variability and effectively understanding these measurements (Van Der Rijst, 2006). The methodology for determining Bayesian tolerance intervals for the one-way random effects model was originally proposed by Wolfinger (1998) using both informative and non-informative prior distributions (Hugo, 2012), and Wolfinger (1998) also provided relationships with frequentist methodologies. From a Bayesian point of view, it is important to investigate and compare the effect on coverage probabilities when negative variance components are either replaced by zero or completely disregarded from the simulation process. This research presents a simulation-based approach for determining Bayesian tolerance intervals in variance component models under these two treatments of negative variance components. The approach handles different kinds of tolerance intervals in a straightforward fashion. It makes use of a computer-generated sample (a Monte Carlo process) from the joint posterior distribution of the mean and variance parameters to construct samples from other relevant posterior distributions. Only non-informative Jeffreys' prior distributions are used, together with three Bayesian simulation methods. Comparative results for the different tolerance intervals obtained when negative variance components are either replaced by zero or completely disregarded from the simulation process are investigated and discussed in this research. (An illustrative code sketch follows this record.)
- Full Text:
- Date Issued: 2013
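The sketch below is a simplified, illustrative take on the simulation-based tolerance-interval idea described above (loosely in the spirit of Wolfinger-type posterior simulation, not the thesis code): posterior draws of the variance components of a balanced one-way random effects model are generated, negative among-group variance draws are either replaced by zero or discarded, and the two treatments are compared. The data, prior choices and the particular interval computed are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

def bayes_tolerance_interval(y, groups, content=0.90, n_draws=20000,
                             negative="zero", seed=0):
    """Simulation-based tolerance limits for a balanced one-way random effects
    model (illustrative simplification only).

    negative='zero'    : negative sigma_a^2 draws are replaced by zero
    negative='discard' : negative sigma_a^2 draws are dropped
    Returns posterior means of the limits of a central `content` interval
    for a future observation.
    """
    rng = np.random.default_rng(seed)
    y, groups = np.asarray(y, dtype=float), np.asarray(groups)
    labels = np.unique(groups)
    I = labels.size
    J = y.size // I                                      # balanced design assumed
    group_means = np.array([y[groups == g].mean() for g in labels])
    grand_mean = y.mean()
    sse = sum(((y[groups == g] - m) ** 2).sum() for g, m in zip(labels, group_means))
    ssa = J * ((group_means - grand_mean) ** 2).sum()

    # Posterior draws of the variance components under non-informative priors.
    sig_e2 = sse / rng.chisquare(I * (J - 1), n_draws)   # error variance
    sig_b2 = ssa / rng.chisquare(I - 1, n_draws)         # sigma_e^2 + J * sigma_a^2
    sig_a2 = (sig_b2 - sig_e2) / J                       # may come out negative

    if negative == "zero":
        sig_a2 = np.maximum(sig_a2, 0.0)                 # replace negatives by zero
    else:
        keep = sig_a2 >= 0.0                             # discard negative draws
        sig_e2, sig_b2, sig_a2 = sig_e2[keep], sig_b2[keep], sig_a2[keep]

    mu = grand_mean + rng.standard_normal(sig_a2.size) * np.sqrt(sig_b2 / (I * J))
    z = stats.norm.ppf((1.0 + content) / 2.0)
    half = z * np.sqrt(sig_a2 + sig_e2)                  # total variance of a new observation
    return float(np.mean(mu - half)), float(np.mean(mu + half))

# Illustrative balanced data: 6 groups of 5 observations each.
rng = np.random.default_rng(4)
groups = np.repeat(np.arange(6), 5)
y = 10 + rng.normal(0, 0.5, 6)[groups] + rng.normal(0, 1.0, 30)
print(bayes_tolerance_interval(y, groups, negative="zero"))
print(bayes_tolerance_interval(y, groups, negative="discard"))
```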