Improved tree species discrimination at leaf level with hyperspectral data combining binary classifiers
- Authors: Dastile, Xolani Collen
- Date: 2011
- Subjects: Mathematical statistics , Analysis of variance , Nearest neighbor analysis (Statistics) , Trees--Classification
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5567 , http://hdl.handle.net/10962/d1002807
- Description: The purpose of the present thesis is to show that hyperspectral data can be used to discriminate between different tree species. The data set used in this study contains hyperspectral measurements of the leaves of seven savannah tree species. The data are high-dimensional and show large within-class variability combined with small between-class variability, which makes discrimination between the classes challenging. We employ two classification methods: k-nearest neighbour and feed-forward neural networks. For both methods, direct 7-class prediction results in high misclassification rates, whereas binary classification works better. We construct binary classifiers for all possible binary classification problems and combine them with Error Correcting Output Codes. In particular, we show that using 1-nearest neighbour binary classifiers yields no improvement over a direct 1-nearest neighbour 7-class predictor. In contrast to this negative result, using neural-network binary classifiers improves accuracy by 10% compared with a direct neural-network 7-class predictor, and error rates become acceptable. Accuracy can be improved further by combining only suitable binary classifiers.
- Date Issued: 2011
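The abstract combines binary classifiers for every pair of the seven species using Error Correcting Output Codes (ECOC). The sketch below illustrates that idea under stated assumptions: a one-vs-one code matrix (one column per species pair, entries +1/-1/0) and Hamming decoding that ignores the 0 entries. The thesis's exact code matrix and decoding rule are not given in this record, and the function names and classifier outputs are hypothetical.

```python
from itertools import combinations

def one_vs_one_code_matrix(n_classes):
    """Build an ECOC code matrix with one column per pair of classes.

    Column c corresponds to the binary problem pairs[c] = (i, j): class i is
    coded +1, class j is coded -1 and every other class is coded 0, meaning
    that binary classifier c ignores it.
    """
    pairs = list(combinations(range(n_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(n_classes)]
    for c, (i, j) in enumerate(pairs):
        matrix[i][c] = 1
        matrix[j][c] = -1
    return pairs, matrix

def ecoc_decode(matrix, outputs):
    """Return the class whose codeword is closest to the binary outputs.

    outputs[c] is the +1/-1 prediction of binary classifier c. The distance
    counts disagreements and skips positions where the codeword is 0.
    Ties are resolved in favour of the lowest class index.
    """
    def distance(codeword):
        return sum(1 for m, o in zip(codeword, outputs) if m != 0 and m != o)
    return min(range(len(matrix)), key=lambda r: distance(matrix[r]))


pairs, M = one_vs_one_code_matrix(7)  # 7 species -> 21 binary problems
# Hypothetical classifier outputs: every classifier involving class 3 votes for
# class 3; every other classifier votes for the first class of its pair.
outputs = [(1 if i == 3 else -1) if 3 in (i, j) else 1 for i, j in pairs]
print(len(pairs), ecoc_decode(M, outputs))  # prints 21 3
```

With this particular code matrix and decoding, each class appears in exactly six columns, so minimising the distance is the same as choosing the class that wins the most pairwise contests. Selecting "only suitable binary classifiers", as the abstract suggests, would correspond to dropping columns from the matrix.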
SL-model for paired comparisons
- Authors: Sjölander, Morné Rowan
- Date: 2006
- Subjects: Paired comparisons (Statistics) , Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:10574 , http://hdl.handle.net/10948/605
- Description: The method of paired comparisons dates back to 1860, when Fechner made the first publication on the method, using it for his psychometric investigations [4]. Thurstone formalised the method by giving it a mathematical foundation [9-11], and the method was born in 1927 with his psychometric publications, one of them being “A law of comparative judgment” [12-14]. The law of comparative judgment is a set of equations relating the proportion of times any stimulus k is judged greater on a given attribute than any other stimulus j to the scale values and discriminal dispersions of the two stimuli on the psychological continuum. Relatively little research has been done on discrete models for paired comparisons. This study develops a new discrete model, the SL-model for paired comparisons. Methods for processing paired-comparison data in which objects have an upper limit on their scores had also not yet been developed, and constructing such a model is one of the aims of this report. The SL-model is therefore developed in this context; however, the model generalises easily to scores without an upper limit.
- Date Issued: 2006
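In Thurstone's general form, the law of comparative judgment reads z_kj = (S_k - S_j) / sqrt(s_k^2 + s_j^2 - 2 r_kj s_k s_j), where z_kj is the standard normal deviate of the proportion p_kj, S_k and S_j are scale values, s_k and s_j are discriminal dispersions and r_kj is their correlation. A minimal sketch of the simplest special case, Thurstone's Case V (equal dispersions, zero correlations), is shown below. It illustrates the continuous law described above, not the discrete SL-model, whose equations are not given in this record; the function name and the proportions are hypothetical.

```python
from statistics import NormalDist

def thurstone_case_v(p):
    """Estimate Case V scale values from a matrix of choice proportions.

    p[k][j] is the proportion of judgements in which stimulus k was judged
    greater than stimulus j, with p[j][k] = 1 - p[k][j] and p[k][k] = 0.5.
    Under Case V (equal discriminal dispersions, zero correlations) the law of
    comparative judgment reduces to z_kj = S_k - S_j, measured in units of the
    common standard deviation of the difference, so the least-squares estimate
    of S_k (up to an additive constant) is the mean of row k of z.
    Proportions of exactly 0 or 1 are not allowed: inv_cdf is undefined there.
    """
    inv_cdf = NormalDist().inv_cdf
    n = len(p)
    z = [[inv_cdf(p[k][j]) for j in range(n)] for k in range(n)]
    scale = [sum(row) / n for row in z]
    # Only differences are identified, so anchor the lowest stimulus at 0.
    lowest = min(scale)
    return [s - lowest for s in scale]


# Hypothetical proportions generated from known scale values, to check recovery.
cdf = NormalDist().cdf
true_scale = [0.0, 0.5, 1.2]
p = [[cdf(a - b) for b in true_scale] for a in true_scale]
print([round(s, 3) for s in thurstone_case_v(p)])  # prints [0.0, 0.5, 1.2]
```

Because the proportions are generated exactly from the Case V model, the sketch recovers the original scale values; with observed proportions the row means give only an approximate, least-squares solution.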
An evaluation of paired comparison models
- Authors: Venter, Daniel Jacobus Lodewyk
- Date: 2004
- Subjects: Paired comparisons (Statistics) , Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:11087 , http://hdl.handle.net/10948/364
- Description: Introduction: A typical task in quantitative data analysis is to derive estimates of population parameters from sample statistics. For manifest variables this is usually a straightforward process using suitable measurement instruments and standard statistics such as the mean, median and standard deviation. Latent variables, on the other hand, are typically more elusive, making it difficult to obtain valid and reliable measurements. One of the most widely used methods of estimating the parameter value of a latent variable is to use a summated score derived from a set of individual scores for each of the various attributes of the latent variable. A serious limitation of this and similar methods is that the validity and reliability of the measurements depend on whether the statements included in the questionnaire cover all characteristics of the variable being measured, and on respondents’ ability to correctly indicate their perceived assessment of the characteristics on the scale provided. Methods of paired comparisons do not have this limitation and are especially useful where a set of objects/entities must be ranked on the parameter values of one or more latent variables. Although the underlying assumptions and algorithms of these methods often differ dramatically, they all rely on data derived from a series of comparisons, each involving a pair of specimens selected from the set of objects/entities being investigated. Typical examples of the comparison process are subjects (judges) who have to indicate, for each pair of objects, which of the two they prefer, and sports teams that compete against each other in matches that involve two teams at a time.
The resulting data of each comparison range from a simple dichotomy indicating which of the two objects is preferred/better to an interval or ratio scale score for e[…]d Bradley-Terry models, and were based on statistical theory assuming that the variable(s) being measured are either normally (Thurstone-Mosteller) or exponentially (Bradley-Terry) distributed. For many years researchers had to rely on these PCMs when analysing paired comparison data, without any idea of the implications if the distribution of the data from which their samples were obtained differed from the distribution assumed by the PCM being used. To address this problem, PCMs were subsequently developed for discrete variables and for variables with distributions that are neither normal nor exponential. A question that remained unanswered is how the performance of PCMs, as measured by the accuracy of their parameter estimates, is affected when they are applied to data from a range of discrete and continuous distributions that violate the assumptions on which the applicable paired comparison algorithm is based. This study attempts to answer this question by applying the most popular PCMs to a range of randomly generated data sets spanning typical continuous and discrete data distributions. It is hoped that the results of this study will assist researchers in selecting the most appropriate PCM for obtaining accurate estimates of the parameters of the variables in their data sets.
- Date Issued: 2004
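One of the PCMs named above is the Bradley-Terry model, in which object i is preferred to object j with probability pi_i / (pi_i + pi_j) for positive strengths pi. A minimal sketch of fitting these strengths by maximum likelihood with the standard MM (minorisation-maximisation) iteration is shown below. It is not the evaluation procedure of this study, which compares several PCMs on simulated data; the function name and win counts are hypothetical.

```python
def bradley_terry(wins, iterations=1000, tol=1e-10):
    """Fit Bradley-Terry strengths by the MM iteration.

    wins[i][j] is the number of comparisons in which object i beat object j.
    Model: P(i beats j) = pi_i / (pi_i + pi_j). Strengths are normalised to
    sum to 1. Assumes every object has at least one win and one loss and the
    comparison graph is connected; otherwise the MLE does not exist.
    """
    n = len(wins)
    pi = [1.0 / n] * n
    total_wins = [sum(row) for row in wins]
    for _ in range(iterations):
        new = []
        for i in range(n):
            # n_ij / (pi_i + pi_j), summed over all opponents j of object i.
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new.append(total_wins[i] / denom)
        s = sum(new)
        new = [x / s for x in new]  # the likelihood is scale-invariant
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


wins = [[0, 7, 8],
        [3, 0, 6],
        [2, 4, 0]]  # hypothetical counts: wins[i][j] = times object i beat object j
strengths = bradley_terry(wins)
print([round(s, 3) for s in strengths])
```

At convergence each object's expected number of wins under the fitted model equals its observed number of wins, which is the likelihood equation of the Bradley-Terry model. The MM update is used here because it increases the likelihood at every step and needs no step-size tuning.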