Missing values: a closer look

Thorpe, Kerri

Title: Missing values: a closer look
Creator: Thorpe, Kerri
ThesisAdvisor: Garisch, I.
Date: 2017
Type: Thesis
Type: Masters
Type: MSc
Identifier: http://hdl.handle.net/10962/d1017827
Identifier: vital:20798
Description: Problem: In today’s world, missing values are more present than ever. Due to the ever-changing and fast paced global society in which we live, most business and research data produced around the world contain missing data. This means that locating data which is meticulously precise can be a hard task in itself, but at times may prove essential as the consequences of making use of incomplete data could be disastrous. The reasons for missing data cropping up in almost all forms of work are numerous and shall be discussed in this dissertation. For example, those being interviewed or polled may choose to simply ignore questions which are posed to them, recording equipment may malfunction or be misplaced, or organisers may not be able to locate the respondent in order to rectify the missing data. Whatever the reasons for data being incomplete, it is necessary to avoid having to use inefficient and incomplete data as a result from the above problems. Therefore, various strategies or methods have been developed in order to handle these missing values. It is important, however, that these strategies or methods are utilised effectively as missing data treatment can introduce bias into the analysis. This dissertation shall look at these and other problems in more detail by using a data set which consists of records for 581 children who were interviewed in 1990 as part of the National Longitudinal Survey of Youth (NLSY). Approach: As mentioned above, many strategies or methods have been developed in order to deal with missing values. More specifically, traditional methods such as complete case analysis, available case analysis or single imputation are widely used by researchers and shall be discussed herein. Although these methods are simple and easy to implement, they require assumptions about the data that are not often satisfied in practice. Over the years, more up to date and relevant methods, such as multiple imputation and maximum likelihood have been developed. These methods rely on weaker assumptions and contain superior statistical properties when compared to the traditional techniques. In this dissertation, these traditional methods shall be reviewed and assessed in SAS and shall be compared to the more modern techniques. Results: The ad hoc techniques for handling missing data such as complete case and available case methods produce biased parameter estimates when the data is not missing completely at random (MCAR). Single imputation techniques likewise produce biased estimates as well as result in the underestimation of standard errors. Although the expectation maximisation (EM) algorithm yields unbiased parameter estimates, the lack of convenient standard errors suggests that using this algorithm for hypothesis testing is not a good idea. Multiple imputation, however, yields unbiased parameter estimates and correctly estimates standard errors. Conclusion: Ignoring missing data in any analysis produces biased parameter estimates. Using single imputation to handle missing values is not recommended, as using a single value to replace missing values does not account for the variation that would have been present if the variables were observed. As a result, the variance will be greatly underestimated. The more modern missing data methods such as the EM algorithm and multiple imputation are preferred over the traditional techniques as they require less stringent assumptions and they also mitigate the downsides of the older methods.
Format: 121 leaves, pdf
Publisher: Rhodes University, Faculty of Science, Statistics
Language: English
Rights: Thorpe, Kerri

Hits: 963
Visitors: 1016
Downloads: 103

Collections

RU Department of Statistics

		Thumbnail	File	Description	Size	Format
View Details			SOURCE1	Adobe Acrobat PDF	1 MB	Adobe Acrobat PDF	View Details