Comparison Of EM Algorithm and Standard Imputation Methods For Missing Data: A Questionnaire Study On Diabetic Patients

Afshari Safavi, A; Kazemzadeh Gharechobogh, H; Rezaei, M

Volume 11, Issue 3 (Vol 11, No 3 2015) irje 2015, 11(3): 43-51

Afshari Safavi A, Kazemzadeh Gharechobogh H, Rezaei M. Comparison Of EM Algorithm and Standard Imputation Methods For Missing Data: A Questionnaire Study On Diabetic Patients. irje 2015; 11 (3) :43-51
URL: http://irje.tums.ac.ir/article-1-5441-en.html

A Afshari Safavi¹, H Kazemzadeh Gharechobogh*², M Rezaei³

1- Assistant Professor, Chronic Diseases Research Committee, Isfahan University of Medical Sciences, Isfahan, Iran
2- MSc in Statistics, Social Security Organization, Tehran, Iran, kazemzadeh_hk@yahoo.com
3- Department of Biostatistics and Epidemiology, Kermanshah University of Medical Sciences, Kermanshah, Iran

Abstract:

Background and Objectives: Missing data are a major challenge in research. Depending on the type of study and the type of variables, different methods have been proposed for handling such data. This study compared five popular imputation approaches for handling missing data in questionnaires.

Methods: In this study, 500 questionnaires on self-medication in diabetic patients were used. Missing values were generated artificially by randomly selecting questions and deleting their responses. Five imputation methods were compared: 1) the question (item) mean, 2) the person mean, 3) the person mode, 4) linear regression, and 5) the EM algorithm. For each method, the mean and standard deviation of the imputed data were compared with those of the complete data. The Spearman correlation coefficient, the percentage of misclassified cases, and the kappa statistic were also calculated.
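As a rough illustration of the deletion step and the first four imputation methods, the following is a minimal NumPy sketch, not the authors' code. It assumes the responses are stored as a respondents-by-questions array with NaN marking a missing answer, that deleted cells are chosen uniformly at random, and that every respondent answered at least one question. All function names are hypothetical, and the regression variant (other questions pre-filled with item means, one ordinary least-squares model per question) is one simple choice; the abstract does not give the study's exact regression specification.

```python
import numpy as np


def make_missing(X, rate, rng):
    """Delete a `rate` fraction of all answers, choosing cells uniformly at random."""
    X = np.asarray(X, dtype=float).copy()
    k = int(round(rate * X.size))
    idx = rng.choice(X.size, size=k, replace=False)
    X.flat[idx] = np.nan
    return X


def item_mean_impute(X):
    """Method 1: fill each gap with the mean of that question (column)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def person_mean_impute(X):
    """Method 2: fill each gap with the mean of that respondent's observed answers."""
    X = np.asarray(X, dtype=float).copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_means[rows]
    return X


def person_mode_impute(X):
    """Method 3: fill each gap with the respondent's most frequent observed answer.
    Ties go to the smallest value, because np.unique returns sorted values."""
    X = np.asarray(X, dtype=float).copy()
    for i in range(X.shape[0]):
        gaps = np.isnan(X[i])
        values, counts = np.unique(X[i, ~gaps], return_counts=True)
        X[i, gaps] = values[np.argmax(counts)]
    return X


def regression_impute(X):
    """Method 4 (assumed variant): for each question with gaps, fit OLS on the rows
    where it was answered, using an intercept plus the other questions
    (pre-filled with item means) as predictors, then predict the gaps."""
    X = np.asarray(X, dtype=float)
    filled = item_mean_impute(X)
    out = filled.copy()
    n = X.shape[0]
    for j in range(X.shape[1]):
        gaps = np.isnan(X[:, j])
        if not gaps.any():
            continue
        design = np.column_stack([np.ones(n), np.delete(filled, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design[~gaps], X[~gaps, j], rcond=None)
        out[gaps, j] = design[gaps] @ beta
    return out


# Usage on hypothetical data (500 respondents x 20 items scored 1-5), not the study's data:
rng = np.random.default_rng(42)
complete = rng.integers(1, 6, size=(500, 20)).astype(float)
incomplete = make_missing(complete, 0.10, rng)
print("complete", round(complete.mean(), 3), round(complete.std(ddof=1), 3))
for impute in (item_mean_impute, person_mean_impute, person_mode_impute, regression_impute):
    imputed = impute(incomplete)
    print(impute.__name__, round(imputed.mean(), 3), round(imputed.std(ddof=1), 3))
```

Comparing each imputed array's mean and standard deviation with those of `complete` mirrors the comparison described in the Methods. Because questionnaire items are ordinal, mean- and regression-based fills are generally not whole numbers; whether the study rounded them is not stated in the abstract.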

Results: At 10% missingness, kappa values above 0.81 indicated almost perfect agreement. The EM algorithm showed the highest agreement with the results from the actual (complete) data, with a kappa of 0.886. When missingness increased to 30%, the EM algorithm and the person mean showed similar agreement, with kappas of 0.697 and 0.687, respectively.
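The agreement figures above compare how respondents are classified from the complete data versus from the imputed data. The abstract does not say how respondents were classified, so the sketch below assumes a hypothetical rule (`classify_by_total`, total score at or above a cut-off) purely for illustration. Cohen's kappa is computed as (p_o - p_e) / (1 - p_e), with p_o the observed agreement and p_e the chance agreement from the marginal label frequencies. The verbal bands follow the widely used Landis and Koch (1977) scale, which the 0.81 "almost perfect" threshold in the abstract appears to follow. All function names are hypothetical.

```python
import numpy as np


def classify_by_total(X, cutoff):
    """Hypothetical rule: label a respondent 1 if the total score is >= cutoff, else 0."""
    return (np.asarray(X, dtype=float).sum(axis=1) >= cutoff).astype(int)


def percent_misclassified(a, b):
    """Percentage of respondents whose label differs between the two classifications."""
    return 100.0 * np.mean(np.asarray(a) != np.asarray(b))


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).
    p_o is the observed agreement; p_e is the agreement expected by chance from
    the marginal label frequencies. Undefined (division by zero) when p_e == 1."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label from the Landis and Koch (1977) benchmark scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Given a complete array and an imputed array, a call such as `cohen_kappa(classify_by_total(complete, c), classify_by_total(imputed, c))` for some cut-off `c` gives the agreement for one method. On this scale the reported 0.886 falls in the "almost perfect" band, while 0.697 and 0.687 fall in the "substantial" band.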

Conclusion: In this study, the EM algorithm was the most accurate method for handling missing data in all models. The person-mean method is a simple way to handle missing data, especially for non-statisticians.
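The abstract does not describe how the EM algorithm was implemented. For illustration only, the following sketch assumes the classic EM algorithm for a multivariate normal model. The E-step replaces each missing answer with its conditional mean given the respondent's observed answers; the M-step re-estimates the mean vector and covariance matrix, adding the conditional covariance of the missing part so that variability is not understated. Treating ordinal questionnaire items as multivariate normal is itself an approximation, and `em_impute` with its `max_iter` and `tol` parameters is a hypothetical name, not a specific library's API.

```python
import numpy as np


def em_impute(X, max_iter=200, tol=1e-6):
    """EM for a multivariate normal model with NaN gaps (assumed model).
    Returns (X_hat, mu, sigma): the data with gaps replaced by conditional means
    from the last E-step, and the estimated mean vector and covariance matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Start from item means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True)
    X_hat = filled.copy()
    for _ in range(max_iter):
        X_hat = filled.copy()
        C = np.zeros((p, p))  # summed conditional covariances of the missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # nothing observed: fall back to the current mean
                X_hat[i] = mu
                C += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # S_mo S_oo^{-1}
            # E-step: conditional mean of the missing answers given the observed ones.
            X_hat[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update the mean and covariance from the completed data.
        new_mu = X_hat.mean(axis=0)
        diff = X_hat - new_mu
        new_sigma = (diff.T @ diff + C) / n
        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return X_hat, mu, sigma
```

Calling `X_hat, mu, sigma = em_impute(incomplete)` on a respondents-by-questions array with NaN gaps returns a completed array that can be summarized or classified in the same way as the simpler methods. Unlike the person mean, each fill borrows strength from how the questions co-vary across all respondents.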

Keywords: EM algorithm, Missing data, Diabetes, Self-treatment, Kappa statistic, Regression

Type of Study: Research | Subject: General

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Iranian Journal of Epidemiology

Related Websites

Site Keywords

Site Statistics