Larose, D. T. (2006) Data Mining Methods and Models, Hoboken: John Wiley & Sons, Inc.
2. The EDHS 2011 dataset was used. The data were originally collected by Macro International, United States of America (USA), and
CSA Ethiopia.
3. A total of 11,654 records that met the inclusion criteria were retrieved. Data were extracted from the EDHS 2011 children’s dataset.
Extracted data were cleaned, coded, transformed and entered into
Waikato Environment for Knowledge Analysis (WEKA) 3.6.4 software. The extracted dataset was stratified into “Alive” and “Dead”
groups. The “Alive” group comprised mothers whose child was
alive during the survey. The “Dead” group comprised mothers who
had one or more dead children. Since the sample sizes of the “Alive” and
“Dead” subgroups were not balanced, the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance the
dataset and minimize sampling errors. Pruning techniques were
used to remove insignificant rules. 10-fold cross-validation and a 95% split were used to assess the strength of the
association of determinants with the outcome variable.
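The balancing and validation steps described in item 3 can be illustrated with a minimal Python sketch. It is not the WEKA SMOTE filter used in the study; it only shows the core idea of SMOTE (interpolating between a minority record and one of its nearest minority neighbours) and a simple 10-fold index split. The function names, the two numeric features, and the toy values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (numeric features only)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k roughly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Hypothetical toy data: 4 "Dead" records (minority) versus 10 "Alive" records.
dead = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
alive_count = 10

new_points = smote(dead, alive_count - len(dead), k=2)
balanced_dead = dead + new_points  # minority class now matches the majority size

folds = k_fold_indices(len(balanced_dead) + alive_count, k=10)
```

Each fold in `folds` would serve once as the test set while the remaining nine are used for training; the sketch stops at producing the indices and does not train a classifier.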
4. Most of the data sets used in data mining were not
necessarily gathered with a specific goal in mind. Some of
them may contain errors, outliers or missing values. In order
to use those data sets in the data mining process, the data
need to undergo preprocessing, using data cleaning,
discretization and data transformation [9]. It has been
estimated that data preparation alone accounts for 60% of all
the time and effort expended in the entire data mining process
[10].
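The three preprocessing steps named in item 4 can be sketched in plain Python as follows. This is an illustrative example only, not the procedure applied in WEKA: mean imputation, min-max scaling and equal-width binning are assumed choices, and the `ages` column is hypothetical.

```python
def impute_mean(values):
    """Data cleaning: replace missing entries (None) with the column mean."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins=3):
    """Discretization: map each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute with one missing value.
ages = [18, 25, None, 40, 33]

cleaned = impute_mean(ages)            # missing value replaced by the mean, 29.0
scaled = min_max_scale(cleaned)        # 18 maps to 0.0 and 40 maps to 1.0
binned = equal_width_bins(cleaned, 3)  # bin labels 0, 1 or 2
```

In practice the choice of imputation and binning method depends on the attribute; categorical survey variables, for example, would need a mode-based rather than mean-based fill.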
5. Larose, D. T. (2006) Data Mining Methods and Models, Hoboken: John Wiley & Sons, Inc.
6. [10] Pyle, D. (1999) Data Preparation for Data Mining, San Francisco: Morgan Kaufmann.