Lesson 09 Data Analysis I Descriptive Statistics

The document discusses data preparation which involves collecting data from various sources, converting it to a numeric format, and handling missing values. It describes 3 key steps: 1) Coding data using a coding sheet, 2) Entering coded data into a spreadsheet or statistical software, and 3) Handling missing values through listwise deletion or imputation. Descriptive analysis is used to statistically describe and summarize data through frequency distributions, measures of central tendency (mean, median, mode), and measures of dispersion like standard deviation.

Uploaded by

lasith
Copyright
© © All Rights Reserved

Data Preparation

• In research projects, data are collected from a variety of sources:
• Questionnaire surveys
• Interviews
• Observational data
• These data must be converted into a machine-readable, numeric format.
• Preparing the data by converting them into numeric format is an essential requirement before moving on to analysis.
• Data preparation usually follows three steps:
1. Data coding
2. Data entry
3. Handling missing values
1. Data coding

• Coding is the initial process of converting data into numeric format.
• A coding sheet should be created to guide the coding process.
• It contains a detailed description of each variable in the research study, the items or measures for that variable, the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale, and whether that scale is five-point, seven-point, or some other type), and how to code each value into a numeric format.
• For instance, if you have a measurement item on a seven-point Likert scale ranging from “strongly disagree” to “strongly agree”, you can code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between.
• Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, these codes are only labels: nominal data can be counted and tabulated, but arithmetic on the codes, such as an average, is not meaningful).
Coding

Q.5.2. ‘The most attractive place (AP) in the University is the hostel’. To what extent do you agree? (Select the most appropriate answer.)

Strongly agree  1
Somewhat Agree  2
Agree  3
Neither agree nor disagree  4
Somewhat disagree  5
Disagree  6
Strongly disagree  7
Code (for a respondent who selected “Disagree”):
Q.5.2. AP – 6
You need to prepare a codebook (coding sheet) before coding.
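
A codebook entry can also be kept in machine-readable form. The following is a minimal Python sketch that stores the Q.5.2 scale above as a dictionary and converts a text response into its code. The names `AP_CODES` and `code_response` are illustrative assumptions, not part of any statistical package.

```python
# Codebook entry for Q.5.2 (variable AP), copied from the scale above.
AP_CODES = {
    "Strongly agree": 1,
    "Somewhat agree": 2,
    "Agree": 3,
    "Neither agree nor disagree": 4,
    "Somewhat disagree": 5,
    "Disagree": 6,
    "Strongly disagree": 7,
}


def code_response(answer, codes=AP_CODES):
    """Return the numeric code for a response, or None if it is blank or unknown."""
    lookup = {label.lower(): value for label, value in codes.items()}
    return lookup.get(answer.strip().lower())


print(code_response("Disagree"))  # 6
```

Returning `None` for an unrecognised answer, rather than guessing a code, leaves the decision about missing values to the later missing-value step.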
2. Data entry
• Coded data can be entered into a spreadsheet, a database, or directly into a statistical program such as SPSS or Minitab.
• Most statistical programs provide a data editor for entering data. However, these programs
store data in their own format (e.g., SPSS stores data as .sav files), which makes it difficult
to share that data with other statistical programs.
• Hence, it is often better to enter data into a spreadsheet such as Microsoft Excel, where they
can be reorganized as needed, shared across programs, and subsets of data can be extracted
for analysis.
• Each observation can be entered as one row in the spreadsheet and each measurement item
can be represented as one column. The entered data should be frequently checked for
accuracy, via occasional spot checks on a set of items or observations, during and after entry.
• Furthermore, while entering data, the coder should watch out for obvious evidence of bad
data, such as the respondent selecting the “strongly agree” response to all items irrespective
of content, including reverse-coded items. If so, such data can be entered but should be
excluded from subsequent analysis.
• Excel and Minitab data sheets.
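
The row-per-observation layout and the check for respondents who give the same answer to every item can be sketched in Python as follows. The respondent IDs, item names (`q1`, `q2`, `q3_rev`), and the `exclude` flag are hypothetical and used only for illustration.

```python
import csv
import io

# Hypothetical coded responses: one row per observation, one column per item.
rows = [
    {"id": "R01", "q1": 5, "q2": 4, "q3_rev": 2},
    {"id": "R02", "q1": 7, "q2": 7, "q3_rev": 7},  # same answer everywhere
    {"id": "R03", "q1": 3, "q2": 4, "q3_rev": 5},
]
ITEMS = ["q1", "q2", "q3_rev"]


def is_straight_lined(row, items=ITEMS):
    """True when every item, including reverse-coded ones, has the same response."""
    return len({row[item] for item in items}) == 1


# Keep suspicious rows in the file but flag them for exclusion from analysis.
for row in rows:
    row["exclude"] = int(is_straight_lined(row))

# Write plain CSV text, which spreadsheets and statistical programs can import.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id"] + ITEMS + ["exclude"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Flagging rather than deleting follows the advice above: the bad row is still entered, but it can be filtered out before analysis.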
Application of SPSS
• SPSS stands for Statistical Package for the Social Sciences.
• SPSS is a software package used for statistical analysis, including data processing (data entry and data editing), data manipulation (data merging), and data display in tables and graphs.
• When the SPSS program starts, it presents three views:
• Data view
• Variable view
• Output viewer
• Data view: each column contains the data for one variable, and each row contains the data for one case.
• Variable view: there are 11 columns, one per characteristic: name, type, width, decimals, label, values, missing, columns, align, measure, and role. Each row contains the characteristics of one variable.
• Output Viewer: output (tables, graphs, etc.) is stored in the Output Viewer automatically after statistical commands are run on a dataset.
3. Missing values

• Respondents may not answer certain questions.
• During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value.
• Missing values can be handled in two ways:
• 1. Listwise deletion: during data analysis, the default in most software programs is simply to drop the entire observation if it contains even a single missing value.
• 2. Imputation: some software programs allow missing values to be replaced with an estimated (assigned) value.
• For instance, if the missing value is one item in a multi-item scale, the assigned value may be the average of the respondent’s responses to the remaining items on that scale.
• If the missing value belongs to a single-item scale, many researchers use the average of the other respondents’ responses to that item as the imputed value.
If the missing value belongs to a multi-item scale

Academic Performance                                         Rank
During this time, my subject knowledge has improved          6
My English language grew during the university period        5
My computer skills grew during the university period         -
During this time, my presentation skills have developed      5
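
As a worked illustration of the multi-item case, the blank item above can be filled with the average of the three items the respondent did answer, (6 + 5 + 5) / 3. The Python sketch below assumes the blank is represented as `None`; the variable names are illustrative.

```python
from statistics import mean

# Responses of one respondent on the four-item scale above; None marks the blank.
responses = [6, 5, None, 5]

# Average of the items that were answered.
answered = [r for r in responses if r is not None]
imputed_value = mean(answered)

# Replace only the blank entry and leave the observed answers untouched.
completed = [imputed_value if r is None else r for r in responses]
print(round(imputed_value, 2))  # 5.33
```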


If the missing value belongs to a single-item scale

Q. No.     RACE  GEN  AGE
Uva/B/01   2     2    38
Uva/B/02   2     1    37
Uva/B/03   1     2    58
Uva/B/04   1     2    -
Uva/B/05   1     2    40
Uva/B/06   1     1    54
Uva/B/07   1     1    48
Uva/B/08   1     2    35
Uva/B/09   1     2    47
Sab/R/01   1     2    -
Sab/R/02   1     2    53
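
Both ways of handling the blank AGE entries can be sketched in Python using the values from the table above. This is a minimal illustration that represents blanks as `None`. A program that uses a numeric missing-value code such as -1 or 999 would need those codes converted first, otherwise they would distort the mean.

```python
from statistics import mean

# AGE column from the table above; None marks the two blank entries.
ages = {
    "Uva/B/01": 38, "Uva/B/02": 37, "Uva/B/03": 58, "Uva/B/04": None,
    "Uva/B/05": 40, "Uva/B/06": 54, "Uva/B/07": 48, "Uva/B/08": 35,
    "Uva/B/09": 47, "Sab/R/01": None, "Sab/R/02": 53,
}

# Option 1: listwise deletion drops every observation with a missing value.
complete_cases = {qid: age for qid, age in ages.items() if age is not None}

# Option 2: mean imputation fills blanks with the average of the other respondents.
mean_age = mean(list(complete_cases.values()))
imputed = {qid: (mean_age if age is None else age) for qid, age in ages.items()}

print(len(complete_cases))  # 9
print(round(mean_age, 2))   # 45.56
```

Listwise deletion reduces the sample from 11 to 9 observations, while imputation keeps all 11 at the cost of treating two ages as if they were the average.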
Data Analysis
• Numeric data collected in a research project can be analysed quantitatively using statistical tools, which are broadly classified as:
1. Descriptive analysis
2. Inferential analysis
• Descriptive analysis refers to statistically describing, aggregating, averaging, and presenting the constructs of interest or the associations between these constructs.
• Inferential analysis refers to the statistical testing of hypotheses (theory testing) and is used to reach statistical conclusions.
Descriptive Analysis
• Descriptive analysis is based on univariate analysis, or analysis of a single variable, which refers to a set of statistical techniques that describe the general properties of one variable.
• Univariate statistics include:
1. Frequency distribution
2. Central tendency
3. Dispersion
Frequency distribution
• Frequency distribution of a variable is a summary of the frequency (or
percentages) of individual values or ranges of values for that variable.

Central tendency
• Central tendency is an estimate of the centre of a distribution of values.
• There are three major estimates of central tendency: mean, median, and mode.
• The mean is the simple average of all values in a given distribution.
• The median is the middle value of a distribution when its values are sorted in order.
• The mode is the most frequently occurring value in a distribution of values.
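
The three estimates can be computed with Python's standard `statistics` module. The values below are hypothetical seven-point responses, not data from this lesson.

```python
from statistics import mean, median, mode

# Hypothetical responses on a seven-point scale (not data from the lesson).
values = [4, 5, 5, 3, 6, 5, 4]

print(mean(values))    # arithmetic average of all values
print(median(values))  # middle value after sorting: 3 4 4 [5] 5 5 6
print(mode(values))    # most frequently occurring value
```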
Frequency Distribution
1. Calculate Frequency and Central Tendency using SPSS
• To obtain a frequency table, follow these steps.
• From the menu bar -> Analyze -> Descriptive Statistics -> Frequencies -> Variables (e.g., EDUCA) -> OK.

Education level
                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Grade 1-5                  4      1.1            1.1                 1.1
       Grade 6-9                 15      4.1            4.1                 5.2
       Grade 10 to O/L          128     35.0           35.0                40.2
       Up to A/L                195     53.3           53.3                93.4
       First degree              22      6.0            6.0                99.5
       Higher degree              2       .5             .5               100.0
       Total                    366    100.0          100.0
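
The Percent and Cumulative Percent columns follow directly from the frequencies. The Python sketch below recomputes them from the counts in the table above. It is an illustration of the arithmetic, not of how SPSS computes the table internally.

```python
# Frequencies from the Education level table above.
frequencies = {
    "Grade 1-5": 4, "Grade 6-9": 15, "Grade 10 to O/L": 128,
    "Up to A/L": 195, "First degree": 22, "Higher degree": 2,
}

total = sum(frequencies.values())
running = 0
table = []
for level, count in frequencies.items():
    running += count
    percent = round(100 * count / total, 1)
    # Cumulative percent is taken from cumulative counts, not from rounded percents.
    cumulative = round(100 * running / total, 1)
    table.append((level, count, percent, cumulative))

for row in table:
    print(row)
```

Computing the cumulative column from running counts explains why "Up to A/L" shows 93.4 rather than 40.2 + 53.3 = 93.5: summing already-rounded percentages accumulates rounding error.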
Central Tendency using SPSS

• To obtain the mean, variance, minimum, and maximum values, follow these steps.
• From the menu bar -> Analyze -> Descriptive Statistics -> Descriptives -> Variables (e.g., EDUCA) -> Options -> Mean, Variance, Minimum, and Maximum -> Continue -> OK.
Descriptive Statistics
                     N    Minimum  Maximum  Mean  Variance
1.c.                 366  2        7        4.61  .541
Valid N (listwise)   366
Dispersion
• Dispersion refers to the way values are spread around the central tendency, i.e., how tightly or widely the values are clustered around the mean.
• The most common measure of dispersion is the standard deviation (SD).
• To obtain the mean, SD, variance, minimum, and maximum values, follow these steps.
• From the menu bar -> Analyze -> Descriptive Statistics -> Descriptives -> Variables (e.g., EDUCA) -> Options -> Mean, SD, Variance, Minimum, and Maximum -> Continue -> OK.
Descriptive Statistics
                     N    Mean  Std. Deviation
1.c.                 366  4.61  .735
Valid N (listwise)   366
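
The SD is the square root of the variance, which is consistent with the two SPSS outputs above: the square root of the reported variance, .541, is about .736, matching the reported SD of .735 up to rounding. The Python sketch below shows the same relationship on hypothetical scores (not data from this lesson), using the sample formulas that divide by n - 1.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

# Hypothetical scores (not data from the lesson).
scores = [4, 5, 5, 3, 6, 5, 4]

var = variance(scores)  # sample variance, divides by n - 1
sd = stdev(scores)      # sample standard deviation

print(min(scores), max(scores), round(mean(scores), 2), round(var, 3), round(sd, 3))

# The SD is the square root of the variance.
assert isclose(sd, sqrt(var))
```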
