Stats Note

This document discusses various methods for obtaining, entering, cleaning, and analyzing quantitative and qualitative data in statistics. It describes common data collection techniques like interviews, surveys, observations, and documents. It also outlines key steps in the data process including defining variables, entering case data, coding responses, and conducting univariate, bivariate, and multivariate analyses to explore relationships between variables.

Uploaded by

Danial Khawaja
Copyright
© All Rights Reserved

Obtaining Data:

Common methods of obtaining data in statistics include:

Interviews: Interviews involve direct communication between the researcher and the study participants
to gather information. They can be conducted in person, over the phone, or online.

Questionnaires and surveys: Questionnaires and surveys are used to collect data from a large group of
people. They consist of a set of questions that are designed to elicit specific information from
participants.

Observations: Observations involve watching and recording behaviors and events in a systematic and
structured manner. Observations can be done in person or remotely, and can be structured or
unstructured.

Documents and records: Documents and records can provide valuable data for research, such as
government records, legal documents, medical records, and financial records.

Focus groups: Focus groups are a type of group interview that involves a small group of people
discussing a specific topic. The group is led by a moderator who asks questions and facilitates the
discussion.

Oral histories: Oral histories involve gathering personal narratives from individuals about their
experiences and perspectives on a particular topic or event. These can be recorded and transcribed for
analysis.

Data Entry:

Defining variables is a crucial step in the data entry process. It involves identifying the characteristics or
attributes of the data that will be collected and analyzed. Variables can be classified as either
independent or dependent variables, and they can be quantitative (numeric) or qualitative (categorical)
in nature.

To define variables, you need to first identify the research question and the type of data that will be
collected. Then, you can determine which variables are relevant and how they will be measured. For
example, if you are conducting a survey on customer satisfaction, the variables might include age,
gender, income level, satisfaction level, and product/service usage.

Entering case data

Once the variables are defined, the next step is to enter the data into a database or spreadsheet. This
involves creating a data entry form and entering the data for each case (i.e., each individual or item in
the study).

Conducting runs

After the data is entered, you can conduct runs or analyses to examine the relationships between the
variables. This might involve running descriptive statistics, regression analyses, or other types of
statistical tests.

Coding and recoding

If numeric values are not pre-assigned, you will need to decide on a coding system for the variables. This
might involve assigning numeric codes to categorical variables or creating categories for numeric
variables. If there are open-ended responses, you will need to decide how to code and analyze them.
This might involve creating categories or themes based on the responses.
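
The two directions of coding described above can be sketched in Python. This is a minimal illustration, not a prescribed scheme: the satisfaction labels, their numeric codes, and the age cut points are all assumptions made for the example.

```python
# Hypothetical coding scheme: the labels and codes are illustrative only.
SATISFACTION_CODES = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

def code_response(answer):
    """Assign a numeric code to a categorical answer; None marks an uncodable answer."""
    return SATISFACTION_CODES.get(answer.strip().lower())

def recode_age(age):
    """Recode a numeric age into an ordered category (cut points are assumptions)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"

coded = [code_response(a) for a in ["Satisfied", " neutral ", "no idea"]]
age_groups = [recode_age(a) for a in [24, 30, 67]]
print(coded)
print(age_groups)
```

Answers that do not match any code come back as None, which keeps them visible for a later decision instead of silently dropping them.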

Overall, defining variables, entering case data, and conducting runs are important steps in the data entry
process. These steps help to ensure that the data is accurate, organized, and ready for analysis.

Data Cleaning:

Read each set of responses back immediately to confirm accuracy: This is an important step in data
validation, as it helps to catch errors or inconsistencies at the source. Right after recording a respondent's
answers, it is good practice to read them back so that mistakes can be corrected on the spot.

"Possible-code cleaning": This refers to the process of checking for invalid or impossible responses to a
question. The easiest way to do this is by running a frequency distribution, which allows you to identify
any values that are outside the expected range.
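
A frequency distribution of this kind can be sketched with Python's standard library. The responses and the valid range of 1 to 5 below are made up for the example.

```python
from collections import Counter

# Hypothetical data: answers to a question coded 1-5, with two keying errors.
responses = [1, 3, 5, 2, 7, 4, 3, 0, 5, 3]
VALID_CODES = {1, 2, 3, 4, 5}  # assumed valid range for this question

# Frequency distribution: how often each code occurs.
frequencies = Counter(responses)

# Any code outside the valid set is an impossible response to investigate.
invalid = {code: n for code, n in frequencies.items() if code not in VALID_CODES}

for code in sorted(frequencies):
    flag = "  <-- invalid" if code in invalid else ""
    print(f"{code}: {frequencies[code]}{flag}")
```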

Contingency cleaning: This refers to checking for inconsistencies in responses to "if" questions. For
example, if a respondent indicates that they do not own a car, but then lists car-related expenses in their
budget, this would be an inconsistency. By checking for these types of inconsistencies, you can ensure
that the data is accurate and reliable.
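
The car example can be turned into a small check. This is a sketch under assumed field names (`owns_car`, `car_expenses`); a real questionnaire would have its own variable names and rules.

```python
# Hypothetical survey records; the field names are illustrative.
cases = [
    {"id": 1, "owns_car": "yes", "car_expenses": 120},
    {"id": 2, "owns_car": "no", "car_expenses": 0},
    {"id": 3, "owns_car": "no", "car_expenses": 85},  # inconsistent
    {"id": 4, "owns_car": "yes", "car_expenses": 0},  # allowed: owner with no spending
]

def find_contingency_errors(records):
    """Return ids of cases that report car expenses despite not owning a car."""
    return [
        r["id"]
        for r in records
        if r["owns_car"] == "no" and r["car_expenses"] > 0
    ]

flagged = find_contingency_errors(cases)
print(flagged)
```

The check only flags cases; whether to correct, recontact, or drop them is a separate decision.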

"Sort" by response: This refers to sorting the data based on a specific variable or response. For example,
if you have a question about recycling, you can sort the data by the "do you recycle" variable and then
check the responses to the "what do you recycle" variable. This can help you identify any inconsistencies
or errors in the data.

Running cross tabs: This involves cross-tabulating two or more variables. By running cross tabs and
checking that every cell that should be impossible is empty, you can catch inconsistent combinations of responses.
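
A cross tab used for cleaning might look like the sketch below, which reuses the recycling questions. The answers and the list of cells treated as impossible are assumptions made for the example.

```python
from collections import Counter

# Hypothetical answers: (do you recycle?, what do you recycle?)
answers = [
    ("yes", "paper"),
    ("yes", "glass"),
    ("no", "none"),
    ("no", "paper"),  # impossible combination: does not recycle, yet lists paper
    ("yes", "paper"),
]

# Cross tab: number of cases in each (row, column) cell.
crosstab = Counter(answers)

# Cells that should be empty if the data are consistent (an assumption of this sketch).
impossible_cells = [("no", "paper"), ("no", "glass"), ("yes", "none")]
violations = {cell: crosstab[cell] for cell in impossible_cells if crosstab[cell] > 0}
print(violations)
```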

Coding and data entry options

Transfer sheets: These are special forms that are used to transfer data from one document to another.
For example, if data is collected on paper questionnaires, transfer sheets may be used to transfer the
data to a computerized database.

Edge coding: This involves recording code numbers in the margins of questionnaires or other data
collection forms. This helps to identify the responses to specific questions and makes it easier to enter
the data into a computerized database.

Direct data entry: This involves entering data directly into a computerized database, eliminating the need
for transfer sheets or other forms of data transfer. This method of data entry can be faster and more
efficient than other methods.

Data entry by interviewer: In some cases, data may be collected by an interviewer who enters the data
directly into a computerized database. This can be an efficient method of data collection, as it allows for
immediate entry and validation of the data.

Optical scan sheets: These are special forms that are designed to be read by an optical scanner. The
scanner reads the responses and converts them into digital data, which can then be entered into a
computerized database. This method of data collection can be efficient and accurate, but requires
specialized equipment.

Coding:

Coding is the process of assigning numerical values to
responses or information gathered by a research instrument. Coding allows data to be analyzed and
processed by computers and statistical software, and helps to reduce the complexity of the data by
converting it into a simpler format.

A codebook is a document that describes the
variables used in a research study and lists the codes assigned to each attribute of the variables. A
codebook typically includes information about the variable name, variable type (e.g., categorical or
continuous), the code values assigned to each attribute, and any special instructions or notes about the
variable. The codebook helps researchers and analysts to understand the data and how it should be
analyzed.
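
One way to picture a codebook is as a lookup table. The sketch below stores it as a Python dictionary; the variable names, labels, and codes (including 9 for "missing") are hypothetical.

```python
# Hypothetical codebook: variable names, types, and codes are illustrative only.
codebook = {
    "marital": {
        "label": "Marital status",
        "type": "categorical",
        "codes": {1: "single", 2: "married", 3: "divorced", 9: "missing"},
    },
    "age": {
        "label": "Age in years",
        "type": "continuous",
        "codes": {},  # continuous variables are stored as measured
    },
}

def decode(variable, value):
    """Translate a stored code back to its attribute using the codebook."""
    codes = codebook[variable]["codes"]
    return codes.get(value, value)  # values without a code pass through unchanged

decoded = [decode("marital", 2), decode("marital", 9), decode("age", 41)]
print(decoded)
```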

Types of Analysis:

Univariate analysis involves examining the distribution of cases or responses on a single variable. This
type of analysis is often used to describe and summarize data, and to identify patterns or trends in the
data. Examples of univariate analysis include calculating measures of central tendency (e.g., mean,
median, mode) and measures of variability (e.g., range, standard deviation).
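
These univariate summaries can be computed with Python's built-in `statistics` module. The data below are invented for illustration.

```python
import statistics

# Hypothetical data: number of children per family.
children = [0, 1, 2, 2, 3, 2, 1, 4]

summary = {
    "mean": statistics.mean(children),
    "median": statistics.median(children),
    "mode": statistics.mode(children),
    "range": max(children) - min(children),
    "stdev": statistics.stdev(children),  # sample standard deviation (n - 1)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```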

Bivariate analysis involves examining the relationship between two variables. This type of analysis is
often used to explore the association or correlation between two variables, and to test hypotheses about
the relationship between them. Examples of bivariate analysis include calculating correlation coefficients
and conducting t-tests or chi-square tests.
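
As one bivariate example, the Pearson correlation coefficient can be computed directly from its definition. The function name and the data are assumptions made for this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
r = pearson_r(hours, score)
print(round(r, 3))
```

A value close to +1 indicates a strong positive linear relationship; it does not by itself show that one variable causes the other.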

Multivariate analysis involves examining the relationship between more than two variables
simultaneously. This type of analysis is often used to identify the complex relationships between
variables, and to control for the effects of other variables in the analysis. Examples of multivariate
analysis include multiple regression analysis and factor analysis.

Dependence Methods:

Dependence methods are statistical techniques that analyze how one or more dependent variables
relate to, or can be predicted from, one or more independent variables. Techniques commonly
discussed under this heading include correlation analysis, regression analysis,
and contingency table analysis. Correlation analysis is used to quantify the strength and direction of the
linear relationship between two continuous variables, while regression analysis is used to model the
relationship between a dependent variable and one or more independent variables. Contingency table
analysis, on the other hand, is used to examine the relationship between two or more categorical
variables.

Metric and Non-Metric Data:

Metric data (also known as "quantitative data") are variables that have a meaningful numerical value,
such as height, weight, temperature, or income. Metric data can be further classified into two types:
discrete and continuous. Discrete data are values that can only take on specific whole numbers, such as
the number of children in a family or the number of cars owned by a household. Continuous data, on the
other hand, can take on any value within a range, such as height or temperature.

Non-metric data (also known as "categorical data") are variables that do not have a numerical value, but
rather belong to a particular category or group. Examples of non-metric data include gender, marital
status, religion, and occupation. Non-metric data can be further classified into two types: nominal and
ordinal. Nominal data are values that represent categories that are not ordered or ranked, such as hair
color or eye color. Ordinal data, on the other hand, represent categories that are ordered or ranked, such
as education level or income bracket.

One dependent variable

Multiple regression analysis is a statistical technique that is used to explore the relationship between a
single metric dependent variable and multiple independent variables. In multiple regression analysis, the independent
variables can be either metric or non-metric. However, it is important to note that if non-metric variables
are used in the analysis, they must first be transformed into a metric form through a process called
"dummy coding". In the context of multiple regression analysis, metric independent variables are also
called "continuous predictors", while non-metric independent variables are also called "categorical
predictors".
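
Dummy coding can be sketched in a few lines. The helper name, the `is_` column prefix, and the choice of reference category are assumptions of this example.

```python
def dummy_code(values, reference):
    """Turn one categorical variable into 0/1 indicator columns.

    The reference category gets no column; it is represented by all zeros.
    """
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

# Hypothetical predictor: region of residence, with "north" as the reference.
regions = ["north", "south", "west", "south"]
rows = dummy_code(regions, reference="north")

for region, row in zip(regions, rows):
    print(region, row)
```

A variable with k categories yields k - 1 indicator columns. Leaving out the reference category avoids perfect collinearity with the regression intercept.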

Conjoint analysis is a marketing research technique that is used to understand how consumers value
different attributes of a product or service. In conjoint analysis, the independent variables are typically
non-metric and represent different levels or combinations of product attributes. For example, in a
conjoint analysis of a smartphone, the independent variables might include the brand, screen size,
camera quality, battery life, and price. Each independent variable is broken down into several levels or
options, and participants are asked to indicate their preference for different product profiles that are
created by combining the different levels of the independent variables. The analysis then uses statistical
techniques to estimate the relative importance of each independent variable and the optimal
combination of attributes that would maximize consumer preference.
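
The product profiles shown to participants can be generated by combining attribute levels. The sketch below builds a full factorial design for a reduced, hypothetical set of smartphone attributes; real studies often use only a subset of profiles to keep the task manageable.

```python
from itertools import product

# Hypothetical attributes and levels for a smartphone study.
attributes = {
    "brand": ["A", "B"],
    "screen": ["6.1 in", "6.7 in"],
    "price": ["$499", "$699", "$899"],
}

# Full factorial design: every combination of one level per attribute.
names = list(attributes)
profiles = [dict(zip(names, combo)) for combo in product(*attributes.values())]

print(len(profiles))
print(profiles[0])
```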

Multiple discriminant analysis (MDA) is a statistical technique that is used to identify the underlying
linear combination of independent variables that best discriminates between two or more groups on a
dependent variable. In MDA, the independent variables are typically metric and continuous, while the
dependent variable is categorical.

MDA can be used in a variety of fields, including finance, marketing, and healthcare, to identify the
factors that differentiate between groups. For example, in finance, MDA can be used to identify the
financial ratios that distinguish between profitable and unprofitable companies. In marketing, MDA can
be used to identify the product attributes that differentiate between high- and low-revenue products. In
healthcare, MDA can be used to identify the clinical factors that distinguish between healthy and
diseased individuals.

Several dependent variables

Multivariate analysis of variance (MANOVA) is a statistical technique that is used to test whether the
means of two or more groups differ on two or more continuous dependent variables. In MANOVA, the
dependent variables are metric, while the independent variables are non-metric categorical factors that
define the groups. Metric independent variables can be added as covariates, in which case the technique
is called MANCOVA.

MANOVA extends the analysis of variance (ANOVA) technique to situations where there are multiple
dependent variables. ANOVA tests the difference in means between two or more groups on a single
dependent variable, whereas MANOVA can test the difference in means between two or more groups on
two or more dependent variables simultaneously. MANOVA tests the null hypothesis that the mean
vectors of the groups are equal across all dependent variables.
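
In symbols, the null hypothesis can be written as follows. The notation is an assumption of this sketch and does not come from the notes: g groups, population mean vectors mu_1 to mu_g, W the within-group and B the between-group sums-of-squares-and-cross-products matrices. Wilks' lambda is one commonly used test statistic, shown here as an example.

```latex
H_0:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g
\qquad
\Lambda = \frac{\lvert \mathbf{W} \rvert}{\lvert \mathbf{W} + \mathbf{B} \rvert}
```

Values of Lambda close to 0 indicate that most of the variation lies between the groups, which counts as evidence against the null hypothesis.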

If the null hypothesis is rejected, follow-up tests can be performed to determine which dependent
variables are driving the significant difference between groups. MANOVA is commonly used in fields such
as psychology, education, and social sciences to investigate differences between groups on multiple
dependent variables. For example, in a study of the effectiveness of different teaching methods on
academic achievement, MANOVA can be used to test whether there are significant differences in mean
scores on multiple academic subjects between the groups of students who received different teaching
methods.

Canonical correlation analysis (CCA) is a multivariate statistical technique that is used to investigate the
relationship between two sets of variables. CCA forms a linear combination of each set, called a
"canonical variate", chosen so that the correlation between the two variates is as large as possible.
Each set can include both metric and non-metric variables.

CCA is useful for exploring the relationship between two sets of variables when there may be complex
underlying relationships between them. For example, in a study of the relationship between academic
performance and psychological well-being, CCA can be used to investigate the relationship between the
academic performance variables (such as GPA and test scores) and the psychological well-being variables
(such as self-esteem and life satisfaction). CCA can reveal whether there are significant correlations
between the two sets of variables and identify the specific variables that are driving the correlations.

Interdependence methods:

Metric:

Factor analysis is a statistical technique used to reduce the complexity of a dataset by identifying
underlying factors or latent variables that explain the variance in a set of observed variables. In factor
analysis, the observed variables are typically metric and continuous. The goal of factor analysis is to
identify the smallest number of factors that can account for the maximum amount of variance in the
observed variables. These factors can then be interpreted in terms of the underlying constructs that they
represent. Factor analysis is often used in psychology and social sciences to identify underlying
personality traits, attitudes, or other psychological constructs.

Cluster analysis is a statistical technique used to group observations or variables into clusters based on
their similarity. In cluster analysis, the variables are usually metric, although non-metric variables can be
used with a suitable similarity measure. The goal of cluster analysis is to identify groups of
observations that are similar to each other, while being different from observations in other groups. The
similarity between observations is typically measured using a distance metric, and different clustering
algorithms can be used to create the clusters. Cluster analysis is often used in market segmentation,
social sciences, and biology to identify groups of consumers, individuals, or biological specimens with
similar characteristics.
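
As an illustration of one clustering algorithm, here is a minimal k-means sketch that uses Euclidean distance. K-means is chosen only as an example; the notes do not single out an algorithm. The customer data, the starting centroids, and the fixed number of iterations are assumptions.

```python
import math

def k_means(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical customers: (age, monthly spend). Starting centroids are assumptions.
customers = [(22, 30), (25, 35), (24, 28), (58, 90), (61, 95), (60, 85)]
clusters, centers = k_means(customers, centroids=[(20, 20), (70, 100)])

for cluster, center in zip(clusters, centers):
    print(center, cluster)
```

In practice the variables are usually standardized first, so that a variable measured on a large scale does not dominate the distance.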

Metric multidimensional scaling (MDS) is a statistical technique used to visualize the similarity between
objects based on their pairwise distance or dissimilarity scores. MDS maps the objects onto a low-
dimensional space, typically two or three dimensions, while preserving their pairwise distances or
dissimilarity scores as much as possible. In metric MDS, the distance or dissimilarity scores are treated as
metric (interval or ratio) quantities.

MDS can be used for various purposes, such as understanding consumer preferences, visualizing the
similarity between products, or exploring the structure of psychological concepts. For example, in a
study of consumer preferences for different brands of soft drinks, MDS can be used to visualize the
similarity between the brands based on consumers' ratings of their taste, sweetness, and other sensory
attributes. MDS can reveal whether consumers perceive certain brands to be more similar to each other
than others and identify the dimensions that drive the perceived similarity.

Non-metric:

Non-metric multidimensional scaling (MDS) is a statistical technique used to visualize the similarity
between objects based on their pairwise rank-order dissimilarities. Non-metric MDS maps the objects
onto a low-dimensional space, typically two or three dimensions, while preserving their rank-order
dissimilarities as much as possible. Non-metric MDS uses only the rank order of the dissimilarities, so the
input is treated as ordinal rather than metric.

Non-metric MDS is often used when the original dissimilarity scores are based on subjective judgments
or ordinal scales, where the numerical distance between the categories may not be meaningful. For
example, in a study of consumer preferences for different brands of coffee, the dissimilarity scores may
be based on the consumers' ratings of the brands' aroma, flavor, and acidity on an ordinal scale from 1 to
5. Non-metric MDS can be used to visualize the similarity between the brands based on the rank-order
dissimilarities, even if the numerical distance between the ratings is not meaningful.

Correspondence analysis is a statistical technique used to analyze the association between two
categorical variables and visualize their joint distribution. Correspondence analysis maps the categories
of the two variables onto a low-dimensional space, typically two or three dimensions, while preserving
the relative frequencies of the joint categories as much as possible. Correspondence analysis can reveal
whether there are any patterns or relationships between the categories of the two variables and identify
the specific categories that are driving the relationships.
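
The starting point of correspondence analysis can be sketched from a contingency table. The code below computes the row and column masses and the standardized residuals; it stops short of the decomposition step that produces the low-dimensional map. The 2 x 2 table is invented for illustration.

```python
import math

# Hypothetical contingency table: rows = age group, columns = preferred brand.
table = [
    [20, 10],
    [10, 20],
]

n = sum(sum(row) for row in table)
row_mass = [sum(row) / n for row in table]
col_mass = [sum(col) / n for col in zip(*table)]

# Standardized residuals: observed minus expected proportion, scaled.
residuals = [
    [
        (table[i][j] / n - row_mass[i] * col_mass[j]) / math.sqrt(row_mass[i] * col_mass[j])
        for j in range(len(col_mass))
    ]
    for i in range(len(row_mass))
]

# Total inertia equals the chi-square statistic divided by n.
inertia = sum(value ** 2 for row in residuals for value in row)
print(round(inertia, 4))
```

Cells with large positive residuals mark category pairs that occur together more often than independence would predict.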

Correspondence analysis is often used in market research, social sciences, and ecology to analyze the
relationship between two categorical variables, such as brand preference and demographic
characteristics, or plant species and environmental factors. Correspondence analysis can be extended to
multiple categorical variables using multiple correspondence analysis.
