
Lesson 11

INTRODUCTION TO RESEARCH METHODS
Introduction to Data Analysis

Introduction to Research Methods: Dr (Eng.) Musebe 2023


Chapter 11: Data Analytics

 Overview

 Data Warehousing

 Online Analytical Processing

 Data Mining
Overview

 Data analytics:
– the processing of data to infer patterns, correlations, or
models for prediction

 Primarily used to make business decisions


– Per individual customer
• E.g. what product to suggest for purchase

– Across all customers


• E.g. what products to manufacture/stock, in what quantity

 Critical for businesses today


Overview (Cont.)
 Common steps in data analytics
– Gather data from multiple sources into one location
• Data warehouses integrate data into a common schema
• Data often needs to be extracted from source formats,
transformed to the common schema, and loaded into the data
warehouse (a small sketch of this step follows the list below)
– Generate aggregates and reports summarizing data
• Dashboards showing graphical charts/reports
• Online analytical processing (OLAP) systems allow
interactive querying
• Statistical analysis using tools such as R/SAS/SPSS
– Including extensions for parallel processing of big data
– Build predictive models and use the models for
decision making
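
As a rough illustration of the extract-transform-load step above, the following Python/pandas sketch combines two hypothetical sources with different column names into one common schema and loads the result into a warehouse table. All source names, column names, and values are made up; in practice the sources would be read from files or operational databases (e.g. with pd.read_csv or pd.read_sql).

# Extract-transform-load sketch (hypothetical sources and column names).
import sqlite3
import pandas as pd

# "Extract": two sources that describe the same facts with different column names.
source_a = pd.DataFrame({"cust_id": [1, 2], "item": ["shirt", "dress"],
                         "qty": [3, 1], "price": [20.0, 45.0]})
source_b = pd.DataFrame({"customer": [3], "product": ["skirt"],
                         "quantity": [2], "unit_price": [30.0]})

# Transform: rename into one common schema and combine.
source_b = source_b.rename(columns={"customer": "cust_id", "product": "item",
                                    "quantity": "qty", "unit_price": "price"})
combined = pd.concat([source_a, source_b], ignore_index=True)

# Load: write the integrated table into a warehouse table (SQLite stands in here).
with sqlite3.connect("warehouse.db") as conn:
    combined.to_sql("sales", conn, if_exists="append", index=False)
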
Overview (Cont.)
 Predictive models are widely used today
– E.g. use customer profile features (e.g. income, age,
gender, education, employment) and past history of a
customer to predict the likelihood of default on a loan
• and use the prediction to make the loan decision (see the sketch below)
– E.g. use past history of sales (by season) to predict
future sales
• And use it to decide what/how much to produce/stock
• And to target customers
 Other examples of business decisions:
– What items to stock?
– What insurance premium to charge?
– To whom to send advertisements?
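
To make the loan-default example above concrete, here is a minimal sketch of such a predictive model using scikit-learn's logistic regression. The features (income, age), the synthetic data, and the rule generating the labels are assumptions for illustration only, not the lesson's own data.

# Sketch of a predictive model for loan default (synthetic data, hypothetical features).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
income = rng.normal(50_000, 15_000, n)
age = rng.integers(21, 70, n)
# Synthetic label: lower income loosely raises the probability of default.
default = (rng.random(n) < 1 / (1 + np.exp((income - 40_000) / 10_000))).astype(int)

X = np.column_stack([income / 1000.0, age])   # income in thousands to keep features on similar scales
X_train, X_test, y_train, y_test = train_test_split(X, default, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
# The predicted probability of default is what drives the loan decision.
print(model.predict_proba(X_test)[:5, 1])
print("test accuracy:", model.score(X_test, y_test))
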
Overview (Cont.)

 Machine learning techniques are key to finding


patterns in data and making predictions
 Data mining extends techniques developed by
machine-learning communities to run them on very
large datasets
 The term business intelligence (BI) is a synonym for
data analytics
 The term decision support focuses on reporting
and aggregation
Data Analysis

 In most social research the data analysis involves


three major steps, done in roughly this order:

– Cleaning and organizing the data for analysis (Data


Preparation)

– Describing the data (Descriptive Statistics)

– Testing Hypotheses and Models (Inferential Statistics)


Data Warehousing
Data Analysis and OLAP
 Online Analytical Processing (OLAP)
– Interactive analysis of data, allowing data to be summarized
and viewed in different ways in an online fashion (with
negligible delay)
 We use the following relation to illustrate OLAP
concepts
– sales (item_name, color, clothes_size, quantity)
This is a simplified version of the sales fact table joined
with the dimension tables, with many attributes
removed (and some renamed)
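
To give a feel for the kind of summary an OLAP tool produces, the sketch below cross-tabulates quantity by item_name and color over a made-up instance of the sales relation, using a pandas pivot table as a stand-in for an interactive OLAP system.

# Cross-tabulation of the sales relation, in the spirit of an OLAP pivot
# (pandas stands in for an OLAP tool; the data values are made up).
import pandas as pd

sales = pd.DataFrame({
    "item_name":    ["skirt", "skirt", "dress", "dress", "shirt", "shirt"],
    "color":        ["dark", "pastel", "dark", "pastel", "dark", "white"],
    "clothes_size": ["small", "medium", "small", "large", "medium", "large"],
    "quantity":     [2, 5, 3, 7, 8, 4],
})

# Summarize quantity by item_name and color, with totals (the "all" margins of a data cube).
pivot = pd.pivot_table(sales, values="quantity", index="item_name",
                       columns="color", aggfunc="sum", fill_value=0,
                       margins=True, margins_name="all")
print(pivot)
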
Regression
 Regression deals with the prediction of a value, rather than
a class.
– Given values for a set of variables, X1, X2, …, Xn, we wish to
predict the value of a variable Y.
 One way is to infer coefficients a0, a1, a2, …, an such that
Y = a0 + a1 * X1 + a2 * X2 + … + an * Xn
 Finding such a linear polynomial is called linear regression.
– In general, the process of finding a curve that fits the data
is also called curve fitting.
 The fit may only be approximate
– because of noise in the data, or
– because the relationship is not exactly a polynomial
 Regression aims to find coefficients that give the best
possible fit.
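
A minimal sketch of linear regression by least squares, assuming two predictor variables and synthetic data (the coefficients and noise level are made up for illustration):

# Fitting Y = a0 + a1*X1 + a2*X2 by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))                                               # X1, X2
Y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)  # noisy linear relationship

A = np.column_stack([np.ones(n), X])          # prepend a column of ones for the intercept a0
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(coeffs)                                 # approximately [3.0, 1.5, -2.0]
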
Association Rules
 Retail shops are often interested in associations
between different items that people buy.
– Someone who buys bread is quite likely also to buy
milk
– A person who bought the book Database System
Concepts is quite likely also to buy the book Operating
System Concepts.
 Association information can be used in several
ways.
– E.g. when a customer buys a particular book, an online
shop may suggest associated books.
Association Rules

 Association rules:
bread ⇒ milk
DB-Concepts, OS-Concepts ⇒ Networks
– Left hand side: antecedent, right hand side:
consequent
– An association rule must have an associated
population; the population consists of a set of
instances
• E.g. each transaction (sale) at a shop is an instance, and
the set of all transactions is the population
Association Rules (Cont.)
 Rules have an associated support, as well as an associated
confidence.
 Support is a measure of what fraction of the population
satisfies both the antecedent and the consequent of the
rule.
– E.g. suppose only 0.001 percent of all purchases include
milk and screwdrivers. The support for the rule milk ⇒
screwdrivers is then low.
 Confidence is a measure of how often the consequent is
true when the antecedent is true.
– E.g. the rule bread ⇒ milk has a confidence of 80 percent if
80 percent of the purchases that include bread also include
milk.
 We omit further details, such as how to efficiently
infer association rules
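
The two measures are straightforward to compute directly from the definitions above. The sketch below (with a made-up population of transactions) counts how often antecedent and consequent occur together, and how often the consequent appears among transactions that contain the antecedent.

# Support and confidence of a rule over a population of transactions (made-up data).
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "screwdriver"},
    {"bread", "milk", "butter"},
]

def support(antecedent, consequent, population):
    # Fraction of the population containing both the antecedent and the consequent.
    both = sum(1 for t in population if antecedent <= t and consequent <= t)
    return both / len(population)

def confidence(antecedent, consequent, population):
    # Among transactions containing the antecedent, the fraction also containing the consequent.
    with_ante = [t for t in population if antecedent <= t]
    if not with_ante:
        return 0.0
    return sum(1 for t in with_ante if consequent <= t) / len(with_ante)

print(support({"bread"}, {"milk"}, transactions))     # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
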
Data Preparation

 Data Preparation involves:


– checking or logging the data in;
– checking the data for accuracy;
– entering the data into the computer;
– transforming the data; and
– developing and documenting a database
structure that integrates the various measures.
Descriptive Statistics

 Descriptive Statistics are used to describe the


basic features of the data in a study.
 They provide simple summaries about the sample
and the measures.
 Together with simple graphics analysis, they form
the basis of virtually every quantitative analysis of
data.
 With descriptive statistics you are simply
describing what is, what the data shows.
Inferential Statistics
 Inferential Statistics investigate questions, models and
hypotheses.
 In many cases, the conclusions from inferential statistics
extend beyond the immediate data alone.
 For instance, we use inferential statistics to try to infer
from the sample data what the population thinks.
– Or, we use inferential statistics to make judgments of the
probability that an observed difference between groups is
a dependable one or one that might have happened by
chance in this study.
 Thus, we use inferential statistics to make inferences
from our data to more general conditions; we use
descriptive statistics simply to describe what’s going on
in our data.
Conclusion Validity
 In many ways, conclusion validity is the most
important of the four validity types because it is
relevant whenever we are trying to decide if there is a
relationship in our observations (and that’s one of the
most basic aspects of any analysis).

 Perhaps we should start with an attempt at a


definition:

– Conclusion validity is the degree to which conclusions


we reach about relationships in our data are
reasonable.
Threats to Conclusion Validity
 A threat to conclusion validity is a factor that can lead
you to reach an incorrect conclusion about a
relationship in your observations.
 You can essentially make two kinds of errors about
relationships:
– Conclude that there is no relationship when in fact
there is (you missed the relationship or didn’t see it)
– Conclude that there is a relationship when in fact there
is not (you’re seeing things that aren’t there!)
 Most threats to conclusion validity have to do with the
first problem.
Finding no relationship when there is one
 When you’re looking for the needle in the haystack you
essentially have two basic problems:
– the tiny needle and too much hay.
 One important threat is low reliability of measures. This can be
due to many factors, including:
– poor question wording, bad instrument design or layout,
illegibility of field notes, and so on.
 In studies where you are evaluating a program you can
introduce noise through poor reliability of treatment
implementation.
 Random irrelevancies in the setting can also obscure your
ability to see a relationship.
 The types of people you have in your study can also make it
harder to see relationships.
Finding a relationship when there is not one
 In statistical analysis, we attempt to determine the
probability that the finding we get is a “real” one or
could have been a “chance” finding.
 In fact, we often use this probability to decide whether
to accept the statistical result as evidence that there is
a relationship.
 In the social sciences, researchers often use the rather
arbitrary value known as the 0.05 level of significance
to decide whether their result is credible or could be
considered a “fluke.”
 Essentially, the value 0.05 means that the result you got
could be expected to occur by chance at least 5 times
out of every 100 times you run the statistical analysis.
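
A small sketch of this decision rule, using a two-sample t-test from SciPy on synthetic group scores (the groups, means, and sample sizes are illustrative only):

# Applying the conventional 0.05 decision rule to a two-sample comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=100, scale=15, size=40)   # e.g. treatment group scores
group_b = rng.normal(loc=105, scale=15, size=40)   # e.g. comparison group scores

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p = {p_value:.3f}")
# Treat the observed difference as credible only if p < 0.05.
print("relationship judged credible" if p_value < 0.05 else "could be a chance finding")
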
Problems that can lead to either conclusion error
 Every analysis is based on a variety of assumptions about the
nature of the data, the procedures you use to conduct the
analysis, and the match between these two.
– If you are not sensitive to the assumptions behind your analysis
you are likely to draw erroneous conclusions about relationships.
 In quantitative research we refer to this threat as the violated
assumptions of statistical tests.
– For instance, many statistical analyses assume that the data are
distributed normally — that the population from which they are
drawn would be distributed according to a “normal” or “bell-
shaped” curve.
 If that assumption is not true for your data and you use that
statistical test, you are likely to get an incorrect estimate of the
true relationship.
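
One practical precaution is to check the normality assumption before relying on a test that requires it. The sketch below applies the Shapiro-Wilk test from SciPy to deliberately skewed synthetic data; the 0.05 cutoff and the follow-up advice are illustrative, not a fixed rule.

# Checking the normality assumption behind many statistical tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.exponential(scale=2.0, size=200)      # clearly non-normal data

stat, p = stats.shapiro(skewed)
print(f"Shapiro-Wilk p = {p:.4f}")
if p < 0.05:
    print("normality looks violated; consider a transformation or a nonparametric test")
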
Improving Conclusion Validity

 So you may have a problem assuring that you are


reaching credible conclusions about relationships in
your data.
 What can you do about it?

 Here are some general guidelines you can follow in


designing your study that will help improve conclusion
validity.
Improving Conclusion Validity
 Good Statistical Power
 The rule of thumb in social research is that you want statistical power to
be greater than 0.8 in value. That is, you want to have at least 80 chances
out of 100 of finding a relationship when there is one.
 As pointed out in the discussion of statistical power, there are several
factors that interact to affect power.
– One thing you can usually do is to collect more information — use a larger
sample size.
– The second thing you can do is to increase your risk of making a Type I error
— increase the chance that you will find a relationship when it’s not there.
– In practical terms you can do that statistically by raising the alpha level. For
instance, instead of using a 0.05 significance level, you might use 0.10 as
your cutoff point.
Improving Conclusion Validity

 Good Reliability
 Reliability is related to the idea of noise or “error” that
obscures your ability to see a relationship.
 In general, you can improve reliability by:
– constructing better measurement instruments,
– increasing the number of questions on a scale, or
– reducing situational distractions in the measurement
context.
Improving Conclusion Validity

 Good Implementation
 When you are studying the effects of interventions,
treatments or programs, you can improve conclusion
validity by assuring good implementation.
 This can be accomplished by training program
operators and standardizing the protocols for
administering the program.
Statistical Power

 There are four interrelated components that influence the conclusions you might reach
from a statistical test in a research project.
 The logic of statistical inference with respect to these components is often difficult to
understand and explain.
 The four components are:
– sample size;
– effect size is the salience of the treatment relative to the noise in measurement;
– alpha level (α, or significance level) is the odds that the observed result is due to chance;
– statistical power (1−β) is the odds that you will observe a treatment effect when it occurs.
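
As a rough illustration of how these components interact, the sketch below approximates power for a two-sample comparison using the usual normal approximation. The effect size and sample sizes are illustrative, and the formula ignores the negligible opposite tail of the two-sided test.

# Approximate power of a two-sample comparison (normal approximation, illustrative values).
from math import sqrt
from scipy.stats import norm

def approx_power(effect_size, n_per_group, alpha=0.05):
    # power ~ Phi( d * sqrt(n/2) - z_(1 - alpha/2) ) for a two-sided test at level alpha
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

print(approx_power(0.5, 64))              # medium effect, 64 per group: roughly 0.8
print(approx_power(0.5, 64, alpha=0.10))  # raising alpha raises power, at the cost of Type I error
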
Data Preparation

 Data Preparation involves:


– checking or logging the data in;

– checking the data for accuracy;

– entering the data into the computer;

– transforming the data, and

– developing and documenting a database structure that integrates the various measures.
Data Preparation
 Logging the Data
 In any research project you may have data coming from a number of different sources at different times:
– mail survey returns
– coded interview data
– pretest or posttest data
– observational data
 In all but the simplest of studies, you need to set up a procedure for logging the information and keeping
track of it until you are ready to do a comprehensive data analysis.
 In most cases, you will want to set up a database that enables you to assess at any time what data is
already in and what is still outstanding.
Data Preparation
 Checking the Data For Accuracy
 As soon as data is received you should screen it for accuracy. In some circumstances doing this
right away will allow you to go back to the sample to clarify any problems or errors. There are
several questions you should ask as part of this initial data screening:
– Are the responses legible/readable?
– Are all important questions answered?
– Are the responses complete?
– Is all relevant contextual information included (e.g., date, time, place, researcher)?
 In most social research, quality of measurement is a major issue. Assuring that the data collection
process does not contribute inaccuracies will help assure the overall quality of subsequent analyses.
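
A small sketch of such an initial accuracy screen, assuming the returns have been entered into a pandas DataFrame; the column names, scale range, and values are hypothetical.

# Initial accuracy screen on incoming survey data (tiny inline table stands in for real returns).
import pandas as pd

df = pd.DataFrame({
    "age":            [34, 29, None, 210],                               # one missing, one out of range
    "satisfaction":   [4, 7, 3, 2],                                      # scale is 1-5, so 7 is suspect
    "interview_date": ["2023-02-01", None, "2023-02-03", "2023-02-04"],
    "researcher":     ["AK", "AK", None, "JM"],
})

# Are all important questions answered?
print(df[["age", "satisfaction"]].isna().sum())

# Are the responses within legal ranges?
print(df[(df["age"] < 0) | (df["age"] > 120)])
print(df[~df["satisfaction"].between(1, 5)])

# Is the relevant contextual information (date, researcher) recorded?
print(df[["interview_date", "researcher"]].isna().sum())
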
Data Preparation
 Developing a Database Structure
 The database structure is the manner in which you intend to store the data for the study
so that it can be accessed in subsequent data analyses.
 You might use the same structure you used for logging in the data or, in large complex
studies, you might have one structure for logging data and another for storing it.
 As mentioned above, there are generally two options for storing data on computer –
database programs and statistical programs.
 Usually database programs are the more complex of the two to learn and operate, but
they allow the analyst greater flexibility in manipulating the data.
Descriptive Statistics
 Descriptive statistics are used to describe the basic features of the data in a study.
 They provide simple summaries about the sample and the measures.
 Together with simple graphics analysis, they form the basis of virtually every
quantitative analysis of data.
 Descriptive statistics are typically distinguished from inferential statistics.
 With descriptive statistics you are simply describing what is or what the data shows.
 With inferential statistics, you are trying to reach conclusions that extend beyond the
immediate data alone.
Descriptive Statistics
 Univariate Analysis

 Univariate analysis involves the examination across cases of one variable at a time. There are three major
characteristics of a single variable that we tend to look at:
– the distribution

– the central tendency

– the dispersion

 In most situations, we would describe all three of these characteristics for each of the variables in our study.
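
The sketch below describes a single synthetic variable along those three dimensions (distribution, central tendency, dispersion) using pandas; the variable and its values are made up for illustration.

# Univariate description of one variable: distribution, central tendency, dispersion.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
scores = pd.Series(rng.normal(loc=70, scale=10, size=300).round()).clip(0, 100)

# Distribution: frequency table over binned values.
print(pd.cut(scores, bins=[0, 50, 60, 70, 80, 90, 100]).value_counts().sort_index())

# Central tendency.
print("mean:", scores.mean(), "median:", scores.median(), "mode:", scores.mode().iloc[0])

# Dispersion.
print("range:", scores.max() - scores.min(), "std dev:", scores.std(), "variance:", scores.var())
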
Descriptive Statistics

 Correlation

 The correlation is one of the most common and most


useful statistics.
 A correlation is a single number that describes the
degree of relationship between two variables.
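
A minimal sketch computing a Pearson correlation between two synthetic variables with SciPy; the variables (hours studied, exam score) and the relationship between them are assumptions for illustration.

# A single correlation coefficient describing the relationship between two variables.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
hours_studied = rng.uniform(0, 20, size=100)
exam_score = 50 + 2.0 * hours_studied + rng.normal(scale=5, size=100)

r, p = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.2f}, p = {p:.3g}")   # r close to +1 indicates a strong positive relationship
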
