Introduction To Data Viz Lecture 2

The document discusses statistical concepts and methods for describing and analyzing quantitative data, including different types of variables, levels of measurement, methods for collecting and sampling data, descriptive and inferential statistics, and techniques for summarizing data through graphs, measures of central tendency and variation, and contingency tables. It provides definitions and examples of key statistical terms and outlines topics to be covered in more depth, such as types of analyses, measures for univariate, bivariate and multivariate data, and data presentation methods.

Uploaded by

anderson

STATISTICS & ECONOMETRICS

Course Manager: T Tazvishaya
H.Acc, M.Acc, DipPharm, MsDA (CTA & Law Student)
Contact Details: 0773610198
Email: [email protected]

CONTENT TO BE COVERED
Data and data sources
Types of variables
  - Qualitative and quantitative variables
  - Discrete and continuous variables
Levels of measurement
  - Nominal, Ordinal, Interval, Ratio
Data collection
  - Data structures: cross-sectional, time series, panel
  - Primary, secondary
  - Collection methods: questionnaire, content analysis, etc.
  - Sample and sampling methods

CONTENT TO BE COVERED
Types of statistics: descriptive, inferential
Describing data using graphs: two-way scatter plots, box-and-whisker plots, pie charts, etc.
Describing data using summaries
  - Measures of central tendency
  - Measures of variation
  - Measures of distribution
Confidence interval (CI), p-value, level of significance

CONTENT TO BE COVERED
Analysis of Variance (ANOVA), t-test, comparison of means
Correlation matrix: Pearson, Spearman, Kendall
Data dimension reduction techniques
  - PCA, FA, DA

INTRODUCTION TO STATISTICS
Definition (Statistics): The science of the collection, presentation, analysis, and reasonable interpretation of data.

Statistics presents a rigorous scientific method for gaining insight into data. For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account. However, statistics can give an instant overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points. Besides data summarization, another important task of statistics is to make inferences and predict relationships between variables.

DATA SIGHT

We are focusing on “quantitative analysis”.
The general idea is to summarize and analyze data so that it is useful for decision-making.
We do this by calculating “measures of central tendency” and by looking for relationships.
  - (We will NOT cover formal tests of hypotheses)

Primary vs. secondary data sources

Data on uses (system) vs. data on users (people)

DATA
Data may be continuous or discrete.
Just looking at the data often does not enable one to ascertain what is actually happening.
Solution: use appropriate descriptive statistics to summarize and present results.

A TAXONOMY OF STATISTICS

TYPES OF STATISTICS
Techniques that summarize and describe characteristics of a group, or make comparisons of characteristics between groups, are known as descriptive statistics.

Inferential statistics are used to make generalizations or inferences about a population based on findings from a sample.

The choice of a type of analysis is based on the evaluation questions, the type of data collected, and the audience who will receive the results.

Three types of analysis

- Univariate analysis
  - the examination of the distribution of cases on only one variable at a time (e.g., college graduation)
- Bivariate analysis
  - the examination of two variables simultaneously (e.g., the relation between gender and college graduation)
- Multivariate analysis
  - the examination of more than two variables simultaneously (e.g., the relationship between gender, race, and college graduation)

“Purpose”

- Univariate analysis
  - Purpose: description
- Bivariate analysis
  - Purpose: determining the empirical relationship between the two variables
- Multivariate analysis
  - Purpose: determining the empirical relationship among the variables

UNIVARIATE ANALYSIS
Involves examination of the distribution of cases on only ONE variable at a time.

Frequency distributions are listings of the number of cases in each attribute of a variable.
  - Ungrouped frequency distribution
  - Grouped frequency distribution

Proportions express the number of cases of the criterion variable as a part of the total population: the frequency of the criterion variable divided by N.

Percentages are simply 100 × proportion
  - Or [100 × (frequency of criterion variable divided by N)]

Rates make comparisons more meaningful by controlling for population differences.
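
The frequency, proportion and percentage definitions above can be sketched in a few lines of Python. This is only an illustration: the `responses` list and the `frequency_table` helper are hypothetical and do not come from the lecture.

```python
from collections import Counter

# Hypothetical raw observations of one categorical variable (not lecture data).
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]

def frequency_table(values):
    """Return {category: (frequency, proportion, percentage)} for one variable."""
    counts = Counter(values)   # ungrouped frequency distribution
    n = len(values)            # N, the total number of cases
    return {
        category: (freq, freq / n, 100 * freq / n)
        for category, freq in counts.items()
    }

table = frequency_table(responses)
for category, (freq, prop, pct) in table.items():
    print(f"{category:<10} f={freq}  proportion={prop:.3f}  percent={pct:.1f}")
```

The proportions always sum to 1 and the percentages to 100, which is a quick check on any hand-built frequency table.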

TYPES OF VARIABLES

Continuous: values increase steadily in tiny fractions

Discrete: values jump from category to category

BIVARIATE ANALYSIS
Bivariate analysis focuses on the relationship between two variables.

CONTINGENCY TABLES
Format: attributes of the independent variable are used as column headings and attributes of the dependent variable are used as row headings.

Guidelines for presenting & interpreting contingency tables
  - Contents of the table should be described in the title
  - Attributes of each variable should be clearly described
  - The base on which percentages are computed should be shown
  - The norm is to percentage down & compare across
  - The table should indicate the number of cases omitted from analysis
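
As a rough sketch of the "percentage down & compare across" norm, the Python below builds a small contingency table with the independent variable in the columns. The counts are invented for illustration, and `column_percentages` is a hypothetical helper, not something defined in the lecture.

```python
from collections import Counter

# Hypothetical cases as (independent, dependent) pairs; the counts are invented.
cases = (
    [("female", "graduated")] * 30 + [("female", "did not graduate")] * 20
    + [("male", "graduated")] * 20 + [("male", "did not graduate")] * 30
)

def column_percentages(pairs):
    """Percentage down: cells in each column (independent attribute) sum to 100."""
    cell_counts = Counter(pairs)
    column_totals = Counter(independent for independent, _ in pairs)
    return {
        (independent, dependent): 100 * count / column_totals[independent]
        for (independent, dependent), count in cell_counts.items()
    }

percentages = column_percentages(cases)
columns = ["female", "male"]
rows = ["graduated", "did not graduate"]
bases = Counter(independent for independent, _ in cases)

# Independent variable across the top, dependent variable down the side.
print(f"{'':<18}" + "".join(f"{c:>10}" for c in columns))
for row in rows:
    print(f"{row:<18}" + "".join(f"{percentages[(c, row)]:>9.1f}%" for c in columns))
# Show the base on which the percentages were computed.
print(f"{'Base (N)':<18}" + "".join(f"{bases[c]:>10}" for c in columns))
```

Reading across the "graduated" row then compares the two groups directly, because each column has been put on the same base of 100.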

MULTIVARIATE ANALYSIS
Multivariate analysis allows the separate and combined effects of the independent variables to be examined.

STATISTICAL DESCRIPTION OF DATA
- Statistics describes a numeric set of data by its
  – Center
  – Variability
  – Shape
- Statistics describes a categorical set of data by
  – Frequency, percentage or proportion of each category

Some Definitions
• Variable - any characteristic of an individual or entity. A variable can take different values for different individuals. Variables can be categorical or quantitative. Per S. S. Stevens, variables are classified by level of measurement:
• Nominal - Categorical variables with no inherent order or ranking sequence, such as names or classes (e.g., gender). Values may be numerals but carry no numerical meaning (e.g., I, II, III). The only operation that can be applied to nominal variables is enumeration.
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe. Values can be compared for equality, or for greater or less, but not for how much greater or less.
• Interval - Values of the variable are ordered as in Ordinal and, additionally, differences between values are meaningful; however, the scale is not absolutely anchored. Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero point, e.g. age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
• Distribution - (of a variable) tells us what values the variable takes and how often it takes these values.
  - Unimodal - having a single peak
  - Bimodal - having two distinct peaks
  - Symmetric - left and right halves are mirror images

DATA PRESENTATION
There are two types of statistical presentation of data: graphical and numerical.

Graphical presentation: We look for the overall pattern and for striking deviations from that pattern. The overall pattern is usually described by the shape, center, and spread of the data. An individual value that falls outside the overall pattern is called an outlier.

Bar diagrams and pie charts are used for categorical variables.

Histograms, stem-and-leaf plots and box plots are used for numerical variables.

Data Presentation – Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in each category.

Figure 1: Bar Chart of Subjects in Treatment Groups
[Bar chart: x-axis "Treatment Group" (1, 2, 3); y-axis "Number of Subjects" (0 to 30)]

Treatment Group   Frequency   Proportion        Percent (%)
1                 15          (15/60) = 0.250   25.0
2                 25          (25/60) = 0.417   41.7
3                 20          (20/60) = 0.333   33.3
Total             60          1.00              100

Data Presentation – Categorical Variable
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each category.

Figure 2: Pie Chart of Subjects in Treatment Groups
[Pie chart of treatment groups 1, 2 and 3 with slices of 25%, 33% and 42%]

Treatment Group   Frequency   Proportion        Percent (%)
1                 15          (15/60) = 0.250   25.0
2                 25          (25/60) = 0.417   41.7
3                 20          (20/60) = 0.333   33.3
Total             60          1.00              100

GRAPHICAL PRESENTATION – NUMERICAL VARIABLE
Histogram: The overall pattern can be described by its shape, center, and spread. The following age distribution is right skewed. The center lies between 80 and 100. There are no outliers.

Figure 3: Age Distribution
[Histogram: x-axis "Age in Months" (bins 40, 60, 80, 100, 120, 140, More); y-axis "Number of Subjects" (0 to 16)]

Statistic            Value
Mean                 90.41666667
Standard Error       3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance      913.8403955
Kurtosis             -1.183899591
Skewness             0.389872725
Range                95
Minimum              48
Maximum              143
Sum                  5425
Count                60

GRAPHICAL PRESENTATION – NUMERICAL VARIABLE
Box-Plot: Describes the five-number summary (min, q1, median, q3, max).

Figure 4: Distribution of Age
[Box plot: vertical axis 0 to 160; markers for min, q1, median, q3 and max]

A fundamental concept in summary statistics is that of a central value for a set of observations and the extent to which the central value characterizes the whole set of data. Measures of central value such as the mean or median must be coupled with measures of data dispersion (e.g., average distance from the mean) to indicate how well the central value characterizes the data as a whole.

To understand how well a central value characterizes a set of observations, let us consider the following two sets of data:
A: 30, 50, 70
B: 40, 50, 60
The mean of both data sets is 50. However, the distances of the observations from the mean are larger in data set A than in data set B. Thus, the mean of data set B represents its data better than the mean of data set A does.
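
The comparison of sets A and B can be checked numerically. The sketch below uses the "average distance from the mean" mentioned above as the dispersion measure; the helper name `mean_absolute_deviation` is an illustrative choice, not a term from the lecture.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average distance of the observations from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

set_a = [30, 50, 70]
set_b = [40, 50, 60]

for name, data in (("A", set_a), ("B", set_b)):
    print(f"Set {name}: mean={mean(data)}, "
          f"average distance from mean={mean_absolute_deviation(data):.2f}")
```

Both sets share the same mean, but the average distance for A is twice that for B, which is the sense in which 50 describes B better than A.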

Methods of Center Measurement

Center measurement is a summary measure of the overall level of a dataset.

Commonly used methods are the mean, median, mode, geometric mean, etc.

Mean: The sum of all the observations divided by the number of observations. The mean of 20, 30, 40 is (20+30+40)/3 = 30.
Notation: Let $x_1, x_2, \dots, x_n$ be $n$ observations of a variable $x$. Then the mean of this variable is

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Methods of Center Measurement

Median: The middle value in an ordered sequence of observations. That is, to find the median we need to order the data set and then find the middle value. In the case of an even number of observations, the average of the two middle values is the median. For example, to find the median of {9, 3, 6, 7, 5}, we first sort the data, giving {3, 5, 6, 7, 9}, then choose the middle value, 6. If the number of observations is even, e.g., {9, 3, 6, 7, 5, 2}, then the median is the average of the two middle values from the sorted sequence, in this case (5 + 6) / 2 = 5.5.

Mode: The value that is observed most frequently. The mode is undefined for sequences in which no observation is repeated.
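
A minimal sketch of the three center measures, using Python's standard `statistics` module for the mean and median. The `mode_or_none` helper is hypothetical; it follows the convention stated above that the mode is undefined when no observation repeats, and the data in its first call is invented.

```python
from collections import Counter
from statistics import mean, median

def mode_or_none(values):
    """Most frequent value(s); None when no observation is repeated."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return None  # treated as undefined, following the definition above
    return [value for value, count in counts.items() if count == highest]

print(mean([20, 30, 40]))              # the mean example from the text
print(median([9, 3, 6, 7, 5]))         # odd number of observations
print(median([9, 3, 6, 7, 5, 2]))      # even number of observations
print(mode_or_none([84, 90, 84, 71]))  # hypothetical data with one repeat
print(mode_or_none([9, 3, 6, 7, 5]))   # no repeats, so no mode
```

Returning a list from `mode_or_none` is a design choice that also covers bimodal data, where two values tie for the highest frequency.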

Mean or Median

The median is less sensitive to outliers (extreme scores) than the mean and is thus a better measure than the mean for highly skewed distributions, e.g. family income. For example, the mean of 20, 30, 40, and 990 is (20+30+40+990)/4 = 270. The median of these four observations is (30+40)/2 = 35. Here 3 observations out of 4 lie between 20 and 40, so the mean of 270 fails to give a realistic picture of the major part of the data. It is influenced by the extreme value 990.
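
To see the sensitivity directly, the sketch below compares the example above with a hypothetical baseline in which the extreme value 990 is replaced by 50 (the baseline is an assumption added for contrast).

```python
from statistics import mean, median

baseline = [20, 30, 40, 50]       # hypothetical data without an extreme value
with_outlier = [20, 30, 40, 990]  # the example from the text

for label, data in (("baseline", baseline), ("with outlier", with_outlier)):
    print(f"{label:<13} mean={mean(data)}  median={median(data)}")
```

Changing a single observation moves the mean from 35 to 270 while the median stays at 35.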

Methods of Variability Measurement

Variability (or dispersion) measures the amount of scatter in a dataset.

Commonly used methods: range, variance, standard deviation, interquartile range, coefficient of variation, etc.

Range: The difference between the largest and the smallest observations. The range of 10, 5, 2, 100 is (100 − 2) = 98. It is a crude measure of variability.

Methods of Variability Measurement

Variance: The variance of a set of observations is the sum of the squares of the deviations of the observations from their mean, divided by $n - 1$ (roughly, the average squared deviation). In symbols, the variance of the $n$ observations $x_1, x_2, \dots, x_n$ is

$$S^2 = \frac{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$

Variance of 5, 7, 3? The mean is (5+7+3)/3 = 5 and the variance is

$$\frac{(5 - 5)^2 + (3 - 5)^2 + (7 - 5)^2}{3 - 1} = 4$$

Standard Deviation: The square root of the variance. The standard deviation of the above example is 2.
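
The range, variance and standard deviation examples can be reproduced as follows. The helper names are illustrative; the hand-written `sample_variance` mirrors the $n - 1$ formula above, and the standard library's `statistics.variance` and `statistics.stdev` (which also divide by $n - 1$) are shown for comparison.

```python
from math import sqrt
from statistics import stdev, variance

def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    center = sum(values) / n
    return sum((x - center) ** 2 for x in values) / (n - 1)

print(value_range([10, 5, 2, 100]))

observations = [5, 7, 3]
s2 = sample_variance(observations)
print(s2, sqrt(s2))                                  # by hand
print(variance(observations), stdev(observations))  # standard library
```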

Methods of Variability Measurement

Quartiles: Data can be divided into four regions that cover the total range of observed values. The cut points for these regions are known as quartiles.

In notation, the q-th quartile of a data set is the (q(n+1)/4)-th observation of the ordered data, where q is the desired quartile and n is the number of observations.

The first quartile (Q1) is the cut point below which the first 25% of the data lie. The second quartile (Q2) is the cut point between the 25-50% region and the 50-75% region; it is the median. The third quartile (Q3) is the 75% cut point, so that 25% of the data lie between the median and Q3.

Q1 is the median of the first half of the ordered observations and Q3 is the median of the second half of the ordered observations.

Methods of Variability Measurement

An example with 15 numbers:

3 6 7 11 13 22 30 40 44 50 52 61 68 80 94

In this example Q1 is the ((15+1)/4) × 1 = 4th observation of the ordered data. The 4th observation is 11, so Q1 of this data is 11. In the same way, Q2 is the 8th observation (40) and Q3 is the 12th observation (61).

The first quartile is Q1 = 11. The second quartile is Q2 = 40 (this is also the median). The third quartile is Q3 = 61.

Inter-quartile Range: The difference between Q3 and Q1. The inter-quartile range of the previous example is 61 − 11 = 50. The middle half of the ordered data lie between 11 and 61.
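
The position rule can be sketched in Python for the 15-number example. `quartile_by_position` is a hypothetical helper that only handles the case where q(n+1)/4 is a whole number, as it is here; the lecture gives no interpolation rule for other sample sizes. For comparison, `statistics.quantiles` with its default "exclusive" method also works from (n+1)-based positions.

```python
from statistics import quantiles

data = [3, 6, 7, 11, 13, 22, 30, 40, 44, 50, 52, 61, 68, 80, 94]

def quartile_by_position(values, q):
    """q-th quartile as the (q(n+1)/4)-th ordered observation.

    Assumes the position is a whole number, as in the 15-number example.
    """
    ordered = sorted(values)
    position, remainder = divmod(q * (len(ordered) + 1), 4)
    if remainder != 0:
        raise ValueError("position is not a whole number for this sample size")
    return ordered[position - 1]  # positions are 1-based

q1, q2, q3 = (quartile_by_position(data, q) for q in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)

# Standard-library cross-check (default method="exclusive").
print(quantiles(data, n=4))
```

The inter-quartile range is computed as Q3 minus Q1, so the middle half of this data spans 11 to 61.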

Deciles and Percentiles
Deciles: If data is ordered and divided into 10 parts, the cut points are called deciles.
Percentiles: If data is ordered and divided into 100 parts, the cut points are called percentiles. The 25th percentile is Q1, the 50th percentile is the median (Q2), and the 75th percentile is Q3.

In notation, the p-th percentile of a data set is the (p(n+1)/100)-th observation of the ordered data, where p is the desired percentile and n is the number of observations.

Coefficient of Variation: The standard deviation of the data divided by its mean. It is usually expressed as a percentage.

$$\text{Coefficient of Variation} = \frac{S}{\bar{x}} \times 100$$
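
A short sketch of the coefficient of variation, assuming the sample standard deviation (the $n - 1$ form used for the variance earlier) in the numerator. It reuses the 5, 7, 3 example and the mean and standard deviation reported in the Age summary; the function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

small_cv = coefficient_of_variation([5, 7, 3])  # SD 2, mean 5
print(f"{small_cv:.1f}%")

# Using the reported Age statistics: SD 30.22979318, mean 90.41666667.
age_cv = 30.22979318 / 90.41666667 * 100
print(f"{age_cv:.1f}%")
```

Because it is unit-free, the coefficient of variation allows the spread of variables measured on different scales to be compared.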

Skewness

- Measures asymmetry of data
  - Positive or right skewed: longer right tail
  - Negative or left skewed: longer left tail

Let $x_1, x_2, \dots, x_n$ be $n$ observations. Then,

$$\text{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}}$$

Kurtosis

- Measures peakedness of the distribution of data. The kurtosis of the normal distribution is 0.

Let $x_1, x_2, \dots, x_n$ be $n$ observations. Then,

$$\text{Kurtosis} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{2}} - 3$$

Summary of the Variable ‘Age’ in the given data set

Statistic            Value
Mean                 90.41666667
Standard Error       3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance      913.8403955
Kurtosis             -1.183899591
Skewness             0.389872725
Range                95
Minimum              48
Maximum              143
Sum                  5425
Count                60

Histogram of Age
[Histogram: x-axis "Age in Months" (40 to 160); y-axis "Number of Subjects" (0 to 10)]
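
Several entries in the Age summary are tied together by simple identities, which gives a quick consistency check. The sketch below copies the reported values; the relation standard error = standard deviation / √count is the usual standard error of the mean and is an assumption here, since the lecture does not define it.

```python
from math import isclose, sqrt

# Values copied from the Age summary.
summary = {
    "mean": 90.41666667, "standard_error": 3.902649518,
    "standard_deviation": 30.22979318, "sample_variance": 913.8403955,
    "range": 95, "minimum": 48, "maximum": 143, "sum": 5425, "count": 60,
}

# Each check pairs a value recomputed from other entries with the reported one.
checks = {
    "mean = sum / count":
        (summary["sum"] / summary["count"], summary["mean"]),
    "variance = standard deviation squared":
        (summary["standard_deviation"] ** 2, summary["sample_variance"]),
    "standard error = SD / sqrt(count)":
        (summary["standard_deviation"] / sqrt(summary["count"]),
         summary["standard_error"]),
    "range = maximum - minimum":
        (summary["maximum"] - summary["minimum"], summary["range"]),
}

results = {
    label: isclose(computed, reported, rel_tol=1e-6)
    for label, (computed, reported) in checks.items()
}
for label, ok in results.items():
    print(f"{label:<40} {'consistent' if ok else 'MISMATCH'}")
```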

ANALYSIS: INTRODUCTION
The BIG Questions:
  - What are you trying to discover or show?
  - How will you present the results?

From survey to report
  - Flow of information
  - Sample surveys

Brief comparison of SAS & R

DATA COLLECTION INSTRUMENTS
Questionnaires & surveys
Transaction logs
Experimental observation
Bills & invoices
Census forms & reports
Pre-packaged data sets
Content analysis

ISSUES IN RESEARCH DESIGN
Case study vs. statistical sample
What is the universe? (uses, users, etc.)
  - Example: political debate over “average tax cut” vs. “tax cut for the average family”

Is the sample representative?
  - Volumes vs. titles in the library

Does correlation imply causality?
  - Do we need to identify the pathogen?

Controlling for outside factors

SAMPLE SIZE & SAMPLING METHODS
How large a sample is needed?
  - The larger the sample, the more accurate the results (unless the response rate becomes very low)
  - The larger the sample, the greater the cost/effort
Sample size does NOT depend on the size of the population
Rules of thumb
  - 100 for 95% confidence, 5% tolerance, 90-10 expected split
  - 400 for 95% confidence, 5% tolerance, 50-50 expected split
  - 30 – 50 in each cell for n × m discrete classes

SOURCES OF ERROR
The respondent
The investigator
Sampling error
Change in the system itself
Coding & analysis
Model specification (oversimplification and undersimplification)
