INTRODUCTION TO STATISTICS Notes

This document provides an introduction to statistics. It discusses key concepts like population, sample, descriptive statistics, inferential statistics, measures of central tendency (mean, median, mode), measures of variability (range, standard deviation), and data types (qualitative, quantitative). Descriptive statistics summarize and organize data, while inferential statistics make inferences about populations from samples.

Uploaded by

sourav guha

INTRODUCTION TO STATISTICS
Statistics is a field that is omnipresent.

💡 Statistics is the science that deals with collecting, analyzing and interpreting data with the help of mathematical tools.

POPULATION : It is the set or collection of all items of interest in a statistical study.

SAMPLE : A sample is a subset of items that have been collected from the population.

There are 2 major types of statistics :

1. Descriptive statistics
Methods of organizing, summarizing and presenting numerical data fall into this area of statistics. It describes the data actually collected, whether that is an entire population or a sample, without generalizing beyond it.

2. Inferential statistics
Problems involving statistical inference arise when a statistician takes a sample from a population and wishes to make statements about the population characteristics from the information in the sample.
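The difference between describing a population and inferring from a sample can be sketched in a few lines of Python. This is only an illustration: the population of heights, the seed and the sample size of 200 are made-up assumptions and are not part of these notes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (made-up data)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A sample is a subset of items collected from the population
sample = rng.sample(population, 200)

# Descriptive: summarizes the data we actually hold
population_mean = statistics.mean(population)

# Inferential: the sample mean is used as an estimate of the population mean
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample estimate = {sample_mean:.2f}")
```

The two numbers will usually be close but not identical, which is exactly the uncertainty that inferential statistics tries to quantify.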

DIFFERENCE BETWEEN DESCRIPTIVE AND INFERENTIAL STATISTICS

| DESCRIPTIVE STATISTICS | INFERENTIAL STATISTICS |
| --- | --- |
| It gives information about raw data and describes the data in some manner. | It makes inferences about the population using data drawn from the population. |
| It helps in organizing, analyzing and presenting data in a meaningful manner. | It allows us to compare data and make hypotheses and predictions. |
| It is used to describe a situation. | It is used to explain the chance of occurrence of an event. |
| It explains already known data and is limited to a sample or population of small size. | It attempts to reach a conclusion about the population. |
| It can be achieved with the help of charts, graphs, tables etc. | It can be achieved by probability. |

Measure of central tendency

1. MEAN : The mean is the measure of where the center of a data set lies. It is also known as the arithmetic mean or average. It is calculated by adding all of the numbers together and dividing by the no. of items in the set. It is widely used with probability distributions, especially in the central limit theorem. Eg: Population mean.

Other types of mean : weighted mean, harmonic mean, geometric mean, arithmetic-geometric mean.
Eg: mean of 7, 9 and 12 = (7 + 9 + 12)/3 = 28/3 ≈ 9.33.

2. MEDIAN : It is the middle no. in a data set, once the data is arranged in either ascending or descending order.
Eg: 1,2,3,4,5,6,7,8,9,10. This set has an even no. of values, so there are two middle numbers, 5 and 6. To find the median we calculate the arithmetic mean of 5 and 6, which is 5.5.

3. MODE : Mode is the most common no. in a set, that is, the value with the highest frequency. For example : in 21,21,21,23,24,26 the most repeated no. is 21, so the mode is 21.
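The three measures can be checked with Python's standard `statistics` module. This is a small sketch that reuses the median and mode examples from these notes; the variable names are illustrative only.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # even count, so two middle values
repeated = [21, 21, 21, 23, 24, 26]         # 21 is the most frequent value

mean_value = statistics.mean(values)        # sum of values / no. of values
median_value = statistics.median(values)    # mean of the middle values 5 and 6
mode_value = statistics.mode(repeated)      # most common value

print(mean_value, median_value, mode_value)
```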

Measure of variability
1. RANGE : It is the difference between the maximum value and the minimum value of a data set.

Eg : 21,21,21,23,24,25,26,28,29,31,33.

Range = 33 − 21 = 12.

💡 An unusually high or unusually low data value is known as an outlier.

💡 Range is not a reliable measure when an outlier is present in the data set.
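A short sketch of why the range is sensitive to outliers. The data set is the one from the range example; the extra value 95 is a made-up outlier added only for illustration.

```python
def value_range(xs):
    """Range = maximum value - minimum value."""
    return max(xs) - min(xs)

data = [21, 21, 21, 23, 24, 25, 26, 28, 29, 31, 33]
with_outlier = data + [95]   # 95 is a hypothetical outlier

range_plain = value_range(data)
range_outlier = value_range(with_outlier)

print(range_plain, range_outlier)
```

A single extreme value changes the range from 12 to 74, even though the other eleven values are unchanged.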

2. STANDARD DEVIATION : Standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values of the data set tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range. Standard deviation is abbreviated as SD and is represented in mathematical texts and equations by the lower case Greek letter sigma (σ) for the population SD and the Latin letter s for the sample SD. The SD of a random variable, sample, statistical population, probability distribution or data set is the square root of its variance. The SD of a population and the standard error of a statistic are two different but related things: the standard error of the sample mean is the SD divided by the square root of the sample size.

STEPS TO CALCULATE VARIANCE AND HENCE STANDARD DEVIATION :

STEP 1 : The mean value is calculated by adding all the data points and dividing by the no. of data points.

STEP 2 : The deviation of each data point is calculated by subtracting the mean from the value of the data point. Each resulting value is squared and the squares are summed. This sum is divided by the no. of data points (less one for the sample variance) to give the variance.

STEP 3 : The square root of the variance is then taken to find the standard deviation.
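The three steps can be sketched in Python as follows. The function name `variance`, its `sample` flag and the data set are assumptions chosen for illustration; the result is compared with the standard library's `statistics.variance` as a sanity check.

```python
import math
import statistics

def variance(xs, sample=False):
    """Variance following the three steps: mean, squared deviations, divide."""
    n = len(xs)
    mean = sum(xs) / n                              # STEP 1: mean
    squared = [(x - mean) ** 2 for x in xs]         # STEP 2: squared deviations
    divisor = n - 1 if sample else n                # less one for a sample
    return sum(squared) / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5

population_variance = variance(data)
population_sd = math.sqrt(population_variance)      # STEP 3: square root
sample_variance = variance(data, sample=True)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

The sample version divides by one less than the no. of data points, so it is always slightly larger than the population version for the same data.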

Standard deviation is a very useful tool in investing and trading strategies as it helps to measure market volatility and predict performance trends. It is one of the key fundamental risk measures that analysts, portfolio managers and advisors use. A large dispersion shows how much the return on the fund is deviating from the expected normal returns. A lower SD is not necessarily preferable, since that depends on how much risk the investor is willing to take.

STANDARD DEVIATION v/s VARIANCE

The variance measures the size of the data's spread around the mean value. As the variance gets bigger, more variation in the data values occurs and there may be a larger gap between one data value and another. If the data values are all close together the variance will be smaller. However, variance is more difficult to grasp than the standard deviation because it represents a squared result that may not be meaningfully expressed on the same graph as the original data set.

Standard deviations are usually easier to picture and apply. The standard deviation is expressed in the same unit of measurement as the data, which is not the case with variance. For data that follow a normal curve, about 68% of the values lie within one SD of the mean, so the SD can be used to check how closely the data match a normal curve. A larger variance means more data points lie far from the mean; a smaller variance means more of the data are close to the average.

💡 DRAWBACK OF SD : The biggest drawback of standard deviation is that it can be impacted by outliers and extreme values.

AVERAGE ABSOLUTE DEVIATION (AAD)

The AAD of a data set is the average of the absolute (positive) deviations from a central point. In general the central point can be the mean, median or mode. Two common variants are the mean absolute deviation and the median absolute deviation, both of which are abbreviated as MAD.

Two types of mean absolute deviation :

• Mean absolute deviation around the mean

• Mean absolute deviation around the median

Two types of median absolute deviation :

• Median absolute deviation around the mean

• Median absolute deviation around the median

MAXIMUM ABSOLUTE DEVIATION

Maximum absolute deviation around an arbitrary point is the maximum of the absolute deviations of a sample from that point. It is a measure of spread and not strictly a measure of central tendency.
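The absolute-deviation measures above differ only in which central point is used and in how the deviations are combined (mean, median or maximum). A minimal sketch, assuming a small made-up data set and hypothetical helper names:

```python
import statistics

def abs_deviations(xs, center):
    """Absolute (positive) deviation of every value from a central point."""
    return [abs(x - center) for x in xs]

data = [2, 2, 3, 4, 14]                 # made-up data; 14 is an extreme value
mean = statistics.mean(data)            # 5
median = statistics.median(data)        # 3

mean_ad_around_mean = statistics.mean(abs_deviations(data, mean))
mean_ad_around_median = statistics.mean(abs_deviations(data, median))
median_ad_around_mean = statistics.median(abs_deviations(data, mean))
median_ad_around_median = statistics.median(abs_deviations(data, median))
max_ad_around_mean = max(abs_deviations(data, mean))

print(mean_ad_around_mean, mean_ad_around_median)
print(median_ad_around_mean, median_ad_around_median)
print(max_ad_around_mean)
```

The median-based versions are much less affected by the extreme value 14 than the mean-based or maximum versions.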

DATA SET
A data set is a collection of data of any kind.
There are 2 major types of data :

1. Qualitative data type

It is also known as categorical data. It describes the object under consideration using a finite set of discrete classes. This means that this type of data cannot be counted or measured easily using numbers and is therefore divided into categories.

Example : Gender of a person.

Qualitative data type has 2 subtypes :

• Nominal : These are sets of values that don't possess a natural ordering. Eg: colour, gender of persons.

• Ordinal : These types of values have a natural ordering while maintaining their class of values. Eg: If we consider the sizes of a clothing brand then we can easily sort them according to their name tag in the order small, medium and large. The grading system used while marking candidates in a test can also be considered an ordinal data type, where an A grade is definitely better than a B grade.

💡 NOTE : These categories help us in deciding which encoding strategy can be applied to which type of data. For the nominal data type, where there is no comparison among the categories, one-hot encoding can be applied, which is similar to binary coding. For the ordinal data type, label encoding can be applied, which is a form of integer encoding.
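The two encoding strategies can be sketched in plain Python without any library. The function names `one_hot` and `label_encode` and the example values are hypothetical, chosen only to illustrate the idea.

```python
def one_hot(values):
    """One-hot encoding for nominal data: one 0/1 column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def label_encode(values, order):
    """Label (integer) encoding for ordinal data, using the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

colours = ["red", "green", "blue", "green"]        # nominal: no ordering
sizes = ["small", "large", "medium", "small"]      # ordinal: small < medium < large

colour_codes = one_hot(colours)                    # columns: blue, green, red
size_codes = label_encode(sizes, ["small", "medium", "large"])

print(colour_codes)
print(size_codes)
```

One-hot encoding avoids implying an order among colours, while the integer codes for sizes keep the order small, medium, large.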

2. Quantitative data type

This data type tries to quantify things and it does so by considering numerical values that make it countable in nature. Eg: Price of a smart phone.

💡 The key thing is that there can be an infinite no. of values a feature can take. For eg: the price of a smart phone can vary from "X" amount to any value.

Quantitative data type has 2 sub types :

• Discrete data type : The numerical values which fall under integers or whole numbers are placed under this category. For example : the no. of speakers in a cell phone or the no. of SIM cards.

• Continuous data type : The fractional numbers or decimal values are considered as continuous. For example : the frequency of a Wi-Fi signal or the height of a person.

If we give numbering to ordinal data, should it then be called discrete type or ordinal type ?

The truth is that it is still ordinal data. The reason for this is that, even if the numbering is done, it does not convey the actual distances between the classes.
Eg: Consider the grading system of a test. The respective grades can be A,B,C,D,E and if we number them from the start then they would be 1,2,3,4,5.
According to the numerical difference, the distance between the D and E grades is the same as the distance between the C and D grades. This is not very accurate, as we all know that a C grade is still acceptable compared to an E grade, yet the numbering treats the steps between grades as equal.

💡 NOTE We have discussed all the major classification of data . This is

important because now we can prioritize the tests to be performed on
different categories . Now it makes sense to plot a histogram or
frequency plot for quantitative data and a pie chart and bar plot for
qualitative data.

💡 Regression analysis where the relationship between one dependent and

other independent variables is possible only for quantitative data.

💡 ANOVA is applicable only for qualitative data .

GRAPHICAL REPRESENTATION
Graphics can be used as an effective method of visual communication. Statistical graphics are beneficial for the presentation and analysis of data. The statistical graphic forms that we usually encounter are line charts, bar or column charts, grouped bar charts, combination charts, pie charts and pictorial charts.

1. LINE CHARTS : These use lines between data points to depict the magnitudes of data for 2 variables or for one variable over time. A line chart for a time series is known as a time series plot or sequence plot.

💡 Data values for a variable over time are known as a time series.

2. BAR CHARTS OR COLUMN CHARTS : Bar charts are used to depict the magnitude of data for different qualitative categories or over time. The length/height of the bars allows the user to compare magnitudes easily.

3. GROUPED BAR CHARTS : These can be used to depict the magnitudes of 2 or more grouped data values for different qualitative categories or over time.

4. COMBINATION CHARTS : These charts use lines and bars to depict the magnitudes of 2 or more data values for different categories or for different times.

5. PIE CHARTS : Pie charts can be used effectively to depict the proportions or percentages of the total quantity that correspond to several qualitative categories. Each category is depicted as a wedge of a circle or a piece of a pie. The angle in degrees of each wedge is equal to the category's proportion multiplied by 360 degrees.

6. PICTORIAL CHARTS : These charts use pictorial symbols to represent data. They are often used to gain attention, but they can be difficult to interpret and are also misused at times.
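The wedge-angle rule for pie charts (proportion multiplied by 360 degrees) can be worked through with a small sketch. The category names and counts below are made up for illustration, and `wedge_angles` is a hypothetical helper.

```python
def wedge_angles(counts):
    """Angle of each pie wedge = category proportion * 360 degrees."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}

counts = {"A": 10, "B": 30, "C": 60}    # made-up category counts
angles = wedge_angles(counts)

print(angles)
print(sum(angles.values()))             # the wedges together fill the circle
```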

SEGMENTING DATA
We often talk about the top 25%, top 20%, top 10% or top 1% of something. When we segment data into percentages like these, we are commonly talking about quartiles, quintiles, deciles and percentiles respectively.

💡 Quartiles divide the data into 4 parts .

💡 Deciles divide the data into 10 parts .

💡 Quintiles divide the data into 5 parts .

💡 Percentiles divide the data into 100 parts .

KEY FEATURES OF QUARTILES

• Quartiles measure the spread of values above and below the median by dividing the distribution into 4 groups.

• The quartiles are 3 points (a lower quartile, the median and an upper quartile) that split the data set into 4 groups.

• Quartiles are used to calculate the interquartile range, Q3 − Q1, which is a measure of variability around the median.

The quartiles of a data set divide the data into 4 equal parts with 1/4th of the data values in each part.
The first quartile Q1 is the median of the first half of the data set and marks the point at which 25% of the data values are lower and 75% are higher.
The second quartile Q2 is the median of the data set, which divides the data set in half.
The third quartile Q3 is the median of the second half of the data set and marks the point at which 25% of the data values are higher and 75% are lower.
For example : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 (ascending order). With an odd no. of values, the median is counted here in both halves; other conventions exclude it and give slightly different quartiles.

Q2 = 8
Q1 = 4.5 ( (4+5)/2 )
Q3 = 11.5 ( (11+12)/2 )

DECILES AND PERCENTILES

Deciles and percentiles are usually applied to large data sets .

Deciles divide the data set into 10 equal parts and percentiles divide them into
100 equal parts .

One example of the use of deciles is in school awards or rankings. For example : students in the top 10% may be given an award. If there are 578 students in a graduating class, the top 10%, or 58 students, may be given the award.

Similarly, at the opposite end of the scale, students who score in the bottom 10% or 20% may be given extra assistance to boost their scores.

Percentiles divide the data set into groupings of 1%. Standardized tests often report percentile scores; these scores help compare a student's performance to that of their peers (often across a state or country). A percentile score reflects the percentage of students who scored at or below a particular student's score.
For example : Students who receive a percentile ranking of 87 on a particular test received scores that were equal to or higher than those of 87% of the students who took the test.
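A percentile ranking and the size of a top-10% group can both be computed directly. In this sketch the test scores are made up, `percentile_rank` is a hypothetical helper using the "at or below" definition, and the award count is rounded to the nearest whole student.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # made-up test scores
rank_of_90 = percentile_rank(scores, 90)

# Top 10% of a graduating class of 578 students, rounded to whole students
award_count = round(578 * 10 / 100)

print(rank_of_90, award_count)
```

Note that the percentile rank (80 here) is different from the score itself (90).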

💡 DO NOT mistake percentile for the score of the student .

Growth charts are another common example of an application of percentile .
