Module 3

The document discusses different types of data analysis including descriptive analysis, exploratory data analysis, predictive analysis, and inferential analysis. Descriptive analysis uses numerical methods to summarize data through measures like mean, median, mode, standard deviation, and variance. Exploratory data analysis takes a visual approach using plots and graphs to analyze single and multiple variables. Stem and leaf plots arrange data to show frequency of values through stems and leaves. Normal distributions follow a bell curve shape and are described by their mean and standard deviation. Skewness measures asymmetry in a distribution, with positive skewness indicating a longer right tail and negative skewness a longer left tail.

Uploaded by

Sayan Majumder

Module 3

Data Analysis Types:

Data analysis can be divided into four types depending on the methodology used:

• Descriptive Analysis
• Exploratory Data Analysis
• Predictive Analysis
• Inferential Analysis

Descriptive Analysis:
Descriptive analysis is a numerical method of extracting information from data. It summarises
the values of the numerical variables. Assume you’re looking at sales data from a vehicle
company. In descriptive analysis, you’ll look for answers to questions such as: what are the
mean, mode, and median of a car type’s selling price, what was the income generated by
selling a specific model of automobile, and so on. Using this form of analysis, we can
determine the central tendency and dispersion of the numerical variables in the data. In most
practical data science use cases, a descriptive analysis helps you gain a high-level
understanding of the data and become familiar with the data set.

The following are some key descriptive analysis terms:

• Mean: the sum of the values divided by the number of values (the average)
• Mode: the most frequent value in the list
• Median: the middle value of the sorted list (the average of the two middle values when the
count is even)
• Standard deviation: a measure of how far the values typically lie from the mean (the square
root of the variance)
• Variance: the average squared deviation of the values from the mean (the square of the
standard deviation)
• Interquartile Range (IQR): the difference between the 75th and 25th percentiles, i.e. the
spread of the middle 50% of the values
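
The measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not a prescribed method: the sample values are purely illustrative, and the choice of the sample (n - 1) forms of variance and standard deviation is an assumption, since the notes do not say which form is intended.

```python
import statistics

# Illustrative sample values (an assumption, not data from a real study).
data = [2, 3, 5, 6, 10, 14, 19, 23, 23, 30, 36, 56]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted list
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # spread of the middle 50% of the values

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} iqr={iqr:.2f}")
```

Different tools use slightly different quartile conventions, so the IQR reported by another package may differ a little from the value printed here.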

Importance of Descriptive Analysis:

Descriptive statistics make data easier to present and visualise. They allow data to be
summarised in a meaningful and intelligible manner, giving a more straightforward
understanding of the data set. Analysing raw data directly would be laborious, and trends and
patterns would be hard to determine or visualise.

Exploratory Data Analysis:

In contrast to descriptive data analysis, which is a numerical approach to data analysis,
exploratory data analysis is a visual approach to data analysis. We turn to exploratory
data analysis once we have a basic comprehension of the data at hand through descriptive
analysis. Exploratory data analysis can be divided into two parts:
• Univariate analysis: analysis of a single variable (exploring the characteristics of a
single variable)
• Multivariate analysis: analysis of several variables together (comparative analysis of
multiple variables; when exactly two variables are compared, for example through their
correlation, it is called bivariate analysis)

We employ numerous types of plots and graphs to analyse data in the visual style of data
analysis. A bar plot, histogram, box plot with whiskers, violin plot, and other plots can be
used to study a single variable (univariate analysis). For multivariate analysis, we employ
scatter plots, contour plots, multi-dimensional graphs, and similar tools.
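
A scatter plot in bivariate analysis is often accompanied by a correlation value. The sketch below computes the Pearson correlation coefficient by hand so that it needs no plotting library. The function name `pearson_r` and the engine-size and price figures are hypothetical and only serve as an illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical vehicle data: engine size in litres and price in thousands.
engine_size = [1.0, 1.2, 1.5, 2.0, 2.5]
price = [5, 6, 8, 11, 14]

r = pearson_r(engine_size, price)
print(f"correlation = {r:.3f}")  # close to +1: price rises with engine size
```

A value near +1 or -1 suggests a strong linear relationship, and a value near 0 suggests little linear relationship.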

Need for Exploratory Data Analysis:

• Exploratory data analysis provides a visual representation of the data, which aids in
identifying the data’s features more clearly
• It assists us in determining which characteristics are most significant, which is very
handy when dealing with data that has a lot of dimensions (for example, dimensionality
reduction is aided by approaches such as PCA and t-SNE)
• It’s a good technique for communicating the findings to non-technical
stakeholders and executives

What are Stem and Leaf Plots?

A stem and leaf plot, also known as a stem and leaf diagram, is a way to arrange and
represent data so that it is simple to see how frequently various data values occur. It is a plot
that displays ordered numerical data.

A stem and leaf plot is shown as a special table where the digits of a data value are divided
into a stem (first few digits) and a leaf (usually the last digit). The symbol ‘|’ is used to split
and illustrate the stem and leaf values. For instance, 105 is written as 10 on the stem and 5 on
the leaf. This can be written as 10 | 5. Here, 10 | 5 = 105 is called the key. The key depicts the
data value a stem and leaf represent.

How do we Construct a Stem and Leaf Plot?
Step 1: Classify the data values in terms of the number of digits in each value, such as 2 digit
numbers or 3 digit numbers.

Step 2: Fix the key for the stem and leaf plot. For example, 2 | 5 = 25, 3 | 2 = 3.2, or
19 | 2 = 192.

Step 3: Consider the first digits as stems and the last digit as leaves.

Step 4: Find the range of the data, that is the lowest and the highest values among the data.

Step 5: Draw a vertical line. Place the stem on the left and the leaf on the right of the vertical
line.

Step 6: List the stems in the stem column. Sort them in ascending order.

Step 7: List the leaf values in the column against the stem from lowest to the highest
horizontally.
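
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper name `stem_and_leaf` is hypothetical, the values are non-negative whole numbers, and the key is fixed so that the tens form the stem and the ones digit forms the leaf. The sample values are the call durations used in Example 1.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem and leaf plot as a list of strings.

    Assumes a non-empty list of non-negative integers, with
    stem = value // 10 and leaf = value % 10.
    """
    groups = defaultdict(list)
    for v in sorted(values):            # sorting keeps the leaves in order
        groups[v // 10].append(v % 10)  # split each value into stem and leaf

    lowest, highest = min(groups), max(groups)
    width = len(str(highest))           # align stems of different lengths
    rows = []
    for stem in range(lowest, highest + 1):  # include stems with no leaves
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>{width}} | {leaves}".rstrip())
    return rows

durations = [2, 3, 5, 6, 10, 14, 19, 23, 23, 30, 36, 56]
rows = stem_and_leaf(durations)
for row in rows:
    print(row)
print("Key: 1 | 4 = 14")
```

Stems with no data (here, 4) are still printed so that gaps in the data stay visible.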

Rapid Recall

Key : 0 | 1 = 1

Solved Examples
Example 1:

The table below shows the duration of calls that Rosy makes each day. Represent the
given data using a stem and leaf plot.

Solution:

Step 1: Sort the data (number of minutes).

2, 3, 5, 6, 10, 14, 19, 23, 23, 30, 36, 56

Step 2: Choose the stems and the leaves. Because the data values range from 2 to 56, use
the tens digit for the stem and the ones digit for the leaf. Also, include the key.

Step 3: Write down the stems on the left of the vertical line.

Step 4: Write down the leaves for each stem on the right of the vertical line.

0 | 2 3 5 6
1 | 0 4 9
2 | 3 3
3 | 0 6
4 |
5 | 6

Key : 1 | 4 = 14 minutes

Example 2

The stem-and-leaf plot below shows the quiz scores of students.

 6 | 6
 7 | 0 5 7 8
 8 | 1 1 3 4 4 6 8 8 9
 9 | 0 2 9
10 | 0

Key : 6 | 6 = 6.6 points

(a) Find the number of students who scored less than 9 points.

(b) Find the number of students who scored a minimum of 9 points.

Solution:
a) There are fourteen scores less than 9 points.

They are 6.6, 7.0, 7.5, 7.7, 7.8, 8.1, 8.1, 8.3, 8.4, 8.4, 8.6, 8.8, 8.8 and 8.9.

So, fourteen students scored less than 9 points.

b) There are four scores which are at least 9 points.

They are 9.0, 9.2, 9.9, and 10.0.

So, four students scored a minimum of 9 points.

Example 3:

Construct a stem-and-leaf plot for the data in the table.

Solution:

Step 1: Sort the data values: 1, 1, 1, 2, 2, 4, 5, 5, 7, 12, 20, 23, 27, 30, 32, 33, 38, 40, 44, 47

Step 2: Choose the stems and the leaves. As the data values range from 1 to 47, use the tens
digits for the stems and the ones digits for the leaves. Be sure to include the key.

Step 3: Write the stems to the left of the vertical line from the top to bottom.

Step 4: Write the leaf values corresponding to each stem to the right of the vertical line.

0 | 1 1 1 2 2 4 5 5 7
1 | 2
2 | 0 3 7
3 | 0 2 3 8
4 | 0 4 7

Key : 0 | 1 = 1 cm

What Is a Normal Distribution?

A normal distribution is a continuous probability distribution for a random variable. A
random variable is a variable whose value depends on the outcome of a random event. For
example, flipping a coin will give you either heads or tails at random. You cannot determine
with absolute certainty whether the next outcome will be a head or a tail.

When you plot the probabilities of the outcomes of a random event, you get its probability
distribution. The distribution of a random variable that can take any value within an interval
is called a continuous probability distribution. Such a variable has infinitely many possible
values, which form a continuous curve, and the probability of any single exact value is zero.
Hence, instead of listing a probability for each value, you give the probability that the
variable lies within a range of values.

When the continuous probability distribution curve is bell-shaped, i.e., it looks like a hill with
a well-defined peak, it is said to be a normal distribution. The peak of the curve is at the
mean, and the data is symmetrically distributed on either side of it. The mean, median, and
mode are equal to each other; in sample data that is approximately normal, they lie close to
each other.
Figure 1: Normal distribution
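
Because a continuous distribution assigns probability to ranges rather than to single values, a short calculation can make the idea concrete. The sketch below uses `statistics.NormalDist` from the Python standard library. The mean of 70 and standard deviation of 10 are hypothetical test-mark parameters chosen only for illustration, and `prob_between` is a helper name introduced here.

```python
from statistics import NormalDist

# Hypothetical distribution of test marks: mean 70, standard deviation 10.
marks = NormalDist(mu=70, sigma=10)

def prob_between(dist, low, high):
    """Probability that a value drawn from dist lies between low and high."""
    return dist.cdf(high) - dist.cdf(low)

within_1_sd = prob_between(marks, 60, 80)   # mean +/- 1 standard deviation
within_2_sd = prob_between(marks, 50, 90)   # mean +/- 2 standard deviations
within_3_sd = prob_between(marks, 40, 100)  # mean +/- 3 standard deviations

print(f"within 1 sd: {within_1_sd:.4f}")
print(f"within 2 sd: {within_2_sd:.4f}")
print(f"within 3 sd: {within_3_sd:.4f}")
```

For any normal distribution these three probabilities are roughly 68%, 95%, and 99.7%, which is why most values lie close to the mean.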

Consider the marks scored in a math test by students in a class. The majority of the students
would have scored close to the average mark. Fewer students would have scored a little less
or a little more. Even fewer would be in the bottom 10% and the top 10%. Some
examples commonly treated as approximately normal are:

1. Blood pressure of people
2. I.Q. scores
3. Salaries (only roughly, since salary data is often skewed to the right)

Measures of Skewness and Kurtosis

Skewness refers to the degree of symmetry, or more precisely, the degree of lack of
symmetry. Distributions, or data sets, are said to be symmetric if they appear the same on both sides
of a central point. Kurtosis refers to the degree to which the tails of a distribution are heavier or
lighter than those of a normal distribution.

What Is Skewness?
Skewness measures the level of asymmetry in a distribution, that is, how far its shape
deviates from the symmetric shape of a normal distribution.

Sometimes a distribution tilts more to one side. This happens when values are spread out
further on one side of the peak than on the other, so that one tail is longer and the
distribution is asymmetrical. The skewness can be of two types:

1. Positively Skewed: In a distribution that is Positively Skewed, the values are more
concentrated towards the left side, and the right tail is spread out. The few large values in
the right tail pull the mean to the right of the median and the mode, so the skewness value is
positive. In this distribution, typically Mean > Median > Mode.

Figure 2: Positively Skewed

2. Negatively Skewed: In a Negatively Skewed distribution, the data points are more
concentrated towards the right-hand side of the distribution, and the left tail is spread out.
The few small values in the left tail pull the mean to the left of the median and the mode, so
the skewness value is negative. In this distribution, typically Mode > Median > Mean.

Figure 3: Negatively Skewed

Pearson’s First Coefficient

In a skewed distribution the median typically lies between the mean and the mode, so the
mean and mode are the extremes, and you can derive a formula based on the horizontal
distance between mean and mode.

Figure 4: Pearson’s First Coefficient

Sk1 = (Mean - Mode) / Standard deviation

The above formula gives you Pearson's first coefficient. Division by the standard deviation
scales the difference between mean and mode, which makes the coefficient unitless and
comparable across data sets. For moderately skewed distributions, the mode, mean, and
median approximately satisfy the relationship below.

Figure 5: Mode in terms of mean and median

Mode ≈ 3 × Median - 2 × Mean

Substituting this in Pearson’s first coefficient gives us Pearson’s second coefficient and the
formula for skewness:

Figure 6: Pearson’s Second Coefficient

Sk2 = 3 × (Mean - Median) / Standard deviation

Interpret the value as follows:

1. Between -0.5 and 0.5: the distribution is almost symmetrical.
2. Between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed): the
skewness is moderate.
3. Lower than -1 (negatively skewed) or greater than 1 (positively skewed): the data is
highly skewed.
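
As a rough illustration of the rule above, the sketch below computes Pearson's second coefficient, 3 × (Mean - Median) / Standard deviation, and classifies the result. The function names are hypothetical, the use of the sample standard deviation is an assumption, and values exactly on a boundary (such as 0.5) are assigned to the lower band, which the notes do not specify.

```python
import statistics

def pearson_second_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (assumption)
    return 3 * (mean - median) / std_dev

def classify_skewness(sk):
    """Map a skewness value onto the three bands listed above."""
    direction = "positive" if sk > 0 else "negative"
    if abs(sk) <= 0.5:
        return "approximately symmetric"
    if abs(sk) <= 1:
        return f"moderately skewed ({direction})"
    return f"highly skewed ({direction})"

# Illustrative values with a longer right tail.
data = [2, 3, 5, 6, 10, 14, 19, 23, 23, 30, 36, 56]
sk = pearson_second_skewness(data)
print(f"skewness = {sk:.3f}: {classify_skewness(sk)}")
```

Here the mean exceeds the median, so the coefficient is positive, but it stays below 0.5 and the sample counts as almost symmetrical under this rule.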

What Is Kurtosis?
Kurtosis measures how heavy the tails of a distribution are compared with a normal
distribution. Heavier tails mean that extreme values (outliers) are more likely, so kurtosis is
often used as an indicator of how outlier-prone the data is.

A distribution can be light-tailed with a flatter peak, almost like pressing the distribution
down. This is called Negative Kurtosis (Platykurtic). If the distribution is heavy-tailed with
a sharper peak, like pulling the distribution up, it is called Positive
Kurtosis (Leptokurtic).

Figure 7: (a) Leptokurtic, (b) Normal Distribution, (c) Platykurtic

The kurtosis of a normal distribution is 3. A kurtosis greater than three indicates Positive
Kurtosis, and a kurtosis less than three indicates Negative Kurtosis. Subtracting 3 gives the
excess kurtosis, which is 0 for a normal distribution. For Positive Kurtosis the excess ranges
from 0 to infinity, and for Negative Kurtosis it ranges from -2 to 0. The greater the value of
kurtosis, the heavier the tails and, typically, the sharper the peak.

Figure 8: Excess Kurtosis
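
Excess kurtosis can be checked numerically. The sketch below uses the moment-based population definition (fourth central moment divided by the squared variance), which is an assumption because the notes do not give a formula. The function names and sample lists are illustrative only.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment / (second central moment)^2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2

def describe_tails(values):
    """Label the data by the sign of its excess kurtosis (kurtosis - 3)."""
    excess = kurtosis(values) - 3
    if excess > 0:
        return "leptokurtic"   # heavier tails than a normal distribution
    if excess < 0:
        return "platykurtic"   # lighter tails than a normal distribution
    return "normal-like"

evenly_spread = [1, 2, 3, 4, 5]                  # no extreme values
with_outlier = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # one extreme value

for sample in (evenly_spread, with_outlier):
    print(f"kurtosis={kurtosis(sample):.2f} -> {describe_tails(sample)}")
```

The evenly spread list has a kurtosis of 1.7 (excess -1.3), while the single extreme value in the second list pushes the kurtosis well above 3.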

Hence, you can say that Skewness and Kurtosis describe how the shape of a distribution
differs from a normal distribution. Skewness denotes the horizontal pull on the data. It tells
you how lopsided the data is, and Kurtosis describes the vertical pull, that is, the weight of
the tails and the sharpness of the peak.
