Intro To Statistics and Assignments

The document discusses the role of statistics in data science, highlighting descriptive and inferential statistics. It details types of data, measures of central tendency, variability, and the importance of variance and standard deviation in understanding data spread. Additionally, it provides an example of creating age-based insurance premiums and assignments related to statistical calculations.

Uploaded by

ashwin150900014

Statistics:

In data science, statistics plays a pivotal role in data analysis and
decision-making. It involves gathering, analysing, and visualizing data in
order to draw conclusions from it.
Descriptive statistics describe the main features of a data set, while inferential
statistics use sample data to draw conclusions about a larger population.

Types of Data:

Numerical Data:
• Discrete
eg:
1. No. of students in Data Science People click
2. No. of voters in RS Puram

• Continuous:
Height, Weight…

Categorical Data:
1.Eg:
Gender: Male, Female

2.Eg:
Problem Statement: Designing Age-Based Insurance Premiums for Families
(INR)
Objective:
An insurance company wants to create family insurance plans with premiums
tailored to different age groups. The goal is to categorize family members by
age, determine appropriate premiums based on risk, and ensure affordability
while maintaining profitability.

Age Categories and Premiums (INR):


1. Children (1–12 years)
o Low medical risk: Basic healthcare needs such as vaccinations and
minor illnesses.
o Premium range: ₹1,000 – ₹2,000 per month.
2. Teens (13–19 years)
o Moderate risk: Higher likelihood of sports injuries, mental health
care.
o Premium range: ₹2,000 – ₹3,500 per month.
3. Middle-aged Adults (21–59 years)
o Higher risk: Lifestyle-related health issues such as hypertension
and diabetes.
o Premium range: ₹4,000 – ₹8,000 per month.
4. Old-aged Adults (60+ years)
o Highest risk: Chronic diseases, frequent hospital visits, critical care.
o Premium range: ₹10,000 – ₹15,000 per month.
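The age categories above can be sketched as a small lookup in Python. This is only an illustration: the names PREMIUM_BANDS and premium_band and the sample family ages are hypothetical, and the function returns None for any age that the list above does not assign to a category (for example, age 20 falls between the Teens and Middle-aged Adults bands as written).

```python
# Age bands and monthly premium ranges (INR) copied from the list above.
# Each entry: (category, min_age, max_age, min_premium, max_premium).
# A max_age of None means the band has no upper limit.
PREMIUM_BANDS = [
    ("Children", 1, 12, 1000, 2000),
    ("Teens", 13, 19, 2000, 3500),
    ("Middle-aged Adults", 21, 59, 4000, 8000),
    ("Old-aged Adults", 60, None, 10000, 15000),
]


def premium_band(age):
    """Return (category, min_premium, max_premium), or None if no band covers the age."""
    for name, low, high, p_min, p_max in PREMIUM_BANDS:
        if age >= low and (high is None or age <= high):
            return name, p_min, p_max
    return None  # ages not covered by the list above, e.g. 0 or 20


family_ages = [8, 16, 45, 67]  # hypothetical family
for age in family_ages:
    print(age, premium_band(age))
```

Keeping the bands in one table makes it easy to adjust the ranges later without touching the lookup logic.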
Types of Data and Level of Measurements:
Descriptive statistics – Describes the data

Descriptive statistics summarize a data set's characteristics using measures
such as the mean, median, and standard deviation. They are limited to the data
collected and are used to present and summarize it.

Measure of Central Tendency:


Mean:
1.Eg: Age of people visiting the tuition centre
Age: {18,20,25,20,15,10}

2.Eg: Age of people visiting the tuition centre


Age: {18,20,25,75,15,10}

Step 1: Sum of all values
18 + 20 + 25 + 75 + 15 + 10 = 163
Step 2: Number of values
There are 6 values in the data set.
Step 3: Mean = Sum of all values / Number of values
163 / 6 ≈ 27.17
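The same steps can be sketched in Python for both age sets above. The variable names are only illustrative. Comparing the two results shows how a single large value (75) pulls the mean upward.

```python
ages_1 = [18, 20, 25, 20, 15, 10]  # first example
ages_2 = [18, 20, 25, 75, 15, 10]  # second example, with 75 in place of 20

# Mean = sum of all values / number of values
mean_1 = sum(ages_1) / len(ages_1)
mean_2 = sum(ages_2) / len(ages_2)

print(round(mean_1, 2))
print(round(mean_2, 2))
```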
Median:
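The median is the middle value of the sorted data; when the number of values is even, it is the average of the two middle values. A minimal sketch for the two age sets used for the mean, assuming Python's standard statistics module:

```python
from statistics import median

ages_1 = [18, 20, 25, 20, 15, 10]
ages_2 = [18, 20, 25, 75, 15, 10]

# Sorted, both lists have 18 and 20 as their two middle values.
print(sorted(ages_1), median(ages_1))
print(sorted(ages_2), median(ages_2))
```

Both sets give the same median, so unlike the mean, the median is not shifted by the single large value 75.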

Mode:
Eg: Age of people attending Maths Tuition in RS Puram.
Age: {19,20,21,25,21,24,25,21,20,18,24,24}
Ages 21 and 24 each occur three times, more often than any other age, so this
data set has two modes. The most common ages among people attending Maths
tuition in RS Puram are therefore 21 and 24.
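As a rough check, the frequencies can be counted in Python. This sketch assumes Python 3.8 or later, where statistics.multimode is available; it returns every value that shares the highest count.

```python
from collections import Counter
from statistics import multimode

ages = [19, 20, 21, 25, 21, 24, 25, 21, 20, 18, 24, 24]

counts = Counter(ages)   # how many times each age occurs
modes = multimode(ages)  # all ages tied for the highest count

print(counts)
print(modes)
```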
Measures of Variability/Dispersion (Spread):
Range:
1.Eg: Age of people visiting the tuition centre
Age: {18,20,25,20,15,10}
Range = Max – Min
Range = 25 – 10 = 15
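The same calculation in Python, as a short sketch. The second list is the age set with 75 from the mean example, included only to show how strongly one extreme value changes the range.

```python
ages_1 = [18, 20, 25, 20, 15, 10]
ages_2 = [18, 20, 25, 75, 15, 10]

# Range = largest value - smallest value
range_1 = max(ages_1) - min(ages_1)
range_2 = max(ages_2) - min(ages_2)

print(range_1)
print(range_2)
```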

Variance:
Variance and Standard Deviation
Variance and Standard Deviation are measures of how spread out a set of data
is around the mean.
• Variance measures the average squared deviation of each data point
from the mean.
• Standard Deviation is the square root of the variance, providing a
measure of spread in the same units as the data.
Normal Distribution (Overview):
Variance
• Definition: Variance is the average of the squared differences between
each data point and the mean of the dataset. It provides a measure of
how spread out the data is.
• Interpretation:
o Higher variance means the data points are more spread out from
the mean. This indicates greater variability in the dataset.
o Lower variance suggests the data points are closer to the mean,
showing less variability.
• Units: Variance is expressed in squared units of the original data, which
makes it harder to interpret directly in the context of the data.
Standard Deviation
• Definition: Standard deviation is the square root of the variance. It gives
a measure of the spread of data points around the mean, but in the
same units as the original data.
• Interpretation:
o Higher standard deviation indicates more variability in the
dataset.
o Lower standard deviation suggests that the data points are closer
to the mean.
• Units: Standard deviation is expressed in the same units as the data
itself, making it more interpretable than variance.
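The definitions above can be worked through for the tuition-centre ages {18,20,25,20,15,10}. This is a sketch using Python's standard statistics module. The definition given here (the average of the squared deviations) corresponds to the population variance, pvariance; the sample variance, which divides by n - 1 instead of n, is shown alongside for comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

ages = [18, 20, 25, 20, 15, 10]

mean_age = sum(ages) / len(ages)                      # 108 / 6 = 18
squared_devs = [(x - mean_age) ** 2 for x in ages]    # 0, 4, 49, 4, 9, 64

# Population variance: average of the squared deviations (divide by n).
pop_var = sum(squared_devs) / len(ages)
pop_sd = pop_var ** 0.5

print(round(pop_var, 2), round(pvariance(ages), 2))   # manual vs library
print(round(pop_sd, 2), round(pstdev(ages), 2))

# Sample variance: divide by n - 1 instead of n.
sample_var = variance(ages)
sample_sd = stdev(ages)
print(sample_var, round(sample_sd, 2))
```

The variance comes out in squared years, while the standard deviation is in years, which is why the standard deviation is easier to relate back to the ages themselves.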
Comparison & Inferences:
• Direct comparison: Both measure the dispersion of data, but the
standard deviation is generally more useful for understanding the spread
in real-world terms because it's in the same units as the data.
• Understanding variability: If you know the standard deviation, you can
easily gauge how far, on average, data points are from the mean.
• For data analysis: If you're comparing datasets with similar means, the
dataset with the higher standard deviation (or variance) will show more
fluctuation or unpredictability.
In essence, while variance tells you about the overall spread, standard
deviation provides a more intuitive understanding of that spread in the context
of the original data's units.
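To illustrate the comparison, here is a small sketch with two made-up data sets (hypothetical values, not from the notes) that share the same mean but differ in spread:

```python
from statistics import mean, pstdev

# Hypothetical data sets with the same mean (50) but different spread.
steady = [48, 49, 50, 51, 52]
volatile = [30, 40, 50, 60, 70]

print(mean(steady), round(pstdev(steady), 2))
print(mean(volatile), round(pstdev(volatile), 2))
```

The means alone cannot tell the two apart; the larger standard deviation of the second set is what signals greater fluctuation.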

Inferential statistics
Inferential statistics, on the other hand, deals with drawing conclusions about
an entire population from a small sample collected from it. For example, during
an election, pollsters who want to predict the result conduct a survey in
various parts of the state or country and record people's opinions. Based on
the information collected, they draw conclusions and make inferences to predict
the result for the entire population.
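The polling idea can be sketched as a small simulation. Everything here is hypothetical: the population size, the 55% level of support, and the sample size of 1,000 are made-up numbers chosen only to show a sample being used to estimate a population value.

```python
import random

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 100,000 voters, 55,000 of whom support candidate A.
# 1 = supports candidate A, 0 = does not.
population = [1] * 55_000 + [0] * 45_000
true_support = sum(population) / len(population)

# Survey a random sample of 1,000 voters (sampling without replacement).
sample = random.sample(population, 1_000)
estimated_support = sum(sample) / len(sample)

print(f"True support:      {true_support:.3f}")
print(f"Estimated support: {estimated_support:.3f}")
```

The estimate will usually be close to, but not exactly equal to, the true value; quantifying that gap is what inferential statistics is concerned with.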
Assignments:
1)
Calculate the mean, median, mode, variance, standard deviation and range for
the problem statement given below.
The number of calls from motorists per day for roadside service was recorded
for a particular month:
28, 122, 217, 130, 120, 86, 80, 90, 140, 120, 70, 40, 145, 113, 90, 68, 174, 194,
170, 100, 75, 104, 97, 75, 123, 100, 75, 104, 97, 75, 123, 100, 89, 120, 109.

2)
a) Find the mean, median, mode, variance, standard deviation, and total for
Annual Income.

b) Filter the records whose age is less than or equal to 20, using only the
Age, Gender and Annual Income (k$) columns.

Syntax: Filtered_data = data[data['Age'] <= 20]

c) Filter the records whose age is less than or equal to 40, using only the
Age, Gender and Annual Income (k$) columns.

d) Follow the syntax given above to filter the records whose age is above 20
and below 40, using only the Age, Gender and Annual Income (k$) columns.

Hint: Use (&) AND Operator
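As a rough illustration of the & operator in pandas, the sketch below uses a small made-up DataFrame in place of the attached file, with different age limits from the ones in the assignment. It assumes the column names are exactly Age, Gender and Annual Income (k$); the values and the name subset are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the attached CSV; the values are made up.
data = pd.DataFrame({
    "Age": [19, 21, 35, 40, 58],
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Annual Income (k$)": [15, 16, 28, 54, 70],
})

# Keep only the three columns of interest.
subset = data[["Age", "Gender", "Annual Income (k$)"]]

# Each condition needs its own parentheses, because & binds more
# tightly than the comparison operators > and <.
Filtered_data = subset[(subset["Age"] > 25) & (subset["Age"] < 50)]
print(Filtered_data)
```

Leaving out the parentheses around each condition is a common mistake and typically raises an error rather than filtering the rows.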

Note: The file Customer Segmentation.csv is attached on WhatsApp.
