
Adsexp 1

The document outlines an experiment focused on studying and implementing descriptive and inferential statistics using a dataset. It explains the concepts of descriptive statistics, including measures of central tendency and dispersion, as well as inferential statistics methods like regression analysis and hypothesis testing. The document also includes a Python program for calculating and visualizing statistical measures using the Iris dataset.

Uploaded by om29khatri
Copyright © All Rights Reserved

Roll no:

Date:
EXPERIMENT NO.:01

Aim: Study and implement descriptive and inferential statistics on a given dataset.

Theory:
Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting,
presenting, and organizing data. It involves the study of methods for gathering, summarizing,
and interpreting data to make informed decisions and draw meaningful conclusions.

Descriptive Statistics: Descriptive statistics is the analysis of data that helps to describe, show,
and summarize data in a meaningful way. It is a simple way to describe our data, and it is important
for presenting raw data in an effective, meaningful form using numerical calculations, graphs, or
tables. This type of statistics is applied to data that has already been collected.
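As a minimal sketch of the "numerical calculations or tables" idea, pandas' DataFrame.describe() builds a summary table of the usual descriptive measures in one call. The exam scores below are hypothetical values invented only for illustration.

```python
import pandas as pd

# Hypothetical exam scores for five students (illustrative values, not real data)
scores = pd.DataFrame({
    "math": [72, 85, 90, 66, 78],
    "physics": [68, 80, 95, 70, 77],
})

# describe() tabulates count, mean, std, min, quartiles and max for each numeric column
summary = scores.describe()
print(summary.round(2))
```

The "50%" row is the median, so a single table already covers central tendency (mean, median) and spread (std, min, max, quartiles).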

Types of Descriptive Statistics:

1. Measures of Central Tendency
2. Measures of Dispersion (Spread)
3. Measures of Shape (Skewness and Kurtosis)

1. Measures of Central Tendency:

● Mean: Calculated as the sum of values divided by the number of values. It is sensitive to
outliers. In Python, assuming tips is seaborn's tips dataset loaded as a pandas DataFrame
(tips = sns.load_dataset('tips')), the mean can be calculated with the following code.
round(tips['tip'].mean(), 3)
● Median: The middle value when the data is sorted; useful when dealing with skewed data. The
median can be calculated in Python with the following code.
tips['tip'].median()
● Mode: The most frequent value in the dataset; useful for categorical data. The mode can be
calculated with the following code (it returns a Series, because a column can have more than
one mode).
tips['day'].mode()
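The snippet below is a small self-contained sketch contrasting the mean and the median when an outlier is present. It uses a hypothetical six-row DataFrame named tips_demo in place of the real tips dataset, so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical tip amounts and days (not the real tips dataset); 25.0 is a deliberate outlier
tips_demo = pd.DataFrame({
    "tip": [2.0, 3.0, 3.0, 3.5, 4.0, 25.0],
    "day": ["Sat", "Sun", "Sun", "Sat", "Sun", "Fri"],
})

mean_tip = round(tips_demo["tip"].mean(), 3)  # pulled upward by the outlier
median_tip = tips_demo["tip"].median()        # average of the two middle sorted values
mode_day = tips_demo["day"].mode()            # Series, since ties would give several modes

print("Mean tip:", mean_tip)
print("Median tip:", median_tip)
print("Most frequent day:", list(mode_day))
```

Here the single outlier pulls the mean up to 6.75 while the median stays at 3.25, which is why the median is preferred for skewed data; the mode of the categorical day column is "Sun".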
2. Measures of Dispersion (Spread):
● Range: The difference between the maximum and minimum values.
tips['tip'].max() - tips['tip'].min()

● Variance: The average of the squared differences from the mean. Sensitive to outliers. Note
that pandas' var() computes the sample variance by default, dividing by n - 1 instead of n.
round(tips['tip'].var(), 3)

● Standard Deviation: The square root of the variance. Provides a measure of data spread
in the same units as the original data.
round(tips['tip'].std(), 3)
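A minimal sketch of the spread measures on a hypothetical series of tip amounts, showing the difference between pandas' default sample variance (ddof=1) and the population variance that matches the "average of squared differences" definition (ddof=0).

```python
import pandas as pd

# Hypothetical tip amounts used only for illustration
tip = pd.Series([2.0, 3.0, 3.0, 3.5, 4.0, 4.5])

data_range = tip.max() - tip.min()

# pandas divides by n - 1 by default (sample variance, ddof=1)
sample_var = round(tip.var(), 3)
# Dividing by n (ddof=0) gives the population variance, the plain average of squared deviations
population_var = round(tip.var(ddof=0), 3)

# Standard deviation: square root of the (sample) variance, in the same units as the data
sample_std = round(tip.std(), 3)

print("Range:", data_range)
print("Sample variance (n - 1):", sample_var)
print("Population variance (n):", population_var)
print("Sample standard deviation:", sample_std)
```

For these values the range is 2.5, the sample variance is 0.767 and the population variance is 0.639; the gap between the two shrinks as the number of observations grows.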

3. Measures of Shape: Skewness and Kurtosis

● Skewness: Measures the asymmetry of the data distribution (whether the data is skewed
to the left or right).
● Kurtosis: Measures the "tailedness" of the data distribution (how heavy or light the tails
are, and how extreme the values are compared to a normal distribution).
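No code is given for these two measures; one way to compute them, assuming pandas as in the snippets above, is Series.skew() and Series.kurt(). Note that kurt() reports excess kurtosis, so a normal distribution scores about 0 rather than 3. Both series are hypothetical.

```python
import pandas as pd

# Hypothetical samples chosen to illustrate the two shape measures
symmetric = pd.Series([1, 2, 3, 4, 5, 6, 7])                # evenly spread, no tail
right_skewed = pd.Series([1, 1, 2, 2, 2, 3, 3, 4, 15])      # long tail to the right

# skew(): bias-corrected sample skewness; ~0 if symmetric, > 0 for a longer right tail,
# < 0 for a longer left tail
print("Skewness (symmetric):", round(symmetric.skew(), 3))
print("Skewness (right-skewed):", round(right_skewed.skew(), 3))

# kurt(): bias-corrected *excess* kurtosis (Fisher's definition); ~0 for a normal
# distribution, > 0 for heavy tails, < 0 for light tails such as evenly spread data
print("Excess kurtosis (symmetric):", round(symmetric.kurt(), 3))
print("Excess kurtosis (right-skewed):", round(right_skewed.kurt(), 3))
```

For the evenly spread series 1 to 7 the skewness is 0 and the excess kurtosis is -1.2 (lighter tails than a normal distribution), while the series containing the outlier 15 is positive on both measures.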

Inferential Statistics: Inferential statistics uses a random sample of data taken from a population
to describe and make inferences about that population. The population is the complete group of data
you are interested in. Because studying the whole population is often impractical, inferential
statistics lets you make predictions from a smaller sample instead.
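A minimal sketch of the sample-to-population idea, assuming a simulated population: the heights below are generated with NumPy purely for illustration, and the 95% interval uses the normal approximation (z = 1.96), which is reasonable for a sample this large.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: simulated heights (cm) of 100,000 students
population = rng.normal(loc=170, scale=10, size=100_000)

# In practice only a sample is observed: draw 400 students at random without replacement
sample = rng.choice(population, size=400, replace=False)

sample_mean = sample.mean()
# Standard error of the mean, using the sample standard deviation (ddof=1)
std_error = sample.std(ddof=1) / np.sqrt(len(sample))

# Approximate 95% confidence interval for the population mean (normal approximation)
ci_low = sample_mean - 1.96 * std_error
ci_high = sample_mean + 1.96 * std_error

print(f"Population mean (normally unknown): {population.mean():.2f}")
print(f"Sample mean (estimate):             {sample_mean:.2f}")
print(f"95% CI for the population mean:     ({ci_low:.2f}, {ci_high:.2f})")
```

Only 400 of the 100,000 values are used, yet the sample mean lands close to the population mean, and the interval expresses how uncertain that estimate is.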

Types of Inferential Statistics:

1. Regression Analysis
2. Hypothesis Testing

1. Regression Analysis: Estimates how one variable (the dependent variable) changes in relation
to one or more other variables (the independent variables). Linear regression is the most common
type of regression used in inferential statistics.
2. Hypothesis Testing: A method to draw conclusions about a population parameter based on
sample data. It involves setting null and alternative hypotheses, choosing a significance
level, and using the p-value to decide whether to reject the null hypothesis.
● Z-Test: mainly used when comparing two groups, or a sample mean to a population mean,
especially with large sample sizes or when the population standard deviation is known.
● ANOVA (Analysis of Variance): used when comparing three or more groups to assess whether
there is any significant difference between their means. It is widely used in experimental
designs involving multiple groups.
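The two tests described above can be sketched with SciPy as follows. All numbers are hypothetical: the z-test uses an assumed known population standard deviation (sigma = 8), and the ANOVA uses three small toy groups chosen so the F statistic is easy to check by hand.

```python
from scipy import stats

alpha = 0.05  # significance level used for both tests

# One-sample z-test (hypothetical numbers)
# H0: population mean = 50, H1: population mean != 50
# The z-test assumes the population standard deviation sigma is known
sample_mean, mu0, sigma, n = 52.0, 50.0, 8.0, 100
z = (sample_mean - mu0) / (sigma / n ** 0.5)
p_z = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the standard normal upper tail
print(f"z-test: z = {z:.2f}, p = {p_z:.4f}, reject H0: {p_z < alpha}")

# One-way ANOVA on three toy groups
# H0: all group means are equal, H1: at least one group mean differs
group_a = [70, 72, 74]
group_b = [75, 77, 79]
group_c = [80, 82, 84]
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}, reject H0: {p_anova < alpha}")
```

With these numbers z = 2.5 and the two-sided p-value is about 0.012, so H0 is rejected at alpha = 0.05. For the toy groups the between-group mean square is 75 and the within-group mean square is 4, giving F = 18.75, which is also significant at that level.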
Aspect | Descriptive Statistics | Inferential Statistics
Purpose | Summarizes and describes the characteristics of a data set. | Makes predictions or generalizations about a population based on a sample.
Scope | Deals with observed data (specific to the sample or population). | Deals with inferences and predictions about a larger population based on a sample.
Outcome | Provides summary measures and visual representations (e.g., mean, variance). | Draws conclusions, tests hypotheses, and makes predictions beyond the data (e.g., p-values, confidence intervals).
Key Methods | Mean, Median, Mode, Standard Deviation, Range, Histograms, Bar Charts | Hypothesis Testing (t-tests, chi-square tests), Confidence Intervals, Regression, ANOVA, Correlation
Example | Calculating the average age of a group of students; creating a histogram of test scores | Using data from a sample to predict the average height of a population of students, or testing whether two groups have different average heights
Data Focus | Describes the actual data collected. | Makes inferences about a population or future events based on the sample.
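Regression is listed both as a type of inferential statistics and among the key inferential methods in the comparison. A minimal sketch with scipy.stats.linregress is shown below; the hours-studied and score values are hypothetical.

```python
from scipy import stats

# Hypothetical data: hours studied (x) and test score (y) for five students
hours = [1, 2, 3, 4, 5]
score = [60, 70, 75, 70, 75]

# Ordinary least-squares fit of: score = intercept + slope * hours
result = stats.linregress(hours, score)

print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"R^2 = {result.rvalue ** 2:.2f}")
# p-value of the two-sided test of H0: slope = 0 (no linear relationship)
print(f"p-value for the slope = {result.pvalue:.3f}")

# Use the fitted line to predict the score for 6 hours of study
predicted = result.intercept + result.slope * 6
print(f"Predicted score for 6 hours: {predicted:.1f}")
```

The fit gives slope 3.00 and intercept 61.00 (R^2 = 0.60), so the predicted score for 6 hours is 79.0. With only five points, however, the slope's p-value is above 0.05, which illustrates the inferential question of whether the relationship would hold in the wider population.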
Program:
# Import necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris dataset into a DataFrame (one column per feature)
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Calculate mean, median, and mode for each feature
mean_values = iris_df.mean()
median_values = iris_df.median()
# DataFrame.mode() returns one row per tied mode (sorted); take the first (smallest) one.
# This replaces stats.mode(x)[0][0], which fails on recent SciPy versions.
mode_values = iris_df.mode().iloc[0]

# Display results
print("Mean values:")
print(mean_values)

print("\nMedian values:")
print(median_values)

print("\nMode values:")
print(mode_values)

# Visualization - KDE plots with mean, median, and mode
# Create a 2x2 grid of subplots, one per feature
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot KDE for each feature on its own axes
for ax, feature in zip(axes.flat, iris.feature_names):
    # Plot KDE using seaborn (fill=True replaces the deprecated shade=True)
    sns.kdeplot(iris_df[feature], fill=True, color='gray', label='KDE', ax=ax)

    # Plot vertical lines for mean, median, and mode
    ax.axvline(mean_values[feature], color='r', linestyle='--',
               label=f'Mean: {mean_values[feature]:.2f}')
    ax.axvline(median_values[feature], color='g', linestyle='-',
               label=f'Median: {median_values[feature]:.2f}')
    ax.axvline(mode_values[feature], color='b', linestyle='-.',
               label=f'Mode: {mode_values[feature]:.2f}')

    # Set the title and legend
    ax.set_title(f'KDE of {feature}')
    ax.legend()

# Adjust layout for better spacing
plt.tight_layout()
plt.show()

Output:
Conclusion: Thus, we studied descriptive and inferential statistics. Descriptive statistics
summarize and visualize data, while inferential statistics make predictions and generalizations
about a population based on sample data.
