Unit2 Modified
preprocessing
Loading a dataset
• Several libraries can load a dataset into
Python; pandas is one of the most commonly
used for data manipulation and analysis.
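As a minimal sketch of loading a dataset with pandas, the example below reads CSV data with `pd.read_csv`. The column names and values are made up for illustration, and the data is read from an in-memory string so the snippet runs without a file; in practice you would pass a file path (for example a hypothetical "data.csv") instead.

```python
import io

import pandas as pd

# Hypothetical CSV content used only for illustration.
# With a real file, call pd.read_csv("data.csv") instead.
csv_text = """name,age,score
Asha,23,88.5
Ben,31,92.0
Chen,27,79.5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the dataset
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type inferred for each column
```

Checking `head()`, `shape`, and `dtypes` right after loading is a quick way to confirm the file was parsed as expected.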
Statistical measures
• Statistical measures, also known as summary
statistics or descriptive statistics, are
numerical values or techniques used to
summarize and describe a dataset.
• These measures provide a concise overview of
key characteristics of the data, helping to
understand its central tendency, variability,
and distribution.
Skewness
Kurtosis
• Kurtosis is a statistical measure that describes
how heavy the tails of a distribution are, that
is, how much of the data lies in the extreme
values compared to the center (mean) of the
distribution.
• There are three main types of kurtosis, defined here in terms of
excess kurtosis (kurtosis minus 3, so that a normal distribution
scores zero):
• Mesokurtic: A mesokurtic distribution has excess kurtosis equal
to zero. Its tails are neither heavier (leptokurtic) nor lighter
(platykurtic) than those of a normal distribution.
• Leptokurtic: A leptokurtic distribution has positive excess
kurtosis. It has heavier tails, and often a sharper peak around
the mean, compared to a normal distribution.
• Platykurtic: A platykurtic distribution has negative excess
kurtosis. It has lighter tails, and often a flatter peak around
the mean, compared to a normal distribution.
• In summary, skewness describes the asymmetry of the
distribution, while kurtosis describes the tails of the
distribution. Both measures can provide valuable insights into
the characteristics of a dataset, such as whether it is skewed,
whether it has extreme values in the tails, and how it deviates
from a normal distribution.
• Researchers and statisticians often use skewness and kurtosis
in combination with other descriptive statistics to gain a
comprehensive understanding of data distributions.
• These statistical measures are essential for
summarizing and gaining insights from
datasets in various fields, including statistics,
data science, and research.
• Depending on the characteristics of the data
and the research question, different measures
may be more relevant or informative.
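The skewness and kurtosis measures described above can be computed with the pandas `Series.skew()` and `Series.kurt()` methods. The sketch below uses two small made-up samples; note that `kurt()` reports excess kurtosis, so a normal distribution would score about zero.

```python
import pandas as pd

# Made-up samples for illustration only.
symmetric = pd.Series([1, 2, 3, 4, 5])         # evenly spread around the mean
right_skewed = pd.Series([1, 1, 2, 2, 3, 20])  # one large value in the right tail

# Skewness: about zero for symmetric data, positive for a long right tail.
print("skew (symmetric):   ", symmetric.skew())
print("skew (right-skewed):", right_skewed.skew())

# Excess kurtosis: negative means lighter tails than a normal distribution
# (platykurtic), positive means heavier tails (leptokurtic).
print("kurt (symmetric):   ", symmetric.kurt())
print("kurt (right-skewed):", right_skewed.kurt())
```

With samples this small the estimates are rough; they are meant only to show the sign conventions.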
statistics
• To calculate and examine basic summary
statistics like mean, median, mode, standard
deviation, and range for a dataset in Python,
you can use the pandas library.
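A minimal sketch of these calculations follows, using a small made-up series of scores. The variable names are illustrative; `std()` in pandas returns the sample standard deviation (dividing by n - 1) by default.

```python
import pandas as pd

# Made-up scores for illustration only.
scores = pd.Series([70, 85, 85, 90, 100])

mean = scores.mean()                      # arithmetic average
median = scores.median()                  # middle value when sorted
mode = scores.mode()                      # most frequent value(s), returned as a Series
std = scores.std()                        # sample standard deviation (ddof=1)
value_range = scores.max() - scores.min() # largest minus smallest value

print("mean:  ", mean)
print("median:", median)
print("mode:  ", mode.tolist())
print("std:   ", std)
print("range: ", value_range)

# describe() reports count, mean, std, min, quartiles, and max in one call.
print(scores.describe())
```

For a whole DataFrame, `df.describe()` produces the same summary for every numeric column.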
Data Cleaning: