Ads 1 Exp
Aim: To Explore the Descriptive Statistics on the given dataset.
Theory: Statistics serves as the backbone of data science, providing tools and methodologies
to extract meaningful insights from raw data. Data scientists rely on statistics for every
crucial task, from cleaning messy datasets and creating visualizations to building
predictive models that forecast future outcomes. Without statistics, we cannot transform raw
data into actionable insights that drive business success.
Definition: Descriptive statistics is the branch of statistics that involves summarizing,
organizing, and presenting data meaningfully and concisely. It focuses on describing and
analysing a dataset's main features and characteristics without making any generalizations or
inferences to a larger population. The goal is to make the data easier to understand.
Types of descriptive statistics and their techniques:
Measures of central tendency: Measures of central tendency focus on the average or
middle values of data sets, whereas measures of variability focus on the dispersion of
data.
1. Mean: The sum of the observations divided by the total number of observations. It is
also called the average.
2. Mode: The most frequently occurring value in the dataset. It’s useful for categorical
data and in cases where knowing the most common choice is crucial.
3. Median: The middle value of the dataset once the values are sorted, splitting the data
into two halves. If the number of elements in the data set is odd, the centre element is the
median; if it is even, the median is the average of the two central elements.
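The three measures above can be sketched in Python as follows. This is a minimal illustration using the standard-library `statistics` module; the lists `scores` and `even_scores` are hypothetical sample values and are not taken from the experiment's dataset.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
scores = [4, 8, 6, 5, 3, 8, 9]

mean_value = statistics.mean(scores)      # sum of observations / number of observations
median_value = statistics.median(scores)  # middle value of the sorted data
mode_value = statistics.mode(scores)      # most frequently occurring value

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)

# With an even number of observations, the median is the average
# of the two central values of the sorted data.
even_scores = [3, 4, 5, 6]
even_median = statistics.median(even_scores)
print("Median (even count):", even_median)
```

For `scores`, the sorted data are 3, 4, 5, 6, 8, 8, 9, so the median is 6 and the mode is 8; for `even_scores`, the median is the average of 4 and 5.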
Measures of variability: Measures of variability (or measures of spread) aid in analysing how
dispersed the distribution is for a set of data. For example, while the measures of central
tendency may give a person the average of a data set, they do not describe how the data are
distributed within the set.
1. Range: Describes the difference between the largest and smallest data point in our
data set. The bigger the range, the more the spread of data and vice versa.
2. Variance: The average squared deviation from the mean. It is calculated by finding the
difference between every data point and the mean, squaring each difference, adding the
squares, and then dividing by the number of data points in the data set. (This gives the
population variance; the sample variance divides by one less than the number of data points.)
3. Standard Deviation: Widely used to measure the extent of variation or dispersion in
data. It is defined as the square root of the variance. It is calculated by finding the mean,
subtracting the mean from each value, squaring each result, adding the squares, dividing by
the number of terms, and finally taking the square root.
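The calculation steps for range, variance, and standard deviation can be sketched as below. This is an illustrative example only: the list `values` is a hypothetical sample, and the manual computation is shown next to the standard-library `statistics` functions so the two can be compared. It assumes the population formulas (dividing by the number of data points), as described above.

```python
import math
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Variance, step by step: mean -> deviations -> squares -> average.
mean_value = sum(values) / len(values)
squared_deviations = [(x - mean_value) ** 2 for x in values]
manual_variance = sum(squared_deviations) / len(values)

# Standard deviation: square root of the variance.
manual_std = math.sqrt(manual_variance)

# The same population quantities from the standard library.
library_variance = statistics.pvariance(values)
library_std = statistics.pstdev(values)

# Sample variance (divides by n - 1), shown for comparison.
sample_variance = statistics.variance(values)

print("Range:", data_range)
print("Population variance:", manual_variance, library_variance)
print("Population std dev:", manual_std, library_std)
print("Sample variance:", sample_variance)
```

For this sample the mean is 5, the squared deviations sum to 32, and so the population variance is 4 and the population standard deviation is 2. The sample variance is larger (32 divided by 7) because it divides by one less than the number of data points.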
Measures of frequency distribution: Describe how often each value occurs within the data set
(count). A frequency distribution table is a powerful way to summarize how data points
are distributed across different categories or intervals. It helps identify patterns, outliers, and
the overall structure of the dataset. It is often the first step in understanding the dataset before
applying more advanced analytical methods or creating visualizations like histograms or pie
charts.
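A frequency distribution table of this kind can be sketched with `collections.Counter` from the Python standard library. The lists `responses` and `ages` and the `bin_width` of 10 are hypothetical values chosen for illustration; they are not part of the experiment's dataset.

```python
from collections import Counter

# Hypothetical categorical data, chosen only for illustration.
responses = ["A", "B", "A", "C", "B", "A", "D", "A"]

# Count how often each category occurs.
frequency = Counter(responses)
total = len(responses)

# Print a simple table: category, count, relative frequency.
print(f"{'Category':<10}{'Count':>6}{'Relative':>10}")
for category, count in frequency.most_common():
    print(f"{category:<10}{count:>6}{count / total:>10.2f}")

# For numeric data, values can first be grouped into intervals.
# Each age is mapped to the lower bound of its 10-year interval.
ages = [21, 25, 34, 38, 42, 47, 29, 31]
bin_width = 10
binned = Counter((age // bin_width) * bin_width for age in ages)

print(f"{'Interval':<10}{'Count':>6}")
for lower in sorted(binned):
    print(f"{lower}-{lower + bin_width - 1:<7}{binned[lower]:>6}")
```

The counts per category or interval are the same numbers a bar chart or histogram would plot, which is why such a table is a natural step before visualization.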
Output:
Conclusion: Hence we have learned about descriptive statistics and its techniques.