Use of Statistics in Data Science

idk

Uploaded by

lavanyaaverma7
Copyright
© All Rights Reserved

Data Science Class 10 Solution – Use of Statistics in Data Science

Objective Type Questions

Please choose the correct option in the questions below:

(1) We want to get the cars of red color from the data set below. Which type
of subsetting should be used?

Name      Height   Color
Innova    70       White
Swift     50       Red
Amaze     50       Red
Bolero    80       Grey

(a) Column based subsetting

(b) Data based subsetting

(c) Row based subsetting

(d) None of the above

Ans: (b) Data based subsetting

(2) Which is a more accurate measure of central tendency when there are outliers in the
data set?

(a) Mean

(b) Median

Ans: (b) Median

(3) Mean absolute deviation is an identifier of the variability of the data set. Is this a correct
statement?

(a) Yes

(b) No

Ans: (a) Yes


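(4) The mean absolute deviation is divided by the coefficient of mean absolute deviation to
calculate the

(a) Variance
(b) Median
(c) Arithmetic Mean
(d) Coefficient of Variation
Ans: (c) Arithmetic Mean, because the coefficient of mean absolute deviation is the mean
absolute deviation divided by the arithmetic mean.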

(5) In a manufacturing company, the number of employees in unit A is 40 with a mean of
Rs. 6,400 and the number of employees in unit B is 30 with a mean of Rs. 5,500. The
combined arithmetic mean is:

(a) 9500

(b) 8000

(c) 7014.29

(d) 6014.29

Ans: (d) 6014.29
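
The combined mean weights each unit's mean by its number of employees. A minimal Python sketch of that arithmetic, using only the figures from the question (the variable names are illustrative):

```python
# Group sizes and group means taken from the question.
n_a, mean_a = 40, 6400   # unit A
n_b, mean_b = 30, 5500   # unit B

# Each unit's mean is weighted by its number of employees.
combined_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

print(round(combined_mean, 2))  # 6014.29
```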

(6) The mean deviation about the mean for the following data: 5, 6, 7, 8, 9, 13, 12, 15 is

(a) 1.5

(b) 3.2

(c) 2.89

(d) 5

Ans: (c) 2.89 (the closest option; the mean is 75/8 = 9.375 and the exact mean deviation
is 23.75/8 ≈ 2.97)
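
To check the options, the mean deviation about the mean can be computed directly. The short Python sketch below uses the data from the question. It gives about 2.97, so 2.89 is the nearest of the listed options rather than an exact match.

```python
data = [5, 6, 7, 8, 9, 13, 12, 15]

# Step 1: arithmetic mean of the data.
mean = sum(data) / len(data)

# Step 2: average of the absolute distances from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(mean)            # 9.375
print(mean_deviation)  # 2.96875
```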

(7) The arithmetic mean of the numerical values of the deviations of items from some
average value is called the

(a) Standard Deviation

(b) Range

(c) Quartile Deviation

(d) Mean Deviation

Ans: (d) Mean Deviation

Standard Questions

(1) Explain the different ways of subsetting data.

Ans: There are three basic ways of subsetting data:

(a) Row based subsetting: We select some of the rows of the table by their position,
counting from top to bottom. For example, from a table with 8 rows and 6 columns we might
take only the first 4 rows from the top.

(b) Column based subsetting: An original data set often contains a large number of
columns, not all of which are needed for the analysis. In that case we select only the
required columns from the original data set. This method is called column based
subsetting.

(c) Data based subsetting: The data is subsetted on the basis of specific data values, so
only the rows that contain the chosen value are selected. For example, selecting only the
red cars from a table of cars.
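
The three kinds of subsetting can be sketched in plain Python using the cars table from the objective questions. This is only an illustration: the list-of-dictionaries layout and the variable names are assumptions, not part of the original solution.

```python
# The cars table, one dictionary per row.
cars = [
    {"Name": "Innova", "Height": 70, "Color": "White"},
    {"Name": "Swift", "Height": 50, "Color": "Red"},
    {"Name": "Amaze", "Height": 50, "Color": "Red"},
    {"Name": "Bolero", "Height": 80, "Color": "Grey"},
]

# Row based subsetting: keep the first two rows, chosen by position.
first_rows = cars[:2]

# Column based subsetting: keep only the Name and Color columns.
name_and_color = [{"Name": c["Name"], "Color": c["Color"]} for c in cars]

# Data based subsetting: keep the rows whose Color value is "Red".
red_cars = [c for c in cars if c["Color"] == "Red"]

print([c["Name"] for c in red_cars])  # ['Swift', 'Amaze']
```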

(2) When should we use median over mean?

Ans: The median is the more accurate measure of central tendency when the data set
contains irregular values. Such values are called outliers.

For example:

Rahul's father gets his blood pressure checked every week. In one week, because of a
defect in the machine, the blood pressure was recorded as unusually high.
That single faulty reading pulls the mean away from his regular blood pressure values,
while the median still correctly shows the centre point of the data set. So, in a data set
that contains outliers, the median is a more effective measure of central tendency than
the mean.
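
The effect of one outlier can be seen with a small Python sketch. The weekly readings below are made-up numbers chosen for illustration (the original example does not list the actual values). The reading of 180 plays the role of the faulty measurement.

```python
import statistics

# Hypothetical weekly readings; 180 is the faulty (outlier) value.
readings = [120, 122, 118, 121, 180, 119, 120]

mean_bp = statistics.mean(readings)      # pulled upward by the outlier
median_bp = statistics.median(readings)  # middle value of the sorted data

print(round(mean_bp, 2))  # 128.57
print(median_bp)          # 120
```

Six of the seven readings lie between 118 and 122, so the median of 120 describes the usual value better than the mean of about 128.57.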

(3) What is Mean Absolute Deviation?

Ans: Mean Absolute Deviation (MAD) is the average distance between each value in the data
set and the mean.

Let us consider the following data set and solve the following:-

12 16 10 18 11 19

Step 1: Calculate the mean.

Mean = (12 + 16 + 10 + 18 + 11 + 19)/6 = 86/6 = 14.33, rounded to 14

Step 2: Calculate the distance of each data point from the mean. Only the absolute value
is used, so if the difference from the mean is -2, we ignore the negative sign and take 2.

The table below shows the distance of each data point from the mean.

Value    Distance from the mean (14)
12       2
16       2
10       4
18       4
11       3
19       5
Total    20

Step 3: Calculate the mean of the distances.

Mean of distances = (2+2+4+4+3+5)/6 = 20/6 = 3.33

So the mean absolute deviation is 3.33 and the mean is 14.

The mean absolute deviation gives us an idea of the variation in the data set.
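
The three steps can be sketched in Python as follows. The sketch follows the worked example and rounds the mean to 14. For comparison it also repeats the calculation with the unrounded mean, which gives the same result for this data set.

```python
data = [12, 16, 10, 18, 11, 19]

# Step 1: mean, rounded to a whole number as in the worked example.
mean = round(sum(data) / len(data))      # 14.33... becomes 14

# Step 2: absolute distance of each value from the mean.
distances = [abs(x - mean) for x in data]

# Step 3: mean of the distances.
mad = sum(distances) / len(distances)

# Same calculation without rounding the mean.
exact_mean = sum(data) / len(data)
exact_mad = sum(abs(x - exact_mean) for x in data) / len(data)

print(distances)            # [2, 2, 4, 4, 3, 5]
print(round(mad, 2))        # 3.33
print(round(exact_mad, 2))  # 3.33
```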

(4) What is a two way relative frequency table? How is it different from two way frequency
table?

Ans: A two-way relative frequency table is similar to a two-way frequency table. The
difference is that it shows percentages instead of counts. A two-way frequency table shows
the number of data points that fit in each category. Depending on the problem, we can use
either column relative frequencies or row relative frequencies.

Consider the following two-way table, in which preferences for indoor and outdoor games
are recorded:

Two-way frequency table

Preferences      Girls   Boys
Indoor games     70      20
Outdoor games    30      80
Total            100     100

To convert it into a two-way relative frequency table, we change each individual cell
into a percentage of its column total.

Two-way relative frequency table

Preferences      Girls   Boys
Indoor games     70%     20%
Outdoor games    30%     80%
Total            100%    100%

A two-way relative frequency table is especially useful when the sample sizes differ,
because preferences can then be compared using percentages.
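
The conversion from counts to column relative frequencies can be sketched in Python. The nested dictionary is just one possible way to hold the table and is an assumption made for this example.

```python
# Counts from the two-way frequency table, one inner dictionary per column.
counts = {
    "Girls": {"Indoor games": 70, "Outdoor games": 30},
    "Boys": {"Indoor games": 20, "Outdoor games": 80},
}

# Divide every cell by its column total and express it as a percentage.
relative = {}
for group, prefs in counts.items():
    total = sum(prefs.values())
    relative[group] = {pref: 100 * n / total for pref, n in prefs.items()}

print(relative["Girls"])  # {'Indoor games': 70.0, 'Outdoor games': 30.0}
print(relative["Boys"])   # {'Indoor games': 20.0, 'Outdoor games': 80.0}
```

Here both columns happen to total 100, so the percentages equal the counts. With unequal group sizes the percentages would differ from the counts, and that is the case where the relative table helps most.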

(5) What are two-way relative frequency tables beneficial for?

Ans: A two-way relative frequency table is especially useful when the sample sizes differ,
because preferences can then be compared using percentages.

(6) What is Standard Deviation?

Ans: Standard deviation is a measure of how spread out the numbers are. In other words, it
shows how much the data is spread around the mean (the average).

For example, we can determine whether all the points lie close to the average or whether
they lie far above or below it.

(7) How to calculate Standard Deviation?

Ans: We can use the following steps to find the standard deviation.

For example, take the values 1, 2, 3, 5 and 8.

(a) Calculate the mean by adding up all the data values and dividing by the number of
values.

1+2+3+5+8 = 19

19/5 = 3.8 (mean)

(b) Subtract the mean from every value.

1 - 3.8 = -2.8
2 - 3.8 = -1.8
3 - 3.8 = -0.8
5 - 3.8 = 1.2
8 - 3.8 = 4.2

(c) Square each difference.

-2.8 * -2.8 = 7.84
-1.8 * -1.8 = 3.24
-0.8 * -0.8 = 0.64
1.2 * 1.2 = 1.44
4.2 * 4.2 = 17.64

(d) To find the variance, calculate the average of the squared differences from step (c).

7.84+3.24+0.64+1.44+17.64 = 30.8

30.8/5 = 6.16 (Variance)

(e) The standard deviation is the square root of the variance.

Square root of 6.16 = 2.48

Hence the standard deviation of the values 1, 2, 3, 5 and 8 is 2.48.
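
The same steps can be sketched in Python. Because the worked example divides by the number of values (5), it corresponds to the population standard deviation, so the result is compared with `statistics.pstdev` from the standard library.

```python
import math
import statistics

values = [1, 2, 3, 5, 8]

# (a) Mean of the values.
mean = sum(values) / len(values)

# (b) and (c) Difference from the mean, squared, for every value.
squared_diffs = [(x - mean) ** 2 for x in values]

# (d) Variance: average of the squared differences (divides by n).
variance = sum(squared_diffs) / len(values)

# (e) Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 2))                   # 6.16
print(round(std_dev, 2))                    # 2.48
print(round(statistics.pstdev(values), 2))  # 2.48
```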

(8) Name five real-life applications of Standard Deviation

Ans: Five real-life applications of standard deviation are:

(a) Grading tests: A teacher can use it to see whether the students are all performing at
about the same level (low standard deviation) or whether their scores vary widely (high
standard deviation).

(b) Survey results: Someone who has collected responses from a survey can use it to
measure how reliable the results are, and so predict how a larger group of people might
answer.

(c) Weather forecasting: When comparing the low temperatures forecast for three different
cities, a low standard deviation indicates a more reliable weather forecast.

(d) Marketing: Marketers calculate the standard deviation of the revenue earned from each
advertisement, so they know how much variation in revenue to expect from a given
advertisement.

(e) Real estate: Real estate agents use the standard deviation of house prices, for
example per square foot in a particular area, so they can tell their clients how much
house prices are likely to differ from what they expect.

(9) Explain five real-life situations where subsetting data can be advantageous

Ans:

Here are some key uses of subsetting:

• Excel and Spreadsheet Operations

It is commonly employed in spreadsheet software like Microsoft Excel.
Users can filter data rows based on specific conditions, allowing them to
view and manipulate only the data that meets certain criteria. This is
particularly useful when dealing with large datasets, streamlining the
analysis process.

• Data Analysis and Business Intelligence

It plays a crucial role in data analysis and business intelligence. Analysts
can focus on subsets of data that are relevant to their research, enabling
them to uncover patterns, trends, and insights that might be obscured in a
larger dataset.

• Database Management and Queries

In database systems, filtering retrieves specific records that meet certain
criteria. This ensures that only relevant data is accessed, reducing
processing time and improving overall system performance.

In database management systems, filtering is integral to crafting SQL
queries. By applying filters to SELECT statements, users can retrieve only
the data that matches specific conditions, avoiding the need to sift through
irrelevant information.

• E-commerce and Marketing

For businesses engaged in e-commerce, data filtering aids in targeting
specific customer segments. Marketers can leverage this process to tailor
campaigns, promotions, and product recommendations based on customer
preferences and behaviors.

• Network Security

Filtering is a crucial component of network security and data security,
where it is employed to identify and block potentially harmful data or
traffic. This helps prevent cyber threats and ensures the integrity of a
network.

• Research and Academia

Researchers often sift through vast datasets to identify relevant
information for their studies. Data filtering streamlines this process,
enabling scholars to focus on the specific data points that are pertinent to
their research objectives.
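
As an illustration of filtering in a database query, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `cars` and its columns are assumptions made for this example, and the rows reuse the cars data from the objective questions.

```python
import sqlite3

# In-memory database holding a small, illustrative cars table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (name TEXT, height INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("Innova", 70, "White"),
        ("Swift", 50, "Red"),
        ("Amaze", 50, "Red"),
        ("Bolero", 80, "Grey"),
    ],
)

# The WHERE clause filters the rows, so only red cars are returned.
query = "SELECT name FROM cars WHERE color = ? ORDER BY name"
red_cars = [row[0] for row in conn.execute(query, ("Red",))]
conn.close()

print(red_cars)  # ['Amaze', 'Swift']
```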

High Order Thinking Skills (HOTS)

(1) Draw a graph to represent Standard Deviation

Ans:

Create a standard deviation graph in Excel using the steps below:

1. Select the data and go to the “INSERT” tab. Then, under “Charts,” select the
“Scatter” chart and choose “Scatter with Smooth Lines.”
2. The chart now appears on the sheet.
3. If needed, you can change the chart axis and title.

Conclusion: Our SD is 3.82, slightly higher, so our bell curve is
wider. If the SD is small, we will get a slim bell curve.
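
The link between the SD and the width of the bell curve can also be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The mean of 0 and the comparison SD of 1 are assumptions chosen only for illustration, since the data behind the value 3.82 is not given here.

```python
from statistics import NormalDist

# Two bell curves with the same (assumed) mean but different SDs.
wide = NormalDist(mu=0, sigma=3.82)  # SD from the conclusion above
slim = NormalDist(mu=0, sigma=1.0)   # smaller SD, assumed for comparison

# Height of each curve at its centre.
peak_wide = wide.pdf(0)
peak_slim = slim.pdf(0)

# A larger SD spreads the same total area over a wider range,
# so the curve is wider and its peak is lower.
print(peak_wide < peak_slim)  # True
```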

(2) Calculate the mean of the data set – [56, 89, 76, 58, 58, 65]

Ans: Sum = 56 + 89 + 76 + 58 + 58 + 65 = 402

Mean = 402 / 6
Mean = 67

(3) Calculate the median of this data set – [56, 89, 76, 58, 58, 65]

Ans: 56, 89, 76, 58, 58, 65

Here the number of observations (n) = 6, which is even.
First we arrange all the observations in ascending order: 56, 58, 58, 65, 76, 89
The two middle values are 58 and 65, so we add them together: 58 + 65 = 123
and then divide this total by 2: Median = 123/2

Median = 61.5
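
Both HOTS calculations can be checked with Python's standard `statistics` module. For this data set the sorted values are 56, 58, 58, 65, 76, 89, so the two middle values are 58 and 65 and the median is 61.5.

```python
import statistics

data = [56, 89, 76, 58, 58, 65]

# Mean: sum of the values divided by how many there are.
mean_value = statistics.mean(data)

# Median: for an even count, the average of the two middle sorted values.
ordered = sorted(data)
median_value = statistics.median(data)

print(ordered)       # [56, 58, 58, 65, 76, 89]
print(mean_value)    # 67
print(median_value)  # 61.5
```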
