Identifying Patterns
Chapter 3
(1) What is the Data Science term used to describe partiality, preference, and prejudice?
(a) Bias
(b) Favouritism
(c) Influence
(d) Unfairness
Ans: (a) Bias
(c) A value near 0 means that the event is not likely to occur
(4) The central limit theorem states that the sampling distribution of the sample mean is
approximately normal if
(a) All possible samples are selected
(5) The central limit theorem says that the mean of the sampling distribution of the sample
mean is
(a) Equal to the population mean divided by the square root of the sample size
Ans: (b) Close to the population mean if the sample size is large
(6) Samples of size 25 are selected from a population with mean 40 and standard deviation
7.5. The mean of the sampling distribution of the sample mean is
(a) 7.5
(b) 8
(c) 40
Ans: (c) 40
Standard Questions
Ans: Whenever someone is fond of a particular thing, that person tends to become partial
towards it, and this partiality can affect the outcome. Such partiality, preference, or
prejudice towards a particular set of data is termed Bias. In Data Science, bias is a
deviation of the data from the expected outcome; in other words, bias can be defined as
data error. Such errors are often subtle and hard to notice. So why does bias arise in the
first place? Sampling and estimation are the main reasons for its occurrence. Bias could be
avoided if we knew the data entities better and collected information on alternative
entities, but data science is rarely carried out under carefully controlled conditions: it is
mostly done on found data that was not originally collected for modelling, which is why
bias so often appears in this data. The next question that may arise is why bias really
matters. Predictive models are trained only on data, and for such models no reality exists
other than the data fed into the system. If the data fed into the system contains bias,
model accuracy is compromised, and biased models may even discriminate against groups
of people. To avoid these risks, it is necessary to avoid such bias.
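The effect of sampling bias described above can be sketched in Python. This is a minimal illustration with made-up numbers (the income population and sample sizes are hypothetical, not from the text): a simple random sample estimates the true mean well, while a sample drawn only from one part of the population systematically overshoots it.

```python
import random

random.seed(0)

# Hypothetical population: 10,000 right-skewed incomes.
population = [random.expovariate(1 / 50_000) for _ in range(10_000)]
true_mean = sum(population) / len(population)

# Unbiased estimate: a simple random sample of 500.
random_sample = random.sample(population, 500)
unbiased_est = sum(random_sample) / len(random_sample)

# Biased estimate: sampling only from the upper half of the population
# (e.g. surveying only respondents who are easy to reach) overshoots.
top_half = sorted(population)[len(population) // 2:]
biased_sample = random.sample(top_half, 500)
biased_est = sum(biased_sample) / len(biased_sample)

print(round(true_mean), round(unbiased_est), round(biased_est))
```

The biased estimate is far above the true mean even though its sample is just as large, which is the sense in which bias is a data error that no amount of extra data fixes.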
Ans: Selection bias occurs when the process that creates the data used to train a model is
distorted. It happens when the sample data that has been collected fails to represent the
population of cases the model must actually predict in the future. It commonly occurs in
systems that rank content, such as recommendation systems, polls, or personalised
advertisements: the user responds only to the content that is displayed, so responses are
collected for displayed content while the response to content that was never displayed
remains unknown.
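The ranking scenario above can be sketched as follows. This is a hypothetical setup (the catalogue, scores, and cut-off are invented for illustration): because feedback is logged only for the items a ranker chose to display, the logged data paints a much rosier picture than the full catalogue.

```python
import random

random.seed(1)

# Hypothetical catalogue: each item has a true appeal score in [0, 1].
items = [random.random() for _ in range(1_000)]

# The ranking system only ever displays the top-scoring items, so
# user feedback is collected exclusively for those.
displayed = sorted(items, reverse=True)[:100]

observed_mean = sum(displayed) / len(displayed)   # what the logs suggest
actual_mean = sum(items) / len(items)             # what is really true

print(round(observed_mean, 2), round(actual_mean, 2))
```

A model trained on the logged feedback alone would conclude that typical items are far more appealing than they are, because the undisplayed items never generate any data at all.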
Ans: It is a form of measurement bias that is most common at the data-labelling stage of a
project. Such bias occurs when the same type of data is labelled inconsistently, which
results in lower accuracy. For example, imagine a team whose job is to label images of
damaged laptops. Labelling the damaged laptops consistently makes it easier to
distinguish damaged from undamaged laptops. The data becomes inconsistent if one team
member labels an image as damaged while labelling a similar image as only partially
damaged.
Ans: Confirmation bias, also termed observer bias, is the effect of seeing in the data the
result that you want to see. It takes place when researchers go into a project with
subjective expectations about their study, whether conscious or unconscious. It can also
be noticed when labellers allow their subjective thoughts to control their labelling habits,
which leads to inaccurate data.
Ans: The Central Limit Theorem states that the sampling distribution of the sample mean
approaches a normal distribution as the sample size becomes large, regardless of the
shape of the population distribution. The mean of the sample means will be roughly equal
to the mean of the population, whether the source population is normal or skewed,
provided that the sample size is large. Some points that give an idea about the Central
Limit Theorem are:
(a) The Central Limit Theorem tells us that the distribution of sample means approaches a
normal distribution as the sample size gets larger.
(b) Sample sizes equal to or greater than 30 are usually considered large enough for the
Central Limit Theorem to hold.
(c) The mean of the sample means is equal to the population mean, and the standard
deviation of the sample means equals the population standard deviation divided by the
square root of the sample size.
(d) A population can be predicted more accurately with the help of a large sample size.
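The points above can be checked with a small simulation, using only the standard library (the particular population and sample counts here are arbitrary choices for illustration). A deliberately skewed population is used, so any clustering of the sample means around the population mean comes from the Central Limit Theorem rather than from the population's own shape.

```python
import random
import statistics

random.seed(42)

# A deliberately skewed (exponential) population with mean near 1.
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.mean(population)

# Draw many samples of size 30 and record each sample's mean.
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(2_000)
]

# The sample means cluster around the population mean...
print(round(mu, 2), round(statistics.mean(sample_means), 2))

# ...and their spread is roughly sigma / sqrt(n), much smaller than
# the population's own standard deviation.
print(round(statistics.stdev(sample_means), 2))
```

Even though individual draws from this population are heavily skewed, the average of the 2,000 sample means lands very close to the population mean, and their spread matches the σ/√n prediction from point (c).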
(a) Voting Polls: In elections, voting polls give an idea of the number of supporters of a
candidate. Using the Central Limit Theorem, news channels report results with confidence
intervals.
(b) Family Income Calculation: It is very helpful in calculating the mean family income of a
particular region or area.
(c) Economics: Many economists use the Central Limit Theorem when sample data is used
to draw conclusions about a population.
(d) Manufacturing: The Central Limit Theorem is widely used in manufacturing plants to
estimate the proportion of defective products produced by the plant.
Ans: The Central Limit Theorem tells us that, whatever the distribution of the population
might be, the shape of the sampling distribution approaches normal as the sample size
increases. This is especially helpful because a researcher never knows the population
mean in advance: by drawing random samples from the population, the sample means
cluster together around it, which allows the researcher to estimate the population mean.
Hence it is observed that as the size of the sample increases, the error of the estimate
decreases.
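The claim that error decreases with sample size can be sketched numerically. As an assumed setup, the population below reuses the numbers from question (6) above (mean 40, standard deviation 7.5); the trial counts are arbitrary.

```python
import random
import statistics

random.seed(7)

# Assumed population, echoing question (6): mean 40, std dev 7.5.
population = [random.gauss(40, 7.5) for _ in range(50_000)]
mu = statistics.mean(population)

def mean_abs_error(n, trials=500):
    """Average absolute gap between a size-n sample mean and mu."""
    return statistics.mean(
        abs(statistics.mean(random.sample(population, n)) - mu)
        for _ in range(trials)
    )

for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 3))
```

The printed errors shrink as n grows, in line with the σ/√n behaviour of the sampling distribution: tenfold more data gives roughly √10 ≈ 3.2 times less error.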
(10) The coaches of various sports around the world use probability to better their game
and create gaming strategies. Can you explain how probability is applied in this case and
how it helps players?
Ans: In every game, the coach estimates probabilities about the team: where it is strong
and where improvement is required to win the match.
Ex: On the basis of a player's previous performances, the coach makes a deep study of the
average results of that player's batting and bowling skills and then decides whether to
include him in the line-up.
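The coach's calculation can be sketched in a few lines. The scores below are hypothetical figures invented for illustration, and "a good performance" is arbitrarily taken to mean more than 30 runs: the empirical probability is just favourable matches divided by total matches.

```python
# Hypothetical record: runs scored by one player in 10 past matches.
past_scores = [34, 7, 52, 41, 18, 60, 3, 45, 29, 55]

# Batting average = total runs / matches.
batting_average = sum(past_scores) / len(past_scores)

# Empirical probability of a good performance (> 30 runs)
# = favourable matches / total matches.
p_over_30 = sum(1 for s in past_scores if s > 30) / len(past_scores)

print(batting_average, p_over_30)
```

A coach comparing several players would compute such estimates for each and favour the player whose probability of a good performance is highest for the given match situation.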