8614
Explain in detail.
Answer:
Statistics is a discipline that deals with every aspect of data. Statistical knowledge helps us
choose a proper method of collecting data and apply the collected samples in the correct
analysis, so that results are produced effectively. In short, statistics is a crucial
process that helps us make decisions based on data.
In the plural sense, statistics refers to data, but for data to be called statistics they must consist of
an aggregate of facts.
A single, isolated fact or figure, such as the 60 kg weight of one student or the death of a particular
person on a given day, does not amount to statistics.
To amount to statistics, data must take the form of a set or aggregate of facts, such as the
weights of students in a class (50, 65, 70 kg) or the profits of a firm over different periods.
Statistics are also liable to be affected by a multiplicity of causes.
It is not easy to study the effect of one factor alone while ignoring the effects of the others. We
have to consider the effects of all the factors on the phenomenon, separately as well as
collectively, because these effects can change with place, time or situation.
The overall effect is considered, not the effect of a single factor as in the natural sciences. For
example, the result of class XII in a board examination does not depend on any single
factor but collectively on the standard of the teachers, teaching methods, teaching aids, practicals,
the performance of the students, the standard of the question papers, and the standard of evaluation.
Another characteristic of statistics is that the data should be collected in a systematic manner.
Data collected in a haphazard manner lead to difficulties in analysis and to
wrong conclusions. A proper plan should be made, and trained investigators should be used to
collect the data. If this is not done, the reliability of the data
decreases. To get correct results, the data must be collected in a precise manner.
Before we start collecting data, we must be clear about the purpose for which we are
collecting them. If we do not know the purpose, we may not collect data
that meet our needs, and we may miss relevant data that the purpose requires.
Suppose we want data on imports and exports: we have to know about the various segments,
such as electronics, consumer articles, grains and other such categories. Likewise, if a person on
government duty counts the vehicles passing along a road per unit of time, the counts are statistics,
but the same work done by a person unconnected with this field is not, because the former is
doing it for a government that wants to decide whether to widen the road to four lanes.
Comparability is the last, but not the least important, characteristic of statistics. Data are
generally collected with the aim of making comparisons. If the figures collected are not comparable,
they lose a large part of their significance.
This means the figures collected should be homogeneous, not heterogeneous.
For example, heterogeneous figures such as sales of Rs. 20,000, a result of 80% and a mileage of 80
km cannot be placed in relation to each other and compared for analysis and interpretation,
which is the ultimate aim of the science of statistics. It can be concluded that all statistics are
numerical data, but not all numerical data are statistics unless they satisfy all the essential
characteristics of statistics described above.
Q.2 What do you understand by the term “data”? Write in detail the types of data.
Answer:
Data types are an important concept in statistics. They need to be understood in order to correctly
apply statistical measurements to your data and therefore to draw correct
conclusions about it. This answer introduces the different data types you need to
know to do proper exploratory data analysis (EDA), which is one of the most underestimated
parts of a machine learning project.
A good understanding of the different data types, also called measurement scales, is a
crucial prerequisite for EDA, since certain
statistical measurements can be used only for specific data types.
You also need to know which data type you are dealing with to choose the right visualization
method. Think of data types as a way to categorize different types of variables. We will discuss
the main types of variables and look at an example of each.
Categorical Data
Categorical data represents characteristics. Therefore it can represent things like a person’s
gender, language etc. Categorical data can also take on numerical values (Example: 1 for female
and 0 for male). Note that those numbers don’t have mathematical meaning.
Nominal Data
Nominal values represent discrete units and are used to label variables that have no quantitative
value. Just think of them as “labels”. Note that nominal data have no order, so
changing the order of the values does not change their meaning.
A nominal feature that contains only two categories, such as one that describes whether a person is
married, is called “dichotomous”.
Ordinal Data
Ordinal values represent discrete and ordered units. Ordinal data are therefore nearly the same as
nominal data, except that their ordering matters. An example is a feature that records education
level with the values Elementary, High School and College.
Note that the difference between Elementary and High School is not the same as the difference
between High School and College. This is the main limitation of ordinal data: the differences
between the values are not really known. Because of that, ordinal scales are usually used to
measure non-numeric features like happiness, customer satisfaction and so on.
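The difference between nominal and ordinal data can be sketched in a few lines of Python. The survey answers below are hypothetical values chosen for illustration; the point is only that counting (the mode) makes sense for nominal data, while ordinal data additionally allow sorting and a middle category.

```python
from collections import Counter

# Hypothetical survey answers (illustrative values only).
marital_status = ["married", "single", "single", "married", "single"]            # nominal
education = ["High School", "Elementary", "College", "High School", "College"]   # ordinal

# Nominal data: only counting is meaningful, so the mode is the natural summary.
nominal_mode = Counter(marital_status).most_common(1)[0][0]

# Ordinal data: categories have a rank, so we can sort them and pick the middle one.
education_order = ["Elementary", "High School", "College"]
rank = {level: i for i, level in enumerate(education_order)}
sorted_education = sorted(education, key=rank.get)
median_level = sorted_education[len(sorted_education) // 2]

print(nominal_mode)   # single
print(median_level)   # High School
```

Note that the rank numbers 0, 1, 2 only encode order; subtracting them would not give a meaningful distance between education levels.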
Numerical Data
1. Discrete Data
We speak of discrete data if its values are distinct and separate, in other words, if
the data can only take on certain values. This type of data can't be measured, but
it can be counted. It basically represents information that can be categorized into a
classification. An example is the number of heads in 100 coin flips.
You can check whether you are dealing with discrete data by asking two questions:
Can you count it, and can it be divided into smaller and smaller parts? Discrete data can be
counted but cannot be meaningfully divided into smaller parts.
2. Continuous Data
Continuous data represent measurements; their values can't be counted, but they
can be measured. An example is the height of a person, which you can describe using
intervals on the real number line.
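As a small illustration of the two kinds of numerical data, the sketch below simulates one discrete value (a count of heads) and one continuous value (a height). The seed and the 150 to 190 cm height range are assumptions made only for this example.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Discrete: number of heads in 100 simulated coin flips (a count, always a whole number).
heads = sum(random.choice((0, 1)) for _ in range(100))

# Continuous: a simulated height in centimetres (a measurement that can take
# any real value in a range). The 150-190 cm range is an illustrative assumption.
height_cm = random.uniform(150.0, 190.0)

print(type(heads).__name__, heads)
print(type(height_cm).__name__, round(height_cm, 2))
```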
Interval Data
Interval values represent ordered units that have the same difference. We therefore speak of
interval data when a variable contains numeric values that are ordered and
the exact differences between the values are known. An example is a feature that contains
the temperature of a given place (for example, in degrees Celsius).
The problem with interval data is that they don't have a “true zero”. In
our example, a value of zero does not mean that there is no temperature. With interval data, we can
add and subtract, but we cannot meaningfully multiply, divide or calculate ratios. Because there is
no true zero, statistics that rely on ratios, such as the coefficient of variation, can't be applied.
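A short sketch can show why ratios are not meaningful on an interval scale. It assumes the temperatures are recorded in degrees Celsius and compares them with the same temperatures in kelvin, a scale that does have a true zero; the two readings are made-up values.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin (kelvin has a true zero)."""
    return c + 273.15

# Two hypothetical readings in degrees Celsius.
a_c, b_c = 10.0, 20.0

# Differences agree on both scales, so subtraction is meaningful.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios disagree: 20 degrees C is not "twice as hot" as 10 degrees C.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(diff_c, round(diff_k, 2))    # 10.0 10.0
print(ratio_c, round(ratio_k, 3))  # 2.0 1.035
```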
Ratio Data
Ratio values are also ordered units that have the same difference. Ratio values are the same as
interval values, with the difference that they do have an absolute zero. Good examples are
height, weight, length etc.
Q.3 What types of characteristics a pictogram should have to successfully convey the
meaning? Write down the advantages and drawbacks of using pictograms.
Answer:
A common deceptive graph uses a vertical scale that starts at some value greater
than zero to exaggerate differences between groups. Consider two
graphs that depict the same data. The graph on the left doesn't have a zero starting point; it
actually starts at 10%. The graph on the right has a zero starting point. The graphs
represent the same data, although the graph on the left would lead you to believe that there is a
greater difference between those using OxyContin and those using the placebo in their experience
of nausea. In reality, the difference is not that large. So always examine a graph
carefully to see whether the vertical axis begins at some point other than zero so that differences
are exaggerated.
Pictographs. Drawings of objects, called pictographs, are often misleading.
Data that are one-dimensional in nature, such as spending amounts, are often
depicted with two-dimensional objects, such as dollar bills, or three-dimensional objects,
such as stacks of coins, homes or barrels. By using pictographs, artists can create false
impressions that grossly distort differences, through these simple principles of
basic geometry: when you double each side of a square, its area doesn't merely
double; it increases by a factor of four. When you double each side of
a cube, its volume doesn't merely double; it increases by a factor of eight. Consider a
pictograph depicting the decrease in smoking from 1970 to 2013. The
objects are three-dimensional, basically cylindrical in shape. It appears
that the cigarette on the right is much smaller than the cigarette on the left, less than
half the size for sure.
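The geometric rule behind this distortion can be sketched as follows. The function names are hypothetical helpers introduced for illustration; the last one computes the side scale an artist would need so that the drawn area or volume, rather than the side length, is proportional to the data.

```python
def area_factor(side_scale):
    """Factor by which area changes when every side is scaled by side_scale."""
    return side_scale ** 2

def volume_factor(side_scale):
    """Factor by which volume changes when every side is scaled by side_scale."""
    return side_scale ** 3

def honest_side_scale(value_ratio, dimensions):
    """Side scale that makes area (dimensions=2) or volume (dimensions=3)
    proportional to value_ratio."""
    return value_ratio ** (1.0 / dimensions)

print(area_factor(2))    # 4
print(volume_factor(2))  # 8

# To show a value that has halved with a 2-D picture, each side should shrink
# to about 71% of its original length, not to 50%.
print(round(honest_side_scale(0.5, 2), 3))  # 0.707
```

If the artist instead halves every side of a two-dimensional picture, the drawn area falls to a quarter, which is why the smaller object looks far smaller than the data justify.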
However, note the percentages: in 1970, 37 percent of adults smoked, while
in 2013, 18 percent of adults smoked. If this were an accurate pictorial representation of
the relationship between the 1970 and the 2013 percentages of smokers, then
the image on the right would be about half as large as the image on the left.
However, it appears to be much smaller.
Some concluding thoughts: in addition to the graphs discussed in this section, there are many
other useful graphs, some of which may not yet have been created. The world needs more
people who can create original graphs that enlighten us about the nature of data. In
The Visual Display of Quantitative Information, Edward Tufte offers these principles: for small
data sets of twenty values or fewer, use a table instead of a graph.
A graph of data should make us focus on the true nature of the data, not
on other elements such as eye-catching but distracting design features. Do not distort
data; construct a graph to reveal the true nature of the data. Almost
all of the ink in a graph should be used for the data, not for other design
elements. For graphs that go far beyond this brief introduction, a
monthly feature in the Country Times posts a graph and then asks
readers what they think is going on in the graph. Some of the graphs used there are quite
interesting, so it is worth taking
some time to read how others interpreted them.
Q.4 Define normal curve. Write down the properties of normal curve.
Answer:
The normal distribution is also referred to as the Gaussian or Gauss distribution. It is
widely used in the natural and social sciences. Its relevance comes from the Central Limit Theorem,
which states that the averages of a large number of independent, identically distributed random
variables with finite variance tend toward a normal distribution, regardless of the distribution
they are sampled from.
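The Central Limit Theorem can be illustrated with a small simulation. This is a sketch under assumed settings: the exponential distribution (mean 1, strongly skewed), the sample size of 50, the 2000 repetitions and the seed are all choices made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

# Collect many sample means; each one averages 50 skewed observations.
means = [sample_mean(50) for _ in range(2000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Share of sample means within one standard deviation of their centre;
# for a normal distribution this is roughly 68%.
within_one_sd = sum(abs(m - mean_of_means) <= sd_of_means for m in means) / len(means)

print(round(mean_of_means, 2), round(sd_of_means, 2), round(within_one_sd, 2))
```

Even though the individual observations are far from normal, the sample means centre near 1 with a spread near 1/sqrt(50), and their shape is close to the bell curve.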
A normal distribution is symmetric about the peak of the curve, where the mean is located. Most
of the observed data are clustered near the mean, and the data become less frequent
farther away from the mean. The resulting graph is bell-shaped; the mean,
median and mode have the same value and lie at the peak of the curve.
The graph is perfectly symmetric: if you fold it at the middle, you get two equal
halves, since one-half of the observable data points fall on each side of the mean.
Parameters of Normal Distribution
The two main parameters of a (normal) distribution are the mean and standard deviation. The
parameters determine the shape and probabilities of the distribution. The shape of the distribution
changes as the parameter values change.
1. Mean
The mean is used by researchers as a measure of central tendency. It can be used to describe the
distribution of variables measured as ratios or intervals. In a normal distribution graph, the mean
defines the location of the peak, and most of the data points are clustered around the mean. Any
changes made to the value of the mean move the curve either to the left or right along the X-axis.
2. Standard Deviation
The standard deviation measures the dispersion of the data points relative to the mean. It
determines how far away from the mean the data points are positioned and represents the
distance between the mean and the observations.
On the graph, the standard deviation determines the width of the curve: it tightens or expands
the distribution along the x-axis. A small standard deviation
produces a steep, narrow curve, while a large standard deviation produces a
flatter, wider curve.
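The roles of the two parameters can be checked with the normal density function itself. The sketch below uses only the standard formula for the normal density; the particular values of the mean and standard deviation are arbitrary illustrations.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Standard deviation controls the height and width: small sigma gives a tall, narrow peak.
peak_narrow = normal_pdf(0.0, 0.0, 0.5)
peak_wide = normal_pdf(0.0, 0.0, 2.0)
print(round(peak_narrow, 4), round(peak_wide, 4))  # 0.7979 0.1995

# The mean only shifts the curve along the x-axis: the peak height is unchanged.
print(normal_pdf(3.0, 3.0, 1.0) == normal_pdf(0.0, 0.0, 1.0))  # True
```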
Properties
1. It is symmetric
A normal distribution comes with a perfectly symmetrical shape. This means that the distribution
curve can be divided in the middle to produce two equal halves. The symmetric shape occurs
when one-half of the observations fall on each side of the curve.
2. Mean, median and mode are equal
In a normal distribution the mean, median and mode have the same value and lie at the peak of
the curve.
3. Empirical rule
In normally distributed data, a constant proportion of the area under the curve lies
between the mean and a specific number of standard deviations from the mean. For example,
about 68.27% of all cases fall within +/- one standard deviation of the mean, about 95.45% fall
within +/- two standard deviations, and about 99.73% fall within +/- three
standard deviations.
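The proportions in the empirical rule can be computed directly from the error function, since for a normal distribution the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). The helper name below is an illustrative choice.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k) * 100, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```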
Skewness and kurtosis are coefficients that measure how different a distribution is from a normal
distribution. Skewness measures the asymmetry of a distribution, while kurtosis measures
the thickness of its tails relative to the tails of a normal distribution.
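One common way to compute these coefficients is from central moments. The sketch below assumes the simple population (uncorrected) moment formulas, which is only one of several conventions, and reports kurtosis as excess kurtosis so that a normal distribution scores 0. The data sets are made up for illustration.

```python
def central_moment(data, k):
    """k-th central moment using the population (divide by n) convention."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third standardized moment: 0 for symmetric data, positive for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric))             # 0.0
print(skewness(right_skewed) > 0)      # True
print(round(excess_kurtosis(symmetric), 2))  # -1.3
```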
Most statisticians credit the French mathematician Abraham de Moivre with the discovery of the
normal distribution. In the second edition of “The Doctrine of Chances,” de Moivre noted that
probabilities associated with discretely generated random variables could be approximated by
measuring the area under the graph of an exponential function.
De Moivre’s theory was expanded by another French scientist, Pierre-Simon Laplace, in “Analytic
Theory of Probability.” Laplace’s work introduced the central limit theorem, which showed that
probabilities of independent random variables converge rapidly to the areas under an exponential
function.
Q.5 Explain procedure for determining median, with one example each at least, if:
Answer:
Median, in statistics, is the middle value of a given list of data when arranged in order. The
data or observations can be arranged in either ascending or descending order.
In mathematics, the median is also a type of average, used to find the center value. It is
therefore also called a measure of central tendency.
Apart from the median, the other two measures of central tendency are the mean and the mode. The
mean is the ratio of the sum of all observations to the total number of observations. The mode is
the value in the data set that occurs most often.
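The three measures of central tendency can be compared on one small data set using Python's standard `statistics` module. The data values here are chosen only for illustration.

```python
import statistics

data = [3, 3, 5, 9, 11]

mean_value = statistics.mean(data)      # sum of observations / number of observations
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)  # 6.2 5 3
```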
In geometry, a median is defined differently. The median
of a triangle is the line segment joining a vertex of the triangle to the midpoint of the opposite
side. A median therefore bisects that side of the triangle.
Median in Statistics
The median of a set of data is the middlemost number or center value in the set. The median is
also the number that is halfway into the set.
To find the median, the data should first be arranged in order, from least to greatest or greatest
to least. The median is the number that separates the higher half of a data sample, a
population or a probability distribution from the lower half. The median is different for different
types of distribution.
For example, the median of 3, 3, 5, 9, 11 is 5. If there is an even number of observations, then
there is no single middle value; the median is then usually defined to be the mean of the two
middle values: so the median of 3, 5, 7, 9 is (5+7)/2 = 6.
Median Formula
The formula for the median of a finite data set is given here. The median
formula is different for even and odd numbers of observations, so it is necessary to
first determine whether the data set has an odd or an even number of values.
If the total number of observations n is odd, the median is:
Median = {(n+1)/2}th term
If the total number of observations n is even, the median is:
Median = [(n/2)th term + {(n/2)+1}th term]/2
To find the median, place all the numbers in ascending order and find the middle.
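The odd and even cases of the median formula can be written as a short function. This is a minimal sketch; the function name is an illustrative choice, and the term positions are shifted by one because Python lists are indexed from 0.

```python
def median_by_formula(values):
    """Median via the {(n+1)/2}th-term rule (odd n) or the mean of the
    (n/2)th and {(n/2)+1}th terms (even n)."""
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the {(n+1)/2}th term (subtract 1 for 0-based indexing).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and {(n/2)+1}th terms.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_formula([3, 3, 5, 9, 11]))  # 5
print(median_by_formula([3, 5, 7, 9]))      # 6.0
```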
Example 1:
solution:
Example 2:
4, 17, 77, 25, 22, 23, 92, 82, 40, 24, 14, 12, 67, 23, 29
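Solution:
Arranging the data in ascending order:
4, 12, 14, 17, 22, 23, 23, 24, 25, 29, 40, 67, 77, 82, 92
There are n = 15 observations, so the median is the {(15+1)/2}th = 8th term, which is 24.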
Example 3:
Rahul’s family drove through 7 states on summer vacation. The price of gasoline differs from
state to state. Calculate the median gasoline cost.
Solution:
Hence, the median gasoline cost is 1.84. There are three states with higher gasoline costs
and three with lower costs.
Let us see an example here to find mean, median and mode of the observations.
Thus,
Median = Middle Value = 9
A median is the center value of a given list of observations when arranged in order.
For example, consider the observations 33, 55, 77, 22, 11.
Arranging them in ascending order, we get:
11, 22, 33, 55, 77
Hence, the median is 33.
If the set contains only 2 observations, we apply the median formula for
an even number of observations, i.e.
Median = [(n/2)th term + {(n/2)+1}th term]/2
Example: The median of 15 and 20 is (15 + 20)/2 = 35/2 = 17.5