Statistics
What is statistics?
We are going to talk briefly about the two main
branches of statistics that are relevant to you at
this point: we will explain what is understood
by descriptive statistics and inferential statistics.
The minute I talk about inferential statistics, I need to
introduce the notion of a sample and a population,
so that is what I am going to introduce next.
Then we move on to understand why we need data; we
will understand a bit about how data
is collected, and we will talk about how to organize data in
the form of what we call a data set.
Once we have a data set, we will understand more about
data by classifying it as categorical or numerical, and as
cross-sectional or time-series, and we
will discuss a bit about measurement scales.
Finally, I think the key to any statistical analysis is to
understand your data and frame
questions based on data.
So, we will spend some time trying to understand and
train ourselves to frame questions based
on data.
So, these are the learning objectives for week 1.
What is statistics?
If you go through the definitions of statistics over the
years, you can see that the definition has been
changing over time.
What started as just summarizing data gradually
grew into inference from data, and now, with a lot of data
available, statistics is being redefined
as the art of learning from data.
Now, the minute I say learning from data, it implies that
you want to seek some information
from data.
So, Sheldon Ross defined statistics as the art of learning
from data: it is concerned
with the collection of data, their subsequent description,
and their analysis, which often leads to the drawing
of conclusions.
So, the main idea of statistics and statistical analysis is to
actually draw conclusions based
on data.
So, if you look at the classification of statistics, even
though there are newer branches of statistics
and new titles given, you may broadly
take the main branches of statistics to be two.
The part of statistics which is concerned with the
description and summarization of data is
popularly referred to as the descriptive statistics branch.
The part of statistics which is concerned with drawing
conclusions from data is called
the inferential statistics branch; that is, you want to infer
from data.
Now, when you want to infer from data, there is one very
important thing, which is the possibility
of chance: when you are inferring from data, there
is an element of chance because you do
not know everything exactly.
And hence, in this foundation course, we are preparing you
with an introduction
to probability, to help you prepare
for the next level or the
next course, where you will be learning
about inferential statistics.
So, primarily, when we talk about inferential statistics, we
are talking about drawing
conclusions from data.
Now, in inferential statistics, one
important thing is that many a
time you are interested perhaps in knowing about the
percentage of all students in India
who have passed their Class 12 exams and study
engineering; the prices of all houses
in Tamil Nadu; the total sales of all cars in India in the
year 2019; the age distribution
of people who visit a city mall in a particular month.
So, one way of answering all these questions is
through a complete enumeration: you
go and collect data on everybody or everything you are
interested in.
For example, in this question you are interested in knowing
about the percentage of all students
in India, but very quickly you understand that getting this
kind of data might not be
very easy.
Now, if I want to construct a database, I would
want the actual data of all the
students who have passed Class 12. But if my intention is
just to get an overall feel
for the kind of people who finally end up taking
engineering, then I
would want to work with a smaller subset of all
the students in India.
The set of all students in India is what we refer to as a
population.
A smaller subset of this is referred to as a sample.
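To make the idea of a population and a sample concrete, here is a
minimal sketch in Python. The population below is made up for
illustration (10,000 hypothetical student records with a 25%
engineering share), and draw_sample is a helper name introduced
only for this example. It draws a simple random sample without
replacement and compares the share in the sample with the share in
the population.

```python
import random


def draw_sample(population, sample_size, seed=0):
    """Return a simple random sample (without replacement)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: one entry per student, True if the student
# went on to study engineering. The 25% share is an assumption.
population = [True] * 2500 + [False] * 7500

# The sample is a much smaller subset of the population.
sample = draw_sample(population, sample_size=200)

population_share = sum(population) / len(population)
sample_share = sum(sample) / len(sample)

print(f"population: {len(population)} students, share {population_share:.1%}")
print(f"sample:     {len(sample)} students, share {sample_share:.1%}")
```

The share computed from the sample will usually be close to, but not
exactly equal to, the share in the population.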
Now, many a time you might want to know about
the prices of all houses.
Again, you need not go and find out about all the houses
that have been sold in a particular
year; you might want to know about a smaller subset of the
entire population.
One thing you want about the sample is that
it is as representative as possible.
Now, what do we mean by a representative sample?
First, let me define the population as the collection of
all elements that we are
interested in.
Suppose this is the population; let me draw its elements
in different colours here.
Now suppose I take a subset of it here.
The smaller set is actually a subset of the larger set, but we
very quickly notice that
the smaller set does not have any yellow elements in it.
So, I cannot say this smaller set is actually a good
representative sample of the larger
set.
So, a sample is basically a subgroup of the population that
will be studied in detail.
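The coloured-elements picture can be sketched in a few lines of
Python. The colours and counts are made up, and the two helper
functions are names introduced only for this example. The check
below is deliberately crude: it only reports which categories of the
population are missing from the sample and compares the share of
each category.

```python
from collections import Counter


def missing_categories(population, sample):
    """Return categories present in the population but not in the sample."""
    return set(population) - set(sample)


def category_shares(items):
    """Return the fraction of items that fall in each category."""
    counts = Counter(items)
    return {category: count / len(items) for category, count in counts.items()}


# Hypothetical population of coloured elements and a subset taken from it.
population = ["red"] * 4 + ["blue"] * 3 + ["yellow"] * 3
sample = ["red", "red", "blue"]

print(missing_categories(population, sample))  # {'yellow'}
print(category_shares(population))
print(category_shares(sample))
```

Here the subset has no yellow elements at all, so it cannot be
called a good representative sample of the larger set.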
Now, you
will be introduced to the
concept of population and sample in greater detail when
you do your inferential statistics
course.
Nevertheless, the reason we need the concept of a
population and sample in this course
is that, eventually, when we come up with
summary statistics, we always need to
understand whether a summary statistic is for a
population or a sample, and this is
something which we will see in due course.
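As a small preview of why this distinction matters, here is a hedged
sketch using Python's standard statistics module. The numbers are
made up. The variance is one summary statistic whose formula depends
on whether the data is treated as a whole population (divide by n)
or as a sample (divide by n - 1); the details come later and are
only hinted at here.

```python
import statistics

# Made-up values, used only to show that the two formulas differ.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean_value = statistics.mean(values)

# Treat the values as the entire population: divide by n.
population_variance = statistics.pvariance(values)

# Treat the same values as a sample from a larger population:
# divide by n - 1.
sample_variance = statistics.variance(values)

print(f"mean:                {mean_value}")
print(f"population variance: {population_variance}")
print(f"sample variance:     {sample_variance:.3f}")
```

The same eight numbers give a variance of 4 when treated as a
population and 32/7 (about 4.571) when treated as a sample.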
So, what is the purpose of statistical analysis?
When would you use descriptive statistics?
When would you use inferential statistics?
Now, if the purpose of your analysis is just to examine and
explore information for its
own intrinsic interest, the study is descriptive.
Now, what do we mean by that?
Let me demonstrate it to you through a data set.
This is another hypothetical data set, which
shows the names of cricket
players.
All of us are very well aware of these cricket players:
Tendulkar, Kohli, Dhoni.
It shows the matches they have played, in what role, the
total runs, the batting average,
the highest score, wickets, bowling average and best
bowling.
Now, suppose the purpose is just to understand what are the
total runs scored, who has the highest batting
average, who has scored the most runs,
who has played the most matches, and who
has taken the highest number
of wickets. If these are the questions of interest, then all
of them
can be answered directly from
the data set.
I might also want to order the batsmen by the number of runs
they have scored, or
see how the runs are spread among the batsmen; all of this
I can do just by
describing this data.
I do not have to do anything more with this data.
So, in this case, the
purpose I have here is just to examine and
explore the information that is given.
I am not asking anything more.
I just want to describe the data set that is given here, so
this study is descriptive.
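The descriptive questions above can be answered directly from the
data set, as the following minimal Python sketch shows. The players
and all numbers are placeholders made up for this example (they are
not the figures from the data set shown in the lecture), and top_by
and ranked_by are helper names introduced only here.

```python
# Hypothetical data set: every value below is made up for illustration.
players = [
    {"name": "Player A", "matches": 200, "runs": 9000,
     "batting_average": 45.0, "wickets": 10},
    {"name": "Player B", "matches": 150, "runs": 7500,
     "batting_average": 52.5, "wickets": 2},
    {"name": "Player C", "matches": 120, "runs": 3000,
     "batting_average": 28.0, "wickets": 140},
]


def top_by(records, column):
    """Return the name of the record with the largest value in a column."""
    return max(records, key=lambda record: record[column])["name"]


def ranked_by(records, column):
    """Return names ordered from the largest to the smallest value."""
    ordered = sorted(records, key=lambda record: record[column], reverse=True)
    return [record["name"] for record in ordered]


total_runs = sum(record["runs"] for record in players)

print("total runs:", total_runs)
print("most runs:", top_by(players, "runs"))
print("highest batting average:", top_by(players, "batting_average"))
print("most matches:", top_by(players, "matches"))
print("most wickets:", top_by(players, "wickets"))
print("ordered by runs:", ranked_by(players, "runs"))
```

Nothing here goes beyond the rows that are given: every answer is a
lookup, a sum or a sort of the data itself, which is what makes the
study descriptive.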
But one thing which we
notice in this data is the
following.
This data is not the entire cricketing
data about all the cricketers
from all the countries.
It is a sample from an entire population of data.
It is just a small sample.
I can say it is at best a representative sample of the Indian
cricketing data over the last
5 or 10 years, or perhaps the last decade.
It is definitely a sample of the Indian
cricketing data.
But, again, it is not the entire population, which includes
all batsmen and all
cricketers. However, if my inherent interest
is just in summarizing this data, then I would be
interested only in a descriptive
study, for which descriptive statistics is
sufficient.
But now suppose I am going to use this data to draw
further conclusions. For example,
if I want to know about the role a batsman plays together
with his batting average, I would need
more information, or I may want to pick a team for the
future.
For example, we all know about the IPL auctions and
how people are chosen.
So, there is a further role.
I am not interested just in describing this data.
The bigger interest for me is to
use this data to gather or infer
some information which I am going to use in my decision
making process.
For that, I am going to have an element of chance,
and what I need in that case is an inferential study.
So, very often we need to
understand whether the nature of
our study is only going to be descriptive or whether we want
to do an inferential study.
A descriptive
study might be performed either on a sample or on a
population.
Since in the classes to come we will be talking about
descriptive statistics in detail,
we need to understand whether a descriptive study is
performed on a sample or on an entire
population. That is the reason why we introduced the notion
of a sample and a population at
this stage.
However, if an inference is to be made about a population
based on the sample, then the
study becomes inferential.
Inferential statistics is not within the scope of this course;
however, you will be introduced
to the concept of probability, which will help you develop
the methodology towards inferential
statistics.
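The element of chance in an inferential study can be seen in a small
simulation. This is only a sketch under made-up assumptions: the
population is 5,000 hypothetical mall visitors with randomly
generated ages between 10 and 70, and sample_mean_age is a helper
name introduced for this example. Each sample of 100 visitors gives
a slightly different estimate of the average age in the population.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 5,000 mall visitors (made-up data).
population_ages = [rng.randint(10, 70) for _ in range(5000)]


def sample_mean_age(ages, sample_size, rng):
    """Estimate the mean age from one simple random sample."""
    sample = rng.sample(ages, sample_size)
    return statistics.mean(sample)


true_mean = statistics.mean(population_ages)

# Five different samples give five different estimates.
estimates = [sample_mean_age(population_ages, 100, rng) for _ in range(5)]

print(f"population mean age: {true_mean:.2f}")
for number, estimate in enumerate(estimates, start=1):
    print(f"sample {number} estimate: {estimate:.2f}")
```

Describing any one sample is a descriptive study; using it to say
something about all 5,000 visitors is inferential, and the variation
between the estimates is the element of chance that probability
helps us quantify.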
So, in summary, you should know that the two main branches
are descriptive statistics and inferential
statistics.
Whether you do a descriptive study or an inferential study
depends on the purpose
of your study.
If your intrinsic purpose is just to summarize your data,
you would go for descriptive
statistics.
But if your purpose is to infer into the future or
infer about a larger population
using a smaller subset, you would go for inferential
statistics.
To understand inferential statistics, you need to
understand the concept of
a population and a sample.