1) Unit 1. Introduction PDF
Introduction to Statistics
The term “Statistics” has been derived from the Latin word “status”, the Italian word “Statista” or
the German word “Statistik”, all of which mean political state.
In those days, “Statistics” was used only for collecting information relating to the population
of the state, its military strength, incomes etc. for framing military and fiscal policies, and it was
considered only as the science of statecraft. With the passage of time, however, the science of
statistics has been applied very widely and its scope has enlarged considerably. It is used not
only by the state in administration but also in computing, IT, economics, business, research,
banking etc. There is hardly any field of human activity where statistics has not been used.
Statistics is the science and art of learning from data. It is concerned with the collection of data, its
subsequent description, and its analysis, which often leads to the drawing of valid conclusions.
It may be defined as the collection, presentation, analysis and interpretation of numerical data.
Scope of Statistics
1. Statistics, computer and information technology.
2. Statistics and Accounting.
3. Statistics and Economics.
4. Statistics and Business.
5. Statistics and Planning.
6. Statistics and Mathematics.
7. Statistics and Medical science.
8. Statistics, Psychology and Education.
Parts of statistics:
Statistics may be classified into two parts:
1. Theoretical statistics or mathematical statistics
i. Descriptive statistics
ii. Inferential statistics
2. Applied statistics
i. Descriptive statistics: The part of statistics concerned with the description and summarization
of data is called descriptive statistics. Descriptive statistics includes measures of location,
measures of dispersion, measures of skewness, measures of kurtosis etc.
ii. Inferential statistics: The part of statistics concerned with the drawing of conclusions is called
inferential statistics. In inferential statistics, samples are taken from the population in such a way
that the drawn sample represents the entire population. Different statistical techniques are
used to draw valid conclusions on the basis of the statistical measures calculated from the
sample data, so that the conclusions are representative of the whole population.
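The two parts can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the population of incomes is simulated (hypothetical data), the function names describe and mean_confidence_interval are invented for this example, and the interval uses the large-sample normal approximation with z = 1.96.

```python
import math
import random
import statistics


def describe(data):
    """Descriptive statistics: summarize the data that were actually observed."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # standard deviation with divisor n
    # Moment-based skewness and kurtosis (a normal curve has kurtosis 3).
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,                       # measure of location
        "median": statistics.median(data),  # measure of location
        "std_dev": sd,                      # measure of dispersion
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4,
    }


def mean_confidence_interval(sample, z=1.96):
    """Inferential statistics: approximate 95% interval for the population mean."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical population: 10,000 simulated incomes (not real data).
rng = random.Random(42)
population = [rng.gauss(50_000, 8_000) for _ in range(10_000)]

# A simple random sample of 200 units stands in for the whole population.
sample = rng.sample(population, 200)

print(describe(sample))
low, high = mean_confidence_interval(sample)
print(f"Population mean is estimated to lie between {low:.0f} and {high:.0f}")
```

The first function only describes the sample in hand, while the second uses the sample to say something about the population it was drawn from.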
Limitations of statistics:
Although statistics has a wide field of application, it has some limitations. Some of these
limitations are as follows:
i. Statistics does not deal with individuals.
ii. Statistics does not deal with qualitative characteristics.
iii. Statistical laws are not exact.
iv. Statistics can be misused.
ii. Statistics does not deal with qualitative characteristics: Qualitative characteristics such as
honesty, intelligence, kindness and efficiency, which cannot be expressed in numbers, are
not directly studied by statistics. However, it is possible to analyze such characteristics
statistically by expressing them in numbers. For example, we can study the
intelligence of students on the basis of their test grades.
iii. Statistical laws are not exact: Statistical laws and rules do not hold good in every
case. However, they are true in the majority of cases. Statistical laws are generally
probabilistic in nature. It is also said that statistics itself is not an absolute measure. It
provides results that are as precise as possible by minimizing error.
iv. Statistics can be misused: Only an expert or statistician can handle statistical data properly.
Statistics is likely to be misused by non-statistical persons when handling data and
interpreting the results.
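The statement that statistical laws are probabilistic and not exact can be seen in a small simulation. The sketch below is an assumed illustration, not part of the original notes: it tosses a simulated fair coin and shows that the proportion of heads is rarely exactly 0.5, although it tends to come close to 0.5 when the number of tosses is large. The function name proportion_of_heads is hypothetical.

```python
import random


def proportion_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(0)  # fixed seed so the run can be repeated
for n in (10, 100, 10_000, 100_000):
    # The "law" (half of the tosses are heads) holds only approximately.
    print(n, proportion_of_heads(n, rng))
```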
Scales of measurement
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
1. Nominal scale: It is the simplest type of scale, also known as the categorical scale. It is the
lowest level of measurement. It is simply a system of assigning numbers or symbols to objects
or events in order to distinguish one from another or to label them. The symbols or
numbers have no numerical meaning. Arithmetic operations cannot be applied to these
numerals.
For example, gender, occupation and religion are measured on a nominal scale. If we use 1 for
male and 2 for female to measure gender, then 1 and 2 have no numeric meaning.
2. Ordinal scale: In this scale, the numerals are arranged in some order, but the gaps between
the positions of the numerals are not necessarily equal. It is used to rate the preferences of
respondents. It represents qualitative values in ascending or descending order.
For example, if the characteristic under study is the attitude of people towards a certain fact,
such as positive, neutral and negative, then we may assign the numbers 1 for positive, 2 for neutral
and 3 for negative. These numbers are known as ranks.
3. Interval scale: In addition to ordering the data, this scale uses equidistant units to measure
the difference between scores. It assumes that the data have equal intervals. The intervals
between the ordered numerals are adjusted in terms of some rule.
For example, the Fahrenheit scale of temperature is an interval scale. For an increase in
temperature from 32°F to 42°F and from 64°F to 74°F, we can say that the increases are equal,
each being 10°F, but we cannot say that a temperature of 64°F is twice as warm as a temperature of
32°F. This is because the scale has no true zero, only an arbitrary zero.
4. Ratio scale: The ratio scale is an extension of the interval scale. It includes all the properties of
the interval scale and also has a true zero point. Physical scales of time, length,
breadth, weight etc. are simple examples of the ratio scale. For
example, we can say that 40 seconds is twice as long as 20 seconds in a
measurement of time. Mathematical operations like addition, subtraction, multiplication
and division can be performed.
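The four scales differ in which summaries are meaningful. The following Python sketch illustrates this under stated assumptions: the coded responses are hypothetical, and the conversion to the Kelvin scale (which has a true zero) is brought in only to show why a ratio of Fahrenheit readings is not meaningful.

```python
import statistics

# Nominal: codes are only labels, so the mode is meaningful but the mean is not.
gender_codes = [1, 2, 2, 1, 2]  # hypothetical responses: 1 = male, 2 = female
most_common_gender = statistics.mode(gender_codes)

# Ordinal: ranks have an order, so the median rank is meaningful.
attitude_ranks = [1, 3, 2, 2, 1, 2, 3]  # hypothetical ranks from 1 to 3
median_rank = statistics.median(attitude_ranks)

# Interval: differences are meaningful, ratios are not.
first_increase = 42 - 32   # degrees Fahrenheit
second_increase = 74 - 64  # degrees Fahrenheit


def fahrenheit_to_kelvin(f):
    """Convert to Kelvin, a temperature scale with a true zero."""
    return (f - 32) * 5 / 9 + 273.15


fahrenheit_ratio = 64 / 32  # 2.0, but only because the zero is arbitrary
kelvin_ratio = fahrenheit_to_kelvin(64) / fahrenheit_to_kelvin(32)

# Ratio: a true zero makes statements such as "twice as long" meaningful.
seconds_ratio = 40 / 20

print("Mode of gender codes:", most_common_gender)
print("Median attitude rank:", median_rank)
print("Equal increases:", first_increase == second_increase)
print("Ratio in Fahrenheit:", fahrenheit_ratio)
print("Ratio in Kelvin:", round(kelvin_ratio, 3))
print("Ratio of durations:", seconds_ratio)
```

The ratio of the two temperatures changes when the zero point changes, whereas the ratio of the two durations does not depend on any such choice.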
Types of data
The first step in the statistical approach to a problem is the collection of numerical information,
i.e. data. Data are the raw material for final statistical conclusions and the main source
in statistics. To start any statistical work, we need information, and this
information is data.
There are mainly two types of data on the basis of collection procedure:
i. Primary data
ii. Secondary data
ii. Secondary data: Data that have already been collected for a particular
purpose and are then used for another purpose are called secondary data. They are not new
and original data. These types of data are generally published in
newspapers, magazines, bulletins, reports, journals, websites, radio etc.
Secondary data sources are broadly classified into two types:
i. Published source
ii. Unpublished source
Panel Data
It is longitudinal data, also called cross-sectional time series data. It describes the behavior of
entities observed across time: data on the same individuals are recorded repeatedly over a number
of years. E.g. the incomes of persons X and Y in the years 2013, 2014 and 2015, recorded along with
their age and qualification.
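The structure of panel data can be sketched in Python as a list of records, one per person per year. All figures below are invented for illustration, and the helper names by_entity and by_year are hypothetical. Grouping by person gives a time series for each entity, and grouping by year gives a cross-section.

```python
from collections import defaultdict

# Hypothetical panel: two persons observed in three years (invented figures).
panel = [
    {"person": "X", "year": 2013, "age": 30, "qualification": "Bachelor", "income": 30000},
    {"person": "X", "year": 2014, "age": 31, "qualification": "Bachelor", "income": 32000},
    {"person": "X", "year": 2015, "age": 32, "qualification": "Master", "income": 35000},
    {"person": "Y", "year": 2013, "age": 40, "qualification": "Master", "income": 45000},
    {"person": "Y", "year": 2014, "age": 41, "qualification": "Master", "income": 46000},
    {"person": "Y", "year": 2015, "age": 42, "qualification": "Master", "income": 50000},
]


def by_entity(records):
    """Time-series view: the income of each person across the years."""
    series = defaultdict(dict)
    for record in records:
        series[record["person"]][record["year"]] = record["income"]
    return dict(series)


def by_year(records):
    """Cross-sectional view: the income of every person within each year."""
    sections = defaultdict(dict)
    for record in records:
        sections[record["year"]][record["person"]] = record["income"]
    return dict(sections)


print(by_entity(panel)["X"])  # income of person X over time
print(by_year(panel)[2015])   # incomes of X and Y in 2015
```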
Population
It is the totality of units or items under study belonging to a particular class or group, e.g. children
in a school, patients in a hospital, fruits on a tree, fish in a pond etc. A census survey is conducted
to enumerate all the population units. A population can be divided into finite and infinite
populations according to the number of individuals belonging to the group.
Finite Population
A population containing a countable number of individuals is called a finite population, e.g. vehicles in
a workshop, customers in a shopping mall, passengers in a vehicle etc.
Infinite Population
A population containing an unlimited number of individuals is called an infinite population, e.g. fish in
an ocean, stars in the sky etc.
A population can be further divided into homogeneous and heterogeneous populations according to the
type of individuals in the population.
Homogeneous Population
A population consisting of individuals of the same type is called a homogeneous population, e.g. the
population of graduated students.
Heterogeneous Population
A population consisting of individuals of different types is called a heterogeneous population. It
contains sub-populations of different types, e.g. the population of the United States.
First of all, the researcher must pay attention to data organization and coding prior to the input
stage of data analysis. If the data are not properly organized, the researcher may face difficulty while
analyzing their meaning later on. For this purpose, the data must be coded: categorical data need
to be given numbers to represent them. Once the data are coded, they are ready to be stored in the
computer.
Input devices may be used for this purpose. After this, the researcher must decide on the appropriate
statistical measures to use in analyzing the data and must also select the appropriate
program. SPSS, SAS, STATA etc. are specialized statistical software packages,
whereas Microsoft Excel can be used for simple statistical analysis.
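The coding step described above can be sketched in Python. This is a minimal illustration and not the procedure of any particular statistical package: the responses are hypothetical, and the helper names build_codebook and encode are invented for this example.

```python
def build_codebook(values):
    """Assign the codes 1, 2, 3, ... to categories in order of first appearance."""
    codebook = {}
    for value in values:
        if value not in codebook:
            codebook[value] = len(codebook) + 1
    return codebook


def encode(values, codebook):
    """Replace each categorical value by its numeric code."""
    return [codebook[value] for value in values]


# Hypothetical survey responses (categorical data).
responses = ["male", "female", "female", "male"]

codebook = build_codebook(responses)
coded = encode(responses, codebook)

print(codebook)  # the mapping that must be kept to interpret the codes later
print(coded)     # the coded data, ready to be stored and analyzed
```

The codes are only labels on a nominal scale, so the codebook should be stored together with the data.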