Statistics
Refers to methods that are used to collect, process, analyse and
interpret data.
Sample
• Samples are made up of individuals. The individuals in a particular
sample may be humans, light-bulbs, cars, students, potato-fields etc.
• All members of the sample will share some common attribute or
characteristic we are interested in: color, sex, weight, price,
durability, etc.
• Furthermore, each individual member of the sample will differ from one or
more of the others on this characteristic: some will be one color, some
another; some will be male, others female; some will be lighter,
others heavier; etc.
Variable
• Looking at the members of a sample, we ask how they vary among themselves
on one (or more) of such characteristics.
• Because of the variation among individuals, such characteristics
are called variable characteristics or, simply, variables.
• A variable in statistics, then, is any attribute or characteristic that
enables us to distinguish between one individual and another.
Example:
• Consider the variables related to a bicycle:
1. Make of bicycle (e.g. Atlas, Hero, Lifelong etc.)
2. Type of bicycle (e.g. racer, tourer, roadster, etc.)
3. Color
4. Age
5. Condition (e.g. Excellent, Acceptable, Poor)
6. Price
7. Size of frame
8. Number of gears
Types of Variables: Category
• A variable like 'Make of bicycle,' set up in categories, e.g. Atlas,
Hero, Lifelong etc.
• Such variables are called category variables.
• Also called 'nominal variables' (from the Latin for 'of a name').
Types of Variables: Ordinal
• A category variable whose categories have a natural order.
• For example, condition of the bicycle: Excellent, Acceptable,
Poor.
• Such a variable tells us the relative standing of each individual.
Types of Variables: Quantity
• A variable that takes numerical values.
• For example price of bicycle.
• These are of two types:
1. Discrete (Number of gears)
2. Continuous (Height, Age)
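The difference between the three types can be sketched in a few lines of Python. This is only an illustration: the two bicycles and all their values are invented, and the helper name condition_rank is hypothetical. The point is which operations make sense for each type: equality for a category variable, ordering for an ordinal variable, arithmetic for a quantity variable.

```python
# Ordered categories for the ordinal variable "Condition", worst to best.
CONDITION_ORDER = ["Poor", "Acceptable", "Excellent"]

def condition_rank(condition):
    """Return the position of a condition in the ordering (0 = worst)."""
    return CONDITION_ORDER.index(condition)

# Two hypothetical bicycles (all values invented for illustration).
bike_a = {"make": "Atlas", "condition": "Excellent", "gears": 6, "price": 4500.0}
bike_b = {"make": "Hero", "condition": "Poor", "gears": 1, "price": 1200.0}

# Category (nominal) variable: we can only ask whether two values are the same.
same_make = bike_a["make"] == bike_b["make"]

# Ordinal variable: we can also ask which value ranks higher.
a_in_better_condition = (
    condition_rank(bike_a["condition"]) > condition_rank(bike_b["condition"])
)

# Quantity variable: arithmetic on the values is meaningful.
price_gap = bike_a["price"] - bike_b["price"]

print(same_make, a_in_better_condition, price_gap)
```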
Observations
• Each measurement, or count or classification, that is made for each
member of the sample.
• For instance, if we record the ages of a sample of 100 students, we’ll
have 100 observations.
• If we record each one's sex as well, we’ll have a total of 200
observations.
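The count above is simply one observation per recorded value per individual. A minimal sketch, assuming each student is stored as a small dictionary (the student records and the function name count_observations are invented for illustration):

```python
def count_observations(records):
    """Count one observation for every value recorded for every individual."""
    return sum(len(record) for record in records)

# 100 hypothetical students, each with an age and a sex recorded.
students = [{"age": 18 + i % 5, "sex": "F" if i % 2 else "M"} for i in range(100)]

# The same students with only the age recorded.
ages_only = [{"age": s["age"]} for s in students]

print(count_observations(ages_only))  # one variable per student
print(count_observations(students))   # two variables per student
```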
Primary and Secondary Data
• Primary data refers to first-hand data gathered by
the researcher through surveys, observations,
experiments, questionnaires, personal interviews, etc.
• Secondary data means data collected by someone else
earlier, for example data taken from sources
such as government publications, censuses,
internal records of the organisation, books, journal
articles, websites, reports, etc.
SURYA BANK: CASE STUDY
• questionnaire
Things to Take Care of While Collecting Data
• Where did the data come from? Is the source biased—that is, is it likely to have an
interest in supplying data points that will lead to one conclusion rather than another?
• How many observations do we have? Do we need to collect more observations?
• Do they represent all the groups we wish to study?
• Is the conclusion logical? Have we made conclusions that the data do not support?
Population
• A population is a collection of all the elements we are
studying.
• For example, if we are doing a study of the students of
Delhi University, then all students of Delhi University form
the population.
Sample
• A sample is a subset of the population, collected in order to study the
population.
• For example, if we are studying Delhi University students
and we collect information about 1000 students, then the data on those
1000 students is our sample.
• A sample is collected to make inferences about the population.
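As a rough sketch, drawing a sample of 1000 students from a population could look like this in Python. The population size of 50,000 and the use of ID numbers are invented for illustration, and simple random sampling is an assumption; these notes do not say how the sample is chosen.

```python
import random

# Hypothetical population: ID numbers for 50,000 students (size invented).
population = list(range(1, 50_001))

# Fixed seed so the illustration gives the same sample on every run.
rng = random.Random(42)

# Draw 1000 distinct students; simple random sampling is an assumption here.
sample = rng.sample(population, 1000)

print(len(sample))                     # number of students in the sample
print(set(sample) <= set(population))  # every sampled ID is in the population
```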
Benefit of Sample
• Studying samples is easier than studying the whole population; it
costs less and takes less time.
• Sometimes it is not possible to collect data on the whole population, or
access to the whole population is not available.
Arranging Data
• The purpose of organizing data is to enable us to see
quickly some of the characteristics of the data we have
collected.
• We look for things such as the range (the largest and
smallest values), apparent patterns, what values the
data may tend to group around, what values appear
most often, and so on.
The Frequency Distribution
• One way we can compress data is to use a frequency table or a frequency distribution.
• Divide the data into groups of similar values, then record the number of data
points that fall into each group.
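The grouping step can be sketched in Python as follows. The class limits are the ranges used in the inventory example in these notes (2.0 to 2.5 days up to 5.0 to 5.5 days); the 20 data values are invented for illustration and are not the data behind Table 2-6. Values are converted to whole tenths of a day so that comparisons against the class limits are exact.

```python
from collections import Counter

# Class limits in tenths of a day (2.0-2.5, 2.6-3.1, ..., 5.0-5.5 days).
CLASSES = [(20, 25), (26, 31), (32, 37), (38, 43), (44, 49), (50, 55)]

def frequency_distribution(values):
    """Return (lower, upper, frequency) for each class, limits in days."""
    counts = Counter()
    for value in values:
        tenths = round(value * 10)  # work in integer tenths to avoid float issues
        for low, high in CLASSES:
            if low <= tenths <= high:
                counts[(low, high)] += 1
                break
    return [(low / 10, high / 10, counts[(low, high)]) for low, high in CLASSES]

# Hypothetical average inventories in days (invented for illustration).
data = [2.0, 3.4, 3.8, 4.1, 4.1, 4.3, 4.7, 4.9, 5.5, 5.5,
        3.9, 4.0, 4.2, 3.5, 3.1, 4.4, 5.0, 3.8, 4.1, 3.6]

table = frequency_distribution(data)
for low, high, freq in table:
    print(f"{low:.1f} to {high:.1f}: {freq}")
```

With these made-up values the class 3.8 to 4.3 has the highest count, and the table no longer shows how often any single value occurred.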
Loses some information, but gains more
• Notice that we lose some information in constructing the frequency distribution. We no
longer know, for example, that the value 5.5 appears four times or that the value 5.1
does not appear at all.
• Yet we gain information concerning the pattern of average inventories. We can see
from Table 2-6 that average inventory falls most often in the range from 3.8 to 4.3
days.
• It is unusual to find an average inventory in the range 2.0 to 2.5 days or from 2.6 to
3.1.
• Inventories in the ranges of 4.4 to 4.9 days and 5.0 to 5.5 days are not prevalent but
occur more frequently than some others.
• Thus a frequency table sacrifices some detail but offers more insight into the pattern of the data.
Frequency Distribution: Continued
• Thus a frequency distribution is a table that organizes data into classes, that is, into
groups of values describing one characteristic of the data.
• A frequency distribution shows the number of observations from the data set
that fall into each of the classes.
• If you can determine the frequency with which values occur in each class of a data set,
you can construct a frequency distribution.
Relative Frequency
• We can also express the frequency of each value as a fraction or a percentage of the
total number of observations.
• A relative frequency distribution presents frequencies in terms of fractions or
percentages.
• It is obtained by dividing frequency by total count.
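A minimal sketch of that division in Python. The six class frequencies are invented for illustration, and relative_frequencies is a hypothetical helper name.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

# Hypothetical class frequencies (invented for illustration); they total 20.
frequencies = [1, 1, 3, 9, 3, 3]
relative = relative_frequencies(frequencies)

# Show each frequency as a count, a fraction and a percentage.
for f, r in zip(frequencies, relative):
    print(f"{f:>3}  {r:.2f}  {r:.0%}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.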
Discrete and Continuous Classes
• Classes/Classification can be quantitative or
qualitative and either discrete or continuous.
• Discrete classes are separate entities that do
not progress from one class to the next
without a break. Examples are the number
of children in each family and the number of
trucks owned by moving companies.
• Continuous data do progress from one class
to the next without a break. They involve
numerical measurement such as the weights
of cans of tomatoes, the pounds of pressure
on concrete, or the high school GPAs of
college students.
Class width
• The range must be divided into equal classes; that is, the width of the interval from the
beginning of one class to the beginning of the next class must be the same for every
class.
• If the classes were unequal and the width of the intervals differed among the classes,
then the distribution would be more difficult to interpret.
• As a rule, statisticians rarely use fewer than 6 or more than 15 classes.
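One common way to get equal classes is to divide the span of the data by the chosen number of classes. The sketch below assumes that rule, which these notes do not state explicitly, and works in whole measurement units (tenths of a day here) so the limits come out exact. The function name equal_classes is hypothetical.

```python
def equal_classes(smallest, largest, n_classes):
    """Split an integer-valued range into n_classes classes of equal width.

    smallest and largest are in whole measurement units (e.g. tenths of a day).
    Returns a list of (lower limit, upper limit) pairs.
    """
    span = largest + 1 - smallest   # number of distinct unit values covered
    width = -(-span // n_classes)   # ceiling division so every value fits
    return [
        (smallest + i * width, smallest + (i + 1) * width - 1)
        for i in range(n_classes)
    ]

# Data running from 2.0 to 5.5 days (20 to 55 tenths), split into 6 classes.
classes = equal_classes(20, 55, 6)
for low, high in classes:
    print(f"{low / 10:.1f} to {high / 10:.1f}")
```

For this range the width from the start of one class to the start of the next is 0.6 days, giving the classes 2.0 to 2.5, 2.6 to 3.1, and so on up to 5.0 to 5.5.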
Plotting Frequency Distribution: Histogram
Relative Frequency Histogram
Advantages of Histograms:
Frequency Polygon
Advantage of Polygons
Ogives: Cumulative frequency distribution defined
• A cumulative frequency distribution enables us to see how many
observations lie above or below certain values, rather than
merely recording the number of items within intervals.
• A graph of a cumulative frequency distribution is called an ogive
(pronounced “oh-jive”).
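The numbers plotted in a "less than" ogive are running totals of the class frequencies. A minimal sketch in Python, using invented class frequencies and upper class limits for illustration:

```python
from itertools import accumulate

# Hypothetical upper class limits (days) and class frequencies (invented).
upper_limits = [2.5, 3.1, 3.7, 4.3, 4.9, 5.5]
frequencies = [1, 1, 3, 9, 3, 3]

# Running total: observations at or below each upper class limit.
cumulative = list(accumulate(frequencies))

for limit, count in zip(upper_limits, cumulative):
    print(f"{limit} days or less: {count}")
```

The last cumulative value equals the total number of observations. Plotting each upper limit against its cumulative count and joining the points gives the ogive.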
Example
Ogives of relative frequencies
Pie Chart