Business Statistics Note
Definition of Statistics:
Statistics can be defined as a branch of science that is concerned with collection, organization,
summarization, presentation and analysis of data – COSPA.
Data: Data are the actual measurements or observations recorded on individuals. A datum
(singular) is a single measurement or observation, usually referred to as a score or raw
score.
There are two branches of statistics, namely: Descriptive and Inferential statistics.
Descriptive Statistics: This consists of methods for organizing and summarizing information
(data) in a clear and effective way.
Examples:
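As a rough illustration of organizing and summarizing data, the sketch below computes a few common descriptive measures with Python's standard library. The figures in daily_customers are hypothetical values invented for this sketch; they are not taken from any real source.

```python
import statistics

# Hypothetical data: customers visiting a bank branch on ten working days.
daily_customers = [42, 55, 38, 61, 47, 55, 50, 44, 58, 50]

# Summarize the raw scores with a handful of descriptive measures.
summary = {
    "count": len(daily_customers),
    "mean": statistics.mean(daily_customers),
    "median": statistics.median(daily_customers),
    "minimum": min(daily_customers),
    "maximum": max(daily_customers),
    "std_dev": statistics.stdev(daily_customers),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each measure describes only the ten observed days; nothing is concluded about days that were not observed, which is what separates description from inference.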
Inferential Statistics: This involves methods of drawing conclusions about a population based
on information obtained from a sample of the population. It also involves testing hypotheses,
making predictions, forecasting values of population parameters, and decision making.
Examples:
Basically, there are two sources from which statistical data can be obtained, namely, published
and unpublished sources.
1.2.1 PUBLISHED SOURCES
These include:
ii. International Publications: World bodies such as IMF, World Bank, UNESCO, UNICEF,
WTO, WHO etc. also publish the data regarding their organizations. These are used as
published secondary data.
iii. Reports of Committees and Commissions: Union and state governments at times
appoint committees or commissions to research a particular problem, such as the
Finance Commission, Minority Commission, Planning Commission, etc. These committees
are given a term to probe into the matter. After the expiry of the term, they present their
report to the respective authority, and the report is then published. The data are analyzed to find
the required solutions.
iv. Publications by Trade and Business Associations: Big trade and business associations also
publish periodic data about trade and industry which are of much use. These data are used
by scholars to analyze various problems faced by the country. Different industries
also publish data about their own production and other elements.
v. Newspapers, Magazines and Journals: These are among the main providers of data
on a day-to-day basis.
ii. Individuals: These are data collected by individuals, for example for their
theses, research articles, term papers, etc.
iii. Private Publications: Some private institutions belonging to big education houses
also bring out their publications with data on different topics. These topics may include
development, employment, import/export, balance of payments position, etc. Different
stock exchanges also publish data in respect of companies listed with them.
Quantitative Data
Quantitative data are numerical: they are counts or numerical measurements,
also called ‘scale’ data. Quantitative data can be either discrete, for example, age in completed years or the number of
customers that visit the bank in a day, or continuous, for example, salaries, weight, height,
etc.
Qualitative Data
Qualitative data, also called categorical data, is information that cannot be measured numerically but
can be coded to make it meaningful for analysis. Each value is chosen from a set of non-overlapping
categories. For example: Marital Status with categories: 'Single', 'Married', 'Divorced', 'Separated',
'Widowed' and 'Engaged'. Another example is Gender with categories: 'male' and 'female'.
In statistics, the term measurement is used more broadly and is more appropriately
termed scales of measurement. Scales of measurement refer to ways in which
variables/numbers are defined and categorized. Each scale of measurement has certain
properties which in turn determine the appropriateness for use of certain statistical analyses. The
four scales of measurement are nominal, ordinal, interval, and ratio.
Nominal Data
Nominal data is a set of data that can be coded into categories without a particular order. For
example, in a data set, males could be coded as 0 and females as 1; the marital status of an individual
could be coded as 1 if married, 2 if single, 3 if separated, 4 if divorced and 5 if widowed. This
type of data is categorical, and the order of the categories is meaningless.
Data that consist of only two categories, like male and female or dead and alive, are called
binomial (binary) data, while those that consist of more than two categories, like married, single,
separated, divorced and widowed, are known as multinomial data.
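The coding scheme described above can be sketched in a few lines of Python. The code values come from the text; the helper names encode and classify and the list of responses are hypothetical, introduced only for this illustration.

```python
from collections import Counter

# Coding schemes from the text: the numbers are labels only, not quantities.
GENDER_CODES = {"male": 0, "female": 1}
MARITAL_CODES = {"married": 1, "single": 2, "separated": 3, "divorced": 4, "widowed": 5}


def encode(values, codes):
    """Replace each category label with its numeric code."""
    return [codes[value] for value in values]


def classify(codes):
    """Two categories: binomial. More than two: multinomial."""
    if len(codes) < 2:
        raise ValueError("a nominal variable needs at least two categories")
    return "binomial" if len(codes) == 2 else "multinomial"


# Hypothetical responses from four individuals.
responses = ["single", "married", "widowed", "single"]
coded = encode(responses, MARITAL_CODES)
counts = Counter(responses)  # frequency counts are a meaningful summary here

print(coded)
print(classify(GENDER_CODES), classify(MARITAL_CODES))
print(counts.most_common(1))
```

Because the codes carry no order, averaging them (for example, taking the mean of coded) would not be meaningful; counts and the most frequent category are the natural summaries.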
Ordinal Data
Ordinal data is a type of data that is categorical, but the categories are ordered logically. These
data can be ranked in order of magnitude like Good, Better, Best, where Good can be coded as
1, Better as 2 and Best as 3, showing order or hierarchy. When rating items or products, the data
generated are usually ordinal. Most of the scores and scales used in research fall under
ordinal data. For example, rating scores or scales for taste, smell, ease of application of products,
etc.
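A minimal sketch of the Good, Better, Best coding follows. The codes 1, 2 and 3 are those given in the text; the list of ratings is hypothetical. The lower median is used here as an order-based summary, which is one reasonable choice for ordinal data, not the only one.

```python
import statistics

# Codes from the text: they express rank, not measured distance.
RATING_CODES = {"Good": 1, "Better": 2, "Best": 3}
CODE_LABELS = {code: label for label, code in RATING_CODES.items()}

# Hypothetical product ratings from five respondents.
ratings = ["Better", "Good", "Best", "Better", "Good"]

coded = [RATING_CODES[r] for r in ratings]

# Ordering is meaningful for ordinal data, so sorting by code is valid.
ranked = sorted(ratings, key=RATING_CODES.get)

# median_low always returns an observed code, so it maps back to a label.
median_label = CODE_LABELS[statistics.median_low(coded)]

print(ranked)
print(median_label)
```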
Interval Data
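Interval data is measured along a scale in which each position is
equidistant from the next, so that equal differences between pairs of values mean the same
thing anywhere on the scale. However, this type of data has no natural zero. For example, on the Celsius scale of
temperature the difference between 10°C and 20°C equals the difference between 20°C and 30°C,
but 0°C does not mean the absence of temperature. Note that the age of respondents and the number of
female lecturers in all the departments in a particular faculty (school) have a natural zero, so
they are more accurately treated as ratio data.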
Ratio Data
Ratio data, sometimes loosely called continuous data, has all the qualities of interval data (natural order,
equal intervals) plus a natural zero point. This type of data is the most
frequently used. Examples of ratio data are height, weight, length, etc. With this type of data, it can be said
meaningfully that a length of 10 m is double a length of 5 m. This ratio holds true regardless of which unit
the object is measured in (e.g., meters or yards). The reason for this is the presence of a natural
zero.
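The effect of a natural zero can be checked with a small calculation. The sketch below compares the 10 m to 5 m ratio in meters and yards, and then repeats the exercise for temperature, an interval scale without a natural zero. The conversion factor is approximate, and the temperatures 20 and 10 degrees Celsius are chosen only for illustration.

```python
METERS_TO_YARDS = 1.0936133  # approximate conversion factor


def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Ratio scale (length): the ratio survives a change of unit.
ratio_in_meters = 10 / 5
ratio_in_yards = (10 * METERS_TO_YARDS) / (5 * METERS_TO_YARDS)

# Interval scale (temperature): the ratio changes with the unit,
# so "20 degrees is twice as hot as 10 degrees" is not meaningful.
ratio_in_celsius = 20 / 10
ratio_in_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

print(ratio_in_meters, round(ratio_in_yards, 6))
print(ratio_in_celsius, round(ratio_in_fahrenheit, 2))
```

The length ratio is 2 in both units, while the temperature ratio is 2 in Celsius but 68/50 = 1.36 in Fahrenheit.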
1. Primary data refer to self-acquired data. Here, the researcher or interviewer collects the
information (data) personally and uses it for the purpose for which it was collected. For
example, when the government uses population census data
collected on employment, deaths, school age, etc., to formulate policies to tackle these
aspects of the nation's problems, we can say that the data are primary.
2. On the other hand, if one uses already collected data for a study or research, we say that
the data are secondary, since the user was not the person who collected them.
There are five major methods through which data can be collected. They are the interview
method, questionnaire, observation, register and focus group discussion (FGD) methods.
In market research, the interview method is by far the most commonly used way of collecting
information from the general public.
Advantages:
1. Can be flexible with respondents.
2. More information can be collected.
3. Develops relationship with respondents.
4. Helps get full range and depth of information.
5. Help can be given to those respondents who are unable to understand the questions.
Disadvantages:
1. Can take much time.
2. Can be hard to analyze and compare.
3. Can be expensive.
4. Interviewer can bias respondents' responses.
• Participant observation: The observer takes part in the situation he or she observes.
For example, a social worker who becomes a factory worker in order to learn the habits and
customs of the community being observed.
• Non-participant observation: The observer watches the situation, openly or concealed,
but does not participate. For example, by observing the "traffic" flow in a supermarket
before and after making changes in the store layout.
Advantages:
1. It can keep the system undisturbed.
2. The actual actions or habits of persons are observed and noted.
Disadvantages:
1. Can be expensive
2. Can be difficult to interpret seen behaviors
3. Can be complex to categorize observations
4. Opinions and attitudes cannot usually be obtained by observation.
5. Can influence behaviors of program participants. Actions which took place before the study
may not be observed.
The register method involves data or information collected and recorded over time, either when
events occur or after their occurrence. For example, registration of births, deaths, marriages, divorces,
immigration and emigration, motor accidents, industrial accidents, etc.
Advantages
1. It is relatively inexpensive.
2. Relatively fast and easy to access.
Disadvantages
• Sending questionnaires by mail with clear instructions on how to answer the questions and
asking for mailed responses.
• Gathering all or part of the respondents in one place at one time, giving oral or written
instructions, and letting the respondents fill out the questionnaires.
• Hand-delivering questionnaires to respondents and collecting them later.
• Designing the questionnaire and distributing it electronically (online).
Advantages:
1. Relatively inexpensive.
2. There is no interviewer bias.
3. Ability to reach more participants.
4. Summarizes findings in a clear and precise way.
5. The respondent has time to consult any necessary documents.
Disadvantages:
Questionnaire design is a skill that involves the consideration of the topic under study.
There are three types of questions that are used in questionnaires, namely: closed-ended
questions, open-ended questions and Likert-Scale questions.
CLOSED-ENDED QUESTION
Closed-ended questions are the questions that have options for the respondents to choose from.
For example:
OPEN-ENDED QUESTION
Open-ended questions are questions that have no options for the respondent to choose from;
rather, the respondent is free to answer as fully as he or she can. For example:
Age: _________
LIKERT-SCALE QUESTION
Likert-scale questions are questions with closed-ended options, but are scaled. For example:
(a) Strongly Agree (b) Agree (c) Neutral (d) Disagree (e) Strongly Disagree.
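To analyze Likert-scale answers, each option is usually given a numeric code. The sketch below assumes the common 5-to-1 coding (Strongly Agree = 5 down to Strongly Disagree = 1); that coding and the list of responses are assumptions made for this illustration, not something specified in the note.

```python
from collections import Counter

# Assumed coding: higher numbers mean stronger agreement.
LIKERT_CODES = {
    "Strongly Agree": 5,
    "Agree": 4,
    "Neutral": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}

# Hypothetical answers from six respondents to one Likert-scale question.
responses = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree", "Agree"]

coded = [LIKERT_CODES[answer] for answer in responses]
frequencies = Counter(responses)

# List every option in scale order, including options nobody chose.
for option in LIKERT_CODES:
    print(f"{option}: {frequencies[option]}")
```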
To design a questionnaire, the following must be considered:
1. The topic/purpose of the research.
2. The specific objectives of the research.
3. Type of research being carried out.
4. The demographic background of the respondents.
5. The type of questionnaire to be designed.
6. The types of questions that will capture the required information.
7. Operationalization of the variables in order to quantify the responses.
These are the major problems and errors that arise in data collection:
Sampling is the process of selecting units (e.g., people, organizations, items, etc.) from a
population of interest so that by studying the sample we may fairly generalize our results back to
the population from which they were chosen.
Sampling Frame
A sampling frame is the list of all units in the population from which a sample is drawn. It covers
all those within a population who can be sampled, and may include individuals, households,
institutions, experimental groups, etc.
Census
A census is a complete enumeration of a population or groups at a point in time with respect to
well-defined characteristics (population, production). Data are collected for a specific reference
period. A census should be taken at regular intervals in order to have comparable information
available; therefore, most statistical censuses are conducted every 5 or 10 years. Data are usually
collected through questionnaires mailed to respondents, via the Internet, or completed by
an enumerator visiting respondents, or contacting them by telephone.
• An advantage is that censuses provide better data than surveys for small geographic
areas or sub-groups of the population. Census data can also provide a basis for sampling
frames used in subsequent surveys.
• The major disadvantage of censuses is usually the high cost associated with planning and
conducting them, and processing the resulting data.
Population
In statistics, a population refers to the totality of items in which one is interested.
Sample
A sample is a subset of a population. We use a sample to make inferences about the population.
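The relationship between a population and a sample can be sketched as follows. The population of 1,000 salaries is generated artificially for the illustration, and the sample size of 50 is an arbitrary choice; in practice the population mean is usually unknown and only the sample mean is available.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly salaries of 1,000 employees.
population = [rng.randint(50, 500) * 1000 for _ in range(1000)]

# The sample is a subset of the population, drawn without replacement.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)  # the parameter of interest
sample_mean = statistics.mean(sample)          # the estimate of that parameter

print(f"population mean: {population_mean:.0f}")
print(f"sample mean:     {sample_mean:.0f}")
```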
Disadvantages of Sampling
1. Inadequacy of the samples.
2. Chances for bias.
3. Problems of accuracy.
4. Difficulty of getting the representative sample.
5. Untrained manpower.
6. Absence of the informants.
7. Chances of committing the errors in sampling.
Probability Sampling
A probability sampling method is one in which every unit in the population has a known,
non-zero chance of being selected in the sample.
Non-Probability Sampling
Non-probability sampling is a method of sampling where the elements of the population do not
have a known chance of being selected. It involves the selection of elements based on assumptions
regarding the population of interest, which forms the criteria for selection. Hence, because the
selection of elements is non-random, non-probability sampling does not allow the estimation of
sampling errors.
Advantages
Estimates are easy to calculate.
Simple random sampling is always an equal probability of selection (EPS) design, but not all
EPS designs are simple random sampling.
Disadvantages
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in the population may not be present in the sample in sufficient
numbers for study.
For example, if we catch fish, measure them, and immediately return them to the water before
continuing with the sample, this is a with-replacement (WR) design, because we might end up catching and
measuring the same fish more than once. However, if we do not return the fish to the water (e.g.
if we eat the fish), this becomes a without-replacement (WOR) design.
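The two designs in the fish example can be sketched with Python's random module: choices draws with replacement and sample draws without replacement. The pond of ten labeled fish and the sample size of six are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical pond containing ten distinguishable fish.
pond = [f"fish_{i}" for i in range(1, 11)]

# Each fish is returned to the water, so it may be caught again.
with_replacement = rng.choices(pond, k=6)

# Caught fish are not returned, so no fish can appear twice.
without_replacement = rng.sample(pond, k=6)

print(with_replacement)
print(without_replacement)
```

Repeated labels can appear only in the first list; the second list always contains six different fish.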
SYSTEMATIC SAMPLING
Systematic sampling relies on arranging the target population according to some ordering scheme
and then selecting elements at regular intervals through that ordered list.
Systematic sampling involves a random start and then proceeds with the selection of every kth
element from then onwards. In this case, k = population size/sample size.
It is important that the starting point is not automatically the first in the list, but is instead
randomly chosen from within the first to the kth element in the list.
A simple example would be to select every 10th name from the telephone directory (an 'every
10th' sample, also referred to as 'sampling with a skip of 10').
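The procedure can be sketched as a short function. The name systematic_sample and the list directory are hypothetical, and the skip k is taken as the whole-number part of population size divided by sample size, which is an assumption for cases where the division is not exact.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Random start within the first k elements, then every k-th element."""
    rng = rng or random.Random()
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    k = len(population) // sample_size  # skip: population size / sample size
    start = rng.randrange(k)            # random position among the first k
    return [population[start + i * k] for i in range(sample_size)]


# Hypothetical ordered list of 100 directory entries; select 10 of them.
directory = [f"name_{i}" for i in range(1, 101)]
selected = systematic_sample(directory, 10, rng=random.Random(3))
print(selected)
```

With 100 entries and a sample of 10, k is 10, so the selected entries are exactly 10 positions apart, starting from a randomly chosen entry among the first ten.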
STRATIFIED SAMPLING
Stratified random sampling involves categorizing the members of the population into mutually
exclusive and collectively exhaustive groups. An independent simple random sample is then
drawn from each group. Stratified sampling techniques can provide more precise estimates if the
population being surveyed is more heterogeneous than the categorized groups, can enable the
researcher to determine desired levels of sampling precision for each group, and can provide
administrative efficiency. An example of a stratified sample would be a sample conducted to
determine the average income earned by families in Nigeria. To obtain more precise
estimates of income, the researcher may want to stratify the sample by geographic region (north,
south, etc.) and/or stratify the sample by urban, suburban, and rural groupings. If the differences
in income among the regions or groupings are greater than the income differences within the
regions or groupings, precision of the estimates is improved. In addition, if the research
organization has branch offices located in these regions, the administration of the survey can be
decentralized and perhaps conducted in a more cost-efficient manner.
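A minimal sketch of stratified random sampling by region follows. The function name stratified_sample, the list households, and the use of the same sampling fraction in every stratum (proportional allocation) are assumptions made for this illustration; other allocations are possible.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, rng=None):
    """Draw an independent simple random sample from each stratum."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # mutually exclusive groups
    sample = {}
    for name, members in strata.items():
        size = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, size)
    return sample


# Hypothetical frame: 60 households in the north and 40 in the south.
households = [(f"N{i}", "north") for i in range(60)]
households += [(f"S{i}", "south") for i in range(40)]

picked = stratified_sample(
    households, stratum_of=lambda h: h[1], fraction=0.1, rng=random.Random(1)
)
for region, members in picked.items():
    print(region, len(members))
```

A 10% fraction yields 6 northern and 4 southern households, so each region is represented in proportion to its size.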
CLUSTER SAMPLING
CONVENIENCE SAMPLING
As the name implies, convenience sampling involves choosing respondents at the convenience of
the researcher.
Examples of convenience samples include people-in-the-street interviews; the sampling of people
to whom the researcher has easy access, such as a class of students; and studies that use people
who have volunteered to be questioned as a result of an advertisement or another type of
promotion. A drawback to this methodology is the lack of sampling accuracy. Because the
probability of inclusion in the sample is unknown for each respondent, none of the reliability or
sampling precision statistics can be calculated. Convenience samples, however, are employed by
researchers because the time and cost of collecting information can be reduced.
QUOTA SAMPLING
Quota sampling is often confused with stratified and cluster sampling methodologies. All of these
methodologies sample a population that has been subdivided into classes or categories. The
primary difference between the methodologies is that with stratified and cluster sampling the
classes are mutually exclusive and are isolated prior to sampling. Thus, the probability of being
selected is known, and members of the population selected to be sampled are not arbitrarily
disqualified from being included in the results. In quota sampling, the classes cannot be isolated
prior to sampling and respondents are categorized into the classes as the survey proceeds. As
each class fills or reaches its quota, additional respondents that would have fallen into these
classes are rejected or excluded from the results. An example of a quota sample would be a
survey in which the researcher desires to obtain a certain number of respondents from various
income categories. Generally, researchers do not know the incomes of the persons they are
sampling until they ask about income. Therefore, the researcher is unable to subdivide the
population from which the sample is drawn into mutually exclusive income categories prior to
drawing the sample. Bias can be introduced into this type of sample when the respondents who
are rejected, because the class to which they belong has reached its quota, differ from those who
are used.
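The quota procedure described above can be sketched as follows. The function name quota_sample, the income bands, the quotas and the stream of respondents are all hypothetical. Respondents are categorized only as they arrive, and anyone whose class is already full is rejected.

```python
def quota_sample(respondents, category_of, quotas):
    """Accept respondents in arrival order until each class reaches its quota."""
    accepted = {category: [] for category in quotas}
    rejected = []
    for respondent in respondents:
        category = category_of(respondent)  # known only after asking
        if category in accepted and len(accepted[category]) < quotas[category]:
            accepted[category].append(respondent)
        else:
            rejected.append(respondent)      # class full or not of interest
    return accepted, rejected


def income_band(respondent):
    """Hypothetical income bands used only for this sketch."""
    _, income = respondent
    if income < 50_000:
        return "low"
    if income < 100_000:
        return "middle"
    return "high"


# Hypothetical respondents as (identifier, reported income), in arrival order.
stream = [("R1", 30_000), ("R2", 120_000), ("R3", 45_000),
          ("R4", 20_000), ("R5", 150_000), ("R6", 80_000)]

accepted, rejected = quota_sample(stream, income_band, {"low": 2, "middle": 1, "high": 1})
print(accepted)
print(rejected)
```

Here R4 and R5 are rejected because their classes were already full when they were interviewed. If rejected respondents differ systematically from accepted ones, the bias described above arises.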