UNIT 1-Module 1
and problem solving for a new world filled with more advanced tools of technology. The main
emphasis of the applied course is on developing the ability of the students to start with a problem in
a non-mathematical form and transform it into mathematical language. This will enable them to bring
mathematical insights and skills in devising a solution, and then interpreting this solution in real-world
terms.
Students accomplish this by exploring problems using symbolic, graphical, numerical, physical and
verbal techniques in the context of finite or discrete real-world situations. Furthermore, students
engage in mathematical thinking and modelling to examine and solve problems arising from a wide
variety of disciplines including, but not limited to, economics, medicine, agriculture, marine science,
law, transportation, engineering, banking, natural sciences, social sciences and computing.
What is Statistics?
Statistics - Statistics is a group of methods used to collect, analyse, present, and interpret data and to
make decisions.
Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and inferential
statistics.
Descriptive Statistics - Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.
Inferential Statistics - Inferential statistics consists of methods that use sample results to help make
decisions or predictions about a population.
COLLECTING AND DESCRIBING DATA
Primary Data: Primary data is data collected at source. This type of information is obtained directly
from first-hand sources by means of surveys, observations and experimentation, and has not been subjected to
any processing or manipulation. Primary data means original data that has been collected specially for
the purpose in mind, i.e. someone collected the data first-hand from the original source. E.g.
questionnaires, interviews and surveys conducted by the user.
Secondary Data: Secondary data is data collected by someone other than the user, i.e. data that is already
available and has been analysed by someone else. Common sources of secondary data include various
published or unpublished data, books, magazines, newspapers, trade journals, etc.
Variable - A variable is a characteristic under study that assumes different values for different
elements. In contrast to a variable, the value of a constant is fixed.
Incomes, heights, gross sales, prices of homes, number of cars owned, and number of
accidents are examples of quantitative variables because each of them can be
expressed numerically. For instance, the income of a family may be $81,520.75 per
year, the gross sales for a company may be $567 million for the past year, and so forth.
Such quantitative variables may be classified as either discrete variables or continuous
variables.
Discrete Variable - A variable whose values are countable is called a discrete variable.
In other words, a discrete variable can assume only certain values with no
intermediate values.
For example, the number of cars sold on any day at a car dealership is a discrete
variable because the number of cars sold must be 0, 1, 2, 3,... and we can count it. The
number of cars sold cannot be between 0 and 1, or between 1 and 2. Other examples
of discrete variables are the number of people visiting a bank on any day, the number
of cars in a parking lot, the number of cattle owned by a farmer, and the number of
students in a class.
Continuous Variable - A variable that can assume any numerical value over a certain
interval or intervals is called a continuous variable.
EXERCISES
a. Quantitative variable
b. Qualitative variable
c. Discrete variable
d. Continuous variable
e. Quantitative data
f. Qualitative data
APPLICATIONS
Indicate which of the following variables are quantitative and which are qualitative.
Most of the time, decisions are made based on portions of populations. For
example, the election polls conducted in the United States to estimate the
percentages of voters who favor various candidates in any presidential election are
based on only a few hundred or a few thousand voters selected from across the
country. In this case, the population consists of all registered voters in the United
States. The sample is made up of a few hundred or few thousand voters who are
included in an opinion poll. Thus, the collection of a few elements selected from a
population is called a sample.
Census and Sample Survey - A survey that includes every member of the population
is called a census. The technique of collecting information from a portion of the
population is called a sample survey.
Parameter - A numerical measure such as the mean, median, mode, range, variance,
or standard deviation calculated for a population data set is called a population
parameter, or simply a parameter.
Statistic - A summary measure calculated for a sample data set is called a sample
statistic, or simply a statistic.
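Thus, μ (the population mean) and σ (the population standard deviation) are population parameters, and x̄ (the sample mean) and s (the sample standard deviation) are sample statistics.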
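To make the parameter/statistic distinction concrete, the sketch below computes μ and σ from a small made-up population and x̄ and s from a random sample drawn from it. The ten population values and the sample size of 4 are assumptions chosen only for illustration. Note that `statistics.pstdev` divides by N (population), while `statistics.stdev` divides by n - 1 (sample).

```python
import random
import statistics

# Hypothetical population: ages of 10 people (made-up values for illustration)
population = [23, 35, 41, 29, 52, 47, 31, 38, 26, 44]

# Population parameters: computed from every member of the population
mu = statistics.mean(population)       # population mean, μ
sigma = statistics.pstdev(population)  # population standard deviation, σ (divides by N)

# Sample statistics: computed from a random sample of 4 members
random.seed(1)  # fixed seed so the run is repeatable
sample = random.sample(population, 4)
x_bar = statistics.mean(sample)        # sample mean, x̄
s = statistics.stdev(sample)           # sample standard deviation, s (divides by n - 1)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"sample = {sample}, x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Running it with a different seed gives a different sample and therefore different values of x̄ and s, while μ and σ stay fixed: a parameter is a single number for the population, whereas a statistic varies from sample to sample.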
A sampling frame is a list of all the items in your population. It’s a complete list of
everyone or everything you want to study. The difference between a population
and a sampling frame is that the population is general and the frame is specific. For
example, the population could be “People who live in Jacksonville, Florida.” The
frame would name all of those people, from Adrian Abba to Felicity Zappa. A couple
more examples:
Population: People in STAT101.
Sampling Frame: Adrian, Anna, Bob, Billy, Howie, Jess, Jin, Kate, Kaley, Lin, Manuel,
Norah, Paul, Roger, Stu, Tim, Vanessa, Yasmin.
Population: Birds that are pink.
Sampling Frame:
• Brown-capped Rosy-Finch.
• White-winged Crossbill.
• American Flamingo.
• Roseate Spoonbill.
• Black Rosy-Finch.
• Cassin’s Finch.
APPLICATIONS
Random Sample - A sample drawn in such a way that each element of the
population has a chance of being selected is called a random sample.
One way to select a random sample is by lottery or draw. For example, if we are to
select 5 students from a class of 50, we write each of the 50 names on a separate
piece of paper. Then we place all 50 slips in a box and mix them thoroughly. Finally,
we randomly draw 5 slips from the box. The 5 names drawn give a random sample.
On the other hand, if we arrange all 50 names alphabetically and then select the first
5 names on the list, it is a nonrandom sample because the students listed 6th to
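50th have no chance of being included in the sample.
Sampling with replacement occurs when, each time we select an element from the population, we put it back in the population before we select the next element. Thus,
in sampling with replacement, the population contains the same number of items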
each time a selection is made. As a result, we may select the same item more than
once in such a sample. Consider a box that contains 25 marbles of different colors.
Suppose we draw a marble, record its color, and put it back in the box before
drawing the next marble. Every time we draw a marble from this box, the box
contains 25 marbles. This is an example of sampling with replacement. The
experiment of rolling a die many times is another example of sampling with
replacement because every roll has the same six possible outcomes.
Sampling without replacement occurs when the selected element is not replaced in the
population. In this case, each time we select an item, the size of the population is
reduced by one element. Thus, we cannot select the same item more than once in
this type of sampling. Most of the time, samples taken in statistics are without
replacement. Consider an opinion poll based on a certain number of voters selected
from the population of all eligible voters. In this case, the same voter is not selected
more than once. Therefore, this is an example of sampling without replacement.
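The two schemes can be contrasted with Python's standard `random` module: `random.choices` draws with replacement and `random.sample` draws without replacement. In this sketch the box of 25 marbles is modelled simply as the labels 1 to 25 (colours are left out), and the sample size of 10 is an arbitrary choice for illustration.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical box of 25 marbles, labelled 1-25 (colours are not modelled)
marbles = list(range(1, 26))

# Sampling WITH replacement: each marble goes back before the next draw,
# so the same marble can be drawn more than once.
with_replacement = random.choices(marbles, k=10)

# Sampling WITHOUT replacement: a drawn marble is not returned,
# so every marble in the sample is different.
without_replacement = random.sample(marbles, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("distinct marbles (with replacement):", len(set(with_replacement)))
```

The die-rolling example works the same way: `random.choices(range(1, 7), k=20)` would simulate 20 rolls, each with the same six possible outcomes.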
Suppose we have a list of 100 students and we want to select 10 of them. If we write
the names of all 100 students on pieces of paper, put them in a hat, mix them, and
then draw 10 names, the result will be a random sample of 10 students. However, if
we arrange the names of these 100 students alphabetically and pick the first 10
names, it will be a non-random sample because the students who are not among the
first 10 have no chance of being selected in the sample.
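A minimal sketch of the two selection methods in the 100-student example, assuming placeholder names ("Student 001" to "Student 100") in place of real ones: `random.sample` plays the role of drawing names from a well-mixed hat, while slicing the sorted list reproduces the non-random "first 10 names" approach.

```python
import random

# Hypothetical list of 100 student names ("Student 001" ... "Student 100")
students = [f"Student {i:03d}" for i in range(1, 101)]

# Random sample: like drawing 10 names from a well-mixed hat.
# Every student has a chance of being chosen.
random_sample = random.sample(students, k=10)

# Non-random sample: take the first 10 names in alphabetical order.
# Students 011-100 can never be chosen.
nonrandom_sample = sorted(students)[:10]

print("random:    ", random_sample)
print("non-random:", nonrandom_sample)
```

Running the script several times gives a different random sample each time, but the non-random sample is always the same ten names.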
v. Simple random, stratified random, systematic random, cluster and quota
sampling.
Random Sample - A sample drawn in such a way that each element of the
population has a chance of being selected is called a random sample. If all
samples of the same size selected from a population have the same chance of
being selected, we call it simple random sampling. Such a sample is called a
simple random sample.
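The definition says that every possible sample of the same size must be equally likely. For a small made-up population of five elements, the sketch below lists all samples of size 2 with `itertools.combinations` and computes their common probability, 1/C(N, n), with `math.comb` (available from Python 3.8). The population and sample size are illustrative assumptions.

```python
from itertools import combinations
from math import comb

# Hypothetical small population of N = 5 elements
population = ["A", "B", "C", "D", "E"]
n = 2  # sample size

# List every possible sample of size n (order does not matter)
all_samples = list(combinations(population, n))

# In simple random sampling each of these samples is equally likely,
# so each has probability 1 / C(N, n).
prob_each = 1 / comb(len(population), n)

for sample in all_samples:
    print(sample, prob_each)
```

Here C(5, 2) = 10, so each of the 10 possible samples has probability 1/10.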
Advantages:
3. This is suitable for data analysis which includes the use of inferential
statistics.
Disadvantages:
1. This method produces larger sampling errors, for the same sample size, than
are found in stratified sampling.
6. It may be impossible to contact the cases which are very widely dispersed.
Advantages:
Disadvantages:
2. Overlapping can be an issue if there are subjects that fall into multiple
subgroups. When simple random sampling is performed, those who are
in multiple subgroups are more likely to be chosen. The result could be
a misrepresentation or inaccurate reflection of the population.
The simple random sampling procedure becomes very tedious if the size of
the population is large. For example, if we need to select 150 households
from a list of 45,000, it is very time-consuming either to write the 45,000
names on pieces of paper and then select 150 households or to use a table
of random numbers. In such cases, it is more convenient to use systematic
random sampling. The procedure to select a systematic random sample is
as follows. In the example just mentioned, we would arrange all 45,000
households alphabetically (or based on some other characteristic). Since
the sample size should equal 150, the ratio of population to sample size is
45,000/150 = 300. Using this ratio, we randomly select one household from
the first 300 households in the arranged list, using either the lottery
method or a table of random numbers. Suppose we select the 210th household.
We then select the 210th household from every block of 300 households in
the list, that is, every 300th household from the 210th onward. In other
words, our sample includes the households with numbers 210, 510, 810, 1110,
1410, 1710, and so on.
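The systematic procedure can be sketched as a short function. The name `systematic_sample` and its parameters are hypothetical; the sketch assumes the population size is an exact multiple of the sample size, as in the 45,000/150 example, and returns 1-based positions in the arranged list rather than household names.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return 1-based positions chosen by systematic random sampling.

    The sampling interval k is population_size // sample_size. A random
    start is chosen from 1..k (unless one is supplied), then every k-th
    position after it is taken.
    """
    k = population_size // sample_size          # e.g. 45,000 / 150 = 300
    if start is None:
        start = random.randint(1, k)            # random start within the first k
    return [start + k * j for j in range(sample_size)]

# Reproduce the example in the text: start at household 210
positions = systematic_sample(45_000, 150, start=210)
print(positions[:6], "...", positions[-1])  # [210, 510, 810, 1110, 1410, 1710] ... 44910
```

Only the starting point is random; once it is fixed, the rest of the sample follows automatically, which is what makes the method quicker than drawing all 150 households individually.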
Advantages:
1. It is simple and convenient to use - The selection rule is
predetermined, which means the only randomized step is the selection of
the first individual. The selection process then moves through the ordered
(linear or circular) list at fixed intervals until the required sample has
been drawn.
Disadvantages:
Advantages:
4. Cluster sampling can be taken from multiple areas - Clusters can be defined
within a single community, multiple communities, or multiple
demographics. The procedures used for obtaining information follow the
same process, no matter how large the sample happens to be.
Disadvantages:
4. Every cluster may have some overlapping data points - Cluster sampling
aims to keep overlaps in the data to a minimum, because overlaps may affect
the integrity of the conclusions drawn. When creating a cluster, however, every
demographic, community, or population group will have some level of
overlap at the individual level. This introduces variability into the data
that regularly produces sampling errors. In some instances, the
sampling error could be large enough to reduce the representativeness
of the data, invalidating the conclusions.
Advantages:
Disadvantages:
1. Quota sampling does not allow random selection of participants of the research.
2. Quota sampling increases the risk of researcher bias, as a researcher might include
people who are easy to approach or who are cooperative.
5. The accuracy of quota sampling largely depends on the judgment of the researcher. A
biased approach by the researcher affects the accuracy of the results of the
quota sampling research method.
vi. Random numbers, “lottery” techniques
Random Numbers - Random numbers are numbers that occur in a sequence such
that two conditions are met: (1) the values are uniformly distributed over a
defined interval or set, and (2) it is impossible to predict future values based on
past or present ones. Random numbers are important in statistical analysis and
probability theory.
Lottery Technique - The most common set from which random numbers are
derived is the set of single-digit decimal numbers {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The
task of generating random digits from this set is not trivial. A common scheme is
the selection (by means of a mechanical escape hatch that lets one ball out at a
time) of numbered ping-pong balls from a set of 10, one bearing each digit, as the
balls are blown about in a container by forced-air jets. This method is popular in
lotteries. After each number is selected, the ball with that number is returned to
the set, the balls are allowed to blow around for a minute or two, and then another
ball is allowed to escape.
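The ball-drawing scheme can be mimicked in software. The sketch below is an illustrative stand-in, not a description of any real lottery machine: it draws digits 0 to 9 with replacement using Python's `secrets.choice`, which uses the operating system's cryptographically secure source and is designed to be unpredictable (condition 2 above). The frequency count gives a rough check of uniformity (condition 1). The helper name `draw_digits` is hypothetical.

```python
import secrets
from collections import Counter

# The ten ping-pong balls, one bearing each digit 0-9
balls = list(range(10))

def draw_digits(count):
    """Simulate the lottery machine: draw one ball, record its digit,
    return the ball to the set, and repeat (sampling with replacement)."""
    # secrets.choice draws from a cryptographically secure source,
    # so future draws cannot be predicted from past ones.
    return [secrets.choice(balls) for _ in range(count)]

digits = draw_digits(10_000)
# With a uniform source, each digit should appear roughly 1,000 times.
print(sorted(Counter(digits).items()))
```

Because every ball is returned before the next draw, each digit has the same probability, 1/10, on every draw.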
There are several major uses of random numbers:
(1) To protect against selective bias in the acquisition of information from sample
surveys and experiments. In these same contexts, random numbers provide a
known probability structure for statistical calculations.
(2) To gain insight, by simulation, into the behaviour of complex mechanisms or
models.
(3) To study theoretical properties of statistical procedures, such as efficiency of
estimation and power of statistical tests.
(4) To obtain approximate solutions to other mathematical problems.
(5) Random numbers are useful for a variety of purposes, such as generating data
encryption keys, simulating and modelling complex phenomena and for selecting
random samples from larger data sets. They have also been used aesthetically, for
example in literature and music, and are of course ever popular for games and
gambling.
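Uses (2) and (4) can be illustrated with a classic Monte Carlo example: estimating π by simulation. This is a standard textbook illustration chosen here, not one described in the text; the function name `estimate_pi` and the number of points are illustrative assumptions. Python's `random` module is a pseudorandom generator, so a fixed seed makes the run repeatable, which is convenient for simulation but means its numbers are not unpredictable in the sense of condition (2) above.

```python
import random

def estimate_pi(n_points, seed=None):
    """Monte Carlo estimate of pi.

    Scatter n_points uniformly in the unit square. The fraction that
    lands inside the quarter circle x^2 + y^2 <= 1 approximates pi/4.
    """
    rng = random.Random(seed)  # independent generator; a seed makes runs repeatable
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n_points

print(estimate_pi(100_000, seed=42))  # close to 3.14; accuracy improves with more points
```

The same idea, replacing an exact calculation with the average of many random trials, underlies the use of random numbers to study complex models and the properties of statistical procedures.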
(b) Data Collection
What is a Questionnaire?
A questionnaire is a research instrument that consists of a set of questions or other types
of prompts that aims to collect information from a respondent. A research questionnaire
is typically a mix of close-ended questions and open-ended questions. Open-ended, long-
form questions offer the respondent the ability to elaborate on their thoughts. Research
questionnaires were developed in 1838 by the Statistical Society of London.
The data collected from a questionnaire can be qualitative as well as
quantitative in nature. A questionnaire may or may not be delivered in the form of a
survey, but a survey always consists of a questionnaire.
Questionnaire Examples
The best way to understand how questionnaires work is to see the types of questionnaires
available. Some examples of a questionnaire are:
1. Customer Satisfaction Questionnaire: This type of research can be used in any
situation where there’s an interaction between a customer and an organization. For
example, you might send a customer satisfaction survey after someone eats at your
restaurant. You can use the study to determine if your staff is offering excellent
customer service and a positive overall experience.
2. Product Use Satisfaction: You can use this type of questionnaire to better understand how
customers use your product and similar products. It also allows you to collect customer
preferences about the types of products they enjoy or want to see on the market.
2. Keep it simple:
The words or phrases you use while writing the questionnaire must be easy to understand. If
the questions are unclear, the respondents may simply choose any answer and skew the data
you collect.
For efficient market research, researchers need a representative sample of respondents, selected
using one of the many sampling techniques, to whom the questionnaire is administered. It is
imperative to plan and define these target respondents based on the demographics required.
Observation Schedules - Observation research is a qualitative research technique in which
researchers observe participants’ ongoing behaviour in a natural situation. An observation
schedule is usually an analytical form, or coding sheet, filled out by researchers during structured observation.
It carefully specifies beforehand the categories of behaviours or events under scrutiny and
under what circumstances they should be assigned to those categories. Observations are
then fragmented, or coded, into these more manageable pieces of information, which are
later aggregated into usable, quantifiable data.
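As a rough illustration of how coded observations are aggregated into quantifiable data, the sketch below tallies entries from a coding sheet with `collections.Counter`. The three behaviour codes and the recorded events are hypothetical, invented only for this example; a real observation schedule would define its own categories in advance.

```python
from collections import Counter

# Hypothetical coding scheme for a structured observation (codes are made up
# for illustration): each observed event is recorded as one short code.
CODES = {
    "Q": "asks a question",
    "A": "answers a question",
    "O": "off-task behaviour",
}

# Coding sheet entries recorded during one observation session (made-up data)
coded_events = ["Q", "A", "A", "O", "Q", "A", "O", "A"]

# Aggregate the coded events into counts per category
tally = Counter(coded_events)
for code, description in CODES.items():
    print(f"{description:<20} {tally[code]}")
```

The resulting counts (here 2, 4 and 2) can then be summarised or compared across sessions like any other frequency data.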
Procedure - Depending on the type of observation research and the goal of the study, the
researcher will have varying levels of participation in the study. Sometimes the researcher
will insert themselves into the environment, and other times, the researcher will not
intervene in the setting and observe from a distance or in a laboratory setting.
The purpose of this type of research is to gather more reliable insights. In other words,
researchers can capture data on what participants do as opposed to what they say they
do.
1. Controlled Observation
A controlled observation is typically a structured observation that takes place in a psychology
laboratory. The researcher has a question in mind and controls many of the variables, including
participants, observation location, time of the study, circumstances surrounding the
research, and more.
During this type of study, the researcher will often create codes that represent different
types of behaviours. That way, instead of writing a detailed report, they can classify
behaviour into different categories and analyse the data with more ease.
2. Naturalistic observation
Naturalistic observation is another type of observation research method used by market
researchers. This type of observation is when market researchers study the behaviours of
participants in a natural surrounding. There are typically no predetermined behavioural
codes. Instead, the researcher will take rigorous notes and code the data later.
Some advantages of naturalistic observation include:
• Because participants are in their natural setting, the study has high ecological validity.
• This type of study can generate new ideas and research questions.
• It opens researchers’ minds to possibilities they might not have considered before.
• Researchers can collect authentic data and avoid any potential problems with self-
reported data.
Disadvantages include:
• You can’t control different variables, making it difficult to replicate the study and test
for reliability.
• It may be challenging to conduct this type of study on a wide scale.
• You have to use skilled researchers, so you don’t risk missing critical behavioral data.
• You aren’t able to manipulate any variables.
3. Participant observation
The last type of observation method is participant observation. This is a type of naturalistic
observation in that market researchers observe participants in their natural
habitat. The difference is that market researchers insert themselves into the environment.
• Semi-Structured Interviews:
Semi-structured interviews give the researcher considerable leeway to probe the respondents
while maintaining a basic interview structure. Although the interview is a guided conversation
between researchers and interviewees, researchers are offered appreciable flexibility. Because
this type of research interview has a structure, a researcher can be reasonably confident that
multiple interview rounds will not be required.
Within this structure, the researcher can follow up any idea that arises and make creative use
of the entire interview. Additional probing of respondents is often necessary to gather the
information a research study needs. Semi-structured interviews are best applied when the
researcher has limited time for research but requires detailed information about the topic.
Advantages of semi-structured interviews:
• The questions are prepared before the scheduled interview, which gives the researcher
time to prepare and analyse them.
• It is flexible to an extent while maintaining the research guidelines.
• Researchers can express the interview questions in the format they prefer, unlike the
structured interview.
• Reliable qualitative data can be collected via these interviews.
• Flexible structure of the interview.
Disadvantages of semi-structured interviews:
• Participants may question the reliability factor of these interviews due to the
flexibility offered.
• Comparing answers becomes difficult because the interview guideline is not followed
exactly. No two questions will have exactly the same structure, and the result will be an
inability to compare or infer results.
• Unstructured Interviews:
Also called in-depth interviews, unstructured interviews are usually described as
conversations held with a purpose in mind: to gather data about the research study.
These interviews have the fewest questions, as they lean more towards a normal
conversation but with an underlying subject.
The main objective of most researchers using unstructured interviews is to build a bond
with the respondents, which makes it more likely that the respondents will be completely
truthful in their answers. There are no set guidelines for researchers to follow, so they
can approach the participants in any ethical manner to gain as much information as
possible for their research topic.
Since there are no guidelines for these interviews, a researcher is expected to keep their
approach in check so that the respondents do not stray from the main research aim. To
obtain the desired outcome, the researcher must keep the following factors in mind:
• Intent of the interview.
• The interview should primarily take into consideration the participant’s interest and
skills.
• All the conversations should be conducted within permissible limits of research, and
the researcher should try to stick to these limits.
• The skills and knowledge of the researcher should match the purpose of the interview.
• Researchers should understand the do’s and don’ts of unstructured interviews.
Advantages of Unstructured Interviews:
• Because of the informal nature of unstructured interviews, it becomes very easy for
researchers to develop a friendly rapport with the participants. This leads to very
detailed insights without much conscious effort.
• The participants can clarify all their doubts about the questions and the researcher
can take each opportunity to explain his/her intention for better answers.
• There is no fixed set of questions that the researcher has to abide by, which usually
increases the flexibility of the entire research process.
Disadvantages of Unstructured Interviews:
• As there is no structure to the interview process, these interviews take a long time
to carry out.
• The absence of a standardized set of questions and guidelines indicates that the
reliability of unstructured interviews is questionable.
• In many cases, the ethics involved in these interviews are considered borderline.
Advantages of the trimmed mean
A trimmed mean is less susceptible to the effects of extreme scores than is the
arithmetic mean. It is therefore less susceptible to sampling fluctuation than the mean for
extremely skewed distributions.
For example, trimmed means are often used in Olympic scoring to minimize the effects of
extreme ratings possibly caused by biased judges.
Disadvantages
It is less efficient than the mean for normal distributions.
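A trimmed mean discards a fixed number (or proportion) of the lowest and highest values before averaging. As a minimal sketch, the hypothetical helper `trimmed_mean` below drops one score from each end; the seven judges' scores are made up to show how a single extreme rating pulls down the ordinary mean but not the trimmed one. (SciPy offers a proportion-based version as `scipy.stats.trim_mean`.)

```python
import statistics

def trimmed_mean(scores, trim=1):
    """Mean after discarding the `trim` lowest and `trim` highest scores."""
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores to trim")
    kept = sorted(scores)[trim:len(scores) - trim]
    return statistics.mean(kept)

# Hypothetical scores from seven judges; one judge is unusually harsh
scores = [9.1, 9.3, 9.2, 9.4, 9.2, 9.3, 6.0]

print(round(statistics.mean(scores), 2))  # ordinary mean, pulled down by 6.0
print(round(trimmed_mean(scores), 2))     # trimmed mean, drops 6.0 and 9.4
```

Here the ordinary mean is about 8.79, while the trimmed mean is 9.22, much closer to the typical rating.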
Standard Deviation
Standard deviation is a statistic that measures how far a group of numbers is spread out from
its mean; it is calculated as the square root of the variance. The calculation of variance uses
squares, which weight outliers more heavily than data closer to the mean. Squaring also
prevents differences above the mean from canceling out those below; without it, the
differences from the mean would always sum to zero.
Standard deviation is calculated as the square root of variance by finding the variation
between each data point and the mean. If the points are further from the mean,
there is a higher deviation within the data; if they are closer to the mean, there is a lower
deviation. So the more spread out the group of numbers is, the higher the standard
deviation.
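To illustrate the last point, the sketch below compares two made-up data sets that have the same mean but different spread. It uses `statistics.stdev`, the sample standard deviation (divisor n - 1); `statistics.pstdev` would give the population version.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread
tight = [48, 49, 50, 51, 52]
spread = [30, 40, 50, 60, 70]

for name, data in [("tight", tight), ("spread", spread)]:
    # stdev() is the sample standard deviation: the square root of the sample variance
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
```

Both sets have mean 50, but the standard deviation of the more spread-out set (about 15.81) is ten times that of the tightly clustered one (about 1.58).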
Variance
The variance is the average of the squared differences from the mean. To figure out the
variance, first calculate the difference between each point and the mean; then, square
and average the results. For a whole population, the sum of the squared differences is
divided by N; for a sample, it is usually divided by n - 1, as in the example below.
For example, the numbers 1, 2, ..., 10 have a mean of 5.5. If you
square the differences between each number and the mean, and then find their sum, the
result is 82.5. To figure out the sample variance, divide the sum, 82.5, by n - 1, which is the
sample size (in this case 10) minus 1. The result is a variance of 82.5/9 ≈ 9.17. Standard
deviation is the square root of the variance, so the standard deviation would be about 3.03.
Because of this squaring, the variance is no longer in the same unit of measurement as
the original data. Taking the square root of the variance restores the standard deviation
to the original unit of measure, which makes it much easier to interpret.
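The worked example for the numbers 1 to 10 can be checked with a few lines of Python. The sketch computes the sum of squared differences by hand, divides by n - 1 for the sample variance (and by N for comparison), and then confirms the results against the standard library's `statistics.variance` and `statistics.stdev`, which use the n - 1 divisor, and `statistics.pvariance`, which uses N.

```python
import math
import statistics

data = list(range(1, 11))            # the numbers 1, 2, ..., 10
mean = statistics.mean(data)         # 5.5

# Sum of squared differences from the mean
ss = sum((x - mean) ** 2 for x in data)   # 82.5

sample_variance = ss / (len(data) - 1)    # divide by n - 1 = 9
sample_sd = math.sqrt(sample_variance)    # back in the original units
population_variance = ss / len(data)      # divide by N = 10, giving 8.25

print(ss, round(sample_variance, 2), round(sample_sd, 2))  # 82.5 9.17 3.03

# The standard library gives the same results directly
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_variance, statistics.pvariance(data))
```

The gap between 9.17 and 8.25 shows why it matters whether the data are treated as a sample or as the whole population.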