
Lecture 1-Introduction To Statistics

The document serves as an introductory lecture on statistics and probability, covering key concepts such as descriptive and inferential statistics, types of variables, data collection methods, and sampling techniques. It includes examples and exercises to illustrate these concepts, emphasizing the importance of statistics in analyzing data and making informed decisions. Additionally, it introduces the use of Microsoft Excel for statistical calculations.

Joao Saldanha University/Inst

Elementary Statistics
(STATS 1)
Lecture 1: Introduction to
Statistics and Probability

Lecturer: Januario da Costa, PhD Cand.


Email: [email protected]
What is Statistics?
Statistics is the science of conducting studies to collect, organize, summarize, analyze,
and draw conclusions from data.

Source: Timor-Leste Agricultural Census 2019
Source: Timor-Leste Population Census 2015
Source: UNICEF, 2020
Source: National health sector strategic plan 2011-2030
Source: Ministry of Finance, 2022
Descriptive vs Inferential
Statistics
Variable and Data
A variable is a characteristic or attribute that can assume different
values.
Data are the values (measurements or observations) that the variables
can assume. A collection of data values forms a data set.

Example 1: Collecting data about the wages (salaries) of different professions
around Dili.
Sample and Population
A population consists of all subjects (human or otherwise) that are
being studied.
A sample is a group of subjects selected from a population.

Example 2: We want to study malnutrition in children under 5 years of age,
that is, to find out which children are malnourished.
Descriptive vs Inferential Statistics
Statistics is sometimes divided into two main areas, depending on how
data are used. The two areas are:

Descriptive Statistics consists of the collection, organization, summarization,
and presentation of data.

Inferential Statistics consists of generalizing from samples to populations,
performing estimations and hypothesis tests, determining relationships
among variables, and making predictions.
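
To make the distinction concrete, the following is a minimal Python sketch, not part of the original lecture. The wage figures are invented for illustration, and the interval uses the normal critical value 1.96 as a rough approximation. The first part only summarizes the sample (descriptive); the second part uses the sample to estimate the population mean (inferential).

```python
import math
import statistics

# Hypothetical sample: monthly wages (USD) of 8 workers, invented for illustration.
sample_wages = [115, 150, 200, 250, 300, 320, 400, 500]

# Descriptive statistics: summarize the data that were actually collected.
sample_mean = statistics.mean(sample_wages)
sample_sd = statistics.stdev(sample_wages)  # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to estimate the population mean.
# Rough 95% interval with the normal critical value 1.96 (an approximation;
# for a sample this small a t critical value would be more appropriate).
standard_error = sample_sd / math.sqrt(len(sample_wages))
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"Sample mean (descriptive): {sample_mean:.2f}")
print(f"Estimated population mean (inferential): {lower:.2f} to {upper:.2f}")
```

The sample mean describes only these eight workers; the interval is a statement about the wider population and therefore carries uncertainty.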
Descriptive Statistics
Example 3: Consider the national census conducted by the Timor-Leste Government.

Source: Timor-Leste Census, 2015


Inferential Statistics
Probability used for Decision making (Hypothesis)
Example 4: A business owner may wish to know which social media platform to
use for marketing of their product.

Source: Google Image


Inferential Statistics
Probability used to determine relationships among variables.
Example 5: A researcher may wish to know the relationship between “Smoking
and Health”

Source: Google Image


EXERCISE 1
Determine whether each of the following is descriptive or inferential statistics.
Events | Type of Statistics
The average for the top five SDSB winners was $30,349. | Descriptive
A study done by Instituto Nacional Ciensia e Technologia (INCT) shows that pigs raised on good feed have a higher meat-to-fat ratio than those that eat anything. | Inferential
A survey of 1,020 families reveals that, on average, families spend $534.50 on Christmas celebrations. | Descriptive
Scientists concluded that people who laugh often have a significantly increased tolerance for pain. | Inferential
EXERCISE 2
Read the following on attendance and grades and answer the questions.
A study conducted at Manatee Community College revealed that students who
attended class 95 to 100% of the time usually received an A in the class.
Students who attended class 80 to 90% of the time usually received a B or C in
the class. Students who attended class less than 80% of the time usually
received a D or an F or eventually withdrew from the class.
Based on this information, attendance and grades are related. The more you
attend class, the more likely it is you will receive a higher grade. If you improve
your attendance, your grades will probably improve. Many factors affect your
grade in a course. One factor that you have considerable control over is
attendance. You can increase your opportunities for learning by attending class
more often.
1. What are the variables under study?
The variables are grades and attendance.

2. What are the data in the study?
The data consist of specific grades and attendance numbers.

3. Are descriptive, inferential, or both types of statistics used?
These are descriptive statistics; however, if an inference were made to all students, then
that would be inferential statistics.
4. What is the population under study?
The population under study is students at Manatee Community College

5. Was a sample collected? If so, from where?
While not specified, we probably have data from a sample of MCC students.

6. From the information given, comment on the relationship between the variables.
Based on the data, it appears that, in general, the better your attendance, the higher
your grade.
Variables and Types of Data
Qualitative (Categorical) vs Quantitative Variable

Qualitative variables are variables that have distinct categories according to
some characteristic or attribute.

Example 6:
• If subjects are classified according to gender (male or female), then the
variable gender is qualitative.
• If subjects are classified by religious upbringing (Catholic, Muslim,
Hindu, Buddhist), then the variable religious upbringing is qualitative.
• If subjects are classified by geographical origin (Aileu, Baucau, Covalima,
etc.), then the variable geographical origin is qualitative.
Quantitative variables are variables that can be counted or measured.

Example 7:
• The variable age is numerical, and people can be ranked in order according
to the value of their ages. Therefore, the variable age is quantitative.
• The data being collected are the heights of people. Therefore, the variable
height is quantitative.
• The data being collected are the weights of people. Therefore, the variable
weight is quantitative.
• The data being collected are the body temperatures of people. Therefore,
the variable body temperature is quantitative.
Discrete vs Continuous Variable
Quantitative variables can be further classified into two groups: discrete
and continuous.
Discrete variables assume values that can be counted.
Example 8:
• The number of eggs a hen lays in a month.
• The number of students in a class.
Continuous variables can assume an infinite number of values between any
two specific values. They are obtained by measuring. They often include
fractions and decimals.
Example 9:
• The heights of people in a city.
• The time a marathon runner takes to reach the finish line.
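The classification of variables can be summarized as follows:
• Qualitative (categorical) variables
• Quantitative variables, which are either discrete or continuous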
EXERCISE 3
Classify each of the following variables as discrete or continuous.
Variable | Type of Variable
Maximum allowable motorcycle speed on a public road | Continuous
The number of pages in a Statistics book | Discrete
Volume of gasoline pumped into the car | Continuous
The number of books on a shelf in the JSU library | Discrete
Level of Measurements

In addition to being classified as qualitative or quantitative, variables can be
classified by how they are categorized, counted, or measured.

There are four common types of measurement scales:
• Nominal (qualitative level of measurement)
• Ordinal (qualitative level of measurement)
• Interval (quantitative level of measurement), and
• Ratio (quantitative level of measurement).
Nominal
The nominal level of measurement classifies data into mutually exclusive
(nonoverlapping) categories in which no order or ranking can be imposed on the
data.
Example 12:
• A sample of college instructors classified according to subject taught (e.g.,
English, history, psychology, or mathematics) is an example of nominal-level
measurement.
• Classifying survey subjects as male or female
• Classifying players by the number on the back of their Jersey
• Classifying people based on their political affiliation
• Classifying people based on their Religion
• Classifying people based on their Marital Status
Ordinal
The ordinal level of measurement classifies data into categories that can be
ranked; however, precise differences between the ranks do not exist.
Example 13:
• Guest speakers might be ranked by the audience as superior, average, or poor.
• Class exam ranks such as 1st, 2nd, and 3rd.
• Clothes classified according to their size (small, medium, large).
• Letter grades (A, B, C, D, E, F).
Interval
The interval level of measurement ranks data, and precise differences between
units of measure do exist; however, there is no meaningful zero.
Example 14:
• The temperatures of water and coffee, or of cold Coca-Cola and warm water
• People's Intelligence Quotient (IQ), e.g., Antonio (IQ = 109) and Maria
(IQ = 111); Jose (IQ = 130) and Jacinta (IQ = 128)

Since the interval scale has no true zero point, you cannot calculate meaningful
ratios. For example, there is no meaningful sense in which the ratio of 90 to 30
degrees F is the same as the ratio of 60 to 20 degrees F, even though both
quotients equal 3.
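
A short Python sketch, offered as an illustration rather than as part of the original slides, shows why ratios on an interval scale are not meaningful: the same two temperatures give a different "ratio" when the unit changes, while the difference between them only changes by a constant scale factor. The function name `fahrenheit_to_celsius` is introduced here for illustration.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


hot_f, cold_f = 90, 30

# The "ratio" of the two temperatures depends on the unit chosen.
ratio_f = hot_f / cold_f
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cold_f)

# Differences are meaningful: a 60-degree F gap is always a 60 * 5/9 degree C gap.
difference_f = hot_f - cold_f
difference_c = fahrenheit_to_celsius(hot_f) - fahrenheit_to_celsius(cold_f)

print(f"Ratio in Fahrenheit: {ratio_f:.2f}")
print(f"Ratio in Celsius:    {ratio_c:.2f}")
print(f"Difference: {difference_f} F = {difference_c:.2f} C")
```

In Fahrenheit the quotient is 3, but in Celsius it is negative, because 30 degrees F lies below 0 degrees C. A ratio-level variable such as length would give the same quotient in any unit.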
Ratio
The ratio level of measurement possesses all the characteristics of interval
measurement, and there exists a true zero. In addition, true ratios exist when
the same variable is measured on two different members of the population.

Example 15:
• Salary
• Length
• Weight
• Height
• Time
EXERCISE 4
What level of measurement would be used to measure each of the following
variables?

a. The ages of patients in a local hospital
Answer: Ratio
b. The ratings of movies released this month
Answer: Ordinal
c. Colors of athletic shirts sold by Oak Park Health Club
Answer: Nominal
d. Temperatures of hot tubs in local health clubs
Answer: Interval
Data Collection and Sampling
Techniques
Ways of Collecting Data
Data for research can be collected in a variety of ways. One of the most
common methods is using surveys. Surveys can be done by using a variety of
methods.
Three of the most common methods are
• The telephone survey
• The mailed questionnaire, and
• The personal interview
Data can also be collected in other ways, such as
• Surveying records or
• Direct observation of situations
Telephone Surveys
Advantages:
• Less costly
• People are more candid in their opinions (no face-to-face contact)
Disadvantages:
• Some people have no cellphone
• People might not pick up the phone
• Many numbers are unlisted

Mailed Questionnaires
Advantages:
• Cover a wider geographic area
• Respondents remain anonymous
Disadvantages:
• Low number of responses
• Inappropriate answers
• Some respondents have difficulty reading or understanding the questions

Personal Interview
Advantages:
• In-depth responses can be obtained
Disadvantages:
• Interviewers must be trained
• More costly
• Possible bias in the selection of respondents
Sampling Technique
As stated earlier in our discussion, researchers use samples to collect data and
information about a particular variable from a large population. Using
samples saves time and money and in some cases enables the researcher to
get more detailed information about a particular subject.

Statisticians use four basic methods of sampling:


• Random sampling
• Systematic sampling
• Stratified sampling, and
• Cluster sampling
Random sampling
A random sample is a sample in which all members of the population
have an equal chance of being selected.
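
As an illustration that is not part of the original slides, a simple random sample can be sketched in Python with the standard library's `random.sample`, which draws without replacement so that every member has the same chance of being chosen. The population of 100 numbered subjects and the seed are assumptions made for the example.

```python
import random

# Hypothetical population: 100 subjects identified by the numbers 1 to 100.
population = list(range(1, 101))

# A seeded generator makes the example reproducible; omit the seed in practice.
rng = random.Random(42)

# Draw 10 distinct subjects; each subject is equally likely to be selected.
sample = rng.sample(population, k=10)

print(sorted(sample))
```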
Systematic sampling
A systematic sample is a sample obtained by selecting every kth member of
the population, where k is a counting number.
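
The idea can be sketched in Python as follows. This is an illustrative sketch rather than the lecture's own procedure: the helper name `systematic_sample`, the random starting point, and the list of 100 numbered bottles are all assumptions made for the example.

```python
import random


def systematic_sample(population, k, rng=None):
    """Select every k-th member, starting from a random position among the first k."""
    rng = rng or random.Random()
    start = rng.randrange(k)  # random starting index in 0 .. k-1
    return population[start::k]


# Hypothetical production line of 100 numbered bottles.
bottles = list(range(1, 101))
sample = systematic_sample(bottles, k=10, rng=random.Random(0))

print(sample)
```

Choosing the starting point at random keeps the first k members equally likely to begin the sequence; after that, the selection is fully determined by k.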
Stratified sampling
A stratified sample is a sample obtained by dividing the population into
subgroups or strata according to some characteristic relevant to the study.
(There can be several subgroups.) Then subjects are selected from each
subgroup.
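
A minimal Python sketch of stratified sampling is shown here for illustration only. The helper name `stratified_sample`, the 40 invented students, and the choice of gender as the stratifying characteristic are assumptions, not details taken from the lecture.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, per_stratum, rng):
    """Group subjects into strata, then randomly select per_stratum from each group."""
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    # random.sample raises ValueError if a stratum has fewer than per_stratum members.
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# Hypothetical students as (name, gender) pairs, 20 per gender.
students = [(f"student{i}", "female" if i % 2 == 0 else "male") for i in range(1, 41)]

chosen = stratified_sample(
    students, stratum_of=lambda s: s[1], per_stratum=3, rng=random.Random(1)
)

for stratum, members in chosen.items():
    print(stratum, [name for name, _ in members])
```

Every stratum is represented in the sample, which is the point of stratifying before selecting.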
Cluster sampling
A cluster sample is obtained by dividing the population into sections or
clusters and then selecting one or more clusters and using all members in
the cluster(s) as the members of the sample.
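
For comparison with the other methods, here is a hedged Python sketch of cluster sampling. The helper name `cluster_sample` and the four invented schools with five students each are assumptions made for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and return every member of the chosen clusters."""
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen_names for member in clusters[name]]
    return chosen_names, members


# Hypothetical clusters: four schools, each with five students.
schools = {f"School {c}": [f"{c}-{i}" for i in range(1, 6)] for c in "ABCD"}

chosen_names, members = cluster_sample(schools, n_clusters=1, rng=random.Random(3))

print(chosen_names, members)
```

Unlike stratified sampling, only the selected clusters contribute subjects, and every member of a selected cluster is included.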
Other sampling Techniques
In addition to the four basic sampling methods, researchers use other
methods to obtain samples.
• Convenience sample: Here a researcher uses subjects who are
convenient.
For example, the researcher may interview subjects entering a local mall
to determine the nature of their visit or perhaps what stores they will be
patronizing.
• Volunteer sample or self-selected sample: Respondents decide for
themselves if they wish to be included in the sample.
For example, RTTL asks a question about a situation involving violence and then
asks people to call one number if they agree with the action taken by the
government or the police. The results are then announced at the end of the day.
Sampling Error
Since samples are not perfect representatives of the populations from which
they are selected, there is always some error in the results. This error is
called a sampling error.
Sampling error is the difference between the results obtained from
a sample and the results obtained from the population from which
the sample was selected.
For example, suppose you select a sample of full-time students at your
college and find 56% are female. Then you go to the admissions office
and get the genders of all full-time students that semester and find that
54% are female. The difference of 2% is said to be due to sampling
error.
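
The calculation can be sketched in Python. This is an illustration under stated assumptions: the population of 1,000 students with exactly 54% female, the sample size of 100, and the seed are invented, so the sample percentage printed here will generally differ from the 56% in the example above.

```python
import random

# Hypothetical population of 1,000 full-time students, 54% of them female.
population = ["F"] * 540 + ["M"] * 460

# Draw a simple random sample of 100 students (seeded for reproducibility).
rng = random.Random(7)
sample = rng.sample(population, 100)

population_pct = 100 * population.count("F") / len(population)
sample_pct = 100 * sample.count("F") / len(sample)

# Sampling error: the sample result minus the population result.
sampling_error = sample_pct - population_pct

print(f"Population: {population_pct:.1f}% female")
print(f"Sample:     {sample_pct:.1f}% female")
print(f"Sampling error: {sampling_error:+.1f} percentage points")

# The figures from the lecture example: 56% in the sample, 54% in the population.
lecture_error = 56 - 54
```

Running the sketch with different seeds gives different sampling errors, which illustrates that the error comes from which subjects happen to be selected.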
EXERCISE 5
Which sampling technique was used in each of the following studies?
a.) Out of 10 high schools in Dili, researchers select one and record
the number of students who are late over a whole month.
Answer: Cluster
b.) A researcher divides a group of students according to gender,
major field, and low, average, and high grade point average. Then
she randomly selects six students from each group to answer
questions in a survey.
Answer: Stratified
c.) The number of people who visited Cristo Rei is recorded. Then a sample
of these people is selected.
Answer: Random
d.) Every 10th bottle of GOTA water is selected, and the amount of
liquid in the bottle is measured. The purpose is to see if the
machines that fill the bottles are working properly.
Answer: Systematic
Experimental Design
Observational vs Experimental Studies
Observational Study
In an observational study, the researcher merely observes what is
happening or what has happened in the past and tries to draw
conclusions based on these observations.
Example 16:
A researcher wants to know how many people pursue higher education after
finishing high school.
There are three main types of observational studies:
• Cross-sectional study: When all the data are collected at one time.
• Retrospective study : When the data are collected using records obtained
from the past.
• Longitudinal study : When data are collected over a period of time, say,
past and present.
Experimental Study
In an experimental study, the researcher manipulates one of the variables
and tries to determine how the manipulation influences other variables.

Example 17:
A researcher is conducting a study to find out which language of instruction is
better for student learning at an elementary school in Viqueque.
Dependent and Independent Variables
The independent variable in an experimental study is the one that is being
manipulated by the researcher. The independent variable is also called the
explanatory variable. The resultant variable is called the dependent variable
or the outcome variable.
Example 18:
A researcher is studying the relationship between physical exercise and
health. The study examines various kinds of physical exercise and their effects
on overall physical health. Here physical exercise is the independent variable
and health is the dependent variable.
Computer Use for Statistics
Microsoft Excel
In the past, statistical calculations were done with pencil and paper.
However, with the invention of computers, numerical computation
has become much easier and faster. In this class we will be using Microsoft
Excel for our statistical calculations.

See Videos for Clear Instruction


Statistical Functions in Excel
Data Analysis ToolPak
Installation Steps:
1. Click “File”, then “Options”
2. Click “Add-ins”
3. In the “Manage” box, select “Excel Add-ins”
4. Click “Go”
5. Check the boxes “Analysis ToolPak”, “Analysis ToolPak - VBA”, and
“Solver Add-in”
6. Click “OK”
7. Go back to the main interface
8. Click the “Data” tab
9. Click “Data Analysis”
