
STS Reviewer

Reviewer for Statistics

Uploaded by

gwellnrd6
Copyright
© © All Rights Reserved

Overview of Statistics

Statistics
• Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
• It provides procedures for data collection, presentation, organization, and interpretation to arrive at a meaningful idea.

Importance of Statistics
• Statistics plays a major role in many aspects of our lives.
• It is used in sports, for example, to help a general manager decide which player might be the best fit for a team.
• It is used in politics to help candidates understand how the public feels about various policies.
• It is used in medicine to help determine the effectiveness of new drugs.
• Statistical research in business enables managers to analyze past performance, predict future business practices, and lead organizations effectively. Statistics can describe markets, inform advertising, set prices, and respond to changes in consumer demand.
• Statistics, being quantitative tools widely used in economics and finance, can help to shape effective monetary and fiscal policies and to develop pricing models for financial assets such as equities, bonds, currencies, and derivative securities.

Computer Software
• Imagine you've just spent weeks, months, or even years gathering data for a research project, and now you want to analyze it all to find out what it means. If the data seems too massive to handle, you use computer software to deal with the data and make sure the results are useful and informative.

SPSS
• SPSS (Statistical Package for the Social Sciences) is perhaps the most widely used statistics software package within human behavior research. SPSS offers the ability to easily compile descriptive statistics, parametric and non-parametric analyses, as well as graphical depictions of results through the graphical user interface (GUI).

R
• R is a free statistical software package that is widely used both in human behavior research and in other fields. While R is very powerful software, it also has a steep learning curve, requiring a certain degree of coding.

MATLAB
• MATLAB is an analytical platform and programming language that is widely used by engineers and scientists. As with R, the learning path is steep, and you will be required to create your own code at some point.

SAS
• SAS is a statistical analysis platform that offers options to use either the GUI or to create scripts for more advanced analyses. It is a premium solution that is widely used in business, healthcare, and human behavior research alike.

GraphPad Prism
• GraphPad Prism is premium software primarily used within statistics related to biology, but it offers a range of capabilities that can be used across various fields.

Minitab
• The Minitab software offers a range of both basic and fairly advanced statistical tools for data analysis.

Excel
• Excel offers a wide variety of tools for data visualization and simple statistics. It is simple to generate summary metrics and customizable graphics and figures, making it a usable tool for many who want to see the basics of their data.

Recall
• Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
• It provides procedures for data collection, presentation, organization, and interpretation to arrive at a meaningful idea.

Data
• The information referred to in the definition is the data.
• According to the Merriam-Webster dictionary, data are “factual information used as a basis for reasoning, discussion, or calculation”.

Types of Statistics

Descriptive Statistics
It basically consists of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
Examples:
1. The average score of a volleyball player for the past 10 games
2. Birth rate in rural areas in the Philippines
3. Enrollment record of all colleges in BSU – TNEU Lipa Campus

Inferential Statistics
It is the logical process that involves generalizing from a sample to the population from which the sample was selected and assessing the reliability of such generalizations. It is also called statistical inference or inductive statistics.
Examples:
1. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries.
2. The political views of the youth in the urban areas with respect to inflation rate in Asia
3. A campaign manager analyzes the effect of TV ads on the promotion of a presidential candidate

Basic Terminologies in Statistics
• A population consists of all the members of the group about which you want to draw a conclusion, while a sample is a portion or part of the population of interest selected for analysis.
• A parameter is a numerical index describing a characteristic of a population, while a statistic is a numerical index describing a characteristic of a sample.

Sources of Data
• Primary data are data that come from an original source and are intended to answer a specific research question. They can be gathered by interview, mail-in questionnaire, survey, or experimentation.
• Secondary data are data taken from previously recorded data, such as information in previously conducted research, financial statements, business periodicals, and government reports. They can also be taken electronically, for instance from internet websites.
• A constant is a characteristic of objects, people, or events that does not vary. For example, the temperature at which water boils (100 degrees Celsius) is a constant.
• A variable is a characteristic of objects, people, or events that can take different values. It can vary in quantity, like the weight of people, or in quality, like the hair color of people.

Two Types of Variables
• Qualitative variables or categorical variables are variables that yield categorical responses. These are words or codes that represent a class or category.
• Some examples of qualitative variables are eye color, sex, occupation, student number, etc.
• Quantitative variables or numerical variables are variables that take on numerical values representing an amount or quantity. These numerical values should answer the question "how much" or "how many".
• Some examples of quantitative variables are height, weight, distance, salary, etc.
• Variables can also be classified in two ways according to purpose: experimental or mathematical.

Experimental Classification
• Independent variables or explanatory variables are variables controlled by the experimenter or researcher and expected to have an effect on the behavior of the subjects.
• Dependent variables or outcome variables measure the behavior of subjects and are expected to be influenced by the independent variable.
o Example: To study the effect of fertilizer on the growth of plants, the dependent variable is the growth of plants while the independent variable is the amount of fertilizer used.

Mathematical Classification
• Discrete variables are quantitative variables that have either a finite number of possible values or a countable number of possible values. These are variables that are countable.
Some examples of these variables are number of cars, number of siblings, etc.
• Continuous variables are quantitative variables that have an infinite number of possible values that are not countable. These are variables that are not countable but are measurable.
Some examples of these variables are height, weight, volume, etc.

Level of Measurement of Variables
• Nominal Level is the first level of measurement and it is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme. Nominal scales have no numerical value.
Some examples of nominal level variables are
- Sex (male or female)
- Type of School (public or private)
- Eye Color (blue, green, brown).

• Ordinal Level involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. An ordinal scale not only classifies subjects but also
ranks them in terms of the degree to which they possess a characteristic of interest.
Some examples of ordinal level variables are
- Highest Educational Attainment (elementary, high school, bachelor, masteral, doctoral)
- Rank of military officer (lieutenant, captain, major, colonel).

• Interval Level is a measurement level that specifies the distances between each interval on the scale. Variables of this level have no absolute zero. This means that a value of zero does not mean the absence of the quantity.
Some examples of interval level variables are
- Temperature on Fahrenheit/Celsius thermometer
- IQ (e.g., high IQ vs. average IQ vs. low IQ)

• Ratio Level represents the highest, most precise level of measurement. Variables of this level have an absolute zero, which means that a value of zero means the absence of the quantity.
Some examples of ratio level variables are
- Height and weight
- Time
- Distance and speed

Important Note
If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Steps in Data Gathering
1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Methods of Data Collection
Primary data can be collected by:
1. Direct personal interviews – The researcher has direct contact with the interviewee. The researcher gathers information by asking the interviewee questions.
2. Indirect/Questionnaire Method – These methods of data collection involve sourcing and accessing existing data that were originally collected for the purpose of the study.
• Questions can either be:
o An open-ended question is a type of question that does not include response categories. This type of question is usually appropriate for collecting subjective data.
o A closed-ended question is a type of question that includes a list of response categories from which the respondent will select an answer. This type of question is usually appropriate for collecting objective data.
3. Focus Group – It is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.
4. Experiment – It is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
5. Observation – It is a method of collecting data on the phenomenon of interest by recording the observations made about the phenomenon as it actually happens. It involves collecting information without asking questions.

Secondary data can be collected by:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.

Sample Size
• The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here, and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

The choice of sample size depends on non-statistical considerations and statistical considerations.
• Non-statistical considerations – These may include availability of resources, manpower, budget, ethics, and sampling frame.
• Statistical considerations – These include the desired precision of the estimate.
Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision
• Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
2. Confidence Interval
• It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range (more precisely, this is the confidence level). For example, a confidence interval of 90% means that results of an action will probably meet expectations 90% of the time.
3. Degree of Variability
• Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to get an optimum level of precision.

Basic Sampling Design
The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

Reason for Sampling
- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.

Definitions
• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.
• Sampling technique/Sampling strategy - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
• Sampling bias - This involves problems in your sampling that make your sample not representative of your population.

Advantages of Sampling
Here are the advantages of sampling over complete enumeration:
- Less Labor
- Greater Efficiency and Accuracy
- Reduced Cost
- Convenience
- Greater Speed
- Ethical Considerations
- Greater Scope
 Population is a group to which the results of the
study are intended to apply. A sample is a group in
a research study on which information is obtained.
One of the most important steps in the research
process is to select the sample of individuals who
will participate as a part of the study.
 Sampling refers to the process of selecting these
individuals.

Two Types of Sampling

Random Sampling or Probability Sampling
o It is a process in which every member of the population has an equal chance of being selected. Samples are obtained using some objective chance mechanism, thus involving randomization.
o It requires the use of a complete listing of the elements of the universe, called the sampling frame.
o The probabilities of selection are known. The resulting samples are generally referred to as random samples. They allow drawing of valid generalizations about the universe/population.

a. Simple Random Sampling
It is the most basic method of drawing a probability sample, which assigns equal probabilities of selection to each possible sample. It is also a process of selecting a sample of size n from the population via random numbers or through lottery.
EXAMPLE:
Alice conducted a study to determine the prevalence of
malaria in a province. From the list of 300 health centers,
Alice obtained 100 health centers using a random number
generator. The directors of each sampled health center were
interviewed to obtain the necessary information.
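
The draw in the example above can be sketched in Python. This is only an illustration under stated assumptions: the health-center labels (`HC-001` to `HC-300`), the function name `simple_random_sample`, and the seed are hypothetical and do not come from the study described.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame.

    random.Random.sample picks without replacement, so every
    subset of size n is equally likely to be chosen.
    """
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of 300 health centers.
health_centers = [f"HC-{i:03d}" for i in range(1, 301)]

# Select 100 of the 300 centers, as in the example.
sample = simple_random_sample(health_centers, 100, seed=42)
print(len(sample), len(set(sample)))
```

The sampling frame is passed in as a plain list because, as noted above, probability sampling requires a complete listing of the elements before any unit can be drawn.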
b. Systematic Sampling
It is obtained by selecting every kth individual from the population until the desired number of subjects or respondents is obtained. The first individual selected corresponds to a random number between 1 and k.
EXAMPLE:
Leni conducted a study to determine the prevalence of malaria in a province. From the list of all patients in the province, Leni sampled 50 patients starting from the patient with ID number 4 and every 23rd patient thereafter, and retrieved their medical records.

c. Stratified Random Sampling
It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way.
EXAMPLE:
A media manager wants to determine the proportion of Filipino households who patronize their nationwide drama program simultaneously aired on radio and shown on TV. Using the sampling frame of households arranged by region, 200 households from each region were randomly selected.

d. Cluster Sampling
It is a process of selecting clusters from a population which is very large or widely spread out over a wide geographical area.
EXAMPLE:
The Fuds Administration (FA) wants to know if there are high levels of aflatoxin in Gagaraya’s Cracker Nut. The FA head took a random sample of batches of the said cracker nut, and all bags in the chosen batches are included in the sample.

Non-random Sampling or Non-probability Sampling
o It is a sampling procedure where samples are selected in a deliberate manner with little or no attention to randomization. Samples are obtained haphazardly, selected purposively, or taken as volunteers. The probabilities of selection are unknown. Such samples should not be used for statistical inference.

a. Convenience Sampling
It is a process of selecting a group of individuals who are conveniently available for a study.
EXAMPLE:
A researcher may include only close friends and clients in the sample.

b. Purposive Sampling
It is a process of selecting, based on judgement, a sample which the researcher believes, based on prior information, will provide the data they need.
EXAMPLE:
A human resource director interviews the qualified applicants for a supervisory position.

c. Quota Sampling
It is applied when an investigator collects information from an assigned number, or quota, of individuals from one of several sample units fulfilling certain prescribed criteria or belonging to one stratum.
EXAMPLE:
The respondents are composed of men aged over 30 or 20 people who have bought cellular phones in the last week. It is at the interviewer’s discretion which men or cellular phone buyers they select.

d. Snowball Sampling
It is a technique in which one or more members of a population are located and used to lead the researchers to other members of the population.
EXAMPLE:
To obtain a sample of homeless individuals, the researcher will interview individuals on the street or at a homeless shelter.

e. Voluntary Sampling
It is a technique in which a sample is composed of respondents who self-select (volunteer) into the study/survey. Most of the time, the respondents have a strong interest in the topic of the study.
EXAMPLE:
Consider a news show that asks its viewers to participate in an online poll. The samples are viewers who have chosen themselves; they were not chosen by the survey administrator.

Measure of Central Tendency
- A measure of central tendency, commonly referred to as an average, is a single value that represents a data set. Its purpose is to locate the center of a data set.

There are three different measures of central tendency: mean, median, mode.

Mean
• The mean, or arithmetic mean, is the most frequently used measure of central tendency. It is the only common measure in which all values play an equal role, meaning that to determine its value you need to consider all the values of the given data set.
• It is appropriate for determining the central tendency of interval or ratio data.
• The symbol x̄, called “x bar”, is used to represent the mean of a sample and the symbol μ, called “mu”, is used to denote the mean of a population.
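
As a rough illustration of the three measures of central tendency named in this reviewer (mean, median, mode), the sketch below computes each one with Python's standard `statistics` module. The `scores` list is made-up data, and the value 95 is included only to show how one extreme value pulls the mean away from the other two measures.

```python
from statistics import mean, median, multimode

# Hypothetical scores; 95 is a deliberately extreme value.
scores = [12, 15, 15, 18, 20, 22, 95]

# Mean: every value contributes equally (sum of all values / number of values).
x_bar = sum(scores) / len(scores)

# Median: the middle value of the ordered data array.
middle = median(scores)

# Mode: the most frequent value(s); multimode returns a list so that
# bimodal or multimodal data sets are reported in full.
modes = multimode(scores)

print(round(x_bar, 2), middle, modes)  # prints: 28.14 18 [15]
```

Here the mean (about 28.14) sits well above most of the scores, while the median (18) and the mode (15) stay close to where the bulk of the data lies.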
Properties of Mean
- A set of data has only one mean.
- Mean can be applied for interval and ratio data.
- All values in the data set are included in computing the mean.
- The mean is very useful in comparing two or more data sets.
- Mean is most appropriate for symmetrical data.
- Mean is affected by extremely small or large values (outliers) in a data set.

Mean can be computed as:
$\text{Mean} = \dfrac{\text{Sum of all values}}{\text{Number of values}}$

Sample Mean
$\bar{x} = \dfrac{\sum x}{n}$

Population Mean
$\mu = \dfrac{\sum x}{N}$

Median
• The median is the midpoint of the data array.
• When the data set is ordered, whether ascending or descending, it is called a data array.
• Median is an appropriate measure of central tendency for data that are ordinal or above, but is more valuable for ordinal data.

Properties of Median
- The median is unique; there is only one median for a set of data.
- The median is found by arranging the set of data from lowest to highest (or highest to lowest) and getting the value of the middle observation.
- Median is not affected by extremely small or large values.
- Median can be applied for ordinal, interval, and ratio data.
- Median is most appropriate for skewed data.

To determine the value of the median in a data set with n values, we need to consider two rules.

A. If n is odd, the median is the middle-ranked value.
B. If n is even, the median is the average of the two middle-ranked values.

$\text{Median (Rank Value)} = \dfrac{n+1}{2}$

Mode
The mode is the value in a data set that appears most frequently. Like the median and unlike the mean, the extreme values in a data set do not affect the mode.

• A data set that has only one value that occurs with the greatest frequency is said to be unimodal.

If the data has two values with the same greatest frequency, both values are considered the mode and the data set is bimodal.
If a data set has more than two modes, the data set is said to be multimodal.

There are also some cases when all values in the data set have the same frequency. When this occurs, the data set is said to have no mode.

Properties of Mode
- The mode is found by locating the most frequently occurring value.
- The mode is the easiest average to compute.
- There can be more than one mode or even no mode in any given data set.
- Mode is not affected by extremely small or large values.
- Mode can be applied for nominal, ordinal, interval, and ratio data.

MEASURES OF RELATIVE POSITION

The measure of relative position provides information about the position or location of particular values relative to the entire data set.

• Quantiles are statistics that describe various subdivisions of a frequency distribution into equal proportions.
1. Quartiles – split the data array into 4 equal parts.
2. Deciles – split the data array into 10 equal parts.
3. Percentiles – split the data array into 100 equal parts.

Formula: (Position)

QUARTILES
$Q_k = \dfrac{nk}{4} + 0.5$

DECILES
$D_k = \dfrac{nk}{10} + 0.5$

PERCENTILES
$P_k = \dfrac{nk}{100} + 0.5$
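If the resulting position is an INTEGER, then the observation at that position in the data array is chosen as the quantile.

If the resulting position is NOT AN INTEGER, then use interpolation.

INTERPOLATION (Formula [Value])

$Q = \text{Lower Value} + \text{Decimal} \times (\text{Upper Value} - \text{Lower Value})$

Here, Lower Value and Upper Value are the observations just below and just above the computed position, and Decimal is the fractional part of that position.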
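
The position rule and the interpolation formula above can be put together in a short Python sketch. The function name `quantile`, its `parts` parameter (4 for quartiles, 10 for deciles, 100 for percentiles), and the sample data are assumptions made for illustration. The handling of positions that fall before the first or after the last observation is also an assumption, since the notes do not cover that case.

```python
def quantile(data, k, parts):
    """Return the k-th quantile when the data array is split into `parts`.

    Position rule from the notes: position = n*k/parts + 0.5 (1-based).
    """
    values = sorted(data)          # build the data array
    n = len(values)
    position = n * k / parts + 0.5

    # Assumption (not in the notes): clamp positions outside 1..n.
    if position <= 1:
        return values[0]
    if position >= n:
        return values[-1]

    lower_rank = int(position)             # whole part of the position
    decimal = position - lower_rank        # fractional part of the position
    lower = values[lower_rank - 1]         # convert 1-based rank to 0-based index
    upper = values[lower_rank]
    # Integer position: decimal is 0, so this returns the observation itself.
    return lower + decimal * (upper - lower)

# Hypothetical data set with n = 10 values.
data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 10]

print(quantile(data, 1, 4))     # Q1: position 3.0, third observation -> 7.0
print(quantile(data, 2, 4))     # Q2: position 5.5, between 10 and 12 -> 11.0
print(quantile(data, 3, 10))    # D3: position 3.5, between 7 and 8 -> 7.5
print(quantile(data, 90, 100))  # P90: position 9.5, between 18 and 21 -> 19.5
```

Note that Q2 uses position 2n/4 + 0.5 = (n + 1)/2, so it agrees with the median rank given earlier.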

Measure of Dispersion
• Describes the spread of data values from the average.
• Dispersion is the difference between the actual value and the average value.

Range
- Difference between the highest and lowest values.
- A low value means less variability (the values are close to the mean).

Standard Deviation
- Describes the difference between the data values and the mean.
- Calculated as the square root of the variance.

Variance
- The square of the standard deviation.
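
The three measures of dispersion can be computed with Python's standard `statistics` module, as in the sketch below. The data values are hypothetical. The notes do not say whether the population or the sample version of the variance is intended, so both are shown: `pvariance`/`pstdev` divide by n, while `variance`/`stdev` divide by n − 1.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set; its mean is 6.
data = [4, 8, 6, 5, 3, 7, 9]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population versions (divide the squared deviations by n).
pop_var = pvariance(data)
pop_sd = pstdev(data)       # square root of pop_var

# Sample versions (divide the squared deviations by n - 1).
samp_var = variance(data)
samp_sd = stdev(data)       # square root of samp_var

print(data_range, pop_var, pop_sd)              # prints: 6 4 2.0
print(round(samp_var, 3), round(samp_sd, 3))
```

In both versions the standard deviation is the square root of the variance, which matches the relationship stated above.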

Whatever you do, work at it with all your heart, as working for the Lord, not for human masters.

- Colossians 3:23
