
STS

Uploaded by

Hazel Dimaano
Copyright
© © All Rights Reserved

Lesson 1.1

Overview of Statistics

Statistics

- It provides procedures for data collection, presentation, organization, and interpretation in order to arrive at a meaningful idea.

Importance of Statistics

Statistics plays a major role in many aspects of our lives.

• It is used in sports, for example, to help a general manager decide which player might be the best fit for a team.
• It is used in politics to help candidates understand how the public feels about various policies.
• It is used in medicine to help determine the effectiveness of new drugs.
• Statistical research in business enables managers to analyze past performance, predict future business practices, and lead organizations effectively. Statistics can describe markets, inform advertising, set prices, and respond to changes in consumer demand.
• Statistics, being quantitative tools widely used in economics and finance, can help to shape effective monetary and fiscal policies and to develop pricing models for financial assets such as equities, bonds, currencies, and derivative securities.

Computer Software

Imagine you've just spent weeks, months, or even years gathering data for a research project, and now you want to analyze it all to find out what it means. If the data seem too massive to handle, you can use computer software to process the data and make sure the results are useful and informative.

SPSS (Statistical Package for the Social Sciences)

- It is perhaps the most widely used statistics software package within human behavior research. SPSS offers the ability to easily compile descriptive statistics, parametric and non-parametric analyses, as well as graphical depictions of results through the graphical user interface (GUI).

R

- It is a free statistical software package that is widely used both in human behavior research and in other fields. While R is very powerful, it also has a steep learning curve and requires a certain degree of coding.

MATLAB

- It is an analytical platform and programming language that is widely used by engineers and scientists. As with R, the learning path is steep, and you will be required to create your own code at some point.

SAS

- It is a statistical analysis platform that offers the option of either using the GUI or creating scripts for more advanced analyses. It is a premium solution that is widely used in business, healthcare, and human behavior research alike.

GraphPad Prism

- It is premium software primarily used for statistics in biology, but it offers a range of capabilities that can be used across various fields.

Minitab

- It offers a range of both basic and fairly advanced statistical tools for data analysis.

Excel

- It offers a wide variety of tools for data visualization and simple statistics. It is simple to generate summary metrics and customizable graphics and figures, making it a usable tool for many who want to see the basics of their data.

Basic Terminologies in Statistics

population - consists of all the members of the group about which you want to draw a conclusion.

sample - is a portion or part of the population of interest selected for analysis.

Experimental Classification

Dependent variables or outcome variables

- measure the behavior of subjects and are expected to be influenced by the independent variable.
- Example: To study the effect of fertilizer on the growth of plants, the dependent variable is the growth of the plants, while the independent variable is the amount of fertilizer used.

Mathematical Classification

Continuous variables

- are quantitative variables that have an infinite number of possible values; they cannot be counted but can be measured.
- Some examples of these variables are height, weight, and volume.

Process of Statistics

1. Identify the research objective.

A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.

2. Collect the information needed to answer the questions.
Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.

3. Organize and summarize the information.

Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusions from the information.

In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Important Note: If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Lesson 1.2

Data collection

- It is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Consequences of Improperly Collected Data

• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted findings resulting in wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromising decisions for public policy.
• Causing harm to human participants and animal subjects.

Steps in Data Gathering

1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Methods of Data Collection

Primary data can be collected by:

1. Direct personal interviews – The researcher has direct contact with the interviewee. The researcher gathers information by asking the interviewee questions.
2. Indirect/Questionnaire Method – The researcher gathers information through written questions that the respondents answer themselves, without direct contact between researcher and respondent.

Questions can either be:

An open-ended question is a type of question that does not include response categories. This type of question is usually appropriate for collecting subjective data.

A closed-ended question is a type of question that includes a list of response categories from which the respondent selects an answer. This type of question is usually appropriate for collecting objective data.

3. Focus Group – It is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.
4. Experiment – It is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
5. Observation – It is a method of collecting data on the phenomenon of interest by recording the observations made about the phenomenon as it actually happens. It involves collecting information without asking questions.

Secondary data can be collected from:

1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.

Take Note:

• Always investigate the validity and reliability of the data by examining the collection method employed by your source.
• Do not use inappropriate data for your research.

sample size

- It is typically denoted by n and is always a positive integer.
- No exact sample size can be prescribed here; it varies across research settings.
- However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note:

• Representativeness, not size, is the more important consideration.
• Use no fewer than 30 subjects if possible.
• If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).

The choice of sample size depends on non-statistical and statistical considerations.

• Non-statistical considerations – These may include availability of resources, manpower, budget, ethics, and sampling frame.
• Statistical considerations – These include the desired precision of the estimate.

Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision – Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
2. Confidence Level – It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range. For example, a confidence level of 90% means that the results will probably fall within the specified range 90% of the time.
3. Degree of Variability – Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to obtain a given level of precision.

Reason for Sampling

• It is important that the individuals included in a sample represent a cross section of individuals in the population.
• If a sample is not representative, it is biased, and you cannot generalize to the population from your statistical data.

Definitions

• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. We may want to study individuals but not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling technique/Sampling Strategies - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
• Sampling Bias - This involves problems in your sampling that make your sample unrepresentative of your population.

Lesson 2.1

Measures of Central Tendency

measure of central tendency

- Commonly referred to as an average, it is a single value that represents a data set.
- Its purpose is to locate the center of a data set.

Three Different Measures of Central Tendency

a. Mean or arithmetic mean
- It is the most frequently used measure of central tendency.
- It is the only common measure in which all values play an equal role, meaning that to determine its value you need to consider all the values of the data set.
- It is appropriate for determining the central tendency of interval or ratio data.

Properties of Mean

• A set of data has only one mean.
• Mean can be applied to interval and ratio data.
• All values in the data set are included in computing the mean.
• The mean is very useful in comparing two or more data sets.
• Mean is most appropriate for symmetrical data.
• Mean is affected by extremely small or large values (outliers) in a data set.

b. Median
- It is the midpoint of the data array.
- When the data set is ordered, whether ascending or descending, it is called a data array.
- It is an appropriate measure of central tendency for data that are ordinal or above, but it is more valuable for ordinal data.

Properties of Median

• The median is unique; there is only one median for a set of data.
• The median is found by arranging the set of data from lowest to highest (or highest to lowest) and taking the value of the middle observation.
• Median is not affected by extremely small or large values.
• Median can be applied to ordinal, interval, and ratio data.
• Median is most appropriate for skewed data.

c. Mode
- It is the value in a data set that appears most frequently. Like the median and unlike the mean, the mode is not affected by the extreme values in a data set.
- A data set in which only one value occurs with the greatest frequency is said to be unimodal.
- If the data has two values with the same greatest frequency, both values are considered the mode and the data set is bimodal.
- If a data set has more than two modes, it is said to be multimodal.
- When all values in a data set occur with the same frequency, the data set is said to have no mode.

Properties of Mode

• The mode is found by locating the most frequently occurring value.
• The mode is the easiest average to compute.
• There can be more than one mode or even no mode in any given data set.
• Mode is not affected by extremely small or large values.
• Mode can be applied to nominal, ordinal, interval, and ratio data.

Types of Distribution

1. symmetric distribution
- The data values are evenly distributed on both sides of the mean. Also, the distribution is unimodal, and the mean, median, and mode are similar and lie at the center of the distribution.
2. positively skewed or right-skewed distribution
- Most of the values in the data fall to the left of the mean and group at the lower end of the distribution; the tail is to the right. In addition, the mean is to the right of the median, and the mode is to the left of the median.
3. negatively skewed or left-skewed distribution
- The mass of the data falls to the right of the mean and groups at the upper end of the distribution, with the tail to the left. In addition, the mean is to the left of the median, and the mode is to the right of the median.

Lesson 2.2

Measures of Relative Position

measure of relative position

- It provides information about the position or location of particular values relative to the entire data set.

Quantiles

- are statistics that describe various subdivisions of a frequency distribution into equal proportions.

To find the Quartile,

• If the resulting position is an INTEGER, then the observation at that position is chosen for the quartile.
• If the resulting position is NOT AN INTEGER, then use interpolation.

To find the Decile,

• If the resulting position is an INTEGER, then the observation at that position is chosen for the decile.
• If the resulting position is NOT AN INTEGER, then use interpolation.

To find the Percentile,

• If the resulting position is an INTEGER, then the observation at that position is chosen for the percentile.
• If the resulting position is NOT AN INTEGER, then use interpolation.

Lesson 2.3

Measures of Dispersion

Dispersion

- It is the difference between the actual value and the average value.

Range

- It is the simplest and easiest measure of dispersion to determine: the difference between the highest and lowest values.

Advantages:

o Easy to compute
o Easy to understand

Disadvantages:

o Can be distorted by a single extreme value
o Only two values are used in the calculation

Standard deviation

- It is one of the most widely used measures of dispersion.
- It is calculated as the square root of the variance. It provides a good indication of volatility. It measures how widely values are dispersed from the average.

Variance

- It is the average of the squared deviations from the mean.

Shape of Distribution

Skewness

- It is the degree of distortion from the symmetrical bell curve or the normal distribution.
- It measures the lack of symmetry in a data distribution.

a. Symmetrical Distribution
- It has a skewness of 0. So, a normal distribution has a skewness of 0.
- The mean, median, and mode are equal to each other, and the ordinate at the mean divides the distribution into two equal parts.
b. Negatively Skewed / Skewed Left (Skewness<0)
- It is when the tail on the left side of the distribution is longer or fatter than the tail on the right side.
- The mean and median will be less than the mode.
c. Positively Skewed / Skewed Right (Skewness>0)
- It is when the tail on the right side of the distribution is longer or fatter than the tail on the left side.
- The mean and median will be greater than the mode.
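
The measures of dispersion and skewness described in Lesson 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name describe_spread and the sample scores are hypothetical, the data are treated as a whole population (so the variance divides by n), and skewness is computed as the average cubed standardized deviation, which is one common formula among several.

```python
import statistics


def describe_spread(values):
    """Return range, population variance, standard deviation, and skewness."""
    n = len(values)
    mean = statistics.fmean(values)

    # Range: highest value minus lowest value (only two values are used).
    data_range = max(values) - min(values)

    # Variance: average of the squared deviations from the mean.
    # Assumption: the data are the whole population, so divide by n.
    variance = statistics.pvariance(values)

    # Standard deviation: square root of the variance.
    std_dev = variance ** 0.5

    # Skewness (assumed formula): average cubed standardized deviation.
    # Positive means a longer right tail, negative a longer left tail.
    skewness = sum(((x - mean) / std_dev) ** 3 for x in values) / n

    return {
        "range": data_range,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": skewness,
    }


# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe_spread(scores)
print(summary)
print("mean:", statistics.fmean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))
```

For these scores the skewness is positive, and the mean (5) is greater than the median (4.5), which is greater than the mode (4), matching the description of a positively skewed distribution.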
Kurtosis

- It is a measure of the combined sizes of the two tails. It is often described as how tall and sharp the central peak is relative to a standard bell curve.
- It is actually a measure of the outliers present in the distribution. The outliers in a sample therefore have even more effect on the kurtosis than they do on the skewness.
• Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. In other words, it is the tails that mostly account for kurtosis, not the central peak.
• The kurtosis decreases as the tails become lighter. It increases as the tails become heavier.

a. Mesokurtic (Kurtosis=3)
- This distribution has a kurtosis statistic similar to that of the normal distribution.
b. Leptokurtic (Kurtosis>3)
- The peak is higher and sharper than that of the normal distribution, and the data are heavy-tailed, with a profusion of outliers.
c. Platykurtic (Kurtosis<3)
- Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader.
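
The three kurtosis categories above can be illustrated with a short Python sketch. The function names and the example data are hypothetical, kurtosis is computed here as the fourth central moment divided by the squared variance (the convention under which a normal distribution scores 3), and the tolerance used to decide what counts as "equal to 3" is an arbitrary assumption, not a standard value.

```python
def kurtosis(values):
    """Fourth central moment divided by the squared population variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


def classify_kurtosis(k, tolerance=0.1):
    """Label a kurtosis value relative to 3; the tolerance is an assumed cutoff."""
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


# Hypothetical data sets, used only for illustration.
flat = [1, 2, 3, 4, 5]                       # evenly spread, light tails
heavy = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]    # mostly central, two extreme values

for name, data in [("flat", flat), ("heavy", heavy)]:
    k = kurtosis(data)
    print(name, round(k, 3), classify_kurtosis(k))
```

The two extreme values in the second data set drive its kurtosis above 3, which reflects the point that the tails, not the central peak, mostly account for kurtosis.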
