Midterms Matm
Data
• In statistics, data is always the result of experiment, observation, investigation, or other means. It often appears as numerical figures, which are then evaluated to turn them into useful knowledge.
• For most people, “statistics” is a scary thing that must be avoided as much as possible, because they think it is just a collection of numbers and vague formulas.
Top 10 Uses of Statistics in Our Day to Day Life
1. Predictions
2. Testing
3. Forecasts
4. Preparedness
5. Predicting
6. Political
7. Insurance
8. Consumer
9. Financial
10. Sports
Predictions
• Statistical figures help us make predictions about something that is going to happen in the future.
Testing
• Quality testing is another important use of statistics in every area of life.
Forecasts
• Have you ever seen a weather forecast? Do you know how the government predicts the weather?
Preparedness
• Statistics is also helpful in emergency preparedness.
Financial
• The financial market relies completely on statistics.
Sports
• There are many uses of statistics in sports. Every sport requires statistics to make the sport more effective.
DIVISION OF STATISTICS
Types of Statistics
1. Descriptive Statistics
2. Inferential Statistics
Statistics
• The branch of mathematics that transforms data into useful information for decision makers.
Descriptive Statistics
• Collecting, summarizing, and describing data.
• Deals with the collection and presentation of data, and with computing summary values that describe the group's characteristics.
• The most common summary values are the measures of central tendency and variation.
Inferential Statistics
• Drawing conclusions and/or making decisions concerning a population based only on sample data.
• Deals with predictions and inferences based on the analysis and interpretation of the information gathered by the statistician.
• Some of the common statistical tools of inferential statistics are the t-test, z-test, analysis of variance, chi-square, and Pearson r.
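To make the contrast between the two types concrete, here is a minimal Python sketch on a hypothetical sample of quiz scores (the data and the claimed population mean of 75 are assumptions for illustration). The descriptive part only summarizes the sample; the inferential part computes a one-sample t statistic, which would then be compared with a critical value from a t table to draw a conclusion about the population.

```python
import math
import statistics

# Hypothetical sample: quiz scores of 8 students drawn from a larger class.
sample = [78, 85, 90, 72, 88, 95, 81, 79]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")  # mean = 83.50, standard deviation = 7.43

# Inferential statistics: use the sample to judge a claim about the population.
# One-sample t statistic for the hypothetical claim that the population mean is 75.
mu0 = 75
t = (mean - mu0) / (sd / math.sqrt(len(sample)))
print(f"t = {t:.2f} with {len(sample) - 1} degrees of freedom")  # t = 3.24 with 7 degrees of freedom
```

The standard library has no t distribution, so the sketch stops at the test statistic rather than a p-value.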
Major Groupings of Data
1. Quantitative Data (Numerical)
a. Discrete
b. Continuous
i. Interval Scale Data
ii. Ratio Scale Data
2. Qualitative Data (Categorical)
a. Nominal Scale Data (Named)
b. Ordinal Scale Data (Ordered)
i. Nominal with Order
ii. Nominal without Order
iii. Dichotomous
Quantitative Data
• Data that can be measured with numbers, such as distance, duration, length, revenue, and speed.
Discrete
• Whole numbers (integers) that cannot be divided, such as the number of eggs, number of wins, or number of dogs. You can’t have 3.2 dogs.
• Discrete data is countable; binary (yes/no) data is one type of discrete data.
Continuous
• Numbers that can be broken into finer and finer units (usually within a range). Weight, height, and temperature are all examples (e.g., 3.4981637081 lbs).
Qualitative Data
• Non-numerical data that is usually textual and descriptive, like “mostly satisfied,” “brown eyes,” “female,” “yes/no,” etc.
Variable
• A characteristic or attribute associated with the population being studied.
• Variables are further classified as categorical (qualitative) or numerical (quantitative).
Discrete Variables
• Values obtained by counting.
• e.g., number of pages in a book, shoe size, number of people in a race.
Continuous Variables
• Values obtained by measuring. They cannot all be put into a list because they can take any value in some interval of real numbers.
• e.g., length of a film, temperature, time taken to run a race.
Scales of Measurement
• Measurement scales are subdivided into four categories. When drawing inferences from a random sample, the type of measurement scale must be chosen carefully.
Nominal
• Classifies elements into two or more categories or classes; the numbers indicate that the elements are different, but not according to order or magnitude.
Nominal Scale
• Assigns responses to different categories.
• There are no numerical differences between categories.
• e.g., gender, marital status, state of residence, college major, SSN, zip code, student ID.
Ordinal
• A scale that ranks individuals in terms of the degree to which they possess a characteristic of interest.
Ordinal Scale
• A set of categories that are ordered from least to most.
• The numerical distance from each category to the next is not known.
• e.g., Miss America results, military rank, letter grade in class, degrees held, medical condition, rank order of your preference from 1 to 4.
Interval
• In addition to ordering scores from high to low, it also establishes a uniform unit in the scale, so that any equal distance between two scores is of equal magnitude. There is no absolute zero in this scale.
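Temperature in degrees Celsius is a common interval-scale example. The sketch below (hypothetical readings; the helper name celsius_to_fahrenheit is introduced here for illustration) shows why ratios are not meaningful without an absolute zero, while equal distances stay equal when the unit changes.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Three hypothetical temperature readings in degrees Celsius.
t1, t2, t3 = 10.0, 20.0, 30.0

# Ratios are not meaningful on an interval scale: the "twice as hot" claim
# holds in Celsius but not after converting the same readings to Fahrenheit.
ratio_c = t2 / t1
ratio_f = celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1)
print(ratio_c, ratio_f)  # 2.0 1.36

# Equal distances stay equal: both 10-degree Celsius steps are 18 degrees Fahrenheit.
step1_f = celsius_to_fahrenheit(t2) - celsius_to_fahrenheit(t1)
step2_f = celsius_to_fahrenheit(t3) - celsius_to_fahrenheit(t2)
print(step1_f, step2_f)  # 18.0 18.0
```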
Interval Scale
• A scale in which the numerical distance between consecutive values is the same.
• This scale has an arbitrary zero point (no true, meaningful zero point).
• e.g., current temperature, many behavioral science questionnaires, IQ.
Ratio
• In addition to being an interval scale, it also has an absolute zero.
Ratio Scale
• A scale in which the numerical distance between consecutive scores is the same.
• The scale has a true, meaningful zero point that anchors comparisons, such as “Maribel’s income is 35% more than Susan’s.”
• e.g., weight of a package of candy, number of times you return to a restaurant after visiting it the first time, amount of money in your checking account, number of questions correct on a quiz, distance from San Antonio to Laredo.
POPULATION AND SAMPLE
Population
• A group of people, animals, places, things, or ideas to which any conclusions based on the characteristics of a sample will be applied.
Sample
• A subgroup of the population.
PARAMETER AND STATISTICS
Parameter
• A numerical measure that describes a characteristic of the population.
Statistic
• A numerical measure that is used to describe a characteristic of a sample.
MEASURES OF CENTRAL TENDENCY
Measures of Centre or Central Location / Central Tendency
• To describe a whole set of data with a single value that represents the middle or centre of its distribution.
• In other words, it is a way to describe the center of a data set.
• It tells us what is normal or average for a set of data.
• It also condenses the data set down to one representative value, which is useful when you are working with large amounts of data.
Mode
• The most frequent data point.
• The mode is always one of the data points in the set.
• Unaffected by extreme values.
• Useful for qualitative data.
• May have more than one value.
Median
• The value that divides ranked data points into halves: 50% are larger than it and 50% are smaller.
• May not exist as a data point in the set.
• Influenced by the position of items, but not by their values.
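A short Python sketch of the mode and median on a hypothetical data set containing one extreme value, with the mean printed for contrast. It uses the standard-library statistics module; multimode shows that a data set, including a qualitative one, can have more than one mode.

```python
import statistics

# Hypothetical data set with one extreme value (40).
data = [2, 3, 3, 5, 7, 8, 40]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle of the ranked values
mean = statistics.mean(data)      # pulled upward by the extreme value
print(mode, median, round(mean, 2))  # 3 5 9.71

# With an even number of values, the median is the average of the two middle values.
print(statistics.median([1, 2, 3, 4]))  # 2.5

# Qualitative data can have more than one mode.
print(statistics.multimode(["red", "blue", "red", "blue", "green"]))  # ['red', 'blue']
```

The outlier moves the mean well above the median, while the mode and median are unchanged by how large that extreme value is.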
Mean
• The most stable measure.
• Affected by extreme values.
• May not exist as a data point in the set.
Mean
• By definition, the mean is the sum of all the values in the observation or data set divided by the total number of observations. It is also known as the arithmetic average.
• The mean can be used for both continuous and discrete numeric data, but not for categorical data, as categorical values cannot be summed.
• Because the mean includes every value in the distribution, it is influenced by outliers (numbers that are much higher or much lower than the rest of the data set) and by skewed (asymmetric) distributions.
• The mean is applicable to ratio and interval data.
Median
• The median is considered the physical middle point of a distribution because it is located at the center position when the values are arranged in ascending or descending order, which divides the distribution in half (50% of the observations lie on either side of the median value).
• If a distribution has an odd number of observations, the median is the middle value.
• If it has an even number, the median is the mean (average) of the two middle values.
MEASURES OF OTHER POSITION
Quantiles
• Values that divide an ordered score distribution into equal parts.
Three (3) Kinds of Quantiles
1. Quartile
2. Decile
3. Percentile
Quartile
• A measure of position that divides the ordered observations or score distribution into 4 equal parts.
Decile
• A measure of position that divides the ordered observations or score distribution into 10 equal parts.
Percentile
• A measure of position that divides the ordered observations or score distribution into 100 equal parts.
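Python's standard statistics.quantiles function returns the n - 1 cut points that divide a distribution into n equal parts, so quartiles, deciles, and percentiles differ only in n. The scores below are hypothetical. The function uses the "exclusive" interpolation method by default; textbooks use several slightly different position formulas, so hand-computed values may not always match.

```python
import statistics

# Hypothetical ordered score distribution of 11 students.
scores = [55, 60, 62, 68, 70, 75, 78, 80, 85, 90, 95]

# n - 1 cut points divide the distribution into n equal parts.
quartiles = statistics.quantiles(scores, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)       # D1 to D9
percentiles = statistics.quantiles(scores, n=100)  # P1 to P99

print(quartiles)                       # [62.0, 75.0, 85.0]
print(len(deciles), len(percentiles))  # 9 99

# Q2, D5, and P50 all coincide with the median.
print(quartiles[1], deciles[4], percentiles[49], statistics.median(scores))  # 75.0 75.0 75.0 75
```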
Regression Line
• Calculates the “best-fit” line for a certain set of
data.
• The regression line makes the sum of the squares
of the residuals smaller than for any other line.
Regression Equation
• The regression equation describes the regression line mathematically.
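For a single predictor x, the least-squares regression line is commonly written as below, where a is the intercept, b is the slope, and the bars denote sample means (this notation is introduced here, not taken from the notes):

```latex
\hat{y} = a + bx, \qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
a = \bar{y} - b\,\bar{x}
```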
Pearson Correlation Coefficient
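• The Pearson correlation coefficient, r, indicates how close the data points lie to this line of best fit (i.e., how well the data points fit a linear model). It ranges from -1 to +1: values near -1 or +1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.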
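A minimal sketch computing r from its standard definition: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the study-hours data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)                     # spread of x
    syy = sum((y - my) ** 2 for y in ys)                     # spread of y
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, score), 4))  # 0.7746

# Points lying exactly on a rising or falling line give r = +1 or r = -1.
print(pearson_r([1, 2, 3], [2, 4, 6]), pearson_r([1, 2, 3], [6, 4, 2]))  # 1.0 -1.0
```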
Testing of Hypothesis
• In statistics, hypothesis testing is a way to test the results of a survey or experiment to see whether you have meaningful results, that is, results unlikely to have occurred by chance alone.
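As one concrete instance, here is a two-sided one-sample z-test, one of the tools listed under inferential statistics. This sketch assumes the population standard deviation is known; the survey numbers, the 0.05 significance level, and the helper name one_sample_z_test are hypothetical. It uses statistics.NormalDist from the Python standard library for the normal distribution.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test; assumes the population standard deviation sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a result at least this extreme
    return z, p_value

# Hypothetical survey: H0 says the population mean is 100 (known sigma = 15);
# a sample of 36 people has mean 106.
z, p = one_sample_z_test(106, 100, 15, 36)
alpha = 0.05  # significance level chosen before looking at the data
print(f"z = {z:.2f}, p = {p:.4f}")                   # z = 2.40, p = 0.0164
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```

Because p is below alpha, H0 would be rejected at the 5% level. When sigma must be estimated from the sample, a t-test is used instead.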
Regression Analysis
• A technique concerned with predicting some variables from knowledge of others.
• The process of predicting variable Y using variable
X.
Regression
• Uses a variable (x) to predict some outcome variable (y).
• Tells you how values in y change as a function of
changes in values of x.
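The sketch below fits the least-squares line to hypothetical data and then uses it to predict y from a new x. It also checks numerically that the fitted line has a smaller sum of squared residuals than one arbitrarily chosen alternative line (a = 2.0, b = 0.7), as the Regression Line notes state. The helper names fit_line and sse are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y_hat = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared residuals of the line y_hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

a, b = fit_line(hours, score)
print(f"y_hat = {a:.1f} + {b:.1f}x")  # y_hat = 2.2 + 0.6x

# The least-squares line beats the alternative line on squared residuals.
print(round(sse(hours, score, a, b), 2), round(sse(hours, score, 2.0, 0.7), 2))  # 2.4 2.55

# Predict y for a new value of x (6 hours).
print(round(a + b * 6, 2))  # 5.8
```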