STATISTICS PROBABILITY Final
Statistics allows us to derive knowledge from large datasets, and this knowledge can
then be used to make predictions, decisions, and classifications.
Example applications: Sales Projection, Weather Forecasting, Medical Research, Stock Market
Population is the entire dataset, such as the whole population of a country. Sample is a
subset of that population which is analyzed to make inferences about the population
Stratified Sampling is the process of dividing the population into layers or groups (strata)
and then performing random sampling within each group
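The two steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `stratified_sample`, the `fraction` parameter, and the urban/rural population are all hypothetical, and taking the same fraction from every group is just one possible allocation rule.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Split records into groups by key, then sample randomly within each group."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    sample = []
    for members in strata.values():
        # Assumption: take the same fraction from every group, at least one item.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural respondents.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
print(len(sample))  # 6 urban + 4 rural = 10
```

Because each group is sampled separately, the sample keeps the 60/40 split of the population, which plain random sampling does not guarantee.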
Central Tendency is used to indicate where the middle or center of the
distribution of our data lies
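Mean is the average of the data. In simpler terms, it is the sum of the values divided by
the total number of values. The population mean is represented by the Greek letter mu (μ)
and the sample mean by x̄; the Greek letter Sigma (Σ) denotes the sum in the formula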
Mode is used to indicate the most frequent data point, in other words the one which
occurs the greatest number of times
Median is the middle of the data. If the data is arranged in ascending order, then the
data element which occurs right at the center is the median. If the number of values is
even, the median is the average of the two central values
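As a small illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module on a made-up list of scores (the data is an assumption for the example only).

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up data, already in ascending order

# Mean: sum of the values divided by the number of values.
mean_by_hand = sum(scores) / len(scores)
mean = statistics.mean(scores)

# Median: the list has an even length, so it is the average of 3 and 5.
median = statistics.median(scores)

# Mode: the value that occurs most often (3 appears twice).
mode = statistics.mode(scores)

print(mean, median, mode)  # 5 4.0 3
```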
Variation in statistics is used to show how data is dispersed, or spread out. Several
measures of variation are used in statistics.
Range is the difference between the highest and the lowest values in our dataset. Because it
depends only on these two values, it is strongly affected by outliers
Percentiles are scores that describe the value below which a given percentage of observations
fall. E.g.: If X is at the 70th Percentile, it means 70% of the data points in our sample are
below X
Quartiles are used to break the ordered data into 4 equal parts so as to describe the spread of
the data in a way that is less influenced by outliers.
Quartiles are expressed in percentiles. The 1st Quartile is the 25th Percentile, the 2nd
Quartile is the 50th Percentile (Median) and the 3rd Quartile is the 75th Percentile
Interquartile Range (IQR) is the difference between the upper and lower quartiles (3rd
Quartile minus 1st Quartile). It describes the spread of the middle 50% of the data.
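The measures of spread described above can be compared on a small made-up dataset containing one outlier. This is only a sketch: the data and the helper `percent_below` are hypothetical, and `statistics.quantiles` with its default "exclusive" method is one of several quartile conventions, so other tools may give slightly different cut points.

```python
import statistics

# Made-up data in ascending order; 90 is a deliberate outlier.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 36, 90]

def percent_below(values, x):
    """Percentage of observations that are strictly below x."""
    return 100 * sum(1 for v in values if v < x) / len(values)

value_range = max(data) - min(data)            # 90 - 12 = 78
q1, q2, q3 = statistics.quantiles(data, n=4)   # 17.0, 25.0, 33.0
iqr = q3 - q1                                  # 16.0

print(value_range, iqr)
print(round(percent_below(data, 33), 1))       # 8 of 11 values are below 33
```

The single outlier stretches the range to 78, while the IQR stays at 16 because it only looks at the middle half of the data.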
Variance measures how far a set of numbers is spread out from its average value. It is the
average of the squared differences from the mean.
Standard Deviation expresses how much the members of a group typically differ from the mean
value of the group. It is the square root of the variance, so it is in the same units as the
data.
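A short worked example may help connect the two measures. The sketch below computes the population variance by hand and with Python's `statistics` module; the data is made up. Note the assumption about which formula applies: the population versions divide by n, while the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean is 5

mean = sum(data) / len(data)

# Population variance: average of the squared differences from the mean.
variance_by_hand = sum((x - mean) ** 2 for x in data) / len(data)  # 32 / 8
std_dev_by_hand = math.sqrt(variance_by_hand)

# The same quantities from the standard library.
population_variance = statistics.pvariance(data)
population_std_dev = statistics.pstdev(data)

# Sample versions divide by n - 1 instead of n, so they come out slightly larger.
sample_variance = statistics.variance(data)  # 32 / 7

print(variance_by_hand, std_dev_by_hand)  # 4.0 2.0
```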
Correlation is a measure of the strength and direction of the linear relationship between two
quantitative variables
Positive Correlation describes a positive linear relationship between two quantitative
variables: as one variable increases, the other tends to increase
[Figures: Positive Correlation; No Correlation]
Negative Correlation describes a negative linear relationship between two quantitative
variables: as one variable increases, the other tends to decrease
[Figure: Negative Correlation]
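The three cases (positive, none, negative) can be illustrated with the Pearson correlation coefficient, which ranges from -1 to +1. The text does not name a specific coefficient, so using Pearson's r here is an assumption, and the function `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases with x: r is close to +1
falling = [10, 8, 6, 4, 2]    # decreases as x increases: r is close to -1
unrelated = [3, 1, 4, 1, 3]   # no linear trend: r is close to 0

for ys in (rising, falling, unrelated):
    print(round(pearson_r(x, ys), 3))
```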
A statistical table may be regarded as representing a subject and predicate. The meaning of
each number is indicated by the headings of the corresponding row and column.
Types Of Charts
1. Bar chart
2. Histogram
3. Pie chart
4. Box plot
5. Line graph
6. Scatter plot
Bar charts are among the most frequently used chart types. As the name suggests,
a bar chart is composed of a series of bars whose lengths represent the values of a
variable. Given that bar charts are such a common chart type, people are generally
familiar with them and can understand them easily.
A box plot, also called a box-and-whisker plot, is a way to show the distribution of
values based on the five-number summary: minimum, first quartile, median, third
quartile, and maximum.
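The five numbers that a box plot draws can be computed directly. The sketch below is an illustration on made-up data; the helper name `five_number_summary` is hypothetical, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from what a given plotting tool uses.

```python
import statistics

def five_number_summary(values):
    """Return minimum, first quartile, median, third quartile, and maximum."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "minimum": ordered[0],
        "first_quartile": q1,
        "median": median,
        "third_quartile": q3,
        "maximum": ordered[-1],
    }

summary = five_number_summary([13, 4, 10, 7, 15, 8, 12])  # made-up data
print(summary)
# The box spans first_quartile to third_quartile, with a line at the median;
# in the simplest form, the whiskers reach out to the minimum and maximum.
```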
A pie chart is a circular graph divided into slices. The larger a slice is, the bigger
the portion of the total quantity it represents.
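To make the link between slice size and share of the total concrete, the sketch below converts quantities into percentages and slice angles (a full circle is 360 degrees). The product names and sales figures are made up for the example.

```python
# Hypothetical sales by product; the values sum to 100 only for easy reading.
sales = {"Product A": 50, "Product B": 30, "Product C": 20}

total = sum(sales.values())

# Each slice gets the same share of the circle as its share of the total.
slices = {
    name: {"percent": 100 * value / total, "angle_degrees": 360 * value / total}
    for name, value in sales.items()
}

for name, info in slices.items():
    print(name, info["percent"], info["angle_degrees"])
# Product A covers half of the total, so its slice is half of the circle (180 degrees).
```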
A line chart is, as one can imagine, a line or multiple lines showing how single, or
multiple variables develop over time. It is a great tool because we can easily
highlight the magnitude of change of one or more variables over a period.
A scatter plot is a type of chart that is often used in the fields of statistics and data
science. It consists of multiple data points plotted across two axes, one axis per
variable, with one point for each observation. If a scatter plot includes more than two
variables, then we would use different colors to distinguish them.
Hypothesis testing refers to the process of making inferences or educated guesses about a
particular population parameter. This can be done either using statistics and sample data,
or on the basis of an uncontrolled observational study.
Estimation, in statistics, is any of numerous procedures used to calculate the value of some
property of a population from observations of a sample drawn from the population.
The goodness-of-fit test is a statistical hypothesis test to see how well sample data fit a
distribution from a population.
This test shows whether your sample data represents the data you would expect to find in the
actual population
Goodness-of-fit measures the discrepancy between the observed values and the values that
would be expected under the model, for example a normal distribution.
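One common way to quantify the discrepancy between observed and expected values is Pearson's chi-square statistic. The text does not name a specific test, so the choice of chi-square is an assumption, as are the die-roll counts and the function name below. The critical value 11.07 is the standard table value for 5 degrees of freedom at the 5% significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of each face in 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]

# Model being tested: a fair die, so every face is expected 60 / 6 = 10 times.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)  # about 1.0

# 6 categories give 6 - 1 = 5 degrees of freedom.
CRITICAL_VALUE_5PCT_DF5 = 11.07
consistent_with_fair_die = statistic < CRITICAL_VALUE_5PCT_DF5

print(round(statistic, 2), consistent_with_fair_die)
```

A statistic well below the critical value means the observed counts are consistent with the expected distribution; a value above it would suggest a poor fit.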
Probability of an event can be defined as the ratio of the number of favorable outcomes to
the total number of equally likely outcomes of an experiment.
For an experiment having 'n' equally likely outcomes, the number of favorable outcomes
can be denoted by x. The formula to calculate the probability of an event is as follows.
P(Event) = x / n
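As a worked example of the ratio of favorable outcomes x to total outcomes n, the sketch below computes the probability of rolling an even number with a fair six-sided die. The die example and the function name `probability` are illustrative assumptions, and the ratio only applies when all outcomes are equally likely.

```python
from fractions import Fraction

def probability(favorable_count, total_count):
    """Favorable outcomes divided by total outcomes (x / n), as an exact fraction."""
    return Fraction(favorable_count, total_count)

outcomes = [1, 2, 3, 4, 5, 6]                 # n = 6 possible outcomes
even = [o for o in outcomes if o % 2 == 0]    # x = 3 favorable outcomes

p_even = probability(len(even), len(outcomes))
print(p_even, float(p_even))  # 1/2 0.5
```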