What Is Variance?
• The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean.
• Variance tells you the degree of spread in your data set.
• The more spread the data, the larger the variance is in relation to the mean.
Standard Deviation
• The standard deviation is derived from variance and tells you, on average, how far each value lies from the mean.
• It's the square root of variance.
• Both measures reflect variability in a distribution, but their units differ:
• Standard deviation is expressed in the same units as the original values (e.g., meters).
• Variance is expressed in squared units (e.g., meters squared).
• Because the units of variance are squared, unlike those of a typical value of a data set, it's harder to interpret the variance number intuitively. That's why standard deviation is often preferred as a main measure of variability.
• However, the variance is often more convenient mathematically than the standard deviation, and it's used in making statistical inferences.
Steps for Calculating the Sample Standard Deviation
• Calculate the mean (simple average of the numbers).
• For each number: subtract the mean.
• Square the result.
• Add up all of the squared results.
• Divide this sum by one less than the number of data points (N - 1).
• This gives you the sample variance.
• Take the square root of this value to obtain the sample standard deviation.
Population Standard Deviation
• The population standard deviation, the standard definition of σ, is used when an entire population can be measured, and is the square root of the variance of a given data set.
• In cases where every member of a population can be sampled, the following equation can be used to find the standard deviation of the entire population:
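Assuming the standard definition, which is consistent with the symbols and the worked example that follow, the equation is:

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```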
• Where x_i is an individual value, μ is the mean/expected value, and N is the total number of values.
• For the data set 1, 3, 4, 7, 8, i = 1 corresponds to 1, i = 2 corresponds to 3, and so on. Hence the summation notation simply means to perform the operation (x_i − μ)² on each value up to N, which in this case is 5 since there are 5 values in this data set, and to add up the results.
• EX:
μ = (1 + 3 + 4 + 7 + 8) / 5 = 4.6
σ = √{[(1 − 4.6)² + (3 − 4.6)² + ... + (8 − 4.6)²] / 5}
σ = √[(12.96 + 2.56 + 0.36 + 5.76 + 11.56) / 5] = √6.64 ≈ 2.577
Sample Standard Deviation
• In many cases, it is not possible to sample every member within a population, requiring that the above equation be modified so that the standard deviation can be estimated from a random sample of the population being studied. A common estimator for σ is the sample standard deviation, typically denoted by s.
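Assuming the usual form of this estimator, with the divisor N − 1 from the calculation steps given earlier, s is:

```latex
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```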
Where x_i is one sample value, x̄ is the sample mean, and N is the sample size.
• https://www.thoughtco.com/sample-standard-deviation-problem-609528
• https://byjus.com/maths/standard-deviation/
Applications of Standard Deviation
• Standard deviation is widely used in experimental and industrial settings to test models against real-world data.
• An example of this in industrial applications is quality control for some products.
• Standard deviation can be used to calculate a minimum and maximum value within which some aspect of the product should fall some high percentage of the time.
• In cases where values fall outside the calculated range, it may be necessary to make changes to the production process to ensure quality control.
• Standard deviation is also used in weather analysis to determine differences in regional climate.
• Imagine two cities, one on the coast and one deep inland, that have the same mean temperature of 75°F. While this may prompt the belief that the temperatures of these two cities are virtually the same, the reality could be masked if only the mean is addressed and the standard deviation ignored.
• Coastal cities tend to have far more stable temperatures due to regulation by large bodies of water, since water has a higher heat capacity than land. Essentially, this makes water far less susceptible to changes in temperature, and coastal areas remain warmer in winter and cooler in summer due to the amount of energy required to change the temperature of the water. Hence, while the coastal city may have temperatures ranging from 60°F to 85°F over a given period of time to result in a mean of 75°F, an inland city could have temperatures ranging from 30°F to 110°F to result in the same mean.
• Another area in which standard deviation is widely used is finance, where it often serves to measure the risk associated with price fluctuations of some asset or portfolio of assets.
• The use of standard deviation in these cases provides an estimate of the uncertainty of future returns on a given investment.
• For example, in comparing stock A, which has an average return of 7% with a standard deviation of 10%, against stock B, which has the same average return but a standard deviation of 50%, the first stock would clearly be the safer option, since the standard deviation of stock B is significantly larger for the exact same return. That is not to say that stock A is definitively a better investment option in this scenario, since a larger standard deviation means returns can deviate further from the mean in either direction. While stock A has a higher probability of a return close to 7%, stock B can potentially provide a significantly larger return (or loss).
Types of data
Chi-Squared Test
• A chi-square (χ²) statistic measures how a model compares to actual observed data.
• The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a fair coin meet these criteria.
• Chi-square tests are often used to test hypotheses. The chi-square statistic compares the size of any discrepancies between the expected results and the actual results, given the size of the sample and the number of variables in the relationship.
• For these tests, degrees of freedom are used to determine if a certain null hypothesis can be rejected based on the total number of variables and samples within the experiment.
• As with any statistic, the larger the sample size, the more reliable the results.
Degrees of Freedom
• Degrees of freedom are the number of independent variables that can be estimated in a statistical analysis. The values of these variables are without constraint, although the values do impose restrictions on other variables if the data set is to comply with estimated parameters.
What Does a Chi-Square Statistic Tell You?
• There are two main kinds of chi-square tests:
1. The test of independence, which asks a question of relationship, such as, "Is there a relationship between student gender and course choice?"
2. Goodness of fit: χ² provides a way to test how well a sample of data matches the (known or assumed) characteristics of the larger population that the sample is intended to represent.
• When considering student gender and course choice, a χ² test for independence could be used. To do this test, the researcher would collect data on the two chosen variables (gender and courses picked) and then compare the frequencies at which male and female students select among the offered classes using the chi-square formula and a χ² statistical table.
• If there is no relationship between gender and course selection (that is, if they are independent), then male and female students should select each offered course at approximately the same rate. Equivalently, the proportion of male and female students in any selected course should be approximately equal to the proportion of male and female students in the sample.
• A χ² test for independence can tell us how likely it is that random chance can explain any observed difference between the actual frequencies in the data and these theoretical expectations.
Goodness-of-Fit
• For example, consider an imaginary coin with exactly a 50/50 chance of landing heads or tails and a real coin that you toss 100 times. If this coin is fair, then it will also have an equal probability of landing on either side, and the expected result of tossing the coin 100 times is that heads will come up 50 times and tails will come up 50 times.
• In this case, χ² can tell us how well the actual results of 100 coin flips compare to the theoretical model that a fair coin will give 50/50 results. The actual tosses could come up 50/50, or 60/40, or even 90/10. The farther the actual results of the 100 tosses are from 50/50, the poorer the fit of this set of tosses to the theoretical expectation of 50/50, and the more likely we are to conclude that this coin is not actually a fair coin.
When to Use a Chi-Square Test
• A chi-square test is used to help determine if observed results are in line with expected results, and to rule out that observations are due to chance.
• A chi-square test is appropriate for this when the data being analyzed are from a random sample, and when the variable in question is a categorical variable.
• A categorical variable is one that consists of selections such as type of car, race, educational attainment, male or female, or how much somebody likes a political candidate (from very much to very little).
• These types of data are often collected via survey responses or questionnaires. Therefore, chi-square analysis is often most useful in analyzing this type of data.
How to Perform a Chi-Square Test
• These are the basic steps whether you are performing a goodness of fit test or a test of independence:
1. Create a table of the observed and expected frequencies;
2. Use the formula to calculate the chi-square value;
3. Find the critical chi-square value using a chi-square value table or statistical software;
4. Determine whether the chi-square value or the critical value is the larger of the two;
5. Reject or fail to reject the null hypothesis.
Example Problem 1
• Step 1: Formulate the hypotheses
Null Hypothesis:
H0: There is no significant association between students' educational level and their preference for online or face-to-face instruction.
or H0: There is no difference in the distribution of instructional preferences between undergraduate and graduate students.
Alternative Hypothesis:
Ha: There is a significant association between students' educational level and their preference for online or face-to-face instruction.
or Ha: There is a significant difference in the distribution of instructional preferences between undergraduate and graduate students.
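The remaining steps of such a test can be sketched in a few lines of Python. The statistic is the sum of (observed − expected)² / expected over all cells. The counts below are hypothetical and are not taken from the example problem; the function names are illustrative, and the critical value 3.841 is the tabulated chi-square value for 1 degree of freedom at the 0.05 significance level.

```python
# Sketch of a chi-square test of independence on a 2x2 table.
# The counts are hypothetical; they are NOT the data from the example problem.

def expected_counts(observed):
    """Expected cell counts under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Step 1: observed and expected frequencies.
# Rows: undergraduate, graduate. Columns: online, face-to-face.
observed = [[20, 40],
            [25, 15]]
expected = expected_counts(observed)

# Step 2: chi-square value.
chi2 = chi_square_statistic(observed, expected)

# Step 3: critical value. For an r x c table, df = (r - 1) * (c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # from a chi-square table, df = 1, alpha = 0.05

# Steps 4 and 5: compare the two values and decide.
reject_null = chi2 > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_null}")
```

The same `chi_square_statistic` helper also covers a goodness-of-fit check: for a coin that lands 60 heads and 40 tails in 100 tosses, `chi_square_statistic([[60, 40]], [[50, 50]])` gives 4.0, which exceeds 3.841 with 1 degree of freedom.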