
Distributions DAE. Business Analytics Essentials

The document discusses the concept of distributions in statistics, focusing on probability distributions, including formulas for calculating mean and standard deviation. It explains discrete distributions, binomial distributions, and the use of Excel functions for probability calculations. Additionally, it highlights the relevance of probability distributions in various applications such as investment returns and population growth.

Uploaded by

yaminigjadhav
Copyright
© All Rights Reserved
Distributions

4.1 DISTRIBUTIONS

The distribution of a statistical dataset is the spread of the data, which shows all possible values or intervals of the data and how often they occur. A distribution is simply a collection of data or scores on a variable. Usually, these scores are arranged in ascending or descending order and can then be presented graphically. The distribution provides a parameterized mathematical function which will calculate the probability of any individual observation from the sample space.

The term "probability distribution" refers to any statistical function that describes all the possible outcomes of a random variable within a given range of values. One of the most common examples of a probability distribution is the Normal distribution. It records the likelihood that an event is to occur. It is based on a theoretical assumption of what should happen.

Probability Distribution Formula

The mean is the expected value of the random variable in the probability distribution. The formula for the mean of a probability distribution is the sum of the products of each value of the random variable and its probability. Mathematically, it is represented as

\[ \bar{x} = \sum_i x_i \, P(x_i) \]

where
- x_i = value of the random variable in the i-th observation
- P(x_i) = probability of the i-th value

The standard deviation is a measure of the variation of the random variable's values from its expected value. It is the square root of the sum of the products of the squared deviation of each value from the mean and the probability of that value. Mathematically, it is represented as

\[ \sigma = \sqrt{\sum_i (x_i - \bar{x})^2 \, P(x_i)} \]

Please note that the sum of all the probabilities in a probability distribution is equal to 1.

4.1.1 Analyze Distributions

Probability Distribution Analysis allows you to describe the Time to Failure (TTF) as a statistical distribution, which usually is characterized by a specific pattern.
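As a rough illustration of the mean and standard deviation formulas given under "Probability Distribution Formula", the following R sketch applies them to the family-size survey figures used in the worked example below. The variable names (x, p, mean_x, sd_x) are chosen here for illustration and do not come from the text.

```r
# Values of the random variable and their probabilities
# (family-size survey figures from the worked example in this section)
x <- c(2, 3, 4, 5)
p <- c(0.22, 0.48, 0.25, 0.05)

# The probabilities of a valid distribution must sum to 1
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Mean: sum of each value times its probability
mean_x <- sum(x * p)

# Standard deviation: square root of the probability-weighted
# squared deviations from the mean
sd_x <- sqrt(sum((x - mean_x)^2 * p))

print(round(mean_x, 2))  # 3.13
print(round(sd_x, 3))    # 0.808
```

Because R arithmetic is vectorised, x * p multiplies the values and probabilities pairwise, which plays the same role as the cell-by-cell products in a spreadsheet.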
Based on a query, dataset, or data that you manually enter, you can use an independent variable to generate a Probability Distribution Analysis.

Example: Probability Distribution Formula using an Excel spreadsheet

Let us take the example of a survey conducted in a certain town to find out the number of persons in a family; the following data is available. Calculate the mean and standard deviation of the probability distribution.

    No. of persons (x)    2       3       4       5
    Probability P(x)      0.22    0.48    0.25    0.05

Solution:

In the spreadsheet, the values of x are in cells B4:E4 and the probabilities are in cells B5:E5.

Mean is calculated using the formula given below.

\[ \bar{x} = \sum_i x_i \, P(x_i) \]

    Excel formula: =(B4*B5)+(C4*C5)+(D4*D5)+(E4*E5)

    Mean = 2 * 0.22 + 3 * 0.48 + 4 * 0.25 + 5 * 0.05
    Mean = 3.13

Standard deviation is calculated using the formula given below.

\[ \sigma = \sqrt{\sum_i (x_i - \bar{x})^2 \, P(x_i)} \]

    Excel formula (with the mean in cell B10):
    =SQRT((B4-B10)^2*B5+(C4-B10)^2*C5+(D4-B10)^2*D5+(E4-B10)^2*E5)

    Standard deviation = sqrt[(2 - 3.13)^2 * 0.22 + (3 - 3.13)^2 * 0.48 + (4 - 3.13)^2 * 0.25 + (5 - 3.13)^2 * 0.05]
    Standard deviation = 0.808

Therefore, according to the survey, the expected number of persons per family is 3.13 with a standard deviation of 0.808.

Explanation

The formulas for the mean and standard deviation of a probability distribution can be derived using the following steps:

Step 1: Determine the values of the random variable or event through a number of observations. They are denoted by x_1, x_2, ..., x_n, or x_i.

Step 2: Compute the probability of occurrence of each value of the random variable. These are denoted by P(x_1), P(x_2), ..., P(x_n), or P(x_i).

    P(x_i) = No. of events with the i-th value / Total no. of events

Step 3: The mean is obtained by adding up the products of each value of the random variable (step 1) and its probability (step 2):

\[ \bar{x} = \sum_i x_i \, P(x_i) \]

Step 4: Compute the deviation of each value of the random variable (step 1) from the mean (step 3) of the probability distribution.

Step 5: The standard deviation is obtained by adding up the products of the squared deviation of each value (step 4) and its probability (step 2), and then taking the square root of the result:

\[ \sigma = \sqrt{\sum_i (x_i - \bar{x})^2 \, P(x_i)} \]

Relevance and Use of Probability Distribution Formula

The probability distribution formula is important because it estimates the expected outcome on the basis of all the possible outcomes for a given range of data. One of the most important parts of a probability distribution is the definition of the function, as every other parameter revolves around it. Probability distributions find application in the calculation of the return of an investment portfolio, hypothesis testing, the expected growth of a population, etc.

4.1.2 Discrete Distributions

The discrete distribution is a general type of probability distribution used to describe a variable that can take one of several explicit discrete values {x_i}, where a probability weight {p_i} is assigned to each value. Examples are the number of bridges to be built over a motorway extension, or the number of times a software module will have to be re-coded after testing.

Discrete Probability Distributions

When the random variable under consideration is discrete in nature, the probability distribution is also discrete. The required conditions are as follows:

\[ 0 \le f(x) \le 1 \quad \text{and} \quad \sum_x f(x) = 1 \]

We can make the following observations from these two conditions:

- The probability of any value of the random variable is greater than or equal to 0 and less than or equal to 1.
- The probability of a certain outcome is 1 and the probability of an impossible outcome is 0. Thus, for a certain outcome, f(x) = 1 and for an impossible outcome, f(x) = 0.
- The second condition holds because the possible outcomes are mutually exclusive and together cover every case, so their probabilities add up to 1. When the outcomes are unbiased and equally likely, each of k outcomes has probability 1/k.

Example 1: Find the distribution function for the frequency function given in columns A and B below. Also show the graph of the frequency and distribution functions.

    Row   A: x   B: f(x)   C: F(x)
    4     1      0.12      0.12
    5     2      0.25      0.37
    6     3      0.08      0.45
    7     4      0.20      0.65
    8     5      0.03      0.68
    9     6      0.18      0.86
    10    7      0.09      0.95
    11    8      0.05      1.00

Figure 1: Table of frequency and distribution functions

Given the frequency function defined by the table in the range A4:B11, we can calculate the distribution function in the range C4:C11 by putting the formula =B4 in cell C4 and the formula =B5+C4 in cell C5, and then copying this formula into cells C6 to C11 (e.g. by highlighting the range C5:C11 and pressing Ctrl-D).

Using the approach described in Example 2.1, we can generate the graphs of the frequency and distribution functions as follows:

[Figure 2: Charts of the frequency function f(x) and the distribution function F(x)]

Excel Function

Excel provides the function PROB. Where R1 is the range defining the discrete values of the random variable x (e.g. A4:A11 in Figure 1) and R2 is the range consisting of the frequency values f(x) corresponding to the x values in R1 (e.g. B4:B11 in Figure 1), the Excel function PROB is defined as follows:

    PROB(R1, R2, a, b) = the probability that x takes on a value between a and b (inclusive); if b is omitted, the probability that x = a

[...] Data > Data Tools > Remove Duplicates. The highlighted data can then optionally be sorted via Data > Sort & Filter > Sort. The result appears in cell range C4:C8 above. Alternatively, use the Real Statistics QSORT and NoDupes functions as described in Supplemental Functions.
Then use the COUNTIF function (see Built-in Functions) to count how many times each score appears in the sample data. E.g. cell D4 contains the formula =COUNTIF(...$A$15,C4), which has value 2 since the data element in cell C4 appears twice in the raw data. Since there are 12 data elements, the correct value of the frequency function for that data element is 2/12 = 0.167, which can be calculated via the formula D4/D$9 in cell E4, where D9 contains the formula SUM(D4:D8).

Example 3: Repeat Example 2 using the FREQTABLE function.

[Figure 4: Using the FREQTABLE function]

The output from =FREQTABLE(A3:A14) (where A3:A14 is as in Figure 3) is shown in range M4:O8 of Figure 4 (the headings in row 3 have been added manually).

4.2 BINOMIAL DISTRIBUTIONS

The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, either success or failure. If the probability of a successful trial is p, then the probability of having x successful outcomes in an experiment of n independent trials is as follows.

\[ f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n \]

Example

Suppose there are twelve multiple choice questions in an English class quiz. Each question has five possible answers, and only one of them is correct. Find the probability of having four or fewer correct answers if a student attempts to answer every question at random.

Solution:

Since only one out of five possible answers is correct, the probability of answering a question correctly at random is 1/5 = 0.2. We can find the probability of having exactly 4 correct answers by random attempts as follows.
    > dbinom(4, size=12, prob=0.2)
    [1] 0.1329

To find the probability of having four or fewer correct answers by random attempts, we apply the function dbinom with x = 0, ..., 4.

    > dbinom(0, size=12, prob=0.2) +
    + dbinom(1, size=12, prob=0.2) +
    + dbinom(2, size=12, prob=0.2) +
    + dbinom(3, size=12, prob=0.2) +
    + dbinom(4, size=12, prob=0.2)
    [1] 0.9274

Alternatively, we can use the cumulative probability function for the binomial distribution, pbinom.

    > pbinom(4, size=12, prob=0.2)
    [1] 0.92744

Answer: The probability of four or fewer questions answered correctly at random in a twelve-question multiple choice quiz is 92.7%.

4.3 POISSON DISTRIBUTIONS

The POISSON.DIST function is categorized under Excel Statistical functions. It calculates the Poisson probability mass function. For a financial analyst, POISSON.DIST is useful in forecasting revenue. We can also use it to predict the number of events occurring over a specific time, e.g. the number of cars arriving at the mall parking per minute.

The POISSON.DIST function was introduced in MS Excel 2010 and hence is not available in earlier versions. For older versions of MS Excel, we can use the POISSON function.

Formula

    =POISSON.DIST(x,mean,cumulative)

The POISSON.DIST function uses the following arguments:

1. X (required argument) - This is the number of events for which we want to calculate the probability. The value must be greater than or equal to 0.
2. Mean (required argument) - This is the expected number of events. The argument must be greater than or equal to zero.
3. Cumulative (required argument) - This is the logical argument that specifies the type of distribution to be calculated. It can be either:
   - TRUE - Returns the cumulative Poisson probability that the number of random events occurring will be between zero and x inclusive.
   - FALSE - Returns the Poisson probability mass function, i.e. the probability that the number of events occurring will be exactly x.

The Poisson probability mass function calculates the probability that there will be exactly x occurrences and is given by the formula:

\[ f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \]

where \lambda is the expected number of occurrences within the specified time period.

The cumulative Poisson distribution function calculates the probability that there will be at most x occurrences and is given by the formula:

\[ F(x; \lambda) = \sum_{k=0}^{x} \frac{e^{-\lambda} \lambda^{k}}{k!} \]

How to use the POISSON.DIST Function in Excel?

To understand the uses of the POISSON.DIST function, let's consider an example:

Example

Suppose we are given the following data:

- Number of events: 5
- Expected mean: 10

The data is entered in the worksheet as follows:

    Row   Description                          Data
    5     Number of events                     5
    6     Expected Mean                        10
    8     Cumulative Poisson probability       =POISSON.DIST(C5,C6,TRUE)
    9     Poisson probability mass function    =POISSON.DIST(C5,C6,FALSE)

To find the cumulative Poisson probability, we use the following formula:

    =POISSON.DIST(C5,C6,TRUE)

We get the result 0.067086.

To find the Poisson probability mass function, we use the following formula:

    =POISSON.DIST(C5,C6,FALSE)

This evaluates to approximately 0.037833.

CONTINUOUS DISTRIBUTIONS

A probability distribution in which the random variable X can take on any value (is continuous).
Because there are infinitely many values that X could assume, the probability of X taking on any one specific value is zero. Therefore we often speak in ranges of values (for example, P(X > 0) = 0.50). The normal distribution is one example of a continuous distribution. The probability that X falls between two values (a and b) equals the integral (area under the curve) of the probability density function f(x) from a to b:

\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]

Here is a graph of the continuous uniform distribution with a = 1, b = 3.

[Figure: Continuous uniform distribution with a = 1, b = 3]

Problem

Select ten random numbers between one and three.

Solution:

We apply the generation function runif of the uniform distribution to generate ten random numbers between one and three.

    > runif(10, min=1, max=3)
     [1] 1.6121 1.2028 1.9306 2.4233 1.6874 1.1502 2.7068
     [8] 1.4455 2.4122 2.2171

IDENTIFY CUMULATIVE DISTRIBUTIONS

The NORM.DIST (earlier NORMDIST) function in Excel is used for calculating the normal distribution of a value in a set of data.

Syntax of NORM.DIST

    =NORM.DIST(x, mean, standard_dev, cumulative)

    Row   A: Name   B: Weight (kg)
    2     emp113    41
    3     emp112    42
    4     emp105    43
    5     emp101    44
    6     emp103    45
    7     emp111    46
    8     emp109    47
    9     emp102    48
    10    emp110    49
    11    emp107    50
    12    emp106    51
    13    emp108    52
    14    emp104    53

    Cell F2: Standard Deviation = 3.742
    Cell F3: Mean = 47

Write this NORM.DIST formula in cell C2 and drag it down.

    =NORM.DIST(B2,$F$3,$F$2,TRUE)

Here, B2 contains the number for which we want to get the CDF. It is a relative reference, so it changes when the formula is copied down. Next, we have given absolute references for the mean and standard deviation, respectively. The NORM.DIST function will return the result below.
    Formula in C2: =NORM.DIST(B2,$F$3,$F$2,1)

    Name      Weight (kg)   CDF
    emp113    41            0.05
    emp112    42            0.09
    emp105    43            0.14
    emp101    44            0.21
    emp103    45            0.30
    emp111    46            0.39
    emp109    47            0.50
    emp102    48            0.61
    emp110    49            0.70
    emp107    50            0.79
    emp106    51            0.86
    emp108    52            0.91
    emp104    53            0.95

The CDF tells us the probability of a value at or below the given number. For instance, we now know that there is a 79% chance of a person in your company weighing 50 kg or less. The graph shown below is a visualisation of the CDF in Excel.

[Figure: Cumulative Distribution; x-axis: weight from 40 to 60 kg, y-axis: cumulative probability from 0.00 to 1.00]

The CDF graph will always look like this if the data is sorted in ascending order, because NORM.DIST calculates the cumulative probability. If you just want the likelihood of a number at a single point in a data set, you should use the PDF.

2. How to Calculate Probability Density Function in Excel

The NORM.DIST function is also used for calculating the PDF. The probability density function describes how likely a given value is relative to the other values in the population. For example, you may want to know how likely it is that a person weighs about 60 kg. We have prepared another data set from the men in your organisation.

    Row   A: Name   B: Weight (kg)
    2     emp113    30
    3     emp112    35
    4     emp105    40
    5     emp101    45
    6     emp103    50
    7     emp111    55
    8     emp109    60
    9     emp102    65
    10    emp110    70
    11    emp107    75
    12    emp106    80
    13    emp108    85
    14    emp104    90

    Cell F2: Standard Deviation = 18.71
    Cell F3: Mean = 60

The standard deviation and mean of the data have already been calculated. Just write this NORM.DIST formula in C2 and copy it into the cells below.

    =NORM.DIST(B2,$F$3,$F$2,0)

Now you have the density for each weight in the data. At 60 kg the value is about 0.02. Strictly, this is a density rather than a probability, because for a continuous variable the probability of any exact value is zero; it can be read as roughly a 2% chance of a weight falling within a 1 kg interval around 60 kg. The graph of the PDF normally looks like the image below.

[Figure: PDF graph]
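The two NORM.DIST calls can be cross-checked in R, where pnorm plays the role of NORM.DIST with cumulative = TRUE and dnorm the role of cumulative = FALSE. This is only a sketch under two assumptions: that the weights are the integers 41 to 53 in the first table and 30 to 90 in steps of 5 in the second, and that the spreadsheet used the population standard deviation (which is what the values 3.742 and 18.71 suggest). The variable names are illustrative.

```r
# First data set: weights 41, 42, ..., 53 (assumed from the CDF table)
w <- 41:53
mu <- mean(w)                        # 47
sigma <- sqrt(mean((w - mu)^2))      # population SD, about 3.742

# Cumulative probability P(X <= 50), like NORM.DIST(50, 47, 3.742, TRUE)
cdf_50 <- pnorm(50, mean = mu, sd = sigma)

# Second data set: weights 30, 35, ..., 90 (assumed from the PDF table)
w2 <- seq(30, 90, by = 5)
mu2 <- mean(w2)                      # 60
sigma2 <- sqrt(mean((w2 - mu2)^2))   # population SD, about 18.71

# Density at 60 kg, like NORM.DIST(60, 60, 18.71, FALSE)
pdf_60 <- dnorm(60, mean = mu2, sd = sigma2)

print(round(cdf_50, 2))  # 0.79
print(round(pdf_60, 2))  # 0.02
```

Note that R's built-in sd() computes the sample standard deviation (dividing by n - 1), so the population version is written out explicitly here to match the spreadsheet figures.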
The PDF graph is also called a bell curve.

This is the NORM.DIST function of Excel. You can use it to calculate the likelihood of a number appearing in a data set without worrying about the complex mathematics behind the normal distribution function.

IDENTIFY NORMAL DISTRIBUTIONS

The normal distribution is an important class of statistical distribution that has a wide range of applications. It underlies many machine learning algorithms, and the concept of the normal distribution is a must for any statistician, machine learning engineer, and data scientist.

The normal distribution is also known as the Gaussian or Gauss distribution. Many groups follow this type of pattern. That is why it is widely used in business, statistics, and in government bodies like the FDA:

- Heights of people.
- Measurement errors.
- Blood pressure.
- Points on a test.
- IQ scores.
- Salaries.

Why is Normal Distribution Important?

There are several reasons why the normal distribution is crucial in statistics. Some of those are as follows:

1. Many statistical hypothesis tests assume that the data follows a normal distribution.
2. Inference in both linear and non-linear regression commonly assumes that the residuals follow the normal distribution.
3. Moreover, the central limit theorem states that as the sample size increases, the distribution of the sample mean approaches a normal distribution irrespective of the distribution of the original variable (provided its variance is finite).

Apart from this, most statistical software programs support some of the probability functions for the normal distribution as well.
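The third point can be made concrete with a small simulation. The sketch below is an illustration only: the choice of an exponential distribution, the sample size of 50, the 5000 repetitions and the seed are all assumptions made here, not values from the text. It draws repeated samples from a strongly skewed distribution and looks at how their means behave.

```r
set.seed(1)          # arbitrary seed so the sketch is reproducible

n_samples <- 5000    # number of repeated samples
n <- 50              # observations in each sample

# Each sample comes from a right-skewed exponential distribution
# with mean 1 and standard deviation 1
sample_means <- replicate(n_samples, mean(rexp(n, rate = 1)))

# The sample means should centre on the population mean (1) ...
print(mean(sample_means))

# ... with a spread close to sigma / sqrt(n)
print(sd(sample_means))
print(1 / sqrt(n))

# hist(sample_means) would show a roughly bell-shaped histogram,
# even though the individual observations are far from normal
```

Increasing n makes the histogram of sample means narrower and more symmetric, which is the behaviour the central limit theorem describes.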
Parameters of Normal Distribution

There are two main parameters of a normal distribution: the mean and the standard deviation. With the help of these parameters, we can decide the shape and probabilities of the distribution with respect to our problem statement. As the parameter values change, the shape of the distribution changes.

1. Mean

- Researchers use the mean or average value as a measure of central tendency. It can be used to describe the distribution of variables that are measured as ratios or intervals.
- The mean determines the location of the peak, and most of the data points are clustered around the mean in a normal distribution graph.
- If we change the value of the mean, the curve of the normal distribution moves either to the left or to the right along the x-axis.

[Figure: Normal Distribution: Different Means, Same Standard Deviation (StDev = 15)]

2. Standard Deviation

- The standard deviation measures how the data points are dispersed relative to the mean.
- It determines how far the data points are from the mean and represents the typical distance between the mean and the data points.
- The standard deviation defines the width of the graph. As a result, changing the value of the standard deviation tightens or expands the width of the distribution along the x-axis.
- Usually, a smaller standard deviation results in a steeper, narrower curve, while a larger standard deviation results in a flatter curve.

[Figure: Normal Distribution: Same Mean (Mean = 100), Different Standard Deviations]

Empirical Rule for Normal Distribution

Have you heard of the empirical rule?
It is a commonly used concept in statistics (and in a lot of performance reviews as well), also known as the 68-95-99.7 rule.

According to the Empirical Rule for Normal Distribution:

- 68.27% of data lies within 1 standard deviation of the mean
- 95.45% of data lies within 2 standard deviations of the mean
- 99.73% of data lies within 3 standard deviations of the mean

Thus, almost all the data lies within 3 standard deviations. This rule enables us to check for outliers and is very helpful when assessing the normality of a distribution.

4.6.1 Calculate Normal Distributions

The normal distribution is defined by the following probability density function, where \mu is the population mean and \sigma^2 is the variance.

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)} \]

If a random variable X follows the normal distribution, then we write

\[ X \sim N(\mu, \sigma^2) \]

In particular, the normal distribution with \mu = 0 and \sigma = 1 is called the standard normal distribution, and is denoted as N(0,1). It can be graphed as follows.

[Figure: Standard normal distribution]

The normal distribution is important because of the Central Limit Theorem, which states that the distribution of the means of all possible samples of size n drawn from a population with mean \mu and variance \sigma^2 approaches a normal distribution with mean \mu and variance \sigma^2/n as n approaches infinity.

Problem

Assume that the test scores of a college entrance exam fit a normal distribution. Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is the percentage of students scoring 84 or more in the exam?

Solution:

We apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. Since we are looking for the percentage of students scoring 84 or higher, we are interested in the upper tail of the normal distribution.

    > pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)
    [1] 0.21492

Answer: The percentage of students scoring 84 or more in the college entrance exam is 21.5%.

Quartiles are used to summarize a group of numbers. Quartiles are great for reporting on a set of data and for making box and whisker plots.
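As a rough sketch of how quartiles are obtained in practice, the R snippet below computes the median and quartiles of thirteen made-up values and compares them with the quartiles of a normal distribution that has the same mean and standard deviation. The data vector is hypothetical and chosen only for illustration, and quantile() is used with its default interpolation method, which is one of several conventions.

```r
# Thirteen hypothetical values, sorted, with one large outlier at the end
x <- c(2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 19, 22, 40)

med <- median(x)                               # the seventh of 13 sorted values
q <- quantile(x, probs = c(0.25, 0.5, 0.75))   # first, second and third quartiles
iqr <- IQR(x)                                  # third quartile minus first quartile

print(med)
print(q)
print(iqr)

# Quartiles of a normal distribution with the same mean and sd;
# a large gap to the empirical quartiles hints at skew or outliers
print(qnorm(c(0.25, 0.5, 0.75), mean = mean(x), sd = sd(x)))
```

Here the outlier pulls the mean well above the median, so the normal-based quartiles sit higher than the empirical ones, which is the kind of asymmetry quartiles are useful for exposing.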
Quartiles are especially useful when you're working with data that isn't symmetrically distributed, or a data set that has outliers.

A quartile is a statistical term that describes a division of observations into four defined intervals based on the values of the data and how they compare to the entire set of observations.

It is important to understand the median as a measure of central tendency. The median in statistics is the middle value of a set of numbers. It is the point at which exactly half of the data lies below and half above the central value. So, given a set of 13 numbers, the median would be the seventh number. The six numbers preceding this value are the lowest numbers in the data, and the six numbers after the median are the highest numbers in the dataset. Because the median is not affected by extreme values or outliers in the distribution, it is sometimes preferred to the mean.

The median is a robust estimator of location but says nothing about how the data on either side of its value is spread or dispersed. That's where the quartile steps in. The quartiles measure the spread of values above and below the median by dividing the distribution into four groups.

Normal Distribution

Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, the normal distribution appears as a bell curve.

The normal distribution has several advantages over other distributions.

a. The normal distribution and the distributions associated with it are very tractable analytically.
b. The normal distribution has the familiar bell shape, whose symmetry makes it an appealing choice for many popular models.
c.
There is the Central Limit Theorem, which shows that, under mild conditions, the normal distribution can be used to approximate a large variety of distributions in large samples.

IDENTIFY SKEW

The skewness of a data population is defined by the following formula, where \mu_2 and \mu_3 are the second and third central moments.

\[ \gamma_1 = \mu_3 / \mu_2^{3/2} \]

Intuitively, the skewness is a measure of symmetry. As a rule, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed. Positive skewness would indicate that the mean of the data values is larger than the median, and the data distribution is right-skewed.

Problem

Find the skewness of eruption duration in the data set faithful.

Solution:

We apply the function skewness from the e1071 package to compute the skewness coefficient of eruptions. As the package is not in the core R library, it has to be installed and loaded into the R workspace.

    > library(e1071)                    # load e1071
    > duration = faithful$eruptions     # eruption durations
    > skewness(duration)                # apply the skewness function
    [1] -0.41355

Answer: The skewness of eruption duration is -0.41355. It indicates that the eruption duration distribution is skewed towards the left.

Exercise

Find the skewness of eruption waiting period in faithful.

KEY WORDS

1. Distribution: The distribution of a statistical dataset is the spread of the data, which shows all possible values or intervals of the data and how they occur. A distribution is simply a collection of data or scores on a variable. Usually, these scores are arranged in ascending or descending order and can then be presented graphically. The distribution provides a parameterized mathematical function which will calculate the probability of any individual observation from the sample space.

2. Probability distribution: The term "probability distribution" refers to any statistical function that describes all the possible outcomes of a random variable within a given range of values. One of the most common examples of a probability distribution is the Normal distribution. It records the likelihood that an event is to occur. It is based on a theoretical assumption of what should happen.

3. Discrete distributions: The discrete distribution is a general type of probability distribution used to describe a variable that can take one of several explicit discrete values {x_i}, where a probability weight {p_i} is assigned to each value. Examples are the number of bridges to be built over a motorway extension, or the number of times a software module will have to be re-coded after testing.

4. Binomial distribution: The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, either success or failure.

5. Poisson distribution: In statistics, a Poisson distribution is a probability distribution that can be used to show how many times an event is likely to occur within a specified period of time. The Poisson distribution is a discrete function, meaning that the variable can only take specific values in a (potentially infinite) list.

6. Continuous distribution: A probability distribution in which the random variable X can take on any value (is continuous). Because there are infinitely many values that X could assume, the probability of X taking on any one specific value is zero. The normal distribution is one example of a continuous distribution.

7.
Normal distribution: The normal distribution is an important class of statistical distribution that has a wide range of applications. It underlies many machine learning algorithms, and the concept of the normal distribution is a must for any statistician, machine learning engineer, and data scientist.

MULTIPLE CHOICE QUESTIONS

1. Which of the following standard probability functions is applicable to discrete random variables?
   a) Gaussian Distribution
   b) Poisson Distribution
   c) Rayleigh Distribution
   d) Exponential Distribution

2. A sampling distribution is the probability distribution for which one of the following?
   a. sample
   b. sample statistic
   c. population
   d. population parameter

3. Which of the following is the most common example of a situation for which the main parameter of interest is a population proportion?
   a. Binomial experiment
   b. Normal experiment
   c. Randomized experiment
   d. An observational study

4. Poisson distribution is applied for ____
   a) Continuous Random Variable
   b) Discrete Random Variable
   c) Irregular Random Variable
   d) Uncertain Random Variable

5. ____ are used when you want to visually examine the relationship between two quantitative variables.
   a. Bar graph
   b. Pie graph
   c. Line graph
   d. Scatter plot

6. Standard deviation is always calculated from:
   a. Mean
   b. Median
   c. Mode
   d. Lower quartile

7. In case of a positively skewed distribution, the extreme values lie in the:
   a. Middle
   b. Left tail
   c. Right tail
   d.
Anywhere

Answers: 1. b   2. b   3. a   4. b   5. d   6. a   7. c

FILL IN THE BLANKS

1. Half of the difference between the upper and lower quartiles is called ____
2. The standard deviation of a distribution divided by the mean of the distribution and expressed as a percentage is called ____
3. ____ is the process of transforming qualitative research data from written interviews or field notes into typed text.
4. ____ is the cyclical process of collecting and analysing data during a single research study.
5. Data analysis is a process of ____
6. The measurement of the spread or scatter of the individual values around the central point is called ____
7. All odd order moments about the mean in a symmetrical distribution are ____

Answers:

1. Quartile deviation
2. Coefficient of variation
3. Transcription
4. Interim analysis
5. Inspecting data
6. Measures of dispersion
7. Zero

IMPORTANT QUESTIONS

1. What is a distribution? How are distributions analyzed?
2. Discuss discrete distributions, binomial distributions and the Poisson distribution, with examples.
3. What are continuous distributions? How are continuous distributions identified and calculated?
4. How are cumulative distributions and normal distributions identified?
5. How are normal distributions calculated using an Excel spreadsheet?
6. Explain how to compare quartiles and normal distributions.
7. How is skew identified using an Excel spreadsheet?
