Predictive Analytics

Linear least squares - implementation - goodness of fit - testing a linear model - weighted resampling. Regression using StatsModels - multiple regression - nonlinear relationships - logistic regression - estimating parameters - Time series analysis - moving averages - missing values - serial correlation - autocorrelation. Introduction to survival analysis.
Contents
Linear Least Squares
Regression using StatsModels
Multiple Regression
Logistic Regression
Time Series Analysis
Introduction to Survival
Two Marks Questions with Answers
5.1 Linear Least Squares
Least squares method
• The method of least squares is about estimating parameters by minimizing the squared discrepancies between observed data, on the one hand, and their expected values on the other.
• The Least Squares (LS) criterion states that the sum of the squares of errors is minimum. The least-squares solutions yield y(x) whose elements sum to 1, but do not ensure the outputs to be in the range [0, 1].
• How do we draw such a line based on the observed data points ? Suppose an imaginary line of y = a + bx, so that
    E(Y) = a + bx
Fig. 5.1.1
• Imagine a vertical distance between the line and a data point, E = Y − E(Y). This error is the deviation of the data point from the imaginary line, the regression line. Then what are the best values of a and b ? The a and b that minimize the sum of such errors.
• Deviation does not have good properties for computation. Then why do we use squares of deviations ? Let us get a and b that minimize the sum of squared deviations rather than the sum of deviations. This method is called least squares.
• The least squares method minimizes the sum of squares of errors. Such a and b are called least squares estimators, i.e. estimators of the parameters α and β.
• The process of getting parameter estimators (e.g., a and b) is called estimation. The least squares method is the estimation method of Ordinary Least Squares (OLS).
Disadvantages of least squares :
1. Lack of robustness to outliers.
2. Certain datasets are unsuitable for least squares classification.
3. Decision boundary corresponds to the ML solution.
Example 5.1.4 : Fit a straight line y = mx + b to the points (x, y) = (3.00, 4.50), (4.25, 4.25), (5.50, 5.50) and (8.00, 5.50). Compute m and b by least squares.
Solution : Represent the observations in matrix form A X = L, where

    A = | 3.00  1 |        | 4.50 |
        | 4.25  1 |,  L =  | 4.25 |,  X = | m |
        | 5.50  1 |        | 5.50 |       | b |
        | 8.00  1 |        | 5.50 |

    X = (Aᵀ A)⁻¹ Aᵀ L = | 121.3125  20.7500 |⁻¹ | 105.8125 |  =  | 0.246 |
                        |  20.7500   4.0000 |   |  19.7500 |     | 3.663 |

so m = 0.246 and b = 3.663. The residuals are

    V = A X − L = [ −0.10, 0.46, −0.48, 0.13 ]ᵀ
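The same computation can be checked with a short NumPy sketch (NumPy is not used in the text; the points are those of the example above) :

import numpy as np

# Data points from Example 5.1.4
x = np.array([3.00, 4.25, 5.50, 8.00])
y = np.array([4.50, 4.25, 5.50, 5.50])

# Design matrix A with a column of ones for the intercept b
A = np.column_stack([x, np.ones_like(x)])

# Normal equations: X = (A^T A)^-1 A^T L
m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(m, b)                         # approximately 0.246 and 3.663

# Residuals V = A X - L
print(A @ np.array([m, b]) - y)     # approximately [-0.10, 0.46, -0.48, 0.13]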
5.1.1 Goodness of Fit
• A goodness-of-fit test, in general, refers to measuring how well the observed data correspond to the fitted (assumed) model. The goodness-of-fit test compares the observed values to the expected (fitted or predicted) values.
• Goodness-of-fit tests are frequently applied in business decision making. For example, the image below depicts the linear regression function. The goodness-of-fit test here will compare the actual observed values, denoted by dots, to the predicted values, denoted by the regression line.
Fig. 5.1.2 : Goodness of fit
• Broadly, goodness of fit tests can be categorized based on the distribution of the predicted variable of the dataset :
a) The chi-square test   b) Kolmogorov-Smirnov test   c) Anderson-Darling test.
5.1.2 Testing a Linear Model
• The following measures are used to validate the simple linear regression models :
1. Co-efficient of determination (R-square).
2. Hypothesis test for the regression coefficient β₁.
3. Analysis of variance for overall model validity (relevant more for multiple linear regression).
4. Residual analysis to validate the regression model assumptions.
5. Outlier analysis.
• The primary objective of regression is to explain the variation in Y using the knowledge of X. The coefficient of determination (R-square) measures the percentage of variation in Y explained by the model (β₀ + β₁X).
Characteristics of R-square :
• Here are some basic characteristics of the measure :
1. Since R² is a proportion, it is always a number between 0 and 1.
2. If R² = 1, all of the data points fall perfectly on the regression line. The predictor x accounts for all of the variation in y.
3. If R² = 0, the estimated regression line is perfectly horizontal. The predictor x accounts for none of the variation in y.
• R-square is used in the linear regression setting. More specifically, R² indicates the proportion of the variance in the dependent variable (Y) that is predicted or explained by linear regression and the predictor variable (X, also known as the independent variable).
• In general, a high R² value indicates that the model is a good fit for the data, although interpretations of fit depend on the context of the analysis. An R² of 0.35, for example, indicates that 35 percent of the variation in the outcome has been explained just by predicting the outcome using the covariates included in the model.
• That percentage might be a very high portion of variation to predict in a field such as the social sciences; in other fields, such as the physical sciences, one would expect R² to be much closer to 100 percent.
• The theoretical minimum R² is 0. However, since linear regression is based on the best possible fit, R² will always be greater than zero, even when the predictor and outcome variables bear no relationship to one another.
• R² increases when a new predictor variable is added to the model, even if the new predictor is not associated with the outcome. To account for that effect, the adjusted R² incorporates the same information as the usual R² but then also penalizes for the number of predictor variables included in the model.
• As a result, R² increases as new predictors are added to a multiple linear regression model, but the adjusted R² increases only if the increase in R² is greater than one would expect from chance alone. In such a model, the adjusted R² is the most realistic estimate of the proportion of the variation that is predicted by the covariates included in the model.
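As a rough illustration of these definitions (not taken from the text), R² and the adjusted R² can be computed directly from the residual and total sums of squares; here n is the number of observations and k the number of predictors :

import numpy as np

def r_squared(y, y_hat):
    """Proportion of variation in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, k):
    """Adjusted R^2 penalizes for the number of predictors k."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)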
5.1.3 Spurious Regression
• The regression is spurious when we regress one random walk onto another independent random walk. It is spurious because the regression will most likely indicate a non-existing relationship (a small simulation illustrating this follows the list) :
1. The coefficient estimate will not converge toward zero (the true value). Instead, in the limit the coefficient estimate will follow a non-degenerate distribution.
2. The t value most often is significant.
3. R² is typically very high.
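A small simulation (an assumption for illustration, not part of the text) shows the effect : regressing one random walk on another, independent random walk routinely yields a large t-value and a sizeable R², even though the series are unrelated.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500

# Two independent random walks (cumulative sums of white noise)
y = np.cumsum(rng.normal(size=n))
x = np.cumsum(rng.normal(size=n))

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.tvalues[1])   # t-value of the slope: usually far from 0
print(model.rsquared)     # R^2: often surprisingly high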
• Spurious regression is linked to serially correlated errors.
• Granger and Newbold (1974) pointed out that, along with large t-values, strong evidence of serially correlated errors will appear in regression analysis, stating that when a low value of the Durbin-Watson statistic is combined with a high value of the t-statistic, the relationship is not true.
Hypothesis Test for Regression Co-efficient (t-Test)
• The regression co-efficient (β₁) captures the existence of a linear relationship between the response variable and the explanatory variable.
• If β₁ = 0, we can conclude that there is no statistically significant linear relationship between the two variables.
• Using the Analysis of Variance (ANOVA), we can test whether the overall model is statistically significant. However, for a simple linear regression, the null and alternative hypotheses in ANOVA and the t-test are exactly the same and thus there will be no difference in the p-value.
Residual analysis
© Residual (error) analysis is important to check whether the assumptions of regression
models have been satisfied. It is performed to check the following :
1. The residuals are normally distributed.
2. The variance of residual is constant (homoscedasticity).
3. The functional form of regression is correctly specified.
4. If there are any outliers.
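A hedged sketch of how these checks might be run with statsmodels and SciPy; the data are made up and the specific tests (Shapiro-Wilk, Breusch-Pagan, studentized residuals) are common choices rather than ones prescribed by the text :

import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Illustrative data (assumed): fit a simple OLS model first
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

resid = results.resid
print(stats.shapiro(resid))       # 1. normality of residuals
print(het_breuschpagan(resid, X)) # 2. constant variance (homoscedasticity)
outliers = np.abs(results.get_influence().resid_studentized_internal) > 3
print(outliers.sum())             # 5. count of potential outliers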
5.2 Regression using StatsModels
Linear regression statsmodel is the model that helps us to predict and is used for fitting up
the scenario where one parameter is directly dependent on the other parameter. Here, we
have one variable that is dependent and the other one which is independent. Depending on
the change in the value of the independent parameter, we need to predict the change in the
dependent variable.
The statsmodels library has more advanced statistical tools as compared to scikit-learn. Moreover, its regression analysis tools can give more detailed results.
• There are four available classes of the properties of the regression model that will help us to use the statsmodel linear regression. The classes are as follows :
a) Ordinary Least Squares (OLS)
b) Weighted Least Squares (WLS)
c) Generalized Least Squares (GLS)
d) GLSAR - Feasible generalized least squares along with the errors that are autocorrelated.
• The statsmodel linear regression model helps to predict or estimate the values of the dependent variables as and when there is a change in the independent quantities.
• statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.
• Statsmodels is built on top of NumPy, SciPy and matplotlib.
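A minimal sketch of fitting two of these classes, OLS and WLS, on made-up data (the data and the weights are illustrative assumptions) :

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3.0 + 1.2 * x + rng.normal(size=50)
X = sm.add_constant(x)                                 # add intercept column

ols_res = sm.OLS(y, X).fit()                           # Ordinary Least Squares
wls_res = sm.WLS(y, X, weights=1.0 / (1 + x)).fit()    # Weighted Least Squares

print(ols_res.params)      # estimated intercept and slope
print(ols_res.summary())   # detailed regression output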
5.3 Multiple Regression
• Let's see how big the difference in birth weight between first babies and others really is. The mothers of first babies are 3.59 years younger. Running the linear model again, we get the change in birth weight as a function of age.
• The slope is 0.0175 pounds per year. If we multiply the slope by the difference in ages, we get the expected difference in birth weight for first babies and others, due to mother's age.
• The result is 0.063, just about half of the observed difference. So we conclude, tentatively, that the observed difference in birth weight can be partly explained by the difference in mother's age.
• Using multiple regression, we can explore these relationships more systematically.
• The first line creates a new column named isfirst that is True for first babies and False otherwise. Then we fit a model using isfirst as an explanatory variable.
• Here are the sample results :
• Because isfirst is a boolean, ols treats it as a categorical variable, which means that the values fall into categories, like True and False, and should not be treated as numbers. The estimated parameter is the effect on birth weight when isfirst is true, so the result, −0.125 lbs, is the difference in birth weight between first babies and others.
• The slope and the intercept are statistically significant, which means that they were unlikely to occur by chance, but the R² value for this model is small, which means that isfirst doesn't account for a substantial part of the variation in birth weight.
• The results are similar with agepreg :
• Again, the parameters are statistically significant, but R² is low.
• These models confirm results we have already seen. But now we can fit a single model that includes both variables. With the formula totalwgt_lb ~ isfirst + agepreg, we get :
• In the combined model, the parameter for isfirst is smaller by about half, which means that part of the apparent effect of isfirst is actually accounted for by agepreg. And the p-value for isfirst is about 2.5 %, which is on the border of statistical significance.
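A hedged sketch of this combined model using the statsmodels formula API; the DataFrame live, its columns (totalwgt_lb, agepreg, birthord) and the rule isfirst = (birthord == 1) are assumptions that follow the birth-weight example discussed above :

import statsmodels.formula.api as smf

# 'live' is assumed to be a pandas DataFrame of live births with columns
# totalwgt_lb, agepreg and birthord (column names follow the example in the text)
live['isfirst'] = live.birthord == 1      # True for first babies, False otherwise

# Single model that includes both explanatory variables
model = smf.ols('totalwgt_lb ~ isfirst + agepreg', data=live)
results = model.fit()
print(results.summary())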
5.3.1 Nonlinear Relationship
• Remembering that the contribution of agepreg might be nonlinear, we might consider adding a variable to capture more of this relationship. One option is to create a column, agepreg2, that contains the squares of the ages, as in the sketch below :
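A minimal sketch, continuing the assumed DataFrame live from the previous example :

import statsmodels.formula.api as smf

# Add a squared-age column to capture a possible quadratic contribution of mother's age
live['agepreg2'] = live.agepreg ** 2

model = smf.ols('totalwgt_lb ~ isfirst + agepreg + agepreg2', data=live)
results = model.fit()
print(results.summary())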
• The parameter of agepreg2 is negative, so the parabola curves downward, which is consistent with the shape of the lines in Fig. 5.3.1.
Fig. 5.3.1 : Residuals of the linear fit
* The quadratic model of agepreg accounts for more of the variability in birth weight; the
parameter for isfirst is smaller in this model and no longer statistically significant.
* Using computed variables like agepreg2 is a common way to fit polynomials and other
functions to data. This process is still considered linear regression, because the dependent
variable is a linear function of the explanatory variables, regardless of whether some variables are nonlinear functions of others.
5.4 Logistic Regression
• Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous. It is a statistical method used to model dichotomous or binary outcomes using predictor variables.
• Logistic regression is one of the supervised learning algorithms.
• The binary logistic regression model is given by,

    P(Y) = e^Z / (1 + e^Z)

  where Z = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ... + βₙXₙ  (X₁, X₂, ..., Xₙ are independent variables).
• The logistic regression function is rewritten as,

    ln [ P(Y) / (1 − P(Y)) ] = Z = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ
• Multiple linear regression is an extension of linear regression, which allows a response variable, y, to be modeled as a linear function of two or more predictor variables.
• The logit function is similar to a multiple linear regression model. Such models are called Generalized Linear Models (GLM); in GLM the errors do not follow a normal distribution and there exists a transformation function of the outcome variable that takes a linear function.
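As an illustration of the model P(Y) = e^Z / (1 + e^Z), a minimal sketch that fits a binary logistic regression with statsmodels on made-up data (the data and coefficients are assumptions) :

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))                  # two independent variables X1, X2
z = -0.5 + 1.5 * X[:, 0] - 2.0 * X[:, 1]       # Z = b0 + b1*X1 + b2*X2
p = 1.0 / (1.0 + np.exp(-z))                   # P(Y) = e^Z / (1 + e^Z)
y = rng.binomial(1, p)                         # binary outcome

logit_model = sm.Logit(y, sm.add_constant(X))
result = logit_model.fit()                     # maximum likelihood estimation
print(result.params)                           # estimated coefficients (log odds)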
5.4.1 Estimation of Parameters in Logistic Regression
© Parameter estimates (also called coefficients) are the log odds ratio associated with a one-
unit change of the predictor, all other predictors being held constant. For each term
involving a categorical variable, a number of dummy predictor variables are created to
predict the effect of each different level.
Regression parameters in the case of logistic regression are estimated using Maximum
Likelihood Estimator (MLE). In binary logistic regression, the response variable Y takes
only two values (Y = 0 and 1).
© The unknown model parameters are estimated using maximum-likelihood estimation.
© A coefficient describes the size of the contribution of that predictor; a large coefficient
indicates that the variable strongly influences the probability of that outcome, while a near-
zero coefficient indicates that the variable has little influence on the probability of that
outcome.
• A positive sign indicates that the explanatory variable increases the probability of the
outcome, while a negative sign indicates that the variable decreases the probability of that
outeome. A confidence interval for each parameter shows the uncertainty in the estimate.
    P(Y = 1 | Z) = π(Z),  where Z = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ  and

    π(Z) = e^Z / (1 + e^Z)

• The probability function of binary logistic regression for a specific observation Yᵢ is given by

    P(Yᵢ) = π(Zᵢ)^Yᵢ [1 − π(Zᵢ)]^(1 − Yᵢ)

• The log-likelihood function is given by,

    ln(L) = LL = Σᵢ [ Yᵢ ln π(Zᵢ) + (1 − Yᵢ) ln(1 − π(Zᵢ)) ]
5.4.2 Logistic Regression Model Diagnostics
Regression models for categorical outcomes should be evaluated for fit and adherence to
model assumptions. There are two main elements of such an assessment : Discrimination
and calibration.
• Discrimination measures the ability of the model to correctly classify observations into outcome categories. Calibration measures how well the model-estimated probabilities agree with the observed outcomes, and it is typically evaluated via a goodness-of-fit test.
• The (binary) logistic regression model describes the relationship between a binary outcome variable and one or more predictor variables.
Here we discuss four test models :
1, Omnibus test : The omnibus test is a likelihood-ratio chi-square test of the current
model versus the null model. The significance value of less than 0.05 indicates that the
current model outperforms the null model. Omnibus tests are generic statistical tests
used for checking whether the variance explained by the model is more than the
unexplained variance,
2. Wald's test : Wald's test is used for checking whether an individual explanatory variable is statistically significant. Wald's test is a chi-square test. A Wald test calculates a Z statistic, which is :

    Z = β̂ / SE(β̂)
   This value is squared, which yields a chi-square distribution, and is used as the Wald test statistic.
3. Hosmer-Lemeshow test : It is a chi-square goodness of fit test for binary logistic regression.
4. Pseudo R² : Pseudo R² is a measure of goodness of the model. It is called pseudo R² because it does not have the same interpretation of R² as in the MLR model.
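Continuing the hypothetical fitted Logit result from the earlier sketch, some of these diagnostics can be read directly from statsmodels : the likelihood-ratio (omnibus) test, the Wald z statistics and McFadden's pseudo R² :

# 'result' is the fitted statsmodels Logit result from the earlier sketch
print(result.llr, result.llr_pvalue)   # likelihood-ratio (omnibus) test vs. the null model
print(result.params / result.bse)      # Wald z statistics, Z = beta / SE(beta)
print(result.tvalues)                  # the same Wald z values, as reported by statsmodels
print(result.prsquared)                # McFadden's pseudo R-squared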
5.4.3 Variable Selection in Logistic Regression
• Variable selection is an important consideration when creating logistic regression models. Variables must be selected carefully so that the model makes accurate predictions, but without over-fitting the data.
Forward LR (Likelihood Ratio) :
• In forward LR, at each step one variable is added to the model. The following steps are used in building a logistic regression model using the forward LR selection method :
1. Start with no variables in the model.
2. For each independent variable, calculate the difference between the −2LL value of the current model and the −2LL value of the model after adding that variable, and add the variable that gives the largest significant improvement.
3. Repeat step 2 till all the variables are exhausted or the change in −2LL is not significant, that is, the p-value after adding a new variable is greater than 0.05.
• Forward selection (Wald) : Stepwise selection method with entry testing based on the significance of the score statistic and removal testing based on the probability of the Wald statistic.
• Method selection allows you to specify how independent variables are entered into the analysis. Using different methods, you can construct a variety of regression models from the same set of variables.
• Enter : A procedure for variable selection in which all variables in a block are entered in a single step.
• Forward selection (Conditional) : Stepwise selection method with entry testing based on the significance of the score statistic and removal testing based on the probability of a likelihood-ratio statistic based on conditional parameter estimates.
• Forward selection (Likelihood Ratio) : Stepwise selection method with entry testing based on the significance of the score statistic and removal testing based on the probability of a likelihood-ratio statistic based on the maximum partial likelihood estimates.
« Forward selection (Wald) : Stepwise selection method with entry testing based on the
significance of the score statistic and removal testing based on the probability of the Wald
statistic.
• Backward elimination (Conditional) : Backward stepwise selection. Removal testing is based on the probability of the likelihood-ratio statistic based on conditional parameter estimates.
• Backward elimination (Likelihood Ratio) : Backward stepwise selection. Removal testing is based on the probability of the likelihood-ratio statistic based on the maximum partial likelihood estimates.
• Backward elimination (Wald) : Backward stepwise selection. Removal testing is based on the probability of the Wald statistic.
5.5 Time Series Analysis
• First of all, we will create a scatter plot of dates and values in Matplotlib using plt.plot_date(). We will be using Python's built-in datetime module (datetime, timedelta) for parsing the dates. So, let us create a Python file called 'plot_time_series.py', sketched below.
Output : A scatter plot of the values plotted against their dates.
5.5.1 Missing Values
Data can have missing values for a number of reasons such as observations that were not
recorded and data corruption. Handling missing data is important as many machine learning
algorithms do not support data with missing values.
• We can load the dataset as a Pandas DataFrame and print summary statistics on each column.
• In Python, specifically Pandas, NumPy and Scikit-Learn, we mark missing values as NaN. Values with a NaN value are ignored from operations like sum, count, etc.
• Use the isnull() method to detect the missing values. Pandas DataFrame provides a function isnull(); it creates a new dataframe of the same size as the calling dataframe, containing only True and False values, with True at the places where the original dataframe has NaN and False at other places.
Encoding missingness :
• The fillna() function is used to fill NA/NaN values using the specified method.
Syntax :
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Where,
1. value : It is a value that is used to fill the null values.
2. method : A method that is used to fill the null values.
3. axis : It takes an int or string value for rows/columns.
4. inplace : If it is true, it fills values at an empty place.
5. limit : It is an integer value that specifies the maximum number of consecutive forward/backward NaN value fills.
6. downcast : It takes a dict that specifies what to downcast, like float64 to int64.
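A short sketch of isnull() and fillna() in use; the DataFrame is a made-up example :

import numpy as np
import pandas as pd

df = pd.DataFrame({'temp': [21.0, np.nan, 23.5, np.nan, 25.0]})

print(df.isnull())           # True where values are NaN
print(df.isnull().sum())     # count of missing values per column

filled_const = df.fillna(value=0)            # replace NaN with a constant
filled_ffill = df.fillna(method='ffill')     # forward-fill from the previous value
print(filled_const)
print(filled_ffill)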
5.5.2 Serial Correlation
• Serial correlation is the relationship between a given variable and a lagged version of itself over various time intervals. It measures the relationship between a variable's current value given its past values.
¢ A variable that is serially correlated indicates that it may not be random. Technical analysts
validate the profitable patterns of a security or group of securities and determine the risk
associated with investment opportunities.
• The most common form of serial correlation is called first-order serial correlation, in which the error in time t is related to the previous period's (t − 1) error :

    εₜ = ρ εₜ₋₁ + uₜ ,   −1 < ρ < 1

• ρ > 0 indicates positive serial correlation - the error terms will tend to have the same sign from one period to the next period.
• ρ < 0 indicates negative serial correlation - the error terms will tend to have a different sign from one period to the next period.
Impure serial correlation
• This type of serial correlation is caused by a specification error such as an omitted variable or ignoring nonlinearities. Suppose the true regression equation is given by,

    Yₜ = β₀ + β₁X₁ₜ + β₂X₂ₜ + εₜ

  but X₂ₜ is omitted from the model. The error term εₜ will then capture the effect of X₂ₜ.
• Since many economic variables exhibit trends over time, X₂ₜ is likely to depend on X₂,ₜ₋₁, X₂,ₜ₋₂, ... This will translate into a seeming correlation between εₜ and εₜ₋₁, εₜ₋₂, ..., and this serial correlation would violate the no-serial-correlation assumption.
• A specification error of the functional form can also cause this type of serial correlation. Suppose the true regression equation between Y and X is quadratic but we assume it is linear. The error term will then depend on X².
The consequences of serial correlation :
1. Pure serial correlation does not cause bias in the regression coefficient estimates.
2. Serial correlation causes OLS to no longer be a minimum variance estimator.
3. Serial correlation causes the estimated variances of the regression coefficients to be
biased, leading to unreliable hypothesis testing.
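First-order serial correlation in the residuals of a fitted model is commonly checked with the Durbin-Watson statistic mentioned earlier; a hedged sketch using statsmodels (values near 2 suggest no serial correlation, values well below 2 suggest positive serial correlation) :

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = np.arange(100, dtype=float)

# Illustrative errors with first-order serial correlation: e_t = rho * e_{t-1} + u_t
rho, e = 0.8, np.zeros(100)
for t in range(1, 100):
    e[t] = rho * e[t - 1] + rng.normal()

y = 1.0 + 0.5 * x + e
results = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(results.resid))   # well below 2, indicating positive serial correlation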
5.5.3 Autocorrelation
• Autocorrelation refers to the degree of correlation of the same variable between two successive time intervals. It measures how the lagged version of the value of a variable is related to the original version of it in a time series.
• The value of autocorrelation varies between +1 and −1. If the autocorrelation of a series is a very small value, that does not mean there is no correlation; the correlation could be non-linear. A value between −1 and 0 represents negative autocorrelation. A value between 0 and 1 represents positive autocorrelation.
• Autocorrelation gives information about the trend of a set of historical data, so it can be useful in the technical analysis for the equity market.
« Fig. 5.5.1 shows positive and negative autocorrelation.
(a) Positive autocorrelation (b) Negative autocorrelation
Fig. 5.5.1
A technical analyst can learn how the stock price of a particular day is affected by those of
previous days through autocorrelation. Thus, he/she can estimate how the price will move
in the future.
• If the price of a stock with strong positive autocorrelation has been increasing for several days, the analyst can reasonably estimate that the price will continue to move upward in the coming days. The analyst may buy and hold the stock for a short period of time to profit from the upward price movement.
«The autocorrelation analysis only provides information about short-term trends and tells
little about the fundamentals of a company. Therefore, it can only be applied to support the
trades with short holding periods.
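A short sketch of computing lagged autocorrelations of a series with pandas' autocorr() and the statsmodels acf() function; the price series is made up :

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(4)
prices = pd.Series(100 + np.cumsum(rng.normal(size=250)))   # made-up daily prices

print(prices.autocorr(lag=1))    # autocorrelation at lag 1 (between -1 and +1)
print(acf(prices, nlags=5))      # autocorrelations at lags 0..5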
5.6 Introduction to Survival
© Survival analysis is used to analyze data in which the time until the event is of interest. The
response is often referred to as a failure time, survival time or event time.
© Originally, this branch of statistics developed around measuring the effects of medical
treatment on patient’s survival in clinical trials. For example, imagine a group of cancer
patients who are administered a certain new form of treatment. Survival analysis can be
used for analyzing the results of that treatment in terms of the patients’ life expectancy.
Censoring :
• Censoring is present when we have some information about a subject's event time, but we don't know the exact event time. For the analysis methods we will discuss to be valid, the censoring mechanism must be independent of the survival mechanism.
• There are generally three reasons why censoring might occur :
a. A subject does not experience the event before the study ends.
b. A person is lost to follow-up during the study period.
c. A person withdraws from the study.
• These are all examples of right-censoring.
• Types of right-censoring :
1. Fixed type I censoring occurs when a study is designed to end after C years of follow-
up. In this case, everyone who does not have an event observed during the course of the
study is censored at C years.
2. In random type I censoring, the study is designed to end after C years, but censored
subjects do not all have the same censoring time. This is the main type of right-
censoring we will be concerned with.
3. In type II censoring, a study ends when there is a pre-specified number of events.
• The survival function is a function of time (t) and can be represented as :

    S(t) = Pr(T > t)
where Pr() stands for the probability and T for the time of the event of interest for a random
observation from the sample. We can interpret the survival function as the probability of the
event of interest (for example, the death event) not occurring by the time t.
* The survival function takes values in the range between 0 and 1 (inclusive) and is a non-
increasing function of t.
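As an illustration of S(t) = Pr(T > t), a minimal sketch estimating an empirical survival function from a set of fully observed (uncensored) event times; properly handling censored observations needs methods such as Kaplan-Meier, which this sketch does not cover :

import numpy as np

# Made-up event times (e.g., months until the event), all fully observed
event_times = np.array([2, 3, 3, 5, 8, 8, 9, 12, 15, 20])

def survival(t, times=event_times):
    """Empirical S(t) = Pr(T > t): fraction of observations surviving beyond time t."""
    return np.mean(times > t)

for t in [0, 5, 10, 20]:
    print(t, survival(t))   # S(t) is non-increasing and lies between 0 and 1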
5.7 Two Marks Questions with Answers
Q.1 What is logistic regression ?
Ans. : Logistic regression is a form of regression analysis in which the outcome variable is
binary or dichotomous. A statistical method used to model dichotomous or binary outcomes
using predictor variables. Logistic regression is one of the supervised learning algorithms.
Q.2 What is an omnibus test ?
Ans. : The omnibus test is a likelihood-ratio chi-square test of the current model versus the null
model. The significance value of less than 0.05 indicates that the current model outperforms
the null model. Omnibus tests are generic statistical tests used for checking whether the
variance explained by the model is more than the unexplained variance.
Q.3 Define serial correlation.
Ans. : Serial correlation is the relationship between a given variable and a lagged version of itself over various time intervals. It measures the relationship between a variable's current value given its past values.
Q.4 What are the consequences of serial correlation ?
Ans. : 1. Pure serial correlation does not cause bias in the regression coefficient estimates.
2. Serial correlation causes OLS to no longer be a minimum variance estimator.
3. Serial correlation causes the estimated variances of the regression coefficients to be biased, leading to unreliable hypothesis testing.
Q.5 Define autocorrelation.
Ans. : Autocorrelation refers to the degree of correlation of the same variable between two
successive time intervals. It measures how the lagged version of the value of a variable is
related to the original version of it in a time series.
Q.6 What are the reasons for censoring ?
Ans. : There are generally three reasons why censoring might occur :
a. A subject does not experience the event before the study ends.
b. A person is lost to follow-up during the study period.
c. A person withdraws from the study.
Q.7 Explain regression using statsmodels.
Ans. : Linear regression statsmodel is the model that helps us to predict and is used for fitting
up the scenario where one parameter is directly dependent on the other parameter. Here, we
have one variable that is dependent and the other one which is independent. Depending on the
change in the value of the independent parameter, we need to predict the change in the
dependent variable.
Q.8 Why is residual analysis important ?
Ans. : Residual (error) analysis is important to check whether the assumptions of regression
models have been satisfied. It is performed to check the following :
1. The residuals are normally distributed.
2. The variance of residual is constant (homoscedasticity).
3. The functional form of regression is correctly specified.
4. If there are any outliers.
Q.9 What is spurious regression ?
Ans. : The regression is spurious when we regress one random walk onto another independent random walk. It is spurious because the regression will most likely indicate a non-existing relationship.