Regression Problem

Uploaded by

Rajesh Mahapatra
10/17/22, 11:08 PM — ML_prac_15-10-22

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pandas_profiling import ProfileReport
from scipy.stats import chi2_contingency

%matplotlib inline
```

```python
from sklearn.datasets import load_boston

boston = load_boston()
```

This raises a FutureWarning: `load_boston` is deprecated in scikit-learn 1.0 and will be removed in 1.2. The Boston housing prices dataset has an ethical problem, and the scikit-learn maintainers strongly discourage its use unless the purpose of the code is to study and educate about ethical issues in data science and machine learning. In that special case, the dataset can be fetched from the original source:

```python
import pandas as pd
import numpy as np

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
```

Alternative datasets include the California housing dataset and the Ames housing dataset, loaded as follows:

```python
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
```

for the California housing dataset, and

```python
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True)
```

for the Ames housing dataset.

```python
print(boston)
```

[Output: a dictionary with 'data' (a 506 x 13 array of feature values), 'target' (an array of 506 median home prices), and 'feature_names'.]
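The replacement loader shown in the deprecation warning rebuilds each record from two interleaved physical rows: even rows carry the leading feature columns, odd rows carry the trailing ones plus the target. A tiny synthetic demo of the same `::2` slicing (toy values, not the real dataset):

```python
import numpy as np

# Toy stand-in for raw_df.values: 8 physical rows = 4 records of 2 rows each
raw = np.arange(24).reshape(8, 3)

# Even rows give the leading feature columns, odd rows give the trailing ones
data = np.hstack([raw[::2, :], raw[1::2, :2]])
# The target sits in the last column of the odd rows
target = raw[1::2, 2]

print(data.shape)         # (4, 5)
print(target.tolist())    # [5, 11, 17, 23]
```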
The feature names are 'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'. The dataset description includes the following attribute information:

- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
- LSTAT: % lower status of the population
- MEDV: median value of owner-occupied homes in $1000's

Missing Attribute Values: None

Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of the UCI ML housing dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/housing/), taken from the StatLib library maintained at Carnegie Mellon University. The Boston house-price data is from Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978, and was used in Belsley, Kuh & Welsch, 'Regression diagnostics', Wiley, 1980 (N.B. various transformations are used in the table on pages 244-261 of the latter). The Boston house-price data has been used in many machine learning papers that address regression problems.

References:

- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

```python
print(boston.target)
```

[Output: the 506 target values (median home prices in $1000's).]

```python
print(boston.data)
```

[Output: the 506 x 13 feature matrix.]

Let's prepare the data set:

```python
df = pd.DataFrame(boston.data, columns=boston.feature_names)
```
```python
print(boston.feature_names)
```

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']

```python
df.head()
```

```
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33
```

```python
df['price'] = boston.target
df.head()
```

[Output: the same five rows, now with an extra 'price' column.]

```python
df.info()  # first we need to check whether there is a categorical feature or not
```

[Output: RangeIndex of 506 entries (0 to 505); 14 columns, each with 506 non-null float64 values; memory usage 55.5 KB.]

```python
df.describe()
```

[Output: count, mean, std, min, quartiles, and max for each of the 14 columns.]
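`df.info()` is used above to check for categorical features by eye; the same check can be done programmatically with `select_dtypes`. A minimal sketch on a small hypothetical frame (the `TOWN` column is made up for illustration):

```python
import pandas as pd

# Hypothetical frame mixing numeric columns with one string column
df_demo = pd.DataFrame({
    "CRIM": [0.00632, 0.02731],
    "RM": [6.575, 6.421],
    "TOWN": ["A", "B"],   # made-up categorical column
})

# Columns pandas did not parse as numeric would need encoding before modelling
categorical_cols = df_demo.select_dtypes(exclude="number").columns.tolist()
print(categorical_cols)  # ['TOWN']
```

For the Boston frame all 14 columns come back as float64, so this list would be empty.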
```python
df.isnull().sum()
```

[Output: a count of 0 for every column.]

There are no missing values.

EDA

```python
df.corr()
```

[Output: the 14 x 14 correlation matrix of all features and price.]

```python
sns.pairplot(df)
```

[Pairplot grid of all feature pairs.]

[Correlation heatmap of the features, colour scale from -0.4 to 1.0.]

```python
sns.scatterplot(df['CRIM'], df['price'])
```

This raises a FutureWarning telling us to pass the variables as keyword args: x, y.
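Beyond eyeballing `df.corr()`, features can be ranked by the strength of their correlation with the target. A sketch on synthetic data (column names borrowed from the housing frame, values invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
demo = pd.DataFrame({
    "LSTAT": x1,                                        # strongly tied to price below
    "RM": -0.5 * x1 + rng.normal(scale=0.5, size=200),  # weaker relationship
    "price": -2.0 * x1 + rng.normal(scale=0.3, size=200),
})

# Absolute correlation with the target, strongest first
ranking = demo.corr()["price"].drop("price").abs().sort_values(ascending=False)
print(ranking.index[0])  # LSTAT
```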
From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.

[Scatter plot of CRIM against price.]

Independent and dependent Features

```python
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
```

```python
x.head()
```

[Output: the first five rows of the 13 feature columns, without the price column.]

```python
y  # the dependent data
```

[Output: the price series, 506 values, dtype float64.]

```python
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=10)
```

```python
x_train
```

[Output: 339 rows x 13 columns.]

```python
y_train
```

[Output: the matching 339 price values.]
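With `test_size=0.33`, scikit-learn rounds the test split up, which is why the notebook ends up with 339 training rows and 167 test rows out of 506. A quick check with dummy arrays of the same shape:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data with the Boston frame's shape: 506 rows, 13 features
X = np.zeros((506, 13))
y = np.zeros(506)

x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=10)
print(x_tr.shape, x_te.shape)  # (339, 13) (167, 13)
```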
```python
x_test
```

[Output: 167 rows x 13 columns.]

```python
y_test
```

[Output: the matching 167 price values.]

Algorithms

Standardize the dataset

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
```

```python
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)  # transform only: reuse the training mean/std
```

Model Training

Linear Regression

```python
from sklearn.linear_model import LinearRegression

regression = LinearRegression()
regression.fit(x_train, y_train)
```

```python
# Print the coefficients
print(regression.coef_)
```

```
[-1.29099218  1.60949999 -0.14031574  0.37201867 -1.76205329  2.22752218
  0.32268871 -3.31184248  2.70288107 -2.09005699 -1.7609799   1.25191514
 -3.83392028]
```
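One subtlety with standardization: the scaler should learn its mean and standard deviation from the training split only, and the test split should be transformed with those same statistics; fitting a second time on the test data leaks information and makes the two splits inconsistent. A sketch on toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
x_tr = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
x_te = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

scaler = StandardScaler()
x_tr_s = scaler.fit_transform(x_tr)  # learn mean/std from the training data
x_te_s = scaler.transform(x_te)      # reuse the SAME statistics on the test data

# The training split is exactly centred and unit-scaled; the test split is
# only approximately so, which is expected
print(np.allclose(x_tr_s.mean(axis=0), 0.0))  # True
```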
```python
# Print the intercept
print(regression.intercept_)
```

22.077286135693214

```python
# Prediction for the test data
reg_pred = regression.predict(x_test)
reg_pred
```

[Output: an array of 167 predicted prices.]
```python
plt.scatter(y_test, reg_pred)
plt.xlabel("Test truth data")
plt.ylabel("Test predicted data")
```

[Scatter plot of test truth data against test predicted data.]

```python
residuel = y_test - reg_pred
residuel
```

[Output: 167 residuals indexed like y_test, dtype float64.]

```python
sns.displot(residuel, kind='kde')
```

[KDE plot of the residual density.]

```python
# Scatter plot of prediction against residual;
# it should look like a structure-free, roughly uniform cloud
sns.scatterplot(x=reg_pred, y=residuel)
```
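The residual plots above are checking two things: the residuals should be roughly centred on zero, and they should show no structure against the predictions. For ordinary least squares with an intercept, the training residuals have mean exactly zero by construction; a small sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
residuel = y - model.predict(X)  # same variable name as the notebook

print(abs(residuel.mean()) < 1e-8)  # True: OLS training residuals sum to zero
```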
[Residual scatter plot.]

Performance Metrics

```python
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

print(mean_absolute_error(y_test, reg_pred))
print(np.sqrt(mean_squared_error(y_test, reg_pred)))
```

3.545903243244831
5.485719266731377

Ridge Regression

Different cases for tuning values of lambda:

- If lambda is set to 0, Ridge Regression equals Linear Regression.
- If lambda is set to infinity, all weights are shrunk to zero.

```python
from sklearn.linear_model import Ridge

regression_ridge = Ridge()  # by default alpha (i.e. lambda) is 1.0
regression_ridge.fit(x_train, y_train)
```

```python
# Print the coefficients
print(regression_ridge.coef_)
```

```
[-1.27565151  1.581946    …          0.37673024 -1.72386872  2.24434183
  0.30956702 -3.26398836  2.60274628 -1.99924081 -1.75198618  1.25002916
 -3.81456087]
```

```python
# Print the intercept
print(regression_ridge.intercept_)
```

22.077286135693214

```python
# Prediction for the test data
reg_pred_ridge = regression_ridge.predict(x_test)
reg_pred_ridge
```

[Output: an array of 167 ridge predictions.]
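Two quick sanity checks on the ideas in this section, using synthetic data: RMSE is never smaller than MAE for the same predictions, and growing the ridge penalty (alpha in scikit-learn, lambda in the notes) shrinks the coefficient vector, consistent with the bullet points above:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, -1.0, 0.5, 3.0, -2.5]) + rng.normal(scale=0.5, size=150)

pred = Ridge(alpha=1.0).fit(X, y).predict(X)
mae = mean_absolute_error(y, pred)
rmse = np.sqrt(mean_squared_error(y, pred))
print(mae <= rmse)  # True: RMSE upweights large errors, so it is always >= MAE

# Coefficient norm shrinks monotonically as the penalty grows
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in (0.01, 10.0, 1000.0)]
print(norms[0] > norms[1] > norms[2])  # True
```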
```python
plt.scatter(y_test, reg_pred_ridge)
plt.xlabel("Test truth data")
plt.ylabel("Test predicted data")
```

[Scatter plot of test truth data against ridge-predicted data.]
```python
residuel_ridge = y_test - reg_pred_ridge
residuel_ridge
```

[Output: 167 ridge residuals indexed like y_test, dtype float64.]

```python
sns.displot(residuel_ridge, kind='kde')
```

[KDE plot of the ridge residual density.]

```python
# Scatter plot of prediction against residual;
# it should look like a structure-free, roughly uniform cloud
sns.scatterplot(x=reg_pred_ridge, y=residuel_ridge)
```
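The earlier bullet claims that ridge with lambda = 0 reduces to ordinary linear regression. A small check on synthetic data (alpha is kept tiny rather than exactly 0, since scikit-learn discourages alpha=0 with some solvers):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=100)

ols = LinearRegression().fit(X, y)
almost_ols = Ridge(alpha=1e-8).fit(X, y)  # effectively no penalty

print(np.allclose(ols.coef_, almost_ols.coef_, atol=1e-5))  # True
```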