Name: Deepak Yadav
Assignment 1: Linear and Logistic Regression

SET A

Create a 'sales' data set having 5 columns namely: ID, TV, Radio, Newspaper and Sales (500 random entries). Build a linear regression model by identifying the independent and target variables. Split the variables into training and testing sets in a 7:3 ratio, respectively, and print them. Build a simple linear regression model.

import pandas as pd
import numpy as np

d = pd.read_csv("Company_data.csv")
print(d)

        TV  Radio  Newspaper  Sales
0    230.1   37.8       69.2   22.1
1     44.5   39.3       45.1   10.4
2     17.2   45.9       69.3   12.6
3    151.5   41.3       58.5   16.5
4    180.8   10.8       58.4   17.9
..     ...    ...        ...    ...
198  283.6   42.6       66.2   25.5
199  232.1    8.6        8.7   18.4

[200 rows x 4 columns]

d.shape
d.info()
d.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB

                TV       Radio   Newspaper       Sales
count   200.000000  200.000000  200.000000  200.000000
mean    147.042500   23.264000   30.554000   15.130500
std      85.854236   14.846809   21.778621    5.283892
min       0.700000    0.000000    0.300000    1.600000
25%      74.375000    9.975000   12.750000   11.000000
50%     149.750000   22.900000   25.750000   16.000000
75%     218.825000   36.525000   45.100000   19.050000
max     296.400000   49.600000  114.000000   27.000000

import matplotlib.pyplot as plt
import seaborn as sns

# Using pairplot we'll visualize the data for correlation
sns.pairplot(d, x_vars=['TV', 'Radio', 'Newspaper'], y_vars='Sales',
             height=4, aspect=1, kind='scatter')   # 'size' was renamed to 'height' in recent seaborn
plt.show()

[Pairplot: scatter of Sales against TV, Radio and Newspaper]

sns.heatmap(d.corr(), cmap="YlGnBu", annot=True)
plt.show()

[Annotated heatmap of pairwise correlations between TV, Radio, Newspaper and Sales]

Simple Linear Regression

# Creating X and y
X = d['TV']
y = d['Sales']

# Splitting the variables into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3)

# Take a look at the train data set
X_train
y_train

[140 training rows of TV and Sales shown]
Name: Sales, Length: 140, dtype: float64

# Importing the statsmodels.api library from the statsmodels package
import statsmodels.api as sm

# Adding a constant to get an intercept
X_train_sm = sm.add_constant(X_train)

# Fitting the regression line using OLS
lr = sm.OLS(y_train, X_train_sm).fit()

# Printing the parameters
lr.params

const    6.948683
TV       0.054546
dtype: float64

lr.summary()

OLS Regression Results
Dep. Variable:     Sales            R-squared:           0.816
Model:             OLS              Adj. R-squared:      0.814
Method:            Least Squares    F-statistic:         611.2
Date:              Sat, 07 May 2022 Prob (F-statistic):  1.52e-52
Time:              16:36:20         Log-Likelihood:      -321.12
No. Observations:  140              AIC:                 646.2
Df Residuals:      138              BIC:                 652.1
Df Model:          1
Covariance Type:   nonrobust

          coef   std err       t    P>|t|   [0.025   0.975]
const   6.9487    0.385   18.068   0.000    6.188    7.708
TV      0.0545    0.002   24.722   0.000    0.050    0.059

Omnibus:        0.027   Durbin-Watson:     2.196
Prob(Omnibus):  0.987   Jarque-Bera (JB):  0.150
Skew:          -0.006   Prob(JB):          0.928
Kurtosis:       2.840   Cond. No.          328

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

So the fitted line is Sales ≈ 6.9487 + 0.0545 · TV.
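The statement asks for a randomly generated 500-row 'sales' table, whereas the notebook loads Company_data.csv instead. A minimal sketch of generating such a set (the column names come from the statement; the file name, seed, value ranges and noise model are assumptions):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
sales = pd.DataFrame({
    "ID": np.arange(1, n + 1),
    "TV": rng.uniform(0, 300, n).round(1),        # assumed ad-spend ranges
    "Radio": rng.uniform(0, 50, n).round(1),
    "Newspaper": rng.uniform(0, 120, n).round(1),
})
# Assumed linear response to TV plus noise, so a linear fit is recoverable
sales["Sales"] = (7 + 0.05 * sales["TV"] + rng.normal(0, 2, n)).round(1)
sales.to_csv("sales.csv", index=False)

The notebook also splits off X_test and y_test but never scores the fitted line. One way to evaluate the statsmodels model above on the held-out set:

X_test_sm = sm.add_constant(X_test)
y_pred = lr.predict(X_test_sm)

from sklearn.metrics import mean_squared_error, r2_score
print("R^2 :", r2_score(y_test, y_pred))
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)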
Create a 'realestate' data set having 4 columns namely: ID, flat, houses and purchases (500 random entries). Build a linear regression model by identifying the independent and target variables. Split the variables into training and testing sets and print them. Build a simple linear regression model for predicting purchases.

import pandas as pd
import numpy as np
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import rc
import unittest
%matplotlib inline

sns.set(style='whitegrid', palette='muted', font_scale=1.5)
rcParams['figure.figsize'] = 14, 8

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

def run_tests():
    unittest.main(argv=[''], verbosity=1, exit=False)

train = pd.read_csv('train.csv')
print(train)

      Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  ...  SaleType  SaleCondition  SalePrice
0      1          60       RL         65.0     8450   Pave   NaN      Reg  ...        WD         Normal     208500
1      2          20       RL         80.0     9600   Pave   NaN      Reg  ...        WD         Normal     181500
...
1459  1460         20      RL         75.0     9937   Pave   NaN      Reg  ...        WD         Normal     147500

[1460 rows x 81 columns]

train['SalePrice'].describe()

count      1460.000000
mean     180921.195890
std       79442.502883
min       34900.000000
25%      129975.000000
50%      163000.000000
75%      214000.000000
max      755000.000000
Name: SalePrice, dtype: float64

sns.histplot(train['SalePrice'], kde=True)   # distplot is deprecated in recent seaborn

[Histogram with density curve of SalePrice]

var = 'GrLivArea'
data = pd.concat([train['SalePrice'], train[var]], axis=1)
data.plot.scatter(x=var, y='SalePrice', ylim=(0, 800000))

[Scatter plot: SalePrice vs GrLivArea]
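As in the first question, the statement asks for a generated 'realestate' table, while the solution instead explores the Kaggle House Prices train.csv. A sketch using the statement's column names; the seed, value ranges and the purchase rule are assumptions:

rng = np.random.default_rng(0)
n = 500
realestate = pd.DataFrame({
    "ID": np.arange(1, n + 1),
    "flat": rng.integers(1, 50, n),      # assumed counts of flats listed
    "houses": rng.integers(1, 50, n),    # assumed counts of houses listed
})
# Assume purchases grow roughly linearly with both counts, plus noise
realestate["purchases"] = (
    0.4 * realestate["flat"] + 0.6 * realestate["houses"] + rng.normal(0, 3, n)
).round().clip(lower=0)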
var = 'TotalBsmtSF'
data = pd.concat([train['SalePrice'], train[var]], axis=1)
data.plot.scatter(x=var, y='SalePrice', ylim=(0, 800000))

[Scatter plot: SalePrice vs TotalBsmtSF]

var = 'OverallQual'
data = pd.concat([train['SalePrice'], train[var]], axis=1)
f, ax = plt.subplots(figsize=(14, 8))
fig = sns.boxplot(x=var, y="SalePrice", data=data)
fig.axis(ymin=0, ymax=800000)

[Box plots of SalePrice by OverallQual, 1 to 10]

corrmat = train.corr()
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8, square=True)

[Heatmap of the full numeric correlation matrix, Id through SalePrice]

cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars']
sns.pairplot(train[cols], height=4)   # 'size' was renamed to 'height' in recent seaborn

[Pairplot of SalePrice, OverallQual, GrLivArea and GarageCars]

x = train['GrLivArea']
y = train['SalePrice']

x = (x - x.mean()) / x.std()
x = np.c_[np.ones(x.shape[0]), x]

x.shape

(1460, 2)

def loss(h, y):
    sq_error = (h - y) ** 2
    n = len(y)
    return 1.0 / (2 * n) * sq_error.sum()

class TestLoss(unittest.TestCase):

    def test_zero_h_zero_y(self):
        self.assertAlmostEqual(loss(h=np.array([0]), y=np.array([0])), 0)

    def test_one_h_zero_y(self):
        self.assertAlmostEqual(loss(h=np.array([1]), y=np.array([0])), 0.5)

    def test_two_h_zero_y(self):
        self.assertAlmostEqual(loss(h=np.array([2]), y=np.array([0])), 2)

    def test_zero_h_one_y(self):
        self.assertAlmostEqual(loss(h=np.array([0]), y=np.array([1])), 0.5)

    def test_zero_h_two_y(self):
        self.assertAlmostEqual(loss(h=np.array([0]), y=np.array([2])), 2)

run_tests()

Ran 5 tests in 0.008s
OK

class LinearRegression:

    def predict(self, X):
        return np.dot(X, self._W)

    def _gradient_descent_step(self, X, targets, lr):
        predictions = self.predict(X)
        error = predictions - targets
        gradient = np.dot(X.T, error) / len(X)
        self._W -= lr * gradient

    def fit(self, X, y, n_iter=100000, lr=0.01):
        self._W = np.zeros(X.shape[1])
        self._cost_history = []
        self._w_history = [self._W]
        for i in range(n_iter):
            prediction = self.predict(X)
            cost = loss(prediction, y)
            self._cost_history.append(cost)
            self._gradient_descent_step(X, y, lr)
            self._w_history.append(self._W.copy())
        return self

class TestLinearRegression(unittest.TestCase):

    def test_find_coefficients(self):
        clf = LinearRegression()
        clf.fit(x, y, n_iter=2000, lr=0.01)
        np.testing.assert_array_almost_equal(
            clf._W, np.array([180921.19555322, 56294.9199925]))

run_tests()

Ran 6 tests in 1.661s
OK

clf = LinearRegression()
clf.fit(x, y, n_iter=2000, lr=0.01)

<__main__.LinearRegression at 0x219d3211c0>

clf._W

array([180921.19555322, 56294.9199925])
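Each _gradient_descent_step applies the MSE gradient (1/n) · Xᵀ(XW − y) to the weights. As a cross-check (an addition, not part of the original notebook), the closed-form least-squares solution on the same design matrix should match what gradient descent converges to:

# Closed-form least squares on the same [1, standardised GrLivArea] design matrix
W_exact, *_ = np.linalg.lstsq(x, y, rcond=None)
print(W_exact)   # expected: approximately [180921.196, 56294.920]

Note that the first coefficient is, up to convergence error, the mean of SalePrice, as it must be when the single feature is standardised to zero mean.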
plt.title('Cost Function J')
plt.xlabel('No. of iterations')
plt.ylabel('Cost')
plt.plot(clf._cost_history)
plt.show()

[Line plot: cost J falling from about 2.0e10 towards its minimum over 2000 iterations]

clf._cost_history[-1]

1569921604.833264

fig = plt.figure()
ax = plt.axes()
plt.title('Sale Price vs Living Area')
plt.xlabel('Living Area in square feet (normalised)')
plt.ylabel('Sale Price ($)')
plt.scatter(x[:, 1], y)
line, = ax.plot([], [], lw=2, color='red')
annotation = ax.text(-1, 700000, '')
annotation.set_animated(True)
plt.close()

# Generate the animation data
def init():
    line.set_data([], [])
    annotation.set_text('')
    return line, annotation

# animation function, called sequentially
def animate(i):
    xs = np.linspace(-5, 20, 100)
    ys = clf._w_history[i][1] * xs + clf._w_history[i][0]
    line.set_data(xs, ys)
    annotation.set_text('Cost = %.2f e10' % (clf._cost_history[i] / 10000000000))
    return line, annotation

anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=300, interval=10, blit=True)
rc('animation', html='jshtml')
anim

[Animated scatter of Sale Price vs living area with the regression line converging; matplotlib warns that the animation exceeds its 20 MB embed limit and drops the remaining frames]

Create a 'User' data set having 5 columns namely: User ID, Gender, Age, EstimatedSalary and Purchased. Build a logistic regression model that can predict, from the given parameters, whether a person will buy a car or not.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

df = pd.read_csv("suv_data.csv")
df.head()

    User ID  Gender  Age  EstimatedSalary  Purchased
0  15624510    Male   19            19000          0
1  15810944    Male   35            20000          0
2  15668575  Female   26            43000          0
3  15603246  Female   27            57000          0
4  15804002    Male   19            76000          0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype
 0   User ID          400 non-null    int64
 1   Gender           400 non-null    object
 2   Age              400 non-null    int64
 3   EstimatedSalary  400 non-null    int64
 4   Purchased        400 non-null    int64
dtypes: int64(4), object(1)
memory usage: 15.8+ KB

df.isnull().sum()

User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64

sns.countplot(x='Gender', data=df)   # positional use of the column is deprecated in recent seaborn
plt.show()

[Count plot of Gender: roughly 200 observations per class]
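Once more the statement asks for a generated 'User' table while the notebook loads suv_data.csv. A sketch with the statement's columns; the seed, value ranges and the purchase-probability rule are all assumptions:

rng = np.random.default_rng(7)
n = 500
users = pd.DataFrame({
    "User ID": rng.integers(15_000_000, 16_000_000, n),
    "Gender": rng.choice(["Male", "Female"], n),
    "Age": rng.integers(18, 60, n),
    "EstimatedSalary": rng.integers(15_000, 150_000, n),
})
# Assume older, higher-earning users are more likely to buy
logit = 0.1 * (users["Age"] - 38) + 0.00003 * (users["EstimatedSalary"] - 70_000)
users["Purchased"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)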
x = df.iloc[:, [2, 3]].values   # Age, EstimatedSalary
y = df.iloc[:, 4].values        # Purchased

x

array([[    19,  19000],
       [    35,  20000],
       [    26,  43000],
       ...,
       [    49,  36000]], dtype=int64)

[400 x 2 array of (Age, EstimatedSalary) pairs]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

x_train

[300 x 2 array of raw (Age, EstimatedSalary) training rows, dtype=int64]
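The StandardScaler above is instantiated but, judging by the raw-valued x_train printout, never applied. Presumably the intended step was the usual fit/transform pair; without it, salaries in the tens of thousands dominate the two features and the model below ends up predicting the majority class everywhere:

x_train = sc.fit_transform(x_train)   # fit the scaler on training data only
x_test = sc.transform(x_test)         # reuse the training mean/std on the test set

With unscaled inputs the logistic regression fitted next predicts 0 for every test row, which is exactly what the 0.68 accuracy and the confusion matrix further down show.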
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)
y_pred

array([0, 0, 0, ..., 0], dtype=int64)

[all 100 test predictions are 0]

y_test

[100 true labels: 68 zeros and 32 ones]

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.68

accuracy_score(y_test, y_pred) * 100

68.0

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm

array([[68,  0],
       [32,  0]], dtype=int64)

SET B

Build a simple linear regression model for Fish Species Weight Prediction.
https://www.kaggle.com/aungpyaeap/fish-market?select=Fish.csv

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from itertools import combinations
import numpy as np

data = pd.read_csv("Fish.csv")
data.head()

  Species  Weight  Length1  Length2  Length3   Height   Width
0   Bream   242.0     23.2     25.4     30.0  11.5200  4.0200
1   Bream   290.0     24.0     26.3     31.2  12.4800  4.3056
2   Bream   340.0     23.9     26.5     31.1  12.3778  4.6961
3   Bream   363.0     26.3     29.0     33.5  12.7300  4.4555
4   Bream   430.0     26.5     29.0     34.0  12.4440  5.1340

data.isna().sum()

Species    0
Weight     0
Length1    0
Length2    0
Length3    0
Height     0
Width      0
dtype: int64
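The write-up stops after loading and checking the Fish data. A minimal sketch of the requested simple (single-feature) regression, predicting Weight; the choice of Length3 as predictor and the split parameters are assumptions:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = data[["Length3"]]   # assumed single predictor
y = data["Weight"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("intercept:", model.intercept_, "slope:", model.coef_[0])
print("test R^2 :", r2_score(y_test, model.predict(X_test)))

Any of the three length columns would serve; they are strongly collinear, which is also why the statement asks for a simple rather than a multiple regression.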
