Machine Learning Lab

Uploaded by

ramyarajan0713

Mini Project

1. a. Housing Price Decision Sklearn Linear Regression

Aim:
To predict the price of houses using linear regression

Procedure:

1. Import libraries
In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/housing-prices-dataset/Housing.csv
2. Reading data
In [2]:
data_frame = pd.read_csv('/kaggle/input/housing-prices-dataset/Housing.csv')
data_frame
Out[2]:

     price     area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
0    13300000  7420  4         2          3        yes       no         no        no               yes              2        yes       furnished
1    12250000  8960  4         4          4        yes       no         no        no               yes              3        no        furnished
2    12250000  9960  3         2          2        yes       no         yes       no               no               2        yes       semi-furnished
3    12215000  7500  4         2          2        yes       no         yes       no               yes              3        yes       furnished
4    11410000  7420  4         1          2        yes       yes        yes       no               yes              2        no        furnished
...  ...       ...   ...       ...        ...      ...       ...        ...       ...              ...              ...      ...       ...
540  1820000   3000  2         1          1        yes       no         yes       no               no               2        no        unfurnished
541  1767150   2400  3         1          1        no        no         no        no               no               0        no        semi-furnished
542  1750000   3620  2         1          1        yes       no         no        no               no               0        no        unfurnished
543  1750000   2910  3         1          1        no        no         no        no               no               0        no        furnished
544  1750000   3850  3         1          2        yes       no         no        no               no               0        no        unfurnished

545 rows × 13 columns

3. Cleaning the data
Convert the yes/no string columns to integer codes (no = 0, yes = 1) and map furnishingstatus to 0, 1, 2
In [3]:
data_frame['mainroad'] = data_frame['mainroad'].astype('category')
data_frame['mainroad'] = data_frame['mainroad'].cat.codes

data_frame['guestroom'] = data_frame['guestroom'].astype('category')
data_frame['guestroom'] = data_frame['guestroom'].cat.codes

data_frame['basement'] = data_frame['basement'].astype('category')
data_frame['basement'] = data_frame['basement'].cat.codes

data_frame['hotwaterheating'] = data_frame['hotwaterheating'].astype('category')
data_frame['hotwaterheating'] = data_frame['hotwaterheating'].cat.codes

data_frame['airconditioning'] = data_frame['airconditioning'].astype('category')
data_frame['airconditioning'] = data_frame['airconditioning'].cat.codes

data_frame['prefarea'] = data_frame['prefarea'].astype('category')
data_frame['prefarea'] = data_frame['prefarea'].cat.codes

code_mapping_furniture = {'unfurnished':0, 'semi-furnished':1, 'furnished':2}

data_frame['furnishingstatus'] = data_frame['furnishingstatus'].map(code_mapping_furniture)

## Map furnishingstatus to an explicit order (unfurnished < semi-furnished < furnished) instead of
## arbitrary category codes, on the assumption that a better-furnished house tends to be more expensive.
## This ordering may help the linear model, but it is not guaranteed to improve its accuracy.

data_frame
Out[3]:

     price     area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  furnishingstatus
0    13300000  7420  4         2          3        1         0          0         0                1                2        1         2
1    12250000  8960  4         4          4        1         0          0         0                1                3        0         2
2    12250000  9960  3         2          2        1         0          1         0                0                2        1         1
3    12215000  7500  4         2          2        1         0          1         0                1                3        1         2
4    11410000  7420  4         1          2        1         1          1         0                1                2        0         2
...  ...       ...   ...       ...        ...      ...       ...        ...       ...              ...              ...      ...       ...
540  1820000   3000  2         1          1        1         0          1         0                0                2        0         0
541  1767150   2400  3         1          1        0         0          0         0                0                0        0         1
542  1750000   3620  2         1          1        1         0          0         0                0                0        0         0
543  1750000   2910  3         1          1        0         0          0         0                0                0        0         2
544  1750000   3850  3         1          2        1         0          0         0                0                0        0         0

545 rows × 13 columns
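
The repeated conversion in the cleaning step could also be written as a loop. The following is a minimal sketch on a small made-up frame (the three rows and the name `toy` are assumptions for illustration, not rows of Housing.csv). It shows that cat.codes numbers the categories alphabetically (no = 0, yes = 1), while the explicit dictionary fixes the order of furnishingstatus.

```python
import pandas as pd

# Hypothetical three-row frame that mimics two of the housing columns
toy = pd.DataFrame({
    'mainroad': ['yes', 'no', 'yes'],
    'furnishingstatus': ['furnished', 'unfurnished', 'semi-furnished'],
})

# Binary yes/no columns: category codes follow alphabetical order (no = 0, yes = 1)
binary_columns = ['mainroad']
for column in binary_columns:
    toy[column] = toy[column].astype('category').cat.codes

# Ordered column: an explicit mapping keeps unfurnished < semi-furnished < furnished
code_mapping_furniture = {'unfurnished': 0, 'semi-furnished': 1, 'furnished': 2}
toy['furnishingstatus'] = toy['furnishingstatus'].map(code_mapping_furniture)

print(toy)
```

With the real data, `binary_columns` would list all six yes/no columns (mainroad, guestroom, basement, hotwaterheating, airconditioning, prefarea).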

Check if the data frame contains Null values
In [4]:
data_frame.isnull().sum()
Out[4]:
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
Define the target value and the independent variables
In [5]:
x = data_frame.drop(columns = 'price')
y = data_frame['price']
y_max = max(data_frame['price'])
print(y_max)
13300000
Delete outliers
In [6]:
# import libraries
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import IsolationForest
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and
<1.23.0 is required for this version of SciPy (detected version 1.23.5
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
In [7]:
# Create an instance of Isolation Forest
outlier_detector = IsolationForest(contamination=0.07) # Adjust the contamination parameter as needed

# Fit the detector on your data

outlier_detector.fit(x)

# Predict outliers (anomalies)

outliers = outlier_detector.predict(x)

x_clean = x[outliers == 1]
y_clean = y[outliers == 1]

#x = x_clean
#y = y_clean

## In this version x_clean and y_clean are NOT used. The results for different contamination values
## are noted at the end of this section (the notebook was run several times with different values).
/opt/conda/lib/python3.10/site-packages/sklearn/base.py:439: UserWarning: X does not have valid feature
names, but IsolationForest was fitted with feature names
warnings.warn(
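
To make the filtering step easier to follow, here is a small sketch of how the labels returned by IsolationForest can be used as a mask. The data is synthetic (an assumption for illustration, not the housing data), and `random_state=0` is added only so the run is repeatable. The detector labels inliers with +1 and outliers with -1, which is why the notebook keeps the rows where the label equals 1.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 200 ordinary points plus 5 far-away points
rng = np.random.default_rng(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
extreme_points = rng.normal(loc=12.0, scale=0.5, size=(5, 2))
features = np.vstack([normal_points, extreme_points])
target = np.arange(len(features))  # placeholder target values

detector = IsolationForest(contamination=0.07, random_state=0)
labels = detector.fit_predict(features)  # +1 = inlier, -1 = outlier

# Keep only the rows labelled as inliers, in features and target alike
mask = labels == 1
features_clean = features[mask]
target_clean = target[mask]

print("rows kept:", len(features_clean), "of", len(features))
```

The contamination parameter sets the share of rows that get flagged, so a higher value removes more rows whether or not they are true outliers.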
4. Split data
In [8]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
print("Done")
Done
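
The split sizes reported later (381 training rows and 164 test rows) follow from the 545 rows and test_size = 0.3. A short arithmetic sketch, assuming the rounding scikit-learn applies when test_size is a fraction (the test count is rounded up and the remainder goes to training):

```python
import math

n_samples = 545   # rows in Housing.csv, as reported above
test_size = 0.3   # value passed to train_test_split

# For a fractional test_size the test count is rounded up; the rest is used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```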
5. Training the model
In [9]:
model = LinearRegression()
model.fit(x_train, y_train)
print("Done")
Done
In [10]:
c = model.intercept_
m = model.coef_
print(c)
print(m)

## y = m1x1 + m2x2 + ... + c

-353311.8336093277
[2.48857876e+02 1.34994406e+05 9.50583380e+05 4.18321569e+05
4.66890751e+05 3.68497644e+05 3.59364424e+05 1.24665331e+06
8.97037026e+05 2.23301809e+05 6.96754525e+05 2.30222653e+05]
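
As an illustration of the formula y = m1x1 + m2x2 + ... + c, the sketch below pairs each printed coefficient with its feature name and evaluates the equation by hand for row 0 of the cleaned table. It assumes the coefficients are in the column order of x (price dropped, all other columns in their original order); the variable names are made up for this example.

```python
# Intercept and coefficients copied from the output above
intercept = -353311.8336093277
coefficients = [2.48857876e+02, 1.34994406e+05, 9.50583380e+05, 4.18321569e+05,
                4.66890751e+05, 3.68497644e+05, 3.59364424e+05, 1.24665331e+06,
                8.97037026e+05, 2.23301809e+05, 6.96754525e+05, 2.30222653e+05]

# Feature names in the assumed column order of x (data_frame without 'price')
feature_names = ['area', 'bedrooms', 'bathrooms', 'stories', 'mainroad', 'guestroom',
                 'basement', 'hotwaterheating', 'airconditioning', 'parking',
                 'prefarea', 'furnishingstatus']

# Encoded values of row 0 from the cleaned table
row_0 = [7420, 4, 2, 3, 1, 0, 0, 0, 1, 2, 1, 2]

for name, coefficient in zip(feature_names, coefficients):
    print(f"{name:>16}: {coefficient:>14,.2f}")

# y = m1*x1 + m2*x2 + ... + c
predicted_price = intercept + sum(m * x for m, x in zip(coefficients, row_0))
print(f"predicted price for row 0: {predicted_price:,.0f}")
```

Under these assumptions the hand calculation gives roughly 8.16 million for a house listed at 13.3 million, which fits the scatter plots: the most expensive houses sit well below the red line.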
In [11]:
y_pred_train = model.predict(x_train)
print("Done")
Done

In [12]:
import matplotlib.pyplot as plt
plt.scatter(y_train, y_pred_train)
plt.xlabel("Actual result")
plt.ylabel("Predicted result")
x_point = np.array([0,14000000])
y_point = np.array([0,14000000])
# max value of y is around 13 million
plt.plot(x_point, y_point, c = 'r')
print("RESULT WITH TRAINED DATA")
print("Number of data train: ", len(x_train))
plt.show()

## The red line is y = x: the closer a point lies to it, the closer the predicted price is to the
## actual price.
RESULT WITH TRAINED DATA
Number of data train: 381

In [13]:
from sklearn.metrics import r2_score
r2_score_without_test = r2_score(y_train, y_pred_train)
print(r2_score_without_test)
0.6575703217254214
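
For reference, the R² score compares the squared prediction errors with the spread of the actual values around their mean: R² = 1 - SS_res / SS_tot. The sketch below computes it by hand on made-up numbers (the function name `r_squared` and the data are assumptions for illustration); a score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot, the quantity r2_score reports."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up numbers for illustration only
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

score = r_squared(actual, predicted)
print(score)
```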
6. Test the model with test data
In [14]:
y_pred_test = model.predict(x_test)
print("Done")
Done
In [15]:
import matplotlib.pyplot as plt
plt.scatter(y_test, y_pred_test)
plt.xlabel("Actual result")
plt.ylabel("Predicted result")
x_point = np.array([0,14000000])
y_point = np.array([0,14000000])
plt.plot(x_point, y_point, c = 'r')
print("RESULT WITH TESTED DATA")
print("Number of data test: ", len(x_test))
plt.show()

RESULT WITH TESTED DATA

Number of data test: 164

In [16]:
from sklearn.metrics import r2_score
r2_score_with_test = r2_score(y_test, y_pred_test)
print(r2_score_with_test)
0.723501522320035
7. Result
Result with different contaminations (R² score)

Contamination                  0%     3%     5%     7%
Without test (training data)   0.67   0.66   0.68   0.62
With test (test data)          0.72   0.64   0.6    0.64

1. b. Housing Prices multiple Regression

import numpy as np # linear algebra

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory

# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you
# create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/housing-prices-dataset/Housing.csv
In [2]:
# Import necessary libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

df = pd.read_csv(r'/kaggle/input/housing-prices-dataset/Housing.csv')

# List of variables to map

varlist = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea']

# Defining the map function

def binary_map(x):
    return x.map({'yes': 1, 'no': 0})

df['newFurnish'] = LabelEncoder().fit_transform(df['furnishingstatus'])

# Applying the function to the housing list

df[varlist] = df[varlist].apply(binary_map)
df.drop(['furnishingstatus'], axis = 1, inplace = True)
df.head(5)
Out[2]:

   price     area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  airconditioning  parking  prefarea  newFurnish
0  13300000  7420  4         2          3        1         0          0         0                1                2        1         0
1  12250000  8960  4         4          4        1         0          0         0                1                3        0         0
2  12250000  9960  3         2          2        1         0          1         0                0                2        1         1
3  12215000  7500  4         2          2        1         0          1         0                1                3        1         0
4  11410000  7420  4         1          2        1         1          1         0                1                2        0         0
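
In the output above, furnished houses get newFurnish = 0, the opposite of the explicit mapping used in 1. a. This is because LabelEncoder numbers the labels in sorted (alphabetical) order, not by furnishing level. A minimal sketch with the three status values (the list `statuses` is made up for illustration):

```python
from sklearn.preprocessing import LabelEncoder

# The three furnishingstatus values that appear in the dataset
statuses = ['furnished', 'semi-furnished', 'unfurnished', 'furnished']

encoder = LabelEncoder()
codes = encoder.fit_transform(statuses)

# classes_ is sorted, so the position of each label is its integer code
print(list(encoder.classes_))
print(list(codes))
```

The resulting order (furnished = 0, semi-furnished = 1, unfurnished = 2) is still monotonic, so a linear model can use it, but the sign of the fitted coefficient would be expected to flip compared with the mapping in 1. a.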

In [3]:
# Drop missing and invalid values
df = df.dropna()

# Separate the independent and dependent variables

X = df.drop(['price'], axis=1)
y = df['price']

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
In [4]:
# Create a linear regression model
model = LinearRegression()

# Fit the model to the training data

model.fit(X_train, y_train)

# Predict the prices on the test data

y_pred = model.predict(X_test)
In [5]:
# Calculate the mean squared error
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
Mean Squared Error: 986041803890.0269
In [6]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
Out[6]:
0.6578047592637595
In [7]:
# Calculate the root mean squared error as the square root of the MSE
# (the squared=False argument of mean_squared_error is deprecated in newer scikit-learn releases)
rmse = np.sqrt(mse)

# Print the root mean squared error

print("Root Mean Squared Error:", rmse)
Root Mean Squared Error: 992996.3765744701
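
The two error values printed above are directly related: the RMSE is the square root of the MSE, which brings the error back to the unit of the price. A quick check using only the numbers from the outputs:

```python
import math

mse = 986041803890.0269            # Mean Squared Error printed above
rmse_reported = 992996.3765744701  # Root Mean Squared Error printed above

# RMSE is the square root of MSE, so it is in the same unit as the price
rmse = math.sqrt(mse)
print(rmse)
```

An RMSE of about 0.99 million means the predictions are typically off by roughly one million in price units on the test set.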
In [8]:
plt.scatter(y_test, y_pred)
plt.xlabel('actual')
plt.ylabel('predicted')
plt.show()

2. Data Project - Stock Market Analysis

Time series data is a series of data points indexed in time order. Time series data is everywhere, so
manipulating it is important for any data analyst or data scientist.
We will discover and explore data from the stock market, particularly some technology stocks (Apple,
Amazon, Google, and Microsoft). We will learn how to use yfinance to get stock information and visualize
different aspects of it using Seaborn and Matplotlib. We will look at a few ways of analyzing the risk of a stock
based on its previous performance history. We will also predict future stock prices with a Long Short-Term
Memory (LSTM) method.
We'll be answering the following questions along the way:
1.) What was the change in price of the stock over time?
2.) What was the daily return of the stock on average?
3.) What was the moving average of the various stocks?
4.) What was the correlation between different stocks?
5.) How much value do we put at risk by investing in a particular stock?
6.) How can we attempt to predict future stock behavior? (Predicting the closing price of Apple Inc.
using LSTM)

Getting the Data

The first step is to get the data and load it into memory. We will get our stock data from the Yahoo Finance
website. Yahoo Finance is a rich resource of financial market data and tools to find compelling investments. To
get the data from Yahoo Finance, we will use the yfinance library, which offers a threaded and Pythonic way to
download market data from Yahoo. To learn more about yfinance, see the article "Reliably download
historical market data from Yahoo! Finance with Python".
1. What was the change in price of the stock over time?
In this section we'll go over how to handle requesting stock information with pandas, and how to analyze basic
attributes of a stock.
In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline
# For reading stock data from yahoo
from pandas_datareader.data import DataReader
import yfinance as yf
from pandas_datareader import data as pdr

yf.pdr_override()

# For time stamps
from datetime import datetime

# The tech stocks we'll use for this analysis
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# Set up End and Start times for data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)

for stock in tech_list:
    globals()[stock] = yf.download(stock, start, end)

company_list = [AAPL, GOOG, MSFT, AMZN]
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name

df = pd.concat(company_list, axis=0)
df.tail(10)
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
Out[2]:

Date                       Open        High        Low         Close       Adj Close   Volume    company_name
2023-01-17 00:00:00-05:00  98.680000   98.889999   95.730003   96.050003   96.050003   72755000  AMAZON
2023-01-18 00:00:00-05:00  97.250000   99.320000   95.379997   95.459999   95.459999   79570400  AMAZON
2023-01-19 00:00:00-05:00  94.739998   95.440002   92.860001   93.680000   93.680000   69002700  AMAZON
2023-01-20 00:00:00-05:00  93.860001   97.349998   93.199997   97.250000   97.250000   67307100  AMAZON
2023-01-23 00:00:00-05:00  97.559998   97.779999   95.860001   97.519997   97.519997   76501100  AMAZON
2023-01-24 00:00:00-05:00  96.930000   98.089996   96.000000   96.320000   96.320000   66929500  AMAZON
2023-01-25 00:00:00-05:00  92.559998   97.239998   91.519997   97.180000   97.180000   94261600  AMAZON
2023-01-26 00:00:00-05:00  98.239998   99.489998   96.919998   99.220001   99.220001   68523600  AMAZON
2023-01-27 00:00:00-05:00  99.529999   103.489998  99.529999   102.239998  102.239998  87678100  AMAZON
2023-01-30 00:00:00-05:00  101.089996  101.739998  99.010002   100.550003  100.550003  70566100  AMAZON

Reviewing the content of our data, we can see that the data is numeric and the date is the index of the data.
Notice also that weekends are missing from the records.
Quick note: Using globals() is a sloppy way of setting the DataFrame names, but it's simple. Now we have our
data, let's perform some basic data analysis and check our data.
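
As the quick note says, globals() is only a shortcut. One possible alternative, sketched here under stated assumptions, is to keep one DataFrame per ticker in a dictionary. The function `fake_download` is a hypothetical stand-in for yf.download that returns made-up prices, so the sketch runs without network access; the name `stock_data` is also invented for this example.

```python
import numpy as np
import pandas as pd

tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

def fake_download(ticker):
    """Hypothetical stand-in for yf.download: returns three rows of made-up prices."""
    dates = pd.date_range("2023-01-02", periods=3, freq="D")
    return pd.DataFrame({"Adj Close": np.arange(3, dtype=float)}, index=dates)

# Keep one DataFrame per ticker in a dictionary instead of creating global variables
stock_data = {ticker: fake_download(ticker) for ticker in tech_list}

for ticker, name in zip(tech_list, company_name):
    stock_data[ticker]["company_name"] = name

df = pd.concat(list(stock_data.values()), axis=0)
print(df.tail(3))
```

With this layout a single stock is reached as `stock_data['AAPL']`, and the list of DataFrames used for plotting is `list(stock_data.values())`.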
Descriptive Statistics about the Data
.describe() generates descriptive statistics. Descriptive statistics include those that summarize the central
tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.

It analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output
varies depending on what is provided.
In [3]:
# Summary Stats
AAPL.describe()
Out[3]:

       Open        High        Low         Close       Adj Close   Volume
count  251.000000  251.000000  251.000000  251.000000  251.000000  2.510000e+02
mean   152.117251  154.227052  150.098406  152.240797  151.861737  8.545738e+07
std    13.239204   13.124055   13.268053   13.255593   13.057870   2.257398e+07
min    126.010002  127.769997  124.169998  125.019997  125.019997  3.519590e+07
25%    142.110001  143.854996  139.949997  142.464996  142.190201  7.027710e+07
50%    150.089996  151.990005  148.199997  150.649994  150.400497  8.100050e+07
75%    163.434998  165.835007  160.879997  163.629997  163.200417  9.374540e+07
max    178.550003  179.610001  176.699997  178.960007  178.154037  1.826020e+08

We have only 251 records in one year because weekends and market holidays are not included in the data.
Information About the Data
.info() method prints information about a DataFrame including the index dtype and columns, non-null values,
and memory usage.
In [4]:
# General info
AAPL.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2022-01-31 00:00:00-05:00 to 2023-01-30 00:00:00-05:00
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 251 non-null float64
1 High 251 non-null float64
2 Low 251 non-null float64
3 Close 251 non-null float64
4 Adj Close 251 non-null float64
5 Volume 251 non-null int64
6 company_name 251 non-null object
dtypes: float64(5), int64(1), object(1)
memory usage: 23.8+ KB
Closing Price
The closing price is the last price at which the stock is traded during the regular trading day. A stock’s closing
price is the standard benchmark used by investors to track its performance over time.
In [5]:
# Let's see a historical view of the closing price
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot()
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"Closing Price of {tech_list[i - 1]}")

plt.tight_layout()

Volume of Sales
Volume is the amount of an asset or security that changes hands over some period of time, often over the course
of a day. For instance, the stock trading volume would refer to the number of shares of security traded between
its daily open and close. Trading volume, and changes to volume over the course of time, are important inputs
for technical traders.
In [6]:
# Now let's plot the total volume of stock being traded each day
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Volume'].plot()
    plt.ylabel('Volume')
    plt.xlabel(None)
    plt.title(f"Sales Volume for {tech_list[i - 1]}")

plt.tight_layout()

Now that we've seen the visualizations for the closing price and the volume traded each day, let's go
ahead and calculate the moving average for the stock.

2. What was the moving average of the various stocks?

The moving average (MA) is a simple technical analysis tool that smooths out price data by creating a
constantly updated average price. The average is taken over a specific period of time, like 10 days, 20 minutes,
30 weeks, or any time period the trader chooses.
In [7]:
ma_day = [10, 20, 50]

for ma in ma_day:
    for company in company_list:
        column_name = f"MA for {ma} days"
        company[column_name] = company['Adj Close'].rolling(ma).mean()

fig, axes = plt.subplots(nrows=2, ncols=2)

fig.set_figheight(10)
fig.set_figwidth(15)

AAPL[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,0])
axes[0,0].set_title('APPLE')

GOOG[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,1])
axes[0,1].set_title('GOOGLE')

MSFT[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,0])
axes[1,0].set_title('MICROSOFT')

AMZN[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,1])
axes[1,1].set_title('AMAZON')

fig.tight_layout()

We see in the graph that the best windows for the moving average are 10 and 20 days, because they still
follow the trends in the data while smoothing out most of the day-to-day noise.
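
To show what rolling(ma).mean() computes, here is a tiny sketch with made-up prices and a window of 3 instead of 10, 20 or 50 (both are assumptions for illustration). Each value is the mean of the current and the two previous prices, and the first window - 1 entries are NaN because a full window is not yet available.

```python
import pandas as pd

# Made-up closing prices for illustration
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# A 3-period moving average: mean of the current and the two previous prices
moving_average = prices.rolling(3).mean()
print(moving_average)
```

The same effect explains why the 50-day line in the plots starts later than the 10-day and 20-day lines: with about 251 rows, its first 49 values are missing.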
