This document is a hint sheet for a project on developing a machine learning model for house price prediction, submitted to the Foundation for Innovation and Technology Transfer at the Indian Institute of Technology Delhi. It walks through data preprocessing, exploratory data analysis, feature engineering, and model training using linear regression, and it covers model evaluation metrics and how to save the trained model for future use.
Hint Sheet
Project Title
FOUNDATION FOR INNOVATION
AND TECHNOLOGY TRANSFER
Indian Institute of Technology Delhi
Submitted By:
Name:
College :
ID:
Explain Goal: House Price Prediction Regression Model

We have all experienced a time when we had to look for a new house to buy. The journey then begins with a lot of fraud, negotiating deals, researching the local areas, and so on.

House Price Prediction using Machine Learning: to deal with these kinds of issues, today we will be preparing a machine learning based model, trained on the House Price Prediction Dataset.
Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
Drive Mounting

from google.colab import drive
drive.mount('/content/drive')
Loading the dataset file from the location (Drive) using pd.read_csv

df = pd.read_csv('/Housing Price Dataset - Housing.csv')
df.head()
How to show the first 5 and last 5 entries from the dataset?

df.head()
df.tail()
How to find the total number of rows and columns in the dataset?

df.shape    # shape is an attribute, not a method, so no parentheses
Identify Features and Labels

There are 12 features in this data, and the target/label is the price of the house, predicted based on these features.
Exploratory Data Analysis

Explain the need of EDA. We will preprocess and clean the data.

1. How to check the total number of records from the dataset
2. Handle missing values
3. How to fill missing values
4. How to delete columns/rows
5. How to add a new column
6. Label encoding
7. How to deal with duplicate values (see the sketch after this list)
8. How to get a statistical inference of the data
9. Plotting the relationship between features of the data
10. Feature engineering
11. Data visualization
12. Understanding the relationship between the variables
13. Drawing conclusions
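Duplicate handling (item 7) is not demonstrated later in this sheet, so here is a minimal sketch using the standard pandas API, assuming df is the dataframe loaded above:

print(df.duplicated().sum())    # count fully duplicated rows
df = df.drop_duplicates()       # drop them, keeping the first occurrence of each row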
Describe Function: Gives the statistical facts of the dataset

df.describe()
Null values

#df.isnull()
Finding null values using .isna().sum()

df.isna().sum()
How to understand the complete data, its columns (features, their datatypes), null values, etc.?

df.info()
Finding the unique values for the column furnishingstatus

#df['Column_name'].nunique()

For furnishingstatus this returns 3.
How to fill an empty column in a dataframe?

#df['Column_name'].fillna('Not req', inplace=True)

Or by filling the numerical values with some statistical operation (mean, median, mode), etc.:

#mean_value = df['Amount'].mean()
# Replace NaNs in column Amount with the mean of values in the same column
#df['Amount'].fillna(value=mean_value, inplace=True)
#print('Updated DataFrame:')
#print(df.head())
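The same pattern works for the median and mode mentioned above. A small sketch ('Amount' is the same placeholder column name as in the hint above):

median_value = df['Amount'].median()
df['Amount'].fillna(value=median_value, inplace=True)

mode_value = df['Amount'].mode()[0]    # mode() returns a Series; take the first value
df['Amount'].fillna(value=mode_value, inplace=True)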
Dropping/Deleting the empty/not required columns

# drop unrelated/blank columns
#df.drop(['Status', 'unnamed1', 'NewColumn'], axis=1, inplace=True)
Label Encoding

Converting all categorical values into numeric form.

#columns_to_transform = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea']
#df[columns_to_transform] = df[columns_to_transform].replace({'yes': 1, 'no': 0})
#df['furnishingstatus'] = df['furnishingstatus'].replace({'unfurnished': 0, 'semi-furnished': 1, 'furnished': 2})
#df.head()
StandardScaler

StandardScaler is a preprocessing technique in scikit-learn used for standardizing features by removing the mean and scaling to unit variance. It offers a simple yet effective method for standardizing feature values. Let's delve deeper into how StandardScaler works.

Normalization process: StandardScaler operates on the principle of normalization, transforming the distribution of each feature to have a mean of zero and a standard deviation of one. This ensures that all features are on the same scale, preventing any single feature from dominating the learning process due to its larger magnitude.

In essence, StandardScaler is a versatile and widely used preprocessing technique that contributes to the robustness, interpretability, and performance of machine learning models trained on diverse datasets. Understanding its principles and application is essential for preparing data effectively and achieving reliable results in machine learning tasks.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

#Columns = ['price', 'area']
#df[Columns] = scaler.fit_transform(df[Columns])
#df.head()
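To make the formula concrete, here is a self-contained sketch (the numbers are made up) showing that StandardScaler applies z = (x - mean) / std, using the population standard deviation:

import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[7420.0], [8960.0], [9960.0], [7500.0]])   # toy stand-in for one numeric column

z_sklearn = StandardScaler().fit_transform(x)
z_manual = (x - x.mean()) / x.std()                      # z = (x - mean) / std, ddof=0
print(np.allclose(z_sklearn, z_manual))                  # True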
Check the datatypes for all the variables.

df.dtypes
df.shape
Correlation Matrix

A correlation matrix is a table that shows the correlation coefficients between a set of variables. It's a tool used to identify patterns and trends in data.

What does it show?

Correlation coefficients: the correlation coefficient measures how closely two variables are related. It ranges from -1 to +1, with 0 indicating no correlation and +1 indicating a perfect positive relationship.

Direction: a positive value indicates a positive relationship, while a negative value indicates a negative relationship.

What's it used for?

Summarizing data: a correlation matrix can summarize a large dataset. Identifying patterns: a correlation matrix can help identify patterns and trends in data. Understanding relationships: a correlation matrix can help understand the relationships between variables.
#corr_matrix = df.corr()
#plt.figure(figsize=(18, 5))
#sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
#plt.show()
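To make the range of the coefficient concrete, here is a minimal, self-contained illustration with made-up numbers, using numpy's corrcoef:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 6, 8, 10])   # b = 2 * a: a perfect positive relationship
c = np.array([5, 3, 4, 1, 2])    # roughly decreases as a increases

print(np.corrcoef(a, b)[0, 1])   # 1.0  -> perfect positive correlation
print(np.corrcoef(a, c)[0, 1])   # -0.8 -> strong negative correlation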
Data Visualization using seaborn and matplotlib
Histogram

A histogram is a graph that shows the frequency distribution of numerical data. It's used to represent continuous or discrete data, and is especially useful for large datasets.

How does a histogram work? A histogram divides the data into groups called bins. The height of each bin's rectangle represents the number of data points in that bin, and its width represents the range of values the bin covers.

#df.hist(figsize=(10, 10), bins=10)
#plt.suptitle("Histograms for All Columns", fontsize=16)
#plt.show()
1. Gender count plot from data: Bar Graph (a generic example; Gender is not a column in the housing dataset)

# plotting a bar chart for Gender and its count
#ax = sns.countplot(x='Gender', data=df)
#for bars in ax.containers:
#    ax.bar_label(bars)
Pie Chart

# plotting a pie chart for Gender and its count
# Calculate value counts for Gender
#gender_counts = df['Gender'].value_counts()
# Create a pie chart using matplotlib
#plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%')
#plt.title('Gender Distribution')
#plt.show()
Line Graph

# total number of orders from the top 10 states
#sales_state = df.groupby(['State'], as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)
#sns.set(rc={'figure.figsize': (15, 5)})
#sns.lineplot(data=sales_state, x='State', y='Orders')
Separating the features and labels from the data frame

Now, X stores all the independent values, and y stores the dependent values. X will have all the features; y will have the target value (price).

#X = df.drop('price', axis=1)
#y = df['price']
Preparing the feature set and labels for training the model.

#print(X.shape)
#print(y.shape)
Look at some samples of X and y.

#print(X[:10])
#print(y[:10])
Now, the task is to split the data into training and testing sets for model training, by importing train_test_split from sklearn.

#from sklearn.model_selection import train_test_split

Creating the training and testing data with a split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
Loading the model from sklearn

from sklearn.linear_model import LinearRegression
#lr_model = LinearRegression()

Train the model on the dataset using the fit function

#lr_model.fit(X_train, y_train)
Now, test the trained model on the test data using the predict function

#lr_y_pred = lr_model.predict(X_test)
MODEL EVALUATION: Importing the error metrics from sklearn

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

Calculating MSE, MAE, & R2 for the regression model

#mse = mean_squared_error(y_test, lr_y_pred)
#mae = mean_absolute_error(y_test, lr_y_pred)
#r2 = r2_score(y_test, lr_y_pred)

#print("\nModel Performance Metrics:")
#print(f"Mean Squared Error (MSE): {mse:.2f}")
#print(f"Mean Absolute Error (MAE): {mae:.2f}")
#print(f"R-Squared (R2): {r2:.2f}")
Confusion Matrix

Note: confusion matrices and classification reports evaluate classification models, so they do not apply directly to this regression model; this cell and the next are generic hints for classification tasks.

#class_labels = labels_test
#from sklearn.metrics import confusion_matrix
#plt.figure(figsize=(8, 8))
#y_pred_labels = [np.argmax(label) for label in predicted_classes]
#cm = confusion_matrix(y_test, y_pred_labels)
# show cm
#sns.heatmap(cm, annot=True, fmt='d', xticklabels=class_labels, yticklabels=class_labels)
Classification Report

#from sklearn.metrics import classification_report
#cr = classification_report(y_test, y_pred_labels, target_names=labels_test)
#print(cr)
Plotting the Regression Line

#plt.figure(figsize=(8, 6))
#plt.scatter(y_test, lr_y_pred, color='blue', alpha=0.6, label='Predictions')
#plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--')
#plt.xlabel('Target Values')
#plt.ylabel('Predicted Values')
#plt.title('Linear Regression Predicted vs Target Values')
#plt.legend()
#plt.grid(True)
#plt.show()
Score: R-Squared Value

#print('Training score', lr_model.score(X_train, y_train))
#print('Testing score', lr_model.score(X_test, y_test))
Important: How to Save a Model

The pickle library is used to save a trained model so it can be reloaded and used in real time.

import pickle
Save the model

'wb' means 'write binary' and is used for the file handle: open('save.p', 'wb') writes the pickled data into a file.

#with open('model_pickle', 'wb') as file:
#    pickle.dump(lr_model, file)
Load the model with a name

#with open('model_pickle', 'rb') as file:
#    LR = pickle.load(file)
Calculate the coefficients of the regression line for this data.

#LR.coef_

array([0.29361062, 0.04427349, 0.59793397, 0.22250551, 0.21850242,
       0.14958561, 0.25952496, 0.33174227, 0.36388859, 0.16271987,
       0.27261478, 0.10597148])
Calculate the intercept of the regression line for this data.

#LR.intercept_

-1.9988122975861022
Equation for the Multi-Regression Model for this house price prediction

Y = a + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5 + b6*X6 + b7*X7 + b8*X8 + b9*X9 + b10*X10 + b11*X11 + b12*X12

Y = -1.9988 + 0.29361062*X1 + 0.04427349*X2 + 0.59793397*X3 + 0.22250551*X4 + 0.21850242*X5 + 0.14958561*X6 + 0.25952496*X7 + 0.33174227*X8 + 0.36388859*X9 + 0.16271987*X10 + 0.27261478*X11 + 0.10597148*X12
#df['bedrooms'].unique()

array([4, 3, 5, 2, 6, 1])
Predicting house price for some user input values

#LR.predict([[3.000677, 3, 2, 1, 0, 2, 0, 1, 1, 2, 1, 2]])

array([2.08862621])
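As a sanity check (assuming LR is the model loaded above), the same prediction can be reproduced directly from the equation: the intercept plus the dot product of the coefficients with the input features.

import numpy as np

x_new = np.array([3.000677, 3, 2, 1, 0, 2, 0, 1, 1, 2, 1, 2])   # same user input as above

y_manual = LR.intercept_ + np.dot(LR.coef_, x_new)   # a + b1*X1 + ... + b12*X12
print(y_manual)                                      # equals LR.predict([x_new])[0]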
Another way: by creating a new variable

#new_data = [2.347980, 3, 2, 3, 1, 0, 1, 1, 1, 2, 1, 3]
#LR.predict([new_data])
Conclusion

Discuss the results. Write down 4-5 lines about the model used, the training time, and the testing time. Write down the advantages, the problem solved, and closing statements. Hence, we have learned to build our first machine learning based model that predicts house prices based on user input.