# Hint Sheet

This document outlines a project, submitted at the Indian Institute of Technology Delhi, focused on developing a machine learning model for house price prediction from a housing dataset. It details the steps involved in data preprocessing, exploratory data analysis, feature engineering, and model training using linear regression. Additionally, it covers model evaluation metrics and how to save the trained model for future use.
## Project Title

FOUNDATION FOR INNOVATION AND TECHNOLOGY TRANSFER
Indian Institute of Technology Delhi

Submitted By:
Name:
College:
ID:

## Goal: House Price Prediction Regression Model

We have all experienced a time when we had to look for a new house to buy. The journey then begins with a lot of fraud, negotiating deals, researching the local area, and so on. To deal with these issues, today we will prepare a machine learning based model, trained on the House Price Prediction dataset.

## Import Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")
```

## Drive Mounting

```python
from google.colab import drive
drive.mount('/content/drive')
```

## Loading the Dataset

Load the dataset file from its location (Drive) using `pd.read_csv`:

```python
df = pd.read_csv('/Housing Price Dataset - Housing.csv')
df.head()
```

How to show the first 5 and last 5 entries from the sheet:

```python
df.head()
df.tail()
```

How to find the total number of rows and columns in the sheet (note that `shape` is an attribute, not a method):

```python
df.shape
```

## Identify Features and Labels

There are 12 features in this data, and the target/label is the price of the house, predicted from these features.

## Exploratory Data Analysis

Explain the need for EDA. We will preprocess and clean the data:

1. How to check the total number of records in the dataset
2. Handle missing values
3. How to fill missing values
4. How to delete columns/rows
5. How to add a new column
6. Label encoding
7. How to deal with duplicate values (see the sketch after this list)
8. How to get a statistical inference of the data
9. Plotting the relationships between features of the data
10. Feature engineering
11. Data visualization
12. Understanding the relationships between the variables
13. Drawing conclusions
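Item 7 mentions duplicate values, but the sheet shows no code for that step. A minimal sketch using standard pandas calls, assuming the same `df` as above:

```python
# Count rows that are exact duplicates of an earlier row
print(df.duplicated().sum())

# Drop duplicates, keeping the first occurrence of each repeated row
df = df.drop_duplicates()
```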
## Describe Function

Gives the statistical facts of the dataset:

```python
df.describe()
```

## Null Values

```python
df.isnull()
```

Finding null values using `.isna().sum()`:

```python
df.isna().sum()
```

How to understand the complete data: its columns (features and their datatypes), null values, etc.:

```python
df.info()
```

Finding the number of unique values for the column `furnishingstatus`:

```python
df['furnishingstatus'].nunique()
# 3
```

How to fill an empty column in the dataframe:

```python
df['Column_name'].fillna('Not req', inplace=True)
```

Or fill a numerical column using some statistical operation (mean, median, mode), etc.:

```python
mean_value = df['Amount'].mean()

# Replace NaNs in column Amount with the mean of the values in the same column
df['Amount'].fillna(value=mean_value, inplace=True)
print('Updated DataFrame:')
print(df.head())
```

## Dropping/Deleting Empty or Unneeded Columns

```python
# Drop unrelated/blank columns
df.drop(['Status', 'unnamed1', 'NewColumn'], axis=1, inplace=True)
```

## Label Encoding

Converting all categorical values into numeric form:

```python
columns_to_transform = ['mainroad', 'guestroom', 'basement', 'hotwaterheating',
                        'airconditioning', 'prefarea']  # 'prefarea' assumed; the original line is truncated
df[columns_to_transform] = df[columns_to_transform].replace({'yes': 1, 'no': 0})
df['furnishingstatus'] = df['furnishingstatus'].replace(
    {'unfurnished': 0, 'semi-furnished': 1, 'furnished': 2})  # mapping completed; the original line is truncated
df.head()
```

## StandardScaler

StandardScaler is a preprocessing technique in scikit-learn that standardizes features by removing the mean and scaling to unit variance: it transforms each feature to have a mean of zero and a standard deviation of one. This puts all features on the same scale and prevents any single feature from dominating the learning process due to its larger magnitude, which contributes to the robustness, interpretability, and performance of models trained on diverse datasets.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
columns = ['price', 'area']
df[columns] = scaler.fit_transform(df[columns])
df.head()
```
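For reference, the transformation StandardScaler applies to each value $x$ of a feature with mean $\mu$ and standard deviation $\sigma$ is the standard z-score:

$$z = \frac{x - \mu}{\sigma}$$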
## Check the Datatypes for All Variables

```python
df.dtypes
df.shape
```

Note: if the dataframe has not been created yet (for example, the `read_csv` cell was not run), this raises a `NameError`:

```
NameError                          Traceback (most recent call last)
<ipython-input-13-653337079cd8> in <module>()
----> 1 df.shape

NameError: name 'df' is not defined
```

## Correlation Matrix

A correlation matrix is a table that shows the correlation coefficients between a set of variables. It is a tool used to identify patterns and trends in data.

What does it show?

- Correlation coefficients: the correlation coefficient measures how closely two variables are related. It ranges from -1 to +1, with 0 indicating no correlation and ±1 indicating a perfect linear relationship.
- Direction: a positive value indicates a positive relationship, while a negative value indicates a negative relationship.

What is it used for?

- Summarizing data: a correlation matrix can summarize a large dataset.
- Identifying patterns: it can help identify patterns and trends in data.
- Understanding relationships: it can help you understand the relationships between variables.

```python
corr_matrix = df.corr()
plt.figure(figsize=(10, 5))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
```

## Data Visualization using seaborn and matplotlib

### Histogram

A histogram is a graph that shows the frequency distribution of numerical data. It is used to represent continuous or discrete data, and is especially useful for large datasets.

How does a histogram work? A histogram divides the data into groups called bins. The height of each bin's rectangle represents the number of data points in that bin, and the width represents the range of values the bin covers.

```python
df.hist(figsize=(10, 10), bins=10)
plt.suptitle("Histograms for All Columns", fontsize=16)
plt.show()
```

### Gender Count Plot from Data: Bar Graph

```python
# Plotting a bar chart for Gender and its count
ax = sns.countplot(x='Gender', data=df)
for bars in ax.containers:
    ax.bar_label(bars)
```

### Pie Chart

```python
# Plotting a pie chart for Gender and its count

# Calculate value counts for Gender
gender_counts = df['Gender'].value_counts()

# Create a pie chart using matplotlib
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%')
plt.title('Gender Distribution')
plt.show()
```

### Line Graph

```python
# Total number of orders from the top 10 states
# (the tail of this line is truncated in the original; descending sort and head(10) assumed)
sales_state = df.groupby(['State'], as_index=False)['Orders'].sum() \
                .sort_values(by='Orders', ascending=False).head(10)

sns.set(rc={'figure.figsize': (15, 5)})
sns.lineplot(data=sales_state, x='State', y='Orders')
```

## Separating the Features and Labels from the DataFrame

Now `X` stores all the independent values and `y` stores the dependent value. `X` will have all the features; `y` will have the target value (price).

```python
X = df.drop('price', axis=1)
y = df['price']
```

Preparing the feature set and labels for training of the model:

```python
print(X.shape)
print(y.shape)
```

Look at some samples of `X` and `y`:

```python
print(X[:10])
print(y[:10])
```

## Train-Test Split

Now the task is to split the data into training data and testing data for model training, by importing `train_test_split` from sklearn:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

## Loading the Model from sklearn

```python
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
```

Train the model on the dataset using the `fit` function:

```python
lr_model.fit(X_train, y_train)
```

Now test the trained model on the test data using the `predict` function:

```python
lr_y_pred = lr_model.predict(X_test)
```

## Model Evaluation

Importing the error metrics from sklearn:

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
```

Calculating MSE, MAE, and R² for the regression model:

```python
mse = mean_squared_error(y_test, lr_y_pred)
mae = mean_absolute_error(y_test, lr_y_pred)
r2 = r2_score(y_test, lr_y_pred)

print("\nModel Performance Metrics:")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"R-Squared (R2): {r2:.2f}")
```
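For reference, the standard definitions of these three metrics, where $y_i$ are the true values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true values:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$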
## Confusion Matrix

Note: a confusion matrix applies to classification models, not to this regression model; the snippet below is kept as a general-purpose reference.

```python
from sklearn.metrics import confusion_matrix

class_labels = labels_test
plt.figure(figsize=(8, 8))
y_pred_labels = [np.argmax(label) for label in predicted_classes]
cm = confusion_matrix(y_test, y_pred_labels)

# Show the confusion matrix
sns.heatmap(cm, annot=True, fmt='d', xticklabels=class_labels, yticklabels=class_labels)
```

## Classification Report

```python
from sklearn.metrics import classification_report

cr = classification_report(y_test, y_pred_labels, target_names=labels_test)
print(cr)
```

## Plotting the Regression Line

```python
plt.figure(figsize=(8, 6))
plt.scatter(y_test, lr_y_pred, color='blue', alpha=0.5, label='Predictions')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--')
plt.xlabel('Target Values')
plt.ylabel('Predicted Values')
plt.title('Linear Regression Predicted vs Target Values')
plt.legend()
plt.grid(True)
plt.show()
```

## Score: R-Squared Value

```python
print('Training score', lr_model.score(X_train, y_train))
print('Testing score', lr_model.score(X_test, y_test))
```

## Important: How to Save a Model

The pickle library is used to save a model so it can be reused in real time:

```python
import pickle
```

Save the model. `'wb'` means "write binary" and is used for the file handle `open('model_pickle', 'wb')`, which writes the pickled data into a file:

```python
with open('model_pickle', 'wb') as file:
    pickle.dump(lr_model, file)
```

Load the model under a name:

```python
with open('model_pickle', 'rb') as file:
    LR = pickle.load(file)
```

Calculate the coefficients of the regression line for this data:

```python
LR.coef_
# array([0.29361062, 0.04427349, 0.59793397, 0.22250551, 0.21850242,
#        0.14958561, 0.25952496, 0.33174227, 0.36388859, 0.16271987,
#        0.27261478, 0.10597148])
```

Calculate the intercept of the regression line for this data:

```python
LR.intercept_
# -1.9988122975861022
```

## Equation of the Multi-Regression Model for House Price Prediction

$$Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_{12} X_{12}$$

With the fitted intercept and coefficients above:

$$\begin{aligned}
Y = -1.9988 &+ 0.29361062\,X_1 + 0.04427349\,X_2 + 0.59793397\,X_3 + 0.22250551\,X_4 \\
&+ 0.21850242\,X_5 + 0.14958561\,X_6 + 0.25952496\,X_7 + 0.33174227\,X_8 \\
&+ 0.36388859\,X_9 + 0.16271987\,X_{10} + 0.27261478\,X_{11} + 0.10597148\,X_{12}
\end{aligned}$$

(A short sketch verifying this equation against `LR.predict` appears at the end of this sheet.)

```python
df['bedrooms'].unique()
# array([4, 3, 5, 2, 6, 1])
```

## Predicting House Price for Some User Input Values

```python
LR.predict([[3.000677, 3, 2, 1, 0, 2, 0, 1, 1, 2, 1, 2]])
# array([2.08862621])
```

Another way, by creating a new variable:

```python
new_data = [2.347980, 3, 2, 3, 1, 0, 1, 1, 1, 2, 1, 3]
LR.predict([new_data])
```

## Conclusion

Discuss the results. Write 4-5 lines about the model used, the training time and the testing time. Write down the advantages, the problem solved, and closing statements.

Hence, we have learned to build our first machine learning based model for house price prediction based on user input.
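As a closing check, here is the verification sketch referenced in the equation section above: a minimal sketch, assuming the `LR` model loaded earlier, showing that a linear model's prediction is just the intercept plus the dot product of the coefficients and the features.

```python
import numpy as np

# One sample input with the same 12 features used in training
x = np.array([3.000677, 3, 2, 1, 0, 2, 0, 1, 1, 2, 1, 2])

# Manual evaluation of Y = a + b1*X1 + ... + b12*X12
manual = LR.intercept_ + np.dot(LR.coef_, x)

# Should match the model's own prediction
print(manual, LR.predict([x])[0])
```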
