IT Report
Report
of
Industrial Training
On
“Data Science with ML and AI”
Bachelor of Technology
in
Computer Science & Engineering
Submitted By: Aaditya Vyas (20EJCCS002)
Guide: Dr. Vijeta Kumawat, Associate Professor
CERTIFICATE
This is to certify that the industrial training entitled “Data Science with ML and AI” is
the bona fide work carried out by a student of B.Tech. in Computer Science & Engineering
at Jaipur Engineering College and Research Centre during the year 2023-24, in partial
fulfillment of the requirements for the award of the Degree of Bachelor of Technology in
Computer Science & Engineering under my guidance.
Place: Jaipur
Jaipur Engineering College and Research Centre, Shri Ram ki Nangal, via Sitapura
RIICO, Jaipur - 302 022
Academic Year: 2022-2023
Training Certificate
To become a renowned centre of excellence in computer science and engineering and to produce
competent engineers and professionals with high ethical values, prepared for lifelong learning.
1. To impart outcome-based education for emerging technologies in the field of computer science
and engineering.
2. To provide opportunities for interaction between academia and industry.
3. To provide a platform for lifelong learning by embracing changes in technology.
4. To develop an aptitude for fulfilling social responsibilities.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
1. To produce graduates who are able to apply computer engineering knowledge to provide
turn-key IT solutions to national and international organizations.
2. To produce graduates with the necessary background and technical skills to work
professionally in one or more areas such as IT solution design, development and
implementation, consisting of system design, network design, software design and
development, and system implementation and management. Graduates would be able to
provide solutions through logical and analytical thinking.
3. To enable graduates to design embedded systems for industrial applications.
4. To inculcate in graduates effective communication and teamwork skills to enable
them to work in multidisciplinary environments.
5. To prepare graduates for personal and professional success with commitment to their ethical
and social responsibilities.
PSO2: Ability to design and develop mobile and web-based applications under realistic
constraints.
ACKNOWLEDGEMENT
It has been a great honour and privilege to undergo training at UPflairs Pvt. Ltd., Jaipur. I am very
grateful to Mr. Peeyush for giving his valuable time and constructive guidance in preparing the
report for the training. It would not have been possible to complete this report in such a short
period of time without his kind encouragement and valuable guidance.
I wish to express my deep sense of gratitude to my Industrial Training Guide, Dr. Vijeta Kumawat,
Deputy HOD & Associate Professor, Department of CSE, Jaipur Engineering College and Research
Centre, Jaipur, for guiding me from the inception till the completion of the industrial training. I
sincerely acknowledge the valuable guidance, support for the literature survey, critical
reviews and comments given for my industrial training. I would also like to thank
Mr. Arpit Agrawal, Director, JECRC Foundation, for providing such great infrastructure and
an environment for our overall development. I express sincere thanks to Dr. V. K. Chandna, Principal,
JECRC College, for his kind cooperation and extended support towards the completion of my
industrial training. Words are inadequate in offering my thanks to Dr. Sanjay Gaur, HOD, CSE
Department, for consistent encouragement and support in shaping my industrial training into a
presentable form. My warm thanks also go to Jaipur Engineering College and Research Centre, which
provided me this opportunity to carry out this prestigious industrial training and enhance my
learning in various technical fields.
Aaditya Vyas
20EJCCS002
ABSTRACT
Data science has become one of the most in-demand careers of the 21st century, and many
organizations are looking for candidates with knowledge of data science. Data science is
the in-depth study of large amounts of data: it involves extracting meaningful insights from
raw, structured and unstructured data, which is processed using the scientific method,
different technologies and algorithms. Data science uses powerful hardware, programming
systems and efficient algorithms to solve data-related problems, and it is closely linked to
the future of artificial intelligence.
Industrial training is an important phase of a student's life. A well-planned, properly executed and
evaluated industrial training helps a lot in developing a professional attitude.
The aim and motivation of this industrial training is to acquire discipline, skills, teamwork and
technical knowledge through a proper training environment, which will help me, as a student in
the field of Computer Science, to develop an awareness of the multidisciplinary nature of
problems in information and communication technology.
List of Figures:
1. Training Certificate
2. Basics Of Python
3. Advanced Python
4. Python Libraries
5. Machine Learning
6. Screenshots of Outputs
7. Screenshots of Outputs
8. Screenshots of Outputs
9. Screenshots of Outputs
TABLE OF CONTENTS
Title page
Certificate
Acknowledgement
Abstract
List of Figures
1. Introduction
4. Modules in Python
4.1 NumPy
4.2 Pandas
4.3 Matplotlib
4.4 Scikit-learn
5. Machine Learning
5.1 Scatter Plot
5.2 Linear Regression
5.3 Logistic Regression
5.4 Workflow of Machine Learning
6. Project
7. Conclusion
8. References
CHAPTER 1
INTRODUCTION
The company provides Information Technology training for students of B.Tech., M.Tech., BCA,
MCA and similar programmes. It specializes in software solutions and consultancy, and also
provides corporate training and software development assistance.
1.6 Conclusion
This course is offered by Techienest Pvt. Ltd., which offers courses in various
specializations.
CHAPTER 2
What is Python?
Python is a popular programming language. It was created by Guido van Rossum and released in 1991.
It is used for:
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional way.
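The last point can be illustrated with a minimal sketch that computes the same value (the sum of squares of a list) in the three styles. The names sum_of_squares_procedural, SquareSummer and sum_of_squares_functional are hypothetical and chosen only for this illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural style: an explicit loop that updates a running total
def sum_of_squares_procedural(values):
    total = 0
    for v in values:
        total += v * v
    return total

# Object-oriented style: the data and the operation live together in a class
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values)

# Functional style: no mutable state, built from map and reduce
def sum_of_squares_functional(values):
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)

print(sum_of_squares_procedural(numbers))  # 30
print(SquareSummer(numbers).total())       # 30
print(sum_of_squares_functional(numbers))  # 30
```

All three give the same result; the choice of style is mostly about how the code is organized and read.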
Good to know
The most recent major version of Python is Python 3, which is used in this report. Python 2
reached its end of life on 1 January 2020 and no longer receives updates, including security
updates, although some legacy code still uses it.
Python code can be written in a text editor. It is also possible to write Python in an
Integrated Development Environment (IDE) such as Thonny, PyCharm, NetBeans or Eclipse,
which is particularly useful when managing larger collections of Python files.
Python was designed for readability, and has some similarities to the English language with
influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming languages
which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope, such as the scope of loops,
functions and classes. Other programming languages often use curly brackets for this
purpose.
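A small sketch of these two rules: each statement ends at the end of its line (no semicolons), and the indentation level decides which lines belong to the function, the loop and the if/else blocks. The function name classify_numbers is hypothetical.

```python
# Indentation, not braces, decides which statements belong to each block
def classify_numbers(values):
    evens = []
    odds = []
    for v in values:          # the loop body is the indented block below
        if v % 2 == 0:        # the if-block body is indented one more level
            evens.append(v)
        else:
            odds.append(v)
    # Back at function level: this runs once, after the loop has finished
    return evens, odds

evens, odds = classify_numbers([1, 2, 3, 4, 5])
print(evens)  # [2, 4]
print(odds)   # [1, 3, 5]
```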
Python features
CHAPTER 3
Data Science
What is Data?
Data can come in the form of text, observations, figures, images, numbers, graphs, or
symbols. For example, data might include individual prices, weights, addresses, ages,
names, temperatures, dates, or distances. Data is a raw form of knowledge and, on its own,
doesn't carry any significance or purpose.
Types of Data :
A. Qualitative/Quantitative Data
B. Discrete/Continuous Data
C. Nominal/Ordinal Data
D. Primary/Secondary Data
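The first three pairs can be illustrated with a small pandas sketch on a made-up table (the column names and values are assumptions for illustration only). Whether data is primary or secondary depends on who collected it rather than on the values themselves, so that pair is not shown.

```python
import pandas as pd

# A small, made-up table; values are purely illustrative
df = pd.DataFrame({
    "city": ["Jaipur", "Delhi", "Jaipur"],       # qualitative, nominal (no order)
    "rating": ["low", "high", "medium"],         # qualitative, ordinal (ordered labels)
    "rooms": [2, 3, 1],                          # quantitative, discrete (counts)
    "area_sqft": [850.5, 1200.0, 640.25],        # quantitative, continuous (measurements)
})

# Nominal data: categories without a natural order
df["city"] = pd.Categorical(df["city"])

# Ordinal data: categories with a meaningful order, so comparisons make sense
df["rating"] = pd.Categorical(df["rating"], categories=["low", "medium", "high"], ordered=True)

print(df.dtypes)
print(df[df["rating"] >= "medium"])  # rows rated medium or high
```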
● Data Preparation
● Feature selection
● Exploratory Data Analysis
● Model development
● Test the Model/Hypothesis Testing
● Communicate the findings to the Business Leaders
● Deployment (Data as a product)
Data Visualization
Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to
see and understand trends, outliers, and patterns in data. Additionally, it provides an
excellent way for employees or business owners to present data to non-technical audiences
without confusion.
Examples:
Area Map: A form of geospatial visualization, area maps are used to show specific
values set over a map of a country, state, county, or any other geographic location.
Two common types of area maps are choropleths and isopleths.
Bar Chart: Bar charts represent numerical values compared to each other. The length
of each bar represents the value of each variable.
Box-and-whisker Plots: These show the distribution of a set of values. The box spans the
middle 50% of the data (the interquartile range), a line inside it marks the median, and the
whiskers show the spread of the remaining values.
Gantt Chart: Typically used in project management, Gantt charts are a bar chart
depiction of timelines and tasks.
Heat Map: A type of geospatial visualization in map form which displays specific
data values as different colors (these do not need to be temperatures, but that is a
common use).
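As a rough illustration, the sketch below draws two of the chart types listed above, a bar chart and a box-and-whisker plot, with Matplotlib on made-up data. The non-interactive Agg backend is an assumption so the script runs without a display; in a notebook, plt.show() can be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration only
categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 36]
rng = np.random.default_rng(0)
samples = [rng.normal(loc=m, scale=5, size=100) for m in (50, 60, 55)]

fig, (ax_bar, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: the height of each bar encodes that category's value
ax_bar.bar(categories, values, color="steelblue")
ax_bar.set_title("Bar chart")
ax_bar.set_ylabel("Value")

# Box-and-whisker plot: the box spans the interquartile range, the inner line is the median,
# and by default the whiskers reach the furthest points within 1.5 x IQR of the box
ax_box.boxplot(samples)
ax_box.set_xticks([1, 2, 3])
ax_box.set_xticklabels(["Group 1", "Group 2", "Group 3"])
ax_box.set_title("Box-and-whisker plot")

fig.tight_layout()
# fig.savefig("charts.png") would write the figure to a file
```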
CHAPTER 4
Modules in Python
Numpy
NumPy is a Python library used for working with arrays. It also has functions for working in
the domains of linear algebra, Fourier transforms and matrices.
NumPy stands for Numerical Python. In Python we have lists that serve the purpose of arrays, but
they are slow to process.
NumPy aims to provide an array object that can be up to 50x faster than traditional Python lists. The
array object in NumPy is called ndarray.
It provides many supporting functions that make working with ndarray very easy. Arrays are very
frequently used in data science, where speed and resources are very important.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
Uses of Numpy
Arithmetic Operations
Searching, Sorting and Counting
Bitwise Operators
Linear Algebra
Matrix Operations
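The following sketch, assuming only NumPy, touches each use listed above on small made-up arrays; the values in the comments were worked out by hand for these inputs.

```python
import numpy as np

arr = np.array([5, 3, 8, 1, 9, 2])

# Arithmetic operations work element-wise, without explicit Python loops
doubled = arr * 2                       # [10, 6, 16, 2, 18, 4]

# Searching, sorting and counting
sorted_arr = np.sort(arr)               # [1, 2, 3, 5, 8, 9]
big_positions = np.where(arr > 4)[0]    # indices of elements greater than 4: [0, 2, 4]
count_big = np.count_nonzero(arr > 4)   # 3

# Bitwise operators on integer arrays: 6 & 3 = 2, 5 & 3 = 1
bits = np.bitwise_and(np.array([6, 5]), np.array([3, 3]))

# Linear algebra: solve the system 2x + y = 3, x + 3y = 5
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)               # [0.8, 1.4]

# Matrix operations: matrix product of A with its transpose
product = A @ A.T                       # [[5, 5], [5, 10]]
```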
Pandas
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". Pandas was
created by Wes McKinney in 2008.
Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets and make them readable and relevant.
Pandas is also able to delete rows that are not relevant or that contain wrong values, such as empty or
NULL values. This is called cleaning the data.
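A minimal sketch of data cleaning with pandas on a small made-up data set (the column names and values are hypothetical): rows with empty values are either dropped or filled, and duplicate rows are removed.

```python
import numpy as np
import pandas as pd

# Made-up data with empty values (None / NaN) and one duplicated row
df = pd.DataFrame({
    "Duration": [60, 60, 45, None, 45],
    "Pulse": [110, 117, 109, 98, 109],
    "Calories": [409.1, np.nan, 390.0, 282.4, 390.0],
})

# Option 1: delete every row that contains an empty value
cleaned = df.dropna()

# Option 2: keep the rows but replace empty values, here with each column's mean
filled = df.fillna({"Calories": df["Calories"].mean(), "Duration": df["Duration"].mean()})

# Remove exact duplicate rows (rows 2 and 4 are identical)
deduplicated = cleaned.drop_duplicates()
print(deduplicated)
```

Dropping rows is simplest but loses data; filling keeps every row at the cost of inventing plausible values, so the right choice depends on the data set.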
Features of Pandas
Python Matplotlib
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript
for platform compatibility.
If Python and pip are already installed on a system, installing Matplotlib is very
easy (pip install matplotlib).
Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported
under the plt alias. The following example draws a line from (0, 0) to (6, 250):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Python Scikit-learn
Scikit-learn (Sklearn) is one of the most useful and robust libraries for machine learning in Python. It
provides a selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction, via a consistent interface in
Python.
Features
Rather than focusing on loading, manipulating and summarising data, the Scikit-learn library is focused
on modeling the data. Some of the most popular groups of models provided by Sklearn are as
follows:
Supervised Learning algorithms: Almost all the popular supervised learning algorithms, like
Linear Regression, Support Vector Machine (SVM) and Decision Tree, are part of scikit-learn.
Unsupervised Learning algorithms: It also has the popular unsupervised learning algorithms,
from clustering, factor analysis and PCA (Principal Component Analysis) to
unsupervised neural networks.
Clustering: This model is used for grouping unlabeled data.
Cross Validation: It is used to check the accuracy of supervised models on unseen data.
Dimensionality Reduction: It is used for reducing the number of attributes in data, which can be
further used for summarisation, visualisation and feature selection.
Ensemble methods: As the name suggests, it is used for combining the predictions of multiple
supervised models.
Feature extraction: It is used to extract features from data to define the attributes in image
and text data.
Feature selection: It is used to identify useful attributes to create supervised models.
Open Source: It is an open-source library and is also commercially usable under the BSD license.
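To show the consistent fit/predict interface and cross validation together, here is a sketch on synthetic data (the data-generating rule y = 3*x1 - 2*x2 plus noise is an assumption made only for this example). KFold with shuffling is one of several splitting strategies scikit-learn offers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data: y = 3*x1 - 2*x2 + small noise (illustrative only)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=200)

# Every scikit-learn estimator exposes the same fit/predict interface
model = LinearRegression()

# Cross validation: train on 4 folds, score (R^2) on the held-out fold, 5 times
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.round(3), scores.mean().round(3))

# After validation, fit on all the data and inspect the learned coefficients
model.fit(X, y)
print(model.coef_)  # should be close to [3, -2]
```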
CHAPTER 5
Machine Learning
Machine learning is the process of making a computer learn from data and statistics.
A machine learning program analyses data and learns to predict outcomes.
Scatter Plot
A scatter plot is a diagram where each value in the data set is represented by a dot.
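A minimal scatter plot sketch with Matplotlib; the car age and speed values are illustrative data, not taken from this report's project. The Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in a notebook instead
import matplotlib.pyplot as plt

# Illustrative data: age of each car (years) and its speed
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Each (x[i], y[i]) pair is drawn as one dot
fig, ax = plt.subplots()
points = ax.scatter(x, y)
ax.set_xlabel("Car age (years)")
ax.set_ylabel("Speed")
```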
Regression
The term regression is used when you try to find the relationship between variables.
In Machine Learning, and in statistical modeling, that relationship is used to predict the outcome of
future events.
Linear Regression
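Linear regression uses the relationship between the data points to draw the straight line that
best fits them, usually the line that minimizes the sum of squared differences between the
observed and predicted values (ordinary least squares).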
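For a single input variable, the best-fitting line can be computed directly with the ordinary least-squares formulas, as in this sketch on illustrative car-age/speed data (the data and the predict helper are assumptions for the example). np.polyfit with degree 1 is used only as a cross-check.

```python
import numpy as np

# Illustrative data: car age (years) and speed
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6], dtype=float)
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86], dtype=float)

# Ordinary least squares for one feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
# intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

def predict(age):
    """Predict speed from car age using the fitted line."""
    return slope * age + intercept

# np.polyfit with degree 1 fits the same least-squares line
check_slope, check_intercept = np.polyfit(x, y, 1)
print(round(slope, 3), round(intercept, 3), round(predict(10), 3))
```

A negative slope here means that, in this toy data, older cars tend to be slower.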
Logistic Regression
Logistic regression aims to solve classification problems. It does this by predicting categorical
outcomes, unlike linear regression, which predicts a continuous outcome.
In the simplest case there are two outcomes, which is called binomial; an example is
predicting whether a tumor is malignant or benign. Other cases have more than two outcomes to classify;
in this case it is called multinomial. A common example of multinomial logistic regression is
predicting the class of an iris flower among 3 different species.
Logistic regression becomes a classification technique only when a decision threshold is
applied to its predicted probabilities. Setting the threshold value is a very important aspect of logistic
regression and depends on the classification problem itself.
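The role of the threshold can be sketched with scikit-learn's LogisticRegression on a tiny made-up tumor-size data set (the values and the 0.2 threshold are assumptions for illustration). predict() effectively applies a threshold of 0.5 to the positive-class probability; lowering it labels more cases as malignant, trading fewer missed tumors for more false alarms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary data: tumor size (cm) -> 1 = malignant, 0 = benign
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# The model outputs a probability for the positive class (column 1 = class 1)
new_sizes = np.array([[1.0], [3.5], [5.5]])
proba = model.predict_proba(new_sizes)[:, 1]

# A decision threshold turns probabilities into class labels
labels_default = (proba >= 0.5).astype(int)   # same rule as model.predict()
labels_cautious = (proba >= 0.2).astype(int)  # lower threshold: more cases flagged
print(proba.round(3), labels_default, labels_cautious)
```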
1. Gathering data
2. Data pre-processing
3. Researching the model that will be best for the type of data
Project-
Real Estate: Bangalore House Price Prediction
The aim is to predict accurate house prices for real estate customers according to their
budgets and priorities. Future prices are predicted by analyzing previous market trends, price ranges
and upcoming developments. The system involves a website that accepts customers'
specifications and then applies the Naive Bayes algorithm of data mining. This application will
help customers invest in real estate without approaching an agent, and it also decreases the risk
involved in the transaction.
Program code-
import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
# Jupyter Notebook magic command: display plots inside the notebook
%matplotlib inline
matplotlib.rcParams["figure.figsize"] = (20,10)
df1 = pd.read_csv("bengaluru_house_prices.csv")
df1.dtypes
df1.shape
df1.columns
df1['area_type'].unique()
df1['area_type'].value_counts()
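# Drop columns that are not used for price prediction
df2 = df1.drop(['area_type','society','balcony','availability'],axis='columns')
df2.shape
df2.head()
df2.isnull().sum()
df2.shape
# Drop rows with missing values
df3 = df2.dropna().copy()
df3.isnull().sum()
df3.shape
df3.head()
# Extract the number of bedrooms (BHK) from the 'size' text, e.g. "2 BHK" -> 2
df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))
df3.bhk.unique()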
def is_float(x):
    # Return True if x can be parsed as a number, False otherwise
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True
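# Show total_sqft values that are not plain numbers (e.g. ranges such as "2100 - 2850")
df3[~df3['total_sqft'].apply(is_float)].head(10)
def convert_sqft_to_num(x):
    # Convert a range such as "2100 - 2850" to its average; return None for other formats
    tokens = x.split('-')
    if len(tokens) == 2:
        return (float(tokens[0])+float(tokens[1]))/2
    try:
        return float(x)
    except (TypeError, ValueError):
        return None
df4 = df3.copy()
df4.total_sqft = df4.total_sqft.apply(convert_sqft_to_num)
df4 = df4[df4.total_sqft.notnull()]
df4.head(2)
df4.loc[30]
# Sanity check: the average of the range 2100 - 2850
(2100+2850)/2
df5 = df4.copy()
# price is in lakh rupees (1 lakh = 100000 rupees), so convert it to rupees per square foot
df5['price_per_sqft'] = df5['price']*100000/df5['total_sqft']
df5.head()
df5_stats = df5['price_per_sqft'].describe()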
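df5_stats
df5.to_csv("bhp.csv",index=False)
# Clean location names and count how many listings each location has
df5.location = df5.location.apply(lambda x: x.strip())
location_stats = df5['location'].value_counts(ascending=False)
location_stats
location_stats.values.sum()
len(location_stats[location_stats>10])
len(location_stats)
len(location_stats[location_stats<=10])
# Locations with 10 or fewer listings are grouped into a single 'other' category
location_stats_less_than_10 = location_stats[location_stats<=10]
location_stats_less_than_10
len(df5.location.unique())
df5.location = df5.location.apply(lambda x: 'other' if x in location_stats_less_than_10 else x)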
len(df5.location.unique())
df5.head(10)
# Listings with less than 300 sqft per bedroom are treated as unusual and removed
df5[df5.total_sqft/df5.bhk<300].head()
df5.shape
df6 = df5[~(df5.total_sqft/df5.bhk<300)]
df6.shape
df6.price_per_sqft.describe()
# Here we find that the minimum price per sqft is 267 rs/sqft whereas the maximum is 12000000.
# This shows a wide variation in property prices, so we remove outliers per location
# using the mean and one standard deviation.
def remove_pps_outliers(df):
    # For each location, keep only rows within one standard deviation of its mean price per sqft
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        reduced_df = subdf[(subdf.price_per_sqft>(m-st)) & (subdf.price_per_sqft<=(m+st))]
        df_out = pd.concat([df_out,reduced_df],ignore_index=True)
    return df_out
df7 = remove_pps_outliers(df6)
df7.shape
def plot_scatter_chart(df,location):
    # Plot the 2 BHK and 3 BHK listings of one location: total area vs price
    bhk2 = df[(df.location==location) & (df.bhk==2)]
    bhk3 = df[(df.location==location) & (df.bhk==3)]
    matplotlib.rcParams['figure.figsize'] = (15,10)
    plt.scatter(bhk2.total_sqft,bhk2.price,color='blue',label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft,bhk3.price,marker='+', color='green',label='3 BHK', s=50)
    plt.xlabel("Total Square Feet Area")
    plt.ylabel("Price (Lakh Indian Rupees)")
    plt.title(location)
    plt.legend()
    plt.show()
plot_scatter_chart(df7,"Rajaji Nagar")
plot_scatter_chart(df7,"Hebbal")
def remove_bhk_outliers(df):
    # For each location, remove n-BHK listings whose price per sqft is lower than the mean
    # price per sqft of the (n-1)-BHK listings there (only if that group has more than 5 rows)
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        bhk_stats = {}
        for bhk, bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk] = {
                'mean': np.mean(bhk_df.price_per_sqft),
                'std': np.std(bhk_df.price_per_sqft),
                'count': bhk_df.shape[0]
            }
        for bhk, bhk_df in location_df.groupby('bhk'):
            stats = bhk_stats.get(bhk-1)
            if stats and stats['count']>5:
                exclude_indices = np.append(exclude_indices,
                                            bhk_df[bhk_df.price_per_sqft<(stats['mean'])].index.values)
    return df.drop(exclude_indices.astype(int),axis='index')
df8 = remove_bhk_outliers(df7)
# df8 = df7.copy()
df8.shape
plot_scatter_chart(df8,"Rajaji Nagar")
plot_scatter_chart(df8,"Hebbal")
import matplotlib
matplotlib.rcParams["figure.figsize"] = (20,10)
plt.hist(df8.price_per_sqft,rwidth=0.8)
plt.xlabel("Price Per Square Feet")
plt.ylabel("Count")
df8.bath.unique()
plt.hist(df8.bath,rwidth=0.8)
plt.xlabel("Number of bathrooms")
plt.ylabel("Count")
df8[df8.bath>10]
df8[df8.bath>df8.bhk+2]
df9 = df8[df8.bath<df8.bhk+2]
df9.shape
df9.head(2)
df10 = df9.drop(['size','price_per_sqft'],axis='columns')
df10.head(3)
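# One-hot encode the location column
dummies = pd.get_dummies(df10.location)
dummies.head(3)
# Drop the 'other' dummy column (all zeros then means 'other') and the original location column
df11 = pd.concat([df10,dummies.drop('other',axis='columns')],axis='columns')
df11.head()
df12 = df11.drop('location',axis='columns')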
df12.head(2)
df12.shape
X = df12.drop(['price'],axis='columns')
X.head(3)
X.shape
y = df12.price
y.head(3)
len(y)
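# Split the data, then train and evaluate a linear regression model
# (the test_size and random_state values of the split are assumed)
from sklearn.model_selection import train_test_split, ShuffleSplit, cross_val_score, GridSearchCV
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.tree import DecisionTreeRegressor
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
lr_clf = LinearRegression()
lr_clf.fit(X_train, y_train)
lr_clf.score(X_test, y_test)
# Cross-validation over 5 random 80/20 splits
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
cross_val_score(LinearRegression(), X, y, cv=cv)
def find_best_model_using_gridsearchcv(X,y):
    # Compare several regressors, each with a small hyperparameter grid
    algos = {
        'linear_regression' : {
            'model': LinearRegression(),
            'params': {
                # 'normalize' was removed from LinearRegression in scikit-learn 1.2
                'fit_intercept': [True, False]
            }
        },
        'lasso': {
            'model': Lasso(),
            'params': {
                'alpha': [1,2],
                'selection': ['random', 'cyclic']
            }
        },
        'decision_tree': {
            'model': DecisionTreeRegressor(),
            'params': {
                # 'mse' was renamed 'squared_error' in scikit-learn 1.0 and removed in 1.2
                'criterion' : ['squared_error','friedman_mse'],
                'splitter': ['best','random']
            }
        }
    }
    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    for algo_name, config in algos.items():
        gs = GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)
        gs.fit(X,y)
        scores.append({
            'model': algo_name,
            'best_score': gs.best_score_,
            'best_params': gs.best_params_
        })
    return pd.DataFrame(scores,columns=['model','best_score','best_params'])
find_best_model_using_gridsearchcv(X,y)
def predict_price(location,sqft,bath,bhk):
    # Feature vector: total_sqft, bath, bhk, followed by the one-hot location columns
    x = np.zeros(len(X.columns))
    x[0] = sqft
    x[1] = bath
    x[2] = bhk
    # Set the location's column if it exists; unknown locations keep all zeros ('other')
    if location in X.columns:
        loc_index = np.where(X.columns==location)[0][0]
        x[loc_index] = 1
    return lr_clf.predict([x])[0]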
predict_price('Indira Nagar',1000, 2, 2)
predict_price('Indira Nagar',1000, 3, 3)
import pickle
with open('banglore_home_prices_model.pickle','wb') as f:
pickle.dump(lr_clf,f)
import json
columns = {
'data_columns' : [col.lower() for col in X.columns]
}
with open("columns.json","w") as f:
f.write(json.dumps(columns))
Output-
Conclusion
The main aim of this project is to predict the price of real estate properties using
various Machine Learning (ML) models. Machine learning projects are valuable for aspiring
developers: they help developers build real-world projects, hone their skills
and turn their theoretical knowledge into practical experience. Machine learning has
significant value both in commercial applications and in teaching.
Industrial training is significantly beneficial to all concerned parties in contributing towards the
development of the nation. As students, we can acquire industrial experience and at the
same time familiarize ourselves with the real working environment at
industrial training sites.
Future Scope
Data is regularly collected by businesses and companies through transactions and website
interactions. Many companies face a common challenge: analyzing and categorizing the data
that is collected and stored. Data scientists help companies handle such situations, and
companies can progress a lot with proper and efficient handling of data, which improves productivity.
The future is all about automating processes and utilizing large amounts of data to make
intelligent decisions. This brings to the forefront technologies such as artificial intelligence
(AI), machine learning and deep learning, and the Internet of Things (IoT).
REFERENCES-
[1] https://www.geeksforgeeks.org/
[2] https://github.com/
[3] https://www.javatpoint.com/
[4] https://www.kaggle.com/datasets/amitabhajoy/bengaluru-house-price-data
Thank You