
A

Report

of

Industrial Training

On

DATA SCIENCE WITH ML AND AI

Submitted in partial fulfillment for the award of degree of

Bachelor of Technology

in

Computer Science & Engineering

Submitted By: Aaditya Vyas (20EJCCS002)
Guide: Dr. Vijeta Kumawat, Associate Professor

Department of Computer Science & Engineering


Jaipur Engineering College & Research Centre
Jaipur, Rajasthan
2022-23
Jaipur Engineering College and Research Centre, Shri Ram ki Nangal, via Sitapura RIICO, Jaipur - 302 022
Academic Year: 2022-2023

CERTIFICATE

This is to certify that the industrial training entitled “Data Science with ML and AI” is
the bonafide work carried out by student of B.Tech. in Computer Science & Engineering
at Jaipur Engineering College and Research Centre, during the year 2023-24 in partial
fulfillment of the requirements for the award of the Degree of Bachelor of Technology in
Computer Science & Engineering under my guidance.

Name of Guide : Dr. Vijeta Kumawat

Designation : Associate Professor

Place: Jaipur

Date: 3 December 2023


Training Certificate


VISION OF CSE DEPARTMENT

To become renowned Centre of excellence in computer science and engineering and make
competent engineers & professionals with high ethical values prepared for lifelong learning.

MISSION OF CSE DEPARTMENT

1. To impart outcome based education for emerging technologies in the field of computer science
and engineering.
2. To provide opportunities for interaction between academia and industry.
3. To provide platform for lifelong learning by accepting the change in technologies
4. To develop aptitude of fulfilling social responsibilities.


PROGRAM OUTCOMES (POs)


1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural
sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.


11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

The PEOs of the B.Tech (CSE) program are:

1. To produce graduates who are able to apply computer engineering knowledge to provide
turn-key IT solutions to national and international organizations.

2. To produce graduates with the necessary background and technical skills to work
professionally in one or more of the areas like – IT solution design development and
implementation consisting of system design, network design, software design and
development, system implementation and management etc. Graduates would be able to
provide solutions through logical and analytical thinking.
3. To enable graduates to design embedded systems for industrial applications.
4. To inculcate in graduates effective communication skills and team work skills to enable
them to work in multidisciplinary environment.
5. To prepare graduates for personal and professional success with commitment to their ethical
and social responsibilities.

PROGRAM SPECIFIC OUTCOMES (PSOs)

● PSO1: Ability to interpret and analyze network specific and cyber security issues in real world
environment.
● PSO2: Ability to design and develop mobile and web-based applications under realistic
constraints.

COURSE OUTCOMES (COs)


On completion of the project, graduates will be able to:
● CO1: Generate a report based on the projects carried out, demonstrating the ability to
apply engineering knowledge during training.
● CO2: Demonstrate competency in relevant engineering fields through problem
identification, formulation and solution.

MAPPING: CO’s & PO’s


Subject Code                 COs   PO-1 PO-2 PO-3 PO-4 PO-5 PO-6 PO-7 PO-8 PO-9 PO-10 PO-11 PO-12
3CS7-30 Industrial Training  CO-1  3    3    2    2    2    1    1    2    2    3     3     3
                             CO-2  3    3    3    3    3    1    1    2    2    3     3     3


ACKNOWLEDGEMENT

It has been a great honour and privilege to undergo training at UPflairs Pvt. Ltd., Jaipur. I am very
grateful to Mr. Peeyush for giving his valuable time and constructive guidance in preparing this
training report. It would not have been possible to complete this report in such a short period of
time without his kind encouragement and valuable guidance.

I wish to express my deep sense of gratitude to my Industrial Training Guide, Dr. Vijeta Kumawat,
Deputy HOD & Associate Professor, Department of CSE, Jaipur Engineering College and Research
Centre, Jaipur, for guiding me from the inception to the completion of the industrial training. I
sincerely acknowledge the valuable guidance, support for the literature survey, and critical
reviews and comments provided for my industrial training. I would also like to thank
Mr. Arpit Agrawal, Director, JECRC Foundation, for providing such great infrastructure and
environment for our overall development. I express sincere thanks to Dr. V. K. Chandna, Principal,
JECRC College, for his kind cooperation and extended support towards the completion of the
industrial training. Words are inadequate to thank Dr. Sanjay Gaur, HOD, CSE
department, for consistent encouragement and support in shaping the industrial training into a
presentable form. My warm thanks also go to Jaipur Engineering College and Research Centre for
providing the opportunity to carry out this prestigious industrial training and enhance my
learning in various technical fields.

Aaditya vyas

20EJCCS002

ABSTRACT

Data Science has become one of the most in-demand fields of the 21st century, and many
organizations are looking for candidates with knowledge of data science. Data science is
the in-depth study of large amounts of data: it involves extracting meaningful
insights from raw, structured, and unstructured data using the
scientific method, different technologies, and algorithms.

Industrial training is an important phase of a student's life. A well-planned, properly executed and
evaluated industrial training helps a lot in developing a professional attitude.

It develops an awareness of the industrial approach to problem solving, based on a broad
understanding of the processes and mode of operation of an organization.

The aim and motivation of this industrial training is to gain discipline, skills, teamwork and
technical knowledge through a proper training environment, which will help me, as a student in
the field of Computer Science, to develop an awareness of the self-disciplinary nature of
problems in information and communication technology. Data science uses powerful
hardware, programming systems, and efficient algorithms to solve
data-related problems. It is the future of artificial intelligence.

List of Figures:

Figure No.   Figure Description
1.           Training Certificate
2.           Basics Of Python
3.           Advanced Python
4.           Python Libraries
5.           Machine Learning
6.           Screenshots of Outputs
7.           Screenshots of Outputs
8.           Screenshots of Outputs
9.           Screenshots of Outputs

TABLE OF CONTENTS

Title page
Certificate
Vision and Mission
Program Outcomes (POs)
Program Educational Objectives (PEOs)
Program Specific Outcomes (PSOs)
Course Outcomes (COs)
Mapping: COs and POs
Acknowledgement
Abstract
List of Figures

1. Introduction
1.1 Introduction about Techienest Pvt. Limited
1.2 Introduction to Training Platform
1.3 Training Starting Date
1.4 Training Ending Date
1.5 Date of Certification
1.6 Conclusion

2. Introduction About Internship
2.1 Introduction About Python
2.2 Why Python
2.3 Python features

3. Data Science
3.1 What is Data?
3.2 What is Data Science
3.3 Data Science Methodology
3.4 Benefits of Data Science
3.5 Data Visualization

4. Modules in Python
4.1 Numpy
4.2 Pandas
4.3 Matplotlib
4.4 Scikit-learn

5. Machine Learning
5.1 Scatter plot
5.2 Linear Regression
5.3 Logistic Regression
5.4 Workflow of Machine learning

6. Project
6.1 Real estate Bangalore house prediction
6.2 Outputs using screenshots

7. Conclusion

8. Reference

CHAPTER 1


INTRODUCTION

1.1 Introduction about Techienest


Learn and Build is an initiative by TechieNest Pvt. Limited who has trained more than
candidates across the country.

At LnB, we provide courses in various in-demand technologies including Big Data, DevOps,
Cybersecurity, Cloud Computing, IoT, Artificial Intelligence, and many
more. The courses are designed to cover both theoretical and practical
underpinnings. LnB works as a cornerstone for budding technocrats and a stepping
stone for working professionals. Our innovative blended platform integrates the
flexibility of recorded lectures with the interactivity of LIVE classrooms, providing the
learner with a package of resources to gain proficiency in the technology. We believe that
learning is a life-long process and that Industry 4.0 demands continuous learning and
upskilling to accelerate your career; that's why we desire to be your everlasting partner.

The company deals in Information Technology training for students of B.Tech., M.Tech., BCA,
MCA, etc. We specialize in software solutions and consultancy. We also provide corporate
training and software development assistance.

1.2 Introduction to Training Platform


Techienest is a private company located in Jaipur, Rajasthan. Our trainees are working in
leading IT companies all over the world (Amazon, Google, Facebook, etc.) and some of them
have received certificates of excellence where they work. The achievement of Techienest
Pvt. Limited is underlined by the fact that we have trainees coming from various parts of the
world including the USA, Canada, Dubai, etc. Our team has experts in most IT technologies,
such as Java.
1.3 Training Starting Date
I started the course on 27 June 2022.

1.4 Training Ending Date


I successfully completed it on 8 August 2022.

The total duration of the course was 45 days.

1.5 Date of Certification

I received my certificate for the course on 8 August 2022.

1.6 Conclusion
This course is offered by Techienest Pvt. Limited, which offers various types of
specializations and courses.

CHAPTER 2

INTRODUCTION ABOUT INTERNSHIP

2.1 Introduction About Python


What is Python?
Python is a popular programming language. It was created by Guido van Rossum, and released in
1991.

It is used for:

● web development (server-side),
● software development,
● mathematics,
● system scripting.

What can Python do?

● Python can be used on a server to create web applications.
● Python can be used alongside software to create workflows.
● Python can connect to database systems. It can also read and modify files.
● Python can be used to handle big data and perform complex mathematics.
● Python can be used for rapid prototyping, or for production-ready software development.

Why Python?

● Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
● Python has a simple syntax similar to the English language.
● Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
● Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
● Python can be treated in a procedural way, an object-oriented way or a functional way.
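
The last point can be illustrated with a small sketch that computes the same result, a sum of squares, in three styles. The function and class names below are hypothetical and chosen only for this illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]


# Procedural style: an explicit loop that updates a running total.
def sum_of_squares_procedural(values):
    total = 0
    for v in values:
        total += v * v
    return total


# Object-oriented style: the data and the behaviour live together in a class.
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values)


# Functional style: compose map and reduce without mutating any state.
def sum_of_squares_functional(values):
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)


print(sum_of_squares_procedural(numbers))
print(SquareSummer(numbers).total())
print(sum_of_squares_functional(numbers))
```

All three calls print the same value; which style to prefer is largely a matter of readability for the task at hand.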

Good to know

● The most recent major version of Python is Python 3, which is the version used in this
training. Python 2 reached end of life on 1 January 2020 and no longer receives updates,
including security fixes, although some legacy code still runs on it.
● Python can be written in a plain text editor. It can also be written in an
Integrated Development Environment, such as Thonny, PyCharm, NetBeans or Eclipse,
which is particularly useful when managing larger collections of Python files.


Python Syntax compared to other programming languages

● Python was designed for readability, and has some similarities to the English language with
influence from mathematics.
● Python uses new lines to complete a command, as opposed to other programming languages
which often use semicolons or parentheses.
● Python relies on indentation, using whitespace, to define scope; such as the scope of loops,
functions and classes. Other programming languages often use curly-brackets for this
purpose.
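
As a minimal sketch of these syntax rules, the snippet below uses only new lines and indentation to delimit statements and blocks. The function name `classify` is hypothetical and used only for illustration.

```python
def classify(numbers):
    """Split numbers into a list of evens and a list of odds."""
    evens, odds = [], []
    for n in numbers:          # the indented lines below form the loop body
        if n % 2 == 0:         # a deeper indent marks the body of the if
            evens.append(n)
        else:
            odds.append(n)
    return evens, odds         # back at function level: runs after the loop ends


evens, odds = classify([1, 2, 3, 4, 5, 6])
print(evens, odds)
```

Changing the indentation of the `return` line so that it sits inside the loop would change the program's behaviour, which is why consistent indentation matters in Python.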

Python features

● Easy to learn and use
● Portable
● Easy to understand
● Scalable
● Free and open source
● Automatic garbage collection
● Dynamically typed, with type checking at run time


CHAPTER 3

Data Science

What is Data?

Data can come in the form of text, observations, figures, images, numbers, graphs, or
symbols. For example, data might include individual prices, weights, addresses, ages,
names, temperatures, dates, or distances. Data is a raw form of knowledge and, on its own,
doesn't carry any significance or purpose.

Types of Data :

A. Qualitative/Quantitative Data

B. Discrete/Continuous Data

C. Nominal/Ordinal Data

D. Primary/Secondary Data

What is Data Science?


● Data science is the process of finding insights, trends and intelligence that help business
leaders make better decisions.
● Data science is a relatively new field, deeply rooted in Statistics and Decision Support
Systems.
● It is a multidisciplinary field (domain knowledge, tools and technology, mathematics and
statistics, problem-solving skills).

Data Science Methodology


● Statement of the problem/Objective of the Study
● Data Preparation
● Feature selection
● Exploratory Data Analysis
● Model development
● Test the Model/Hypothesis Testing
● Communicate the findings to the Business Leaders
● Deployment (Data as a product)
● Feedback/Lesson Learned and Continuous improvement


Benefits of Data Science
● Better data security
● Predictive strategy
● Business improvement
● Risk reduction
● Better decision making
● Better data management


Data Visualization

Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to
see and understand trends, outliers, and patterns in data. Additionally, it provides an
excellent way for employees or business owners to present data to non-technical audiences
without confusion.

Examples:

Area Map: A form of geospatial visualization, area maps are used to show specific
values set over a map of a country, state, county, or any other geographic location.
Two common types of area maps are choropleths and isopleths.

Bar Chart: Bar charts represent numerical values compared to each other. The length
of the bar represents the value of each variable.

Box-and-whisker Plots: These show a selection of ranges (the box) across a set
measure (the bar).

Bullet Graph: A bar marked against a background to show progress or performance
against a goal, denoted by a line on the graph.

Gantt Chart: Typically used in project management, Gantt charts are a bar chart
depiction of timelines and tasks.

Heat Map: A type of geospatial visualization in map form which displays specific
data values as different colors (this doesn’t need to be temperatures, but that is a
common use).
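
As one illustration of the chart types listed above, the following sketch draws a bar chart with Matplotlib. The locality names and counts are made-up values used only for the example, and the non-interactive `Agg` backend is an assumption so that the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical listing counts per locality, invented for illustration.
localities = ["A", "B", "C", "D"]
listings = [12, 30, 7, 18]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(localities, listings, color="steelblue")  # bar length encodes the value
ax.set_xlabel("Locality")
ax.set_ylabel("Number of listings")
ax.set_title("Listings per locality (made-up data)")

# Render to an in-memory PNG; in a notebook, plt.show() would display it instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```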


CHAPTER 4

Modules in Python

Numpy

NumPy is a Python library used for working with arrays. It also has functions for working in the
domains of linear algebra, Fourier transforms, and matrices.

NumPy was created in 2005 by Travis Oliphant.

It is an open source project and you can use it freely.

NumPy stands for Numerical Python. In Python we have lists that serve the purpose of arrays, but
they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The
array object in NumPy is called ndarray.

It provides a lot of supporting functions that make working with ndarray very easy. Arrays are very
frequently used in data science, where speed and resources are very important.

We can create a NumPy ndarray object by using the array() function.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

Uses of NumPy

● Arithmetic Operations
● Searching, Sorting and Counting
● Bitwise Operators
● Linear Algebra
● Matrix Operations
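
The uses listed above can be sketched on small arrays as follows. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.array([3, 1, 2])
b = np.array([10, 20, 30])

# Arithmetic operations are element-wise and need no explicit loop.
total = a + b
scaled = a * 2

# Searching, sorting and counting.
sorted_a = np.sort(a)
idx_of_max = np.argmax(b)              # position of the largest element of b
n_greater = np.count_nonzero(a > 1)    # how many elements of a exceed 1

# Bitwise operators act on the integer bits of each element.
lowest_bit = np.bitwise_and(a, 1)      # 1 for odd values, 0 for even values

# Linear algebra and matrix operations.
m = np.array([[2.0, 0.0], [0.0, 4.0]])
v = np.array([1.0, 1.0])
product = m @ v                        # matrix-vector product
solution = np.linalg.solve(m, v)       # solves m @ x = v for x

print(total, sorted_a, product, solution)
```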


Pandas

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was
created by Wes McKinney in 2008.

Pandas allows us to analyze large data sets and draw conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or
NULL values. This is called cleaning the data.

Pandas provides two types of data structures:

Series: A Pandas Series is like a column in a table.

DataFrame: A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a
table with rows and columns.

Features of Pandas

● Supports multiple file formats
● Great handling of data
● Unique data
● Cleaning up data
● Merging and joining of datasets
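
A minimal sketch of the two data structures and of the cleaning and merging features follows. All values, and the `zone` column in particular, are made up for illustration and do not come from the project dataset.

```python
import pandas as pd

# A Series behaves like a single labelled column.
prices = pd.Series([39.07, 120.0, 62.0], name="price")

# A DataFrame is a table of columns; None marks a missing value.
df = pd.DataFrame({
    "location": ["Hebbal", "Hebbal", None, "Rajaji Nagar"],
    "bhk": [2, 2, 3, 3],
    "price": [39.07, 39.07, 62.0, 120.0],
})

cleaned = (
    df.dropna()                 # drop rows that contain missing values
      .drop_duplicates()        # keep only unique rows
      .reset_index(drop=True)
)

# Merging: attach a label from a second, hypothetical lookup table.
zones = pd.DataFrame({
    "location": ["Hebbal", "Rajaji Nagar"],
    "zone": ["North", "West"],
})
merged = cleaned.merge(zones, on="location", how="left")
print(merged)
```

Here `dropna` removes the row with the missing location and `drop_duplicates` removes the repeated row, so two rows remain before the merge.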


Python Matplotlib

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open source and we can use it freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript
for platform compatibility.

If you have Python and pip already installed on a system, then installation of Matplotlib is very
easy.

Install it using this command:

pip install matplotlib

Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported
under the plt alias:

import matplotlib.pyplot as plt


import numpy as np

xpoints = np.array([0,6])
ypoints = np.array([0,250])

plt.plot(xpoints, ypoints)
plt.show()


Python Scikit-learn

Scikit-learn (sklearn) is a widely used and robust library for machine learning in Python. It
provides a selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction, via a consistent interface in
Python.

Features
Rather than focusing on loading, manipulating and summarising data, the scikit-learn library is focused
on modeling the data. Some of the most popular groups of models provided by sklearn are as
follows −
Supervised Learning algorithms − Almost all the popular supervised learning algorithms, like
Linear Regression, Support Vector Machine (SVM), Decision Tree etc., are part of scikit-learn.
Unsupervised Learning algorithms − It also has the popular unsupervised
learning algorithms, from clustering, factor analysis and PCA (Principal Component Analysis) to
unsupervised neural networks.
Clustering − This model is used for grouping unlabeled data.
Cross Validation − It is used to check the accuracy of supervised models on unseen data.
Dimensionality Reduction − It is used for reducing the number of attributes in data, which can then be
used for summarisation, visualisation and feature selection.
Ensemble methods − As the name suggests, these are used for combining the predictions of multiple
supervised models.
Feature extraction − It is used to extract features from data to define the attributes in image
and text data.
Feature selection − It is used to identify useful attributes to create supervised models.
Open Source − It is an open source library and is also commercially usable under the BSD license.
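
As a hedged illustration of the cross-validation feature, the sketch below scores a classifier on the iris dataset bundled with scikit-learn. The choice of model, the 5 folds and the `max_iter` value are assumptions made for this example and are not taken from the training material.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Bundled example dataset: 150 flowers, 4 measurements each, 3 species.
X, y = load_iris(return_X_y=True)

# max_iter is raised so that the solver converges on this data.
model = LogisticRegression(max_iter=1000)

# Each of the 5 folds is held out once while the model trains on the other 4.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```

The mean of the fold scores gives an estimate of accuracy on unseen data that is less dependent on one particular train/test split.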


CHAPTER 5

Machine Learning

Machine Learning is making the computer learn from studying data and statistics.

Machine Learning is a step into the direction of artificial intelligence (AI).

Machine Learning is a program that analyses data and learns to predict the outcome.

Scatter Plot

A scatter plot is a diagram where each value in the data set is represented by a dot.


Regression

The term regression is used when you try to find the relationship between variables.

In Machine Learning, and in statistical modeling, that relationship is used to predict the outcome of
future events.

Linear Regression

Linear regression uses the relationship between the data points to fit a straight line that passes as
close as possible to all of them.

This line can be used to predict future values.
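
A minimal sketch of this idea, assuming a least-squares fit with NumPy's `polyfit`, is shown below. The data points are made up and lie roughly along y = 2x + 1, and the helper name `predict` is hypothetical.

```python
import numpy as np

# Made-up points that lie roughly along y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# A degree-1 polynomial fit returns the slope and intercept of the
# least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)


def predict(new_x):
    """Use the fitted line to estimate y for an unseen x value."""
    return slope * new_x + intercept


print(slope, intercept, predict(5.0))
```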

Logistic Regression

Logistic regression aims to solve classification problems. It does this by predicting categorical
outcomes, unlike linear regression that predicts a continuous outcome.

In the simplest case there are two outcomes, which is called binomial; an example is
predicting whether a tumor is malignant or benign. Other cases have more than two outcomes to classify;
this is called multinomial. A common example of multinomial logistic regression is
predicting the class of an iris flower among 3 different species.

Logistic regression models the data using the sigmoid function.

Logistic regression becomes a classification technique only when a decision threshold is
brought into the picture. The choice of threshold value is a very important aspect of logistic
regression and depends on the classification problem itself.
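
The sigmoid function and the role of the decision threshold can be sketched as follows. The function names and the default threshold of 0.5 are assumptions made for illustration; `z` stands for the linear score computed by the model.

```python
import math


def sigmoid(z):
    """Map any real score z to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(z, threshold=0.5):
    """Turn the sigmoid output into a class label using a decision threshold."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))                      # a score of 0 sits exactly on the midpoint
print(predict_class(1.0))                # default threshold
print(predict_class(1.0, threshold=0.8)) # a stricter threshold can flip the label
```

Raising the threshold makes the model more conservative about predicting the positive class, which is why the right value depends on the problem being solved.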


Workflow of Machine Learning

We can define the machine learning workflow in the following stages.

1. Gathering data
2. Data pre-processing
3. Researching the model that will be best for the type of data
4. Training and testing the model
5. Evaluation
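
The five stages can be sketched end to end with scikit-learn. The data here is synthetic (randomly generated area and price pairs), and the choice of linear regression, the 80/20 split and the R² metric are assumptions made for this illustration rather than the method used in the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Gathering data: synthetic area/price pairs stand in for a real dataset.
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
price = 0.05 * area + rng.normal(0, 5, size=200)

# 2. Data pre-processing: scikit-learn expects a 2-D feature matrix.
X = area.reshape(-1, 1)
y = price

# 3. Choosing the model: a straight line suits this roughly linear data.
model = LinearRegression()

# 4. Training and testing: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model.fit(X_train, y_train)

# 5. Evaluation: R^2 on the held-out rows (1.0 would be a perfect fit).
r2 = r2_score(y_test, model.predict(X_test))
print(r2)
```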


Project:
Real Estate Bangalore House Price Prediction
The aim is to predict appropriate house prices for real estate customers with respect to their
budgets and priorities. Future prices are predicted by analyzing previous market trends and price
ranges, as well as upcoming developments. The system involves a website which accepts the
customer's specifications and then applies the Naive Bayes data mining
algorithm. This application will help customers invest in a property without approaching an agent. It
also decreases the risk involved in the transaction.

Program code-
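import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
# Jupyter notebook magic: render plots inline (remove when running as a plain script)
%matplotlib inline
matplotlib.rcParams["figure.figsize"] = (20,10)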

df1 = pd.read_csv("bengaluru_house_prices.csv")

df1.dtypes

df1.shape

df1.columns

df1['area_type'].unique()

df1['area_type'].value_counts()


df2 = df1.drop(['area_type','society','balcony','availability'],axis='columns')
df2.shape
df2.head()

df2.isnull().sum()

df2.shape

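# Drop rows with missing values; copy so that adding a column below works on an independent frame
df3 = df2.dropna().copy()
df3.isnull().sum()

df3.shape

# Extract the number of bedrooms from size strings such as "2 BHK"
df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))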

df3.head()
df3.bhk.unique()

def is_float(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True


df3[~df3['total_sqft'].apply(is_float)].head(10)

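def convert_sqft_to_num(x):
    """Convert a total_sqft entry to a number; a range "a - b" becomes its midpoint."""
    tokens = x.split('-')
    try:
        if len(tokens) == 2:
            return (float(tokens[0]) + float(tokens[1])) / 2
        return float(x)
    except ValueError:
        # Entries that are neither a number nor a numeric range are marked as missing
        return None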

df4 = df3.copy()
df4.total_sqft = df4.total_sqft.apply(convert_sqft_to_num)
df4 = df4[df4.total_sqft.notnull()]
df4.head(2)

df4.loc[30]

(2100+2850)/2

df5 = df4.copy()
df5['price_per_sqft'] = df5['price']*100000/df5['total_sqft']
df5.head()

df5_stats = df5['price_per_sqft'].describe()


df5_stats

df5.to_csv("bhp.csv",index=False)

df5.location = df5.location.apply(lambda x: x.strip())


location_stats = df5['location'].value_counts(ascending=False)
location_stats

location_stats.values.sum()

len(location_stats[location_stats>10])

len(location_stats)

len(location_stats[location_stats<=10])

location_stats_less_than_10 = location_stats[location_stats<=10]
location_stats_less_than_10

len(df5.location.unique())

df5.location = df5.location.apply(lambda x: 'other' if x in location_stats_less_than_10 else x)

len(df5.location.unique())

df5.head(10)

df5[df5.total_sqft/df5.bhk<300].head()

df5.shape

df6 = df5[~(df5.total_sqft/df5.bhk<300)]
df6.shape

df6.price_per_sqft.describe()

# Here we find that the minimum price per sqft is Rs 267 whereas the maximum is Rs 12000000,
# which shows a very wide variation in property prices. We remove outliers per location by
# keeping only the listings within one standard deviation of that location's mean.

def remove_pps_outliers(df):
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        reduced_df = subdf[(subdf.price_per_sqft>(m-st)) & (subdf.price_per_sqft<=(m+st))]
        df_out = pd.concat([df_out,reduced_df],ignore_index=True)
    return df_out

df7 = remove_pps_outliers(df6)
df7.shape
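
The per-location filter can be illustrated on a handful of made-up values. This is only a sketch under stated assumptions: the locations 'A' and 'B' and their prices are hypothetical, and the standard library `statistics` module stands in for NumPy (`pstdev` is the population standard deviation, which is also what `np.std` computes by default).

```python
from statistics import mean, pstdev

# Hypothetical price-per-sqft samples for two made-up locations.
listings = [
    ('A', 4000.0), ('A', 4200.0), ('A', 4100.0), ('A', 12000.0),
    ('B', 6000.0), ('B', 6100.0), ('B', 5900.0), ('B', 500.0),
]


def remove_outliers_per_location(rows):
    """Keep rows whose value lies in (mean - std, mean + std] for their own location."""
    by_location = {}
    for location, pps in rows:
        by_location.setdefault(location, []).append(pps)

    kept = []
    for location, values in by_location.items():
        m = mean(values)
        st = pstdev(values)  # population standard deviation (ddof = 0)
        kept.extend((location, v) for v in values if (m - st) < v <= (m + st))
    return kept


filtered = remove_outliers_per_location(listings)
print(filtered)
```

Because the mean and deviation are computed separately for each location, a price that is ordinary in an expensive area is not removed merely because it is high for the city as a whole.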

def plot_scatter_chart(df,location):
    # Compare 2 BHK and 3 BHK prices against area for one location
    bhk2 = df[(df.location==location) & (df.bhk==2)]
    bhk3 = df[(df.location==location) & (df.bhk==3)]
    plt.rcParams['figure.figsize'] = (15,10)
    plt.scatter(bhk2.total_sqft,bhk2.price,color='blue',label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft,bhk3.price,marker='+', color='green',label='3 BHK', s=50)
    plt.xlabel("Total Square Feet Area")
    plt.ylabel("Price (Lakh Indian Rupees)")
    plt.title(location)
    plt.legend()

plot_scatter_chart(df7,"Rajaji Nagar")

plot_scatter_chart(df7,"Hebbal")

def remove_bhk_outliers(df):
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        # Price-per-sqft statistics of every BHK group in this location
        bhk_stats = {}
        for bhk, bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk] = {
                'mean': np.mean(bhk_df.price_per_sqft),
                'std': np.std(bhk_df.price_per_sqft),
                'count': bhk_df.shape[0]
            }
        # Mark n-BHK rows priced below the mean of the (n-1)-BHK group
        for bhk, bhk_df in location_df.groupby('bhk'):
            stats = bhk_stats.get(bhk-1)
            if stats and stats['count']>5:
                exclude_indices = np.append(exclude_indices,
                    bhk_df[bhk_df.price_per_sqft<(stats['mean'])].index.values)
    return df.drop(exclude_indices,axis='index')
df8 = remove_bhk_outliers(df7)
# df8 = df7.copy()
df8.shape
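
The rule applied by remove_bhk_outliers can be read as: within one location, a home with n bedrooms should not cost less per square foot than the average home with n-1 bedrooms, provided that the smaller group has more than five listings. A simplified sketch for a single location follows. The listings and the helper name `remove_bhk_outliers_one_location` are hypothetical, and only the standard library is used.

```python
from statistics import mean

# Hypothetical (bhk, price_per_sqft) listings for a single made-up location.
listings = [
    (2, 5000.0), (2, 5200.0), (2, 4800.0), (2, 5100.0), (2, 4900.0), (2, 5000.0),
    (3, 4500.0),  # cheaper per sqft than the average 2 BHK, so it is treated as an outlier
    (3, 5600.0),
]


def remove_bhk_outliers_one_location(rows, min_count=5):
    """Drop n-BHK rows priced below the mean price per sqft of the (n-1)-BHK rows."""
    by_bhk = {}
    for bhk, pps in rows:
        by_bhk.setdefault(bhk, []).append(pps)

    kept = []
    for bhk, pps in rows:
        smaller = by_bhk.get(bhk - 1)
        # The comparison is made only when the smaller group has more than min_count rows.
        if smaller and len(smaller) > min_count and pps < mean(smaller):
            continue
        kept.append((bhk, pps))
    return kept


cleaned = remove_bhk_outliers_one_location(listings)
print(cleaned)
```

Here the six 2 BHK listings average 5000 per sqft, so the 3 BHK listing at 4500 is dropped while the one at 5600 is kept.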

plot_scatter_chart(df8,"Rajaji Nagar")

plot_scatter_chart(df8,"Hebbal")

import matplotlib
matplotlib.rcParams["figure.figsize"] = (20,10)
plt.hist(df8.price_per_sqft,rwidth=0.8)
plt.xlabel("Price Per Square Feet")
plt.ylabel("Count")

df8.bath.unique()

plt.hist(df8.bath,rwidth=0.8)
plt.xlabel("Number of bathrooms")
plt.ylabel("Count")

df8[df8.bath>10]

df8[df8.bath>df8.bhk+2]

df9 = df8[df8.bath<df8.bhk+2]
df9.shape

df9.head(2)

df10 = df9.drop(['size','price_per_sqft'],axis='columns')
df10.head(3)

dummies = pd.get_dummies(df10.location)
dummies.head(3)

df11 = pd.concat([df10,dummies.drop('other',axis='columns')],axis='columns')
df11.head()

df12 = df11.drop('location',axis='columns')
df12.head(2)

df12.shape

X = df12.drop(['price'],axis='columns')
X.head(3)

X.shape

y = df12.price
y.head(3)

len(y)

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=10)

from sklearn.linear_model import LinearRegression


lr_clf = LinearRegression()
lr_clf.fit(X_train,y_train)
lr_clf.score(X_test,y_test)

from sklearn.model_selection import ShuffleSplit


from sklearn.model_selection import cross_val_score

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

cross_val_score(LinearRegression(), X, y, cv=cv)

from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import Lasso


from sklearn.tree import DecisionTreeRegressor

def find_best_model_using_gridsearchcv(X,y):
    # Candidate models and the hyperparameter grid searched for each one
    algos = {
        'linear_regression' : {
            'model': LinearRegression(),
            'params': {
                # 'fit_intercept' is tuned because scikit-learn 1.2+ no longer accepts 'normalize'
                'fit_intercept': [True, False]
            }
        },
        'lasso': {
            'model': Lasso(),
            'params': {
                'alpha': [1,2],
                'selection': ['random', 'cyclic']
            }
        },
        'decision_tree': {
            'model': DecisionTreeRegressor(),
            'params': {
                # 'squared_error' is the scikit-learn 1.0+ name of the former 'mse' criterion
                'criterion' : ['squared_error','friedman_mse'],
                'splitter': ['best','random']
            }
        }
    }
    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    for algo_name, config in algos.items():
        gs = GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)
        gs.fit(X,y)
        scores.append({
            'model': algo_name,
            'best_score': gs.best_score_,
            'best_params': gs.best_params_
        })

    return pd.DataFrame(scores,columns=['model','best_score','best_params'])

find_best_model_using_gridsearchcv(X,y)

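def predict_price(location,sqft,bath,bhk):
    # Build one input row: numeric features first, then the one-hot location flag
    x = np.zeros(len(X.columns))
    x[0] = sqft
    x[1] = bath
    x[2] = bhk
    # Locations without their own column (grouped as 'other') keep every flag at 0
    if location in X.columns:
        loc_index = np.where(X.columns==location)[0][0]
        x[loc_index] = 1

    return lr_clf.predict([x])[0]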

predict_price('1st Phase JP Nagar',1000, 2, 2)

predict_price('1st Phase JP Nagar',1000, 3, 3)

predict_price('Indira Nagar',1000, 2, 2)

predict_price('Indira Nagar',1000, 3, 3)

import pickle
with open('banglore_home_prices_model.pickle','wb') as f:
    pickle.dump(lr_clf,f)

import json
columns = {
    'data_columns' : [col.lower() for col in X.columns]
}
with open("columns.json","w") as f:
    f.write(json.dumps(columns))
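
A web back end could use the saved column list to turn a customer's inputs into the row expected by the model. The sketch below is an assumption about how that step might look, not the report's actual server code: the function name `build_feature_vector` and the five-column sample are hypothetical, and the first three columns are assumed to be total_sqft, bath and bhk, as in predict_price.

```python
import json

# Hypothetical contents of columns.json; the real file lists every column of X.
columns_json = '{"data_columns": ["total_sqft", "bath", "bhk", "indira nagar", "hebbal"]}'
data_columns = json.loads(columns_json)['data_columns']


def build_feature_vector(location, sqft, bath, bhk, columns):
    """Return the model input: three numeric features followed by one-hot location flags."""
    x = [0.0] * len(columns)
    x[0] = float(sqft)
    x[1] = float(bath)
    x[2] = float(bhk)
    # Column names were lower-cased when saved, so the lookup is lower-cased too.
    key = location.lower()
    if key in columns:
        x[columns.index(key)] = 1.0
    # An unknown location leaves every flag at 0, which corresponds to 'other'.
    return x


print(build_feature_vector('Hebbal', 1000, 2, 2, data_columns))
print(build_feature_vector('Unknown Area', 1000, 2, 2, data_columns))
```

The resulting list could then be passed to the model restored with `pickle.load`, for example as `model.predict([x])[0]`, where `model` is a hypothetical name for the unpickled regressor.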

Output-

Conclusion
The main aim of this project was to predict the price of real estate properties using several
Machine Learning (ML) models: linear regression, lasso regression and a decision tree regressor.
A Machine Learning project of this kind is valuable for aspiring developers, because it lets them
work on a real-world problem, hone their skills and turn theoretical knowledge into practical
experience. Machine Learning has significant value both in commercial applications and as a
subject for teaching.

Industrial training benefits all the parties concerned and contributes towards the development of
the nation. A student acquires industrial experience and, at the same time, becomes familiar with
the real working environment at the industrial training site.

Future Scope
Businesses and companies regularly collect data from transactions and website interactions.
Many of them face a common challenge: analyzing and categorizing the data that is collected
and stored. A data scientist is the person who brings order to this situation. With proper and
efficient handling of data, companies can improve their productivity and progress considerably.

The future is about automating processes and using large volumes of data to make intelligent
decisions. This brings technologies such as artificial intelligence (AI), machine learning,
deep learning and the Internet of Things (IoT) to the forefront.

REFERENCES-
[1] https://www.geeksforgeeks.org/
[2] https://github.com/
[3] https://www.javatpoint.com/
[4] https://www.kaggle.com/datasets/amitabhajoy/bengaluru-house-price-data

Thank You