Data Analysis With Python - Jupyter Notebook

This document contains code for analyzing housing data using Python. It:
1) Imports the necessary libraries and reads housing data from a CSV file.
2) Cleans the data by replacing missing values in the 'bedrooms' and 'bathrooms' columns with the column means.
3) Generates descriptive statistics and value counts to analyze features of the housing data.
4) Creates a box plot to visualize the relationship between house prices and waterfront properties.

Uploaded by

Nitish Ravuvari

6/8/23, 1:10 PM Data Analysis with Python - Jupyter Notebook

In [2]: import pandas as pd

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler,PolynomialFeatures
%matplotlib inline

In [3]: file_name='https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/coursera/project/
df=pd.read_csv(file_name)

In [4]: df.head()

Out[4]:
   Unnamed: 0          id             date     price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  ...  grade  sqft_above  sqft_
0           0  7129300520  20141013T000000  221900.0       3.0       1.00         1180      5650     1.0           0  ...      7        1180
1           1  6414100192  20141209T000000  538000.0       3.0       2.25         2570      7242     2.0           0  ...      7        2170
2           2  5631500400  20150225T000000  180000.0       2.0       1.00          770     10000     1.0           0  ...      6         770
3           3  2487200875  20141209T000000  604000.0       4.0       3.00         1960      5000     1.0           0  ...      7        1050
4           4  1954400510  20150218T000000  510000.0       3.0       2.00         1680      8080     1.0           0  ...      8        1680

5 rows × 22 columns

localhost:8888/notebooks/Data Analysis with Python.ipynb# 1/10

In [6]: print(df.dtypes)

Unnamed: 0 int64
id int64
date object
price float64
bedrooms float64
bathrooms float64
sqft_living int64
sqft_lot int64
floors float64
waterfront int64
view int64
condition int64
grade int64
sqft_above int64
sqft_basement int64
yr_built int64
yr_renovated int64
zipcode int64
lat float64
long float64
sqft_living15 int64
sqft_lot15 int64
dtype: object
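
The `date` column is reported as `object`, meaning it holds strings such as `20141013T000000` (see the `df.head()` output). A minimal sketch of converting such strings to real timestamps follows; the toy frame `sample` is hypothetical, and the format string is an assumption based only on the rows visible in the notebook.

```python
import pandas as pd

# Hypothetical toy frame reusing three date strings shown in df.head().
sample = pd.DataFrame({"date": ["20141013T000000", "20141209T000000", "20150225T000000"]})

# Parse the compact timestamp layout; the format is assumed from the visible rows.
sample["date"] = pd.to_datetime(sample["date"], format="%Y%m%dT%H%M%S")

print(sample.dtypes)                      # date is now a datetime64 column
print(sample["date"].dt.year.tolist())    # [2014, 2014, 2015]
```

Once parsed, components such as the sale year or month can be used as numeric features.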

In [7]: df.describe()

Out[7]:
        Unnamed: 0            id         price      bedrooms     bathrooms   sqft_living      sqft_lot        floors    waterfront          vi
count  21613.00000  2.161300e+04  2.161300e+04  21600.000000  21603.000000  21613.000000  2.161300e+04  21613.000000  21613.000000  21613.0000
mean   10806.00000  4.580302e+09  5.400881e+05      3.372870      2.115736   2079.899736  1.510697e+04      1.494309      0.007542      0.2343
std     6239.28002  2.876566e+09  3.671272e+05      0.926657      0.768996    918.440897  4.142051e+04      0.539989      0.086517      0.7663
min        0.00000  1.000102e+06  7.500000e+04      1.000000      0.500000    290.000000  5.200000e+02      1.000000      0.000000      0.0000
25%     5403.00000  2.123049e+09  3.219500e+05      3.000000      1.750000   1427.000000  5.040000e+03      1.000000      0.000000      0.0000
50%    10806.00000  3.904930e+09  4.500000e+05      3.000000      2.250000   1910.000000  7.618000e+03      1.500000      0.000000      0.0000
75%    16209.00000  7.308900e+09  6.450000e+05      4.000000      2.500000   2550.000000  1.068800e+04      2.000000      0.000000      0.0000
max    21612.00000  9.900000e+09  7.700000e+06     33.000000      8.000000  13540.000000  1.651359e+06      3.500000      1.000000      4.0000

8 rows × 21 columns

In [8]: df.drop(['id', 'Unnamed: 0'], axis=1, inplace=True)

df.describe()

Out[8]:
              price      bedrooms     bathrooms   sqft_living      sqft_lot        floors    waterfront          view     condition         gr
count  2.161300e+04  21600.000000  21603.000000  21613.000000  2.161300e+04  21613.000000  21613.000000  21613.000000  21613.000000  21613.000
mean   5.400881e+05      3.372870      2.115736   2079.899736  1.510697e+04      1.494309      0.007542      0.234303      3.409430      7.656
std    3.671272e+05      0.926657      0.768996    918.440897  4.142051e+04      0.539989      0.086517      0.766318      0.650743      1.175
min    7.500000e+04      1.000000      0.500000    290.000000  5.200000e+02      1.000000      0.000000      0.000000      1.000000      1.000
25%    3.219500e+05      3.000000      1.750000   1427.000000  5.040000e+03      1.000000      0.000000      0.000000      3.000000      7.000
50%    4.500000e+05      3.000000      2.250000   1910.000000  7.618000e+03      1.500000      0.000000      0.000000      3.000000      7.000
75%    6.450000e+05      4.000000      2.500000   2550.000000  1.068800e+04      2.000000      0.000000      0.000000      4.000000      8.000
max    7.700000e+06     33.000000      8.000000  13540.000000  1.651359e+06      3.500000      1.000000      4.000000      5.000000     13.000

In [9]: print("number of NaN values for the column bedrooms :", df['bedrooms'].isnull().sum())
print("number of NaN values for the column bathrooms :", df['bathrooms'].isnull().sum())

number of NaN values for the column bedrooms : 13

number of NaN values for the column bathrooms : 10

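In [11]: # Replace missing bedroom counts with the column mean (assign back instead of a chained inplace call)
mean = df['bedrooms'].mean()
df['bedrooms'] = df['bedrooms'].fillna(mean)

In [12]: # Same treatment for bathrooms
mean = df['bathrooms'].mean()
df['bathrooms'] = df['bathrooms'].fillna(mean)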

In [13]: print("number of NaN values for the column bedrooms :", df['bedrooms'].isnull().sum())
print("number of NaN values for the column bathrooms :", df['bathrooms'].isnull().sum())

number of NaN values for the column bedrooms : 0

number of NaN values for the column bathrooms : 0
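
The cells above perform mean imputation: each missing entry is replaced by the mean of the non-missing entries in the same column. A minimal sketch on a hypothetical toy column (`toy` and `col_mean` are illustrative names, not part of the notebook) shows the mechanics. Filling with the mean leaves the column mean unchanged but slightly reduces its spread.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value.
toy = pd.DataFrame({"bedrooms": [3.0, np.nan, 2.0, 4.0]})

# Series.mean() skips NaN by default, so the mean is (3 + 2 + 4) / 3 = 3.0.
col_mean = toy["bedrooms"].mean()

# Assign the filled column back to the frame.
toy["bedrooms"] = toy["bedrooms"].fillna(col_mean)

print(toy["bedrooms"].tolist())          # [3.0, 3.0, 2.0, 4.0]
print(toy["bedrooms"].isnull().sum())    # 0
```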

In [14]: df['floors'].value_counts().to_frame()

Out[14]:
     floors
1.0   10680
2.0    8241
1.5    1910
3.0     613
2.5     161
3.5       8

In [15]: sns.boxplot(x='waterfront', y='price', data=df)

Out[15]: <AxesSubplot:xlabel='waterfront', ylabel='price'>
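
The box plot itself is not reproduced in this printout. As a numerical companion, the quartiles that a box plot draws can be tabulated per group. This is a sketch on hypothetical toy data (the prices below are invented for illustration); on the real data the same call would be applied to `df`.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
toy = pd.DataFrame({
    "waterfront": [0, 0, 0, 0, 1, 1, 1],
    "price": [200000.0, 300000.0, 400000.0, 500000.0, 900000.0, 1500000.0, 2100000.0],
})

# Quartiles per group: the box edges (25%, 75%) and the median line (50%).
quartiles = toy.groupby("waterfront")["price"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```

Comparing the 50% column between the two rows gives the same reading as comparing the median lines of the two boxes.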

In [16]: sns.regplot(x='sqft_above', y='price', data=df)

Out[16]: <AxesSubplot:xlabel='sqft_above', ylabel='price'>

In [17]: # numeric_only=True (pandas >= 1.5) excludes the string 'date' column; pandas 2.x raises an error without it
df.corr(numeric_only=True)['price'].sort_values()

Out[17]:
zipcode         -0.053203
long             0.021626
condition        0.036362
yr_built         0.054012
sqft_lot15       0.082447
sqft_lot         0.089661
yr_renovated     0.126434
floors           0.256794
waterfront       0.266369
lat              0.307003
bedrooms         0.308797
sqft_basement    0.323816
view             0.397293
bathrooms        0.525738
sqft_living15    0.585379
sqft_above       0.605567
grade            0.667434
sqft_living      0.702035
price            1.000000
Name: price, dtype: float64

In [18]: from sklearn.linear_model import LinearRegression

# Simple linear regression of price on longitude; score() returns R^2 on the data used for fitting
X = df[['long']]
Y = df['price']
lm = LinearRegression()
lm.fit(X, Y)
lm.score(X, Y)

Out[18]: 0.00046769430149007363

In [19]: X = df[['sqft_living']]
Y = df['price']
lm = LinearRegression()
lm.fit(X, Y)
lm.score(X, Y)

Out[19]: 0.4928532179037931
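
`lm.score` returns the coefficient of determination R². For a least-squares fit with a single feature and an intercept, R² equals the squared Pearson correlation between that feature and the target, which is consistent with the numbers in this notebook: 0.702035² ≈ 0.4929 for `sqft_living` and 0.021626² ≈ 0.00047 for `long`. The sketch below checks the identity on synthetic data (the data is an assumption for illustration, not the housing set).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, used only to illustrate the identity.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 3.0 * x + rng.normal(scale=2.0, size=500)

r = np.corrcoef(x, y)[0, 1]                  # Pearson correlation
X = x.reshape(-1, 1)                         # scikit-learn expects a 2-D feature array
r2 = LinearRegression().fit(X, y).score(X, y)

print(r ** 2, r2)        # the two values agree up to floating-point error

# Correlations reported in Out[17], squared, for comparison with Out[19] and Out[18].
print(0.702035 ** 2)
print(0.021626 ** 2)
```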

In [21]: features =["floors", "waterfront","lat" ,"bedrooms" ,"sqft_basement" ,"view" ,"bathrooms","sqft_living15","sqft_above"

In [22]: X = df[features]
Y= df['price']
lm = LinearRegression()
lm.fit(X, Y)
lm.score(X, Y)

Out[22]: 0.6576435664044019

In [23]: Input=[('scale',StandardScaler()),('polynomial', PolynomialFeatures(include_bias=False)),('model',LinearRegression())]

In [24]: pipe=Pipeline(Input)
pipe

Out[24]: Pipeline(steps=[('scale', StandardScaler()),
                ('polynomial', PolynomialFeatures(include_bias=False)),
                ('model', LinearRegression())])

In [25]: pipe.fit(X,Y)

Out[25]: Pipeline(steps=[('scale', StandardScaler()),
                ('polynomial', PolynomialFeatures(include_bias=False)),
                ('model', LinearRegression())])

In [26]: pipe.score(X,Y)

Out[26]: 0.750441999451871
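
Note that `pipe.score(X,Y)` is evaluated on the same rows used for fitting, so 0.7504 is an in-sample R². To make explicit what the pipeline does, the sketch below applies the same three steps by hand and confirms that the predictions match. The synthetic `X` and `y` are stand-ins (an assumption), not the housing features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data; the notebook uses df[features] and df['price'].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(include_bias=False)),
    ("model", LinearRegression()),
])
pipe.fit(X, y)

# The same three steps applied manually, each fitted on the previous step's output.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
poly = PolynomialFeatures(include_bias=False).fit(X_scaled)
X_poly = poly.transform(X_scaled)
model = LinearRegression().fit(X_poly, y)
manual_pred = model.predict(X_poly)

print(np.allclose(pipe.predict(X), manual_pred))   # True
```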

In [27]: from sklearn.model_selection import cross_val_score

from sklearn.model_selection import train_test_split
print("done")

done

In [28]: features =["floors", "waterfront","lat" ,"bedrooms" ,"sqft_basement" ,"view" ,"bathrooms","sqft_living15","sqft_above"

X = df[features ]
Y = df['price']

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=1)

print("number of test samples :", x_test.shape[0])
print("number of training samples:",x_train.shape[0])

number of test samples : 3242

number of training samples: 18371
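
The split sizes follow from the row count and `test_size=0.15`. As far as I know, scikit-learn rounds a fractional test size up, and here the remaining rows go to the training set. A small arithmetic sketch (plain Python, no notebook state required):

```python
import math

n_samples = 21613      # row count reported by df.describe()
test_size = 0.15

# 0.15 * 21613 = 3241.95, rounded up to a whole number of test rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_test, n_train)   # 3242 18371
```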

In [29]: from sklearn.linear_model import Ridge

RidgeModel = Ridge(alpha = 0.1)
RidgeModel.fit(x_train, y_train)
RidgeModel.score(x_test, y_test)

Out[29]: 0.6478759163939111
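
Ridge regression adds an L2 penalty `alpha * ||w||²` to the least-squares objective. With an intercept, the coefficients have a closed form on centered data, `w = (XcᵀXc + alpha·I)⁻¹ Xcᵀyc`. The sketch below compares that formula with `Ridge` on synthetic data (the data and true coefficients are assumptions for illustration). Because the penalty acts on raw coefficients, its effect depends on feature scale, which is worth keeping in mind when features such as `lat` and `sqft_above` are used unscaled.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, used only to illustrate the closed form.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
alpha = 0.1

# Center X and y so the intercept is not penalized.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Solve (Xc^T Xc + alpha * I) w = Xc^T yc, then recover the intercept.
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y.mean() - X.mean(axis=0) @ w

model = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, model.coef_), np.isclose(b, model.intercept_))
```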

In [30]: from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train)
# Reuse the transformer fitted on the training split; do not refit it on the test split
x_test_pr = pr.transform(x_test)
poly = Ridge(alpha=0.1)
poly.fit(x_train_pr, y_train)
poly.score(x_test_pr, y_test)

Out[30]: 0.7002744282813562
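
`cross_val_score` is imported in In [27] but not used in the cells shown. A hedged sketch of how it could complement the single train/test split: wrapping the polynomial transform and the ridge model in one estimator lets each fold fit the transform on its own training part. The synthetic data and the choice of `cv=4` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data; the notebook would pass df[features] and df['price'].
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X[:, 0] ** 2 + X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.2, size=300)

# One estimator containing both steps, so nothing is fitted on held-out rows.
model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=0.1))

# For a regressor the default score is R^2, computed once per fold.
scores = cross_val_score(model, X, y, cv=4)
print(scores.mean(), scores.std())
```

The mean across folds is usually a steadier estimate than the score from one particular split.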
