1.11 Lab 1 Data Analysis With Python 3

The document outlines a lab assignment for data analysis using Python, focusing on housing price data from King County. Participants are tasked with creating a Jupyter notebook to analyze and model housing prices based on various features, answering a total of 11 specific questions. The assignment includes tasks such as data cleaning, statistical analysis, and fitting regression models to predict house prices.

Uploaded by

nhunhse183644

Lab 1: Data Analysis with Python
Objectives
• Create a Jupyter notebook
• Apply data analysis and modeling techniques to housing price data
• Answer 11 questions, implementing each in Python code

Project Scenario
• In this assignment, you are a Data Analyst working at a Real Estate Investment Trust. The Trust would like to start investing in residential real estate. You are tasked with determining the market price of a house given a set of features. You will analyze and predict housing prices using attributes or features such as square footage, number of bedrooms, number of floors, and so on. A template notebook is provided in the lab; your job is to complete the eleven questions. Some hints for the questions are given in the template notebook.

Data sets
• The dataset contains house sale prices for King County, which includes Seattle.
• It includes homes sold between May 2014 and May 2015.

Question
• Question 1
  • Display the data type of each column using the DataFrame attribute dtypes.
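
A minimal sketch of Question 1. The slides do not show how the data is loaded, so the small DataFrame below is a hypothetical stand-in with a few of the column names mentioned in the lab; in the notebook, df would be the King County data.

```python
import pandas as pd

# Hypothetical stand-in data; the lab's real file is not shown in the slides.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3, 3, 2],
    "floors": [1.0, 2.0, 1.0],
})

# dtypes is an attribute of the DataFrame, so it is used without parentheses.
print(df.dtypes)
```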

Question
• Question 2
  • Drop the columns "id" and "Unnamed: 0" from axis 1 using the method drop(), then use the method describe() to obtain a statistical summary of the data.
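
One possible way to approach Question 2, sketched on hypothetical stand-in data (the values are made up; only the column names "id" and "Unnamed: 0" come from the question).

```python
import pandas as pd

# Hypothetical stand-in data with the two columns the question asks to drop.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [101, 102, 103],
    "price": [221900.0, 538000.0, 180000.0],
    "sqft_living": [1180, 2570, 770],
})

# axis=1 tells drop() that the labels refer to columns rather than rows.
df = df.drop(["id", "Unnamed: 0"], axis=1)

# describe() returns count, mean, std, min, quartiles, and max per numeric column.
summary = df.describe()
print(summary)
```

Note that drop() returns a new DataFrame by default, which is why the result is assigned back to df.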

Question
• Question 3
  • Use the method value_counts() to count the number of houses for each unique floors value, then use the method .to_frame() to convert the result to a dataframe.
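
A short sketch of Question 3, assuming the column is named "floors" as in the feature list used later in the lab. The values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data for the "floors" column.
df = pd.DataFrame({"floors": [1.0, 2.0, 1.0, 1.5, 2.0, 1.0]})

# value_counts() returns a Series indexed by the unique floor values;
# to_frame() turns that Series into a one-column DataFrame.
floor_counts = df["floors"].value_counts().to_frame()
print(floor_counts)
```

The name of the resulting column depends on the pandas version, so code that reads the counts back is safer using positional access.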

Question
• Question 4
  • Use the function boxplot in the seaborn library to determine whether houses with a waterfront view or without a waterfront view have more price outliers.
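
A minimal sketch of Question 4, assuming the columns are named "waterfront" (0 or 1) and "price". The data is a made-up stand-in, so the plot it produces says nothing about the real answer. The Agg backend is selected only so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import pandas as pd
import seaborn as sns

# Hypothetical stand-in data: 0 = no waterfront view, 1 = waterfront view.
df = pd.DataFrame({
    "waterfront": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "price": [300e3, 320e3, 350e3, 400e3, 1500e3,
              900e3, 1100e3, 1300e3, 1600e3, 2000e3],
})

# One box per waterfront category; points beyond the whiskers are drawn as outliers.
ax = sns.boxplot(x="waterfront", y="price", data=df)
```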

Question
• Question 5
  • Use the function regplot in the seaborn library to determine if the feature sqft_above is negatively or positively correlated with price.
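
A sketch of Question 5 on made-up stand-in data. The slope of the fitted line in the plot indicates the sign of the relationship; the Pearson correlation is computed as an optional numeric cross-check and is not something the question asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import pandas as pd
import seaborn as sns

# Hypothetical stand-in data, constructed so that price rises with sqft_above.
df = pd.DataFrame({
    "sqft_above": [800, 1000, 1200, 1500, 1800, 2200, 2600, 3000],
    "price": [210e3, 250e3, 330e3, 360e3, 480e3, 520e3, 700e3, 760e3],
})

# Scatter plot with a fitted regression line.
ax = sns.regplot(x="sqft_above", y="price", data=df)

# Optional cross-check: the sign of the correlation coefficient.
corr = df["sqft_above"].corr(df["price"])
print(corr)
```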

Question
• Question 6
  • Fit a linear regression model to predict the 'price' using the feature 'sqft_living', then calculate the R^2.
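
A minimal sketch of Question 6, assuming scikit-learn's LinearRegression (the estimator named in Question 8). The data is synthetic and generated with arbitrary coefficients, so the printed R^2 is not the lab's answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data; the slope and noise level are arbitrary assumptions.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=200)
df = pd.DataFrame({
    "sqft_living": sqft,
    "price": 280.0 * sqft + rng.normal(0, 50_000, size=200),
})

X = df[["sqft_living"]]  # double brackets keep X two-dimensional
y = df["price"]

lm = LinearRegression()
lm.fit(X, y)

# score() returns the coefficient of determination R^2 on the given data.
r2 = lm.score(X, y)
print(r2)
```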

Question
• Question 7
  • Fit a linear regression model to predict the 'price' using the list of features:
    features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view", "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]
  • Then calculate the R^2.
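
A sketch of Question 7 using the feature list from the slide. The DataFrame is filled with random numbers purely so the code runs; the values are not realistic and the resulting R^2 is not the lab's answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data: random feature values and an arbitrary price formula.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, len(features))), columns=features)
df["price"] = 3.0 * df["sqft_living"] + 2.0 * df["grade"] + rng.normal(size=200)

X = df[features]
y = df["price"]

lm = LinearRegression()
lm.fit(X, y)

# R^2 of the multiple linear regression on the data it was fitted to.
r2 = lm.score(X, y)
print(r2)
```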

Question
• Question 8
  • Create a list of tuples. The first element of each tuple contains the name of the estimator:
    • 'scale'
    • 'polynomial'
    • 'model'
  • The second element of each tuple contains the model constructor:
    • StandardScaler()
    • PolynomialFeatures(include_bias=False)
    • LinearRegression()
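
The list described above can be sketched as follows. The variable name Input is an assumption; the names and constructors are taken from the slide.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Each tuple pairs a step name with the estimator object for that step.
Input = [
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(include_bias=False)),
    ("model", LinearRegression()),
]
```

The order matters: when the list is passed to a pipeline, the steps run from first to last, and the final step must be the predictor.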

Question
• Question 9
  • Use the list to create a pipeline object to predict the 'price', fit the object using the features in the list features, and calculate the R^2.
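
A sketch of Question 9, assuming scikit-learn's Pipeline class and the list of tuples from Question 8. The data is random and only there so the example runs; the variable names Input and pipe are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data: random feature values and an arbitrary price formula.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, len(features))), columns=features)
df["price"] = 3.0 * df["sqft_living"] + 2.0 * df["grade"] + rng.normal(size=200)

Input = [
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(include_bias=False)),
    ("model", LinearRegression()),
]

# The pipeline scales the features, expands them polynomially, then fits the model.
pipe = Pipeline(Input)
pipe.fit(df[features], df["price"])

r2 = pipe.score(df[features], df["price"])
print(r2)
```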

Question
• Question 10
  • Create and fit a Ridge regression object using the training data, set the regularization parameter to 0.1, and calculate the R^2 using the test data.
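
A sketch of Question 10. The slides do not show how the training and test sets are created, so the split below (test_size=0.15, random_state=1) is an assumption, as is the synthetic data. In scikit-learn's Ridge, the regularization parameter is called alpha.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data: random feature values and an arbitrary price formula.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)
df["price"] = 3.0 * df["sqft_living"] + 2.0 * df["grade"] + rng.normal(size=300)

# Assumed split; the lab template may use different settings.
x_train, x_test, y_train, y_test = train_test_split(
    df[features], df["price"], test_size=0.15, random_state=1
)

ridge = Ridge(alpha=0.1)  # alpha is the regularization parameter
ridge.fit(x_train, y_train)

# R^2 evaluated on data the model has not seen during fitting.
r2_test = ridge.score(x_test, y_test)
print(r2_test)
```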

Question
• Question 11
  • Perform a second-order polynomial transform on both the training data and the test data. Create and fit a Ridge regression object using the training data, set the regularization parameter to 0.1, and calculate the R^2 using the test data.
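
A sketch of Question 11 under the same assumptions as the previous question (synthetic data, an assumed train/test split). The transform is fitted on the training data only and then applied to the test data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data: random feature values and an arbitrary price formula.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)
df["price"] = 3.0 * df["sqft_living"] + 2.0 * df["grade"] + rng.normal(size=300)

# Assumed split; the lab template may use different settings.
x_train, x_test, y_train, y_test = train_test_split(
    df[features], df["price"], test_size=0.15, random_state=1
)

# Second-order polynomial transform: fit on training data, reuse on test data.
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train)
x_test_pr = pr.transform(x_test)

ridge = Ridge(alpha=0.1)
ridge.fit(x_train_pr, y_train)

r2_test = ridge.score(x_test_pr, y_test)
print(r2_test)
```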

Summary
• Create a Jupyter notebook
• Apply data analysis and modeling techniques to housing price data

Q&A
