Task 1 - Data Analytics in Python

This report investigates the Manchester Housing dataset to identify factors influencing property values using the CRISP DM framework. Key findings indicate that floor space and the number of bedrooms significantly impact prices, while waterfront status has a minor effect. The analysis recommends focusing on larger properties with better amenities for pricing strategies and investment decisions.

Uploaded by

sahilsahilkamboj510


COM7024

MSc Data Science

Programming for Data Analytics

Investigating the Manchester Housing Market

STU218659

Lee Braiden
Investigating the Manchester Housing Market
The main goal of this report is to examine the Manchester Housing dataset and offer
insights that support informed decisions. The analysis follows the CRISP-DM (Cross-Industry
Standard Process for Data Mining) framework, which comprises six stages: Business
Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.
The report includes statistical tests, a demonstration of the Central Limit Theorem and
data analysis carried out in Python.

Exploring Business Factors

The main objective is to pinpoint the factors that affect property values in Manchester,
specifically floor space, construction year, proximity to water and available
amenities. The study aims to inform pricing strategy, real estate development
choices and investment decisions.

Data Understanding

Dataset Overview

The dataset contains various attributes of properties in Manchester, including:

• Price

• Waterfront status

• Floor Space

• Year Built

• Bedrooms

• Bathrooms

• Location

• Property Type

• Condition

• Lot Size

• Amenities

First, we loaded the dataset and displayed the first 10 rows for initial inspection.

Descriptive Statistics

We computed descriptive statistics for waterfront homes to understand their
characteristics and variation. The findings suggested that waterfront properties generally
command higher prices, offer more spacious living areas and come with a greater range of
amenities than non-waterfront properties.
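
A comparison like the one described above needs statistics for both groups side by side. The following is a minimal sketch of one way to do that with a grouped summary. The column names follow the appendix, but the toy values are invented for illustration and are not taken from the Manchester dataset.

```python
import pandas as pd

# Hypothetical toy data: column names follow the appendix, values are invented.
toy = pd.DataFrame({
    "Waterfront": [1, 1, 0, 0, 0, 1],
    "Price": [320000.0, 280000.0, 250000.0, 310000.0, 190000.0, 300000.0],
    "Floor Space": [1500.0, 1300.0, 1100.0, 1600.0, 900.0, 1400.0],
})

# Summarise each group (0 = non-waterfront, 1 = waterfront) in one table.
summary = toy.groupby("Waterfront")[["Price", "Floor Space"]].agg(
    ["mean", "median", "std"]
)
print(summary)
```

Grouping on the flag keeps the two sets of statistics in a single table, which makes differences in mean, median and spread easier to read than two separate describe() outputs.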

Data Preparation

Data Cleaning and Transformation

Missing values must be identified and filled before analysis. We replaced missing
values in categorical variables with the most frequently occurring value (the mode), then
checked and converted data types where needed. This ensured that all columns were usable
and consistent for the analysis.
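
The cleaning steps described above can be sketched on a small example. This is an illustration under stated assumptions: the toy frame and its values are hypothetical, and only the column names come from the appendix.

```python
import pandas as pd

# Hypothetical toy frame: only the column names come from the appendix.
toy = pd.DataFrame({
    "Amenities": ["Garden", None, "Garden", "Garage", None],
    "Year Built": ["1990", "2005", "1987", "2012", "1999"],
})

# Count missing values per column before changing anything.
missing_before = toy.isnull().sum()

# Fill categorical gaps with the most frequent value (the mode).
most_common = toy["Amenities"].mode()[0]
toy["Amenities"] = toy["Amenities"].fillna(most_common)

# Convert a column that was read as text into integers.
toy["Year Built"] = toy["Year Built"].astype(int)

print(missing_before)
print(toy.dtypes)
```

Mode imputation keeps every row but inflates the share of the most common category, so it is worth reporting how many values were filled.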

Statistical Test: T-test

A two-sample t-test was performed to compare the prices of waterfront and non-waterfront
properties. The results showed a t-statistic of 0.210 and a p-value of 0.836. Because the
p-value is far above the conventional 0.05 threshold, the test provides no evidence of a
difference in mean price between waterfront and non-waterfront properties.
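
To make the decision rule explicit, here is a small sketch of how a p-value from a two-sample t-test can be read against a significance level. The helper name interpret and the 0.05 threshold are assumptions made for illustration, and the synthetic prices are randomly generated rather than drawn from the Manchester data.

```python
import numpy as np
from scipy import stats


def interpret(p_value, alpha=0.05):
    """Return a plain-language verdict for a two-sided test at level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis of equal means"
    return "fail to reject the null hypothesis of equal means"


# Synthetic prices: both groups come from the same distribution (an assumption).
rng = np.random.default_rng(0)
waterfront = rng.normal(loc=300_000, scale=50_000, size=40)
non_waterfront = rng.normal(loc=300_000, scale=50_000, size=200)

# Same call as in the appendix: Student's t-test with pooled variance.
t_stat, p_val = stats.ttest_ind(waterfront, non_waterfront)
print(f"synthetic data: t = {t_stat:.3f}, p = {p_val:.3f} -> {interpret(p_val)}")

# Applying the same rule to the p-value reported in this section.
print(f"reported p = 0.836 -> {interpret(0.836)}")
```

If the two groups have clearly different variances or sizes, passing equal_var=False to stats.ttest_ind runs Welch's t-test instead. That would be a different test from the one reported here.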

Central Limit Theorem Demonstration

To demonstrate the Central Limit Theorem, we drew 1,000 random samples of 30 prices each
(with replacement) from the dataset and plotted the means of these samples. The resulting
distribution of sample means resembled a normal distribution. This illustrates that, as the
sample size grows, the distribution of the sample mean approaches a normal distribution,
regardless of whether the original price distribution is normal.
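
The effect is easiest to see when the starting distribution is clearly not normal. The sketch below repeats the sampling procedure on a right-skewed synthetic population and compares skewness before and after averaging. The exponential population, the seed and the variable names are assumptions for illustration; it does not use the housing data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical right-skewed "prices" (exponential), not the Manchester data.
population = rng.exponential(scale=100_000, size=5_000)

# Draw 1,000 samples of size 30 with replacement and record each sample mean.
sample_means = np.array([
    rng.choice(population, size=30, replace=True).mean()
    for _ in range(1_000)
])

# Skewness near 0 indicates a roughly symmetric, bell-like shape.
pop_skew = stats.skew(population)
means_skew = stats.skew(sample_means)
print(f"skewness of raw values:   {pop_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The sample means should be far less skewed than the raw values and should cluster around the population mean, which is the behaviour the histogram in the appendix is meant to show.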
Modeling and Analysis

Correlation Analysis

Correlation matrices were computed before and after data preprocessing to understand
relationships between numeric variables. Key correlations identified include:

• A moderate positive correlation (0.390) between Floor Space and Price.

• A minor correlation (0.094) between Year Built and Price.

• A minor correlation (0.045) between Waterfront status and Price.

Heatmaps were used to visualize these correlations, highlighting the relationships between
different property attributes.
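
For readers unfamiliar with the coefficient itself, the following sketch computes Pearson's r by hand and checks it against the pandas method used in the appendix. The five data points are invented for illustration and have no connection to the reported values.

```python
import math

import pandas as pd

# Hypothetical toy values: only the column names come from the appendix.
toy = pd.DataFrame({
    "Floor Space": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Price": [2.0, 4.0, 5.0, 4.0, 5.0],
})

x, y = toy["Floor Space"], toy["Price"]

# Pearson's r: sum of products of deviations, scaled by both spreads.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / math.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr uses the Pearson method by default.
r_pandas = x.corr(y)
print(f"manual r = {r_manual:.3f}, pandas r = {r_pandas:.3f}")
```

The coefficient ranges from -1 to 1 and measures only linear association, so a value near zero does not rule out a non-linear relationship.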

Visualizations

Several plots were created to visualize relationships between variables:

• Distribution of Floor Space: This histogram showed the spread and central tendency of
floor space across properties.

• Year Built vs. Price: A scatter plot showed a weak positive trend (correlation 0.094),
suggesting that newer properties are priced only slightly higher.

• Floor Space vs. Price: A scatter plot showed a moderate positive relationship
(correlation 0.390), suggesting that larger properties tend to command higher prices.

• Waterfront vs. Price: A box plot showed that waterfront properties generally have higher
median prices, though the variability within each category was considerable.

Evaluation

The analysis revealed key insights:

• There is a moderate positive correlation between Floor Space and Price.

• Bedrooms and Bathrooms have strong positive correlations with Price.

• Waterfront status has no statistically significant effect on Price, as indicated by the
t-test (p = 0.836).

These findings suggest that, while factors such as floor space and the number of bedrooms
are positively associated with property prices, others such as the year built and
waterfront status have little influence.

Recommendations

1. Focus on Floor Space and Amenities: Properties with larger floor space and better
amenities should be priced higher, as these factors significantly influence property prices.

2. Year Built Consideration: While newer properties are slightly more valuable, this factor
is less significant compared to floor space and amenities.

3. Investment in Non-Waterfront Properties: Given that no statistically significant price
difference was found between waterfront and non-waterfront properties, investing in
well-located non-waterfront properties with good amenities might be more cost-effective.

This investigation of the Manchester Housing dataset has provided insight into the factors
affecting property prices. By following the CRISP-DM framework, we examined the data
systematically, applied statistical techniques and drew conclusions to guide strategic
decisions.
Appendix

# Importing required libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy import stats

# Path of Manchester Housing dataset

file_path = r'C:\Users\Administrator\Desktop\DataAnalytics\manchester_housing_data.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset

print("First 10 rows of the dataset:")
print(data.head(10))

# Descriptive statistics for waterfront properties

print("Descriptive statistics for waterfront properties:")
waterfront_properties = data[data['Waterfront'] == 1]
print(waterfront_properties.describe())

# Graph the distribution of floor space

plt.figure(figsize=(10, 6))
sns.histplot(data['Floor Space'], kde=True)
plt.title('Distribution of Floor Space')
plt.xlabel('Floor Space (sq ft)')
plt.ylabel('Frequency')
plt.show()

# Correlation matrix for numeric columns

print("\nCorrelation matrix for numeric columns:")
numeric_cols = data.select_dtypes(include=[np.number])
correlation_matrix = numeric_cols.corr()
print(correlation_matrix)

# Visualize the correlation matrix

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Scatter plot for Year Built vs. Price

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Year Built', y='Price', data=data)
plt.title('Year Built vs. Price')
plt.xlabel('Year Built')
plt.ylabel('Price')
plt.show()

# Scatter plot for Floor Space vs. Price

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Floor Space', y='Price', data=data)
plt.title('Floor Space vs. Price')
plt.xlabel('Floor Space (sq ft)')
plt.ylabel('Price')
plt.show()

# Box plot for Waterfront vs. Price

plt.figure(figsize=(10, 6))
sns.boxplot(x='Waterfront', y='Price', data=data)
plt.title('Waterfront vs. Price')
plt.xlabel('Waterfront')
plt.ylabel('Price')
plt.show()

# Correlation between Floor Space and Price

correlation_floor_space_price = data['Floor Space'].corr(data['Price'])
print(f"\nCorrelation between Floor Space and Price: {correlation_floor_space_price:.3f}")

# Correlation between Year Built and Price

correlation_year_price = data['Year Built'].corr(data['Price'])
print(f"Correlation between Year Built and Price: {correlation_year_price:.3f}")

# Central Limit Theorem

sample_means = []
for _ in range(1000):
    sample = data['Price'].sample(30, replace=True)
    sample_means.append(sample.mean())

plt.figure(figsize=(10, 6))
sns.histplot(sample_means, kde=True)
plt.title('Sampling Distribution of the Sample Mean [Central Limit Theorem]')
plt.xlabel('Sample Mean of Price')
plt.ylabel('Frequency')
plt.show()

# T-test (Statistical test) to compare prices of waterfront vs. non-waterfront properties

print("\nPerforming T-test to compare prices of waterfront vs. non-waterfront properties:")
waterfront_prices = data[data['Waterfront'] == 1]['Price']
non_waterfront_prices = data[data['Waterfront'] == 0]['Price']

t_stat, p_val = stats.ttest_ind(waterfront_prices, non_waterfront_prices)

print(f"Results: t-statistic = {t_stat:.3f}, p-value = {p_val:.3f}")

# Identifying missing values in data

print("\nIdentifying missing values in the dataset:")
missing_values = data.isnull().sum()
print("Missing Values in Dataset:\n", missing_values)

# Impute missing values

data['Amenities'] = data['Amenities'].fillna(data['Amenities'].mode()[0])

# Checking data types and converting them if necessary

data['Price'] = data['Price'].astype(float)
data['Waterfront'] = data['Waterfront'].astype(int)
data['Floor Space'] = data['Floor Space'].astype(float)
data['Year Built'] = data['Year Built'].astype(int)

# To use only numeric columns for correlation

numeric_cols_post = data.select_dtypes(include=[np.number])
correlation_matrix_post = numeric_cols_post.corr()
print("\nCorrelation Matrix after Preprocessing:\n", correlation_matrix_post)

# Visualize the updated correlation matrix

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix_post, annot=True, cmap='coolwarm')
plt.title('Updated Correlation Matrix')
plt.show()
