
Critical Thinking Exercise-Real Estate

The Python Forecasting Project for Critical Thinking involves analyzing a synthetic 'Real Estate' dataset to develop critical thinking skills and create a regression model. The project includes creating scatterplots to examine relationships between home prices and various features, conducting regression analysis, and evaluating the significance of different independent variables. The findings suggest that not all variables contribute meaningfully to predicting home prices, and alternative data elements are recommended for forecasting sales performance in a real estate context.
Copyright © All Rights Reserved

Name: Khanh Ngo 1677046

Python Forecasting Project for Critical Thinking


This project uses a synthetic dataset, “Real Estate.” Its purpose is twofold:
- Build the critical thinking skills needed to structure data analysis appropriately for effective decision making.
- Analyze available data practically and skillfully in order to build an explanatory regression model.
The Real Estate - Base database includes the following variables for homes (*NOTE: Variables marked with an asterisk are qualitative variables within the database):
a. *Unit# (An assigned database key)
b. *Type (H = House, C = Condo/Apartment)
c. *Location (1 through 10 – voting district where located)
d. *U/S/R (Urban vs. Suburban vs. Rural location)
e. Price (The price the home sold for in 2017)
f. Sq. Ft. (Heated/cooled and attached square footage)
g. Lot (Acres) (Acreage of property)
h. Garage (Number of attached covered and/or enclosed parking positions)
i. BRs (Number of qualified bedrooms)
j. Baths (Number of bathrooms; a bathroom with no tub or shower counts as .5)
k. *Pool (No=No Access; HA=Shared Pool; AG=Above Ground; IG=In Ground)
l. Age (Age of home in years, rounded, at the end of 2017)

At a high level, you will complete the following:

Create a Series of Scatter Plots

1. Download the dataset generated by the Python script (CriticalThinking.ipynb) to create scatterplots.

Important: Ensure that the Dependent Variable (Price) is plotted on the Y-axis,
and the Independent Variable is on the X-axis, as the column order determines
the axis assignment.

Here are the steps in detail:


Create the following charts in Excel using the charting tools and the indicated variables in “Real Estate.xlsx” (remember, Price is your Dependent Variable). Create a new tab in the spreadsheet for each plot and name it “Price Vs feature”, where feature is the independent variable (e.g., Sq. Ft., Lot). After creating each scatterplot on the original tab, move it to the matching tab you created.
a. Create a Scatterplot using the variables Price and Sq. Ft.

b. Create a Scatterplot using the variables Price and Lot (Acres).

c. Create a Scatterplot using the variables Price and Garage.


d. Create a Scatterplot using the variables Price and BRs.

e. Create a Scatterplot using the variables Price and Baths.


f. Create a Scatterplot using the variables Price and Age.
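The same six charts can also be produced on the Python side of the project. The sketch below is a minimal example assuming pandas and matplotlib are installed and that the data sit in a DataFrame whose column names match the variable list above (for instance loaded with `pd.read_excel("Real Estate.xlsx")`, which also needs openpyxl). The function name `plot_price_vs_features` is hypothetical, and the random demo data exist only so the sketch runs; they are not the course dataset.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to image files; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Independent variables for charts a-f; Price is always the dependent variable.
FEATURES = ["Sq. Ft.", "Lot (Acres)", "Garage", "BRs", "Baths", "Age"]


def plot_price_vs_features(df, out_dir, features=FEATURES):
    """Save one 'Price Vs <feature>' scatterplot per feature; return file paths."""
    paths = []
    for feature in features:
        fig, ax = plt.subplots()
        ax.scatter(df[feature], df["Price"], s=10)
        ax.set_xlabel(feature)   # independent variable on the X-axis
        ax.set_ylabel("Price")   # dependent variable on the Y-axis
        ax.set_title(f"Price Vs {feature}")
        path = os.path.join(out_dir, f"Price Vs {feature}.png")
        fig.savefig(path)
        plt.close(fig)           # free memory between charts
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Placeholder data only, so the sketch runs without the workbook.
    rng = np.random.default_rng(0)
    demo = pd.DataFrame({f: rng.uniform(0, 10, 50) for f in FEATURES})
    demo["Price"] = rng.uniform(100_000, 900_000, 50)
    with tempfile.TemporaryDirectory() as tmp:
        for p in plot_price_vs_features(demo, tmp):
            print(os.path.basename(p))
```

Setting the axis labels explicitly keeps Price on the Y-axis regardless of column order in the source data.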
What sort of relationship do you see between these variables based on the scatterplots?

a. Between Price and Sq. Ft. (Circle)?

Price and Sq. Ft. (R² = 0.0072): No relationship

No relationship / Weak / Moderate / Strong


b. Between Price and Lot (Circle)?

Price and Lot (R² = 0.0002): No relationship

No relationship / Weak / Moderate / Strong


c. Between Price and Garage (Circle)?

Price and Garage (R² = 0.0027): No relationship

No relationship / Weak / Moderate / Strong


d. Between Price and BRs (Circle)?

Price and BRs (R² = 0.0001): No relationship

No relationship / Weak / Moderate / Strong


e. Between Price and Baths (Circle)?

Price and Baths (R² = 0.0264): Slightly related, but the relationship is still very weak

No relationship / Weak / Moderate / Strong


f. Between Price and Age (Circle)?

Price and Age (R² = 0.0013): No relationship

No relationship / Weak / Moderate / Strong
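For a scatterplot with a single predictor, the R² of the least-squares trend line equals the square of the Pearson correlation between the two variables, so the values above can be cross-checked directly. A minimal NumPy sketch; the helper name `simple_r_squared` is hypothetical, not part of the course script:

```python
import numpy as np


def simple_r_squared(x, y):
    """R² of a one-predictor linear fit, computed as Pearson r squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry = correlation of x and y
    return r ** 2


if __name__ == "__main__":
    # Toy data: a perfectly linear pair and a weakly related pair.
    print(simple_r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
    print(simple_r_squared([1, 2, 3, 4], [1, 0, 1, 0]))
```

With the course data loaded as a DataFrame `df`, `simple_r_squared(df["Baths"], df["Price"])` should reproduce the 0.0264 above, assuming the same rows are used.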

Conduct Regression Analysis in Excel

Use Excel’s regression tools to evaluate how well the prescribed independent
variables explain variations in the dependent variable.

Preparation: Encoding Categorical Variables

Before running the regression, follow these steps to properly encode categorical
variables:
1. Identify Categorical Variables – Review the dataset and determine which columns contain categorical data (e.g., "Region," "Category," or "Yes/No" values).

2. Create Dummy Variables – Use an IF statement to convert categorical values into numerical format. For example, if encoding a "Gender" column:

=IF(A2="Male",1,0)

3. Drop the Reference Variable – To avoid multicollinearity, remove one of the dummy variables for each categorical feature, so that a feature with k categories is represented by k − 1 dummies.
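In pandas, the same encoding can be applied to all qualitative columns at once. This minimal sketch assumes the asterisked columns from the variable list (Type, Location, U/S/R, Pool) are loaded under those names; `encode_categoricals` is a hypothetical helper. `pd.get_dummies` with `drop_first=True` drops the first level of each feature (in sorted order) as the reference category, which mirrors the "drop one dummy" step. The resulting `Type_H` column plays the same role as `=IF(B2="H",1,0)` would if Type sat in column B.

```python
import pandas as pd

# Qualitative columns marked with * in the variable list (Unit# is a key, not a feature).
CATEGORICAL = ["Type", "Location", "U/S/R", "Pool"]


def encode_categoricals(df, columns=CATEGORICAL):
    """Replace each categorical column with 0/1 dummy columns.

    drop_first=True drops the first level (in sorted order) of every column,
    so a feature with k levels yields k - 1 dummies and the dropped level
    becomes the reference category.
    """
    return pd.get_dummies(df, columns=columns, drop_first=True, dtype=int)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    demo = pd.DataFrame({
        "Type": ["H", "C", "H"],
        "Location": [1, 2, 1],
        "U/S/R": ["U", "S", "R"],
        "Pool": ["No", "IG", "HA"],
        "Price": [350_000, 210_000, 480_000],
    })
    print(encode_categoricals(demo))
```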

Run Regression with Excel

In the Excel spreadsheet provided, use the Data Analysis Add-in to run a regression analysis with Price as the Dependent Variable and all Independent Variables.
Next, run a separate simple regression with Price as the Dependent Variable and each of the following as the sole Independent Variable: Lot, Garage, and BRs. Place each set of regression results in a separate worksheet with an appropriate name (i.e., select a new worksheet as the output location for each regression and name it “Regression Model with Lot,” “Regression Model with Garage,” etc.).
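Excel's Regression tool fits ordinary least squares with an intercept. The sketch below recomputes its headline numbers (intercept, slopes, R Square, Adjusted R Square) with NumPy so the Excel output can be cross-checked. It assumes the predictors are already numeric (e.g., after dummy encoding); the function name `ols` is hypothetical, and the standard errors and p-values that Excel also reports are omitted.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (intercept, slopes, r_squared, adjusted_r_squared).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single predictor becomes one column
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta[0], beta[1:], r2, adj_r2
```

For a simple regression, pass one column, e.g. `ols(df["Garage"], df["Price"])`; for the full model, pass several, e.g. `ols(df[["Lot (Acres)", "Garage", "BRs"]], df["Price"])`, assuming the data are in a DataFrame `df` with those column names.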

What is the R-Square obtained for each of the three simple regressions?

R-Square obtained for Simple Regression Price with Lot = 0.0002


R-Square obtained for Simple Regression Price with Garage = 0.0027
R-Square obtained for Simple Regression Price with BRs = 0.0001

Of the three simple regressions, Garage has the highest R² (0.0027).

a. Provide the following from the “Full Regression Model” (Model with all
features):
a. Adjusted R² ________ 0.0118

b. Y-Intercept for the Regression Model

_________ 596764.38

c. Coefficient of X1 (Lot)

_________ -16847.73

d. Coefficient of X2 (Garage)
__________ -18183.71
e. Coefficient of X3 (BRs)
________ -2450.48

b. Do you think we need all of the independent variables in the regression model to predict changes in Price (Circle)? Yes No

Explain: I believe not all independent variables are necessary for predicting changes in Price. A near-zero adjusted R² (0.0118) suggests that some variables do not meaningfully contribute to the model.
Additionally, Lot Size, Garage, and Age each show almost no relationship with Price in the scatterplots (R² of 0.0002, 0.0027, and 0.0013, respectively). Eliminating these weak predictors can enhance the model’s effectiveness and clarity.
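The adjusted R² referred to in this explanation is the standard correction of R² for the number of predictors. With $n$ observations and $k$ independent variables it is defined as below, and the derivation shows when it turns negative (assuming $n - k - 1 > 0$). A very weak model can therefore report a negative adjusted R² even though R² itself is never negative.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

\begin{aligned}
\bar{R}^2 < 0
  &\iff (1 - R^2)(n - 1) > n - k - 1 \\
  &\iff (n - 1) - R^2 (n - 1) > n - k - 1 \\
  &\iff k > R^2 (n - 1) \\
  &\iff R^2 < \frac{k}{n - 1}
\end{aligned}
```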

c. Which variable(s) would you remove (Circle)?

Lot Size Garage BRs

Answer: These variables should be removed because each explains almost none of the variation in Price on its own (every simple-regression R² is below 0.003), indicating that they do not meaningfully improve the prediction of house prices. By eliminating these weak predictors, we can simplify the model and potentially improve its predictive power.
d. Of the following variables in the spreadsheet, which variable would you select
next to add to the model (i.e., you think it would create a stronger prediction
of Price)?
Type Location U/S/R Sq. Ft. Baths Pool Age
Answer: Square footage is a crucial factor in determining home prices, as larger homes typically sell for more. Although its R² of 0.0072 is weak, it is still higher than that of any of the removed variables.
The number of bathrooms affects home value because buyers prefer more bathrooms for convenience. With R² = 0.0264, it has the highest R² of the six scatterplots, making it a useful addition.

e. Run a Regression Model on the Real Estate – Base database using Price as the Dependent Variable (Y). Include the original Independent Variables (minus any you removed in step 6) and add the variable you chose in step 7. Print your model output and turn it in with the assignment. (NOTE: You may have to repeat this exercise until you find a combination of variables that gives you a higher R².)
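The NOTE above describes a trial-and-error search. One common way to automate it is greedy forward selection, sketched below as an assumption rather than the course's prescribed method: starting from no predictors, add whichever candidate raises adjusted R² the most, and stop when no candidate helps. Adjusted R² is used instead of plain R² because plain R² never decreases when a variable is added, so it would always favor the largest model. The function names are hypothetical, and candidates are assumed to be numeric (dummy-encoded) columns.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R² of an OLS fit of y on the columns of X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    if n - k - 1 <= 0:
        return -np.inf  # too few observations for this many predictors
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(candidates, y):
    """Greedy forward selection on adjusted R².

    candidates: dict mapping a variable name to a 1-D array of its values.
    Returns (selected names in the order added, final adjusted R²).
    """
    selected, best = [], -np.inf
    remaining = list(candidates)
    while remaining:
        # Score every model that adds exactly one more candidate.
        scores = {
            name: adjusted_r2(
                np.column_stack([candidates[c] for c in selected + [name]]), y
            )
            for name in remaining
        }
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break  # no remaining candidate improves adjusted R²
        selected.append(name)
        remaining.remove(name)
        best = scores[name]
    return selected, best


if __name__ == "__main__":
    # Synthetic demo: y depends on "a" only; "b" is pure noise.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=100), rng.normal(size=100)
    y = 5 * a + 0.1 * rng.normal(size=100)
    print(forward_select({"a": a, "b": b}, y))
```

Greedy selection is not guaranteed to find the best possible subset, but it needs far fewer model fits than trying every combination.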

The output from the SAS regression model is printed as follows:


f. Provide the following from the Regression Model:
a. Coefficient of Determination (R-squared).
_________ 0.031

b. Y-Intercept for the Regression Model


________ 620481.0286

c. Slope value (coefficient) for each of your Independent Variables.

i. Var____Sq. Ft.___________________
_________ 10.47_______________

ii. Var______Baths_________________
______________ -45614.7__________

iii. Var___________BRs____________
_________ -3792.258 _______________
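Plugging values into the fitted equation shows how the model turns a home's features into a predicted Price. The sketch uses the coefficients reported above (rounded as printed, so results are approximate); the 2,000 sq. ft., 2-bath, 3-bedroom home is a hypothetical example, and with an R² of 0.031 such predictions carry very little explanatory power.

```python
# Coefficients as reported above for the final three-variable model.
INTERCEPT = 620481.0286
COEF = {"Sq. Ft.": 10.47, "Baths": -45614.7, "BRs": -3792.258}


def predict_price(sq_ft, baths, brs):
    """Fitted Price = intercept + sum of (coefficient * feature value)."""
    return (INTERCEPT
            + COEF["Sq. Ft."] * sq_ft
            + COEF["Baths"] * baths
            + COEF["BRs"] * brs)


print(round(predict_price(2000, 2, 3), 2))  # → 538814.85
```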
Critical Thinking Question:

g. A large real estate company is trying to use similar data plus its own sales data to forecast total sales for the coming year for each of its agents, and it has pulled data from its Finance records. The company is trying to assemble the best data to build a regression model.

a. Would it make sense to use the same data as we used above in the
model? Why or why not?

Answer: No, it would not be ideal to use the same data. The current model
focuses on predicting house prices based on property features, whereas
forecasting total sales for agents requires different factors. Sales performance is
influenced by variables such as the number of transactions completed, marketing
efforts, client network, and market trends. Using property characteristics alone
would not provide an accurate prediction of an agent’s sales.

b. Recommend two data elements you think they probably have available to help them predict sales for each of their salespeople.

1. Number of Transactions Closed per Agent – This reflects an agent’s past sales
activity and is a strong indicator of their future performance.
2. Total Commission Earned per Agent – Higher commissions suggest more
successful sales, making it a useful metric for predicting overall sales
performance.
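To turn these two elements into a training table, transaction-level records would typically be rolled up to one row per agent per year, with the following year's sales as the target. The sketch below assumes hypothetical column names (`agent_id`, `year`, `sale_price`, `commission`) that a Finance extract might contain; it is an illustration, not the company's actual schema. The `transactions_closed` and `total_commission` columns correspond to elements 1 and 2 above.

```python
import pandas as pd


def agent_year_table(transactions):
    """Roll up closed transactions to one row per agent per year.

    Assumes hypothetical columns: agent_id, year, sale_price, commission.
    """
    table = (
        transactions.groupby(["agent_id", "year"])
        .agg(
            transactions_closed=("sale_price", "count"),  # element 1
            total_commission=("commission", "sum"),       # element 2
            total_sales=("sale_price", "sum"),
        )
        .reset_index()
        .sort_values(["agent_id", "year"])
        .reset_index(drop=True)
    )
    # Regression target: the same agent's total sales in the next row (year).
    # Assumes no gaps in an agent's years; the latest year gets NaN.
    table["next_year_sales"] = table.groupby("agent_id")["total_sales"].shift(-1)
    return table


if __name__ == "__main__":
    # Placeholder records for illustration only.
    demo = pd.DataFrame({
        "agent_id": ["A", "A", "A", "B"],
        "year": [2016, 2016, 2017, 2016],
        "sale_price": [100.0, 200.0, 300.0, 50.0],
        "commission": [3.0, 6.0, 9.0, 1.5],
    })
    print(agent_year_table(demo))
```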

GRADING RUBRIC
Overall Score Possible = 100

Problem Area | Possible Points | Points Awarded
Did the student create the Excel tab for Scatterplots? | 2 |
Did the student create the correct scatterplots and move them to the new tab? | 3 |
Did the student make a selection for each type of relationship? | 5 |
Did the student run Data Analysis on the Excel spreadsheet creating a new tab for the model output? | 10 |
Did the student provide the correct model output values from the spreadsheet in the problem document? | 10 |
Did the student answer Critical Thinking questions 5, 6 and 7? | 20 |
Did the student run a regression model in SAS and provide a print out of the model output? | 20 |
Did the student provide the correct model output values from SAS in the problem document and answer the decision problem (#10)? | 10 |
Did the student complete all parts of the Critical Thinking problem #11? | 20 |
Total Critical Thinking Points | 100 |
