Week 11 Features More Inputs Intro
Import Modules
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Introduction
We know how to work with 1 CONTINUOUS input in our LINEAR MODELS for REGRESSION. Let's now see what changes when we have 2 CONTINUOUS inputs.
Additive
Linear relationships
We will define a function that calculates the AVERAGE OUTPUT or TREND given input 1 and
input 2 and the 3 regression coefficients.
The purpose of this function is to allow visualizing the TREND with respect to input 1.
res_df['x2'] = x2
return res_df
In [3]: b0 = -0.25
b1 = 1.95
b2 = 0.2
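The function body did not survive in full above, so here is a minimal sketch of what it could look like. It assumes the additive form trend = b0 + b1*x1 + b2*x2, which reproduces the values tabulated in this notebook (for example -6.1 at x1 = -3 and x2 = 0). The function name calc_trend_wrt_x1 and its argument order are assumptions, not the original definition.

```python
import numpy as np
import pandas as pd


def calc_trend_wrt_x1(x1, x2, b0, b1, b2):
    """Hypothetical helper: additive trend b0 + b1*x1 + b2*x2 as a DataFrame."""
    res_df = pd.DataFrame({'x1': x1})
    res_df['trend'] = b0 + b1 * res_df.x1 + b2 * x2
    res_df['x2'] = x2
    # Reorder the columns to match the x1, x2, trend layout of the outputs.
    return res_df[['x1', 'x2', 'trend']]


# Coefficients from the cell above.
b0 = -0.25
b1 = 1.95
b2 = 0.2

# Quick check at three x1 values with x2 held fixed at 0.
check_df = calc_trend_wrt_x1(np.array([-3.0, 0.0, 3.0]), 0, b0, b1, b2)
print(check_df)
```

Passing a scalar x2 lets pandas broadcast it to every row, so each returned DataFrame describes one straight line in x1.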
1 of 19 11/16/2024, 10:26 AM
week_11_features_more_inputs_intro https://d3c33hcgiwev3.cloudfront.net/4vZ0flVvSCavJKBzGp98FQ_ef...
Let's now define 101 evenly or uniformly spaced values of x1 between -3 and 3.
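The cell that created the array is missing here. One way to get 101 evenly spaced values, sketched with np.linspace, is shown below. The name x1_values matches the cells that follow, but the exact call used originally is an assumption.

```python
import numpy as np

# 101 evenly spaced points from -3 to 3 inclusive, so the step is 6 / 100 = 0.06.
x1_values = np.linspace(-3, 3, num=101)

print(x1_values.size)
print(x1_values.ndim)
```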
In [5]: x1_values.size
Out[5]: 101
In [6]: x1_values.ndim
Out[6]: 1
Out[7]: x1 x2 trend
0 -3.00 0 -6.100
1 -2.94 0 -5.983
2 -2.88 0 -5.866
3 -2.82 0 -5.749
4 -2.76 0 -5.632
96 2.76 0 5.132
97 2.82 0 5.249
98 2.88 0 5.366
99 2.94 0 5.483
Out[9]: 0 101
Name: x2, dtype: int64
Let's visualize the RELATIONSHIP between the AVERAGE OUTPUT and INPUT 1.
plt.show()
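The plotting cell itself was lost, so the following is only a sketch of one way to draw this relationship with plain matplotlib. It assumes the additive trend and the coefficient values defined earlier, with x2 held at 0. The original cell may well have used Seaborn instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Coefficients from the notebook; x2 is held fixed at 0.
b0, b1, b2 = -0.25, 1.95, 0.2
x1_values = np.linspace(-3, 3, num=101)
trend_at_x2_zero = b0 + b1 * x1_values + b2 * 0

# One straight line: the slope with respect to x1 is b1.
fig, ax = plt.subplots()
ax.plot(x1_values, trend_at_x2_zero)
ax.set_xlabel('x1')
ax.set_ylabel('trend')
plt.show()
```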
Out[11]: x1 x2 trend
0 -3.00 -2 -6.500
1 -2.94 -2 -6.383
2 -2.88 -2 -6.266
3 -2.82 -2 -6.149
4 -2.76 -2 -6.032
96 2.76 -2 4.732
97 2.82 -2 4.849
98 2.88 -2 4.966
99 2.94 -2 5.083
plt.show()
Let's REPEAT or REPLICATE calculating the TREND with respect to x1 FOR different values of x2!
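The cells that build the list are missing, so this is a hedged sketch of the replicate-and-stack pattern. The outputs only tell us that the list holds 9 DataFrames, so the range of x2_values (here -3 to 3) is an assumption, as is the helper calc_trend_wrt_x1.

```python
import numpy as np
import pandas as pd


def calc_trend_wrt_x1(x1, x2, b0, b1, b2):
    """Hypothetical helper: additive trend b0 + b1*x1 + b2*x2 as a DataFrame."""
    res_df = pd.DataFrame({'x1': x1})
    res_df['trend'] = b0 + b1 * res_df.x1 + b2 * x2
    res_df['x2'] = x2
    return res_df[['x1', 'x2', 'trend']]


b0, b1, b2 = -0.25, 1.95, 0.2
x1_values = np.linspace(-3, 3, num=101)
x2_values = np.linspace(-3, 3, num=9)  # assumed range; 9 values as in the notebook

# One DataFrame per x2 value, then stack them into a single long DataFrame.
study_wrt_x1_list = [calc_trend_wrt_x1(x1_values, x2, b0, b1, b2) for x2 in x2_values]
study_wrt_x1_df = pd.concat(study_wrt_x1_list, ignore_index=True)

print(len(study_wrt_x1_list))
print(study_wrt_x1_df.x2.value_counts())
```

Each x2 value should appear 101 times in the stacked result, once per x1 value.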
In [14]: x2_values
Out[16]: 9
In [17]: study_wrt_x1_list[0]
Out[17]: x1 x2 trend
In [18]: study_wrt_x1_list[1]
Out[18]: x1 x2 trend
In [20]: study_wrt_x1_df
Out[20]: x1 x2 trend
In [21]: study_wrt_x1_df.x2.value_counts()
Visualize the TREND or AVERAGE OUTPUT with respect to x1 FOR EACH unique value of x2 as a LINE CHART!
plt.show()
By default, Seaborn LINE CHARTS SUMMARIZE the data: at each x-axis value they plot the AVERAGE of the y-axis variable (with an uncertainty band) instead of drawing one line per group!
plt.show()
We therefore need to DISABLE or TURN OFF Seaborn's DEFAULT line chart averaging by setting the estimator argument to None! We must also tell Seaborn WHAT DEFINES each LINE. Seaborn refers to this as the UNITS of the line, set with the units argument!
plt.show()
However, we cannot tell the difference between the lines! They all have the same color!
plt.show()
plt.show()
plt.show()
IF the INPUTS have roughly the same MAGNITUDE and SCALE, the MAGNITUDE of the SLOPE tells you which input causes the GREATER CHANGE in the AVERAGE OUTPUT!
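As a small worked check of this statement, the sketch below compares the change in the trend for a 1-unit step in each input, using the coefficients from this notebook. The helper name trend is only for illustration.

```python
# Coefficients from the notebook.
b0, b1, b2 = -0.25, 1.95, 0.2


def trend(x1, x2):
    """Additive trend for two inputs."""
    return b0 + b1 * x1 + b2 * x2


# Change in the average output for a 1-unit step in each input,
# holding the other input fixed at 0.
change_from_x1 = trend(1, 0) - trend(0, 0)
change_from_x2 = trend(0, 1) - trend(0, 0)

print(change_from_x1, change_from_x2)
print(abs(change_from_x1) > abs(change_from_x2))
```

The two changes are the slopes b1 and b2 (up to floating-point rounding), so with these values a step in x1 moves the trend roughly ten times as much as the same step in x2.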
Let's see this one more time by FOCUSING on the relationship with respect to x2.
res_df['x1'] = x1
return res_df
Let's define 2 new arrays so we can visualize the TRENDS with respect to x2 for different values of x1.
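The defining cells are missing, so this sketch mirrors the earlier study with the roles of the inputs swapped. The outputs only confirm that x2_values_b has shape (101,) and that the list holds 9 DataFrames. The ranges (-3 to 3) and the function name calc_trend_wrt_x2 are assumptions.

```python
import numpy as np
import pandas as pd


def calc_trend_wrt_x2(x2, x1, b0, b1, b2):
    """Hypothetical helper: same additive trend, organized around x2."""
    res_df = pd.DataFrame({'x2': x2})
    res_df['trend'] = b0 + b1 * x1 + b2 * res_df.x2
    res_df['x1'] = x1
    # Reorder the columns to match the x2, x1, trend layout of the output.
    return res_df[['x2', 'x1', 'trend']]


b0, b1, b2 = -0.25, 1.95, 0.2
x2_values_b = np.linspace(-3, 3, num=101)  # shape (101,); range assumed
x1_values_b = np.linspace(-3, 3, num=9)    # 9 values; range assumed

study_wrt_x2_list = [calc_trend_wrt_x2(x2_values_b, x1, b0, b1, b2) for x1 in x1_values_b]
study_wrt_x2_df = pd.concat(study_wrt_x2_list, ignore_index=True)

print(x2_values_b.shape)
print(len(study_wrt_x2_list))
print(study_wrt_x2_df.x1.value_counts())
```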
In [30]: x2_values_b.shape
Out[30]: (101,)
In [32]: x1_values_b
In [34]: len(study_wrt_x2_list)
Out[34]: 9
In [36]: study_wrt_x2_df
Out[36]: x2 x1 trend
In [37]: study_wrt_x2_df.x1.value_counts()
Visualize the TREND or AVERAGE OUTPUT with respect to x2 for each unique value of x1. Use the DIVERGING color palette to help distinguish x1 values ABOVE and BELOW the x1 MIDPOINT.
plt.show()
All of these concepts still APPLY if the SLOPES have DIFFERENT SIGNS, or if the SLOPE multiplying input 2 is greater than the SLOPE multiplying input 1!
Interactions
Let's define another function that calculates the TREND or AVERAGE OUTPUT focusing on the RELATIONSHIP with x1. But this time we will include the INTERACTION FEATURE, which equals the PRODUCT of the two inputs!
res_df['x2'] = x2
return res_df
In [40]: b3 = 1
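Only the tail of the function is preserved above, so the following is a minimal sketch. It assumes the trend is b0 + b1*x1 + b2*x2 + b3*x1*x2, with b3 multiplying the product of the inputs. The function name calc_trend_interaction_wrt_x1 and the x2 range are assumptions. The list name follows the cells below.

```python
import numpy as np
import pandas as pd


def calc_trend_interaction_wrt_x1(x1, x2, b0, b1, b2, b3):
    """Hypothetical helper: additive terms plus the interaction b3 * x1 * x2."""
    res_df = pd.DataFrame({'x1': x1})
    res_df['trend'] = b0 + b1 * res_df.x1 + b2 * x2 + b3 * res_df.x1 * x2
    res_df['x2'] = x2
    return res_df[['x1', 'x2', 'trend']]


b0, b1, b2 = -0.25, 1.95, 0.2
b3 = 1

x1_values = np.linspace(-3, 3, num=101)
x2_values = np.linspace(-3, 3, num=9)  # assumed range; 9 values as in the notebook

study_interaction_wrt_x1_list = [
    calc_trend_interaction_wrt_x1(x1_values, x2, b0, b1, b2, b3) for x2 in x2_values
]
study_interaction_wrt_x1_df = pd.concat(study_interaction_wrt_x1_list, ignore_index=True)

print(len(study_interaction_wrt_x1_list))
```

When x2 is 0 the interaction term vanishes, so that slice matches the additive trend.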
In [42]: len(study_interaction_wrt_x1_list)
Out[42]: 9
In [44]: study_interaction_wrt_x1_df
Out[44]: x1 x2 trend
In [45]: study_interaction_wrt_x1_df.x2.value_counts()
Let's now VISUALIZE the AVERAGE OUTPUT or TREND with respect to x1 for each unique value of x2.
plt.show()
INTERACTION means that the RELATIONSHIP with respect to 1 INPUT DEPENDS on the OTHER INPUT!
res_df['x1'] = x1
return res_df
plt.show()
The RELATIONSHIP of the AVERAGE OUTPUT with respect to 1 INPUT DEPENDS on the OTHER INPUT!
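To make this dependence concrete, the sketch below computes the slope with respect to x1 at several fixed x2 values. It assumes the interaction trend b0 + b1*x1 + b2*x2 + b3*x1*x2 with the coefficient values from this notebook. The x2 grid is an assumption. Grouping the x1 terms gives a slope of b1 + b3*x2.

```python
import numpy as np

b0, b1, b2, b3 = -0.25, 1.95, 0.2, 1.0


def trend(x1, x2):
    """Trend with an interaction term (the product of the two inputs)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


x2_values = np.linspace(-3, 3, num=9)  # assumed grid of x2 values

# Slope with respect to x1 at each fixed x2: change in trend per 1-unit step in x1.
slopes_wrt_x1 = trend(1.0, x2_values) - trend(0.0, x2_values)

for x2, slope in zip(x2_values, slopes_wrt_x1):
    print(f"x2 = {x2:5.2f} -> slope wrt x1 = {slope:5.2f}")
```

With these coefficients the slope is negative for x2 below -b1/b3 = -1.95 and positive above it, so the lines in the chart are no longer parallel.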