Week 11 Features Interaction

This document discusses the implementation of linear models with interaction features using Python's statsmodels library. It covers the syntax for including interaction terms in models, demonstrates fitting models with various interaction specifications, and illustrates how to create predictions based on these models. The document emphasizes the convenience of the formula interface for managing interactions and polynomial features in linear regression analysis.

Uploaded by

Diya Patel

week_11_features_interaction https://d3c33hcgiwev3.cloudfront.net/d3p9joI9RIWmaZxVB9aToA_a3...

CMPINF 2100 Week 11

Working with LINEAR MODELS with INTERACTION features
Let's read in the data we created in the PREVIOUS ADDITIVE features demonstration.

Import Modules
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns

In [2]: import statsmodels.formula.api as smf

Read data
In [3]: df = pd.read_csv('linear_additive_example.csv')

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 x1 35 non-null float64
1 x2 35 non-null float64
2 trend 35 non-null float64
3 y 35 non-null float64
dtypes: float64(4)
memory usage: 1.2 KB

In [5]: df.head()

Out[5]:
         x1        x2     trend         y
0  1.024252 -0.650623  1.617167  1.547730
1 -1.733671  0.993904 -3.431878 -3.988479
2 -0.089294  0.251805 -0.373762 -0.107839
3 -1.093748  1.356543 -2.111500 -2.594083
4  0.320995 -0.145121  0.346917 -0.583119

1 of 7 11/16/2024, 10:26 AM
Fit linear model with INTERACTIONS

The formula interface has several STRANGE and WEIRD...YET...quite useful syntax artifacts
when working with INTERACTIONS.

In [6]: fit_a = smf.ols(formula='y ~ x1 + x2 + x1 * x2', data=df).fit()

In [7]: fit_a.params

Out[7]: Intercept -0.107266

x1 1.748614
x2 0.174882
x1:x2 0.264916
dtype: float64

Let's use the FORMULA INTERFACE's actual SYNTAX for the MULTIPLICATION or
INTERACTION FEATURE, which is the : operator.

In [8]: fit_b = smf.ols(formula='y ~ x1 + x2 + x1:x2', data=df).fit()

In [9]: fit_b.params

Out[9]: Intercept -0.107266

x1 1.748614
x2 0.174882
x1:x2 0.264916
dtype: float64

In [10]: fit_a.bse

Out[10]: Intercept 0.120003

x1 0.136364
x2 0.132036
x1:x2 0.129129
dtype: float64

In [11]: fit_b.bse

Out[11]: Intercept 0.120003

x1 0.136364
x2 0.132036
x1:x2 0.129129
dtype: float64

In [12]: fit_a.pvalues

Out[12]: Intercept 3.782866e-01

x1 6.215348e-14
x2 1.950164e-01
x1:x2 4.874308e-02
dtype: float64

In [13]: fit_b.pvalues

Out[13]: Intercept 3.782866e-01

x1 6.215348e-14
x2 1.950164e-01
x1:x2 4.874308e-02
dtype: float64
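
The x1:x2 term is nothing more than the PRODUCT of the two input columns. Here is a minimal sketch of that idea, assuming a small SYNTHETIC data set (the `demo` DataFrame, its coefficients, and the `x1_x2` column name are made up for illustration and are NOT the course data). It compares the formula's `:` term to a product column created by hand.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical, not linear_additive_example.csv)
rng = np.random.default_rng(2100)
demo = pd.DataFrame({'x1': rng.normal(size=50), 'x2': rng.normal(size=50)})
demo['y'] = (0.2 + 1.5 * demo.x1 - 0.4 * demo.x2
             + 0.8 * demo.x1 * demo.x2
             + rng.normal(scale=0.3, size=50))

# Create the interaction feature manually as an ordinary column
demo['x1_x2'] = demo.x1 * demo.x2

# Let the formula interface build the product with the : operator
fit_formula = smf.ols(formula='y ~ x1 + x2 + x1:x2', data=demo).fit()

# Use the hand-made product column as a plain additive feature
fit_manual = smf.ols(formula='y ~ x1 + x2 + x1_x2', data=demo).fit()

# The two sets of coefficient estimates should agree
same_coefs = np.allclose(fit_formula.params.to_numpy(),
                         fit_manual.params.to_numpy())
print(same_coefs)
```

Only the NAME of the last coefficient differs between the two fits. The convenience of the formula version shows up later when making predictions.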

The FORMULA interface creates a SHORTCUT to INCLUDE MAIN EFFECTS AND INTERACTIONS in the model!!!!

The SHORT CUT is to use the * operator in the FORMULA interface!!

In [14]: fit_c = smf.ols(formula='y ~ x1 * x2', data=df).fit()

In [15]: fit_c.params

Out[15]: Intercept -0.107266

x1 1.748614
x2 0.174882
x1:x2 0.264916
dtype: float64

In [16]: fit_b.bse

Out[16]: Intercept 0.120003

x1 0.136364
x2 0.132036
x1:x2 0.129129
dtype: float64

In [17]: fit_c.bse

Out[17]: Intercept 0.120003

x1 0.136364
x2 0.132036
x1:x2 0.129129
dtype: float64

In [18]: fit_b.pvalues

Out[18]: Intercept 3.782866e-01

x1 6.215348e-14
x2 1.950164e-01
x1:x2 4.874308e-02
dtype: float64

In [19]: fit_c.pvalues

Out[19]: Intercept 3.782866e-01

x1 6.215348e-14
x2 1.950164e-01
x1:x2 4.874308e-02
dtype: float64
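
One way to see WHY the three formulas give identical fits is to look at the TERMS each formula expands to. The sketch below assumes the formulas are parsed by the patsy package (the parser statsmodels has traditionally used behind the formula interface). The helper name `rhs_terms` is hypothetical.

```python
from patsy import ModelDesc

def rhs_terms(formula):
    """Return the names of the right-hand-side terms of a formula."""
    desc = ModelDesc.from_formula(formula)
    return [term.name() for term in desc.rhs_termlist]

terms_a = rhs_terms('y ~ x1 + x2 + x1 * x2')  # redundant main effects
terms_b = rhs_terms('y ~ x1 + x2 + x1:x2')    # explicit product term
terms_c = rhs_terms('y ~ x1 * x2')            # the shortcut

for terms in (terms_a, terms_b, terms_c):
    print(terms)
```

Repeated terms are kept only once, so writing x1 + x2 in front of x1 * x2 adds nothing new.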

This means if we would ONLY want PRODUCTS then we should use the : operator in the
FORMULA.

In [20]: fit_d = smf.ols(formula='y ~ x1:x2', data=df).fit()

In [21]: fit_d.params

Out[21]: Intercept 0.006942

x1:x2 0.939718
dtype: float64

There is ONE MORE way of representing the SHORT CUT and this is related to WHY we need
to use np.power() when we want to RAISE an input to a polynomial degree.

In [23]: df.x1.head() ** 2

Out[23]: 0 1.049092
1 3.005615
2 0.007973
3 1.196285
4 0.103038
Name: x1, dtype: float64

In [24]: np.power( df.x1.head().to_numpy(), 2)

Out[24]: array([1.04909217, 3.00561521, 0.00797335, 1.19628455, 0.10303796])

In [25]: fit_e = smf.ols(formula='y ~ x1 ** 2 + x2 ** 2', data=df).fit()

In [26]: fit_e.params

Out[26]: Intercept -0.109888

x1 1.879500
x2 0.075586
dtype: float64

In [27]: fit_f = smf.ols(formula='y ~ x1 + x2', data=df).fit()

In [28]: fit_f.params

Out[28]: Intercept -0.109888

x1 1.879500
x2 0.075586
dtype: float64

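In the FORMULA interface the ** or POWER operator denotes the DEGREE of INTERACTION between the inputs!!!!

You cannot use the ** operator to RAISE an input to a POWER!!!! Writing x1 ** 2 asks for x1 INTERACTED with itself, which collapses back to just x1...that is WHY fit_e and fit_f have IDENTICAL coefficients.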
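
A quick way to check this is to build the design matrix directly. This is a minimal sketch, assuming patsy is the formula parser and using a tiny made-up `demo` DataFrame. The `I()` wrapper is patsy's way of protecting ordinary Python arithmetic inside a formula, shown here as an alternative to np.power().

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Tiny made-up input so the squared values are easy to read
demo = pd.DataFrame({'x1': [1.0, 2.0, 3.0]})

# ** is a FORMULA operator here, so no squared column is created
plain = dmatrix('x1 ** 2', demo, return_type='dataframe')

# A function call is evaluated as regular Python code
powered = dmatrix('np.power(x1, 2)', demo, return_type='dataframe')

# I() also tells the parser to treat the contents as regular Python
wrapped = dmatrix('I(x1 ** 2)', demo, return_type='dataframe')

print(plain.columns.tolist())
print(powered.iloc[:, 1].tolist())
print(wrapped.iloc[:, 1].tolist())
```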

In [29]: fit_g = smf.ols(formula='y ~ np.power(x1,2) + np.power(x2,2)', data=df).fit()

In [30]: fit_g.params

Out[30]: Intercept 0.023664

np.power(x1, 2) 0.468992
np.power(x2, 2) -0.573997
dtype: float64

In [31]: fit_h = smf.ols(formula='y ~ x1 + np.power(x1,2) + x2 + np.power(x2,2)', data=df).fit()

In [32]: fit_h.params

Out[32]: Intercept -0.133105

x1 1.844778
np.power(x1, 2) 0.085897
x2 0.072746
np.power(x2, 2) -0.058627
dtype: float64

A shortcut is that you can CREATE ALL INTERACTIONS up to and INCLUDING a given DEGREE
using the ** operator in the FORMULA interface!!!

In [33]: fit_i = smf.ols(formula='y ~ (x1 + x2)**2', data=df).fit()

In [34]: fit_i.params

Out[34]: Intercept -0.107266

x1 1.748614
x2 0.174882
x1:x2 0.264916
dtype: float64

You can use this notation to create very COMPLEX interactions between many different types
of FEATURES!!!

Thus, you can even INTERACT non-linear features together!!!!

In [35]: fit_j = smf.ols(formula='y ~ (x1 + np.power(x1,2) + x2 + np.power(x2,2))**2', data=df).fit()

In [36]: fit_j.params

Out[36]: Intercept 0.143433

x1 1.043322
np.power(x1, 2) -0.363776
x2 0.044654
np.power(x2, 2) -0.219340
x1:np.power(x1, 2) 0.230775
x1:x2 -0.214390
x1:np.power(x2, 2) 0.484325
np.power(x1, 2):x2 -0.154070
np.power(x1, 2):np.power(x2, 2) 0.482368
x2:np.power(x2, 2) -0.063944
dtype: float64
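
The 10 non-intercept coefficients above come from 4 main effects plus all 6 PAIRS of the 4 features. The sketch below reproduces that counting with plain Python. The function `expand_power` is a hypothetical helper that only MIMICS the naming pattern of the formula expansion. It is not how the formula parser is implemented.

```python
from itertools import combinations

def expand_power(features, degree):
    """List main effects and all interactions up to the given degree."""
    terms = []
    for order in range(1, degree + 1):
        # Every combination of `order` distinct features is one term
        for combo in combinations(features, order):
            terms.append(':'.join(combo))
    return terms

features = ['x1', 'np.power(x1, 2)', 'x2', 'np.power(x2, 2)']
terms = expand_power(features, 2)

print(len(terms))
for name in terms:
    print(name)
```

Raising the degree to 3 would add every TRIPLET of features as well, so the number of coefficients grows quickly.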

When you want to INTERACT polynomials...I like to separate the POLYNOMIALS derived from
each input using () in the formula interface!!!

In [37]: fit_k = smf.ols(formula='y ~ (x1 + np.power(x1,2)) * (x2 + np.power(x2,2))', data=df).fit()

In [38]: fit_k.params

Out[38]: Intercept 0.048390

x1 1.355946
np.power(x1, 2) -0.287992
x2 -0.030340
np.power(x2, 2) -0.061388
x1:x2 -0.229695
x1:np.power(x2, 2) 0.641887
np.power(x1, 2):x2 -0.229303
np.power(x1, 2):np.power(x2, 2) 0.458874
dtype: float64
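
Compared to fit_j, this model has 8 non-intercept coefficients instead of 10, because the features derived from the SAME input are NOT interacted with each other. The sketch below reproduces the pattern with plain Python. The helper `cross_groups` is hypothetical and only mimics the term names. It is not the formula parser itself.

```python
from itertools import product

def cross_groups(left, right):
    """Main effects of both groups plus every left-by-right product."""
    interactions = [f'{a}:{b}' for a, b in product(left, right)]
    return list(left) + list(right) + interactions

x1_features = ['x1', 'np.power(x1, 2)']
x2_features = ['x2', 'np.power(x2, 2)']
group_terms = cross_groups(x1_features, x2_features)

print(len(group_terms))
for name in group_terms:
    print(name)
```

The terms x1:np.power(x1, 2) and x2:np.power(x2, 2) never appear, which is the practical benefit of grouping each input's polynomial inside its own ().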

Make Predictions
You do NOT need to create the interaction features yourself!!!!

You ONLY need to define the INPUTS and the model object will generate all necessary
features for you...because the FORMULA INTERFACE REMEMBERS what to do!!!

We learned how to create a GRID of combinations of the INPUTS!

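In [39]: # num values inferred from the nunique() output: 101 unique x1 values and 9 unique x2 values
input_grid = pd.DataFrame([ (x1, x2) for x1 in np.linspace(df.x1.min(), df.x1.max(), num=101)
                                     for x2 in np.linspace(df.x2.min(), df.x2.max(), num=9)],
                          columns=['x1', 'x2'])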

In [43]: input_grid

Out[43]:
           x1        x2
0   -1.733671 -1.813330
1   -1.733671 -1.269344
2   -1.733671 -0.725357
3   -1.733671 -0.181371
4   -1.733671  0.362615
...       ...       ...
904  1.822354  0.362615
905  1.822354  0.906601
906  1.822354  1.450587
907  1.822354  1.994573
908  1.822354  2.538560

909 rows × 2 columns

In [40]: input_grid.nunique()

Out[40]: x1 101
x2 9
dtype: int64
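
The same kind of grid can also be built with itertools.product, which some people find easier to read than a nested list comprehension. This is a sketch under assumed input ranges. The bounds -2.0, 2.0, and 2.5 are made up so the block runs without the course data.

```python
from itertools import product

import numpy as np
import pandas as pd

# Assumed ranges (hypothetical); the notebook uses the observed min and max
x1_values = np.linspace(-2.0, 2.0, num=101)  # many points for a smooth line
x2_values = np.linspace(-2.0, 2.5, num=9)    # a few values to compare

# product() pairs every x1 value with every x2 value, x2 changing fastest
grid = pd.DataFrame(list(product(x1_values, x2_values)),
                    columns=['x1', 'x2'])

print(grid.shape)
print(grid.nunique())
```

The row count is the PRODUCT of the number of unique values per input: 101 * 9 = 909.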

In [41]: viz_grid = input_grid.copy()

PREDICT the AVERAGE OUTPUT or TREND using the last model we fit.

In [42]: viz_grid['pred'] = fit_k.predict( input_grid )
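
To see what .predict() does behind the scenes, the sketch below compares it to the linear predictor computed by hand. It assumes a small SYNTHETIC data set and a simpler model than fit_k. The names `demo`, `new_inputs`, and `manual` are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical, not the course data)
rng = np.random.default_rng(11)
demo = pd.DataFrame({'x1': rng.normal(size=40), 'x2': rng.normal(size=40)})
demo['y'] = (1.0 + 2.0 * demo.x1 - 0.5 * demo.x2
             + 0.7 * demo.x1 * demo.x2
             + rng.normal(scale=0.2, size=40))

fit = smf.ols(formula='y ~ x1 * x2', data=demo).fit()

# New data holds ONLY the inputs, with no interaction column
new_inputs = pd.DataFrame({'x1': [0.0, 1.0, -1.5], 'x2': [0.5, -1.0, 2.0]})
pred = fit.predict(new_inputs)

# Rebuild the same trend manually from the coefficient estimates
b = fit.params
manual = (b['Intercept']
          + b['x1'] * new_inputs.x1
          + b['x2'] * new_inputs.x2
          + b['x1:x2'] * new_inputs.x1 * new_inputs.x2)

print(np.allclose(pred.to_numpy(), manual.to_numpy()))
```

With a model like fit_k the manual version would need all 8 features typed out, which is exactly the bookkeeping the formula interface saves you from.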

In [44]: sns.relplot(data = viz_grid,

x='x1', y='pred', kind='line',
hue='x2', palette='coolwarm',
estimator=None, units='x2')

plt.show()
