gplearn Symbolic Regression
by Andrea Castiglioni | Analytics Vidhya
1. Introduction
Imagine you are a scientist working in any field. You can split your work into three steps: the first is to gather the data, the second is to propose a phenomenological explanation of those data, and the third is to trace those data and phenomenological observations back to first principles.
Imagine you are about to gather data and discover gravity. There are three scientists we should remember in this field: Tycho Brahe (data acquisition), who took extensive and precise measurements of the positions of the planets over time.
Johannes Kepler, who, from Brahe's measurements, derived analytical expressions that describe the motion of the solar system in a concise manner (data analysis).
Finally, the theorist: Isaac Newton. He recognized the mechanism underlying the planets' motion around the Sun, which could be formulated as a universal law (derivation from first principles).
Machine learning (ML) models are currently the tools of choice for uncovering these physical laws. Although they have shown promising performance in predicting materials properties, typical parameterized machine learning models do not allow the ultimate step of scientific research: automating Newton. Consider, for example, ML/deep learning algorithms that can predict with a perfect score the number of infected people in a global pandemic: if they can't explain that number, they are of limited use. That's why simpler mechanistic models like SIR are still widely used. ML models can be predictive, but their descriptions are often too verbose (e.g. deep learning models with thousands of parameters) or mathematically restrictive (e.g. assuming the target variable is a linear combination of the input features).
Genetic programming flowchart depicting the iterative solution-finding process. Source: arXiv.
Let's generate some noisy sample data to fit:

import numpy as np

nsample = 400   # number of samples
sig = 0.2       # noise amplitude
x = np.linspace(-50, 50, nsample)
# design matrix: linear, sinusoidal, cubic and constant columns
X = np.column_stack((x/5, 10*np.sin(x), (x-5)**3, np.ones(nsample)))
beta = [0.01, 1, 0.001, 5.]
y = np.dot(X, beta) + sig * np.random.normal(size=nsample)  # assumed target construction
Function to be predicted.
We import from sympy; in particular, we will use sympify to make the resulting expression more readable.
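A minimal sketch of those imports; sin and cos are pulled in as well, since the converter dictionary defined below needs them to be SymPy functions:

from sympy import sympify, sin, cos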
Since we will be comparing the results from different approaches, we split our dataset into train and test sets:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'x': x, 'y': y})  # assumed: the generated data collected in a DataFrame

X = df[['x']]
y = df['y']
y_true = y
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.30,
                                                    random_state=42)
Our first fit will be extremely easy: in the function set we include most, if not all, of the default functions that come with the library. Then we initialize the regressor with some standard parameters.
With this code we're telling the regressor to use the functions from the function set we just defined.
# First Test
from gplearn.genetic import SymbolicRegressor

function_set = ['add', 'sub', 'mul', 'div', 'cos', 'sin', 'neg', 'inv']
est_gp = SymbolicRegressor(population_size=5000,
                           function_set=function_set,
                           generations=40, stopping_criteria=0.01,
                           p_crossover=0.7, p_subtree_mutation=0.1,
                           p_hoist_mutation=0.05, p_point_mutation=0.1,
                           max_samples=0.9, verbose=1,
                           parsimony_coefficient=0.01, random_state=0,
                           feature_names=X_train.columns)
We define a dictionary called converter: its use will become clear later, but essentially we need it to convert a function name, or string, into its equivalent Python mathematical code.
converter = {
'sub': lambda x, y : x - y,
'div': lambda x, y : x/y,
'mul': lambda x, y : x*y,
'add': lambda x, y : x + y,
'neg': lambda x : -x,
'pow': lambda x, y : x**y,
'sin': lambda x : sin(x),
'cos': lambda x : cos(x),
'inv': lambda x: 1/x,
'sqrt': lambda x: x**0.5,
'pow3': lambda x: x**3
}
We then fit the data, evaluate our R2 score, and with the help of sympify we print the resulting expression:
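A sketch of this step, mirroring the code shown later for the second fit (the str() cast around est_gp._program is added here for robustness with recent SymPy versions):

est_gp.fit(X_train, y_train)
print('R2:', est_gp.score(X_test, y_test))
# convert the winning program into a readable SymPy expression
best_expr = sympify(str(est_gp._program), locals=converter)
best_expr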
With verbose=1 the output will look something like this: the first column indicates the generation number; then we have the average length and fitness of the whole population (5000 individuals); on the right we have the statistics of the best individual.
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# fit two standard ML baselines on the same training data
est_tree = DecisionTreeRegressor(max_depth=5)
est_tree.fit(X_train, y_train)
est_rf = RandomForestRegressor(n_estimators=100, max_depth=5)
est_rf.fit(X_train, y_train)

# predictions and R2 scores on the test set
y_gp = est_gp.predict(X_test)
score_gp = est_gp.score(X_test, y_test)
y_tree = est_tree.predict(X_test)
score_tree = est_tree.score(X_test, y_test)
y_rf = est_rf.predict(X_test)
score_rf = est_rf.score(X_test, y_test)
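A quick, trivial sketch for printing the three scores side by side:

for name, score in [('RF', score_rf), ('DT', score_tree), ('GPlearn', score_gp)]:
    print(f'{name} score: {score:.3f}')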
From the scores above, the result does not seem impressive at first: both ML tools scored slightly better than our analytical expression:

RF score: 0.993
DT score: 0.992
GPlearn score: 0.978

However, if we plot the results, we can see that our analytical function is actually better in the middle range, while it drifts off at the extreme points.
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 8))
# assumed loop: panel 1 shows the raw test data, panels 2-4 each model's test predictions
for i, (y_pred, title) in enumerate([(y_test, 'Original data'), (y_gp, 'GP'),
                                     (y_tree, 'DT'), (y_rf, 'RF')]):
    ax = fig.add_subplot(2, 2, i + 1)
    points = ax.scatter(X, y_true, color='green', alpha=0.5)
    test = ax.scatter(X_test, y_pred, color='red', alpha=0.5)
    plt.title(title)
plt.show()
Comparison of full data (green points) and test data (red points) for three different approaches. Top left
indicates original data.
Above: relationship between y_pred and y_true. Below: prediction error for the three algorithms.
We can help the regressor by adding a custom function, the cube, to the function set via gplearn's make_function:
from gplearn.functions import make_function

# custom cube primitive: x -> x**3
def pow_3(x1):
    return x1**3

pow_3 = make_function(function=pow_3, name='pow3', arity=1)
function_set.append(pow_3)  # assumed: add the new primitive to the function set
est_gp = SymbolicRegressor(population_size=5000,
                           function_set=function_set,
                           generations=45, stopping_criteria=0.01,
                           p_crossover=0.7, p_subtree_mutation=0.1,
                           p_hoist_mutation=0.05, p_point_mutation=0.1,
                           max_samples=0.9, verbose=1,
                           parsimony_coefficient=0.01, random_state=0,
                           feature_names=X_train.columns)
est_gp.fit(X_train, y_train)
print('R2:', est_gp.score(X_test, y_test))
next_e = sympify(str(est_gp._program), locals=converter)
next_e
Result of gplearn
The R2 is now really close to 1; also note how much simpler our equation is! If we compare again to the traditional ML tools, we can really appreciate the improvement brought by this step.
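For completeness, a minimal sketch of how that comparison could be repeated, reusing names from the earlier snippets:

print('GPlearn with pow3:', est_gp.score(X_test, y_test))
print('RandomForest:', score_rf)
print('DecisionTree:', score_tree)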
Above: relationship between y_pred and y_true. Below: prediction error for the three algorithms.
6. Conclusion
I think symbolic regression is a great tool to be aware of. It perhaps isn't perfect for every kind of approach, but it gives you another option, which can be really useful as the outcome is readily understandable.