Jarvis Auto ML


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/353823706

JARVIS : An Easy Automated Machine Learning Tool in Python

Presentation · August 2021


1 author:

Corentin Macqueron
ORANO (ex-AREVA)

All content following this page was uploaded by Corentin Macqueron on 02 November 2022.



JARVIS
An Easy Automated Machine Learning Tool in Python

Corentin Macqueron, Computational Fluid Dynamics and Machine Learning Engineer


WHY JARVIS?
There are already many machine learning tools such as Scikit-Learn [1],
Keras [2] or Caret [3] but we felt that ‘something’ was missing

We wanted a very easy ‘Auto-ML’ tool that would automate as many things as
possible, from reading the data to explaining the model and making new
predictions, including data cleaning, qualitative variable encoding, natural
language processing, data scaling, series-to-supervised data transformation
for transient states, learning, validation and hyperparameter optimization,
while using state-of-the-art algorithms from Scikit-Learn and Keras

We wanted automated parallelization on CPUs and GPUs

We wanted the tool to be able to deal with regression and classification
for both steady and transient states and computer vision

JARVIS had to be launched by a simple copy-paste


WHY JARVIS?
In short, we wanted JARVIS to be able to build machine learning models that are:

- state-of-the-art
- easy to build, with automated preprocessing and hyperparameter optimization
- easy to use
- fast to train, capable of using multiple CPUs and GPUs
- not a black box
- able to deal with regression, classification, steady state, transient state and vision
- able to deal with numerical and textual data

WHAT’S JARVIS?

JARVIS is a Python script based on Scikit-Learn [1] and Keras [2] that only requires a few
inputs from the user and that will automatically build a machine learning model

JARVIS requires the data to be in the TSV format, the type of model to look for, the
validation strategy, the ‘intensity’ of the hyperparameter optimization and whether the
model should be explained or not (plus a few other inputs)

The JARVIS Graphical User Interface is either a text file that just needs a copy-paste to be
run in a Python console, or a Jupyter Notebook navigator

WHAT’S JARVIS?

[Figure: view of the JARVIS Jupyter Notebook navigator Graphical User Interface]

DATA LAB

JARVIS offers a DATA LAB providing insights on the data prior to the machine learning
phase

The DATA LAB analyzes the data and shows the statistical distributions, the correlations
and the relative importance of the inputs for predicting the outputs (according to linear
correlation coefficients and to a quick gradient boosting feature importance). It also
allows for selecting only a fraction of relevant inputs
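
The correlation-based part of such a ranking can be sketched as follows. This is only an illustration, not the JARVIS code: the function name `rank_inputs_by_correlation` and the synthetic data are assumptions, and the gradient boosting importance is left out to keep the sketch NumPy-only.

```python
import numpy as np

def rank_inputs_by_correlation(X, y, names):
    """Return (name, |Pearson r|) pairs, most correlated input first."""
    scores = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]  # linear correlation with the output
        scores.append((name, abs(float(r))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Synthetic data (assumption): the output depends strongly on "a", weakly on "b", not on "c"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

ranking = rank_inputs_by_correlation(X, y, ["a", "b", "c"])
top_input = ranking[0][0]  # the input one would keep first
```

Keeping only the first few entries of `ranking` is one simple way to select a fraction of relevant inputs.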

PREPROCESSING
JARVIS automatically:

- eliminates NaN or missing values
- encodes (binarizes) any qualitative input
- deletes any input or output that has zero variance (because it carries no information)
- scales inputs and/or outputs with standard scaling, min-max scaling or power scaling
(or no scaling at all), following the user’s request
- for classification, balances the dataset by downsampling or upsampling (SMOTE algorithm
or duplicates without train-test leakage), following the user’s request
- for classification, eliminates any class under a user-specified threshold
- for transient states, transforms the data from time-series to supervised data,
following the user’s request
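
The first four steps of the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the JARVIS pipeline: the function name `preprocess` and the toy data are hypothetical, only standard scaling is shown, and the scaling statistics are computed on the whole array for brevity (in practice they would be fitted on the training set only).

```python
import numpy as np

def preprocess(numeric, qualitative):
    """numeric: 2-D float array; qualitative: 1-D array of string labels."""
    numeric = np.asarray(numeric, dtype=float)
    qualitative = np.asarray(qualitative)

    # 1. Eliminate rows that contain NaN values
    keep = ~np.isnan(numeric).any(axis=1)
    numeric, qualitative = numeric[keep], qualitative[keep]

    # 2. Binarize (one-hot encode) the qualitative input
    levels = np.unique(qualitative)
    onehot = (qualitative[:, None] == levels[None, :]).astype(float)
    X = np.hstack([numeric, onehot])

    # 3. Delete zero-variance columns
    X = X[:, X.std(axis=0) > 0]

    # 4. Standard scaling (zero mean, unit standard deviation)
    return (X - X.mean(axis=0)) / X.std(axis=0)

numeric = [[1.0, 5.0], [2.0, 5.0], [float("nan"), 5.0], [4.0, 5.0]]
qualitative = ["red", "blue", "red", "red"]
X_clean = preprocess(numeric, qualitative)
```

In this toy case one row is dropped for its NaN, and the constant second column is removed, leaving three rows and three scaled columns.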

PREPROCESSING – TIME-SERIES
For transient states, JARVIS uses the ‘sliding window’
approach and recursive integration to make
predictions on any arbitrarily long sequence

The user provides the sizes of the past and future
windows and the number of integration steps to
perform

JARVIS can also compute statistical summaries for
‘deep’ past data beyond the past rolling window (min,
max, mean, standard deviation, skewness, kurtosis,
Fourier transform, etc.) to enhance predictions
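
The sliding-window transformation and the recursive integration can be sketched as below. This is an illustration under assumptions, not the JARVIS implementation: the helper names are hypothetical, the series is univariate, and a least-squares linear model stands in for whichever algorithm is actually trained.

```python
import numpy as np

def series_to_supervised(series, n_past, n_future):
    """Slide a window over the series: n_past values in, n_future values out."""
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    for start in range(len(series) - n_past - n_future + 1):
        X.append(series[start:start + n_past])
        Y.append(series[start + n_past:start + n_past + n_future])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    """Stand-in model (assumption): least-squares linear map with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W

def recursive_forecast(W, history, n_past, n_steps):
    """Feed each prediction back into the past window, n_steps times."""
    window = np.asarray(history[-n_past:], dtype=float)
    forecast = []
    for _ in range(n_steps):
        future = np.append(window, 1.0) @ W
        forecast.extend(future.tolist())
        window = np.concatenate([window, future])[-n_past:]
    return forecast

series = np.arange(10.0)  # toy ramp 0, 1, ..., 9
X, Y = series_to_supervised(series, n_past=3, n_future=2)
W = fit_linear(X, Y)
forecast = recursive_forecast(W, series, n_past=3, n_steps=2)  # continues the ramp
```

With a past window of 3, a future window of 2 and 2 integration steps, the sketch extends the sequence by 4 values; more steps extend it arbitrarily far, at the price of accumulating prediction errors.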

TIME-SERIES
For transient states, JARVIS can use
both machine learning and deep
learning techniques

For deep learning, JARVIS uses ‘classic’
feedforward perceptrons

We tried LSTMs and CNNs for time
series but we were not convinced, so
in the end they were not implemented

PREPROCESSING – NATURAL LANGUAGE PROCESSING
For text data, JARVIS automatically:

- removes capitals, numbers and punctuation
- removes stop words
- removes any word under a certain user-defined
frequency threshold
- applies a word stemmer
- encodes words either with:
    - the bag-of-words approach (according to count,
    frequency or TF-IDF, following the user’s
    request)
    - the embedding approach (Keras embedding
    layer) for neural networks (to preserve context
    and sequentiality)
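
The cleaning steps and the count-based bag-of-words encoding can be sketched with the standard library alone. The stop-word list, the function names and the example sentences are assumptions made for illustration; stemming is omitted because it needs an external stemmer, and the regular expression assumes plain ASCII text.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "of"}  # tiny illustrative list (assumption)

def tokenize(text):
    text = text.lower()                     # remove capitals
    text = re.sub(r"[^a-z\s]", " ", text)   # remove numbers and punctuation
    return [word for word in text.split() if word not in STOP_WORDS]

def bag_of_words(documents, min_count=1):
    """Return the vocabulary and the per-document word counts."""
    tokenized = [tokenize(doc) for doc in documents]
    totals = Counter(word for doc in tokenized for word in doc)
    # Drop any word whose total frequency is under the threshold
    vocabulary = sorted(word for word, count in totals.items() if count >= min_count)
    matrix = [[doc.count(word) for word in vocabulary] for doc in tokenized]
    return vocabulary, matrix

docs = ["The pump is leaking, pump 3!", "Valve and pump OK.", "The valve is stuck"]
vocab, counts = bag_of_words(docs, min_count=2)
```

Dividing each row of `counts` by its sum would give the frequency variant; TF-IDF additionally down-weights words that appear in many documents.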
VALIDATION STRATEGIES

JARVIS offers several validation strategies:

- training-validation split
- training-validation-test split
- 3-fold cross-validation split
- 3-fold cross-validation + test split

The split can be performed randomly or
following a ‘PreDefinedSplit’ variable given
by the user
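
The two ways of splitting can be sketched as index computations. This is a sketch under assumptions: the function names, the fractions and the label values ("train", "val", "test") are illustrative and do not reflect the actual encoding of the ‘PreDefinedSplit’ variable in JARVIS.

```python
import numpy as np

def random_split(n_samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the row indices and cut them into train / validation / test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    n_val = int(round(val_fraction * n_samples))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

def predefined_split(labels):
    """Group the row indices by a user-given split label."""
    labels = np.asarray(labels)
    return {str(name): np.flatnonzero(labels == name) for name in np.unique(labels)}

train, val, test = random_split(10)
groups = predefined_split(["train", "train", "val", "test", "train"])
```

A predefined split is useful when rows are not independent, for instance when all the samples from one experiment must stay on the same side of the split.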

ALGORITHMS
JARVIS offers several regression algorithms [1][2]:

- Dummy (Scikit-Learn)
- Linear (Scikit-Learn)
- Polynomial (Scikit-Learn)
- kNN (Scikit-Learn)
- Spline (EARTH)
- Decision Tree (Scikit-Learn)
- Random Forest (Scikit-Learn)
- AdaBoost (Scikit-Learn)
- Gradient Boosting (Scikit-Learn)
- XGBoost (DMLC)
- LightGBM (Microsoft)
- Gaussian processes (Scikit-Learn)
- Neural Network (Keras-TensorFlow)
ALGORITHMS
JARVIS offers several classification algorithms [1][2]:

- Dummy (Scikit-Learn)
- Logistic Regression (Scikit-Learn)
- kNN (Scikit-Learn)
- Naive Bayes (Scikit-Learn)
- Linear Discriminant Analysis (Scikit-Learn)
- Decision Tree (Scikit-Learn)
- Random Forest (Scikit-Learn)
- AdaBoost (Scikit-Learn)
- Gradient Boosting (Scikit-Learn)
- XGBoost (DMLC)
- LightGBM (Microsoft)
- Gaussian processes (Scikit-Learn)
- Neural Network (Keras-TensorFlow)
HYPERPARAMETER OPTIMIZATION
JARVIS offers 4 ‘intensity’ levels for
hyperparameter optimization

Level 0 only uses the default parameters, level
1 tries ~10 models for each algorithm,
level 2 tries ~100 models for each
algorithm and level 3 tries ~1000 models
for each algorithm

JARVIS uses brute-force grid search

The grids are predefined but the user can modify
the grids or ask for random grids

‘Smart’ search using results-aware approaches
such as Bayesian search or Forest Minimize [4]
was tested but we were not convinced [10], so
in the end ‘smart’ search was not implemented
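
A brute-force grid search simply tries every combination of the grid and keeps the one with the lowest validation error. The sketch below illustrates the idea on a toy problem; the grid, the ridge-regularized polynomial model and all function names are assumptions, not the predefined JARVIS grids.

```python
import itertools
import numpy as np

def fit_ridge_polynomial(x, y, degree, alpha):
    """Toy model (assumption): polynomial fit with an L2 penalty alpha."""
    A = np.vander(x, degree + 1)
    return np.linalg.solve(A.T @ A + alpha * np.eye(degree + 1), A.T @ y)

def grid_search(grid, evaluate):
    """Evaluate every combination of the grid and return the best one."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

x_train = np.linspace(-1.0, 1.0, 21)
x_val = np.linspace(-0.9, 0.9, 7)
y_train, y_val = x_train ** 2, x_val ** 2

def validation_mae(params):
    coeffs = fit_ridge_polynomial(x_train, y_train, params["degree"], params["alpha"])
    return float(np.mean(np.abs(np.polyval(coeffs, x_val) - y_val)))

grid = {"degree": [0, 1, 2], "alpha": [0.0, 1.0]}  # 3 x 2 = 6 models
best_params, best_error = grid_search(grid, validation_mae)
```

The number of models grows as the product of the grid sizes, which is why an ‘intensity’ level that caps the number of models per algorithm is a practical control.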
HYPERPARAMETER OPTIMIZATION
AND METRICS
JARVIS will automatically optimize
and compare all the algorithms
requested by the user and will
indicate and select the best one with
its best hyperparameters

JARVIS offers Mean Absolute Error
and Mean Absolute Percentage Error
for regression (the latter allowing for
relative error), and weighted or non-weighted
F1-score for classification
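
For reference, the three metrics can be written out in plain Python. This is a sketch of the standard definitions, not the JARVIS code; here the MAPE is returned as a fraction (0.1 means 10 %), it is undefined when a true value is zero, and the F1-score only considers the classes present in the true labels.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_percentage_error(y_true, y_pred):
    # Relative error, as a fraction of the true value
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, weighted=False):
    """Average of the per-class F1, optionally weighted by class frequency."""
    classes = sorted(set(y_true))
    total = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn)
        weight = y_true.count(c) / len(y_true) if weighted else 1 / len(classes)
        total += weight * f1
    return total

mae = mean_absolute_error([100, 200], [110, 180])               # 15.0
mape = mean_absolute_percentage_error([100, 200], [110, 180])   # about 0.1
labels_true, labels_pred = [0, 0, 0, 1], [0, 0, 1, 1]
f1_macro = f1_score(labels_true, labels_pred)                   # about 0.733
f1_weighted = f1_score(labels_true, labels_pred, weighted=True) # about 0.767
```

The weighted variant gives more importance to frequent classes, while the non-weighted variant treats every class equally, which matters on imbalanced datasets.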

MODEL EXPLANATION
JARVIS offers 3 ways to explain the model:

- 1D Partial Dependence Plots
- Sobol Indices (with interactions) (SALib package [5])
- Shapley Values and Shapley Graphs (SHAP package [6])

These 3 approaches show how the model behaves, both quantitatively and
qualitatively, can ‘explain’ the value of the outputs as a function of each input and
can even show how the model behaves in extrapolation
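
The idea behind a 1D partial dependence curve can be sketched without any plotting: force one input to a grid of values and average the model predictions over the rest of the data. The function name and the toy `model` below are assumptions standing in for a trained model.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction when one input is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # force the studied input to a single value
        averages.append(float(np.mean(predict(X_mod))))  # average over the other inputs
    return np.array(averages)

def model(X):
    # Toy stand-in for a trained model (assumption)
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
grid = np.linspace(-1.0, 1.0, 5)
pdp = partial_dependence_1d(model, X, feature=0, grid=grid)
```

Here the curve is a straight line of slope 2, which recovers the linear effect of the first input. Extending `grid` beyond the range of the data shows how the model behaves in extrapolation.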

COMPUTING
JARVIS can run the training on a single or multiple CPU core(s) or on a single or multiple
GPU(s)

The parallelization is performed with the Joblib and Multiprocessing libraries

The training is parallelized at the grid search level and not at the model level
(tiny models would not benefit from parallel computing)

The speed-up from 1 CPU core to 4 GPUs can reach 2 orders of magnitude
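
Parallelizing at the grid search level means that each worker trains one whole candidate model. The sketch below illustrates this with a thread pool from the standard `multiprocessing` package, chosen only to keep the example self-contained; the `evaluate` function and the grid are hypothetical stand-ins, and CPU-bound training would normally use a process pool (`multiprocessing.Pool` exposes the same `map` call) or Joblib.

```python
import itertools
from multiprocessing.pool import ThreadPool

def evaluate(params):
    # Stand-in for "train one model and return its validation error" (assumption)
    return (params["depth"] - 4) ** 2 + abs(params["rate"] - 0.1)

grid = {"depth": [2, 4, 6], "rate": [0.01, 0.1, 1.0]}
candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

# One task per candidate model: the grid is split across workers, not the model
with ThreadPool(processes=4) as pool:
    errors = pool.map(evaluate, candidates)

best_error, best_params = min(zip(errors, candidates), key=lambda pair: pair[0])
```

Because the candidates are independent, the work divides cleanly between workers, whatever the size of each individual model.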

CONVERGENCE
JARVIS automatically performs a convergence study to show the effect of the size of
the dataset on the machine learning score or error: it performs 10 subsamplings
from 10% to 100% of the training set and plots the validation score or error for each
subsampling, so the user can see whether more data would help
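
Such a convergence study can be sketched as a loop over growing subsamples of the training set. The function name, the nested-subsample choice, the least-squares model and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def convergence_study(X_train, y_train, X_val, y_val, fit, predict, n_steps=10, seed=0):
    """Validation MAE after training on 10%, 20%, ..., 100% of the training set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_train))
    results = []
    for step in range(1, n_steps + 1):
        size = max(1, len(X_train) * step // n_steps)
        subset = order[:size]  # nested subsamples: each one contains the previous
        model = fit(X_train[subset], y_train[subset])
        error = float(np.mean(np.abs(predict(model, X_val) - y_val)))
        results.append((size, error))
    return results

# Synthetic linear data and a least-squares model (assumptions)
rng = np.random.default_rng(1)
X_train, X_val = rng.normal(size=(100, 2)), rng.normal(size=(50, 2))
true_w = np.array([2.0, -1.0])
y_train = X_train @ true_w + 0.1 * rng.normal(size=100)
y_val = X_val @ true_w + 0.1 * rng.normal(size=50)

fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda w, X: X @ w
curve = convergence_study(X_train, y_train, X_val, y_val, fit, predict)
```

If the error is still decreasing at the right end of `curve`, more data is likely to help; if it has flattened, the model or the inputs are the limiting factor.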

GRAPHS AND RESULTS

JARVIS will automatically calculate errors or scores, draw parity plots or
confusion matrices and show error statistics, histograms and classification
reports

JARVIS will automatically save the database with the original and predicted
results so the user can check the error case by case

COMPUTER VISION

JARVIS also offers image processing capabilities for vision (classification learning)

JARVIS automates image pre-processing:

- Sizing to a single and homogeneous scale
- Gray conversion if requested
- Data augmentation (scaling, rotations, mirrors, etc.) if requested

COMPUTER VISION
JARVIS builds and trains convolutional neural layers followed by fully connected layers, according to the user’s
requests for the following parameters:

- Number and size of convolutional layers
- Type and size of pooling layers
- Number and size of fully connected layers
- Type of activation functions

It is possible to load a pre-trained convolutional neural network such as VGG [7], ResNet [8] or Inception [9]
with a single keyword to reduce training time and increase performance

JARVIS trains the network on CPU or GPU

JARVIS also offers the possibility to train an XGBoost model
instead of a fully connected neural network (after loading a
frozen pre-trained neural network), making the model extremely
fast to train compared to a ‘full’ neural network
DEPLOYMENT

The models are easily deployed to perform new predictions
as JARVIS saves the final model in a dedicated directory with
all the necessary pipelines (cleaning, encoding, scaling, NLP,
unscaling, etc.)

The model can then be easily reloaded and used to make
new predictions from new inputs given in a simple text file

The new predictions can be ‘explained’ with the Shapley
values

In transient state, JARVIS can read any new input sequence
of any length and will automatically perform the new
transient prediction sequence
SAMARITAN – UNSUPERVISED LEARNING
JARVIS also offers the SAMARITAN module that performs unsupervised learning for:

- Clustering (Gaussian Mixture and DBSCAN), with the number of clusters to look for
determined automatically with the Elbow method [1]
- Anomaly detection (DBSCAN, One-Class SVM, Z-Score, Isolation Forest and Auto-Encoding) [1][2]
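
Among the anomaly detection methods listed above, the Z-score is the simplest to sketch: a value is flagged when it lies more than a given number of standard deviations from the mean. The function name, the threshold of 3 and the toy readings are assumptions.

```python
import numpy as np

def zscore_anomalies(values, threshold=3.0):
    """Return the indices of the values whose |z-score| exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.flatnonzero(np.abs(z) > threshold)

readings = [10, 11, 9, 10, 12] * 4  # 20 ordinary readings (toy data)
readings[-1] = 50                   # one abnormal reading
anomalies = zscore_anomalies(readings)
```

Since the outlier itself inflates the mean and the standard deviation, this simple version can miss anomalies in very small samples.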

OPTIMIZATION
Models built with JARVIS can be coupled
to an optimizer program built with SciPy

This approach makes use of the
instantaneous response of the model to
look quickly for an optimum, for instance
to minimize a total cost or a reactant while
ensuring performance and safety criteria

It is then not only possible to make
instantaneous guesses, but to obtain an
optimal engineering solution
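
Such a coupling can be sketched with `scipy.optimize.minimize`. Everything specific below is an assumption made for illustration: `surrogate_performance` is a hypothetical closed-form stand-in for a trained model, the target of 0.9 and the bounds are arbitrary, and SLSQP is just one of the SciPy methods that accept inequality constraints.

```python
import math
from scipy.optimize import minimize

def surrogate_performance(x):
    # Hypothetical stand-in for a trained model: performance vs. reactant quantity
    return 1.0 - math.exp(-x[0])

TARGET = 0.9  # required performance (assumption)

result = minimize(
    fun=lambda x: x[0],       # quantity of reactant to minimize
    x0=[5.0],                 # initial guess
    method="SLSQP",
    bounds=[(0.0, 10.0)],
    # "ineq" constraints must be non-negative at a feasible point
    constraints=[{"type": "ineq", "fun": lambda x: surrogate_performance(x) - TARGET}],
)
optimal_quantity = float(result.x[0])
```

For this toy model the optimum sits where the performance constraint is exactly met, at a quantity of ln(10). Because each model evaluation is nearly instantaneous, the optimizer can afford the many calls it needs.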

USE CASES

JARVIS is being used to build surrogate models in fluid dynamics, structural mechanics and
fire safety, to understand phenomena (such as weather impact on industrial cooling
systems), to build decision-support tools in various engineering domains such as piping or
IC, to perform topology optimization for complex systems such as thermoelectric
generators and to build digital twins for process engineering in transient states

LIMITATIONS
What JARVIS doesn’t do (yet):

- Multilabel classification (multiclass only for the moment)
- Video processing
- Reinforcement Learning

REFERENCES
[1] Pedregosa, F., et al. (2011), Scikit-Learn: Machine learning in Python, https://scikit-learn.org/stable

[2] Chollet, F. (2015), Keras: Deep Learning Library for Theano and TensorFlow, https://keras.io/

[3] Kuhn, M., et al. (2008), Caret package, https://CRAN.R-project.org/package=caret

[4] Pak, M., et al. (2017), Scikit-Optimize: Sequential model-based optimization in Python, https://scikit-optimize.github.io/stable/

[5] Iwanaga, T., et al. (2022), SALib - Sensitivity Analysis Library in Python, https://salib.readthedocs.io/en/latest/

[6] Lundberg, S. (2018), SHAP, https://shap.readthedocs.io/en/latest/index.html

[7] Simonyan, K., Zisserman, A. (2014), Very Deep Convolutional Networks for Large-Scale Image Recognition, https://arxiv.org/abs/1409.1556

[8] He, K., et al. (2015), Deep Residual Learning for Image Recognition, https://arxiv.org/abs/1512.03385

[9] Szegedy, C., et al. (2014), Going deeper with convolutions, https://arxiv.org/pdf/1409.4842v1.pdf

[10] Macqueron, C. (2022), Machine Learning: The Hunt for Performance, https://www.researchgate.net/publication/362177565_Machine_Learning_The_Hunt_for_Performance

JARVIS was mainly developed by Corentin Macqueron, with the much appreciated help of Kenza Hammou for Computer Vision