[WIP] add Voronoi Isolation Forest implementation #79
Thanks in advance to all reviewers who will spend their time reading this pull request.
What?
I have implemented Voronoi Isolation Forest, an anomaly detection algorithm based on the isolation approach, similar to the existing sklearn.ensemble.IsolationForest.
In short, the algorithm builds an ensemble of trees, each representing a nested Voronoi tessellation of the data. The anomaly score of a sample is computed from its average depth across the trees.
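For reviewers unfamiliar with the method, here is a minimal pure-Python sketch of the idea. It is an illustration, not the code in this PR. It makes the following assumptions:

- At each node, `branching` seed points are drawn at random from the samples reaching that node.
- Every sample is then assigned to its nearest seed, giving one Voronoi cell (and one child) per seed.
- Tree growth stops at a depth limit of ceil(log_branching(psi)), by analogy with Isolation Forest's ceil(log2(psi)).
- The output is the raw mean depth, without the score normalization described in the paper.

`VoronoiIForestSketch`, `max_samples`, `branching` and `mean_depth` are hypothetical names, not the API proposed here.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (sufficient for nearest-seed comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(x: Point, seeds: List[Point]) -> int:
    """Index of the seed whose Voronoi cell contains x."""
    return min(range(len(seeds)), key=lambda i: _sq_dist(x, seeds[i]))


class _Node:
    def __init__(self, seeds: Optional[List[Point]] = None,
                 children: Optional[List["_Node"]] = None) -> None:
        self.seeds = seeds or []        # Voronoi sites chosen at this node
        self.children = children or []  # one subtree per Voronoi cell

    def is_leaf(self) -> bool:
        return not self.children


def _build(points: List[Point], depth: int, max_depth: int,
           branching: int, rng: random.Random) -> _Node:
    # Stop once the sample is isolated or the depth limit is reached.
    if depth >= max_depth or len(points) <= 1:
        return _Node()
    # Assumption: seeds are drawn uniformly from the samples at this node.
    seeds = rng.sample(points, min(branching, len(points)))
    cells: List[List[Point]] = [[] for _ in seeds]
    for p in points:
        cells[_nearest(p, seeds)].append(p)  # partition into Voronoi cells
    children = [_build(c, depth + 1, max_depth, branching, rng) for c in cells]
    return _Node(seeds, children)


class VoronoiIForestSketch:
    """Hypothetical minimal Voronoi isolation forest (illustration only)."""

    def __init__(self, n_trees: int = 100, max_samples: int = 256,
                 branching: int = 2, seed: Optional[int] = None) -> None:
        if branching < 2:
            raise ValueError("branching must be >= 2")
        self.n_trees = n_trees
        self.max_samples = max_samples
        self.branching = branching
        self.seed = seed

    def fit(self, X: Sequence[Sequence[float]]) -> "VoronoiIForestSketch":
        rng = random.Random(self.seed)
        data = [tuple(p) for p in X]
        psi = min(self.max_samples, len(data))
        # Depth limit ceil(log_branching(psi)), using integers to avoid float rounding.
        self.max_depth_, cap = 0, 1
        while cap < psi:
            cap *= self.branching
            self.max_depth_ += 1
        # Each tree is grown on its own random subsample of size psi.
        self.trees_ = [
            _build(rng.sample(data, psi), 0, self.max_depth_, self.branching, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def mean_depth(self, X: Sequence[Sequence[float]]) -> List[float]:
        """Average depth at which each sample reaches a leaf; lower = more anomalous."""
        out = []
        for x in X:
            x = tuple(x)
            total = 0
            for tree in self.trees_:
                node, depth = tree, 0
                while not node.is_leaf():
                    node = node.children[_nearest(x, node.seeds)]
                    depth += 1
                total += depth
            out.append(total / len(self.trees_))
        return out


if __name__ == "__main__":
    gen = random.Random(0)
    cluster = [(gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)) for _ in range(63)]
    data = cluster + [(50.0, 50.0)]  # one far-away point
    forest = VoronoiIForestSketch(n_trees=200, max_samples=64, seed=1).fit(data)
    depths = forest.mean_depth(data)
    print("far point:", depths[-1])
    print("cluster average:", sum(depths[:-1]) / len(depths[:-1]))
```

In this sketch, a far-away training point is isolated as soon as it is drawn as a seed, because every other sample is closer to another seed. When it is not drawn, it tends to fall into the smaller peripheral cell. Either way, its mean depth should come out lower than the cluster average.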
A description of the algorithm can be found in the following article, recently accepted at the 25th International Conference on Pattern Recognition (ICPR 2020):
PIF: Anomaly detection via preference embedding, Filippo Leveni, Luca Magri, Giacomo Boracchi and Cesare Alippi (2020, PDF).
Why?
Voronoi Isolation Forest applies to a broader range of problems than Isolation Forest, because it addresses two of its main limitations:
Additional comments
I believe this algorithm can benefit many people interested in data mining, both professionally and as a hobby.
I am available to make any code corrections and to write any documentation (including demos) that helps users grasp the usefulness and applicability of the algorithm.
Thanks again for your attention.