
AdaBoostStumpsSampler #124

Closed
wants to merge 21 commits
Closed
Conversation

glevv

@glevv glevv commented Jun 27, 2021

MC approximation of AdaBoost stump kernel #119

@glevv glevv closed this Jun 27, 2021
@glevv glevv reopened this Jun 30, 2021
@TimotheeMathieu
Contributor

Thank you @glevv for this, some comments:

  • Could you include in the documentation a short description and an explanation of the difference between this and Fastfood?
  • Also, on some datasets it may not be a good idea to use a scaling of 1/max(|x|), as this can vary a lot. More generally, I would personally use one of scikit-learn's scalers on the data (for instance StandardScaler; what you propose is close to MinMaxScaler). Maybe you could include a scale_X parameter, True or False, which by default applies StandardScaler, the most common scaler? The best preprocessing really depends on the dataset, so it should not be fixed in the algorithm.

Otherwise LGTM, thanks.
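A minimal sketch of the pipeline alternative discussed above: scaling is composed as a separate step rather than hard-coded into the sampler. RBFSampler stands in here for the proposed stump sampler, since both are kernel approximators with the same transformer interface.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler

# Scaling delegated to a pipeline step instead of being fixed inside the
# sampler; any scaler (StandardScaler, MinMaxScaler, ...) can be swapped in.
pipe = make_pipeline(
    StandardScaler(),
    RBFSampler(n_components=100, random_state=0),
)

X = np.random.default_rng(0).normal(size=(50, 4))
Z = pipe.fit_transform(X)  # feature map computed on standardized data
```

This keeps the preprocessing choice in the user's hands, which matters because, as noted, the best scaler depends on the dataset.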

@glevv
Author

glevv commented Jul 30, 2021

They are completely different kernels, with different methods of computing them.
The stump kernel was presented in "Support Vector Machinery for Infinite Ensemble Learning", but it can be hard to compute exactly, so a Monte Carlo approximation was proposed in "Uniform Approximation of Functions with Random Bases" (the same paper that describes the MC approximation of the RBF kernel implemented as RBFSampler).
I think StumpKernelSampler/StumpSampler would be a shorter and more consistent name (similar to RBFSampler).

As for the scaling, I think it is possible to remove it altogether and let users build their own pipelines (while clearly stating in the docs that this method requires scaling). That would be consistent with other kernel methods/approximations (RBFSampler also requires scaling to give a proper approximation) and with the original formulation in the paper.
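The MC idea described above can be sketched roughly as follows: each random feature is a decision stump sign(x[d] - t) with a random coordinate d and a random threshold t, and averaging many such stumps approximates the stump kernel, in the same spirit as RBFSampler's random Fourier features. This is a hypothetical illustration, not the PR's actual code; the class name and its parameters are assumptions for the sketch.

```python
import numpy as np

class StumpSampler:
    """Illustrative Monte Carlo feature map for the stump kernel.

    Hypothetical sketch (not the PR's implementation): each component is a
    random decision stump sign(x[d] - t), with d drawn uniformly over the
    feature indices and t drawn uniformly over that feature's observed range.
    """

    def __init__(self, n_components=100, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        n_features = X.shape[1]
        # Random coordinate for each stump.
        self.dims_ = rng.integers(0, n_features, size=self.n_components)
        # Random threshold from the observed range of that coordinate.
        lo, hi = X.min(axis=0), X.max(axis=0)
        self.thresholds_ = rng.uniform(lo[self.dims_], hi[self.dims_])
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Evaluate all stumps at once; scale so that the inner product of two
        # transformed points is the average over the sampled stumps.
        Z = np.sign(X[:, self.dims_] - self.thresholds_)
        return Z / np.sqrt(self.n_components)
```

With this sketch, `Z = StumpSampler(...).fit(X).transform(X)` gives `Z @ Z.T` as a Monte Carlo estimate of the random-stump kernel matrix (related to the stump kernel up to scaling and shift), which also makes clear why the input range, and hence the scaling discussed above, matters: the thresholds are drawn from it.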

@glevv
Author

glevv commented Jan 10, 2023

Closed due to inactivity

@glevv glevv closed this Jan 10, 2023
@glevv glevv deleted the abssampler branch May 14, 2024 14:18