ENH Adds HTML visualizations for estimators #14180
Awesome! I need to test out this parameter display thing. I want to know what happens when there's a lot to show (does it wrap?). Is the monospace font necessary or helpful?
PR updated with:
One of my questions here is how we want this to appear in the documentation: should these diagrams appear throughout the user guide, the tutorial, or the example gallery?
Right now this generates a self-contained HTML file with the CSS and HTML it needs to render. If this were placed in the examples, it would output a bunch of HTML when someone runs the example locally. (We can do some clever hacking to get the HTML to render nicely on the webpage.) As for the documentation, we can use this internally to display pipelines and/or meta-estimators (most likely needing some Sphinx wizardry).
Hm, so pipelines show names if the step is a meta-estimator, and don't show the name otherwise. That seems a bit inconsistent? Similarly, there's no box for pipelines containing simple estimators. There also doesn't seem to be a distinction between how a column transformer and a voting classifier are visualized. I feel like they should look different in the graph, or they should have names somehow?
I feel like we need someone who knows about UX/UI design to work on this.
We can do this two ways: either we merge an MVP and iterate, or we try to "get it right". I think getting it right will result in lots of bikeshedding, and I'm not sure we have anyone who's good with UIs. So maybe iterating is better? Then the main remaining points are rendering in Sphinx, adding generic meta-estimators, and deciding on a name. I would only really use it in the user guide for the pipeline and feature union, I think.
Agreed. Not to denigrate @thomasjpfan's illustrious skills, of course. Is there someone we can call on?
I'm okay to iterate too, i.e. to make a reasonable first effort visible and then hope the right helper comes along.
@jnothman I'm trying to figure out if Columbia lets me make payments to Upwork, which would increase the chances of the right helper coming along tremendously, I think ;) I suggested to @thomasjpfan that he implement support for generic meta-estimators and figure out what it takes to render this in Sphinx; then we can try to merge and iterate.
You can demo the visualization here: https://thomasjpfan.github.io/sklearn_viz_html/index.html
But my main concern here is the visibility of the feature. Do we want this to be the default? With
Maybe @lesteve knows?
Full disclosure: I am not following sphinx-gallery very closely any more. I don't think sphinx-gallery supports capturing rich output from the notebook yet. There were related discussions in sphinx-gallery/sphinx-gallery#396 and sphinx-gallery/sphinx-gallery#421. According to sphinx-gallery/sphinx-gallery#421 (comment), it seems like image scrapers may help.
Even if sphinx-gallery adds support, Sphinx doesn't, and we'd need to create a new extension to show it inside the user guide (which we would want). I think this would be cool as One option would be to make this the default only for meta-estimators, but I'm not sure.
If it were a graphic, it wouldn't need extra Sphinx support as long as we only used diagrams from examples, as we do with plots...
For Sphinx support: we can easily generate the raw HTML in external files with a simple preprocessor, and could eventually turn it into a directive.
It sounds like the default repr discussion should happen separately, where we can consider the proposals.
This directive might help: https://ipython.readthedocs.io/en/stable/sphinxext.html
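A sketch of the "simple preprocessor" idea above, assuming the HTML comes from `sklearn.utils.estimator_html_repr` (the public name this feature later shipped under in scikit-learn 0.23; the PR description below still calls it `export_html`). The helper `write_estimator_html` and the output layout are hypothetical; the emitted snippet uses the `:file:` option of the standard docutils `raw` directive, so no custom Sphinx extension is needed for a first pass.

```python
import os
import tempfile

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr


def write_estimator_html(estimators, outdir):
    """Hypothetical preprocessor: one HTML file per named estimator.

    Returns, for each name, a reStructuredText snippet that embeds the
    generated file through the ``raw`` directive's ``:file:`` option.
    """
    os.makedirs(outdir, exist_ok=True)
    snippets = {}
    for name, est in estimators.items():
        path = os.path.join(outdir, f"{name}.html")
        # estimator_html_repr inlines its own <style> block, so the file
        # does not depend on an external stylesheet.
        with open(path, "w", encoding="utf-8") as f:
            f.write(estimator_html_repr(est))
        snippets[name] = f".. raw:: html\n   :file: {path}\n"
    return snippets


pipe = make_pipeline(StandardScaler(), LogisticRegression())
outdir = tempfile.mkdtemp()
snippets = write_estimator_html({"pipeline": pipe}, outdir)
print(snippets["pipeline"])
```

Turning this into a real directive would mostly mean moving the loop into a Sphinx extension that runs at build time instead of as a separate script.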
Some learning would also likely be involved. How would we feel if a library that uses cross-validation invented a new word for it because cross-validation is too technical?
That's different: it's about adapting to our audience. Our audience needs to know some ML/stats to use scikit-learn.
A lot of statisticians use R partly because it's less "geeky", less computer jargon. We need to cater to this audience.
Having two endpoints might make them a bit awkward to keep in sync code-wise, I think (and it would be a bit confusing to have two options with the same effect). But a display config option that takes {"text", "html"} and optionally accepts a list of MIME types for extensibility by advanced users? Why not. This is all linked to the user extensibility discussion, though.
yeah, we could maybe call it
Not chiming into the subjective appreciation on whether For example,
Or
I don't think this is a question of MIME type. This is a question of how to format estimators. We could have multiple representations in each MIME type and could effectively return a simple repr in HTML with some highlighting etc., and it would still be distinct from the visual diagram representation implemented here. Were we to have an ASCII implementation of this diagram, it would still be visual but text/plain. Further, were we to have implemented this in SVG (MIME image/svg+xml), changing the implementation to text/html should not concern the user. The question here for the user is how we should summarise an estimator by default in a REPL or cell-based framework that supports whatever MIME types we can offer it; the option to constrain MIME type is only pertinent if we have multiple MIME types with the same effective representation (but distinct presentation, I suppose).
Still +1 on display="diagram", or alternatively display_style="diagram".
doc/modules/compose.rst
`display` option in :func:`sklearn.set_config`::

    >>> from sklearn import set_config
    >>> set_config(display=True)  # doctest: +SKIP
-    >>> set_config(display=True)  # doctest: +SKIP
+    >>> set_config(display="diagram")  # doctest: +SKIP
I like it.
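For reference, a sketch of how the option the reviewers converged on is used, assuming scikit-learn 0.23 or later, where `set_config`, `get_config` and `config_context` accept `display="text"` or `display="diagram"`. Only the final bare expression depends on Jupyter: in a notebook cell it renders the diagram, while in a plain script it does nothing.

```python
from sklearn import config_context, get_config, set_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Switch the rich (notebook) representation to the HTML diagram globally.
set_config(display="diagram")
print(get_config()["display"])  # diagram

# Temporarily fall back to the plain text repr inside a block.
with config_context(display="text"):
    print(get_config()["display"])  # text

# Ending a Jupyter cell with the estimator displays the diagram.
pipe
```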
That is an excellent point. Diagram or another style is independent of the MIME type.
To be clear, this PR adjusts the output of
I think we should merge, and we can consider name changes right up to the release. Should we implement an IPython magic to make it easy to enable? (Though I'm happy with this config name.)
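One possible shape for such a magic, sketched as an IPython extension. `load_ipython_extension(ipython)` and `InteractiveShell.register_magic_function(func, magic_kind="line", magic_name=None)` are IPython's extension API; the magic name `sklearn_display` and the module it would live in are hypothetical, and the sketch assumes scikit-learn 0.23+ for the `display` config key.

```python
from sklearn import get_config, set_config


def sklearn_display(line):
    """Hypothetical line magic.

    ``%sklearn_display diagram`` or ``%sklearn_display text`` sets the
    style; a bare ``%sklearn_display`` prints the current one.
    """
    style = line.strip()
    if not style:
        print(get_config()["display"])
    else:
        set_config(display=style)


def load_ipython_extension(ipython):
    # IPython calls this hook on ``%load_ext <module>``; it registers the
    # function above as a line magic named ``sklearn_display``.
    ipython.register_magic_function(
        sklearn_display, magic_kind="line", magic_name="sklearn_display"
    )
```

Saved as, say, a hypothetical `sklearn_display_ext.py` on the path, `%load_ext sklearn_display_ext` followed by `%sklearn_display diagram` would switch the style for the session.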
Amazing work, @thomasjpfan
OMG, I'm so happy this is merged!!! It seems to me that right now the option does both: it adds an additional MIME type and selects what the content of the representation is. If you added the hypothetical, different HTML representation that's not a graph, how would you enable that? I was thinking about this addition in terms of MIME type, with the diagram as just the most natural way to express nested estimators in HTML. I don't have a strong opinion on the naming, though. 'diagram' doesn't seem like a very natural option, but I'm happy as long as we have the feature. These days I always think in terms of teaching stuff. How would you explain having a
Yes. Also, this might be temporary. Maybe in 6 months it will be enabled by default.
Hi,
Here is the documentation.
Reference Issues/PRs
Closes #14061
What does this implement/fix? Explain your changes.
You can demo the visualization here: https://thomasjpfan.github.io/sklearn_viz_html/index.html
This PR implements an HTML visualization for estimators, with a focus on displaying it in a Jupyter notebook or JupyterLab. The implementation is pure HTML and CSS (no JavaScript or external dependencies):
(`print_changed_only=True` is the default for `export_html`)
`_type_of_html_estimator` returns how to lay out metaestimators (`ColumnTransformer` and `FeatureUnion` are "parallel", while `Pipeline` is "serial"). If there are any other metaestimators to add, we just need to add them to `_type_of_html_estimator`.
`sk-final-spacer` is used as a hack to provide enough space for the information displayed while hovering over elements.
Code to Create HTML (in JupyterLab or a notebook)
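This is not the PR's own code, only a minimal sketch of the layout dispatch described above, assuming nothing beyond the stated rule: `Pipeline` lays out "serial", `ColumnTransformer` and `FeatureUnion` lay out "parallel". `layout_type`, `children`, `render` and the `sk-*` class names are illustrative, and the output carries no CSS, hover text or `sk-final-spacer`.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def layout_type(estimator):
    """Hypothetical stand-in for the PR's _type_of_html_estimator."""
    if isinstance(estimator, Pipeline):
        return "serial"
    if isinstance(estimator, (FeatureUnion, ColumnTransformer)):
        return "parallel"
    return "single"


def children(estimator):
    # Pipeline and FeatureUnion store (name, est) pairs; ColumnTransformer
    # stores (name, est, columns) triples, so keep the first two fields.
    if isinstance(estimator, Pipeline):
        return estimator.steps
    if isinstance(estimator, FeatureUnion):
        return estimator.transformer_list
    if isinstance(estimator, ColumnTransformer):
        return [t[:2] for t in estimator.transformers]
    return []


def render(estimator, name=None):
    """Nested <div>s whose class encodes the layout; styling is left to CSS."""
    kind = layout_type(estimator)
    label = name or type(estimator).__name__
    if kind == "single":
        return f'<div class="sk-single">{label}</div>'
    inner = "".join(render(est, n) for n, est in children(estimator))
    return f'<div class="sk-{kind}">{inner}</div>'


ct = ColumnTransformer(
    [("num", StandardScaler(), ["age"]), ("cat", OneHotEncoder(), ["city"])]
)
pipe = make_pipeline(ct, LogisticRegression())
print(render(pipe))
```

Supporting another metaestimator only means extending `layout_type` and `children`, which mirrors the "just add it to `_type_of_html_estimator`" design described in the PR.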