Misleading error message when incorrectly using DataFrame.style.apply #45313

Closed
aberres opened this issue Jan 11, 2022 · 5 comments · Fixed by #45383
Labels
Error Reporting Incorrect or improved errors from pandas Styler conditional formatting using DataFrame.style

@aberres
Contributor

aberres commented Jan 11, 2022

Reproducible Example

import pandas as pd

def my_styler(df: pd.DataFrame):
    # This would silence the error message
    # return pd.Series("background-color: #E6E6E6" if c else "" for c in df)

    # The docs tell us to return a data frame/series here
    return ["background-color: #E6E6E6" if c else "" for c in df]

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Not ok according to the docs, but it does not fail
df.style.apply(my_styler, subset=df.columns == "A").to_excel("foo.xlsx")

# Subset is empty, my_styler is called with an empty series, things fail
df.style.apply(my_styler, subset=df.columns == "C").to_excel("foo.xlsx")

Issue Description

I ran into this issue while testing 1.4.0rc0 on a code base that worked with pandas 1.3.

Code that used to work now triggers an error, which is more or less expected after reading the documentation: the function passed to DataFrame.style.apply should return either a Series or a DataFrame. In our case a list was returned instead. And after all, we should just use applymap.

As this is clearly against the contract, it is OK that the code now fails. It is just that the error message is a bit misleading.

        if isinstance(result, Series):
>           raise ValueError(
                f"Function {repr(func)} resulted in the apply method collapsing to a "
                f"Series.\nUsually, this is the result of the function returning a "
                f"single value, instead of list-like."
            )
E           ValueError: Function <function test_xlsx.<locals>.my_styler at 0x7fe1d9baea60> resulted in the apply method collapsing to a Series.
E           Usually, this is the result of the function returning a single value, instead of list-like.

/Users/armin/venv/weplan3.9/lib/python3.9/site-packages/pandas/io/formats/style.py:1310: ValueError

What happened in practice is that a list-like was returned. Returning a series would actually fix the problem.

Expected Behavior

Should the error message be changed to clarify that a series or data frame should be returned?
Should the first call - with a matching column - also fail?

Installed Versions

INSTALLED VERSIONS

commit : d023ba7
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.4.0rc0

@aberres aberres added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 11, 2022
@attack68
Contributor

attack68 commented Jan 11, 2022

Not sure you're right here, since the docs say (they should probably insert the word 'return', though):

func should take a Series if axis in [0,1] and return an object of same length, also with identical index if the (return) object is a Series.

The axis arg defaults to 0 so you are passing a Series to my_styler not a DataFrame, when you call:

df.style.apply(my_styler, subset=df.columns == "A"),

and you are returning a list of correct length which is fine.

However, you have a valid point about the error message. When you call DataFrame.apply (which Styler does internally) on an empty DataFrame, the result is an empty Series. So the method has collapsed to a Series, but this is not really the case where this error message is useful. Testing for an empty DataFrame and skipping the DataFrame.apply call would probably be preferable.
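The collapse described here can be reproduced with plain DataFrame.apply, without Styler. This is only a minimal sketch: the names returns_list and returns_series are made up for illustration, and whether Styler passes extra arguments to DataFrame.apply depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# A boolean mask that matches no column leaves two rows and zero columns.
empty = df.loc[:, df.columns == "C"]


def returns_list(s):
    # Hypothetical styler function returning a plain list of CSS strings.
    return ["background-color: #E6E6E6"] * len(s)


def returns_series(s):
    # Same styles, wrapped in a Series that keeps the input index.
    return pd.Series(["background-color: #E6E6E6"] * len(s), index=s.index, dtype=object)


list_result = empty.apply(returns_list, axis=0)
series_result = empty.apply(returns_series, axis=0)

print(type(list_result).__name__)
print(type(series_result).__name__)
```

On an empty frame, pandas probes the function with an empty Series to decide whether it is a reduction. A list return should therefore come back as a Series, while a Series return should keep the DataFrame shape, which matches the observation that returning a Series silences the error.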

Just as an example where this message is useful:

def my_styler_error(s):
    return "background-color: #E6E6E6" * len(s)
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
df.style.apply(my_styler_error, subset="A")  # <- ValueError: collapse to series: returns scalar

def my_styler_correct(s):
    return ["background-color: #E6E6E6"] * len(s)
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
df.style.apply(my_styler_correct, subset="A")  # <- OK

@attack68 attack68 added Styler conditional formatting using DataFrame.style Error Reporting Incorrect or improved errors from pandas and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 11, 2022
@aberres
Contributor Author

aberres commented Jan 12, 2022

Not sure you're right here, since the docs say (they should probably insert the word 'return', though):

Ah, I see. To be honest, I replaced the apply calls with applymap, as I figured this was more readable for our case.
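For illustration, an element-wise variant might look like the sketch below. The helper names grey_if_truthy and style_cells are hypothetical and not taken from the code base discussed here. Styler.applymap was renamed to Styler.map in later pandas releases, so the sketch picks whichever one exists.

```python
import pandas as pd

GREY = "background-color: #E6E6E6"


def grey_if_truthy(value):
    # Element-wise rule: return one CSS string per cell.
    return GREY if value else ""


def style_cells(df, columns):
    # Hypothetical helper: style only the given column labels, cell by cell.
    styler = df.style
    # Prefer Styler.map where available, fall back to Styler.applymap.
    elementwise = getattr(styler, "map", None) or styler.applymap
    return elementwise(grey_if_truthy, subset=columns)
```

Usage would be along the lines of style_cells(df, ["A"]).to_excel("foo.xlsx"). Because the function is called per cell, there is no list-versus-Series return contract to get wrong.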

@aberres
Contributor Author

aberres commented Jan 12, 2022

@attack68

I had another short look:

My initial example could be rewritten as:

def my_styler_correct(s):
    return ["background-color: #E6E6E6"] * len(s)
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
df.style.apply(my_styler_correct, subset=[False, False])
# or
df.style.apply(my_styler_correct, subset=[])

I guess you have something like the diff below in mind? I can confirm this helps with the issue.
As there was no error with 1.3, maybe a change like this should be considered for 1.4?

diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py
index c8f65c6a2b..b2d70d2a0f 100644
--- a/pandas/io/formats/style.py
+++ b/pandas/io/formats/style.py
@@ -1487,6 +1487,9 @@ class Styler(StylerRenderer):
         subset = slice(None) if subset is None else subset
         subset = non_reducing_slice(subset)
         data = self.data.loc[subset]
+        if data.empty:
+            return self
+
         if axis is None:
             result = func(data, **kwargs)
             if not isinstance(result, DataFrame):

@attack68
Contributor

Yes, you are right, although I wonder if df.style.apply(my_styler_correct, subset=[df.columns == "C"]) should not be equivalent to df.style.apply(my_styler_correct, subset="C"), which I believe raises a KeyError.
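The difference between the two kinds of subset can be sketched with plain .loc, assuming the subset ends up in a .loc selection as in the diff above. This sketch does not call Styler itself.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Boolean mask with no match: the selection succeeds and yields an empty frame.
by_mask = df.loc[:, df.columns == "C"]

# Missing label: .loc raises KeyError instead of returning an empty frame.
try:
    df.loc[:, ["C"]]
    label_error = None
except KeyError as exc:
    label_error = exc

print(by_mask.shape, type(label_error).__name__)
```

A mask says "whichever columns match, possibly none", while a label says "this column must exist", so the two do not have to fail in the same way.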

@aberres
Contributor Author

aberres commented Jan 12, 2022

Not sure - our original use case was actually df.style.apply(my_styler_correct, subset=(df.columns != "C")). Aka "style all columns but C".

If the frame only contains the column C (for whatever reason; columns are dynamic in our case), this is absolutely fine, I'd say.
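Until a check like the one in the diff is in place, a user-side guard for the "style all columns but C" case could look roughly like this. The helpers columns_except and style_all_but are hypothetical names introduced for illustration and are not part of pandas.

```python
import pandas as pd


def columns_except(df, excluded):
    # Labels of every column other than `excluded`.
    return [c for c in df.columns if c != excluded]


def style_all_but(df, func, excluded):
    # Hypothetical helper: apply `func` column-wise to all columns except
    # `excluded`, and skip styling entirely when no column is left.
    keep = columns_except(df, excluded)
    styler = df.style
    if not keep:
        return styler
    return styler.apply(func, subset=keep)


def grey(s):
    # Column-wise styler function returning a list of the same length.
    return ["background-color: #E6E6E6"] * len(s)
```

With a frame that only contains C, style_all_but(df, grey, "C") returns the unstyled Styler without ever calling apply on an empty selection.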

@jreback jreback modified the milestones: 1.4, 1.5 Jan 16, 2022
3 participants