Skip to content

Conversation

phofl
Copy link
Member

@phofl phofl commented Sep 18, 2020

In case of axis=1 and mixed dtypes, the _apply_blockwise did not calculate the sum for a complete row. Through converting the obj and consolidating the blocks we can avoid this.

@mroeschke
Copy link
Member

Does this also potentially close #35596?

@phofl
Copy link
Member Author

phofl commented Sep 18, 2020

Thanks, I missed that issue. When the output you posted is the correct one, then yes, this fixes the issue. I will add a test.

@@ -243,7 +243,9 @@ def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
if self.on is not None and not isinstance(self.on, Index):
if obj.ndim == 2:
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)

if self.axis == 1:
obj = obj.astype("float64", copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment with the issue number and a small explanation on why we are doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jbrockmendel FYI

@mroeschke mroeschke added Bug Window rolling, ewma, expanding labels Sep 18, 2020
@mroeschke mroeschke added this to the 1.2 milestone Sep 18, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add another test which has mixed types with say a string in one column (and also try with datetimes we don't want to coerce), so this raises I think we are going to have an odd error message and may want to catch / re-raise if that is the case

actually i think we need to explicity get outnumeric dtypes (e.g. do a select_dtypes(include=[['integer', 'float']])

otherwise we might include timedelta for example (and we don't know how to coerce back I think)

@@ -243,7 +243,9 @@ def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
if self.on is not None and not isinstance(self.on, Index):
if obj.ndim == 2:
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)

if self.axis == 1:
obj = obj.astype("float64", copy=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jbrockmendel FYI

@phofl
Copy link
Member Author

phofl commented Sep 19, 2020

@jreback so we should raise if we get dtypes except float and int?

@mroeschke
Copy link
Member

I think we should just drop columns that are not coercible to float like axis=0 behavior

In [1]: df = pd.DataFrame({'a': range(5), 'b': ['a'] * 5})

In [2]: df
Out[2]:
   a  b
0  0  a
1  1  a
2  2  a
3  3  a
4  4  a

In [3]: df.rolling(2).sum()
Out[3]:
     a
0  NaN
1  1.0
2  3.0
3  5.0
4  7.0

@phofl
Copy link
Member Author

phofl commented Sep 19, 2020

Thanks @mroeschke. Implemented it.

@jreback
Copy link
Contributor

jreback commented Sep 19, 2020

lgtm ping on green.

@phofl
Copy link
Member Author

phofl commented Sep 19, 2020

@jreback green

@jreback jreback merged commit 00a510b into pandas-dev:master Sep 19, 2020
@jreback
Copy link
Contributor

jreback commented Sep 19, 2020

thanks @phofl very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Window rolling, ewma, expanding
Projects
None yet
3 participants