
Description
Posted the same question on Stack Overflow. A user there said I should open an issue here on the GitHub page, since it is a bug.
I have the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])
With output:
0 1 2 3 4 5 6
0 0 1 2 4.0 NaN NaN NaN
1 0 1 2 NaN NaN NaN NaN
2 0 2 2 NaN 2.0 NaN 1.0
with dtypes:
df.dtypes
0 int64
1 int64
2 int64
3 float64
4 float64
5 float64
6 float64
dtype: object
Then the following rolling sum is applied:
df.rolling(window=7, min_periods=1, axis='columns').sum()
And the output is as follows:
0 1 2 3 4 5 6
0 0.0 1.0 3.0 4.0 4.0 4.0 4.0
1 0.0 1.0 3.0 NaN NaN NaN NaN
2 0.0 2.0 4.0 NaN 2.0 2.0 3.0
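I notice that the rolling window stops and starts again whenever the dtype of the next column changes: the sum restarts at column 3, which is exactly where the dtype switches from int64 to float64 (columns 3 to 6 contain NaN, so pandas stores them as float64).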
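In my actual data, however, all columns have the same dtype, object. Casting the example to match and applying the same rolling sum:
df = df.astype('object')
df.rolling(window=7, min_periods=1, axis='columns').sum()
gives this output: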
0 1 2 3 4 5 6
0 0.0 1.0 3.0 7.0 7.0 7.0 7.0
1 0.0 1.0 3.0 3.0 3.0 3.0 3.0
2 0.0 2.0 4.0 4.0 6.0 6.0 7.0
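A possible workaround for the dtype split, sketched under the assumption that a float64 result is acceptable: cast every column to float64 instead of object, then roll down the rows of the transposed frame instead of passing axis='columns' (that keyword is deprecated for rolling in recent pandas releases). With window=7 over seven columns and min_periods=1, this should reproduce the object-dtype table above, including the filled-in NaN cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])

# One shared dtype (float64), so the frame is no longer split into int and float blocks.
as_float = df.astype("float64")

# Transpose so the original columns become rows, roll down them, then transpose back.
# NaN cells are skipped by the sum; min_periods=1 means any window holding at least one
# non-NaN value yields a number, which is why the NaN cells still get filled in.
rolled = as_float.T.rolling(window=7, min_periods=1).sum().T
print(rolled)
```

This only removes the dtype-dependent restart; it does not give the "restart after NaN" behaviour.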
My desired output, however, restarts the sum after each NaN value and leaves the NaN cells as NaN. It would look like:
0 1 2 3 4 5 6
0 0.0 1.0 3.0 7.0 NaN NaN NaN
1 0.0 1.0 3.0 NaN NaN NaN NaN
2 0.0 2.0 4.0 NaN 2.0 NaN 3.0
I figured there must be a way for NaN values to be skipped in the sum without also being filled in with values from the rolling window.
Anything would help!
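One way to get the "restart after each NaN" behaviour, as a sketch that assumes the window (7) spans every column, so the rolling sum inside each NaN-free stretch is just a running sum. It takes a running sum with NaN treated as 0, subtracts the running total as it stood at the most recent NaN, and then puts the NaN cells back. The helper name restarting_row_sum is hypothetical. Under this rule the last cell of row 2 comes out as 1.0 (only column 6 is summed after the NaN in column 5); it would only be 3.0 if the NaN in column 5 did not restart the sum.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def restarting_row_sum(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: row-wise running sum that restarts after every NaN
    and keeps NaN cells as NaN.

    Assumes the window covers all columns, so no value ever drops out of it.
    """
    values = frame.astype("float64")
    is_nan = values.isna()
    # Running total across the columns, treating NaN as 0.
    running = values.fillna(0).cumsum(axis=1)
    # Running total as it stood at the most recent NaN (0 before the first NaN).
    offset = running.where(is_nan).ffill(axis=1).fillna(0)
    # Subtracting the offset restarts the sum after each NaN; NaN cells are restored.
    return (running - offset).where(~is_nan)


result = restarting_row_sum(df)
print(result)
```

For a window shorter than the number of columns this would not match a true rolling sum, since values older than the window are never dropped.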