BUG: rolling_window yields unexpected results with win_type='triang' #7618

Closed
AllenDowney opened this issue Jun 30, 2014 · 12 comments · Fixed by #8238
Labels
Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)
Milestone

Comments

@AllenDowney
Contributor

Here's the example in the documentation, modified to have non-zero mean:

import pandas
from numpy.random import randn

n = 100
ser = pandas.Series(randn(n)+10, index=pandas.date_range('1/1/2000', periods=n))
pandas.rolling_window(ser, 5, 'triang').plot()
pandas.rolling_window(ser, 5, 'boxcar').plot()

The rolling boxcar window is centered around 10, as expected.

The triang window is centered around 6. That suggests that the weights in the window don't add up to 1.

Either that or my expectation of how it should work is wrong?
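The arithmetic behind the "centered around 6" observation can be sketched in plain Python. This is only an illustration, not the pandas code path: it assumes the 5-point triangular weights are 1/3, 2/3, 1, 2/3, 1/3 (the values scipy reports further down this thread) and that the weighted sum is divided by the window length rather than by the sum of the weights.

```python
# Triangular weights for a 5-point window (values reported by scipy later in the thread).
weights = [1/3, 2/3, 1.0, 2/3, 1/3]
values = [10.0] * 5  # a constant signal with mean 10

weighted_sum = sum(w * v for w, v in zip(weights, values))

# Assumption: the rolling result is divided by the window length.
by_length = weighted_sum / len(weights)
# A weighted mean divides by the sum of the weights instead.
by_weight_sum = weighted_sum / sum(weights)

print(round(by_length, 6), round(by_weight_sum, 6))
```

Under those assumptions, dividing by the window length gives 6, which matches the plot, while dividing by the weight sum gives the expected 10.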

@jreback
Contributor

jreback commented Jun 30, 2014

these are scipy dependent so

import scipy.signal as sig
sig.get_window(win_type, window)

yields the weights

@AllenDowney
Contributor Author

Understood. It looks like there is some logic in rolling_window that is supposed to normalize the weights. Maybe that is not correct?


@jreback
Contributor

jreback commented Jun 30, 2014

@AllenDowney certainly possible (not sure whether this should be normalized or not). I don't think much of this is actually tested (as it relies on the weights from scipy). Want to put some test cases together?

@AllenDowney
Contributor Author

I don't have a pandas dev environment set up, but here's a simple case:

import numpy as np
import pandas

# A constant series of ones: any normalized rolling mean should stay at 1.0.
n = 10
ser = pandas.Series(np.ones(n))

mean = pandas.rolling_window(ser, 5, 'triang').mean()
np.testing.assert_approx_equal(mean, 1.0)

mean = pandas.rolling_window(ser, 5, 'gaussian', std=1.5).mean()
np.testing.assert_approx_equal(mean, 1.0)

mean = pandas.rolling_window(ser, 5, 'boxcar').mean()
np.testing.assert_approx_equal(mean, 1.0)

Currently only 'boxcar' is passing.

One theory: when rolling_window calls algos.roll_window, should it be sending avg_wgt=True?

Allen
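For readers without an old pandas install, the behavior the test case above expects can be sketched in pure Python. `rolling_weighted_mean` is a hypothetical helper written for this illustration, not pandas' `roll_window`, and the Gaussian weights use the formula exp(-0.5 * (k / std)^2) around the window center, which is an assumption about how scipy defines that window.

```python
import math

def rolling_weighted_mean(values, weights):
    """Hypothetical helper: trailing weighted mean, dividing by the weight sum."""
    n = len(weights)
    total = sum(weights)
    out = [float("nan")] * (n - 1)  # not enough points for a full window yet
    for end in range(n, len(values) + 1):
        window = values[end - n:end]
        out.append(sum(w * v for w, v in zip(weights, window)) / total)
    return out

windows = {
    "boxcar": [1.0] * 5,
    "triang": [1/3, 2/3, 1.0, 2/3, 1/3],
    # Gaussian with std=1.5, centered on the middle of the window (assumed formula).
    "gaussian": [math.exp(-0.5 * ((i - 2) / 1.5) ** 2) for i in range(5)],
}

ones = [1.0] * 10
results = {name: rolling_weighted_mean(ones, w) for name, w in windows.items()}
for name, res in results.items():
    full = res[4:]  # skip the leading NaNs
    print(name, all(abs(x - 1.0) < 1e-12 for x in full))
```

Because the helper divides by the sum of the weights, a series of ones stays at 1.0 for every window type, which is the property the three assertions above check.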


@jreback
Contributor

jreback commented Jun 30, 2014

In [10]: sig.get_window(('gaussian',1.5),5)
Out[10]: array([ 0.41111229,  0.8007374 ,  1.        ,  0.8007374 ,  0.41111229])

In [11]: sig.get_window('boxcar',5)
Out[11]: array([ 1.,  1.,  1.,  1.,  1.])

In [12]: sig.get_window('triang',5)
Out[12]: array([ 0.33333333,  0.66666667,  1.        ,  0.66666667,  0.33333333])

Do these need to be normalized?

@AllenDowney
Contributor Author

Yes, I think so.


@jreback
Contributor

jreback commented Jun 30, 2014

so why is 'boxcar' right then?

@AllenDowney
Contributor Author

Because the window is all 1s, it is already normalized.


jreback added this to the 0.15.0 milestone Jul 7, 2014
@marco-giancotti

Indeed it appears to be a normalization issue. The result is correct (i.e. it overlaps with the original signal) if you multiply it by the following number:

window / sum(sig.get_window('triang', window))
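As a rough check of that correction factor, the sketch below evaluates window / sum(weights) for the three windows discussed in this thread. The weight values are copied from the `sig.get_window` output quoted earlier rather than recomputed, so no scipy install is needed; the dictionary and variable names are made up for this illustration.

```python
# Weights as printed by sig.get_window earlier in the thread.
windows = {
    "boxcar": [1.0, 1.0, 1.0, 1.0, 1.0],
    "triang": [0.33333333, 0.66666667, 1.0, 0.66666667, 0.33333333],
    "gaussian": [0.41111229, 0.8007374, 1.0, 0.8007374, 0.41111229],
}

# Correction factor suggested above: window length divided by the weight sum.
factors = {name: len(w) / sum(w) for name, w in windows.items()}

for name, factor in factors.items():
    print(name, round(factor, 4))

# Applying the triangular factor to the observed level of about 6.
corrected = 6.0 * factors["triang"]
print(round(corrected, 4))
```

The factor is exactly 1 for 'boxcar', which is consistent with that window already giving the expected result, and about 5/3 for 'triang', which takes the observed level of 6 back to 10.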

@jreback
Contributor

jreback commented Aug 7, 2014

@marco-giancotti @AllenDowney

either of you want to put some tests / fix in place for this?

ideally some comprehensive testing of all window types to validate this

@AllenDowney
Contributor Author

I might have time to do this next week, but would not be deeply offended if someone else beat me to it.

Allen


@stahlous
Contributor

stahlous commented Sep 9, 2014

I ran across this while looking into rolling_apply and got sidetracked thinking about rolling_window instead. I'm wondering if this function even makes sense in the default case of center=False for non-boxcar weights. If you have a time-series and apply a gaussian weighting to the rolling average where the most strongly weighted point is in the middle of the window, it seems strange that the computed mean would be correlated with the last index point in the window, at least by default.

This roughly illustrates my point:

import numpy as np
import pandas as pd

s = pd.Series(np.sin(np.arange(0, 2*np.pi, 2*np.pi/20)))
s.plot()
pd.rolling_window(s, window=5, win_type='triang', std=1.0, center=False).plot()
pd.rolling_window(s, window=5, win_type='triang', std=1.0, center=True).plot()

[Image: plot produced by the code above]

I suppose this is nothing new, but it seems like a place where unaware users could get tripped up.
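The labeling concern can be put in numbers with a small pure-Python sketch. It is an illustration under assumptions, not the pandas implementation: the weights are normalized by their sum, center=False is taken to label each result with the last index of its window, and center=True with the middle index.

```python
import math

weights = [1/3, 2/3, 1.0, 2/3, 1/3]
total = sum(weights)
signal = [math.sin(2 * math.pi * k / 20) for k in range(20)]

# Weighted mean of each full 5-point window, keyed by the window's start index.
means = [
    sum(w * v for w, v in zip(weights, signal[i:i + 5])) / total
    for i in range(len(signal) - 4)
]

# Assumed labeling: center=False uses the window's last index (i + 4),
# center=True uses its middle index (i + 2).
trailing = {i + 4: m for i, m in enumerate(means)}
centered = {i + 2: m for i, m in enumerate(means)}

def mean_abs_error(labelled):
    """Average distance between the smoothed value and the signal at the same label."""
    return sum(abs(signal[k] - m) for k, m in labelled.items()) / len(labelled)

print(mean_abs_error(trailing), mean_abs_error(centered))
```

The same smoothed values sit much closer to the original sine wave when labeled at the window center; with trailing labels the curve lags the signal by two samples.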

4 participants