negative variances #1090

@turkeytest

Hello,

It seems possible to get negative variances because of floating-point rounding error. The computation in nanops.py (line 120) does not take the absolute value of the result, so a variance that should be exactly 0 can come out as a tiny negative number. When that happens, std() returns NaN where it should return 0.
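To illustrate the failure mode, here is a minimal sketch. It assumes the variance is computed with a one-pass sum-of-squares formula, which is prone to cancellation; `naive_var` and `clamped_std` are hypothetical helpers written for this example and are not copied from nanops.py. The sketch clamps to zero instead of taking the absolute value, so that a rounding error maps to exactly 0.

```python
import numpy as np


def naive_var(x, ddof=1):
    """One-pass variance: (sum(x^2) - sum(x)^2 / n) / (n - ddof).

    Hypothetical helper for illustration only; the subtraction of two
    nearly equal sums is where a tiny negative result can appear.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    return (np.sum(x ** 2) - np.sum(x) ** 2 / n) / (n - ddof)


def clamped_std(variance):
    """Clamp negative variances to 0 before taking the square root."""
    return np.sqrt(np.maximum(variance, 0.0))


# The tiny negative variance reported in this issue.
tiny_negative = np.float64(-9.8686491077791697e-16)

with np.errstate(invalid="ignore"):  # silence the "invalid value" warning
    print(np.sqrt(tiny_negative))    # nan

print(clamped_std(tiny_negative))    # 0.0
```

Whether clamping or an absolute value is the better fix is a design choice: clamping returns exactly 0 for constant data, while an absolute value would return a small positive number.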

The code below should [probabilistically] recreate the problem. It could also be turned into a unit test.

Thanks!

from pandas import DataFrame
import numpy as np

# Ten identical rows, so the true variance of every column is exactly 0.
random_repeated_rows = np.array([np.random.random((10000,))] * 10)
my_var = DataFrame(random_repeated_rows).var()

len(my_var[my_var < 0])  # number of negative variances; negatives show up slightly less than half of the time
np.min(DataFrame(random_repeated_rows).var())  # returns a tiny negative, e.g. -9.8686491077791697e-16
np.min(DataFrame(random_repeated_rows).values.var(axis=0))  # returns 0
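As a rough idea of how the reproduction could become a unit test, the sketch below fixes the random seed and checks that no variance is negative. The helper name `assert_variance_non_negative` and its `var_func` parameter are assumptions made for this example, not part of pandas; NumPy's own variance stands in as the function under test so the block runs without pandas.

```python
import numpy as np


def assert_variance_non_negative(var_func, n_rows=10, n_cols=10000, seed=0):
    """Check var_func on columns of identical values (true variance is 0).

    var_func is a hypothetical hook: it takes a 2-D array and returns the
    per-column variances as a 1-D array-like.
    """
    rng = np.random.RandomState(seed)  # fixed seed keeps the test repeatable
    row = rng.random_sample(n_cols)
    repeated = np.array([row] * n_rows)  # same construction as the snippet above
    variances = np.asarray(var_func(repeated))
    assert variances.shape == (n_cols,)
    assert (variances >= 0).all(), "found negative variances"
    # With no negative variances, the standard deviation is never NaN.
    assert not np.isnan(np.sqrt(variances)).any()


# NumPy's variance is used here as the reference implementation.
assert_variance_non_negative(lambda a: a.var(axis=0))
```

To exercise pandas instead, the same helper could be called with something like `lambda a: DataFrame(a).var().values`.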
