Description
Hello,
It seems possible for var() to return negative variances because of numerical inaccuracy. This appears to happen because nanops.py, line 120, does not take the absolute value of the result. A negative variance then causes std() to return NaN when it should return 0.
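To illustrate why a tiny negative variance becomes NaN in the standard deviation, here is a minimal sketch that uses only NumPy. It takes the negative value reported further down in this issue as a hard-coded input. Clamping to zero with `np.maximum` is only one possible guard, shown here as an assumption; it is not necessarily the fix pandas should adopt.

```python
import numpy as np

# The tiny negative variance reported in this issue, hard-coded for illustration.
tiny_negative_var = np.float64(-9.8686491077791697e-16)

# The square root of a negative float is NaN in NumPy (with a RuntimeWarning,
# silenced here so the example runs quietly).
with np.errstate(invalid="ignore"):
    std_raw = np.sqrt(tiny_negative_var)

# One possible guard (an assumption, not the pandas implementation):
# clamp the variance at zero before taking the square root.
std_clamped = np.sqrt(np.maximum(tiny_negative_var, 0.0))

print(std_raw)      # nan
print(std_clamped)  # 0.0
```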
The code below should reproduce the problem (probabilistically, since the input is random). It could also be turned into a unit test.
Thanks!
from pandas import DataFrame
import numpy as np
# Ten identical rows, so every column is constant and its true variance is 0.
random_repeated_rows = np.array([np.random.random((10000,))] * 10)
my_var = DataFrame(random_repeated_rows).var()
len(my_var[my_var < 0])  # number of columns with a negative variance; slightly fewer than half of them
np.min(DataFrame(random_repeated_rows).var())  # returns a tiny negative, e.g. -9.8686491077791697e-16
np.min(DataFrame(random_repeated_rows).values.var(axis=0))  # returns 0
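As a rough sketch of how the reproduction could become a unit test: the helper `make_repeated_rows` and the test name below are hypothetical, and the fixed seed is an assumption added to make the input repeatable. On a pandas version affected by this issue the test would be expected to fail; it should pass once variances can no longer come out negative.

```python
import numpy as np
from pandas import DataFrame


def make_repeated_rows(n_rows=10, n_cols=10000, seed=0):
    """Hypothetical helper: stack one random row n_rows times."""
    rng = np.random.RandomState(seed)  # fixed seed so the input is repeatable
    row = rng.random_sample(n_cols)
    return np.tile(row, (n_rows, 1))


def test_var_of_constant_columns_is_non_negative():
    # Every column holds a single repeated value, so its true variance is 0.
    df = DataFrame(make_repeated_rows())

    # No variance should be negative, however small.
    assert (df.var() >= 0).all()

    # std() should therefore never be NaN for these columns.
    assert not df.std().isnull().any()
```

The test can be collected by pytest or called directly as `test_var_of_constant_columns_is_non_negative()`.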