Using apply on a DataFrame that contains datetimes gives an unexpected result. With a DataFrame containing only integers and strings, apply with axis=1 returns the market names, as expected:
import pandas as pd
positions = pd.DataFrame([[1, 'ABC', 50], [1, 'YUM', 20],
                          [1, 'DEF', 20], [2, 'ABC', 50],
                          [2, 'YUM', 20], [2, 'DEF', 20]],
                         columns=['a', 'market', 'position'])
positions.apply(lambda r: r['market'], axis=1)
Out[210]:
0 ABC
1 YUM
2 DEF
3 ABC
4 YUM
5 DEF
dtype: object
If we replace the data in column 'a' with datetimes, then we get the wrong result - the first value in the market column is repeated:
import datetime
positions = pd.DataFrame([[datetime.datetime(2013, 1, 1), 'ABC', 50],
                          [datetime.datetime(2013, 1, 1), 'YUM', 20],
                          [datetime.datetime(2013, 1, 1), 'DEF', 20],
                          [datetime.datetime(2013, 1, 2), 'ABC', 50],
                          [datetime.datetime(2013, 1, 2), 'YUM', 20],
                          [datetime.datetime(2013, 1, 2), 'DEF', 20]],
                         columns=['a', 'market', 'position'])
positions.apply(lambda r: r['market'], axis=1)
Out[213]:
0 ABC
1 ABC
2 ABC
3 ABC
4 ABC
5 ABC
dtype: object
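For a check that doesn't rely on reading the printed Series, something like the following might help. It is only a sketch: it rebuilds the same frame and compares the apply result against the 'market' column directly. The variable name `matches` is mine, not part of pandas. On a build without this bug I would expect the comparison to hold; on the affected master it should not.

```python
import datetime

import pandas as pd

# Same data as in the report: datetimes in 'a', strings in 'market', ints in 'position'.
positions = pd.DataFrame(
    [[datetime.datetime(2013, 1, 1), 'ABC', 50],
     [datetime.datetime(2013, 1, 1), 'YUM', 20],
     [datetime.datetime(2013, 1, 1), 'DEF', 20],
     [datetime.datetime(2013, 1, 2), 'ABC', 50],
     [datetime.datetime(2013, 1, 2), 'YUM', 20],
     [datetime.datetime(2013, 1, 2), 'DEF', 20]],
    columns=['a', 'market', 'position'])

result = positions.apply(lambda r: r['market'], axis=1)

# A row-wise apply that returns r['market'] should reproduce the column itself.
matches = result.tolist() == positions['market'].tolist()
print(matches)
```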
If you replace the lambda function with a function that prints the object passed in, you can see that every call receives the values of the first row of the DataFrame; only the Name changes:
def print_input(r):
    print(r)
    return 1
positions.apply(print_input, axis=1)
a 2013-01-01 00:00:00
market ABC
position 50
Name: 0, dtype: object
a 2013-01-01 00:00:00
market ABC
position 50
Name: 1, dtype: object
a 2013-01-01 00:00:00
market ABC
position 50
Name: 2, dtype: object
a 2013-01-01 00:00:00
market ABC
position 50
Name: 3, dtype: object
a 2013-01-01 00:00:00
market ABC
position 50
Name: 4, dtype: object
a 2013-01-01 00:00:00
market ABC
position 50
Name: 5, dtype: object
Out[215]:
0 1
1 1
2 1
3 1
4 1
5 1
dtype: int64
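The printed rows above show the Name changing while the values stay the same. A more compact way to see that, as a rough sketch, is to record the index label next to the market value each call receives. The names `seen` and `record_input` are just illustrative. On an unaffected build I would expect each label to map to its own row's market.

```python
import datetime

import pandas as pd

positions = pd.DataFrame(
    [[datetime.datetime(2013, 1, 1), 'ABC', 50],
     [datetime.datetime(2013, 1, 1), 'YUM', 20],
     [datetime.datetime(2013, 1, 1), 'DEF', 20],
     [datetime.datetime(2013, 1, 2), 'ABC', 50],
     [datetime.datetime(2013, 1, 2), 'YUM', 20],
     [datetime.datetime(2013, 1, 2), 'DEF', 20]],
    columns=['a', 'market', 'position'])

# Maps each row's index label to the market value the function was handed.
seen = {}

def record_input(r):
    # r.name is the row's index label; r['market'] is the value actually passed in.
    seen[r.name] = r['market']
    return 1

positions.apply(record_input, axis=1)

for label in sorted(seen):
    print(label, seen[label])
```

With the bug present, every label would be paired with 'ABC', which matches the printed output above.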
This is new in master; I didn't see it in pandas 0.11.0 or 0.13.0.