Closed
Description
Using pandas 1.0.5 and the latest dask (2020.12.0), comparing a categorical pandas Series with the equivalent dask Series returns all True:
In [1]: import pandas as pd
In [2]: import dask.dataframe as dd
In [3]: df = pd.DataFrame({"x": ["a", "b", "c"] * 100}, dtype="category")
...: ddf = dd.from_pandas(df, npartitions=3)
In [4]: df.x
Out[4]:
0 a
1 b
2 c
3 a
4 b
..
295 b
296 c
297 a
298 b
299 c
Name: x, Length: 300, dtype: category
Categories (3, object): [a, b, c]
In [5]: ddf.x
Out[5]:
Dask Series Structure:
npartitions=3
0 category[known]
100 ...
200 ...
299 ...
Name: x, dtype: category
Dask Name: getitem, 6 tasks
In [6]: df.x == ddf.x
Out[6]:
0 True
1 True
2 True
3 True
4 True
...
295 True
296 True
297 True
298 True
299 True
Name: x, Length: 300, dtype: bool
In [9]: (df.x == ddf.x).all()
Out[9]: True
But with pandas master (using the same dask version), the same comparison returns all False:
In [3]: df.x == ddf.x
Out[3]:
0 False
1 False
2 False
3 False
4 False
...
295 False
296 False
297 False
298 False
299 False
Name: x, Length: 300, dtype: bool