
ENH: df.to_parquet() should return bytes #37105

@impredicative

Is your feature request related to a problem?

For some unit tests, I find it useful to write a DataFrame as Parquet into a bytes object. The code I currently use to do this is quite verbose.

For background: df.to_csv() called without a path just works, and returns the CSV as a str object, as expected. In the same vein, df.to_parquet() called without a path should return the Parquet data as a bytes object.

More precisely, the current behavior is:

>>> df = pd.DataFrame()

>>> type(df.to_csv())  # This works
<class 'str'>

>>> df.to_parquet() # This should be made to work
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: to_parquet() missing 1 required positional argument: 'path'
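For comparison, the existing to_csv() contract that this request mirrors can be checked with a small runnable snippet: with no path the text is returned as a str, and with a file-like object the same text is written into it and None is returned. The DataFrame contents here are arbitrary.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# No path: the CSV text is returned directly as a str.
csv_text = df.to_csv()

# File-like object: the CSV is written into it and the call returns None.
buffer = io.StringIO()
result = df.to_csv(buffer)
```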

Describe the solution you'd like

The requested behavior is:

>>> df = pd.DataFrame()

>>> type(df.to_parquet())
<class 'bytes'>

All other uses of df.to_parquet (passing a path or a file-like object) should remain unaffected.

API breaking implications

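It won't break the documented API: the currently required path argument would simply become optional, so every existing call keeps working.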

Describe alternatives you've considered

I currently use this verbose code to get what I want:

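import io

import pandas as pd

df = pd.DataFrame()
pq_file = io.BytesIO()  # in-memory binary buffer
df.to_parquet(pq_file)  # write the Parquet data into the buffer
pq_bytes = pq_file.getvalue()  # copy the buffer contents out as bytes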

This workaround takes more effort than such a simple task should require.

Assignees: none

Labels: API - Consistency (internal consistency of API/behavior), Enhancement, IO Parquet (parquet, feather), Needs Discussion (requires discussion from core team before further action)