
Convert Bytes To a Pandas Dataframe

Last Updated : 11 Jul, 2024

In Python, bytes is a built-in data type that represents an immutable sequence of integers, each in the range 0 to 255, where every integer stands for one byte of data.
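A quick illustration of these properties, using only built-ins: indexing a bytes object yields an int, list() exposes the underlying integers, item assignment fails because bytes are immutable, and values outside 0 to 255 are rejected. The sample value b'Hi!' is just an illustrative choice.

```python
# A bytes literal: each element is an integer in range(256)
data = b'Hi!'

# Indexing returns an int, not a one-character string
first = data[0]        # 72, the code point of 'H'
values = list(data)    # [72, 105, 33]

# bytes are immutable, so item assignment raises TypeError
error_message = None
try:
    data[0] = 74
except TypeError as exc:
    error_message = str(exc)

# Integers outside 0..255 are rejected when building bytes
range_error = None
try:
    bytes([256])
except ValueError as exc:
    range_error = str(exc)

print(first, values)
print(error_message)
print(range_error)
```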

How to Convert Bytes Data into a Pandas DataFrame

We can convert bytes into a DataFrame using several methods:

1. Using bytes.decode(), StringIO and pd.read_csv()

We can decode the bytes into a string and parse it with pd.read_csv(). Here, we store some CSV-formatted bytes in a variable, convert them to a string with the decode('utf-8') method, and wrap that string in io.StringIO so that pd.read_csv() can read it like a file and return a DataFrame (df_method1).

Python
import pandas as pd
from io import StringIO

# Sample bytes data
bytes_data = b'Name,Age,Occupation\nJohn,25,Engineer\nAlice,30,Doctor\nBob,28,Artist'

# Convert bytes to string and then to DataFrame
data_str = bytes_data.decode('utf-8')
df_method1 = pd.read_csv(StringIO(data_str))

# Display the DataFrame
print(df_method1)

Output:

    Name  Age Occupation
0   John   25   Engineer
1  Alice   30     Doctor
2    Bob   28     Artist
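If the bytes are already CSV, the explicit decode step can likely be skipped: pd.read_csv() also accepts a binary file-like object, so wrapping the bytes in io.BytesIO and passing encoding='utf-8' should give the same DataFrame. A minimal sketch reusing the sample data above; the name df_direct is illustrative.

```python
import pandas as pd
from io import BytesIO, StringIO

bytes_data = b'Name,Age,Occupation\nJohn,25,Engineer\nAlice,30,Doctor\nBob,28,Artist'

# read_csv accepts a binary buffer and decodes it with the given encoding
df_direct = pd.read_csv(BytesIO(bytes_data), encoding='utf-8')

# Compare with the decode + StringIO approach from the example above
df_decoded = pd.read_csv(StringIO(bytes_data.decode('utf-8')))
same_as_decoded = df_direct.equals(df_decoded)

print(df_direct)
print(same_as_decoded)
```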

2. Using NumPy and io.BytesIO

The io.BytesIO class is part of Python's built-in io module. It provides a way to create a file-like object that operates on in-memory bytes data.
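To see what "file-like" means in practice, the sketch below wraps a short bytes value in BytesIO and uses the ordinary file methods readline(), read() and seek() on it. The shortened sample data is an assumption chosen for brevity.

```python
from io import BytesIO

bytes_data = b'Name,Age,Occupation\nJohn,25,Engineer'

# Wrap the bytes in an in-memory, file-like binary stream
stream = BytesIO(bytes_data)

header = stream.readline()   # reads up to and including the first b'\n'
rest = stream.read()         # reads everything after the current position

# Rewind to the start, as you would with a real file
stream.seek(0)
everything = stream.read()

print(header)
print(rest)
print(everything == bytes_data)
```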

Here, we use NumPy's genfromtxt() function to read data from a CSV-formatted byte stream. BytesIO(bytes_data) creates a file-like object that provides a stream interface to the bytes data. The remaining arguments control the parsing: delimiter=',' splits each line on commas, names=True uses the first line as field names, dtype=None lets NumPy infer each column's type, and encoding='utf-8' tells NumPy how to decode the bytes.

Finally, pd.DataFrame() converts the resulting structured array into a DataFrame.

Python
import numpy as np
import pandas as pd
from io import BytesIO

# Sample bytes data
bytes_data = b'Name,Age,Occupation\nJohn,25,Engineer\nAlice,30,Doctor\nBob,28,Artist'

# Convert bytes to DataFrame using NumPy and io.BytesIO
array_data = np.genfromtxt(
    BytesIO(bytes_data), delimiter=',', names=True, dtype=None, encoding='utf-8')
df_method2 = pd.DataFrame(array_data)

# Display the DataFrame
print(df_method2)

Output:

    Name  Age Occupation
0   John   25   Engineer
1  Alice   30     Doctor
2    Bob   28     Artist
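Because names=True and dtype=None are used, genfromtxt() returns a structured array whose field names come from the header row and whose field types are inferred. The sketch below inspects that array before the DataFrame conversion; the exact integer width (for example int64 vs int32) may depend on platform and NumPy version, so it only checks the dtype kind.

```python
import numpy as np
import pandas as pd
from io import BytesIO

bytes_data = b'Name,Age,Occupation\nJohn,25,Engineer\nAlice,30,Doctor\nBob,28,Artist'

array_data = np.genfromtxt(
    BytesIO(bytes_data), delimiter=',', names=True, dtype=None, encoding='utf-8')

# names=True turns the header row into the structured array's field names
field_names = array_data.dtype.names

# dtype=None lets NumPy infer a type per field; 'i' means signed integer
age_kind = array_data.dtype['Age'].kind

df = pd.DataFrame(array_data)

print(array_data.dtype)
print(df.dtypes)
```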

3. Using a Custom Parsing Function

When the bytes are not in CSV format, we can write our own parsing function. Here, the code decodes the bytes into a string using UTF-8 and splits it into records on newline characters. Each record is split on '|' into key-value pairs, and each pair is split on ':' into a key and a value. The function builds one dictionary per record from these pieces.

Finally, it assembles these dictionaries into a DataFrame using pandas. This approach lets structured byte data in a custom format be converted into tabular form.

Python
import pandas as pd

# Sample bytes data
bytes_data = b'Name:John|Age:25|Occupation:Engineer\nName:Alice|Age:30|Occupation:Doctor\nName:Bob|Age:28|Occupation:Artist'


def parse_bytes_data(data):
    # Decode bytes data and split into records
    records = data.decode('utf-8').split('\n')
    parsed_data = []
    for record in records:
        if record:  # Skip empty records
            items = record.split('|')  # Split record into key-value pairs
            record_dict = {}
            for item in items:
                key, value = item.split(':', 1)  # Split on the first ':' only
                record_dict[key] = value
            # Append record dictionary to parsed data
            parsed_data.append(record_dict)
    return pd.DataFrame(parsed_data)  # Create DataFrame from parsed data


# Convert bytes to DataFrame using custom parsing function
df_method3 = parse_bytes_data(bytes_data)

# Display the DataFrame
print(df_method3)

Output:

    Name Age Occupation
0   John  25   Engineer
1  Alice  30     Doctor
2    Bob  28     Artist
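Because the custom parser only splits text, every value, including Age, remains a string. A minimal sketch, assuming the same sample data, that condenses the parser into a list comprehension and then converts Age with pd.to_numeric(). The comprehension is an illustrative restructuring of the function above, not a different technique.

```python
import pandas as pd

bytes_data = b'Name:John|Age:25|Occupation:Engineer\nName:Alice|Age:30|Occupation:Doctor\nName:Bob|Age:28|Occupation:Artist'

# One dict per non-empty line; each 'key:value' pair is split on the first ':'
records = [
    dict(item.split(':', 1) for item in line.split('|'))
    for line in bytes_data.decode('utf-8').splitlines()
    if line
]
df = pd.DataFrame(records)

# Every parsed value is a string, so convert Age explicitly
df['Age'] = pd.to_numeric(df['Age'])

print(df.dtypes)
print(df['Age'].sum())
```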

Conclusion

In conclusion, Python offers several ways to convert bytes to a DataFrame: decoding the bytes and reading them with pd.read_csv() through StringIO, parsing them with NumPy's genfromtxt() through io.BytesIO, and writing a custom parsing function for formats that are not CSV.

