Open In App

Reading Python File-Like Objects from C | Python

Last Updated : 07 Jun, 2019
Comments
Improve
Suggest changes
Like Article
Like
Report
Writing C extension code that consumes data from any Python file-like object (e.g., normal files, StringIO objects, etc.). read() method has to be repeatedly invoke to consume data on a file-like object and take steps to properly decode the resulting data. Given below is a C extension function that merely consumes all of the data on a file-like object and dumps it to standard output. Code #1 : CPP
#define CHUNK_SIZE 8192

/* Consume a "file-like" object and write bytes to stdout */
static PyObject* py_consume_file(PyObject* self, PyObject* args)
{
    PyObject* obj;
    PyObject* read_meth;
    PyObject* result = NULL;
    PyObject* read_args;

    if (!PyArg_ParseTuple(args, "O", &obj)) {
        return NULL;
    }

    /* Get the read method of the passed object */
    if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
        return NULL;
    }

    /* Build the argument list to read() */
    read_args = Py_BuildValue("(i)", CHUNK_SIZE);
    while (1) {
        PyObject* data;
        PyObject* enc_data;
        char* buf;
        Py_ssize_t len;

        /* Call read() */
        if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
            goto final;
        }

        /* Check for EOF */
        if (PySequence_Length(data) == 0) {
            Py_DECREF(data);
            break;
        }

        /* Encode Unicode as Bytes for C */
        if ((enc_data = PyUnicode_AsEncodedString(data,
             "utf-8", "strict")) == NULL) {
            Py_DECREF(data);
            goto final;
        }

        /* Extract underlying buffer data */
        PyBytes_AsStringAndSize(enc_data, &buf, &len);

        /* Write to stdout (replace with something more useful) */
        write(1, buf, len);

        /* Cleanup */
        Py_DECREF(enc_data);
        Py_DECREF(data);
    }
    result = Py_BuildValue("");

final:
    /* Cleanup */
    Py_DECREF(read_meth);
    Py_DECREF(read_args);
    return result;
}
A file-like object such as a StringIO instance is prepared to test the code and then it is passed in: Code #2 : Python3 1==
import io
f = io.StringIO('Hello\nWorld\n')
import sample
sample.consume_file(f)
Output :
Hello
World
Unlike a normal system file, a file-like object is not necessarily built around a low-level file descriptor. Thus, a normal C library functions can't be used to access it. Instead, a Python’s C API is used to manipulate the file-like object much like you would in Python. So, the read() method is extracted from the passed object. An argument list is built and then repeatedly passed to PyObject_Call() to invoke the method. To detect end-of-file (EOF), PySequence_Length() is used to see if the returned result has zero length. For all I/O operations, the concern is underlying encoding and distinction between bytes and Unicode. This recipe shows how to read a file in text mode and decode the resulting text into a bytes encoding that can be used by C. If the file is read in binary mode, only minor changes will be made as shown in the code below. Code #3 : CPP
/* Call read() */
if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
    goto final;
}

/* Check for EOF */
if (PySequence_Length(data) == 0) {
    Py_DECREF(data);
    break;
}

if (!PyBytes_Check(data)) {
    Py_DECREF(data);
    PyErr_SetString(PyExc_IOError, "File must be in binary mode");
    goto final;
}

/* Extract underlying buffer data */
PyBytes_AsStringAndSize(data, &buf, &len);

Next Article
Article Tags :
Practice Tags :

Similar Reads