read_csv parse issues with \r line ending and quoted items #3453

@sandbox

read_csv seems to mishandle quoted fields that contain the separator when lines end with \r:

>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), header=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 399, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 215, in _read
    return parser.read()
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 631, in read
    ret = self._engine.read(nrows)
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 954, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 644, in pandas._parser.TextReader.read (pandas/src/parser.c:5925)
  File "parser.pyx", line 666, in pandas._parser.TextReader._read_low_memory (pandas/src/parser.c:6145)
  File "parser.pyx", line 719, in pandas._parser.TextReader._read_rows (pandas/src/parser.c:6750)
  File "parser.pyx", line 706, in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:6634)
  File "parser.pyx", line 1572, in pandas._parser.raise_parser_error (pandas/src/parser.c:17055)
pandas._parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 2, saw 4
EXPECTED BEHAVIOR:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\n"a,b","e,d","f,f"'), header=None)

     0    1    2
0    a    b    c
1  a,b  e,d  f,f

The \r case should behave the same as when the line ending is \n.
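
A possible workaround in the meantime is to normalize the line endings before parsing. The following is only a minimal sketch, assuming Python 3 and using the standard-library csv module as a stand-in for read_csv so that it runs without pandas; normalize_newlines and parse_rows are hypothetical helper names, not part of pandas. Note that it also rewrites any \r that appears inside a quoted field, which may not be acceptable for all data.

```python
import csv
import io


def normalize_newlines(text):
    # Convert CRLF first, then any remaining bare CR, to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def parse_rows(text):
    # newline="" stops StringIO from translating line endings itself,
    # so the csv module sees exactly what normalize_newlines produced.
    buf = io.StringIO(normalize_newlines(text), newline="")
    return list(csv.reader(buf))


# Same data as in the report, once with \r and once with \n.
cr_rows = parse_rows(' a,b,c\r"a,b","e,d","f,f"')
lf_rows = parse_rows(' a,b,c\n"a,b","e,d","f,f"')

# Both inputs should give the same two rows of three fields each.
print(cr_rows == lf_rows)
```

With pandas, the normalized text could presumably be wrapped in a StringIO and passed to read_csv in the same way.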


Maybe this should be a separate bug report, but a possibly related issue occurs when header=None is omitted:

>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'))
     a    b    c
"a  b"  e,d  f,f

The output above shows the first quoted field split at its embedded comma, with the first piece used as the index. The following shows what happens when we tell pandas to use index_col=False:

>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), index_col=False)
    a   b    c
0  "a  b"  e,d
EXPECTED BEHAVIOR:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\n"a,b","e,d","f,f"'))
     a    b    c
0  a,b  e,d  f,f

And with index_col=False:

>>> pd.read_csv(StringIO.StringIO(' a,b,c\n"a,b","e,d","f,f"'), index_col=False)
     a    b    c
0  a,b  e,d  f,f

Here is my system information, in case it is needed:

>>> pd.__version__
'0.10.1'
>>> sys.version_info
sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)
>>> sys.platform
'darwin'
>>> os.name
'posix'

Labels: IO Data (IO issues that don't fit into a more specific label)
