Closed
Labels
IO Data: IO issues that don't fit into a more specific label
Description
read_csv seems to mishandle quoted fields that contain the separator when the line ending is \r:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), header=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 399, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 215, in _read
    return parser.read()
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 631, in read
    ret = self._engine.read(nrows)
  File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 954, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 644, in pandas._parser.TextReader.read (pandas/src/parser.c:5925)
  File "parser.pyx", line 666, in pandas._parser.TextReader._read_low_memory (pandas/src/parser.c:6145)
  File "parser.pyx", line 719, in pandas._parser.TextReader._read_rows (pandas/src/parser.c:6750)
  File "parser.pyx", line 706, in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:6634)
  File "parser.pyx", line 1572, in pandas._parser.raise_parser_error (pandas/src/parser.c:17055)
pandas._parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 2, saw 4
EXPECTED BEHAVIOR:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\n"a,b","e,d","f,f"'), header=None)
0 1 2
0 a b c
1 a,b e,d f,f
With a \r line ending, the result should be the same as with a \n line ending.
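As a reference point for the expected result, here is a minimal sketch that uses only Python's standard-library csv module, not pandas. The helper name parse_rows is hypothetical and exists only for this illustration. It assumes no quoted field contains an embedded line break, since str.splitlines() would split such a field.

```python
import csv


def parse_rows(text):
    """Parse CSV text whose lines may end in \\r, \\n, or \\r\\n."""
    # str.splitlines() treats \r, \n and \r\n (among others) as line
    # boundaries, so both inputs below are split into the same two lines.
    # csv.reader then handles the quoting on each line.
    return list(csv.reader(text.splitlines()))


cr_rows = parse_rows(' a,b,c\r"a,b","e,d","f,f"')
lf_rows = parse_rows(' a,b,c\n"a,b","e,d","f,f"')

print(cr_rows)  # [[' a', 'b', 'c'], ['a,b', 'e,d', 'f,f']]
print(lf_rows)  # [[' a', 'b', 'c'], ['a,b', 'e,d', 'f,f']]
```

Both inputs yield three fields per row, with the quoted commas kept inside their fields. That is the behavior I would expect from read_csv as well.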
Maybe this should be in a separate bug report, but a possibly related issue occurs when header=None is omitted:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'))
a b c
"a b" e,d f,f
The above shows the first quoted item, which contains the delimiter, being used as the index (index_col). The following shows what happens when we tell pandas to use index_col=False:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), index_col=False)
a b c
0 "a b" e,d
EXPECTED BEHAVIOR:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\n"a,b","e,d","f,f"'))
a b c
0 a,b e,d f,f
And with index_col=False:
>>> pd.read_csv(StringIO.StringIO(' a,b,c\n"a,b","e,d","f,f"'), index_col=False)
a b c
0 a,b e,d f,f
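A possible stopgap, offered as an untested assumption rather than a confirmed fix, is to normalize the line endings before the text reaches the parser. The function name normalize_line_endings is hypothetical. Note that this sketch also rewrites any \r that appears inside a quoted field, so it is only suitable when fields contain no embedded carriage returns.

```python
import re


def normalize_line_endings(text):
    """Convert \\r\\n and bare \\r line endings to \\n."""
    # Match \r optionally followed by \n, so that \r\n becomes a single \n
    # instead of two. Existing \n characters are left untouched.
    return re.sub(r'\r\n?', '\n', text)


raw = ' a,b,c\r"a,b","e,d","f,f"'
normalized = normalize_line_endings(raw)

print(repr(normalized))  # ' a,b,c\n"a,b","e,d","f,f"'
```

The normalized string matches the \n input used in the expected-behavior examples, so it can be wrapped in a StringIO and passed to read_csv in the same way.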
Here is my system information, in case it is needed:
>>> pd.__version__
'0.10.1'
>>> sys.version_info
sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)
>>> sys.platform
'darwin'
>>> os.name
'posix'