Excel 2008 for Mac’s CSV export bug
I ran into this at work a few weeks ago and thought I’d share.
Excel 2008’s CSV export feature is broken. For instance, enter the following fake data into Excel:
Row | Name | Age |
0 | Nick | 23 |
1 | Bill | 48 |
Save as -> CSV file
When you use standard unix commands to view the output, the results are all garbled.
[Documents]$ cat Workbook1.csv 1,Bill,48[Documents]$ $ wc -l Workbook1.csv 0 Workbook1.csv
What is the issue? The file command reveals the problem:
$ file Workbook1.csv Workbook1.csv: ASCII text, with CR line terminators
CR stands for Carriage return, the ‘\r’ control sequence which, along with the newline character (‘\n’), is used to break up lines on Windows. Unix OSes like Mac OS expect a single ‘\n’ new line character to terminate lines.
How can we fix this?
# convert the Workbook1.csv file into a Unix appropriate file dos2unix Workbook1.csv WithUnixLineEndings.csv
If you don’t have dos2unix on your Mac, and you don’t want to install it, you can fake it with the tr command:
tr '\15' '\n' < Workbook1.csv # remove the carriage returns, replace with a newline Row,Name,Age 0,Nick,23 1,Bill,48
Very annoying that the Mac Excel doesn’t respect Unix line terminators. Interestingly, I found a post that talks about ensuring that you choose a CSV file encoded for Mac, but that option seems missing from the Mac version itself.
If I’m missing something obvious, please correct me.
Oh my oh my. This really saved my day. You are a hero.
Cheers!
Awesome, it was a shot in the dark that someone else would need this info. Glad you found it.
You could also just choose to save in “Format: Windows Separated Values (.csv)” which produces CRLF. Default Format: Comma Separated Values (.csv) produces CR.
that (resave as windows csv file) doesn’t seem to work with mac excel 2008, but it does with mac excel 2011. ARGH. probably better just to use open office.
Thanks for writing the article. There’s a potential additional place for problems: character set. My Excel-produced CSV was in “Western (Mac OS Roman)”. As a result, the tr command suggested here failed with “tr: Illegal byte sequence”. I opened the csv in TextWrangler and saved using UTF-8. After that your tr command worked perfectly. Perhaps there’s a way to tell tr to handle Mac OS Roman… but I don’t know.
Thank you so much for this article! I was parsing field records saved in csv files using c++ and for the life of me couldn’t figure out why the line breaks disappeared. Could never notice it’s a exporting issue. Thanks again!!!
Another option is to save your file as UTF-16 Unicode Text (.txt), and then in qGIS specify the encoding as UTF-16 and specify custom delimiters (comma, tab and semicolon). Works perfectly for me on El Capitan.