When given a comma or tab-delimited file, we usually want to import this into some kind of database. The first you need to find out is what type of file this is as it could make or break your import.
g _ – goes to the first non whispace
Mac OS Pre-X | CR | ASCII 13 | Control Key: ^M |
Mac X / Unix | LF | ASCII 10 | Control Key: ^J |
Windows | CRLF | N/A | ^M$ |
This could easily be done at the command line with:
file my-text-file.txt # my-text-file.txt: ASCII text, with very long lines, with CRLF line terminators
Doesn’t get much clearer than that. So this particular file is from a Windows system with CRLF terminators. Now, without doing this, it’s a bit vague what we’re importing from. For example, if we assumed the text file was from a OSX/Unix system and ran this:
LOAD DATA LOCAL INFILE "" INTO TABLE cool_table FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
You’ll get some funky carriage return characters at the beginning of each row (and they will count as characters and truncate your data). Now using vim editor, you won’t see much wrong with the file with the “:set list” command. You’ll see ^I where the tabs are and a $ for end-of-line. All looks well but that’s because vim auto-detected the file. You could see this with the vim command:
:set ff? ffs? # fileformat=dos
Now to see its true colors, let’s force it to read as unix/DOS.
:e ++ff=unix
You’ll start to see the infamous ^M in the file.
In conclusion, find out exactly what kind of file you’re importing, and let the importer know. In the case with MySQL, terminate lines by “\r\n” will provide a proper import. I hope this solves the mystery of imports that go wrong!
Recent Comments