🌇 🕌 🛷 CSV format, or Where to put commas? 🍹 🎫 🚔

Among the applied tasks that an advanced user, if not a programmer, is faced with, importing data from one application to another is not so rare. And quite often you have to import tabular data, for which, in fact, the CSV format was invented. Now several alternative solutions have already been developed (XML is leading among them). But creating a CSV file in some cases remains the most convenient solution, and in some cases the most problematic.

The name of the format is deciphered as commaseparatedvalues - comma-separated values. In Notepad, the file is as follows:

00, 01, 02, 03;

01, 15, 34, 11;

16, 27, 33, 06 (it is not necessary to put a sign at the very end)

And in an Excel spreadsheet or similar application, it’s already like this:

00	01	02	03
01	fifteen	34	eleven
16	27	33	06

But now we have examined the simplest case. The problems begin precisely due to the fact that due to the very compressed presentation of data, the CSV format is used to transfer large amounts of information. Most often, heterogeneous values are found in databases: digital, alphabetic, alphanumeric, with spaces, etc. Often errors occur in large arrays, and if, for example, somewhere an extra comma appears inside the contents of one of the fields, all cells can shift to one.

In general, the problems that arise when importing data through the CSV file format can be divided into three categories:

1 - Incorrect filling of the fields.

2 - Incorrect data conversion when saving a file to CSV.

3 - Incorrect format recognition by the importer.

The first case we have already partially considered. In order to deal with the problem, it is necessary to provide means for checking the correctness of the input data. For example, if you need to compile a database of addresses for mailing, there are special scripts that check whether a person entered his e-mail in the desired format. Where there are spaces, commas, and any other data that may not be read correctly, the contents of the field must be enclosed in quotation marks (so-called computer “”).

The second case is related to the choice of encoding. It is necessary to find out, often empirically, what encoding the importer program prefers. Windows uses 1251 everywhere, so if the text in Unicode appears in the source file, it will be displayed in the destination application as a series of questions. In Excel, it is possible to save to CSV format with different encoding and delimiters (except for the semicolon, tabs and semicolons are also used), but it is best to create a file in Notepad ++ or OpenOfficeCalc.

The third case is a consequence of the second. Historically, the CSV format does not have a single standard. Many programs also perceive TSV (tab delimited) and SCSV (semicolon separated) formats as CSV, in particular because the file extension does not report this. In this case, the data is read erroneously and is not distributed in the cells in the desired order. The best thing you can advise is to create a CSV file yourself. As already mentioned, for this you need to use the convenient editor.

The tricks of how to tame the CSV format are known to programmers. For a simple user, it is enough to adhere to our recommendations regarding the creation of a file. Least of all problems occurs when a file is created specifically for import into a specific application (it happens differently). So it’s easier to take into account all the possible problems that arise in the way of data transfer.

CSV format, or Where to put commas?

More articles: