CSV files have been loaded using SheetJS.
However, there was a CSV file that I just couldn't get to read properly because of garbled characters, so I decided to try another library.
There was a library called "csv" that came up at the top of a search for "csv" in npm, so I tried using that and it worked.
csv" has quite a few useful functions, and I think I will use "csv" to read CSV files from now on, so I will make a rough note of how to use it.
CSV files should be read using a library.
CSV has an easy to understand format, so it is not impossible to write your own readings.
However, in Excel, we often put data containing "line breaks and commas" in cells, and when we try to read CSV files output from such Excel files, the process becomes troublesome.
Example of a cumbersome CSV file
How to read CSV files with line breaks and commas output by Excel using .
If you use "csv", it reads CSV that includes "newlines and commas" well, skips blank lines, automatically converts to JSON data, and is easy to use, so I think it would be easier to use it proactively.
install
There is also a CSV writing function, but if you only want to read, you only need to install "csv-parse
".
npm install csv-parse
use
CSV data (string) is directly passed as an argument.
Since there is no function to read from a file, use fs
to read a file and pass data to it.
import * as parse_csv from 'csv-parse/lib/sync'; import * as fs from 'fs'; const csv = parse_csv(fs.readFileSync('test.csv'),{ bom: true }); console.log(csv); // [ // [ 'key1', 'key2', 'key3' ],. // [ 'Data containing newline, comma,\r\n,\r\n,}', '', '' ], '' // [ '', '', '' ], // [ '# comment line', '', '' ] // ]
That's all!
Detailed settings are described in the second argument.
As described later, bom
setting is required for CSV output from Excel in UTF-8.
setting (of a computer or file, etc.)
The following is a list of settings that may be used frequently.
bom
CSV output from Excel in UTF-8 will be UTF-8 with BOM.
The default is false
, so you do not need to set bom
if you are reading normal UTF-8 CSV.
Incidentally, the garbled characters in SheetJS were caused by UTF-8 with BOM.
columns
Normally, data is retrieved as an array, but if columns
is changed to true
, the first line of the CSV is considered a header, and the data is retrieved as a JSON object.
const csv = parse_csv( fs.readFileSync('test.csv'),{ bom: true, columns: true }); console.log(csv); // [ // { 'key1': 'data containing newline, comma,\r\n,\r\n}', 'key2': '', 'key3': '' } // { 'key1': '', 'key2': '', 'key3': '' } // { 'key1': '#comment line', 'key2': '', 'key3': '' } // ]
If you want the key names in JSON to be different from those in the header, you can explicitly change them by specifying the keys as an array.
In this case, however, the header is also treated as data, so it is necessary to use from
(described below) to skip the header.
const csv = parse_csv( fs.readFileSync('test.csv'),{ bom: true, columns: ['key1', 'key2', 'key3'] }); console.log(csv); // [ // { key1: 'key1', key2: 'key2', key3: 'key3' },. // { key1: 'data containing newline, comma,\r\n,\r\n,}, key2: '', key3: '' } // { key1: '', key2: '', key3: '' }, // { key1: '# comment line', key2: '', key3: '' } // ]
from
When retrieving data, there are times when you want to skip header rows.
You can specify the line where the data starts with from
, so you can skip the header line.
Note, however, that lines do not start with a "0" but with a "1". In other words, to skip the first header, set from
to "2".
const csv = parse_csv( fs.readFileSync('test.csv'),{ bom: true, from: 2, columns: ['key1', 'key2', 'key3'] }); console.log(csv); // [ // { key1: 'data containing newline, comma,\r\n,\r\n,}, key2: '', key3: '' } // { key1: '', key2: '', key3: '' }, // { key1: '# comment line', key2: '', key3: '' } // ]
skip_lines_with_empty_values
If skip_lines_with_empty_values
is changed to true
, lines with no data are skipped.
const csv = parse_csv( fs.readFileSync('test.csv'),{ bom: true, skip_lines_with_empty_values: true }); console.log(csv); // [ // [ 'key1', 'key2', 'key3' ],. // [ 'Data containing newline, comma,\r\n,\r\n,}', '', '' ], '' // [ '# comment line', '', '' ] // ]
comment
Lines beginning with the string specified by comment
are skipped as comment lines.
const csv = parse_csv( fs.readFileSync('test.csv'),{ bom: true, comment: '#' }); console.log(csv); // [ // [ 'key1', 'key2', 'key3' ],. // [ 'Data containing newline, comma,\r\n,\r\n,}', '', '' ], '' // [ '', '', '' ] // ]
relax_column_count
The CSV format requires that all rows have the same number of columns.
However, some CSVs created by programs or manual editing may have different numbers of columns.
When I load a CSV with different number of columns in "csv", I get an error CSV_INCONSISTENT_RECORD_LENGTH
, but if I change relax_column_count
to true
, the variable number of columns is accepted.
trim
The data in each column is retrieved as a string.
If there are white spaces before or after a string, those white spaces are also included in the data.
If trim
is changed to true
, the data will be obtained after removing the preceding and following spaces in advance.
Of course, if the data intentionally contains spaces as string data, they will not be removed.
Impressions, etc.
Libraries are a pain to learn how to use each time, but I'm glad it was easier than I thought.