When loading Google Cloud Storage files into Google BigQuery, multiple files can be loaded at once.
However, I got stuck trying to do this with the BigQuery Node.js SDK, so here are my notes.
Overview and Issues
- Load using table.createLoadJob().
- Pass a Cloud Storage File object as the first argument
- If there is only one file, pass only one File object
- If there are multiple files, pass an array of File objects
However, the TypeScript type definition only covers the single-file case, so passing a File array as is results in a compile error.
Procedure
Cast the File array to any so that the compiler does not report an error.
A modified version of the Google sample looks like this:
await table.createLoadJob(
  [
    storage.bucket('institutions').file('2011.csv'),
    storage.bucket('institutions').file('2012.csv'),
  ] as any
);
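If scattering as any through the code feels uncomfortable, one option is to keep the cast inside a single small helper. The following is only a minimal sketch that runs without the SDK: FileLike, the stub table object, and loadFiles are hypothetical names I made up for illustration, and the stub merely imitates the situation described above (a declared type that accepts one file, and a runtime that also accepts an array). It is not the SDK's actual implementation.

```typescript
// Hypothetical stand-in for the Cloud Storage File class (assumption).
interface FileLike {
  name: string;
}

// Hypothetical stub imitating the mismatch: the declared parameter type is a
// single file, but the body also handles an array at runtime.
const table = {
  createLoadJob(source: FileLike): string[] {
    const input: unknown = source;
    const files = (Array.isArray(input) ? input : [input]) as FileLike[];
    return files.map((f) => f.name);
  },
};

// Hypothetical helper: callers pass a properly typed array, and the unsafe
// cast lives in exactly one place instead of at every call site.
function loadFiles(files: FileLike[]): string[] {
  return table.createLoadJob(files as unknown as FileLike);
}

const loaded = loadFiles([{ name: '2011.csv' }, { name: '2012.csv' }]);
console.log(loaded);
```

With the real SDK, the same idea would mean wrapping table.createLoadJob() in a helper that takes File[] and casts once inside, so the rest of the code stays type-checked.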
Impressions, etc.
The documentation describes the argument as a single File object, but the official sample passes multiple objects.
The source code is written to accept either a single file or an array of files, but because the type definition only declares the single-file form, the compiler rejects an array as is.
Loading one file at a time is possible, but BigQuery limits the number of load jobs per day, so with a large number of files you want to load them all at once, and the type definition makes that harder than it should be.
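To make the difference in job count concrete, here is a rough sketch of grouping file names into batches so that each batch could go into one load job. The chunk function and the batch size of 2 are hypothetical and chosen only for illustration; I have not tied the size to any actual BigQuery limit.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most
// `size` items. Each batch would then be passed to one load job.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const names = ['2011.csv', '2012.csv', '2013.csv', '2014.csv', '2015.csv'];
const batches = chunk(names, 2);

// Five files loaded one at a time would need five jobs; here they fit in three.
console.log(batches.length); // 3
```

Each batch could then be mapped to File objects and passed to table.createLoadJob() as an array, using the cast described above.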