Handling other formats
Earlier examples of loading JSON data assume the use of JSONEachRow
(ndjson). We provide examples of loading JSON in other common formats below.
Array of JSON objectsβ
One of the most popular forms of JSON data is having a list of JSON objects in a JSON array, like in this example:
> cat list.json
[
{
"path": "Akiba_Hebrew_Academy",
"month": "2017-08-01",
"hits": 241
},
{
"path": "Aegithina_tiphia",
"month": "2018-02-01",
"hits": 34
},
...
]
Letβs create a table for this kind of data:
CREATE TABLE sometable
(
`path` String,
`month` Date,
`hits` UInt32
)
ENGINE = MergeTree
ORDER BY tuple(month, path)
To import a list of JSON objects, we can use a JSONEachRow
format (inserting data from list.json file):
INSERT INTO sometable
FROM INFILE 'list.json'
FORMAT JSONEachRow
We have used a FROM INFILE clause to load data from the local file, and we can see the import was successful:
SELECT *
FROM sometable
ββpathβββββββββββββββββββββββ¬ββββββmonthββ¬βhitsββ
β 1971-72_Utah_Stars_season β 2016-10-01 β 1 β
β Akiba_Hebrew_Academy β 2017-08-01 β 241 β
β Aegithina_tiphia β 2018-02-01 β 34 β
βββββββββββββββββββββββββββββ΄βββββββββββββ΄βββββββ
Handling NDJSON (line delimited JSON)β
Many apps can log data in JSON format so that each log line is an individual JSON object, like in this file:
cat object-per-line.json
{"path":"1-krona","month":"2017-01-01","hits":4}
{"path":"Ahmadabad-e_Kalij-e_Sofla","month":"2017-01-01","hits":3}
{"path":"Bob_Dolman","month":"2016-11-01","hits":245}
The same JSONEachRow
format is capable of working with such files:
INSERT INTO sometable FROM INFILE 'object-per-line.json' FORMAT JSONEachRow;
SELECT * FROM sometable;
ββpathβββββββββββββββββββββββ¬ββββββmonthββ¬βhitsββ
β Bob_Dolman β 2016-11-01 β 245 β
β 1-krona β 2017-01-01 β 4 β
β Ahmadabad-e_Kalij-e_Sofla β 2017-01-01 β 3 β
βββββββββββββββββββββββββββββ΄βββββββββββββ΄βββββββ
JSON object keysβ
In some cases, the list of JSON objects can be encoded as object properties instead of array elements (see objects.json for example):
cat objects.json
{
"a": {
"path":"April_25,_2017",
"month":"2018-01-01",
"hits":2
},
"b": {
"path":"Akahori_Station",
"month":"2016-06-01",
"hits":11
},
...
}
ClickHouse can load data from this kind of data using the JSONObjectEachRow
format:
INSERT INTO sometable FROM INFILE 'objects.json' FORMAT JSONObjectEachRow;
SELECT * FROM sometable;
ββpathβββββββββββββ¬ββββββmonthββ¬βhitsββ
β Abducens_palsy β 2016-05-01 β 28 β
β Akahori_Station β 2016-06-01 β 11 β
β April_25,_2017 β 2018-01-01 β 2 β
βββββββββββββββββββ΄βββββββββββββ΄βββββββ
Specifying parent object key valuesβ
Letβs say we also want to save values in parent object keys to the table. In this case, we can use the following option to define the name of the column we want key values to be saved to:
SET format_json_object_each_row_column_for_object_name = 'id'
Now, we can check which data is going to be loaded from the original JSON file using file()
function:
SELECT * FROM file('objects.json', JSONObjectEachRow)
ββidββ¬βpathβββββββββββββ¬ββββββmonthββ¬βhitsββ
β a β April_25,_2017 β 2018-01-01 β 2 β
β b β Akahori_Station β 2016-06-01 β 11 β
β c β Abducens_palsy β 2016-05-01 β 28 β
ββββββ΄ββββββββββββββββββ΄βββββββββββββ΄βββββββ
Note how the id
column has been populated by key values correctly.
JSON Arraysβ
Sometimes, for the sake of saving space, JSON files are encoded in arrays instead of objects. In this case, we deal with a list of JSON arrays:
cat arrays.json
["Akiba_Hebrew_Academy", "2017-08-01", 241],
["Aegithina_tiphia", "2018-02-01", 34],
["1971-72_Utah_Stars_season", "2016-10-01", 1]
In this case, ClickHouse will load this data and attribute each value to the corresponding column based on its order in the array. We use JSONCompactEachRow
format for this:
SELECT * FROM sometable
ββc1βββββββββββββββββββββββββ¬βββββββββc2ββ¬ββc3ββ
β Akiba_Hebrew_Academy β 2017-08-01 β 241 β
β Aegithina_tiphia β 2018-02-01 β 34 β
β 1971-72_Utah_Stars_season β 2016-10-01 β 1 β
βββββββββββββββββββββββββββββ΄βββββββββββββ΄ββββββ
Importing individual columns from JSON arraysβ
In some cases, data can be encoded column-wise instead of row-wise. In this case, a parent JSON object contains columns with values. Take a look at the following file:
cat columns.json
{
"path": ["2007_Copa_America", "Car_dealerships_in_the_USA", "Dihydromyricetin_reductase"],
"month": ["2016-07-01", "2015-07-01", "2015-07-01"],
"hits": [178, 11, 1]
}
ClickHouse uses the JSONColumns
format to parse data formatted like that:
SELECT * FROM file('columns.json', JSONColumns)
ββpathββββββββββββββββββββββββ¬ββββββmonthββ¬βhitsββ
β 2007_Copa_America β 2016-07-01 β 178 β
β Car_dealerships_in_the_USA β 2015-07-01 β 11 β
β Dihydromyricetin_reductase β 2015-07-01 β 1 β
ββββββββββββββββββββββββββββββ΄βββββββββββββ΄βββββββ
A more compact format is also supported when dealing with an array of columns instead of an object using JSONCompactColumns
format:
SELECT * FROM file('columns-array.json', JSONCompactColumns)
ββc1βββββββββββββββ¬βββββββββc2ββ¬βc3ββ
β Heidenrod β 2017-01-01 β 10 β
β Arthur_Henrique β 2016-11-01 β 12 β
β Alan_Ebnother β 2015-11-01 β 66 β
βββββββββββββββββββ΄βββββββββββββ΄βββββ
Saving JSON objects instead of parsingβ
There are cases you might want to save JSON objects to a single String
(or JSON) column instead of parsing it. This can be useful when dealing with a list of JSON objects of different structures. Let's take this file where we have multiple different JSON objects inside a parent list:
cat custom.json
[
{"name": "Joe", "age": 99, "type": "person"},
{"url": "/my.post.MD", "hits": 1263, "type": "post"},
{"message": "Warning on disk usage", "type": "log"}
]
We want to save original JSON objects into the following table:
CREATE TABLE events
(
`data` String
)
ENGINE = MergeTree
ORDER BY ()
Now we can load data from the file into this table using JSONAsString
format to keep JSON objects instead of parsing them:
INSERT INTO events (data)
FROM INFILE 'custom.json'
FORMAT JSONAsString
And we can use JSON functions to query saved objects:
SELECT
JSONExtractString(data, 'type') AS type,
data
FROM events
ββtypeββββ¬βdataββββββββββββββββββββββββββββββββββββββββββββββββββ
β person β {"name": "Joe", "age": 99, "type": "person"} β
β post β {"url": "/my.post.MD", "hits": 1263, "type": "post"} β
β log β {"message": "Warning on disk usage", "type": "log"} β
ββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Note that JSONAsString
works perfectly fine in cases we have JSON object-per-line formatted files (usually used with JSONEachRow
format).
Schema for nested objectsβ
In cases when we're dealing with nested JSON objects, we can additionally define schema and use complex types (Array
, Object Data Type
or Tuple
) to load data:
SELECT *
FROM file('list-nested.json', JSONEachRow, 'page Tuple(path String, title String, owner_id UInt16), month Date, hits UInt32')
LIMIT 1
ββpageββββββββββββββββββββββββββββββββββββββββββββββββ¬ββββββmonthββ¬βhitsββ
β ('Akiba_Hebrew_Academy','Akiba Hebrew Academy',12) β 2017-08-01 β 241 β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄βββββββββββββ΄βββββββ
Accessing nested JSON objectsβ
We can refer to nested JSON keys by enabling the following settings option:
SET input_format_import_nested_json = 1
This allows us to refer to nested JSON object keys using dot notation (remember to wrap those with backtick symbols to work):
SELECT *
FROM file('list-nested.json', JSONEachRow, '`page.owner_id` UInt32, `page.title` String, month Date, hits UInt32')
LIMIT 1
ββpage.owner_idββ¬βpage.titleββββββββββββ¬ββββββmonthββ¬βhitsββ
β 12 β Akiba Hebrew Academy β 2017-08-01 β 241 β
βββββββββββββββββ΄βββββββββββββββββββββββ΄βββββββββββββ΄βββββββ
This way we can flatten nested JSON objects or use some nested values to save them as separate columns.
Skipping unknown columnsβ
By default, ClickHouse will ignore unknown columns when importing JSON data. Letβs try to import the original file into the table without the month
column:
CREATE TABLE shorttable
(
`path` String,
`hits` UInt32
)
ENGINE = MergeTree
ORDER BY path
We can still insert the original JSON data with 3 columns into this table:
INSERT INTO shorttable FROM INFILE 'list.json' FORMAT JSONEachRow;
SELECT * FROM shorttable
ββpathβββββββββββββββββββββββ¬βhitsββ
β 1971-72_Utah_Stars_season β 1 β
β Aegithina_tiphia β 34 β
β Akiba_Hebrew_Academy β 241 β
βββββββββββββββββββββββββββββ΄βββββββ
ClickHouse will ignore unknown columns while importing. This can be disabled with the input_format_skip_unknown_fields settings option:
SET input_format_skip_unknown_fields = 0;
INSERT INTO shorttable FROM INFILE 'list.json' FORMAT JSONEachRow;
Ok.
Exception on client:
Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: month: (in file/uri /data/clickhouse/user_files/list.json): (at row 1)
ClickHouse will throw exceptions in cases of inconsistent JSON and table columns structure.
BSONβ
ClickHouse allows exporting to and importing data from BSON encoded files. This format is used by some DBMSs, e.g. MongoDB database.
To import BSON data, we use the BSONEachRow format. Letβs import data from this BSON file:
SELECT * FROM file('data.bson', BSONEachRow)
ββpathβββββββββββββββββββββββ¬βmonthββ¬βhitsββ
β Bob_Dolman β 17106 β 245 β
β 1-krona β 17167 β 4 β
β Ahmadabad-e_Kalij-e_Sofla β 17167 β 3 β
βββββββββββββββββββββββββββββ΄ββββββββ΄βββββββ
We can also export to BSON files using the same format:
SELECT *
FROM sometable
INTO OUTFILE 'out.bson'
FORMAT BSONEachRow
After that, weβll have our data exported to the out.bson
file.