Mapping JSON with automatic structure detection
This section demonstrates how orcus_json can detect the
mapping structure of a JSON document automatically, without writing any path
expressions by hand.
Consider the following JSON document:
[
{
"name": "Mercury",
"type": "Terrestrial",
"distance-au": 0.39,
"diameter-km": 4879,
"moons": 0
},
{
"name": "Venus",
"type": "Terrestrial",
"distance-au": 0.72,
"diameter-km": 12104,
"moons": 0
},
{
"name": "Earth",
"type": "Terrestrial",
"distance-au": 1.0,
"diameter-km": 12742,
"moons": 1
},
{
"name": "Mars",
"type": "Terrestrial",
"distance-au": 1.52,
"diameter-km": 6779,
"moons": 2
},
{
"name": "Jupiter",
"type": "Gas Giant",
"distance-au": 5.20,
"diameter-km": 139820,
"moons": 95
},
{
"name": "Saturn",
"type": "Gas Giant",
"distance-au": 9.58,
"diameter-km": 116460,
"moons": 146
},
{
"name": "Uranus",
"type": "Ice Giant",
"distance-au": 19.22,
"diameter-km": 50724,
"moons": 28
},
{
"name": "Neptune",
"type": "Ice Giant",
"distance-au": 30.05,
"diameter-km": 49244,
"moons": 16
}
]
The document is a top-level array of objects, each with the same set of fields.
Simple auto-detection
The setup is identical to the previous example: create a document, a factory, and a filter instance:
orcus::spreadsheet::range_size_t ssize{100, 20};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);
orcus::orcus_json filter{&factory};
Then replace the manual path registration with a single call to
detect_map_definition(), followed by
read_stream():
filter.detect_map_definition(input.str());
filter.read_stream(input.str());
detect_map_definition() analyzes the document,
identifies every repeating array of objects, and registers the mapping rules
internally. Each detected range is assigned to a new sheet named
range-0, range-1, and so on.
read_stream() then imports the data using those
rules.
Note
detect_map_definition() also appends the
necessary sheets to the document automatically — one per detected range —
so there is no need to call append_sheet()
manually.
Dumping the first sheet:
const auto* sheet = doc.get_sheet(0);
if (!sheet)
throw std::runtime_error("failed to fetch the first sheet");
sheet->dump_flat(std::cout);
produces:
rows: 9 cols: 5
+---------+-------------+-------------+-------------+---------+
| name | type | distance-au | diameter-km | moons |
+---------+-------------+-------------+-------------+---------+
| Mercury | Terrestrial | 0.39 [v] | 4879 [v] | 0 [v] |
+---------+-------------+-------------+-------------+---------+
| Venus | Terrestrial | 0.72 [v] | 12104 [v] | 0 [v] |
+---------+-------------+-------------+-------------+---------+
| Earth | Terrestrial | 1 [v] | 12742 [v] | 1 [v] |
+---------+-------------+-------------+-------------+---------+
| Mars | Terrestrial | 1.52 [v] | 6779 [v] | 2 [v] |
+---------+-------------+-------------+-------------+---------+
| Jupiter | Gas Giant | 5.2 [v] | 139820 [v] | 95 [v] |
+---------+-------------+-------------+-------------+---------+
| Saturn | Gas Giant | 9.58 [v] | 116460 [v] | 146 [v] |
+---------+-------------+-------------+-------------+---------+
| Uranus | Ice Giant | 19.22 [v] | 50724 [v] | 28 [v] |
+---------+-------------+-------------+-------------+---------+
| Neptune | Ice Giant | 30.05 [v] | 49244 [v] | 16 [v] |
+---------+-------------+-------------+-------------+---------+
The column headers are taken directly from the JSON object keys, and the column order matches the order in which the keys appear in the source document.
Inspecting and editing the map definition
detect_map_definition() commits the mapping
immediately. When you need to inspect or adjust the detected rules before
importing, use write_map_definition() instead.
It performs the same structure analysis but serializes the result to a JSON
string rather than applying it:
std::ostringstream os;
filter.write_map_definition(input.str(), os);
// print the map definition to stdout for inspection
auto map_def = os.str();
std::cout << map_def << std::endl;
The map definition is written as a single compact line. The following is its pretty-printed form for readability:
{
"sheets": ["range-0"],
"ranges": [
{
"sheet": "range-0",
"row": 0,
"column": 0,
"row-header": true,
"fields": [
{"path": "$[*]['name']"},
{"path": "$[*]['type']"},
{"path": "$[*]['distance-au']"},
{"path": "$[*]['diameter-km']"},
{"path": "$[*]['moons']"}
],
"row-groups": [
{"path": "$"}
]
}
]
}
The string can be edited freely at this point — fields can be removed,
reordered, or relabelled via an optional "label" key, sheet names can
be changed, and unwanted ranges can be dropped entirely.
Once any edits are complete, load the definition and import the data:
filter.read_map_definition(map_def);
filter.read_stream(input.str());
read_map_definition() parses the map definition
JSON, appends the necessary sheets to the document, and populates the internal
mapping rules — exactly as if the rules had been set up manually.
read_stream() then applies those rules to import
the source document.