Mapping JSON to spreadsheet
This section demonstrates how to use the orcus_json
class to map the contents of a JSON document onto a spreadsheet.
Consider the following JSON document:
{
"meta": {
"title": "My Book Collection",
"owner": "John Doe",
"last-updated": "2025-01-15"
},
"books": [
{
"title": "Crime and Punishment",
"author": "Fyodor Dostoevsky",
"year": 1866,
"genre": "Novel",
"rating": 4.8
},
{
"title": "The Great Gatsby",
"author": "F. Scott Fitzgerald",
"year": 1925,
"genre": "Novel",
"rating": 4.0
},
{
"title": "One Hundred Years of Solitude",
"author": "Gabriel Garcia Marquez",
"year": 1967,
"genre": "Magic Realism",
"rating": 4.7
},
{
"title": "To Kill a Mockingbird",
"author": "Harper Lee",
"year": 1960,
"genre": "Novel",
"rating": 4.9
},
{
"title": "Brave New World",
"author": "Aldous Huxley",
"year": 1932,
"genre": "Dystopian",
"rating": 4.2
}
]
}
The document has a meta object at the top level containing three scalar
fields, followed by a books array where each element describes a book.
JSON path expressions
Paths used by orcus_json use a JSONPath-inspired syntax,
but are not a strict subset of the JSONPath standard. A path always starts
with $ representing the document root. Object keys are addressed with
bracket-and-quote notation (['key']), and array elements are addressed
with the wildcard notation ([*]), or the equivalent empty bracket notation ([]).
For example, in the document above:
$['meta']['title']points to the string"My Book Collection"$['books'][*]['author']points to theauthorfield of every element in thebooksarray
Cell links
Start by including the necessary headers:
#include <orcus/stream.hpp>
#include <orcus/orcus_json.hpp>
#include <orcus/spreadsheet/factory.hpp>
#include <orcus/spreadsheet/document.hpp>
#include <orcus/spreadsheet/sheet.hpp>
#include <iostream>
#include <filesystem>
namespace fs = std::filesystem;
Load the input file into memory using file_content:
auto inputpath = fs::path{INPUTDIR} / "books.json";
orcus::file_content input{inputpath};
Create a spreadsheet document and its associated import factory:
orcus::spreadsheet::range_size_t ssize{200, 10};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);
Construct an orcus_json filter instance connected to the
import factory:
orcus::orcus_json filter{&factory};
Use set_cell_link() to map individual JSON
values to specific cells:
filter.set_cell_link("$['meta']['title']", "Books", 0, 0);
filter.set_cell_link("$['meta']['owner']", "Books", 1, 0);
filter.set_cell_link("$['meta']['last-updated']", "Books", 2, 0);
Each call takes a JSONPath expression identifying a node in the JSON document, the name of the target sheet, and the row and column where the value should be placed.
Now insert the target sheet and parse the input stream:
filter.append_sheet("Books");
filter.read_stream(input.str());
Then retrieve the sheet and dump its content:
const auto* sheet = doc.get_sheet(0);
if (!sheet)
throw std::runtime_error("failed to fetch the first sheet");
sheet->dump_flat(std::cout);
This should produce the following output:
rows: 3 cols: 1
+--------------------+
| My Book Collection |
+--------------------+
| John Doe |
+--------------------+
| 2025-01-15 |
+--------------------+
Range mapping
While set_cell_link() maps individual values to
individual cells, orcus_json also supports mapping an
array of objects to a range of rows — similar to a database table. Use
start_range() to begin defining a range,
append_field_link() to map each JSON field to a
named column, set_range_row_group() to identify
the array that determines row boundaries, and
commit_range() to finalize the mapping:
filter.start_range("Books", 0, 0, true);
filter.append_field_link("$['books'][*]['title']", "Title");
filter.append_field_link("$['books'][*]['author']", "Author");
filter.append_field_link("$['books'][*]['year']", "Year");
filter.append_field_link("$['books'][*]['genre']", "Genre");
filter.append_field_link("$['books'][*]['rating']", "Rating");
filter.set_range_row_group("$['books']");
filter.commit_range();
The first three arguments to start_range() are
the target sheet name and the row and column of the top-left corner of the
range. The fourth argument is a boolean flag that, when true, reserves
the first row for column headers. Each
append_field_link() call takes a JSONPath
expression and a column label. The path $['books'][*]['title'] selects
the title field from every element of the books array. The call to
set_range_row_group() with $['books'] tells
orcus that each element of the books array represents one row in the
output.
Running this code produces the following output:
rows: 6 cols: 5
+-------------------------------+------------------------+----------+---------------+---------+
| Title | Author | Year | Genre | Rating |
+-------------------------------+------------------------+----------+---------------+---------+
| Crime and Punishment | Fyodor Dostoevsky | 1866 [v] | Novel | 4.8 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| The Great Gatsby | F. Scott Fitzgerald | 1925 [v] | Novel | 4 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| One Hundred Years of Solitude | Gabriel Garcia Marquez | 1967 [v] | Magic Realism | 4.7 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| To Kill a Mockingbird | Harper Lee | 1960 [v] | Novel | 4.9 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| Brave New World | Aldous Huxley | 1932 [v] | Dystopian | 4.2 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
Note that the [v] marker in the output indicates that the value is stored
as a numeric type rather than a string.
Cell links and range mappings can be freely mixed within the same session.
When combined, both become active when
read_stream() is called, and orcus populates
all of them in a single pass over the document. Multiple ranges can also be
defined in the same session by repeating the
start_range() …
commit_range() sequence before calling
read_stream().