Mapping JSON to spreadsheet

This section demonstrates how to use the orcus_json class to map the contents of a JSON document onto a spreadsheet.

Consider the following JSON document:

{
  "meta": {
    "title": "My Book Collection",
    "owner": "John Doe",
    "last-updated": "2025-01-15"
  },
  "books": [
    {
      "title": "Crime and Punishment",
      "author": "Fyodor Dostoevsky",
      "year": 1866,
      "genre": "Novel",
      "rating": 4.8
    },
    {
      "title": "The Great Gatsby",
      "author": "F. Scott Fitzgerald",
      "year": 1925,
      "genre": "Novel",
      "rating": 4.0
    },
    {
      "title": "One Hundred Years of Solitude",
      "author": "Gabriel Garcia Marquez",
      "year": 1967,
      "genre": "Magic Realism",
      "rating": 4.7
    },
    {
      "title": "To Kill a Mockingbird",
      "author": "Harper Lee",
      "year": 1960,
      "genre": "Novel",
      "rating": 4.9
    },
    {
      "title": "Brave New World",
      "author": "Aldous Huxley",
      "year": 1932,
      "genre": "Dystopian",
      "rating": 4.2
    }
  ]
}

The document has a meta object at the top level containing three scalar fields, followed by a books array where each element describes a book.

JSON path expressions

Paths used by orcus_json use a JSONPath-inspired syntax, but are not a strict subset of the JSONPath standard. A path always starts with $ representing the document root. Object keys are addressed with bracket-and-quote notation (['key']), and array elements are addressed with the wildcard notation ([*]), or the equivalent empty bracket notation ([]).

For example, in the document above:

$['meta']['title'] points to the string "My Book Collection"
$['books'][*]['author'] points to the author field of every element in the books array

Cell links

Start by including the necessary headers:

#include <orcus/stream.hpp>
#include <orcus/orcus_json.hpp>

#include <orcus/spreadsheet/factory.hpp>
#include <orcus/spreadsheet/document.hpp>
#include <orcus/spreadsheet/sheet.hpp>

#include <iostream>
#include <filesystem>

namespace fs = std::filesystem;

Load the input file into memory using file_content:

auto inputpath = fs::path{INPUTDIR} / "books.json";
orcus::file_content input{inputpath};

Create a spreadsheet document and its associated import factory:

orcus::spreadsheet::range_size_t ssize{200, 10};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);

Construct an orcus_json filter instance connected to the import factory:

orcus::orcus_json filter{&factory};

Use set_cell_link() to map individual JSON values to specific cells:

filter.set_cell_link("$['meta']['title']", "Books", 0, 0);
filter.set_cell_link("$['meta']['owner']", "Books", 1, 0);
filter.set_cell_link("$['meta']['last-updated']", "Books", 2, 0);

Each call takes a JSONPath expression identifying a node in the JSON document, the name of the target sheet, and the row and column where the value should be placed.

Now insert the target sheet and parse the input stream:

filter.append_sheet("Books");
filter.read_stream(input.str());

Then retrieve the sheet and dump its content:

const auto* sheet = doc.get_sheet(0);
if (!sheet)
    throw std::runtime_error("failed to fetch the first sheet");

sheet->dump_flat(std::cout);

This should produce the following output:

rows: 3  cols: 1
+--------------------+
| My Book Collection |
+--------------------+
| John Doe           |
+--------------------+
| 2025-01-15         |
+--------------------+

Range mapping

While set_cell_link() maps individual values to individual cells, orcus_json also supports mapping an array of objects to a range of rows — similar to a database table. Use start_range() to begin defining a range, append_field_link() to map each JSON field to a named column, set_range_row_group() to identify the array that determines row boundaries, and commit_range() to finalize the mapping:

filter.start_range("Books", 0, 0, true);
filter.append_field_link("$['books'][*]['title']", "Title");
filter.append_field_link("$['books'][*]['author']", "Author");
filter.append_field_link("$['books'][*]['year']", "Year");
filter.append_field_link("$['books'][*]['genre']", "Genre");
filter.append_field_link("$['books'][*]['rating']", "Rating");
filter.set_range_row_group("$['books']");
filter.commit_range();

The first three arguments to start_range() are the target sheet name and the row and column of the top-left corner of the range. The fourth argument is a boolean flag that, when true, reserves the first row for column headers. Each append_field_link() call takes a JSONPath expression and a column label. The path $['books'][*]['title'] selects the title field from every element of the books array. The call to set_range_row_group() with $['books'] tells orcus that each element of the books array represents one row in the output.

Running this code produces the following output:

rows: 6  cols: 5
+-------------------------------+------------------------+----------+---------------+---------+
| Title                         | Author                 | Year     | Genre         | Rating  |
+-------------------------------+------------------------+----------+---------------+---------+
| Crime and Punishment          | Fyodor Dostoevsky      | 1866 [v] | Novel         | 4.8 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| The Great Gatsby              | F. Scott Fitzgerald    | 1925 [v] | Novel         | 4 [v]   |
+-------------------------------+------------------------+----------+---------------+---------+
| One Hundred Years of Solitude | Gabriel Garcia Marquez | 1967 [v] | Magic Realism | 4.7 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| To Kill a Mockingbird         | Harper Lee             | 1960 [v] | Novel         | 4.9 [v] |
+-------------------------------+------------------------+----------+---------------+---------+
| Brave New World               | Aldous Huxley          | 1932 [v] | Dystopian     | 4.2 [v] |
+-------------------------------+------------------------+----------+---------------+---------+

Note that the [v] marker in the output indicates that the value is stored as a numeric type rather than a string.

Cell links and range mappings can be freely mixed within the same session. When combined, both become active when read_stream() is called, and orcus populates all of them in a single pass over the document. Multiple ranges can also be defined in the same session by repeating the start_range() … commit_range() sequence before calling read_stream().