Mapping XML to spreadsheet

This section demonstrates how to use the orcus_xml class to map the contents of an XML document onto a spreadsheet.

Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<cities>
  <header date-generated="2026-03-23">
    <title>Basic Facts About Major Cities</title>
    <source>Public knowledge</source>
  </header>
  <city name="Tokyo">
    <country>Japan</country>
    <population>37400000</population>
    <fact>World's most populous metropolitan area.</fact>
    <landmark>Tokyo Skytree</landmark>
  </city>
  <city name="New York">
    <country>United States</country>
    <population>19200000</population>
    <fact>Home to the United Nations headquarters.</fact>
    <landmark>Statue of Liberty</landmark>
  </city>
  <city name="London">
    <country>United Kingdom</country>
    <population>9300000</population>
    <fact>One of the world's oldest financial centers.</fact>
    <landmark>Big Ben</landmark>
  </city>
  <city name="Paris">
    <country>France</country>
    <population>12100000</population>
    <fact>Known as the City of Light.</fact>
    <landmark>Eiffel Tower</landmark>
  </city>
</cities>

This document contains a <header> element with metadata, followed by a series of <city> elements each describing a major city. We will first focus on extracting the header metadata into a sheet.

Start by including the necessary headers:

#include <orcus/xml_namespace.hpp>
#include <orcus/stream.hpp>
#include <orcus/orcus_xml.hpp>

#include <orcus/spreadsheet/factory.hpp>
#include <orcus/spreadsheet/document.hpp>
#include <orcus/spreadsheet/sheet.hpp>

#include <iostream>
#include <filesystem>

namespace fs = std::filesystem;

Next, load the input file into memory using file_content:

auto inputpath = fs::path{INPUTDIR} / "cities.xml";
orcus::file_content input{inputpath};

INPUTDIR is a constant that stores a path to the directory where the input file is located.

Now, create a spreadsheet document and its associated import factory. The range_size_t value defines the maximum number of rows and columns the document will support:

orcus::spreadsheet::range_size_t ssize{200, 10};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);

Here, we are specifying the sheet size to be 200 rows and 10 columns.

Create an xmlns_repository to manage XML namespace identifiers for the session, and use it to construct an orcus_xml filter instance connected to the import factory:

orcus::xmlns_repository repo;
orcus::orcus_xml filter{repo, &factory};

Use set_cell_link() to define individual cell mapping rules:

filter.set_cell_link("/cities/header/@date-generated", "Cities", 0, 0);
filter.set_cell_link("/cities/header/title", "Cities", 1, 0);
filter.set_cell_link("/cities/header/source", "Cities", 2, 0);

Each call takes an XPath expression identifying a node in the XML tree, the name of the target sheet, and the row and column position where the value should be placed. Here we map the header metadata into the first three rows of the sheet.

Note that the @ prefix is used to reference an XML attribute, as opposed to a child element. The date-generated attribute of the <header> element is therefore addressed as /cities/header/@date-generated.

Now that the mapping rules have been defined, insert the target sheet by calling append_sheet() and parse the input stream:

filter.append_sheet("Cities");
filter.read_stream(input.str());

Finally, retrieve the first sheet from the document and dump its content to standard output:

const auto* sheet = doc.get_sheet(0);
if (!sheet)
    throw std::runtime_error("failed to fetch the first sheet");

sheet->dump_flat(std::cout);

Since get_sheet() may return a null pointer if the referenced sheet doesn’t exist, you should check the returned value to make sure it’s not null. Calling dump_flat() dumps the content of the sheet to std::cout in an ASCII-art box-style format.

This code should produce the following output:

rows: 3  cols: 1
+--------------------------------+
| 2026-03-23                     |
+--------------------------------+
| Basic Facts About Major Cities |
+--------------------------------+
| Public knowledge               |
+--------------------------------+

While set_cell_link() maps individual XML nodes to individual cells, orcus also supports mapping repeating XML elements to a range of rows — similar to how a database table is structured. Use start_range() to begin defining a range mapping, append_field_link() to map each XML node to a named column, set_range_row_group() to identify the repeating element that determines row boundaries, and commit_range() to finalize the mapping:

filter.start_range("Cities", 0, 0);
filter.append_field_link("/cities/city/@name", "City");
filter.append_field_link("/cities/city/country", "Country");
filter.append_field_link("/cities/city/population", "Population");
filter.append_field_link("/cities/city/fact", "Fact");
filter.append_field_link("/cities/city/landmark", "Popular Spot");
filter.set_range_row_group("/cities/city");
filter.commit_range();

The first argument to start_range() is the target sheet name, followed by the row and column of the top-left corner of the range. Each call to append_field_link() takes an XPath expression and a column label. The call to set_range_row_group() tells orcus that each <city> element represents one row in the output. Once commit_range() is called, the mapping is finalized and ready for parsing. Re-running the code with this range mapping added should produce the following output:

rows: 5  cols: 5
+----------+----------------+--------------+----------------------------------------------+-------------------+
| City     | Country        | Population   | Fact                                         | Popular Spot      |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| Tokyo    | Japan          | 37400000 [v] | World's most populous metropolitan area.     | Tokyo Skytree     |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| New York | United States  | 19200000 [v] | Home to the United Nations headquarters.     | Statue of Liberty |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| London   | United Kingdom | 9300000 [v]  | One of the world's oldest financial centers. | Big Ben           |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| Paris    | France         | 12100000 [v] | Known as the City of Light.                  | Eiffel Tower      |
+----------+----------------+--------------+----------------------------------------------+-------------------+

Note that while the example above runs the header metadata mapping and the city range mapping as two separate sessions, cell links and range mappings can be freely mixed within the same session. When mixed, both become active when read_stream() is called, and orcus will populate all of them in a single pass over the document. You can also define multiple ranges in the same session by calling start_range(), append_field_link(), set_range_row_group(), and commit_range() again for each additional range before calling read_stream().