Mapping XML to spreadsheet
This section demonstrates how to use the orcus_xml
class to map the contents of an XML document onto a spreadsheet.
Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<cities>
<header date-generated="2026-03-23">
<title>Basic Facts About Major Cities</title>
<source>Public knowledge</source>
</header>
<city name="Tokyo">
<country>Japan</country>
<population>37400000</population>
<fact>World's most populous metropolitan area.</fact>
<landmark>Tokyo Skytree</landmark>
</city>
<city name="New York">
<country>United States</country>
<population>19200000</population>
<fact>Home to the United Nations headquarters.</fact>
<landmark>Statue of Liberty</landmark>
</city>
<city name="London">
<country>United Kingdom</country>
<population>9300000</population>
<fact>One of the world's oldest financial centers.</fact>
<landmark>Big Ben</landmark>
</city>
<city name="Paris">
<country>France</country>
<population>12100000</population>
<fact>Known as the City of Light.</fact>
<landmark>Eiffel Tower</landmark>
</city>
</cities>
This document contains a <header> element with metadata, followed by
a series of <city> elements each describing a major city. We will
first focus on extracting the header metadata into a sheet.
Start by including the necessary headers:
#include <orcus/xml_namespace.hpp>
#include <orcus/stream.hpp>
#include <orcus/orcus_xml.hpp>
#include <orcus/spreadsheet/factory.hpp>
#include <orcus/spreadsheet/document.hpp>
#include <orcus/spreadsheet/sheet.hpp>
#include <iostream>
#include <filesystem>
namespace fs = std::filesystem;
Next, load the input file into memory using
file_content:
auto inputpath = fs::path{INPUTDIR} / "cities.xml";
orcus::file_content input{inputpath};
INPUTDIR is a constant that stores a path to the directory where the
input file is located.
Now, create a spreadsheet document and its associated import factory.
The range_size_t value defines the
maximum number of rows and columns the document will support:
orcus::spreadsheet::range_size_t ssize{200, 10};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);
Here, we are specifying the sheet size to be 200 rows and 10 columns.
Create an xmlns_repository to manage XML namespace
identifiers for the session, and use it to construct an
orcus_xml filter instance connected to the import
factory:
orcus::xmlns_repository repo;
orcus::orcus_xml filter{repo, &factory};
Use set_cell_link() to define individual
cell mapping rules:
filter.set_cell_link("/cities/header/@date-generated", "Cities", 0, 0);
filter.set_cell_link("/cities/header/title", "Cities", 1, 0);
filter.set_cell_link("/cities/header/source", "Cities", 2, 0);
Each call takes an XPath expression identifying a node in the XML tree, the name of the target sheet, and the row and column position where the value should be placed. Here we map the header metadata into the first three rows of the sheet.
Note that the @ prefix is used to reference an XML attribute, as
opposed to a child element. The date-generated attribute of the
<header> element is therefore addressed as
/cities/header/@date-generated.
Now that the mapping rules have been defined, insert the target sheet by
calling append_sheet() and parse the
input stream:
filter.append_sheet("Cities");
filter.read_stream(input.str());
Finally, retrieve the first sheet from the document and dump its content to standard output:
const auto* sheet = doc.get_sheet(0);
if (!sheet)
throw std::runtime_error("failed to fetch the first sheet");
sheet->dump_flat(std::cout);
Since get_sheet() may return
a null pointer if the referenced sheet doesn’t exist, you should check
the returned value to make sure it’s not null. Calling
dump_flat() dumps the content of
the sheet to std::cout in an ASCII-art box-style format.
This code should produce the following output:
rows: 3 cols: 1
+--------------------------------+
| 2026-03-23 |
+--------------------------------+
| Basic Facts About Major Cities |
+--------------------------------+
| Public knowledge |
+--------------------------------+
While set_cell_link() maps individual XML
nodes to individual cells, orcus also supports mapping repeating XML
elements to a range of rows — similar to how a database table is
structured. Use start_range() to begin
defining a range mapping,
append_field_link() to map each XML node
to a named column, set_range_row_group()
to identify the repeating element that determines row boundaries, and
commit_range() to finalize the mapping:
filter.start_range("Cities", 0, 0);
filter.append_field_link("/cities/city/@name", "City");
filter.append_field_link("/cities/city/country", "Country");
filter.append_field_link("/cities/city/population", "Population");
filter.append_field_link("/cities/city/fact", "Fact");
filter.append_field_link("/cities/city/landmark", "Popular Spot");
filter.set_range_row_group("/cities/city");
filter.commit_range();
The first argument to start_range() is
the target sheet name, followed by the row and column of the top-left
corner of the range. Each call to
append_field_link() takes an XPath
expression and a column label. The call to
set_range_row_group() tells orcus that
each <city> element represents one row in the output. Once
commit_range() is called, the mapping is
finalized and ready for parsing. Re-running the code with this range
mapping added should produce the following output:
rows: 5 cols: 5
+----------+----------------+--------------+----------------------------------------------+-------------------+
| City | Country | Population | Fact | Popular Spot |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| Tokyo | Japan | 37400000 [v] | World's most populous metropolitan area. | Tokyo Skytree |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| New York | United States | 19200000 [v] | Home to the United Nations headquarters. | Statue of Liberty |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| London | United Kingdom | 9300000 [v] | One of the world's oldest financial centers. | Big Ben |
+----------+----------------+--------------+----------------------------------------------+-------------------+
| Paris | France | 12100000 [v] | Known as the City of Light. | Eiffel Tower |
+----------+----------------+--------------+----------------------------------------------+-------------------+
Note that while the example above runs the header metadata mapping and
the city range mapping as two separate sessions, cell links and range
mappings can be freely mixed within the same session. When mixed, both
become active when read_stream() is
called, and orcus will populate all of them in a single pass over the
document. You can also define multiple ranges in the same session by
calling start_range(),
append_field_link(),
set_range_row_group(), and
commit_range() again for each additional
range before calling read_stream().