Mapping XML with namespaces
This section extends the previous example of mapping a basic XML document to cover documents that use XML namespaces. When element and attribute names are namespace-qualified, the XPath expressions used to identify them must qualify those names with namespace prefixes as well.
Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<log:serverLogs
    xmlns:log="http://example.com/server-logs"
    xmlns:meta="http://example.com/server-logs/meta"
    meta:host="web-prod-04"
    meta:date="2026-03-23">
  <log:entry log:id="1">
    <log:timestamp>2026-03-23T08:02:11Z</log:timestamp>
    <log:level>INFO</log:level>
    <log:service>AuthService</log:service>
    <log:message>User alice@example.com authenticated successfully.</log:message>
  </log:entry>
  <log:entry log:id="2">
    <log:timestamp>2026-03-23T08:14:37Z</log:timestamp>
    <log:level>WARN</log:level>
    <log:service>AuthService</log:service>
    <log:message>Failed login attempt for user bob@example.com. Attempt 3 of 5.</log:message>
  </log:entry>
  <log:entry log:id="3">
    <log:timestamp>2026-03-23T08:31:05Z</log:timestamp>
    <log:level>ERROR</log:level>
    <log:service>SessionManager</log:service>
    <log:message>Cache connection timed out after 30s. Session store unreachable.</log:message>
  </log:entry>
  <log:entry log:id="4">
    <log:timestamp>2026-03-23T08:31:09Z</log:timestamp>
    <log:level>INFO</log:level>
    <log:service>SessionManager</log:service>
    <log:message>Cache connection restored. Resuming normal operations.</log:message>
  </log:entry>
  <log:entry log:id="5">
    <log:timestamp>2026-03-23T09:45:22Z</log:timestamp>
    <log:level>ERROR</log:level>
    <log:service>ApiGateway</log:service>
    <log:message>Request to /api/orders returned 503. Upstream service unavailable.</log:message>
  </log:entry>
  <log:entry log:id="6">
    <log:timestamp>2026-03-23T10:00:00Z</log:timestamp>
    <log:level>INFO</log:level>
    <log:service>Scheduler</log:service>
    <log:message>Daily report job completed. 1,402 records processed in 4.2s.</log:message>
  </log:entry>
</log:serverLogs>
The root element <log:serverLogs> and every element beneath it carry the log namespace prefix, as does the id attribute on each <log:entry>, while the host and date attributes on the root element carry the meta prefix. Each prefix is bound to a URI by the xmlns:log and xmlns:meta declarations on the root element.
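To make the binding concrete, here is a minimal standard C++ sketch (independent of orcus) that separates the xmlns:log and xmlns:meta declarations on the root element from its ordinary meta:host and meta:date attributes. The raw_attr type and the collect_namespace_decls function are hypothetical names used only for illustration; a real XML parser performs this step internally.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical type: a raw attribute as written in a start tag.
struct raw_attr
{
    std::string qname;  // qualified name, e.g. "xmlns:log" or "meta:host"
    std::string value;
};

// Hypothetical helper: split namespace declarations (xmlns and xmlns:prefix)
// from ordinary attributes. Declarations are returned as prefix -> URI, with
// the empty prefix standing for a default namespace declared via xmlns="...".
std::map<std::string, std::string> collect_namespace_decls(
    const std::vector<raw_attr>& attrs, std::vector<raw_attr>& ordinary)
{
    std::map<std::string, std::string> decls;
    const std::string decl_prefix = "xmlns:";

    for (const auto& a : attrs)
    {
        if (a.qname == "xmlns")
            decls[""] = a.value;  // default namespace declaration
        else if (a.qname.compare(0, decl_prefix.size(), decl_prefix) == 0)
            decls[a.qname.substr(decl_prefix.size())] = a.value;  // prefixed declaration
        else
            ordinary.push_back(a);  // regular attribute such as meta:host
    }
    return decls;
}
```

Fed the four attributes of <log:serverLogs>, this yields two declarations (log and meta) and two ordinary attributes, which are the ones the cell links below refer to.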
The setup is the same as in the basic example. First, load the input file:
auto inputpath = fs::path{INPUTDIR} / "server-logs.xml";
orcus::file_content input{inputpath};
then create a spreadsheet document and an import factory:
orcus::spreadsheet::range_size_t ssize{200, 10};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);
and finally construct the orcus_xml filter and pass
an xmlns_repository instance and the factory instance to it:
orcus::xmlns_repository repo;
orcus::orcus_xml filter{repo, &factory};
Here is where we need to do something different: before defining any mapping rules, register a short alias for each namespace URI used in the document by calling set_namespace_alias() once per alias:
filter.set_namespace_alias("log", "http://example.com/server-logs");
filter.set_namespace_alias("meta", "http://example.com/server-logs/meta");
Each call maps a short prefix string to its full URI. The prefixes chosen here do not need to match the ones declared in the XML document; they are local to the mapping session and are used solely to qualify element and attribute names inside the XPath expressions that follow.
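Because names are matched on their namespace URI rather than on the prefix text, an alias such as l would work just as well as log. The following standard C++ sketch illustrates that idea, assuming both the document's declarations and the mapping's aliases can be modeled as simple prefix-to-URI tables; expanded_name, prefix_map and resolve are hypothetical names, not part of the orcus API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical type: a name identified by namespace URI plus local part.
struct expanded_name
{
    std::string uri;
    std::string local;

    bool operator==(const expanded_name& other) const
    {
        return uri == other.uri && local == other.local;
    }
};

// Hypothetical table: prefix (or alias) -> namespace URI.
using prefix_map = std::map<std::string, std::string>;

// Hypothetical helper: resolve "prefix:local" against a prefix table.
// Returns std::nullopt when the prefix is not registered in that table.
std::optional<expanded_name> resolve(const std::string& qname, const prefix_map& prefixes)
{
    auto colon = qname.find(':');
    if (colon == std::string::npos)
        return expanded_name{"", qname};  // unprefixed: no namespace in this sketch

    auto it = prefixes.find(qname.substr(0, colon));
    if (it == prefixes.end())
        return std::nullopt;

    return expanded_name{it->second, qname.substr(colon + 1)};
}
```

Resolving the document's log:entry with the document's own declarations, and an XPath segment l:entry with an alias table that maps l to the same URI, produces equal expanded names, so the two would match even though the prefixes differ.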
Note
Documents that declare a default namespace (xmlns="...") require
special handling. Pass an empty string as the alias to mark that URI as
the default namespace for the mapping session:
filter.set_namespace_alias("", "http://example.com/default-ns");
Once a default namespace is set, any unprefixed name in an XPath expression is automatically resolved to that namespace, so the paths can be written without a prefix:
filter.set_cell_link("/root/child", "Sheet", 0, 0);
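The default-namespace rule described in the note above can be sketched as a lookup in which an unprefixed element name falls back to whatever URI is registered under the empty alias. This is a hypothetical standard C++ illustration (alias_map, expanded_name and resolve_element are invented names, not orcus internals), and it covers element names only; in the Namespaces in XML recommendation a default namespace does not apply to unprefixed attributes.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical table: alias -> URI, where the empty alias means "default".
using alias_map = std::map<std::string, std::string>;

// Hypothetical type: namespace URI plus local name.
struct expanded_name
{
    std::string uri;    // empty when the name is in no namespace
    std::string local;
};

// Hypothetical helper: resolve one element name from an XPath segment.
// An unprefixed name picks up the URI registered under the empty alias.
std::optional<expanded_name> resolve_element(const std::string& qname, const alias_map& aliases)
{
    std::string prefix;
    std::string local = qname;

    if (auto colon = qname.find(':'); colon != std::string::npos)
    {
        prefix = qname.substr(0, colon);
        local = qname.substr(colon + 1);
    }

    auto it = aliases.find(prefix);  // "" looks up the default namespace
    if (it != aliases.end())
        return expanded_name{it->second, local};

    if (prefix.empty())
        return expanded_name{"", local};  // no default namespace registered

    return std::nullopt;  // prefix was never registered as an alias
}
```

With http://example.com/default-ns registered under the empty alias, the segment child resolves into that namespace; without it, the same segment resolves to a name in no namespace.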
With the aliases in place, define cell links for the two metadata attributes on the root element:
filter.set_cell_link("/log:serverLogs/@meta:host", "Logs", 0, 1);
filter.set_cell_link("/log:serverLogs/@meta:date", "Logs", 1, 1);
The namespace prefix appears before the colon in each path segment, so
/log:serverLogs/@meta:host targets an attribute named host in the
meta namespace on the root element.
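How each path segment breaks down into an attribute marker, a prefix and a local name can be sketched in standard C++ as follows. This is a hypothetical illustration of the path shape used in this section (path_step and split_path are invented names), not the XPath parser orcus uses, and it deliberately ignores XPath features such as predicates, wildcards and the // axis.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: one step of a simple absolute path.
struct path_step
{
    bool is_attribute = false;  // true for segments starting with '@'
    std::string prefix;         // empty when the name is unprefixed
    std::string local;
};

// Hypothetical helper: split a path such as "/log:serverLogs/@meta:host"
// into its steps. Only plain child and attribute steps are handled.
std::vector<path_step> split_path(const std::string& path)
{
    std::vector<path_step> steps;
    std::size_t pos = 0;

    while (pos < path.size())
    {
        if (path[pos] == '/')
        {
            ++pos;  // skip separators
            continue;
        }

        std::size_t end = path.find('/', pos);
        if (end == std::string::npos)
            end = path.size();

        std::string token = path.substr(pos, end - pos);  // never empty here
        path_step step;
        if (token[0] == '@')
        {
            step.is_attribute = true;
            token.erase(0, 1);
        }

        std::size_t colon = token.find(':');
        if (colon == std::string::npos)
        {
            step.local = token;
        }
        else
        {
            step.prefix = token.substr(0, colon);   // text before the colon
            step.local = token.substr(colon + 1);   // text after the colon
        }

        steps.push_back(step);
        pos = end;
    }
    return steps;
}
```

For /log:serverLogs/@meta:host this produces two steps: an element step with prefix log and local name serverLogs, followed by an attribute step with prefix meta and local name host.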
Define a range mapping for the repeating <log:entry> elements:
filter.start_range("Logs", 3, 0);
filter.append_field_link("/log:serverLogs/log:entry/@log:id", "ID");
filter.append_field_link("/log:serverLogs/log:entry/log:level", "Level");
filter.append_field_link("/log:serverLogs/log:entry/log:service", "Service");
filter.append_field_link("/log:serverLogs/log:entry/log:message", "Message");
filter.append_field_link("/log:serverLogs/log:entry/log:timestamp", "Timestamp");
filter.set_range_row_group("/log:serverLogs/log:entry");
filter.commit_range();
Namespace prefixes are required on every qualified name in the path. The
@log:id segment selects the id attribute in the log namespace.
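The placement implied by the output at the end of this section (field labels on the origin row passed to start_range(), one row per <log:entry> beneath it, and fields left to right in the order they were appended) can be written down as a small model. range_layout is a hypothetical helper for reasoning about the layout, not part of orcus.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of where a committed range lands on the sheet.
struct range_layout
{
    std::size_t origin_row;   // row passed to start_range()
    std::size_t origin_col;   // column passed to start_range()
    std::size_t field_count;  // number of append_field_link() calls

    // Row holding the field labels (ID, Level, ...).
    std::size_t header_row() const { return origin_row; }

    // Row holding the record for the i-th repeating element (0-based).
    std::size_t record_row(std::size_t i) const { return origin_row + 1 + i; }

    // Column of the j-th field, in the order the fields were appended.
    std::size_t field_col(std::size_t j) const { return origin_col + j; }

    // Rows and columns spanned by the range when it holds n records.
    std::size_t rows_needed(std::size_t n) const { return origin_row + 1 + n; }
    std::size_t cols_needed() const { return origin_col + field_count; }
};
```

With an origin of (3, 0), five fields and six entries, the model places the last record on row 9 and predicts the 10-row, 5-column extent that dump_flat() reports.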
Finally, insert a new sheet and parse the input:
filter.append_sheet("Logs");
filter.read_stream(input.str());
and dump the sheet content:
auto* sheet = doc.get_sheet(0);
if (!sheet)
    throw std::runtime_error("failed to fetch the first sheet");

// Label the two metadata rows; the XML supplies only their values.
sheet->set_string(0, 0, "Host");
sheet->set_string(1, 0, "Date");
sheet->dump_flat(std::cout);
Note that the two label strings "Host" and "Date" are inserted into
the sheet programmatically after parsing, since the XML document does not
contain them. The values in column 1 of those rows were already populated by
the cell links during read_stream().
Running this code produces the following output, in which the [v] suffix marks cells stored as numeric values:
rows: 10 cols: 5
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| Host | web-prod-04 | | | |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| Date | 2026-03-23 | | | |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| | | | | |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| ID | Level | Service | Message | Timestamp |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 1 [v] | INFO | AuthService | User alice@example.com authenticated successfully. | 2026-03-23T08:02:11Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 2 [v] | WARN | AuthService | Failed login attempt for user bob@example.com. Attempt 3 of 5. | 2026-03-23T08:14:37Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 3 [v] | ERROR | SessionManager | Cache connection timed out after 30s. Session store unreachable. | 2026-03-23T08:31:05Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 4 [v] | INFO | SessionManager | Cache connection restored. Resuming normal operations. | 2026-03-23T08:31:09Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 5 [v] | ERROR | ApiGateway | Request to /api/orders returned 503. Upstream service unavailable. | 2026-03-23T09:45:22Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 6 [v] | INFO | Scheduler | Daily report job completed. 1,402 records processed in 4.2s. | 2026-03-23T10:00:00Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+