Mapping XML with namespaces

This section extends the previous example of mapping a basic XML document to cover documents that use XML namespaces. When element and attribute names are namespace-qualified, the XPath expressions used to identify them must include the corresponding namespace prefixes.

Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<log:serverLogs
  xmlns:log="http://example.com/server-logs"
  xmlns:meta="http://example.com/server-logs/meta"
  meta:host="web-prod-04"
  meta:date="2026-03-23">

  <log:entry log:id="1">
    <log:timestamp>2026-03-23T08:02:11Z</log:timestamp>
    <log:level>INFO</log:level>
    <log:service>AuthService</log:service>
    <log:message>User alice@example.com authenticated successfully.</log:message>
  </log:entry>

  <log:entry log:id="2">
    <log:timestamp>2026-03-23T08:14:37Z</log:timestamp>
    <log:level>WARN</log:level>
    <log:service>AuthService</log:service>
    <log:message>Failed login attempt for user bob@example.com. Attempt 3 of 5.</log:message>
  </log:entry>

  <log:entry log:id="3">
    <log:timestamp>2026-03-23T08:31:05Z</log:timestamp>
    <log:level>ERROR</log:level>
    <log:service>SessionManager</log:service>
    <log:message>Cache connection timed out after 30s. Session store unreachable.</log:message>
  </log:entry>

  <log:entry log:id="4">
    <log:timestamp>2026-03-23T08:31:09Z</log:timestamp>
    <log:level>INFO</log:level>
    <log:service>SessionManager</log:service>
    <log:message>Cache connection restored. Resuming normal operations.</log:message>
  </log:entry>

  <log:entry log:id="5">
    <log:timestamp>2026-03-23T09:45:22Z</log:timestamp>
    <log:level>ERROR</log:level>
    <log:service>ApiGateway</log:service>
    <log:message>Request to /api/orders returned 503. Upstream service unavailable.</log:message>
  </log:entry>

  <log:entry log:id="6">
    <log:timestamp>2026-03-23T10:00:00Z</log:timestamp>
    <log:level>INFO</log:level>
    <log:service>Scheduler</log:service>
    <log:message>Daily report job completed. 1,402 records processed in 4.2s.</log:message>
  </log:entry>

</log:serverLogs>

The root element <log:serverLogs> and every child element carry the log namespace prefix, while the host and date attributes on the root element carry the meta prefix. Each prefix is bound to a URI in the document’s namespace declarations.

The setup is the same as in the basic example. First, load the input file:

auto inputpath = fs::path{INPUTDIR} / "server-logs.xml";
orcus::file_content input{inputpath};

then create a spreadsheet document and an import factory:

orcus::spreadsheet::range_size_t ssize{200, 10};
orcus::spreadsheet::document doc{ssize};
orcus::spreadsheet::import_factory factory(doc);

and finally construct the orcus_xml filter and pass an xmlns_repository instance and the factory instance to it:

orcus::xmlns_repository repo;
orcus::orcus_xml filter{repo, &factory};

This is where the example departs from the basic one. Before defining any mapping rules, register short aliases for the namespace URIs used in the document by calling set_namespace_alias() once per alias:

filter.set_namespace_alias("log", "http://example.com/server-logs");
filter.set_namespace_alias("meta", "http://example.com/server-logs/meta");

Each call maps a short prefix string to its full URI. The prefixes chosen here do not need to match the ones declared in the XML document; they are local to the mapping session and are used solely to qualify element and attribute names inside the XPath expressions that follow.
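
The aliases can differ from the document's prefixes because XML namespace processing compares names by namespace URI and local name, never by prefix. The following is a minimal standalone sketch of that idea, not a description of how orcus implements it. The names alias_map, expanded_name and expand_name() are hypothetical and introduced only for illustration, and the aliases l and m are deliberately different from the log and meta prefixes used in the document.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper types for illustration; not part of the orcus API.
using alias_map = std::map<std::string, std::string>;

struct expanded_name
{
    std::string uri;   // namespace URI the alias is bound to
    std::string local; // local name with the prefix removed
};

// Expand "alias:local" into a (URI, local name) pair using the alias table.
// Returns std::nullopt when the name has no prefix or the alias is unknown.
inline std::optional<expanded_name> expand_name(
    const alias_map& aliases, std::string_view qname)
{
    assert(!qname.empty());

    auto colon = qname.find(':');
    if (colon == std::string_view::npos)
        return std::nullopt; // unprefixed names are outside this sketch

    auto it = aliases.find(std::string{qname.substr(0, colon)});
    if (it == aliases.end())
        return std::nullopt; // alias was never registered

    return expanded_name{it->second, std::string{qname.substr(colon + 1)}};
}

// Build a table whose aliases intentionally differ from the document's
// own "log" and "meta" prefixes while pointing at the same URIs.
inline alias_map make_example_aliases()
{
    return {
        {"l", "http://example.com/server-logs"},
        {"m", "http://example.com/server-logs/meta"},
    };
}
```

With this table, expand_name(make_example_aliases(), "l:entry") yields the URI http://example.com/server-logs and the local name entry, which is the same expanded name that <log:entry> has in the document. Expanding "log:entry" against the same table yields nothing, because log was never registered as an alias there.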

Note

Documents that declare a default namespace (xmlns="...") require special handling. Pass an empty string as the alias to mark that URI as the default namespace for the mapping session:

filter.set_namespace_alias("", "http://example.com/default-ns");

Once a default namespace is set, any unprefixed name in an XPath expression is automatically resolved to that namespace, so the paths can be written without a prefix:

filter.set_cell_link("/root/child", "Sheet", 0, 0);

With the aliases in place, define cell links for the two metadata attributes on the root element:

filter.set_cell_link("/log:serverLogs/@meta:host", "Logs", 0, 1);
filter.set_cell_link("/log:serverLogs/@meta:date", "Logs", 1, 1);

The namespace prefix appears before the colon in each path segment, so /log:serverLogs/@meta:host targets an attribute named host in the meta namespace on the root element.
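
To make the structure of such a path concrete, here is a small standalone sketch that splits a path into steps and records, for each step, whether it names an attribute, which alias it uses, and its local name. It only illustrates the notation described above. The path_step type and split_path() function are hypothetical and say nothing about how orcus parses paths internally.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical type for illustration; not part of the orcus API.
struct path_step
{
    bool is_attribute = false; // true when the step starts with '@'
    std::string prefix;        // namespace alias, empty when unprefixed
    std::string local;         // local name
};

// Split a path such as "/a:root/@b:attr" into its individual steps.
inline std::vector<path_step> split_path(std::string_view path)
{
    std::vector<path_step> steps;
    std::size_t pos = 0;

    while (pos < path.size())
    {
        if (path[pos] == '/')
        {
            ++pos; // skip the separator
            continue;
        }

        // The current step runs up to the next '/' or the end of the path.
        auto end = path.find('/', pos);
        if (end == std::string_view::npos)
            end = path.size();

        std::string_view seg = path.substr(pos, end - pos);
        path_step step;

        if (seg.front() == '@')
        {
            step.is_attribute = true;
            seg.remove_prefix(1);
        }

        // Everything before the colon is the alias; the rest is the local name.
        auto colon = seg.find(':');
        if (colon != std::string_view::npos)
        {
            step.prefix = std::string{seg.substr(0, colon)};
            seg.remove_prefix(colon + 1);
        }

        step.local = std::string{seg};
        steps.push_back(std::move(step));
        pos = end;
    }

    return steps;
}
```

Calling split_path("/log:serverLogs/@meta:host") produces two steps: an element step with prefix log and local name serverLogs, followed by an attribute step with prefix meta and local name host.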

Define a range mapping for the repeating <log:entry> elements:

filter.start_range("Logs", 3, 0);
filter.append_field_link("/log:serverLogs/log:entry/@log:id", "ID");
filter.append_field_link("/log:serverLogs/log:entry/log:level", "Level");
filter.append_field_link("/log:serverLogs/log:entry/log:service", "Service");
filter.append_field_link("/log:serverLogs/log:entry/log:message", "Message");
filter.append_field_link("/log:serverLogs/log:entry/log:timestamp", "Timestamp");
filter.set_range_row_group("/log:serverLogs/log:entry");
filter.commit_range();

Every element and attribute name in a path that belongs to a namespace must carry its prefix. The @log:id segment, for example, selects the id attribute in the log namespace.

Finally, insert a new sheet and parse the input:

filter.append_sheet("Logs");
filter.read_stream(input.str());

and dump the sheet content:

auto* sheet = doc.get_sheet(0);
if (!sheet)
    throw std::runtime_error("failed to fetch the first sheet");

sheet->set_string(0, 0, "Host");
sheet->set_string(1, 0, "Date");

sheet->dump_flat(std::cout);

Note that the two label strings "Host" and "Date" are inserted into the sheet programmatically after parsing, since the XML document does not contain them. The values in column 1 of those rows were already populated by the cell links during read_stream().

Running this code produces the following output:

rows: 10  cols: 5
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| Host  | web-prod-04 |                |                                                                    |                      |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| Date  | 2026-03-23  |                |                                                                    |                      |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
|       |             |                |                                                                    |                      |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| ID    | Level       | Service        | Message                                                            | Timestamp            |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 1 [v] | INFO        | AuthService    | User alice@example.com authenticated successfully.                 | 2026-03-23T08:02:11Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 2 [v] | WARN        | AuthService    | Failed login attempt for user bob@example.com. Attempt 3 of 5.     | 2026-03-23T08:14:37Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 3 [v] | ERROR       | SessionManager | Cache connection timed out after 30s. Session store unreachable.   | 2026-03-23T08:31:05Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 4 [v] | INFO        | SessionManager | Cache connection restored. Resuming normal operations.             | 2026-03-23T08:31:09Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 5 [v] | ERROR       | ApiGateway     | Request to /api/orders returned 503. Upstream service unavailable. | 2026-03-23T09:45:22Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
| 6 [v] | INFO        | Scheduler      | Daily report job completed. 1,402 records processed in 4.2s.       | 2026-03-23T10:00:00Z |
+-------+-------------+----------------+--------------------------------------------------------------------+----------------------+
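
The layout of this output follows from the arguments given to start_range() and the order of the append_field_link() calls: the labels occupy the start row, each field takes the next column, and each <log:entry> element adds one row below the labels. The sketch below reproduces that arithmetic as inferred from the output above. It is a standalone illustration; cell_pos and range_layout are hypothetical types that orcus does not provide.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not part of the orcus API.
struct cell_pos
{
    std::size_t row;
    std::size_t col;
};

struct range_layout
{
    std::size_t start_row;           // row passed to start_range()
    std::size_t start_col;           // column passed to start_range()
    std::vector<std::string> labels; // labels in append_field_link() order

    // Position of the label cell for the given field index.
    cell_pos header(std::size_t field) const
    {
        return {start_row, start_col + field};
    }

    // Position of a data cell; entry 0 is the first row-group element.
    cell_pos data(std::size_t entry, std::size_t field) const
    {
        return {start_row + 1 + entry, start_col + field};
    }

    // Number of sheet rows in use once `entries` elements are imported.
    std::size_t row_extent(std::size_t entries) const
    {
        return start_row + 1 + entries;
    }
};
```

For this example, range_layout{3, 0, {"ID", "Level", "Service", "Message", "Timestamp"}} places the Message label at row 3, column 3, and the message of the third entry at row 6, column 3. With six entries, row_extent(6) gives 10, which matches the "rows: 10  cols: 5" line in the output.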