DOM tree

document_tree is a DOM-style, in-memory representation of an XML document. It is built on top of the low-level SAX parsers: loading a document runs a parser internally and assembles the resulting events into a navigable tree of nodes.

The tree is accessed through lightweight value-type handles. const_node is a read-only handle that exposes a node’s type, name, attributes, children and parent, while node derives from it and adds methods for building and editing the tree in place. Both handles refer to storage owned by the enclosing document_tree, so they must not be used after the tree is destroyed. Names are represented by entity_name, which pairs a local name with an optional namespace identifier.
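The ownership rule above can be illustrated with a toy model. This is a minimal sketch and not the orcus implementation: toy_tree, toy_const_node and toy_node are hypothetical names invented for this example. It assumes a handle is nothing more than a pointer to its owner plus an index, which is one common way to build cheap value-type handles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical owner: holds all node storage, as a tree object would.
struct toy_tree
{
    std::vector<std::string> names;
};

// Read-only handle: a pointer to the owner plus an index, cheap to copy.
class toy_const_node
{
public:
    toy_const_node(toy_tree& owner, std::size_t index) :
        m_owner(&owner), m_index(index) {}

    // Reads through to storage owned by the tree.
    const std::string& name() const
    {
        assert(m_index < m_owner->names.size());
        return m_owner->names[m_index];
    }

protected:
    toy_tree* m_owner;
    std::size_t m_index;
};

// Mutable handle: derives from the read-only one and adds an editing method.
class toy_node : public toy_const_node
{
public:
    using toy_const_node::toy_const_node;

    void rename(std::string s)
    {
        assert(m_index < m_owner->names.size());
        m_owner->names[m_index] = std::move(s);
    }
};
```

In this model, copying a handle copies only the pointer and the index, so every copy observes edits made through any other copy. For the same reason, a handle that outlives its toy_tree holds a dangling pointer.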

Note

A document_tree is constructed with a reference to an xmlns_repository, which it uses to create namespace identifiers for the names it stores. The repository must outlive the tree.

Both examples below share the following headers and namespace declarations:

#include <orcus/dom_tree.hpp>
#include <orcus/xml_namespace.hpp>
#include <orcus/stream.hpp>

#include <iostream>
#include <filesystem>

namespace fs = std::filesystem;

// the snippets below refer to orcus types without the orcus:: prefix
using namespace orcus;

Loading and navigating

Consider the following XML document, stored in a file named library.xml:

<?xml version="1.0" encoding="UTF-8"?>
<library name="City Library">
  <book id="b1" title="The Go Programming Language"/>
  <book id="b2" title="Effective Modern C++"/>
</library>

Construct a tree from an xmlns_repository, load the file into memory with file_content, and parse it with load():

// the repository creates namespace identifiers for the stored names and must
// outlive the tree
xmlns_repository repo;
dom::document_tree tree(repo);

auto inputpath = fs::path{INPUTDIR} / "library.xml";
file_content input{inputpath};
tree.load(input.str());

INPUTDIR is a constant that stores a path to the directory where the input file is located.

Obtain the root element with root() and inspect it. Attribute values are looked up by name via attribute(), which returns an empty value when no such attribute exists:

// root() hands back a read-only handle into storage owned by the tree
dom::const_node root = tree.root();
std::cout << "root: " << root.name().name << std::endl;
std::cout << "  name attribute: " << root.attribute("name") << std::endl;
std::cout << "  child count: " << root.child_count() << std::endl;

Walk the child elements with child_count() and child(). Note that child_count() counts only child elements, so the whitespace between the elements in the source does not contribute to the count:

// child_count() counts only child elements, so text and whitespace between
// the elements are not included
for (std::size_t i = 0; i < root.child_count(); ++i)
{
    dom::const_node child = root.child(i);
    std::cout << "  child " << i << ": " << child.name().name
        << " id=" << child.attribute("id")
        << " title='" << child.attribute("title") << "'" << std::endl;
}

This produces the following output. The first line is a section banner that is not printed by the snippets shown above:

--- load and navigate ---
root: library
  name attribute: City Library
  child count: 2
  child 0: book id=b1 title='The Go Programming Language'
  child 1: book id=b2 title='Effective Modern C++'
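The index-based loop above visits only the direct children of the root. To descend through every level, the same three calls can be applied recursively. The following is a minimal sketch and not part of orcus: print_element_tree is a hypothetical helper, written as a template so that it compiles without the library. It assumes only the members used earlier on this page (name().name, child_count() and child()), and it is expected, though not verified here, to accept a dom::const_node. The fake_node type is a stand-in used purely to exercise the function.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Print the element name of n and of every descendant element, one per line,
// indented by two spaces per nesting level.
// Node is any type that offers name().name, child_count() and child(i).
template<typename Node>
void print_element_tree(std::ostream& os, const Node& n, std::size_t depth = 0)
{
    os << std::string(depth * 2, ' ') << n.name().name << '\n';

    for (std::size_t i = 0; i < n.child_count(); ++i)
        print_element_tree(os, n.child(i), depth + 1);
}

// Hypothetical stand-ins that mimic the shape of the handle API.
struct fake_name
{
    std::string_view name;
};

struct fake_node
{
    std::string_view label;
    std::vector<fake_node> children;

    fake_name name() const { return fake_name{label}; }
    std::size_t child_count() const { return children.size(); }

    const fake_node& child(std::size_t i) const
    {
        assert(i < children.size());
        return children[i];
    }
};
```

With a real tree the call would presumably look like print_element_tree(std::cout, tree.root()). Passing the stream as a parameter keeps the helper usable with std::ostringstream as well.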

Building a tree

The same API can build a document from scratch. set_root() installs the root element and returns a mutable node, which is then populated with append_element(), set_attribute() and append_content():

xmlns_repository repo;
dom::document_tree tree(repo);

// set_root() installs a fresh root element and returns a mutable handle
dom::node root = tree.set_root({"message"});
root.set_attribute("lang", "en");

dom::node greeting = root.append_element({"greeting"});
greeting.append_content("Hello, world!");

Note

Each name is passed as an entity_name. The braces in {"message"} construct that entity_name from the string. Passing a bare string literal would not compile: converting it to an entity_name takes two user-defined conversions (first to std::string_view, then to entity_name), and an implicit conversion sequence may contain at most one. The braced form sidesteps this by list-initializing the entity_name argument directly, so only the conversion to std::string_view remains implicit. To give a name a namespace, pass both an xmlns_id_t and the local name, as in {ns, "message"}.
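The conversion rule described in this note can be reproduced without orcus. The sketch below uses a hypothetical name_like type as a stand-in for entity_name. It assumes only what the note states, namely a non-explicit constructor taking std::string_view. The static_assert lines let the compiler confirm each claim.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for entity_name: implicitly constructible from
// std::string_view.
struct name_like
{
    std::string_view name;

    name_like(std::string_view n) : name(n) {}
};

// A function that takes the name by const reference, as the tree methods do.
inline std::string describe(const name_like& n)
{
    return "element:" + std::string(n.name);
}

// const char* -> std::string_view -> name_like needs two user-defined
// conversions, so no implicit conversion exists.
static_assert(!std::is_convertible_v<const char*, name_like>);

// std::string_view -> name_like needs only one, so it is allowed.
static_assert(std::is_convertible_v<std::string_view, name_like>);

// describe("message");  // would not compile: two user-defined conversions

// The braces list-initialize the name_like parameter; only the conversion
// from the literal to std::string_view is left to happen implicitly.
inline std::string braced_call()
{
    return describe({"message"});
}
```

Writing the type out, as in describe(name_like{"message"}), is equivalent and can be clearer when the parameter type is not obvious at the call site.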

Finally, serialize the tree to XML with dump(). The indent argument gives the number of spaces per nesting level:

// dump() serializes the tree; the indent is the number of spaces per level
std::cout << tree.dump(2) << std::endl;

This produces the following output. As before, the first line is a section banner that is not printed by the snippets shown above:

--- build and serialize ---
<message lang="en">
  <greeting>Hello, world!</greeting>
</message>