XML

Orcus provides a set of low-level, high-performance XML parsing primitives that can be used independently of its spreadsheet import features. At the foundation is sax_parser, a template-based SAX parser that fires event callbacks into a user-supplied handler as it encounters elements, attributes, text content, declarations, CDATA sections, and DOCTYPE declarations. On top of it sit two higher-level variants: sax_ns_parser, which adds namespace resolution and element-scope tracking via an xmlns_context object, and sax_token_parser, which further tokenizes element and attribute names into integer tokens against a predefined vocabulary for faster downstream dispatch.

Namespace management is handled by two cooperating types. xmlns_repository is a session-level intern table that maps each unique namespace URI to a stable xmlns_id_t identifier. xmlns_context is a per-document companion that manages the stack of active prefix-to-URI bindings as parsing progresses through element scopes; a fresh context should be created from the repository for each XML stream.

Finally, orcus provides a higher-level mapping feature through orcus_xml, which allows you to project the contents of an XML document onto a spreadsheet by defining how repeating structures in the XML tree correspond to rows and columns in a sheet. This mapping can be defined either via a map file or programmatically through the C++ API.

Parsing

XML mapping