orcus-xml

Help output

Usage: orcus-xml [OPTIONS] FILE

Options:
  -h [ --help ]              Print this help.
  --mode arg                 Mode of operation. Select one of the following
                             options: dump, lint, map, map-gen, structure, or
                             transform.
  -m [ --map ] arg           Path to the map file. A map file is required for
                             all modes except for the structure mode.
  -o [ --output ] arg        Path to either an output directory, or an output
                             file.
  -f [ --output-format ] arg Specify the output format.  Supported format types
                             are:

                             * check - Flat format that fully encodes document
                             content. Suitable for automated testing.
                             * csv - CSV format.
                             * debug-state - This format dumps the internal
                             state of the document in detail, useful for
                             debugging.
                             * flat - Flat text format that displays document
                             content in grid.
                             * html - HTML format.
                             * json - JSON format.
                             * none - No output to be generated. Maybe useful
                             during development.
                             * xml - This format is currently unsupported.
                             * yaml - This format is currently unsupported.
  --indent arg               Number of spaces per indent level for XML output
                             when lint mode is specified. 0 produces compact
                             single-line output.

Supported modes

This command supports the following modes:

  • dump

  • lint

  • map

  • map-gen

  • structure

  • transform

dump

The dump mode parses the XML document into a full DOM tree and prints its content in a compact, human-readable format to standard output. This is useful for quickly inspecting the full content of a document.
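When the dump mode is run repeatedly from a script, a small wrapper can check that the input exists before calling the tool. The sketch below assumes orcus-xml is on PATH. The ORCUS_XML variable and the dump_xml function are hypothetical conveniences, not part of orcus. Only the `--mode dump` invocation comes from this page, and no map file is passed.

```shell
#!/bin/sh
# Hypothetical override: point ORCUS_XML at a specific orcus-xml binary if needed.
ORCUS_XML="${ORCUS_XML:-orcus-xml}"

# Hypothetical wrapper around the dump mode: dump_xml FILE
# Checks that FILE exists, then prints its full content to standard output.
dump_xml() {
    if [ ! -f "$1" ]; then
        echo "dump_xml: no such file: $1" >&2
        return 1
    fi
    "$ORCUS_XML" --mode dump "$1"
}
```

For example, `dump_xml path/to/input.xml | less` pages through the dump of a large document.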

lint

The lint mode reformats an XML document, optionally with a different indent level set via the --indent option.

map and map-gen

The map and map-gen modes are related and are typically used together. The map mode maps an XML document to a spreadsheet document model using a user-defined mapping rule, while the map-gen mode auto-generates a mapping rule from the structure of the source XML document.

Refer to the Mapping XML to spreadsheet section for a detailed example of how to use these modes to map an XML document to a spreadsheet document model.

structure

The structure mode analyses the overall structure of the source XML document and prints a compact representation of all unique element paths to standard output. Unlike the dump mode, which prints the full document content, this mode focuses on the schema-like element hierarchy, making it useful for understanding the shape of an unfamiliar document.

transform

The transform mode loads the XML document into a spreadsheet document model via a mapping rule, then writes the mapped content back out as an XML file. It requires both a map file (--map) and an output file path (--output).
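The transform mode requires both --map and --output, so a scripted call can reject a missing argument before invoking the tool. The sketch below assumes orcus-xml is on PATH. The transform_xml function and its INPUT MAP OUTPUT argument order are hypothetical and chosen for illustration. The flags are the ones described above.

```shell
#!/bin/sh
# Hypothetical override for the orcus-xml binary location.
ORCUS_XML="${ORCUS_XML:-orcus-xml}"

# Hypothetical wrapper: transform_xml INPUT MAP OUTPUT
# The transform mode needs both a map file (--map) and an output path (--output),
# so refuse to run unless exactly three arguments are given.
transform_xml() {
    if [ "$#" -ne 3 ]; then
        echo "usage: transform_xml INPUT MAP OUTPUT" >&2
        return 2
    fi
    "$ORCUS_XML" --mode transform --map "$2" --output "$3" "$1"
}
```

Calling `transform_xml path/to/input.xml map.xml path/to/output.xml` runs `orcus-xml --mode transform --map map.xml --output path/to/output.xml path/to/input.xml`.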

Example usage

Reformat XML document

You can use this command to reformat an XML document by setting --mode to lint, optionally specifying an indent level via the --indent option. The following command reformats the input XML file with an indent level of 2:

orcus-xml --mode lint --indent 2 path/to/input.xml

By default, the command writes its output to standard output. Specify the --output option to write it to a file instead. Specifying --indent 0 produces compact single-line output.
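To reformat several files at once, you can loop over them and give each one its own --output path. The sketch below assumes orcus-xml is on PATH and that each input's basename is unique. The lint_all function and its INDENT OUTDIR FILE... arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical override for the orcus-xml binary location.
ORCUS_XML="${ORCUS_XML:-orcus-xml}"

# Hypothetical helper: lint_all INDENT OUTDIR FILE...
# Reformats each FILE with the given indent level and writes the result to
# OUTDIR/<basename of FILE>. Stops at the first failure.
# (POSIX sh has no 'local', so indent and outdir are plain globals.)
lint_all() {
    indent=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        "$ORCUS_XML" --mode lint --indent "$indent" \
            --output "$outdir/$(basename "$f")" "$f" || return 1
    done
}
```

For example, `lint_all 2 formatted path/to/*.xml` writes indented copies into formatted/, and `lint_all 0 compact path/to/*.xml` writes single-line copies into compact/.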

Inspect document structure

To get a quick overview of the element hierarchy of an XML document without looking at its full content, use the structure mode:

orcus-xml --mode structure path/to/input.xml

Generate a map file

To auto-generate a map file from an XML document, use the map-gen mode:

orcus-xml --mode map-gen path/to/input.xml -o map.xml

The generated map.xml can then be edited and used with the map or transform modes via the --map option.
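The generate, edit, and map workflow can be sketched as two helper functions. This assumes orcus-xml is on PATH. It also assumes that the map mode, given --output-format flat and no --output, prints the mapped content to standard output. The gen_map and map_to_flat names are hypothetical, and the flat format is just one of the formats listed in the help output.

```shell
#!/bin/sh
# Hypothetical override for the orcus-xml binary location.
ORCUS_XML="${ORCUS_XML:-orcus-xml}"

# Hypothetical step 1: gen_map INPUT MAP
# Auto-generates a map file from INPUT using the map-gen mode.
gen_map() {
    "$ORCUS_XML" --mode map-gen --output "$2" "$1"
}

# Hypothetical step 2 (after editing MAP by hand): map_to_flat INPUT MAP
# Maps INPUT with MAP and requests the flat output format.
map_to_flat() {
    if [ ! -f "$2" ]; then
        echo "map_to_flat: map file not found: $2" >&2
        return 1
    fi
    "$ORCUS_XML" --mode map --map "$2" --output-format flat "$1"
}
```

A typical session runs `gen_map path/to/input.xml map.xml`, edits map.xml, then runs `map_to_flat path/to/input.xml map.xml` to check the result before moving on to the transform mode.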