Docpaths¶
Docpaths are xpath 1.0 style paths that can be used with docutils doctrees. They should feel familiar to you if you have used xpaths before, but even though they are similar there are a few differences to the syntax and available functions.
Examples¶
To quickly get an idea of the kind of docpaths that can be created, and to get you started here are some example docpaths, along with a description of the nodes that they select. See the Syntax section below for a in-depth reference.
title- This selects all the title nodes that are children of the context node.
paragraph[1]- This selects the first paragraph that is a child of the context node.
paragraph[last()]- This selects the last paragraph that is a child of the context node.
paragraph[position() <= 3]- This selects the first three paragraphs that are children of the context node.
node[name() != "section"]- This selects the children of the context node that are not sections.
preceding_sibling::paragraph[1]- This selects the previous paragraph sibling of the context node.
following_sibling::paragraph[1]- This selects the next paragraph sibling of the context node.
/section- This selects the top level sections in the document.
/section[2]/paragraph[3]/bullet_list- This selects the bullet_list that appears in the third paragraph of the second top level section.
/descendant_or_self::node/child::section- This selects all the sections in the document, regardless of level.
//section- This selects all the sections in the document, regardless of level, and is abbreviated syntax for the previous example.
//section[title=="Examples"]- This selects the section that has a child title node whose string value is
equal to
Examples, or put more simply this selects the section calledExamples, regardless of what level it is at. //bullet_list/list_item- This selects the all the list_item nodes that are in bullet_lists.
Syntax¶
In general docpaths are comprised of one or more steps, separated by a forward
slash (/). Each step can be made up from an axis, a node test and some
optional predicates. If an axis is specified, then it must be immediately
followed by a double colon (::). If no axis is specified then it defaults
to the child axis, and the double colon should also be omitted. Any
predicates must appear inside square brackets ([, and ]).
E.g. ancestor::section/child::title[1]
Docpaths can be combined together using an | operator, this merges the
result of the two docpaths together.
E.g. (child::section|child::paragraph)[1] would find the first child node
that is either section or paragraph.
Docpaths steps are evaluated with respect to a context. The context contains
a context node, a context position and context size. If a docpaths starts with
a / then it is said to be an absolute path, and selects nodes that are
with respect to the root node of the document. Other docpaths are relative
paths and select nodes with respect to the context node.
Axes¶
An axis specifies the tree relationship between the context node and the nodes
that are selected by the axis. For example, the child axis will select all
the children nodes of the context node.
The available axes are:
ancestor- All direct ancestors of the node up to and including the root node. In other words, the context node’s parent, the parent’s parent and so on all the way up to the root node.
ancestor_or_self- All the nodes selected by the
ancestoraxis and also the context node. attribute- A special axis that selects the attributes of the context node.
child- All the children nodes of the context node.
descendant- All the children nodes of the context node, and their children nodes, and so on down to the bottom of the tree.
descendant_or_self- All the nodes selected by the
descendantaxis and also the context node. following- All nodes that occur in the document after the context node.
following_sibling- All siblings nodes of the context node that occur after it.
parent- The parent node of the context node.
preceding- All nodes that occur in the document before the context node.
preceding_sibling- All siblings nodes of the context node that occur before it.
self- The context node.
As you can probably see, the names of the docpath axes are the same as xpath
axes, but with - characters converted to _ characters instead.
The ancestor, descendant, following, preceding and self
axes partition a document. They do not overlap, and together contain all of
the nodes from the document.
The ancestor, ancestor_or_self, parent, preceding, and
preceding_sibling axes are reverse axes. This means that their nodes are
traversed in reverse order. This is important to know when using predicate
expressions that depend on the position of the node.
Node Test¶
A node test is either the name of a type of node, or one of the special node
types listed below. The * character can also be used and is short for
element. The node test is used to filter the nodes that are selected by
an axis. If a node test is the name of a type of node then only nodes of that
type will be selected.
The special node types select the following types of node:
node- All nodes match this node type, use this if you don’t want to filter out any nodes.
element- Selects all nodes that are not
text, orcommentnodes. text- Only selects nodes that are docutils Text nodes.
comment- Only selects nodes that are docutils comment nodes.
These special node types behave similarly to the NodeType functions in
xpath, except that they should not be followed by ().
For example a node test of section will mean that only section nodes will
be selected by the associated docpath step.
Predicates¶
Predicates are defined inside square brackets ([, and ]). They filter
nodes-sets that are generated in docpath steps to produce new node-sets. For
each node in the node-set the predicate expression is evaluated. If the
predicate evaluates to true for the node then it is included in the new
node-set, otherwise it is not included.
A predicate is evaluated with respect to a context. The context node is the node from the preceding docpath step. The context size is the number of nodes in the node-set, and the context position is the position of the node in the node-set.
Predicates can be chained together, the resulting node-set from one predicate is passed through to the next predicate. The context size and position are recalculated for each predicate.
When a predicate is evaluated its result is converted to a boolean in different ways depending on the type of the result:
- integer: the result is compared to the value of the
position()function, if it matches then the result is converted toTrue, otherwise it is converted toFalse. - Docpath: the docpath is evaluated to get a node-set. If the node-set
contains any nodes then the result is converted to
True, if the node-set is empty then it is converted toFalse. - everything else is converted to a boolean using the
bool()function.
Docpaths can be included in predicates, and produce node-sets that can be passed to functions or compared to other values or docpaths.
Note
One minor difference to normal docpaths is that absolute paths must
be preceded with the ^ character when they are inside a predicate
(e.g. ^//section).
Predicates can contain the following operators:
- comparison:
==,!=,<, <=``,>,>= - arithmetic:
+, -``,*,/,//,mod - logical:
and,or,not - unary:
+,-
The comparison operators work as normal except when one, or both, of its
operands is a node-set. In this case the nodes in the node-sets are first
converted to values using the nodes astext() function. If both
operands are node-sets then the comparison is true if there exists any node in
the first node-set and any node in the second node-set such that when their
values are compared would make the result true. If one of the operands is a
node-set then the result is true if the value of any node from the node-set
when compared to the other value would make the result true.
Predicates can contain the following functions:
-
count(value)¶ Parameters: value – The value to be counted. Returns: the length of the value. If the value is a docpath then the number of nodes in it’s node-set is returned, otherwise the result is the
len()of the value.
-
last()¶ Returns: the index of the last item in the context node-set. The return value is the context size from the evaluation context.
-
name([value])¶ Parameters: value (node-set) – An optional node-set. Returns: the type of first node in document order from the node-set. The name of the type of the first node is calculated and returned. The nodes in the node-set are ordered in document order. If no node-set is provided then the name of the type of the context node is returned instead.
-
position()¶ Returns: the position of the node in the node-set. The return value is the context position from the evaluation context.
Abbreviations¶
In order to make the docpaths shorter and more convenient there are some abbreviations that can be used which mirror similar abbreviations from xpaths:
child::can be omitted as it is the default axis.//is short for/descendant_or_self::node/..is short forparent::node.is short forself::node@is short forattribute::*is short forelement