
xpath.pl -- Select nodes in an XML DOMThe library xpath.pl provides predicates to select nodes from an XML DOM tree as produced by library(sgml) based on descriptions inspired by the XPath language.
The predicate xpath/3 selects a sub-structure of the DOM non-deterministically based on an XPath-like specification. Not all selectors of XPath are implemented, but the ability to mix xpath/3 calls with arbitrary Prolog code provides a powerful tool for extracting information from XML parse-trees.
xpath_chk(+DOM, +Spec, ?Content) is semidet
xpath(+DOM, +Spec, ?Content) is nondet//Term/Term
The Terms above are of type callable. The functor specifies
the element name. The element name '*' refers to any element.
The name self refers to the top-element itself and is often
used for processing matches of an earlier xpath/3 query. A term
NS:Term refers to an XML name in the namespace NS. Optional
arguments specify additional constraints and functions. The
arguments are processed from left to right. Defined conditional
argument values are:
lastlast - IntExprlast-1 is the element directly preceding the last one.index(Integer).lastindex(last).last - IntExprindex(last-IntExpr).Defined function argument values are:
selfcontenttexttext(As)atom or string.normalize_spacetext, but uses normalize_space/2 to normalise
white-space in the outputnumber@Attributenumber, but subsequently transform the value
into an integer using the round/1 function.number, but subsequently transform the value
into a float using the float/1 function.@href and
@href(atom) are equivalent. The SGML parser
can return attributes as strings using the
attribute_value(string) option.In addition, the argument-list can be conditions:
content = content defines that the content
of the element is the atom content.
The functions lower_case and upper_case can be applied
to Right (see example below).contains(Haystack, Needle)h3 element inside a div element, where the div
element itself contains an h2 child with a strong
child.
//div(h2/strong)/h3
This is equivalent to the conjunction of XPath goals below.
..., xpath(DOM, //(div), Div), xpath(Div, h2/strong, _), xpath(Div, h3, Result)
Examples:
Match each table-row in DOM:
xpath(DOM, //tr, TR)
Match the last cell of each tablerow in DOM. This example illustrates that a result can be the input of subsequent xpath/3 queries. Using multiple queries on the intermediate TR term guarantee that all results come from the same table-row:
xpath(DOM, //tr, TR), xpath(TR, /td(last), TD)
Match each href attribute in an <a> element
xpath(DOM, //a(@href), HREF)
Suppose we have a table containing rows where each first column is the name of a product with a link to details and the second is the price (a number). The following predicate matches the name, URL and price:
product(DOM, Name, URL, Price) :-
xpath(DOM, //tr, TR),
xpath(TR, td(1), C1),
xpath(C1, /self(normalize_space), Name),
xpath(C1, a(@href), URL),
xpath(TR, td(2, number), Price).
Suppose we want to select books with genre="thriller" from a
tree containing elements <book genre=...>
thriller(DOM, Book) :-
xpath(DOM, //book(@genre=thiller), Book).
Match the elements <table align="center"> and <table
align="CENTER">:
//table(@align(lower) = center)
Get the width and height of a div element as a number,
and the div node itself:
xpath(DOM, //div(@width(number)=W, @height(number)=H), Div)
Note that div is an infix operator, so parentheses must be
used in cases like the following:
xpath(DOM, //(div), Div)
sub_dom(-Index, -Count, +Name, -Sub, +DOM) is nondet[private]
count_named_elements(+Content, +Name, -Count) is det[private]
nth_element(?N, +Name, -Element, +Content:list) is nondet[private]
modifiers(+Modifiers, +I, +Clen, +DOM, -Value)[private]
process_equality(+Left, +Right) is semidet[private]For example the XPath expression in [1], and the equivalent Prolog expression in [2], would both match the HTML element in [3].
[1] //table[align=lower-case(center)] [2] //table(@align=lower_case(center)) [3] <table align="CENTER">
text_of_dom(+DOM, +As, -Text:atom) is det[private]