LaTeXML::Core::Document(3) User Contributed Perl Documentation LaTeXML::Core::Document(3)

NAME

"LaTeXML::Core::Document" - represents an XML document under construction.

DESCRIPTION

A "LaTeXML::Core::Document" represents an XML document being constructed by LaTeXML, and also provides the methods for constructing it. It extends LaTeXML::Common::Object.

LaTeXML will have digested the source material, resulting in a LaTeXML::Core::List (from a LaTeXML::Core::Stomach) of LaTeXML::Core::Box objects, LaTeXML::Core::Whatsit objects and sublists. At this stage, a document is created and it is responsible for `absorbing' the digested material. Generally, the LaTeXML::Core::Box and LaTeXML::Core::List objects create text nodes, whereas the LaTeXML::Core::Whatsit objects create "XML" document fragments, elements and attributes according to the defining LaTeXML::Core::Definition::Constructor.

Most document construction occurs at a current insertion point where material will be added, and which moves along with the inserted material. The LaTeXML::Common::Model, derived from various declarations and document type, is consulted to determine whether an insertion is allowed and when elements may need to be automatically opened or closed in order to carry out a given insertion. For example, a "subsection" element will typically be closed automatically when an attempt is made to open a "section" element.
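The automatic closing described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the %can_contain and %auto_close tables and the open_element function are hypothetical stand-ins, not the LaTeXML::Common::Model API, and the real search for an insertion point is more elaborate.

```perl
use strict;
use warnings;

# Toy document model (hypothetical): which children each element may contain,
# and which elements may be closed automatically.
my %can_contain = (
    document   => { section    => 1 },
    section    => { subsection => 1, para => 1 },
    subsection => { para       => 1 },
    para       => {},
);
my %auto_close = (section => 1, subsection => 1, para => 1);

# Open $tag given the stack of currently open elements (outermost first).
# Pops auto-closeable elements until the innermost one can contain $tag.
# Returns the list of elements that were closed along the way.
sub open_element {
    my ($stack, $tag) = @_;
    my @closed;
    my @trial = @$stack;
    while (@trial && !$can_contain{ $trial[-1] }{$tag}) {
        die "Cannot insert <$tag> here\n" unless $auto_close{ $trial[-1] };
        push @closed, pop @trial;
    }
    die "Cannot insert <$tag> here\n" unless @trial;
    @$stack = (@trial, $tag);
    return @closed;
}

my @open = ('document');
open_element(\@open, 'section');
open_element(\@open, 'subsection');
open_element(\@open, 'para');

# Opening a new section forces the para, subsection and old section to close.
my @closed = open_element(\@open, 'section');
print "closed: @closed\n";
print "open:   @open\n";
```

In this toy, the stack plays the role of the chain of open nodes above the current insertion point.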

In the methods described here, the term $qname is used for XML qualified names. These are tag names with a namespace prefix. The prefix should be one registered with the current Model, for use within the code. This prefix is not necessarily the same as the one used in any DTD, but should be mapped to a Namespace URI that was registered for the DTD.

The arguments named $node are an XML::LibXML node.

The methods here are grouped into three sections covering basic access to the document, insertion methods at the current insertion point, and less commonly used, lower-level, document manipulation methods.

Accessors

"$doc = $document->getDocument;"
Returns the "XML::LibXML::Document" currently being constructed.
"$model = $document->getModel;"
Returns the "LaTeXML::Common::Model" that represents the document model used for this document.
"$node = $document->getNode;"
Returns the node at the current insertion point during construction. This node is considered still to be `open'; any insertions will go into it (if possible). The node will be an "XML::LibXML::Element", "XML::LibXML::Text" or, initially, "XML::LibXML::Document".
"$node = $document->getElement;"
Returns the closest ancestor to the current insertion point that is an Element.
"@nodes = $document->getChildElement($node);"
Returns a list of the child elements, if any, of the $node.
"$node = $document->getLastChildElement($node);"
Returns the last child element of the $node, if it has one, else undef.
"$node = $document->getFirstChildElement($node);"
Returns the first child element of the $node, if it has one, else undef.
"@nodes = $document->findnodes($xpath,$node);"
Returns a list of nodes matching the given $xpath expression. The context node for $xpath is $node, if given, otherwise it is the document element.
"$node = $document->findnode($xpath,$node);"
Returns the first node matching the given $xpath expression. The context node for $xpath is $node, if given, otherwise it is the document element.
"$qname = $document->getNodeQName($node);"
Returns the qualified name (localname with namespace prefix) of the given $node. The namespace prefix mapping is the code mapping of the current document model.
"$boolean = $document->canContain($tag,$child);"
Returns whether an element $tag can contain a child $child. $tag and $child can be nodes, qualified names of nodes (prefix:localname), or one of a set of special symbols "#PCDATA", "#Comment", "#Document" or "#ProcessingInstruction".
"$boolean = $document->canContainIndirect($tag,$child);"
Returns whether an element $tag can contain a child $child either directly, or after automatically opening one or more autoOpen-able elements.
"$boolean = $document->canContainSomehow($tag,$child);"
Returns whether an element $tag can contain a child $child either directly, or after automatically opening one or more autoOpen-able elements.
"$boolean = $document->canHaveAttribute($tag,$attrib);"
Returns whether an element $tag can have an attribute named $attrib.
"$boolean = $document->canAutoOpen($tag);"
Returns whether an element $tag is able to be automatically opened.
"$boolean = $document->canAutoClose($node);"
Returns whether the node $node can be automatically closed.
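As an illustration of how the containment predicates above might be combined, the following sketch classifies a prospective insertion. It calls only canContain and canContainIndirect as documented here; the StubDocument class and its tables are hypothetical stand-ins so that the example runs without LaTeXML, and the containment helper is not part of the module.

```perl
use strict;
use warnings;

# Stand-in for a LaTeXML::Core::Document, for demonstration only: it answers
# canContain / canContainIndirect from a small hard-coded table.
{
    package StubDocument;
    my %direct = (
        'ltx:p'    => { '#PCDATA' => 1 },
        'ltx:para' => { 'ltx:p'   => 1 },
    );
    my %indirect = ('ltx:para' => { '#PCDATA' => 1 });
    sub new { return bless {}, shift }

    sub canContain {
        my ($self, $tag, $child) = @_;
        return $direct{$tag}{$child} ? 1 : 0;
    }

    sub canContainIndirect {
        my ($self, $tag, $child) = @_;
        return ($direct{$tag}{$child} || $indirect{$tag}{$child}) ? 1 : 0;
    }
}

# Hypothetical helper: classify how $child could be placed inside $tag,
# using only the documented predicates.
sub containment {
    my ($document, $tag, $child) = @_;
    return 'direct'   if $document->canContain($tag, $child);
    return 'indirect' if $document->canContainIndirect($tag, $child);
    return 'none';
}

my $document = StubDocument->new;
my @answers  = (
    containment($document, 'ltx:p',    '#PCDATA'),
    containment($document, 'ltx:para', '#PCDATA'),
    containment($document, 'ltx:p',    'ltx:para'),
);
print "$_\n" for @answers;
```

With a real document, the answers depend entirely on the document model in use.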

Construction Methods

These methods are the most common ones used for construction of documents. They generally operate by creating new material at the current insertion point. That point initially is just the document itself, but it moves along to follow any new insertions. These methods also adapt to the document model so as to automatically open or close elements, when it is required for the pending insertion and allowed by the document model (See Tag).

"$xmldoc = $document->finalize;"
This method finalizes the document by cleaning up various temporary attributes, and returns the XML::LibXML::Document that was constructed.
"@nodes = $document->absorb($digested);"
Absorb the $digested object into the document at the current insertion point according to its type. Various other methods are invoked as needed, and document nodes may be automatically opened or closed according to the document model.

This method returns the nodes that were constructed. Note that the nodes may include children of other nodes, and nodes that may already have been removed from the document (See filterChildren and filterDeletions). Also, text insertions are often merged with existing text nodes; in such cases, the whole text node is included in the result.

"$document->insertElement($qname,$content,%attributes);"
This is a shorthand for creating an element $qname (with given attributes), absorbing $content from within that new node, and then closing it. The $content must be digested material, either a single box, or an array of boxes, which will be absorbed into the element. This method returns the newly created node, although it will no longer be the current insertion point.
"$document->insertMathToken($string,%attributes);"
Insert a math token (XMTok) containing the string $string with the given attributes. Useful attributes would be name, role, font. Returns the newly inserted node.
"$document->insertComment($text);"
Insert, and return, a comment with the given $text into the current node.
"$document->insertPI($op,%attributes);"
Insert, and return, a ProcessingInstruction into the current node.
"$document->openText($text,$font);"
Open a text node in font $font, performing any required automatic opening and closing of intermediate nodes (including those needed for font changes) and inserting the string $text into it.
"$document->openElement($qname,%attributes);"
Open an element, named $qname and with the given attributes. This will be inserted into the current node while performing any required automatic opening and closing of intermediate nodes. The new element is returned, and also becomes the current insertion point. An error (fatal if in "Strict" mode) is signalled if there is no allowed way to insert such an element into the current node.
"$document->closeElement($qname);"
Close the closest open element named $qname including any intermediate nodes that may be automatically closed. If that is not possible, signal an error. The closed node's parent becomes the current node. This method returns the closed node.
"$node = $document->isOpenable($qname);"
Check whether it is possible to open a $qname element at the current insertion point.
"$node = $document->isCloseable($qname);"
Check whether it is possible to close a $qname element, returning the node that would be closed if possible, otherwise undef.
"$document->maybeCloseElement($qname);"
Close a $qname element, if it is possible to do so. Returns the closed node if it was found, else undef.
"$document->addAttribute($key=>$value);"
Add the given attribute to the node nearest to the current insertion point that is allowed to have it. This does not change the current insertion point.
"$document->closeToNode($node);"
This method closes all children of $node until $node becomes the insertion point. Note that it closes any open nodes, not only autoCloseable ones.
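The open/insert/close pattern that these construction methods support can be sketched as a small wrapper. The with_element helper is hypothetical; it relies only on openElement, insertComment and closeElement behaving as described above. TraceDocument is a stand-in that merely logs the calls (and returns strings where a real document returns nodes), so that the example runs without LaTeXML.

```perl
use strict;
use warnings;

# Stand-in document that records calls instead of building XML.
{
    package TraceDocument;
    sub new { return bless { open => [], log => [] }, shift }

    sub openElement {
        my ($self, $qname, %attributes) = @_;
        push @{ $self->{open} }, $qname;
        push @{ $self->{log} },  "open $qname";
        return $qname;    # a real document returns the new element node
    }

    sub insertComment {
        my ($self, $text) = @_;
        push @{ $self->{log} }, "comment $text";
        return $text;
    }

    sub closeElement {
        my ($self, $qname) = @_;
        # Close intermediate elements until the named one has been closed.
        while (defined(my $top = pop @{ $self->{open} })) {
            push @{ $self->{log} }, "close $top";
            return $top if $top eq $qname;
        }
        die "No open $qname element\n";
    }
}

# Hypothetical helper: open $qname, run $body with the insertion point
# inside the new element, then close it again.
sub with_element {
    my ($document, $qname, $attributes, $body) = @_;
    my $node = $document->openElement($qname, %$attributes);
    $body->($document, $node);
    $document->closeElement($qname);
    return $node;
}

my $document = TraceDocument->new;
with_element($document, 'ltx:note', { role => 'remark' },
    sub { my ($doc) = @_; $doc->insertComment('generated') });
print "$_\n" for @{ $document->{log} };
```

Closing by name rather than by node lets closeElement also close any intermediate elements that were opened automatically inside the body.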

Internal Insertion Methods

These are described as an aid to understanding the code; they rarely, if ever, should be used outside this module.

"$document->setNode($node);"
Sets the current insertion point to be $node. This should be rarely used, if at all; the construction methods of document generally maintain the notion of insertion point automatically. This may be useful to allow insertion into a different part of the document, but you probably want to set the insertion point back to the previous node afterwards.
"$string = $document->getInsertionContext($levels);"
For debugging, return a string showing the context of the current insertion point; that is, the string of the nodes leading up to it. If $levels is defined, show only that many nodes.
"$node = $document->find_insertion_point($qname);"
This internal method is used to find the appropriate point, relative to the current insertion point, that an element with the specified $qname can be inserted. That position may require automatic opening or closing of elements, according to what is allowed by the document model.
"@nodes = getInsertionCandidates($node);"
Returns a list of elements where an arbitrary insertion might take place. Roughly this is a list starting with $node, followed by its parent and the parent's siblings (in reverse order), followed by the grandparent and its siblings (in reverse order).
"$node = $document->floatToElement($qname);"
Finds the nearest element at or preceding the current insertion point (see "getInsertionCandidates"), that can accept an element $qname; it moves the insertion point to that point, and returns the previous insertion point. Generally, after doing whatever you need at the new insertion point, you should call "$document->setNode($node);" to restore the insertion point. If no such point is found, the insertion point is left unchanged, and undef is returned.
"$node = $document->floatToAttribute($key);"
This method works the same as "floatToElement", but finds the nearest element that can accept the attribute $key.
"$node = $document->openText_internal($text);"
This is an internal method, used by "openText", that assumes the insertion point has been appropriately adjusted.
"$node = $document->openMathText_internal($text);"
This internal method appends $text to the current insertion point, which is assumed to be a math node. It checks for math ligatures and carries out any combinations called for.
"$node = $document->closeText_internal();"
This internal method closes the current node, which should be a text node. It carries out any text ligatures on the content.
"$node = $document->closeNode_internal($node);"
This internal method closes any open text or element nodes starting at the current insertion point, up to and including $node. Afterwards, the parent of $node will be the current insertion point. It condenses the tree to avoid redundant font switching elements.
"$document->afterOpen($node);"
Carries out any afterOpen operations that have been recorded (using "Tag") for the element name of $node.
"$document->afterClose($node);"
Carries out any afterClose operations that have been recorded (using "Tag") for the element name of $node.
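The save-and-restore discipline described for "floatToElement" and "setNode" above can be sketched as follows. The insert_floated helper is hypothetical and uses only the documented behaviour: floatToElement returns the previous insertion point, or undef when no suitable point exists. FloatStub is a stand-in that represents nodes as plain strings and always floats to a fixed element, so that the example runs without LaTeXML.

```perl
use strict;
use warnings;

# Stand-in document: nodes are plain strings, for demonstration only.
{
    package FloatStub;
    sub new     { return bless { node => 'ltx:text', log => [] }, shift }
    sub getNode { return $_[0]{node} }
    sub setNode { my ($self, $node) = @_; $self->{node} = $node; return }

    sub floatToElement {
        my ($self, $qname) = @_;
        # In this stub only ltx:indexmark finds a place to float to.
        return undef unless $qname eq 'ltx:indexmark';
        my $previous = $self->{node};
        $self->{node} = 'ltx:para';
        return $previous;
    }

    sub openElement {
        my ($self, $qname) = @_;
        push @{ $self->{log} }, "open $qname in $self->{node}";
        $self->{parent} = $self->{node};
        $self->{node}   = $qname;
        return $qname;
    }

    sub closeElement {
        my ($self, $qname) = @_;
        $self->{node} = $self->{parent};
        return $qname;
    }
}

# Hypothetical helper: insert an empty $qname element at the nearest point
# that accepts it, then restore the original insertion point.
sub insert_floated {
    my ($document, $qname, %attributes) = @_;
    my $saved = $document->floatToElement($qname);
    return unless defined $saved;    # nowhere to float: nothing changed
    my $node = $document->openElement($qname, %attributes);
    $document->closeElement($qname);
    $document->setNode($saved);      # restore the insertion point
    return $node;
}

my $document = FloatStub->new;
my $inserted = insert_floated($document, 'ltx:indexmark');
my $missing  = insert_floated($document, 'ltx:section');
print "inserted: $inserted\n";
print "insertion point: ", $document->getNode, "\n";
print "no float target for ltx:section\n" unless defined $missing;
```

Checking for undef before inserting matters, since otherwise the element would be opened at the unchanged insertion point, where it may not be allowed.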

Document Modification

The following methods are used to perform various sorts of modification and rearrangements of the document, after the normal flow of insertion has taken place. These may be needed after an environment (or perhaps the whole document) has been completed and one needs to analyze what it contains to decide on the appropriate representation.

"$document->setAttribute($node,$key,$value);"
Sets the attribute $key to $value on $node. This method is preferred over the direct LibXML one, since it takes care of decoding namespaces (if $key is a qname), and also manages recording of xml:id's.
"$document->recordID($id,$node);"
Records the association of the given $node with the $id, which should be the "xml:id" attribute of the $node. Usually this association will be maintained by the methods that create nodes or set attributes.
"$document->unRecordID($id);"
Removes the record associating the given $id with a node, if any. This might be needed if a node is deleted.
"$document->modifyID($id);"
Adjusts $id, if needed, so that it is unique. It does this by appending a letter and incrementing until it finds an id that is not yet associated with a node.
"$node = $document->lookupID($id);"
Returns the node, if any, that is associated with the given $id.
"$document->setNodeBox($node,$box);"
Records the $box (being a Box, Whatsit or List), that was (presumably) responsible for the creation of the element $node. This information is useful for determining source locations, original TeX strings, and so forth.
"$box = $document->getNodeBox($node);"
Returns the $box that was responsible for creating the element $node.
"$document->setNodeFont($node,$font);"
Records the font object that encodes the font that should be used to display any text within the element $node.
"$font = $document->getNodeFont($node);"
Returns the font object associated with the element $node.
"$node = $document->openElementAt($point,$qname,%attributes);"
Opens a new child element in $point with the qualified name $qname and with the given attributes. This method is not affected by, nor does it affect, the current insertion point. It does manage namespaces, xml:id's and associating a box, font and locator with the new element, as well as running any "afterOpen" operations.
"$node = $document->closeElementAt($node);"
Closes $node. This method is not affected by, nor does it affect, the current insertion point. However, it does run any "afterClose" operations, so any element that was created using the lower-level "openElementAt" should be closed using this method.
"$node = $document->appendClone($node,@newchildren);"
Appends clones of @newchildren to $node. This method modifies any ids found within @newchildren (using "modifyID"), and fixes up any references to those ids within the clones so that they refer to the modified id.
"$node = $document->wrapNodes($qname,@nodes);"
This method wraps the @nodes in a new element with qualified name $qname; that new node replaces the first of @nodes. The remaining nodes in @nodes must be following siblings of the first one.

NOTE: Does this need multiple nodes? If so, perhaps some kind of movenodes helper? Otherwise, what about attributes?

"$node = $document->unwrapNodes($node);"
Unwrap the children of $node, by replacing $node by its children.
"$node = $document->replaceNode($node,@nodes);"
Replace $node by @nodes; presumably they are some sort of descendant nodes.
"$node = $document->renameNode($node,$newname);"
Rename $node to the tagname $newname; equivalently replace $node by a new node with name $newname and copy the attributes and contents. It is assumed that $newname can contain those attributes and contents.
"@nodes = $document->filterDeletions(@nodes);"
This function is useful with "$doc->absorb($box)", when you want to filter out any nodes that have been deleted and no longer appear in the document.
"@nodes = $document->filterChildren(@nodes);"
This function is useful with "$doc->absorb($box)", when you want to filter out any nodes that are children of other nodes in @nodes.
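The id bookkeeping described above for "recordID", "lookupID" and "modifyID" can be pictured with a toy registry. This is a minimal sketch, assuming a plain hash as the registry and Perl's string increment for the suffix; the record_id and unique_id functions are hypothetical, and the actual suffix scheme used by modifyID may differ.

```perl
use strict;
use warnings;

# Toy registry mapping ids to nodes (nodes are plain strings here).
my %registry;

# Record that $id belongs to $node.
sub record_id {
    my ($id, $node) = @_;
    $registry{$id} = $node;
    return;
}

# Return $id unchanged if it is unused; otherwise append a letter suffix,
# incrementing it ('a', 'b', ..., 'z', 'aa', ...) until the result is unused.
sub unique_id {
    my ($id) = @_;
    return $id unless exists $registry{$id};
    my $suffix = 'a';
    $suffix++ while exists $registry{ $id . $suffix };
    return $id . $suffix;
}

record_id('S1',  'first section');
record_id('S1a', 'clone of first section');

my $fresh  = unique_id('S2');    # not yet recorded, so returned as is
my $former = unique_id('S1');    # S1 and S1a are taken
print "$fresh $former\n";
```

A scheme of this kind is what lets "appendClone" copy nodes that carry ids without producing duplicates, provided references inside the clones are updated to the adjusted ids.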

AUTHOR

Bruce Miller <bruce.miller@nist.gov>

COPYRIGHT

Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US.

2024-02-26 perl v5.40.0