XML::DOM (3)
NAME
XML::DOM - A Perl module for building DOM Level 1 compliant document structures
SYNOPSIS
    use XML::DOM;

    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("file.xml");

    # print all HREF attributes of all CODEBASE elements
    my $nodes = $doc->getElementsByTagName ("CODEBASE");
    my $n = $nodes->getLength;

    for (my $i = 0; $i < $n; $i++)
    {
        my $node = $nodes->item ($i);
        my $href = $node->getAttributeNode ("HREF");
        print $href->getValue . "\n";
    }

    # Print doc file
    $doc->printToFile ("out.xml");

    # Print to string
    print $doc->toString;

    # Avoid memory leaks - cleanup circular references for garbage collection
    $doc->dispose;
DESCRIPTION
This module extends the XML::Parser module by Clark Cooper. The XML::Parser module is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library.
XML::DOM::Parser is derived from XML::Parser. It parses the specified input into an XML::DOM::Document object.
The XML::Parser NoExpand option is more or less supported, in that it will generate EntityReference objects whenever an entity reference is encountered in character data. I'm not sure how useful this is. Any comments are welcome.
As described in the synopsis, when you create an XML::DOM::Parser object, the parse and parsefile methods create an XML::DOM::Document object from the specified input. This Document object can then be examined, modified and written back out to a file or converted to a string.
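The parse, modify and serialize cycle described above can be sketched as follows. This is a minimal illustration that assumes the XML arrives as an in-memory string passed to parse rather than a file; the sample markup and the 'lang' attribute are made up for the example.

```perl
use strict;
use warnings;
use XML::DOM;

# Illustrative input; any well-formed XML string would do.
my $xml = '<doc><title>Hello</title></doc>';

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse($xml);

# Examine: fetch the root element and read its tag name.
my $root = $doc->getDocumentElement;
my $name = $root->getTagName;

# Modify: add an attribute to the root element.
$root->setAttribute('lang', 'en');

# Convert back to a string (printToFile would write a file instead).
my $out = $doc->toString;
print $out;

# Break circular references so the tree can be garbage collected.
$doc->dispose;
```

The same Document could be written to disk with printToFile, as in the synopsis.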
When using
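When using XML::Parser 2.27 and above, you can suppress expansion of parameter entity references (e.g. %pent;) in the DTD by setting the XML::DOM::Parser options ParseParamEnt => 1 and ExpandParamEnt => 0. See Hidden Nodes under IMPLEMENTATION DETAILS.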
A Document has a tree structure consisting of Node objects. A Node may contain other nodes, depending on its type. A Document may have Element, Text, Comment, and CDATASection nodes. Element nodes may have Attr, Element, Text, Comment, and CDATASection nodes. The other nodes may not have any child nodes.
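The tree structure can be explored with a short recursive walk. The sketch below is only an illustration: the helper name walk and the sample document are assumptions, and it uses the same NodeList calls (getLength, item) as the synopsis. It descends only into Element nodes, since the other node types shown here have no children.

```perl
use strict;
use warnings;
use XML::DOM;

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse('<root><a>text</a><!-- note --><b/></root>');

# Hypothetical helper: collect one indented line per node, depth first.
sub walk {
    my ($node, $depth, $out) = @_;
    push @$out, ('  ' x $depth) . $node->getNodeName;

    # Only Element nodes are descended into here.
    if ($node->getNodeType == ELEMENT_NODE) {
        my $kids = $node->getChildNodes;
        for my $i (0 .. $kids->getLength - 1) {
            walk($kids->item($i), $depth + 1, $out);
        }
    }
    return $out;
}

my $lines = walk($doc->getDocumentElement, 0, []);
print "$_\n" for @$lines;

$doc->dispose;
```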
This module adds several node types that are not part of the DOM spec: ElementDecl, AttlistDecl, AttDef and XMLDecl.
XML::DOM Classes
- XML::DOM::Node - Super class of all node types
- XML::DOM::Document - The root of the XML document
- XML::DOM::DocumentType - Describes the document structure: <!DOCTYPE root [ ... ]>
- XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>
- XML::DOM::Attr - An XML element attribute: name="value"
- XML::DOM::CharacterData - Super class of Text, Comment and CDATASection
- XML::DOM::Text - Text in an XML element
- XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>
- XML::DOM::Comment - An XML comment: <!-- comment -->
- XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;
- XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>
- XML::DOM::ProcessingInstruction - <?PI target>
- XML::DOM::DocumentFragment - Lightweight node for cut & paste
- XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>
In addition, the following node types are provided that are not part of the DOM spec:
- XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>
- XML::DOM::AttlistDecl - Defines one or more attributes in an <!ATTLIST ...>
- XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>
- XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...>
Other classes that are part of the DOM spec:
- XML::DOM::Implementation - Provides information about this implementation. Currently it doesn't do much.
- XML::DOM::NodeList - Used internally to store a node's child nodes. Also returned by getElementsByTagName.
- XML::DOM::NamedNodeMap - Used internally to store an element's attributes.
Other classes that are not part of the DOM spec:
- XML::DOM::Parser - A non-validating XML parser that creates XML::DOM::Documents
- XML::DOM::ValParser - A validating XML parser that creates XML::DOM::Documents. It uses XML::Checker to check against the DocumentType (DTD)
- XML::Handler::BuildDOM - A PerlSAX handler that creates XML::DOM::Documents.
XML::DOM package
- Constant definitions
- The following predefined constants indicate which type of node it is.
    UNKNOWN_NODE (0)                  The node type is unknown (not part of DOM)

    ELEMENT_NODE (1)                  The node is an Element.
    ATTRIBUTE_NODE (2)                The node is an Attr.
    TEXT_NODE (3)                     The node is a Text node.
    CDATA_SECTION_NODE (4)            The node is a CDATASection.
    ENTITY_REFERENCE_NODE (5)         The node is an EntityReference.
    ENTITY_NODE (6)                   The node is an Entity.
    PROCESSING_INSTRUCTION_NODE (7)   The node is a ProcessingInstruction.
    COMMENT_NODE (8)                  The node is a Comment.
    DOCUMENT_NODE (9)                 The node is a Document.
    DOCUMENT_TYPE_NODE (10)           The node is a DocumentType.
    DOCUMENT_FRAGMENT_NODE (11)       The node is a DocumentFragment.
    NOTATION_NODE (12)                The node is a Notation.

    ELEMENT_DECL_NODE (13)            The node is an ElementDecl (not part of DOM)
    ATT_DEF_NODE (14)                 The node is an AttDef (not part of DOM)
    XML_DECL_NODE (15)                The node is an XMLDecl (not part of DOM)
    ATTLIST_DECL_NODE (16)            The node is an AttlistDecl (not part of DOM)

Usage:

    if ($node->getNodeType == ELEMENT_NODE)
    {
        print "It's an Element";
    }
Not In
Global Variables
- $VERSION
- The variable $XML::DOM::VERSION contains the version number of this implementation, e.g. ``1.43''.
METHODS
These methods are not part of the DOM spec.
- getIgnoreReadOnly and ignoreReadOnly (readOnly)
The DOM Level 1 Spec does not allow you to edit certain sections of the document, e.g. the DocumentType, so by default this implementation throws DOMExceptions (i.e. NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node. These readonly checks can be disabled by (temporarily) setting the global IgnoreReadOnly flag.
The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its previous value. The getIgnoreReadOnly method simply returns its current value.

    my $oldIgnore = XML::DOM::ignoreReadOnly (1);
    eval {
        # ... do whatever you want, catching any other exceptions ...
    };
    XML::DOM::ignoreReadOnly ($oldIgnore);    # restore previous value

Another way to do it, using a local variable:

    { # start new scope
        local $XML::DOM::IgnoreReadOnly = 1;
        # ... do whatever you want, don't worry about exceptions ...
    } # end of scope ($IgnoreReadOnly is set back to its previous value)
- isValidName (name)
Whether the specified name is a valid ``Name'' as specified in the XML spec. Characters with Unicode values > 127 are now also supported.
- getAllowReservedNames and allowReservedNames (boolean)
The first method returns whether reserved names are allowed.
The second takes a boolean argument and sets whether reserved names are allowed.
The initial value is 1 (i.e. allow reserved names.)
The XML spec states that ``Names'' starting with (X|x)(M|m)(L|l) are reserved for future use. (Amusingly enough, the XML version of the XML spec (REC-xml-19980210.xml) breaks that very rule by defining an ENTITY with the name 'xmlpio'.) A ``Name'' in this context means the Name token as found in the BNF rules in the XML spec.
XML::DOM only checks for errors when you modify the DOM tree, not when the DOM tree is built by the XML::DOM::Parser.
- setTagCompression (funcref)
There are 3 possible styles for printing empty Element tags:
- Style 0

    <empty/> or <empty attr="val"/>

XML::DOM uses this style by default for all Elements.
- Style 1

    <empty></empty> or <empty attr="val"></empty>

- Style 2

    <empty /> or <empty attr="val" />

This style is sometimes desired when using XHTML. (Note the extra space before the slash ``/''.) See <www.w3.org/TR/xhtml1> Appendix C for more details.

By default XML::DOM compresses all empty Element tags (style 0.) You can control which style is used for a particular Element by calling XML::DOM::setTagCompression with a reference to a function that takes 2 arguments. The first is the tag name of the Element, the second is the XML::DOM::Element that is being printed. The function should return 0, 1 or 2 to indicate which style should be used to print the empty tag. E.g.

    XML::DOM::setTagCompression (\&my_tag_compression);

    sub my_tag_compression
    {
        my ($tag, $elem) = @_;

        # Print empty br, hr and img tags like this: <br />
        return 2 if $tag =~ /^(br|hr|img)$/;

        # Print other empty tags like this: <empty></empty>
        return 1;
    }
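To see the effect of the three styles, the following sketch serializes the same small tree before and after installing a callback. It is an illustration only: the sample markup is made up, and the final call that installs a callback returning 0 is an assumed way to get back to the default style, since the text above does not describe a dedicated reset.

```perl
use strict;
use warnings;
use XML::DOM;

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse('<p><br/><span/></p>');
my $root   = $doc->getDocumentElement;

# With no callback installed, empty elements are printed in style 0.
my $default = $root->toString;

# Style 2 for <br>, style 1 for every other empty element.
XML::DOM::setTagCompression(sub {
    my ($tag, $elem) = @_;
    return 2 if $tag eq 'br';
    return 1;
});
my $custom = $root->toString;

print "$default\n$custom\n";

# The setting is global, so put style 0 back for later code (assumed reset).
XML::DOM::setTagCompression(sub { 0 });
$doc->dispose;
```

Because the callback is stored globally, it affects every document printed afterwards until it is replaced.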
IMPLEMENTATION DETAILS
- Perl Mappings
The value undef is used where the DOM Spec says null.
The DOM Spec says: Applications must encode DOMString using UTF-16 (defined in Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]). In this implementation we use plain old Perl strings encoded in UTF-8 instead of UTF-16.
- Text and CDATASection nodes
The Expat parser expands EntityReferences and CDATASection sections to raw strings and does not indicate where they were found. This implementation therefore converts both to Text nodes at parse time. CDATASection and EntityReference nodes that are added to an existing Document (by the user) will be preserved.
Also, adjacent Text nodes are always merged at parse time. Text nodes that are added later can be merged with the normalize method. Consider using the addText method when adding Text nodes.
- Printing and toString
When printing (and converting an XML Document to a string) the strings have to be encoded differently depending on where they occur. E.g. in a CDATASection all substrings are allowed except for ``]]>''. In regular text, certain characters are not allowed, e.g. ``>'' has to be converted to ``&gt;''. These routines should be verified by someone who knows the details.
-
Quotes
Certain sections in
XMLare quoted, like attribute values in an Element. XML::Parser strips these quotes and the print methods in this implementation always uses double quotes, so when parsing and printing a document, single quotes may be converted to double quotes. The default value of an attribute definition (AttDef) in an AttlistDecl, however, will maintain its quotes. - *
-
AttlistDecl
Attribute declarations for a certain Element are always merged into a single AttlistDecl object.
- Comments
Comments in the DOCTYPE section are not kept in the right place. They will become child nodes of the Document.
-
Hidden Nodes
Previous versions of
XML::DOMwould expand parameter entity references (like %pent;), so when printing theDTD,it would print the contents of the external entity, instead of the parameter entity reference. With this release (1.27), you can prevent this by setting the XML::DOM::Parser options ParseParamEnt => 1 and ExpandParamEnt => 0.When it is parsing the contents of the external entities, it *DOES* still add the nodes to the DocumentType, but it marks these nodes by setting the 'Hidden' property. In addition, it adds an EntityReference node to the DocumentType node.
When printing the DocumentType node (or when using to_expat() or to_sax()), the 'Hidden' nodes are suppressed, so you will see the parameter entity reference instead of the contents of the external entities. See test case t/dom_extent.t for an example.
The reason for adding the 'Hidden' nodes to the DocumentType node, is that the nodes may contain <!ENTITY> definitions that are referenced further in the document. (Simply not adding the nodes to the DocumentType could cause such entity references to be expanded incorrectly.)
Note that you need XML::Parser 2.27 or higher for this to work correctly.
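Several of the details above (CDATA sections folded into Text nodes at parse time, single quotes printed as double quotes, and merging of Text nodes with normalize or addText) can be observed with a short script. This is a sketch under the assumption that the parser is created with its default options; the element names and sample data are illustrative.

```perl
use strict;
use warnings;
use XML::DOM;

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse(q{<r a='1'>one <![CDATA[two]]></r>});
my $root   = $doc->getDocumentElement;

# Parse time: the CDATA section is merged into the surrounding text.
my $parsed_children = $root->getChildNodes->getLength;
my $first_type      = $root->getFirstChild->getNodeType;

# Printing: the attribute value comes back in double quotes.
my $printed = $root->toString;

# Text nodes appended by hand stay separate until normalize is called.
my $elem = $doc->createElement('e');
$elem->appendChild($doc->createTextNode('a'));
$elem->appendChild($doc->createTextNode('b'));
my $before = $elem->getChildNodes->getLength;
$elem->normalize;
my $after = $elem->getChildNodes->getLength;

# addText extends a trailing Text node instead of adding a new one.
my $elem2 = $doc->createElement('f');
$elem2->addText('a');
$elem2->addText('b');
my $merged = $elem2->getChildNodes->getLength;

print "parsed children: $parsed_children\n";
print "printed: $printed\n";
print "before/after normalize: $before/$after, addText: $merged\n";

$doc->dispose;
```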
SEE ALSO
XML::DOM::XPath
The Japanese version of this document by Takanori Kawai (Hippo2000) at <member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>
The
The
The XML::Parser and XML::Parser::Expat manual pages.
XML::LibXML also provides a DOM implementation.
CAVEATS
The method getElementsByTagName() does not return a ``live'' NodeList. Whether this is an actual caveat is debatable, but a few people on the www-dom mailing list seemed to think so. I haven't decided yet. It's a pain to implement, it slows things down and the benefits seem marginal. Let me know what you think.
AUTHOR
Enno Derksen is the original author.
Send patches to T.J. Mather at <tjmather@maxmind.com>.
Paid support is available directly from the maintainers of this package. Please see <www.maxmind.com/app/opensourceservices> for more details.
Thanks to Clark Cooper for his help with the initial version.