Add tests to explain what the namespace declarations in HTMLParsingHelper are used for (#10)

This PR aims to explain, by adding tests, what the following lines are doing:

https://github.com/webfactory/dom/blob/17f6c52d64424830c86f3e28ecdda9c6d8351cf5/src/Webfactory/Dom/HTMLParsingHelper.php#L72-L73

They define _implicit_ namespace mappings, i.e. namespace prefix-to-URI mappings that various methods in this library use when no _explicit_ mappings are given. Those methods are:

* `BaseParsingHelper::createXPath()`, to create an XPath query with namespace bindings
* `BaseParsingHelper::dump()`, to know which namespaces are in effect at the place where the dumped XML string will be used
* `BaseParsingHelper::parseFragment()`, to provide context on which namespace declarations and which default namespace are active at the place where the XML fragment string was taken from.

`BaseParsingHelper::parseDocument()` does not need any explicit namespace declarations. After all, those are part of the XML document given.

The "default namespace" mapping (for the empty `''` prefix) is relevant for `BaseParsingHelper::parseFragment()`. It defines that code like...

```php
$parser = new PolyglotHTML5ParsingHelper();
$fragment = $parser->parseFragment('<p>Hello XML</p>');
```

... will associate the `<p>` element with the `http://www.w3.org/1999/xhtml` namespace. This is the _native_ namespace for HTML5 elements that does not need to be declared.[^1]

The following code achieves the same, but parses a full HTML5 document:

```php
$parser = new PolyglotHTML5ParsingHelper();
$document = $parser->parseDocument('<html xmlns="http://www.w3.org/1999/xhtml"><body><p>test</p></body></html>');
```

Now, in both examples, we have a `<p>` element from that namespace. In order to match this element with an XPath expression, one needs to be aware that an XPath expression like `//p` queries for a `<p>` element _not connected to a namespace_.[^2] But, as explained above, for XHTML and Polyglot HTML5 documents, nodes are connected to `http://www.w3.org/1999/xhtml`.
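The mismatch described above can be reproduced without this library. The following is a minimal sketch that uses only PHP's built-in `DOMDocument` and `DOMXPath` classes (an assumption made for illustration; it does not show how `BaseParsingHelper` works internally). The variable names and the manually registered `html` prefix are likewise illustrative.

```php
<?php
// Plain ext-dom sketch; webfactory/dom itself is not used here.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>test</p></body></html>';

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);

// An unprefixed name test only matches <p> elements that are in no namespace.
$unprefixed = $xpath->query('//p');

// Bind a prefix to the XHTML namespace URI by hand, then query with that prefix.
$xpath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//html:p');

printf("//p matches %d node(s)\n", $unprefixed->length);
printf("//html:p matches %d node(s)\n", $prefixed->length);
```

Run as a script, this should report no match for `//p` and one match for `//html:p`, because the `<p>` element belongs to the XHTML namespace declared on the root element.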
To make using XPath more convenient, the implicit defaults are also applied to XPath queries when no explicit declarations are given. The `''` prefix is ignored in this case (it's not a valid prefix, after all), but `html` is what you're probably after. So, the XPath expression to match the `<p>` node from both preceding examples is `//html:p`.

[^1]: When parsing a full HTML5 document with a parser that is aware of XML only, but not HTML5, this needs to be explicitly specified as the default namespace on the root element, see https://www.w3.org/TR/html-polyglot/#h4_element-level-namespaces. When parsing an HTML5 fragment only, the `BaseParsingHelper::parseFragment()` method will use a wrapping container to provide this default declaration.

[^2]: There is no such thing as a "default" namespace in XPath. The default namespace at some point in an XML document is the namespace URI that elements will be connected to when no other namespace prefix is given. It can be different at different places in the XML document. An XPath name test matches an element only if the element is in the namespace bound to the test's prefix or, when the test has no prefix, in no namespace at all.

---------

Co-authored-by: mpdude <[email protected]>