Add tests to explain what the namespace declarations in HTMLParsingHelper are used for #10
This PR adds tests to explain what the following lines do:
dom/src/Webfactory/Dom/HTMLParsingHelper.php
Lines 72 to 73 in 17f6c52
They define implicit namespace mappings, i.e. namespace prefix-to-URI mappings that various methods in this library use when no explicit mappings are given. Those methods are:

- `BaseParsingHelper::createXPath()`, to create an XPath query with namespace bindings
- `BaseParsingHelper::dump()`, to know which namespaces are in effect at the place where the dumped XML string will be used
- `BaseParsingHelper::parseFragment()`, to provide context on which namespace declarations and which default namespace are active at the place the XML fragment string was taken from

`BaseParsingHelper::parseDocument()` does not need any explicit namespace declarations. After all, those are part of the XML document given.

The "default namespace" mapping (for the empty `''` prefix) is relevant for `BaseParsingHelper::parseFragment()`. It defines that code like...

... will associate the `<p>` element with the `http://www.w3.org/1999/xhtml` namespace. This is the native namespace for HTML5 elements and does not need to be declared. [1] The following code achieves the same, but parses a full HTML5 document:

Now, in both examples, we have a `<p>` element from that namespace.

In order to match this element with an XPath expression, one needs to be aware that an XPath expression like `//p` queries for a `<p>` element not connected to any namespace. [2] But, as explained above, for XHTML and Polyglot HTML5 documents, nodes are connected to `http://www.w3.org/1999/xhtml`.
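A minimal sketch of that mismatch, using only PHP's DOM extension rather than this library's helpers. The sample markup and variable names are assumptions made for illustration:

```php
<?php
// Assumed sample input: a <p> inside a wrapper element that declares
// the XHTML namespace as the default namespace.
$xml = '<div xmlns="http://www.w3.org/1999/xhtml"><p>Foo</p></div>';

$document = new DOMDocument();
$document->loadXML($xml);

// The <p> element inherits the default namespace from its parent.
$p = $document->documentElement->firstChild;
$namespaceUri = $p->namespaceURI; // 'http://www.w3.org/1999/xhtml'

// An unprefixed name test only matches elements that are in no namespace,
// so this query does not find the namespaced <p>.
$xpath = new DOMXPath($document);
$unprefixedMatches = $xpath->query('//p')->length; // 0
```

The element is present and namespaced, yet the unprefixed query returns an empty node list.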
To make using XPath more convenient, the implicit defaults are also applied when no explicit declarations are given. The `''` prefix is ignored in this case (it's not a valid prefix, after all), but `html` is what you're probably after.

So, the XPath expression to match the `<p>` node from both preceding examples is `//html:p`.
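The prefixed query can be sketched with plain `DOMXPath`. Here the `html` prefix is bound by hand, where the library would apply its implicit defaults as described above. The sample markup and variable names are assumptions, not taken from the test suite:

```php
<?php
$xhtmlNamespace = 'http://www.w3.org/1999/xhtml';

// Assumed sample input with the XHTML namespace declared as the default.
$document = new DOMDocument();
$document->loadXML('<div xmlns="' . $xhtmlNamespace . '"><p>Foo</p></div>');

$xpath = new DOMXPath($document);
// Bind the "html" prefix manually for this sketch.
$xpath->registerNamespace('html', $xhtmlNamespace);

// The prefixed name test matches elements in the bound namespace.
$matches = $xpath->query('//html:p');
$matchCount = $matches->length;         // 1
$text = $matches->item(0)->textContent; // 'Foo'
```

The prefix used in the expression is arbitrary. Only the URI it is bound to decides which elements match.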
Footnotes

1. When parsing a full HTML5 document with a parser that is aware of XML only, but not HTML5, this needs to be explicitly specified as the default namespace on the root element, see https://www.w3.org/TR/html-polyglot/#h4_element-level-namespaces. When parsing only an HTML5 fragment, the `BaseParsingHelper::parseFragment()` method will use a wrapping container to provide this default declaration.
2. There is no such thing as a "default" namespace in XPath. The default namespace at some point in an XML document is the namespace URI that elements will be connected to when no namespace prefix is given. It can be different at different places in the XML document. An XPath name test matches an element from the namespace bound to its prefix or, when unprefixed, an element that is not namespaced.