document url

The TreeWalker.

Nodes and their types.

The tree-like Document Object Model (DOM) is comprised of nodes whose arrangement describes the structure of the document. There are nine types of nodes; which type a particular node is can be discovered through its nodeType property. This will return a number corresponding to a type as {% footnoteref "node-types", "5, 6, and 12 are deprecated" %} follows{% endfootnoteref %}:

Value Type
1 Element
2 Attribute
3 Text
4 {% footnoteref "cdata-section", "XML only." %}CDATA Section{% endfootnoteref %}
7 {% footnoteref, "processing-instruction", "XML only" %}Processing Instruction{% endfootnoteref %}
8 Comment
9 Document
10 Document Type
11 Document Fragment

Element nodes are the most frequently targeted types in front-end development, but all are part of the DOM. Text nodes are interesting because they can contain not only the text of an element, but also the whitespace in the original HTML file. Although of limited practical value, these whitespace text nodes illustrate the nature of the DOM — it isn't simply an outline of a document, but a true representation of all its constituent parts and their relationship to one another.

The TreeWalker.

JavaScript provides a built-in object that can be used to list all the nodes in a document, the TreeWalker, which can be created using the createTreeWalker() method on the Document interface. This createTreeWalker() method takes three parameters: root, whatToShow, and filter. More on these parameters and their use can be found on MDN.

To take a look at every node on a page, a function that looks something like the following can be used:

function walk() {
    // create a new instance of the TreeWalker object.
    
    const walker = document.createTreeWalker(document, NodeFilter.SHOW_ALL);
    
    // starting at the root, iterate over each node in the DOM.
    
    do {
        console.log(walker.currentNode.nodeName);
        } while (walker.nextNode());
    }

Here, a new instance of TreeWalker is created with a root of document, which is always the first node in a document. It isn't required to start there unless the goal is to capture everything. Substituting another selector works the same way and will capture everything that follows that selector. The second parameter is whatToShow and it is set to show every node regardless of its type. This can also be used to only show certain types of nodes, such as element or text nodes, or, if more refined filtering is needed, the filter parameter with a callback can be included.

Next, a do...while loop is called that prints the nodeName of each node to the console. Every node has several properties, such as nodeType , nextSibling, or textContent, that can be returned as well. If a text node only contains whitespace, the textContent property will print a blank line to the console. To show the whitespace characters, String.replaceAll can be used to substitute text such as "[tab]" or "[newline]" for the various types of whitespace.

Finally, a quick note about using do...while to iterate over the TreeWalker as opposed to while. Using while (walker.nextNode()) will skip the root node.I found this confusing, because I thought that every node would print so long as that node had a node following it. However, it seems that while evaluates the expression and then executes the code, whereas do...while executes the code and then evaluates the expression. Subtle!

{% footnotes %}