HTML, CSS, and JavaScript are source files that a Web browser combines to render a visual Web page. Once the page is rendered, any code that runs against it needs an interface for reading and changing the page. Enter the “DOM.”
No, it’s not someone’s nickname. The Document Object Model, or “DOM” for short, is a programming interface for HTML and XML documents. Programming with the DOM is a big deal. It enables programmers to manipulate the page in various ways, such as:
The DOM is called an “object model” because it presents the page as an object. That “document object” contains an object representing each element. Element objects are nested from a root element to mirror the HTML structure of the page.
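To make the nesting concrete, here is a minimal sketch that walks a tiny page from its root element and prints one indented line per element. It uses Python's standard-library `xml.dom.minidom` rather than a browser, so the page is written as well-formed XHTML. The `PAGE` markup and the `outline` helper are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical page, written as well-formed XHTML because minidom is an XML parser.
PAGE = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Hello</h1><p>First <b>bold</b> text</p></body></html>"
)


def outline(node, depth=0):
    """Return one indented line per element, mirroring the nesting of the markup."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines  # text nodes contribute nothing


document = minidom.parseString(PAGE)   # the "document object"
root = document.documentElement        # the root element, here <html>
tree = outline(root)
print("\n".join(tree))
```

Each element object holds its children in `childNodes`, which is what lets the walk reproduce the shape of the HTML.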
What’s really nice about the DOM is that it is not dependent upon any one programming language. It is most commonly used by JavaScript to manipulate Web pages in a browser, but it could be used by any other language, too. A good example of this would be using a scripting language like Python to scrape Web page contents. Another good example would be using test automation to poke and prod pages. The DOM also works for XML, but for this course, we will focus on HTML.
The first step with DOM programming is getting the elements themselves. Programming with the DOM makes one thing very clear: there is a difference between an element, a locator, and a selector.
To sum them up in one line: A locator uses a selector to find an element on a web page.
Why is this distinction important? Two main reasons:
For these reasons, we must separate the concerns of the element objects themselves and the locators used to find them.
There are many types of locators, such as:
We will cover different locator types in great detail in future chapters, as well as when to use which one. For now, just know that locators are the standard way to find elements in a Web page, and that every element can have a unique locator. Also, know that a locator can return multiple elements, not just one: it returns every element that matches its query.
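The relationship between locator, selector, and element can be sketched in a few lines. This is an illustrative model only, not how any browser or testing tool implements locators. The `Locator` class, its `by` strategy names, and the `PAGE` markup are all hypothetical, and the page is parsed with Python's standard-library `xml.dom.minidom`.

```python
from dataclasses import dataclass
from xml.dom import minidom

# Hypothetical page, written as well-formed XHTML so minidom can parse it.
PAGE = """<html><body>
<ul id="menu">
  <li class="item">Home</li>
  <li class="item">Docs</li>
  <li class="item special">About</li>
</ul>
</body></html>"""


@dataclass(frozen=True)
class Locator:
    """Hypothetical locator: a strategy name plus the selector string it queries with."""
    by: str        # assumed strategy names: "tag", "id", or "class"
    selector: str

    def find_all(self, document):
        """Return every element matching the selector, in document order."""
        elements = document.getElementsByTagName("*")
        if self.by == "tag":
            return [e for e in elements if e.tagName == self.selector]
        if self.by == "id":
            return [e for e in elements if e.getAttribute("id") == self.selector]
        if self.by == "class":
            return [e for e in elements
                    if self.selector in e.getAttribute("class").split()]
        raise ValueError(f"unknown locator strategy: {self.by}")


document = minidom.parseString(PAGE)
menu = Locator("id", "menu").find_all(document)      # one match
items = Locator("class", "item").find_all(document)  # several matches
print(len(menu), len(items))
```

Note that the locator is just data until it is run against a document. The same `Locator` can be reused after the page changes, while the element objects it returned earlier may no longer be current.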
Once element objects are obtained, there are many ways to interact with them. JavaScript specifically provides methods to not only change the state of the elements but also to send user-like actions to them. For example, the “click()” method will programmatically click an element as if a user had clicked it. The “textContent” property will get the text contained in the element and its descendants, including text that is hidden from view. The “getAttribute()” method will get a particular element attribute by name, and the “setAttribute()” method will add or change an element attribute. Most things a user can do visually in a browser can also be done programmatically with JavaScript actions. In fact, Cypress relies upon direct JavaScript calls within the browser.
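The attribute calls described above are part of the DOM interface itself, so they can be tried outside a browser too. The sketch below uses Python's standard-library `xml.dom.minidom`, which provides `getAttribute()` and `setAttribute()` but has no `click()` method or `textContent` property. The `text_content` helper is a hypothetical stand-in for `textContent`, and the `PAGE` markup is made up.

```python
from xml.dom import minidom

# Hypothetical page, written as well-formed XHTML so minidom can parse it.
PAGE = '<html><body><a id="docs" href="/docs">Read the <b>docs</b></a></body></html>'


def text_content(node):
    """Concatenate all descendant text, similar in spirit to a browser's textContent."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


document = minidom.parseString(PAGE)
link = document.getElementsByTagName("a")[0]

href_before = link.getAttribute("href")   # read an existing attribute
link.setAttribute("href", "/guide")       # change an existing attribute
link.setAttribute("target", "_blank")     # add a new attribute
href_after = link.getAttribute("href")
label = text_content(link)                # text of the element and its descendants

print(href_before, href_after, label)
```

In a browser, the equivalent JavaScript calls act on the live page, so a changed attribute takes effect immediately. Here the change only affects the in-memory document.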
Locators are also crucial for black-box testing outside of the browser. For example, Selenium WebDriver relies upon locators to find elements and interact with them. The main difference for WebDriver's element commands is that they do not change the state of elements directly - they can only access the state and send interactions. Furthermore, WebDriver calls don’t call JavaScript directly - they operate using the WebDriver protocol.
Another browser automation tool that uses locators is Playwright. Unlike Selenium WebDriver, Playwright manipulates the browser using debug protocols. However, just like Selenium and Cypress, Playwright uses locators to find elements.
Regardless of the tool, you need to understand the DOM and know how to write good locators to develop automation.