Chapter 2 – W3C WebDriver Protocol

Selenium 4 in Java

Transcripted Summary

Selenium 4 has been revamped with one major change and that's the W3C WebDriver Protocol.

This diagram is a snapshot of Selenium 3's architecture.

Do you see the second component, JSON Wire Protocol? Its role is to transfer information from the client to the server. That information is passed over HTTP.

HTTP is an acronym for Hypertext Transfer Protocol. We see it sends HTTP requests and receives HTTP responses. The JSON Wire Protocol component has been removed from the new architecture.

Now there is direct communication between the client and server.

As a result, the W3C WebDriver Protocol has at least 3 advantages.

Number 1, it provides standards.
Number 2, it provides stability.
Number 3, it provides an updated actions API that is supplied with better resources.

For standards, the W3C, which stands for World Wide Web Consortium, is an international group of people that creates long-term standards for the web. Because browser drivers follow the same standard, our automation test scripts will run consistently on each browser. Since Selenium 4 is compliant with W3C WebDriver, the API requests no longer need to be encoded and decoded.

The key advantage for stability is backward compatibility.

It is no problem for people who still want to use the old JSON Wire Protocol. Selenium became compliant with W3C back in Selenium 3, alongside the JSON Wire Protocol. That's why it still works in Selenium 4.

The Java bindings and Selenium server provide a mechanism for us to use the JSON Wire Protocol. With the updated actions API, we can manage keyboard and mouse events.

It's an advantage because now Selenium 4 offers a way to perform more than 1 action at the same time, like pressing 2 keys.
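To make the "pressing 2 keys" case concrete, here is a minimal sketch of the JSON body that the W3C actions endpoint (`POST /session/{session id}/actions`) accepts for holding Shift while pressing another key. In a real test the Selenium `Actions` class would normally build and send this for us; the class name `KeyChordSketch`, its helper methods, and the input source id `keyboard` are hypothetical names used only for illustration.

```java
public class KeyChordSketch {
    // W3C WebDriver code point for the Shift key, written as a JSON escape.
    static final String SHIFT = "\\uE008";

    // One key action, e.g. {"type":"keyDown","value":"a"}.
    // Assumption: value needs no further JSON escaping.
    static String keyAction(String type, String value) {
        return "{\"type\":\"" + type + "\",\"value\":\"" + value + "\"}";
    }

    // Shift goes down first and comes up last, so both keys are held together.
    static String shiftChord(String key) {
        String actions = String.join(",",
                keyAction("keyDown", SHIFT),
                keyAction("keyDown", key),
                keyAction("keyUp", key),
                keyAction("keyUp", SHIFT));
        // "keyboard" is an arbitrary id for this key input source.
        return "{\"actions\":[{\"type\":\"key\",\"id\":\"keyboard\",\"actions\":["
                + actions + "]}]}";
    }

    public static void main(String[] args) {
        // Prints the payload for holding Shift while pressing "a".
        System.out.println(shiftChord("a"));
    }
}
```

The ordering of the four key actions is what expresses "at the same time": the second key is pressed before Shift is released.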

# Selenium 4 Architecture — Direct Communication

Now, when it comes to Selenium 4, we see it has direct communication. There are 3 components.

The first component combines 2 parts into one: the Selenium Client and the WebDriver Language Bindings.

Selenium is an API that has commands for automating our browser.
WebDriver talks to the web browser through a browser driver. All the languages have their own bindings. Bindings mean the same commands written for Java are also written for C#, Python, Ruby, and JavaScript.

The second component is Browser Drivers, and we see it has 2 functions.

The first function receives a request from Selenium Client and WebDriver Language Bindings, then passes that request to the browser. A driver is also known as a proxy, which has the responsibility for controlling the browser.
The second function is to return a response from the browser back to the Selenium Client and WebDriver Language Bindings. All the drivers use the W3C WebDriver Protocol, and most of them are created by the browser vendors.

The third component is Web Browsers.

For our test scripts, we are going to use Chrome. But the other major browsers are Firefox, Safari, and Edge. This is where all of the Selenium commands are performed.

Here's the process. The browser receives a request, performs it, and sends back a response to the driver.
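The request half of that round trip can be sketched with the JDK's own HTTP client types, without Selenium. This is only an illustration of what the language bindings send to a browser driver under the W3C WebDriver Protocol: the `New Session` command is `POST /session`, and `Navigate To` is `POST /session/{session id}/url`. The class name `WireSketch`, the driver address `http://localhost:9515`, and the session id passed in are assumptions for the example; the requests are built but never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class WireSketch {
    // Assumption: a browser driver is listening locally on this address.
    static final String DRIVER_URL = "http://localhost:9515";

    // Shared request shape: a JSON body posted to a driver endpoint.
    static HttpRequest post(String path, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(DRIVER_URL + path))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // W3C "New Session": asks the driver to start a browser.
    static HttpRequest newSession() {
        return post("/session",
                "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"chrome\"}}}");
    }

    // W3C "Navigate To": asks the browser in that session to open a page.
    // Assumption: url contains no characters that need JSON escaping.
    static HttpRequest navigateTo(String sessionId, String url) {
        return post("/session/" + sessionId + "/url", "{\"url\":\"" + url + "\"}");
    }

    public static void main(String[] args) {
        HttpRequest request = newSession();
        // Shows the HTTP method and endpoint the driver would receive.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In a real script these requests are hidden behind calls such as creating a driver and calling `get` on it; the driver forwards each command to the browser and returns the browser's response as JSON.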

Next, I will demo the relative locators, which locate elements based on their relationship to other elements.
