Welcome to new things

[Technical] [Electronic work] [Gadget] [Game] memo writing

Thoughts on using both Selenium and Playwright

I use Puppeteer for web crawling.

I had hoped to someday put together a guide on how to use Puppeteer, but in the meantime, I have been hearing a lot about Playwright as a browser manipulation tool similar to Puppeteer.

Then I started thinking that I should switch to Playwright when I write a new crawler and summarize how to use it.

On the other hand, Selenium is another well-known browser manipulation tool.

I heard that Selenium could be used with Microsoft Power Automate, so I was interested in Selenium as well, thinking that Selenium would be useful not only for crawling but also for automating daily operations.

I was torn between Selenium and Playwright, but since I had never used either, I ended up using both to find out for sure.

Here I would like to write up my experience of using both.

How they work

Selenium

The W3C has defined an API standard called WebDriver as a way to control the browser from the outside.

Selenium uses WebDriver to control the browser.

Safari ships with a built-in WebDriver implementation, so Selenium can operate it directly. Other browsers do not have one built in; for those, you install a separate WebDriver application for each browser (ChromeDriver for Chrome, for example).

Selenium talks to the installed WebDriver application, and the WebDriver application in turn operates the browser, so Selenium controls the browser indirectly.
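To make the "Selenium talks to WebDriver" step more concrete: the W3C WebDriver standard defines this conversation as HTTP requests with JSON bodies. The following is only a rough sketch that builds two such requests (create a session, then navigate) without sending them anywhere. The function names are hypothetical and chosen for illustration; Selenium produces requests like these for you internally.

```python
import json


def new_session_request(browser_name):
    """Build a W3C WebDriver "New Session" request (nothing is sent)."""
    body = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return ("POST", "/session", json.dumps(body))


def navigate_request(session_id, url):
    """Build a W3C WebDriver "Navigate To" request for an existing session."""
    return ("POST", f"/session/{session_id}/url", json.dumps({"url": url}))


if __name__ == "__main__":
    # "abc123" is a made-up session id; a real one comes back from New Session.
    for method, path, body in (
        new_session_request("chrome"),
        navigate_request("abc123", "https://www.google.com"),
    ):
        print(method, path, body)
```

A WebDriver application such as ChromeDriver receives requests of this shape and translates them into actions in the browser.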

Playwright

Chromium-based browsers implement a protocol for external control and communication (the Chrome DevTools Protocol). Playwright and Puppeteer are libraries that wrap this protocol and use it to manipulate the browser directly. Playwright can also drive Firefox and WebKit, although this article only uses it with Chromium.
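For Chromium, the protocol in question is the Chrome DevTools Protocol, in which each command is a JSON message carrying an `id`, a `method`, and `params`. As a rough sketch of what libraries like Playwright and Puppeteer hide from you, the code below only builds such messages; it does not connect to a browser. `CdpMessageBuilder` is a hypothetical helper name used for illustration.

```python
import itertools
import json


class CdpMessageBuilder:
    """Builds Chrome DevTools Protocol command messages (illustration only)."""

    def __init__(self):
        # Each command needs a unique id so the reply can be matched to it.
        self._ids = itertools.count(1)

    def build(self, method, params=None):
        message = {"id": next(self._ids), "method": method, "params": params or {}}
        return json.dumps(message)


if __name__ == "__main__":
    builder = CdpMessageBuilder()
    print(builder.build("Page.navigate", {"url": "https://www.google.com"}))
    print(builder.build("Runtime.evaluate", {"expression": "document.title"}))
```

A real client would send these strings to the browser, typically over a WebSocket connection, and read back replies that carry the same `id`.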

Program Example

Let's write a program in Selenium and Playwright that searches for "test" on Google and retrieves the source of the fourth page of results.

Selenium

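from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

try:
    driver.get("https://www.google.com")

    # Selectors reflect Google's markup when this was written and may need updating.
    elem = driver.find_element(By.CSS_SELECTOR, "form input[name=q]")
    elem.send_keys("test\n")

    # Wait for the pagination links on the results page before reading them.
    pager = (By.CSS_SELECTOR, "div[role=navigation] table td a")
    elems = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located(pager)
    )
    elems[3].click()

    print(driver.page_source)
finally:
    # Close the browser even if a step above fails.
    driver.quit()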
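One thing to watch in a script like this: an element lookup that runs right after a page transition can come back empty because the new page has not finished loading. The usual remedy is to poll until a condition holds. Below is a minimal, library-independent sketch of that idea. `wait_until` and its parameters are hypothetical names for illustration and are not part of Selenium, which provides its own `WebDriverWait` for this purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated lookup: returns an empty list twice, then four "links".
    attempts = []

    def find_links():
        attempts.append(1)
        return ["p2", "p3", "p4", "p5"] if len(attempts) >= 3 else []

    links = wait_until(find_links, timeout=2.0, interval=0.01)
    print(links[3], "found after", len(attempts), "attempts")
```

With Selenium, the condition would be something like a lambda that calls `driver.find_elements(...)`, which returns an empty list until matching elements exist.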

Playwright

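import { chromium } from 'playwright';

(async () => {
    let browser;
    try {
        // Launch a visible (non-headless) Chromium window.
        browser = await chromium.launch({
            headless: false,
        });
        const context = await browser.newContext();
        const page = await context.newPage();

        await page.goto("https://www.google.com");

        // Selectors reflect Google's markup when this was written and may need updating.
        const searchBox = page.locator("form input[name=q]");
        await searchBox.fill("test");
        await searchBox.press("Enter");

        // Locator actions wait for the element to appear before acting on it.
        const elems = page.locator("div[role=navigation] table td a");
        await elems.nth(3).click();

        console.log(await page.content());
    } catch (err) {
        console.error(err);
    } finally {
        // Close the browser even if a step above fails.
        if (browser) {
            await browser.close();
        }
    }
})();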

The two programs look much the same.

Differences

Playwright for finer control

WebDriver is a browser-independent standard, so its functionality is generic. Playwright, on the other hand, talks to Chromium over the browser's own protocol, so it can use anything that protocol exposes.

For example, when accessing a site protected by Basic Authentication with Selenium, the ID and password must be embedded in the URL each time.

driver.get("https://[user_name]:[password]@example.com/sub_dir")
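One practical catch with embedding credentials in the URL is that characters such as `@`, `:` or `/` in the ID or password must be percent-encoded, or the URL will be parsed incorrectly. A minimal sketch using only the standard library follows; `with_basic_auth` is a hypothetical helper name, and the sketch assumes a plain host name (it does not handle IPv6 literals).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, username, password):
    """Return `url` with percent-encoded credentials embedded before the host."""
    parts = urlsplit(url)
    # safe="" forces reserved characters like "@", ":" and "/" to be encoded.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(with_basic_auth("https://example.com/sub_dir", "user_name", "p@ss:word/1"))
```

The resulting string can be passed to `driver.get()` in place of a hand-written URL.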

With Playwright, you can embed the credentials in the URL just as with Selenium, but the browser also has a feature for setting the ID and password used for authentication. Using it, you can access the site without embedding the ID and password in the URL.

const context = await browser.newContext({
    httpCredentials: {
        username: 'user_name',
        password: 'password',
    },
});
const page = await context.newPage();
await page.goto("https://example.com/sub_dir");
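Whichever way the credentials are supplied, what reaches the server for Basic Authentication is an `Authorization` header containing the Base64-encoded `username:password` pair. A minimal sketch of that encoding, using only the standard library; `basic_auth_header` is a hypothetical helper name for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print("Authorization:", basic_auth_header("user_name", "password"))
```

Because Base64 is an encoding and not encryption, the credentials are only as safe as the connection they travel over, however they are configured.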

Selenium can also operate smartphones

Selenium only ever talks to a WebDriver, so by pointing it at a WebDriver for a smartphone's browser, you can control that browser from Selenium.

I only tested the two tools for web crawling, but Selenium looks better suited to testing on real devices and to other situations where a wide range of devices and browsers must be operated.

Impressions

Web crawling

Playwright offers finer control, but WebDriver also offers a fair amount, and for most web crawling either Selenium or Playwright will do.

Ease of writing programs

Playwright felt lower-level to me, and because it can do more, programs written with it tended to be more complex and verbose. Its API is also asynchronous, which I found hard to get used to.

Selenium's API, on the other hand, is synchronous, and WebDriver operations are generic, so I found it simpler and easier to write than Playwright.

For example, a file download in Playwright looks like this (excerpt from the Playwright documentation):

const [ download ] = await Promise.all([
    page.waitForEvent('download'),
    page.locator('text=Download file').click(),
]);

const path = await download.path();

The code starts waiting for the download event and clicks the download button at the same time, so the event cannot be missed. Once the event has fired, obtaining the download path waits for the download to finish.

As this shows, Playwright code has to be written with a strong awareness that you are controlling the browser directly, and of which operations are synchronous and which are asynchronous.
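The ordering concern can be illustrated without a browser. The sketch below uses plain `asyncio` and a hypothetical `FakePage` class (a stand-in written for this example, not Playwright's API) whose click fires an event as a side effect. Registering the wait before clicking succeeds; clicking first and waiting afterwards misses the event and times out.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a browser page; not Playwright's API."""

    def __init__(self):
        self._waiters = []

    def wait_for_event(self):
        # Register a future that resolves when the next event fires.
        future = asyncio.get_running_loop().create_future()
        self._waiters.append(future)
        return future

    async def click(self):
        # The event fires as a side effect of the click.
        await asyncio.sleep(0)
        for future in self._waiters:
            if not future.done():
                future.set_result("report.csv")
        self._waiters.clear()


async def wait_then_click(page):
    # Correct order: start waiting first, then click, and await both together.
    waiter = page.wait_for_event()
    filename, _ = await asyncio.gather(waiter, page.click())
    return filename


async def click_then_wait(page, timeout=0.05):
    # Wrong order: the event has already fired before anyone is listening.
    await page.click()
    try:
        return await asyncio.wait_for(page.wait_for_event(), timeout)
    except asyncio.TimeoutError:
        return None


if __name__ == "__main__":
    print(asyncio.run(wait_then_click(FakePage())))
    print(asyncio.run(click_then_wait(FakePage())))
```

This is the same reason the documentation excerpt wraps the wait and the click in a single `Promise.all`.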

Comparison with Puppeteer

Playwright is developed by former Puppeteer developers, and its functions are almost identical to Puppeteer's. If you have been using Puppeteer, you will be able to pick up Playwright easily.

Documentation

The official Selenium documentation is sufficient if you only need a rough idea of how to do something. For more depth, or for detailed explanations of functions and arguments, you have to turn to the API documentation, but it is not very easy to use and is inconvenient because you often end up reading the source.

Playwright, on the other hand, is well documented and I don't think there is anything particularly troubling about it.

Conclusion

For web crawling purposes, I honestly think either Selenium or Playwright is fine. However, I think that Playwright, which touches the browser directly, gives you more options when a site is really hard to crawl.

Playwright has more features and an asynchronous API, so its learning cost is higher.

If you are using Puppeteer or writing programs in JavaScript, I think Playwright would be fine, but if you are a beginner and want to do a quick crawl, I recommend Selenium.

Personally, I have been using Puppeteer and have decided to use Playwright. However, since I have touched Selenium this time, I would like to keep notes on how to use it so that I can use both Selenium and Playwright to the same extent.

The Selenium integration in Microsoft Power Automate that I mentioned at the start is now deprecated, so although I have learned how to use Selenium, it seems unlikely to be useful there, which is a shame.

www.ekwbtblog.com
