I previously tested Selenium and Playwright side by side. After learning how to use them, I wrote a brief usage summary for each so that I can recall it quickly when I need it again.

I originally used Puppeteer for crawling, so having covered Selenium and Playwright, I felt I could not leave Puppeteer out. This article is the matching brief summary of how to use Puppeteer.

That said, after testing Selenium and Playwright, I now mainly use Playwright. This is therefore a short overview of how Puppeteer is used, not an exhaustive reference for everyday use.

Install

npm install --save puppeteer

Installing puppeteer also downloads a Chromium browser (into node_modules in the version I used), and that browser is used by default.
You can use your own browser instead of the one installed by Puppeteer. In that case, set executablePath.
If you do not want a browser to be downloaded, install puppeteer-core instead.

Use your own browser.

npm install --save puppeteer-core

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.launch({executablePath: <browser_path>});
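
As a minimal sketch of switching browsers at launch time, the helper below builds the options object that is passed to puppeteer.launch(). The function name buildLaunchOptions and the environment variable name BROWSER_PATH are hypothetical and chosen only for illustration. executablePath and headless are the launch options that appear in this article.

```javascript
// Hypothetical helper: build the options object for puppeteer.launch().
// BROWSER_PATH is an assumed variable name for this sketch, not one that Puppeteer reads itself.
function buildLaunchOptions(env) {
    const options = { headless: false };
    if (env.BROWSER_PATH) {
        // Use the user's own browser instead of the one downloaded by Puppeteer
        options.executablePath = env.BROWSER_PATH;
    }
    return options;
}

// Usage (inside an async function, after importing puppeteer or puppeteer-core):
//   const browser = await puppeteer.launch(buildLaunchOptions(process.env));
```

When the variable is not set, no executablePath is passed, so the browser downloaded by Puppeteer would be used. With puppeteer-core, which downloads no browser, the path would always have to be supplied.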

Sample

Example: Search for "test" on Google and view the source of the result page.

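import puppeteer from 'puppeteer';

(async () => {
    let browser;
    try {
        // Launch a visible browser and slow each operation down by 100 ms
        browser = await puppeteer.launch({
            headless: false,
            slowMo: 100,
        });

        const page = await browser.newPage();

        await page.goto("https://www.google.com");

        // Match on the name only, because the search box may be
        // an <input> or a <textarea> depending on the page version
        const selector = '[name="q"]';
        await page.waitForSelector(selector);
        await page.type(selector, "test");

        // Start waiting for the navigation before pressing Enter,
        // so the navigation cannot finish before the wait begins
        await Promise.all([
            page.waitForNavigation(),
            page.keyboard.press("Enter"),
        ]);

        // Print the HTML source of the result page
        console.log(await page.content());

    } catch (err) {
        console.error(err);
    } finally {
        // Close the browser so that the Node.js process can exit
        if (browser) await browser.close();
    }
})();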

How Puppeteer works and the operation flow

Browser operation is a repetition of two steps: acquire the target DOM element, then operate on it.
Because elements may be generated dynamically, an element may not exist yet when you try to retrieve it. You therefore need to confirm that the element exists before acquiring it. page.waitForSelector() waits until the element is found.
Once the element is found, it is manipulated by executing JavaScript against it in the browser.

Example

Click a button as follows.

page.$eval() executes the JavaScript function given as its second argument in the browser.
The element matched by the selector in the first argument of page.$eval() is passed to that function as its first argument (dom in the code below).

let selector = 'input[type="button"]';
await page.waitForSelector(selector);
await page.$eval(selector, dom=>{ dom.click() });

The page object also provides ready-made functions for some of the most common JavaScript operations.

Example

The above mouse click can be written as follows using page.click(<selector>).

let selector = 'input[type="button"]';
await page.waitForSelector(selector);
await page.click(selector);

In either case, browser operation is a repetition of the following two steps.

Wait for the DOM element
Execute JavaScript on the DOM element
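
The two steps above can be wrapped in one small helper. This is only a sketch: waitAndEval is a hypothetical name, and page is assumed to be a Puppeteer Page object such as the one returned by browser.newPage().

```javascript
// Hypothetical helper combining the two steps: wait for the element, then run JavaScript on it.
// `page` is assumed to be a Puppeteer Page; `fn` is executed inside the browser.
async function waitAndEval(page, selector, fn, ...args) {
    // Step 1: wait until the element exists
    await page.waitForSelector(selector);
    // Step 2: execute JavaScript on the element and return its result
    return page.$eval(selector, fn, ...args);
}

// Usage (inside an async function):
//   await waitAndEval(page, 'input[type="button"]', dom => { dom.click() });
//   const text = await waitAndEval(page, 'div', dom => dom.innerText);
```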

Getting values from JavaScript

The return value of the function that runs in the browser is passed back to the caller, so values can be retrieved from JavaScript in the page.

Example

The innerText of an element can be obtained as follows.

let selector = 'div';
await page.waitForSelector(selector);
const text = await page.$eval(selector, dom=> dom.innerText);
console.log(text);
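
The returned value does not have to be a single string. As a rough sketch, the function below returns several pieces of information at once as a plain object, on the assumption that the value is serializable so that it can be sent back from the browser. readElementInfo is a hypothetical name, and page is assumed to be a Puppeteer Page object.

```javascript
// Hypothetical helper: return several values from the browser as one plain object.
// `page` is assumed to be a Puppeteer Page.
async function readElementInfo(page, selector) {
    await page.waitForSelector(selector);
    // The arrow function runs in the browser; its return value comes back to Node.js
    return page.$eval(selector, dom => ({
        tag: dom.tagName.toLowerCase(),
        text: dom.innerText.trim(),
    }));
}

// Usage (inside an async function):
//   const info = await readElementInfo(page, 'div');
//   console.log(info.tag, info.text);
```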

Passing to JavaScript

A value passed as the third argument of page.$eval() arrives as the second argument of the function given in the second argument, so values can be passed to JavaScript in the page.

Example

The value of an input can be set as follows.

const userName = 'ABC';
let selector = 'input[name="user"]';
await page.waitForSelector(selector);
await page.$eval(selector, (dom, val)=>{ dom.value = val }, userName);
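
The same pattern extends to several inputs. The sketch below fills inputs from an object that maps each name attribute to a value. fillInputs is a hypothetical name, page is assumed to be a Puppeteer Page object, and the names are assumed not to contain double quotes because they are embedded in the selector as-is.

```javascript
// Hypothetical helper: set the value of several <input> elements.
// `values` maps each input's name attribute to the value to set.
async function fillInputs(page, values) {
    for (const [name, value] of Object.entries(values)) {
        const selector = `input[name="${name}"]`;
        await page.waitForSelector(selector);
        // `value` is handed to the browser-side function as `val`
        await page.$eval(selector, (dom, val) => { dom.value = val }, value);
    }
}

// Usage (inside an async function):
//   await fillInputs(page, { user: 'ABC', mail: 'abc@example.com' });
```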

Multiple elements

The selector may match more than one element.

With page.$eval(), only the first matching element is passed to JavaScript, whereas with page.$$eval(), an array of all matching elements is passed.

Example

Get the text of each option as an array

let selector = 'select option';
await page.waitForSelector(selector);
const res = await page.$$eval(selector, doms=>{
    // Collect the text of every matched <option>
    const optionVals = [];
    for(const dom of doms){
        optionVals.push(dom.innerText);
    }
    return optionVals;
});
console.log(res);
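
Because the function receives the matches as an ordinary array, array methods such as map() can be used on it. The sketch below returns both the value and the text of each option. readOptions is a hypothetical name, and page is assumed to be a Puppeteer Page object.

```javascript
// Hypothetical helper: read value and text from every matching <option>.
// `page` is assumed to be a Puppeteer Page.
async function readOptions(page, selector) {
    await page.waitForSelector(selector);
    // `doms` is an array of all matching elements, so map() can be used
    return page.$$eval(selector, doms =>
        doms.map(dom => ({ value: dom.value, text: dom.innerText })));
}

// Usage (inside an async function):
//   const options = await readOptions(page, 'select option');
//   console.log(options);
```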

Element

Operation

page.click(<selector>)
page.type(<selector>, <value>)
- Text input
page.focus(<selector>)
page.$eval(<selector>,(dom, val)=>{ dom.value = val }, <val>)
- Set value for <input>

To manipulate an element in any other way, use $eval() and operate on it directly with JavaScript.

Properties

page.$eval(<selector>, dom=> dom.getAttribute(<attribute_name>))

To get any other information from an element, use $eval() and read it directly with JavaScript.
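
The function given to $eval() runs in the browser, so it cannot see variables defined on the Node.js side. The attribute name therefore has to be passed in as an argument. A minimal sketch, where getAttr is a hypothetical name and page is assumed to be a Puppeteer Page object:

```javascript
// Hypothetical helper: read one attribute from the first element matching the selector.
// `page` is assumed to be a Puppeteer Page.
async function getAttr(page, selector, attributeName) {
    await page.waitForSelector(selector);
    // attributeName is passed as an argument because the function runs in the browser
    return page.$eval(selector, (dom, name) => dom.getAttribute(name), attributeName);
}

// Usage (inside an async function):
//   const href = await getAttr(page, 'a', 'href');
```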

Navigation

Transition

page.goto(<url>)
page.reload()

Wait for page load to complete.

page.waitForNavigation()
page.waitForNetworkIdle()
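
When an operation such as a click triggers the page transition, the wait should start before the operation. Otherwise the navigation may finish before page.waitForNavigation() is called, and the wait then never completes. The sketch below shows the Promise.all() form of this. clickAndWaitForNavigation is a hypothetical name, and page is assumed to be a Puppeteer Page object.

```javascript
// Hypothetical helper: click an element and wait for the navigation that the click triggers.
// `page` is assumed to be a Puppeteer Page.
async function clickAndWaitForNavigation(page, selector) {
    await page.waitForSelector(selector);
    // waitForNavigation() is called first, so the wait is already active when the click happens
    const [response] = await Promise.all([
        page.waitForNavigation(),
        page.click(selector),
    ]);
    return response;
}

// Usage (inside an async function):
//   await clickAndWaitForNavigation(page, 'a.next');
//   console.log(page.url());
```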

Properties

page.title()
page.url()
page.content()
- Page HTML

Impressions, etc.

I had also planned to summarize other common crawling cases, such as uploading and downloading files or handling new pages opened with target="_blank".

However, rather than providing a dedicated function for each such case, Puppeteer largely leaves it to users to write and implement their own JavaScript.

In Selenium and Playwright, you retrieve an element and manipulate it through its methods. Puppeteer, by contrast, is a library of Chrome DevTools Protocol calls, so you call methods of the page and specify the target element as an argument.

Puppeteer code often ends up clumsy when you try to do something, but this comes from a difference in purpose: Selenium and Playwright are libraries whose purpose is to automate browser work, whereas Puppeteer is a library whose purpose is to communicate with the browser.
