Welcome to new things

[Technical] [Electronic work] [Gadget] [Game] memo writing

Notes on how to use Playwright

I have been experimenting with Playwright recently, so I am writing down how to use it so that I can look it up again when I forget.

install

npm install --save playwright
  • Installing Playwright also installs the browsers Chromium, Firefox, and WebKit. (Depending on the Playwright version, npm install itself may not download the browsers; in that case, run npx playwright install.)
  • By default, Playwright launches the browsers it installed this way.
  • It is also possible to install Playwright without its browsers and use a separately installed browser.
  • However, each Playwright version is closely tied to specific browser versions, so it is better to install the browsers together with Playwright and use those.

Structure

Playwright is organized around the following classes, each nested inside the previous one.

Browser > Context > Page > Locator

Browser

  • A browser such as Chromium, Chrome, Firefox, or Edge
  • Examples of configurable items

    • headless mode
    • Browser Launch Options
    • Slow operation (slowMo)

Context

  • Defines the execution environment
  • Corresponds to a browser window (not a tab)
  • One browser window is associated with one Context
  • Examples of configurable items

    • user agent
    • locale
    • time zone
    • window size

Page

  • Corresponds to a browser tab
  • A popup window opened from JavaScript is also a Page

Locator

  • Represents a DOM element in the HTML

Context.close() Browser.close()

When finished, call .close() on the Context first and then on the Browser.

sample

Example: Search for "test" on Google and view the source of the result page.

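import { chromium } from 'playwright';

(async () => {
    // Launch a visible (non-headless) Chromium window
    const browser = await chromium.launch({
        headless: false,
    });

    try {
        const context = await browser.newContext();
        const page = await context.newPage();

        await page.goto('https://www.google.com/');
        // Type the query; the trailing "\n" submits it like the Enter key
        await page.locator('input[name="q"]').type("test\n");
        // Move to the next page of results
        await page.locator('#pnnext').click();
        await page.waitForURL(/search\?q/);

        // Print the HTML source of the current page
        console.log(await page.content());

        await context.close();
    } catch (err) {
        console.error(err);
    } finally {
        // Close the browser even if an error occurred above
        await browser.close();
    }
})();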

Locator

Browser automation is a repetition of two steps: get the target DOM element, then operate on it.

In Playwright, a DOM element is represented by a Locator, and all attribute reads and operations on the element go through the Locator.

Auto-waiting

  • The DOM element may not exist yet at the moment you try to get it.
  • When you read an attribute or perform an operation through a Locator, the Locator automatically waits until the element exists and then performs the operation.
  • If the element is not found within the timeout, an error occurs.
  • The default timeout is 30 seconds.

In the example below, the Locator waits internally until input[type="button"] is found and clicks it as soon as it is. There is therefore no need to write your own wait.

await page.locator('input[type="button"]').click();
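Conceptually, auto-waiting behaves like a retry loop with a deadline. The sketch below is only a toy model of that idea and not how Playwright is implemented; waitUntilFound, probe, and the polling interval are hypothetical names and values chosen for illustration.

```typescript
// Toy model of auto-waiting: retry a lookup until it succeeds or the deadline passes.
// NOT Playwright's implementation; the names and the polling interval are made up.
async function waitUntilFound<T>(
    probe: () => T | undefined, // returns the element, or undefined if it does not exist yet
    timeoutMs = 30000,          // 30 seconds, the default timeout described above
    intervalMs = 50,            // hypothetical polling interval
): Promise<T> {
    const deadline = Date.now() + timeoutMs;
    for (;;) {
        const found = probe();
        if (found !== undefined) {
            return found;
        }
        if (Date.now() >= deadline) {
            throw new Error(`Timeout ${timeoutMs}ms exceeded while waiting for element`);
        }
        // Sleep briefly before looking again
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
}
```

A caller passes the lookup as a function, for example waitUntilFound(() => findButton(), 1000), where findButton is whatever lookup applies; the call either resolves with the element or rejects with a timeout error, which mirrors the behavior listed above.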

There are several ways to specify elements with a Locator; they are described below.

page.getBy...()

  • A Locator can be obtained with page.locator() or with the page.getBy...() family of functions.
  • The page.getBy...() functions do not allow fine-grained selection, so I do not use them much.

CSS selector and XPath selector

  • CSS selectors and XPath selectors can be used with page.locator(<selector>).
  • Prefix a CSS selector with css= and an XPath selector with xpath=. (Both prefixes can be omitted.)
await page.locator('css=input.test1').click();
await page.locator('input.test1').click();
await page.locator('xpath=//input[@class="test1"]').click();
await page.locator('//input[@class="test1"]').click();
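When the prefix is omitted, the selector engine is inferred from the shape of the selector: a selector starting with // or .. is treated as XPath, and anything else as CSS. The sketch below is a simplified toy model of that rule, not Playwright's real parser (which knows more engines); detectEngine and Engine are hypothetical names.

```typescript
// Toy model of how an omitted prefix is resolved (simplified; not Playwright's parser).
type Engine = 'css' | 'xpath';

function detectEngine(selector: string): { engine: Engine; body: string } {
    // An explicit prefix always wins
    if (selector.startsWith('css=')) {
        return { engine: 'css', body: selector.slice('css='.length) };
    }
    if (selector.startsWith('xpath=')) {
        return { engine: 'xpath', body: selector.slice('xpath='.length) };
    }
    // Without a prefix, a selector starting with "//" or ".." is treated as XPath
    if (selector.startsWith('//') || selector.startsWith('..')) {
        return { engine: 'xpath', body: selector };
    }
    // Everything else is treated as CSS in this simplified model
    return { engine: 'css', body: selector };
}

console.log(detectEngine('input.test1').engine);              // css
console.log(detectEngine('//input[@class="test1"]').engine);  // xpath
console.log(detectEngine('..').engine);                       // xpath
```

The last case is why a bare '..' can be used as a selector for the parent element.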

Playwright's own selector

Playwright also offers its own selection features that CSS selectors lack; combining them with CSS selectors allows more complex selections.

Contains a specific string

  • locator.filter({hasText: <string|RegExp>})

    • Narrows the locator down to elements that contain a specific string.
    • The text can be given as a plain string (case-insensitive substring match) or as a regular expression.
    • The text search includes child nodes and their descendants.

html

<div id="small">abc</div>
<div id="large">ABC</div>

code

// large
console.log(
    await page.locator('div').filter({ hasText: /ABC/ }).getAttribute('id')
);

The text of descendant nodes is included in the search, so in the following case the matching ids are "test1", "test2", and "large".

html

<div id="test1">
    <div id="test2">
        <div id="small">abc</div>
        <div id="large">ABC</div>
    </div>
</div>

code

// 3 (test1, test2, and large all contain "ABC")
console.log(await page.locator('div').filter({ hasText: /ABC/ }).count());
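Why the ancestors also match can be illustrated with a toy model in which the text of a node includes the text of all its descendants. This is only a sketch of the idea and uses neither the real DOM nor the Playwright API; ToyNode, fullText, and idsWithText are hypothetical names.

```typescript
// Toy model of an element tree (NOT the real DOM or the Playwright API).
interface ToyNode {
    id: string;
    text?: string;        // text directly inside this node
    children?: ToyNode[];
}

// Text of a node including all of its descendants.
function fullText(node: ToyNode): string {
    return (node.text ?? '') + (node.children ?? []).map(fullText).join('');
}

// Collect the ids of every node whose full text matches the pattern.
function idsWithText(root: ToyNode, pattern: RegExp): string[] {
    const self = pattern.test(fullText(root)) ? [root.id] : [];
    return self.concat(...(root.children ?? []).map((child) => idsWithText(child, pattern)));
}

// Same structure as the nested HTML above
const tree: ToyNode = {
    id: 'test1',
    children: [{
        id: 'test2',
        children: [
            { id: 'small', text: 'abc' },
            { id: 'large', text: 'ABC' },
        ],
    }],
};

// [ 'test1', 'test2', 'large' ]
console.log(idsWithText(tree, /ABC/));
```

Because the outer elements contain the inner text, narrowing by text alone often matches more than one element; combining it with a more specific selector keeps the match unique.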

Tracing back to a parent element

  • locator.locator('..')

    • Selects the parent element of the locator.
    • Can be chained to move further up.

html

<div id="a">
    <div id="b">
        <div id="c">abc</div>
    </div>
</div>

code

// c
console.log(
    await page.locator('#c').getAttribute('id')
);

// b
console.log(
    await page.locator('#c').locator('..').getAttribute('id')
);

// a
console.log(
    await page.locator('#c').locator('..').locator('..').getAttribute('id')
);

Multiple elements

  • A selector may match multiple elements.
  • locator.all() returns an array of Locators, one for each matching element.
for(const l of await page.locator('input').all()){
    console.log(await l.getAttribute('id'));
}

Locator (Operation)

  • locator.click()

    • Click the element
  • locator.fill(<string>)

    • Set the value of an input to the specified string
  • locator.type(<string>)

    • Type a string
    • Emulates keyboard input (deprecated in recent Playwright versions in favor of locator.pressSequentially())
  • locator.clear()

    • Clear the value of an input
  • locator.press(<key>)

    • Press a single key, such as Enter
  • locator.focus()

    • Move focus to the element
  • locator.scrollIntoViewIfNeeded()

    • Scroll until the element is visible

Locator (Attribute)

  • locator.getAttribute(<attribute_name>)

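    • Get the value of an attribute
    • Examples of attribute names

      • id
      • name
      • value (this is the HTML attribute; the current value of an input is available through locator.inputValue())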

Locator (Other)

  • locator.isChecked()

    • Determine check status of check boxes/radio buttons
  • locator.setChecked(true|false)

    • Set check status of check boxes/radio buttons
  • locator.setInputFiles([<file>,...])

    • Set the files of an <input type="file" />
    • For multiple file selection, pass multiple files in an array
  • locator.selectOption([<val>,...])

    • Select options of a select element
    • Specify either the value or the label
    • For multiple selection, pass multiple values in an array
  • locator.waitFor()

    • Wait until the element appears.
    • Use it when you do not need to read attributes or operate on the element but only want to wait for it to be displayed.
    • Useful for waiting for a page load to complete, for example.

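Getting the selected options of a select

Playwright has no dedicated method for listing the selected options, so get them with JavaScript evaluated in the page. (For a single-selection select, locator.inputValue() also returns the selected value.)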

select (single selection)

Obtained using select.selectedIndex

html

<select>
  <option value="v1">v1</option>
  <option value="v2" selected>v2</option>
  <option value="v3">v3</option>
</select>

code

const selectedOption = await page.locator('select')
    .evaluate((node: HTMLSelectElement) => {
        return node.options[node.selectedIndex].value;
    });

// v2
console.log(selectedOption);

select (multiple selection)

Obtained using option.selected

html

<select multiple>
  <option value="v1" selected>v1</option>
  <option value="v2">v2</option>
  <option value="v3" selected>v3</option>
</select>

code

const selectedOptions = await page.locator('select')
    .evaluate((node: HTMLSelectElement) => {
        const selectedValues: Array<string> = [];
        for (const option of node.options) {
            if (option.selected) {
                selectedValues.push(option.value);
            }
        }
        return selectedValues;
    });

// [ 'v1', 'v3' ]
console.log(selectedOptions);

navigation

  • By default, navigation waits until the load event fires.
  • Passing the waitUntil option changes this, for example to wait until networkidle.

Page transitions

  • page.goto(<URL>)

    • It automatically waits until loading is complete.
  • page.reload()

    • It automatically waits until loading is complete.
  • page.waitForTimeout()

    • Wait for specified time (ms)

Waiting for page load to complete

  • page.waitForLoadState()

    • Wait for the current page to finish loading.
  • page.waitForURL(<url|glob_string|RegExp>)

    • Wait until the page has navigated to the specified URL and finished loading.
    • A URL string, a glob pattern, or a regular expression can be used.
  • page.waitForSelector(<selector>)

    • Wait for the specified element to appear.
    • If you are going to use that element, operate on it through a locator directly; the locator waits automatically, so this call is unnecessary.
  • locator.waitFor()

    • Wait for the specified element to appear.
    • Does the same thing as page.waitForSelector(<selector>).
    • If you are going to use that element, operate on it through a locator directly; the locator waits automatically, so this call is unnecessary.

In the following case, the locator for the button automatically waits until the button is displayed, so there is no need to check that the page has finished loading after the link is clicked.

await page.locator('a').click();
await page.locator('button').click();

In the following case, where the page content is read right after the click, you must wait for the page load to complete. Any of the following ways of waiting is acceptable.

  • page.waitForLoadState()
  • page.waitForURL(<url>)
  • page.waitForSelector(<selector>)
  • page.locator(<selector>).waitFor()
await page.locator('a').click();

// wait for the load to finish
await page.waitForLoadState();

// page.content() returns a Promise, so await it
console.log(await page.content());

Properties

  • page.title()
  • page.url()
  • page.content()

    • Page html

frame

  • An iframe is selected with a selector, like other tags.
  • Use page.frameLocator(<selector>) instead of locator to get an iframe.
  • The returned frame locator behaves like a page: elements inside the iframe are accessed from it with locator.

Example: Click button in iframe

const iframe = page.frameLocator('iframe');
await iframe.locator('button').click();

Tab

  • Tabs and page are the same thing
  • When a new tab is created, a new page is created

How to get a new tab

This is how to get the page of a tab newly created by <a target="_blank"> and the like.

  • context.waitForEvent('page') resolves to the newly created page object.
  • So that waiting for the event does not block the action that creates the tab, start the wait as a Promise first and await it after the action.
const pagePromise = context.waitForEvent('page');
await page.locator('a').click();
const newPage = await pagePromise;
await newPage.locator('button').click();
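The ordering matters: the wait has to start before the action, and it must not be awaited until after the action. The same pattern comes up for other events as well, so it can be wrapped in a small helper. The sketch below is one possible way to do that; triggerAndWait, startWaiting, and action are hypothetical names, and the helper has no dependency on Playwright itself.

```typescript
// Hypothetical helper: start listening first, then run the action, then await the event.
async function triggerAndWait<T>(
    startWaiting: () => Promise<T>, // e.g. () => context.waitForEvent('page')
    action: () => Promise<void>,    // e.g. () => page.locator('a').click()
): Promise<T> {
    const eventPromise = startWaiting(); // begin waiting, but do NOT await yet
    try {
        await action();                  // the action that fires the event
    } catch (err) {
        // The action failed, so the event will likely never arrive;
        // swallow the pending wait's eventual rejection and report the action error.
        eventPromise.catch(() => undefined);
        throw err;
    }
    return eventPromise;                 // now wait for the event itself
}

// Usage sketch (assumes `context` and `page` from the code above):
// const newPage = await triggerAndWait(
//     () => context.waitForEvent('page'),
//     () => page.locator('a').click(),
// );
```

Awaiting the wait before running the action would never finish, because the event only fires as a result of the action.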

Browser and Context Settings

  • Slow down operations

    • Browser

      • slowMo
  • Headless mode

    • Browser

      • headless
  • Regional and language settings

    • Context

      • locale
      • timezoneId
const browser = await chromium.launch({
    headless: false,
    slowMo: 100,
});

const context = await browser.newContext({
    locale: 'ja',
    timezoneId: 'Asia/Tokyo',
});

Browser state saving

Saving cookies and localStorage

The browser state can be preserved by saving cookies and localStorage.

  • Save the cookie and localStorage information to a file with context.storageState({ path: <file_path> }).
  • Restore the saved cookie and localStorage information by passing the file to browser.newContext({ storageState: <file_path> }) when opening the window.
  • The state must be saved before context.close().
// Restore the saved state when creating the context
const context = await browser.newContext({
    storageState: './state.json',
});

//////////

// Save the state (before calling context.close())
await context.storageState({ path: './state.json' });

Browser user data storage

The browser state can also be preserved by specifying where the browser stores its user data.

  • Specify the user data directory with BrowserType.launchPersistentContext().
  • BrowserType.launchPersistentContext() creates the Browser and the Context at the same time, so both the Browser options and the Context options are set here.
const saveDir = './save/browser';
const context = await chromium.launchPersistentContext(
    saveDir,
    {
        headless: false,
        locale: 'ja',
        timezoneId: 'Asia/Tokyo',
        slowMo: 100,
    }
);

Smartphone emulation

  • Smartphone presets are defined in playwright.devices
  • For iPhone, it is better to use webkit.
  • If a TLS certificate error occurs, set ignoreHTTPSErrors
import { webkit, devices } from 'playwright';

const browser = await webkit.launch();
const context = await browser.newContext({
    ...devices['iPhone 12'],
    ignoreHTTPSErrors: true,
});

File Download

Specifications/Procedures

  • Downloaded files are stored in a temporary location
  • The temporary files are deleted when the context is closed
  • Information about a download is held in a Download object
  • page.waitForEvent('download') resolves to a Download object.
  • So that waiting for the event does not block the action that starts the download, start the wait as a Promise first and await it after the action.
  • await download.path() resolves once the download has completed
  • The file name is available from download.suggestedFilename().
  • Use download.saveAs(<path>) to copy the file from the temporary location to another location.
const downloadPromise = page.waitForEvent('download');
await page.locator('button#btn_save').click();
const download = await downloadPromise;
await download.path();
const fileName = download.suggestedFilename();
await download.saveAs(`./download/${fileName}`);

file upload

form

  • Call locator.setInputFiles([<path>]) on the <input type="file" /> with the absolute path of the file to be uploaded
await page.locator('input[type="file"]').setInputFiles([<path>]);
await page.locator('input[type="submit"]').click();

File Selection Dialog

This is the pattern where clicking an element opens a file selection dialog, in which you specify the file to upload.

Specifications/Procedures

  • The FileChooser object is used to set the file path in the file selection dialog and press OK
  • page.waitForEvent('filechooser') resolves to a FileChooser object.
  • So that waiting for the event does not block the action that opens the dialog, start the wait as a Promise first and await it after the action.
  • FileChooser.setFiles([<path>]) specifies the file path and presses the OK button.
const fileChooserPromise = page.waitForEvent('filechooser');
await page.locator('button#btn_save').click();
const fileChooser = await fileChooserPromise;
await fileChooser.setFiles([<path>]);

How the exchange works between Playwright and the browser

To execute JavaScript in the browser from Playwright, you need to know a little about how Playwright and the browser interact, so I will explain it briefly.

This mechanism is roughly the same not only in Playwright but also in puppeteer and Selenium.

Browser objects cannot be brought to Playwright

From Playwright you can get an <input> or <button> in the browser, read and set its attributes, and click it.

const input = page.locator('input');
await input.fill('TEST');

const button = page.locator('button');
await button.click();

At first glance it looks as if the browser object itself were brought into Playwright, but that is impossible; Playwright merely obtains reference information from the browser that identifies the object.

Then, when you want to operate on the object from Playwright, Playwright sends the browser the identifying information and the operation to perform, the browser executes it, and Playwright receives the result.
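The idea of holding only reference information can be sketched with a toy model in which the browser side keeps the real objects and the client only ever holds an id. This is an illustration of the concept, not the real protocol between Playwright and the browser; ToyBrowserSide, createInput, and execute are hypothetical names.

```typescript
// Toy model of "reference information" (NOT the real Playwright protocol).
// The browser side keeps the real objects; the client side only holds an id.
class ToyBrowserSide {
    private objects = new Map<number, { value: string }>();
    private nextId = 1;

    // Create an object in the "browser" and hand back only its id.
    createInput(): number {
        const id = this.nextId++;
        this.objects.set(id, { value: '' });
        return id;
    }

    // Execute an operation on the object that the id refers to and return the result.
    execute(id: number, op: { type: 'fill'; text: string } | { type: 'read' }): string {
        const target = this.objects.get(id);
        if (target === undefined) {
            throw new Error(`Unknown reference: ${id}`);
        }
        if (op.type === 'fill') {
            target.value = op.text;
        }
        return target.value;
    }
}

const browserSide = new ToyBrowserSide();
const inputRef = browserSide.createInput(); // the client receives just a number

// The client sends "reference + operation"; the object never leaves the browser side
browserSide.execute(inputRef, { type: 'fill', text: 'TEST' });

// TEST
console.log(browserSide.execute(inputRef, { type: 'read' }));
```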

Locator & JSHandle

Playwright and puppeteer call this identifying information a JSHandle.

Playwright additionally provides Locator, a higher-level and easier-to-use way of referring to elements, and Locator is normally used instead of JSHandle.

Situations for using JSHandle in Playwright

When executing JavaScript in the browser from Playwright, objects are exchanged between Playwright and the browser using JSHandle, not Locator.

The details are explained in the JavaScript section below.

JavaScript

You can run JavaScript in your browser from Playwright.

Sending and Receiving Values

  • Can pass values from Playwright to JavaScript
  • Can return values from JavaScript to Playwright

Sending and Receiving DOM Elements

Since browser objects cannot be passed directly, DOM elements are passed as JSHandle.

page.evaluate()

  • Executes JavaScript in the page
  • The value is returned with return
  • Return values can be scalars, arrays, or JSON-like objects.
  • Arguments can be passed to the JavaScript
  • Arguments can be scalars, arrays, JSON-like objects, and JSHandle.
  • A JSHandle passed as an argument is restored to the JavaScript object it refers to in the browser
//////////
// return json
const res1 = await page.evaluate(() => {
    const obj = {
        val1: 123,
        val2: 'abc'
    };
    return obj;
});

// { val1: 123, val2: 'abc' }
console.log(res1);

//////////
// pass json
const res2 = await page.evaluate((arg) => {
    const obj = {
        val1: arg.num,
        val2: arg.str
    };
    return obj;
}, { num: 123, str: 'abc' });

// { val1: 123, val2: 'abc' }
console.log(res2);

locator.evaluate()

  • Almost the same as page.evaluate().
  • The first argument of the function is the DOM element selected by the locator
// set input value 'test'
await page.locator('input').evaluate((dom: HTMLInputElement, arg) => {
    dom.value = arg;
}, 'test');

page.evaluateHandle()

  • Execute JavaScript
  • Return value with return
  • Return value is JSHandle only
  • Values other than JSHandle cannot be returned.
  • Arguments can be passed to JavaScript
  • Arguments can be scalars, arrays, JSON, and JSHandle
  • The JSHandle passed as an argument is restored to the JavaScript object in the browser

Example

Get an input from the browser and set its value to "abc".

const res = await page.evaluateHandle(() => {
    return document.querySelector('input');
});

await page.evaluateHandle((arg) => {
    arg.dom.value = arg.val;
}, { dom: res, val: 'abc' });
  • document.querySelector('input') refers to the input in the browser
  • res is a JSHandle that refers to that input
  • Passing res to evaluateHandle() restores the JSHandle to the input object in the browser that it refers to

    • In other words, arg.dom is not a JSHandle but the element returned by document.querySelector('input').

locator.evaluateHandle()

  • Almost the same as page.evaluateHandle()
  • The first argument of the function is the DOM element selected by the locator

debug

Console output

  • Forward the browser's console output to Playwright
page.on('console', msg => {
    console.log(msg.text());
});

Pausing

  • Put await page.pause() where you want to pause

    • Execution pauses and the Inspector window appears.
    • The browser can still be operated while paused.

Finding the selector for an element

  • Launch the Inspector window with await page.pause()
  • Click Pick locator
  • In the browser page, click the element whose selector you want to know
  • A selector expression that selects that element is displayed.

Finding which element a selector matches

  • Launch the Inspector window with await page.pause()
  • Enter the selector expression in the text box next to Pick locator

    • locator('input') etc.
  • The matching element is highlighted.

Automatic code generation from operations

  • Launch Inspector window with await page.pause()
  • Click Record
  • (Perform the operations you want to turn into code)
  • Click Record again to stop recording
  • The operations are converted to code and displayed in the Inspector window

Impressions, etc.

It would be handy to be able to put together a crawler whenever an idea strikes, but looking up how to use the tool each time takes effort, and that puts me off building one.

So I started writing this thinking that a summary of the minimum necessary features would help, but it turned out to be long.

I wrote at length about JavaScript, but since JSHandle rarely needs to be handled directly, an explanation of evaluate() alone might have been enough.

Also, await page.pause() is very useful.

www.ekwbtblog.com
