I have been experimenting with Playwright recently, so I am writing down how to use it as a reference for when I forget.
install
npm install --save playwright
- When Playwright is installed, the browsers Chromium, Firefox, and WebKit are installed at the same time.
- By default, Playwright launches the browsers that were installed along with it.
- It is possible to install only Playwright without the browsers and use a separately installed browser.
- However, since the Playwright version and the browser versions are closely tied, it is better to install them together and use the bundled browsers.
Structure
Playwright consists of the following classes
Browser
> Context
> Page
> Locator
Browser
- Browsers such as Chromium, Chrome, Firefox, Edge, etc.
Examples of configurable items
- headless mode
- browser launch options
- slow operation (slowMo)
Context
- Represents the execution environment
- Corresponds to a browser window (not a tab)
- One browser window is associated with one Context
Examples of configurable items
- user agent
- locale
- time zone
- window size
Page
- A so-called browser tab
- A JavaScript popup also becomes a Page
Locator
- Each DOM element in the HTML
Context.close() Browser.close()
When finished, call .close() on the Context and then on the Browser, in that order.
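The shutdown order above can be sketched as a small helper. This is only an illustration: Closable, closeInOrder, and the fake objects are hypothetical names I made up so that the snippet runs without a browser. Any object with an async close() method, such as a Playwright Context or Browser, would fit the same shape.

```typescript
// Hypothetical minimal shape: anything with an async close() method.
interface Closable {
  close(): Promise<void>;
}

// Closes each resource in the order given, continuing even if one close fails.
async function closeInOrder(...resources: Closable[]): Promise<void> {
  for (const resource of resources) {
    try {
      await resource.close();
    } catch (err) {
      console.error('close failed:', err);
    }
  }
}

// Stand-ins for a real context and browser, used only to show the call order.
const closeLog: string[] = [];
const fakeContext: Closable = { close: async () => { closeLog.push('context'); } };
const fakeBrowser: Closable = { close: async () => { closeLog.push('browser'); } };

const shutdownDemo = closeInOrder(fakeContext, fakeBrowser).then(() => {
  console.log(closeLog.join(' -> ')); // context -> browser
});
```

With real Playwright objects, I would expect a call like this to sit in a finally block so that the cleanup also runs when an error is thrown partway through.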
sample
Example: Search for "test" on Google and view the source of the result page.
import { chromium } from 'playwright';

(async () => {
  try {
    const browser = await chromium.launch({
      headless: false,
    });
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.goto('https://www.google.com/');
    await page.locator('input[name="q"]').type("test\n");
    await page.locator('#pnnext').click();
    await page.waitForURL(/search\?q/);
    console.log(await page.content());
    await context.close();
    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
Locator
Browser automation consists of acquiring the target DOM element and operating on it, over and over.
In Playwright, a DOM element is represented by a Locator, and all attribute reads and operations on the element go through the Locator.
Auto-waiting
- The DOM element may not have been created yet when the Locator is obtained.
- When a Locator is used to read attributes or perform operations on the element, it automatically waits until the element is created and then performs the operation.
- If the element is not found within the configured time, an error occurs.
- The timeout is 30 seconds by default.
In the example below, the Locator automatically waits until input[type="button"] is found and clicks it once it is. Therefore, there is no need to write an explicit wait.
await page.locator('input[type="button"]').click();
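Conceptually, I think of auto-waiting as something like the polling loop below. This is only a sketch of the idea, not Playwright's actual implementation: waitUntilFound, lookup, and the polling interval are hypothetical, and the "element" is just a string that appears after a delay.

```typescript
// Repeatedly calls lookup() until it returns a value or the timeout elapses.
async function waitUntilFound<T>(
  lookup: () => T | undefined,
  timeoutMs = 30000, // mirrors the 30 second default mentioned above
  intervalMs = 50,   // assumed polling interval, for illustration only
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = lookup();
    if (found !== undefined) return found;
    if (Date.now() >= deadline) {
      throw new Error(`not found within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulate an element that is only "created" after 120 ms.
let fakeButton: string | undefined;
setTimeout(() => { fakeButton = '<input type="button">'; }, 120);

const waitDemo = waitUntilFound(() => fakeButton, 1000).then((el) => {
  console.log('found:', el);
  return el;
});
```

The caller never writes the wait itself: it asks for the element and either gets it once it exists or gets an error after the timeout, which is the behaviour the bullets above describe.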
There are multiple ways to specify elements with a Locator.
They are described below.
page.getBy...()
- page provides several page.getBy...() functions that return a Locator.
- They are not very flexible, so I do not use them actively.
CSS selector and XPath selector
- CSS selectors and XPath selectors can be used with page.locator(<selector>).
- Prefix a CSS selector with css= and an XPath selector with xpath= (both prefixes can be omitted).
await page.locator('css=input.test1').click();
await page.locator('input.test1').click();
await page.locator('xpath=//input[@class="test1"]').click();
await page.locator('//input[@class="test1"]').click();
Playwright's own selector
Playwright has its own selectors, not found in CSS, that can be combined with CSS selectors to make more complex selections.
Contains a specific string
locator.filter({hasText: <RegExp>})
- Narrows the locator down to elements that contain a specific string.
- Regular expressions can be used to specify the string.
- The string search extends to child nodes and below.
html
<div id="small">abc</div>
<div id="large">ABC</div>
code
// large
console.log(
  await page.locator('div').filter({ hasText: /ABC/ }).getAttribute('id')
);
The text of descendant nodes is also searched, so in the following case the matching ids are "test1", "test2", and "large".
html
<div id="test1">
  <div id="test2">
    <div id="small">abc</div>
    <div id="large">ABC</div>
  </div>
</div>
code
// test1, test2, large
for (const l of await page.locator('div').filter({ hasText: /ABC/ }).all()) {
  console.log(await l.getAttribute('id'));
}
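To see why a pattern such as /ABC/ also matches the ancestor divs in the nested html above, here is a toy model of "text including descendants". It does not use Playwright at all: ToyNode, textOf, and idsWithText are hypothetical names, and the tree is a hand-written copy of the nested html.

```typescript
// Toy stand-in for a DOM node (hypothetical type, not a Playwright API).
interface ToyNode {
  id: string;
  text?: string;
  children?: ToyNode[];
}

// Text of a node including all of its descendants.
function textOf(node: ToyNode): string {
  const own = node.text ?? '';
  const nested = (node.children ?? []).map(textOf).join('');
  return own + nested;
}

// Collects the ids of every node whose combined text matches the pattern.
function idsWithText(node: ToyNode, pattern: RegExp): string[] {
  const hits: string[] = pattern.test(textOf(node)) ? [node.id] : [];
  for (const child of node.children ?? []) {
    hits.push(...idsWithText(child, pattern));
  }
  return hits;
}

const toyTree: ToyNode = {
  id: 'test1',
  children: [{
    id: 'test2',
    children: [
      { id: 'small', text: 'abc' },
      { id: 'large', text: 'ABC' },
    ],
  }],
};

const matchedIds = idsWithText(toyTree, /ABC/);
console.log(matchedIds); // [ 'test1', 'test2', 'large' ]
```

Because the text of test1 and test2 is the concatenation of their children's text, both contain "ABC", while small contains only the lowercase "abc".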
Tracing back to a parent element
locator.locator('..')
- Gets the DOM element one level above the locator.
- Can be chained to trace further upward.
html
<div id="a">
  <div id="b">
    <div id="c">abc</div>
  </div>
</div>
code
// c
console.log(
  await page.locator('#c').getAttribute('id')
);
// b
console.log(
  await page.locator('#c').locator('..').getAttribute('id')
);
// a
console.log(
  await page.locator('#c').locator('..').locator('..').getAttribute('id')
);
Multiple elements
- A selector may match multiple elements.
- locator.all() returns an array of the matching locators.
for (const l of await page.locator('input').all()) {
  console.log(await l.getAttribute('id'));
}
Locator (Operation)
locator.click()
- Click
locator.fill(<string>)
- Set the value of an input to the specified string
locator.type(<string>)
- String input
- Emulates keyboard input
- Deprecated in newer Playwright versions in favor of locator.pressSequentially()
locator.clear()
- Clear the value of an input
locator.press(<key>)
- Move focus to the element and press and release the specified key.
- Key Definition List
locator.focus()
- Move focus to the element
locator.scrollIntoViewIfNeeded()
- Scroll until the element is visible.
Locator (Attribute)
locator.getAttribute(<attribute_name>)
- Get an attribute value
Example attribute names
- id
- name
- value
Locator (Other)
locator.isChecked()
- Get the checked state of a checkbox or radio button
locator.setChecked(true|false)
- Set the checked state of a checkbox or radio button
locator.setInputFiles([<file>,...])
- Set the files of an <input type="file" />
- To select multiple files, pass them in an array
locator.selectOption([<val>,...])
- Select an option of a select
- Specify by value or label
- To select multiple options, pass multiple values in an array
locator.waitFor()
- Wait until the element appears.
- Used when you do not need to read attributes or operate on the element but only want to wait for it to be displayed.
- Used for waiting for page load completion, etc.
Getting the selected options of a select
Here the selection state is read with JavaScript rather than with a dedicated Playwright method. (For a single-selection select, locator.inputValue() also returns the selected value.)
select (single selection)
Obtained using select.selectedIndex
html
<select>
  <option value="v1">v1</option>
  <option value="v2" selected>v2</option>
  <option value="v3">v3</option>
</select>
code
const selectedOption = await page.locator('select')
  .evaluate((node: HTMLSelectElement) => {
    return node.options[node.selectedIndex].value;
  });
// v2
console.log(selectedOption);
select (multiple selection)
Obtained using option.selected
html
<select multiple>
  <option value="v1" selected>v1</option>
  <option value="v2">v2</option>
  <option value="v3" selected>v3</option>
</select>
code
const selectedOptions = await page.locator('select')
  .evaluate((node: HTMLSelectElement) => {
    const selectedValues: Array<string> = [];
    for (const option of node.options) {
      if (option.selected) {
        selectedValues.push(option.value);
      }
    }
    return selectedValues;
  });
// [ 'v1', 'v3' ]
console.log(selectedOptions);
navigation
- By default, navigation waits until the load event occurs.
- This can be changed through an argument, for example to wait until networkidle is reached.
transition
page.goto(<URL>)
- It automatically waits until loading is complete.
page.reload()
- It automatically waits until loading is complete.
page.waitForTimeout()
- Wait for specified time (ms)
Wait for page load to complete.
page.waitForLoadState()
- Wait for the current page to complete loading.
page.waitForURL(<url|glob_string|RegExp>)
- Wait for completion of loading at the specified URL
- URL, glob format, and regular expressions can be used.
page.waitForSelector(<selector>)
- Wait for the specified element to appear.
- If you are going to use that element, a locator waits automatically when used directly, so this call is unnecessary.
locator.waitFor()
- Wait for the specified element to appear.
- Does the same thing as page.waitForSelector(<selector>).
- If you are going to use that element, a locator waits automatically when used directly, so this call is unnecessary.
Example: Click on a link to go to a new URL, and then click on a button at the new URL.
The locator for the button click automatically waits until the button is displayed, so there is no need to check for page load completion.
await page.locator('a').click();
await page.locator('button').click();
Example: Click on a link to go to a new URL and display the page source of the page to which it transitions.
Must wait for page load to complete. Any of the following methods of waiting are acceptable.
page.waitForLoadState()
page.waitForURL(<url>)
page.waitForSelector(<selector>)
page.locator(<selector>).waitFor()
await page.locator('a').click();
// wait load finish
await page.waitForLoadState();
console.log(await page.content());
Properties
page.title()
page.url()
page.content()
- Page html
frame
- An iframe is specified with a selector like any other tag.
- Use page.frameLocator(<selector>) instead of locator to get an iframe.
- The retrieved iframe behaves like a page: elements inside the iframe can be accessed from it using locator.
Example: Click a button in an iframe
const iframe = page.frameLocator('iframe');
await iframe.locator('button').click();
Tab
- A tab and a page are the same thing.
- When a new tab is created, a new page is created.
How to get a new tab
How to get the page of a newly created tab when it is opened by <a target="_blank">, etc.
- The value resolved from waiting for the page event of the context is the newly created page object.
- To prevent the wait from blocking the action that creates the new tab, start it as a Promise first and await it after the action.
const pagePromise = context.waitForEvent('page');
await page.locator('a').click();
const newPage = await pagePromise;
await newPage.locator('button').click();
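The "create the Promise first, await it after the action" order matters because the event fires while the action runs. Below is a minimal sketch of that pattern using Node's EventEmitter in place of a real context. fakeContextEvents, clickLink, and openTab are hypothetical stand-ins, not Playwright APIs.

```typescript
import { EventEmitter, once } from 'node:events';

// Stand-in for a browser context that emits 'page' when a tab opens.
const fakeContextEvents = new EventEmitter();

// Stand-in for clicking a link that opens a new tab:
// the event fires while the click is being processed.
async function clickLink(): Promise<void> {
  fakeContextEvents.emit('page', 'new-tab');
}

async function openTab(): Promise<string> {
  // 1. Start waiting first (no await yet), so the listener is registered.
  const pagePromise = once(fakeContextEvents, 'page');
  // 2. Perform the action that triggers the event.
  await clickLink();
  // 3. Only now await the result.
  const [newPage] = await pagePromise;
  return newPage as string;
}

const tabDemo = openTab().then((p) => {
  console.log(p); // new-tab
  return p;
});
```

If the listener were registered only after clickLink() had finished, the event would already have fired and the wait would never resolve. The same ordering applies to the download and filechooser events later in this article.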
Browser and Context Settings
Slowing down operations
Browser
slowMo
Headless mode
Browser
headless
Regional and language settings
Context
locale
timezoneId
const browser = await chromium.launch({
  headless: false,
  slowMo: 100,
});
const context = await browser.newContext({
  locale: 'ja',
  timezoneId: 'Asia/Tokyo',
});
Browser state saving
Saving cookies and localStorage
Cookies and localStorage can be saved to preserve the browser state.
- Save the cookie and localStorage information to a file with context.storageState({ path: <file_path> }).
- Restore the cookie and localStorage information stored in the file and open a window with browser.newContext({ storageState: <file_path> }).
- The state must be saved before context.close().
const context = await browser.newContext({
  storageState: './state.json',
});
//////////
await context.storageState({ path: './state.json' });
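One practical wrinkle: on the very first run the state file does not exist yet, and as far as I know newContext() fails when storageState points at a missing file. A small guard like the one below works around that. storageStateIfExists is a hypothetical helper of my own, and the demo uses a temporary directory with placeholder file content instead of a real browser run.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Returns the path if the state file exists, otherwise undefined,
// so the result can be used as the storageState option on the first run too.
function storageStateIfExists(filePath: string): string | undefined {
  return fs.existsSync(filePath) ? filePath : undefined;
}

// Demonstration with a temporary directory instead of a real browser run.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pw-state-'));
const stateFile = path.join(demoDir, 'state.json');

console.log(storageStateIfExists(stateFile)); // undefined (no file yet)

// Placeholder content standing in for what context.storageState() would write.
fs.writeFileSync(stateFile, JSON.stringify({ cookies: [], origins: [] }));
console.log(storageStateIfExists(stateFile) === stateFile); // true
```

The intended usage would be browser.newContext({ storageState: storageStateIfExists('./state.json') }), which starts with a fresh state when nothing has been saved yet.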
Browser user data storage
The user data storage location of the browser can be specified to save the browser status.
- Specify the directory for user data storage with BrowserType.launchPersistentContext().
- Since BrowserType.launchPersistentContext() creates the Browser and Context at the same time, both the Browser and Context options are set here.
const saveDir = './save/browser';
const context = await chromium.launchPersistentContext(
  saveDir,
  {
    headless: false,
    locale: 'ja',
    timezoneId: 'Asia/Tokyo',
    slowMo: 100,
  }
);
Smartphone emulation
- Smartphone templates are defined in playwright.devices.
- For iPhone, it is better to use webkit.
- If a TLS certificate error occurs, set ignoreHTTPSErrors.
import { webkit, devices } from 'playwright';

const browser = await webkit.launch();
const context = await browser.newContext({
  ...devices['iPhone 12'],
  ignoreHTTPSErrors: true,
});
File Download
Specifications/Procedures
- Files are stored in a temporary location.
- The temporary files are deleted when the context is closed.
- Information about the downloaded file is held in a Download object.
- The value resolved from waiting for the download event of the page is a Download object.
- To prevent the wait from blocking the download action, start it as a Promise first and await it after the action.
- The download is complete when await download.path() returns.
- The file name is available from download.suggestedFilename().
- Use download.saveAs(<path>) to copy the file from the temporary location to another location.
const downloadPromise = page.waitForEvent('download');
await page.locator('button#btn_save').click();
const download = await downloadPromise;
await download.path();
const fileName = download.suggestedFilename();
await download.saveAs(`./download/${fileName}`);
File upload
form
- Pass the absolute path of the file to upload to locator.setInputFiles([<path>]) on the <input type="file" />.
await page.locator('input[type="file"]').setInputFiles([<path>]);
await page.locator('input[type="submit"]').click();
File Selection Dialog
This is the pattern where clicking a button brings up a file selection dialog, and you specify the file there to upload it.
Specifications/Procedures
- The FileChooser object sets the file path in the file selection dialog and presses OK.
- The value resolved from waiting for the filechooser event of the page is a FileChooser object.
- To prevent the wait from blocking the action that opens the dialog, start it as a Promise first and await it after the action.
- FileChooser.setFiles([<path>]) specifies the file path and presses the OK button.
const fileChooserPromise = page.waitForEvent('filechooser');
await page.locator('button#btn_save').click();
const fileChooser = await fileChooserPromise;
await fileChooser.setFiles([<path>]);
How the exchange works between Playwright and the browser
In order to execute JavaScript in the browser from Playwright, it helps to know a little about how Playwright and the browser interact, so I will explain briefly.
Puppeteer and Selenium work in roughly the same way as Playwright.
Browser objects cannot be brought to Playwright.
You can get an <input> or <button> in the browser, read and set its attributes, and click on it.
const input = page.locator('input');
await input.fill('TEST');
const button = page.locator('button');
await button.click();
At first glance, it looks as if the browser object itself is brought over to Playwright, but that is impossible; Playwright merely obtains reference information from the browser that identifies the object.
Then, when you want to operate on the object from Playwright, Playwright sends the identification information and the operation to the browser, the browser executes it, and Playwright receives the result.
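The exchange described above can be modelled with a toy example in which one side owns the objects and the other side holds only an id. Everything here (ToyBrowser, ToyHandle, the 'fill' and 'read' operations) is hypothetical and greatly simplified. It is meant to show the idea of "reference information plus operation", not how Playwright is actually implemented.

```typescript
// Toy "browser": owns the real objects and hands out only numeric ids.
class ToyBrowser {
  private objects = new Map<number, { value: string }>();
  private nextId = 1;

  // Returns an id (the identification information), never the object itself.
  query(): number {
    const id = this.nextId++;
    this.objects.set(id, { value: '' });
    return id;
  }

  // Executes an operation on the object the id refers to and returns the result.
  execute(id: number, op: 'fill' | 'read', arg = ''): string {
    const target = this.objects.get(id);
    if (!target) throw new Error(`unknown handle ${id}`);
    if (op === 'fill') target.value = arg;
    return target.value;
  }
}

// Toy "Playwright side": holds only the id and forwards operations.
class ToyHandle {
  private browser: ToyBrowser;
  private id: number;

  constructor(browser: ToyBrowser, id: number) {
    this.browser = browser;
    this.id = id;
  }

  fill(text: string): string {
    return this.browser.execute(this.id, 'fill', text);
  }

  read(): string {
    return this.browser.execute(this.id, 'read');
  }
}

const toyBrowser = new ToyBrowser();
const toyInput = new ToyHandle(toyBrowser, toyBrowser.query());
toyInput.fill('TEST');
console.log(toyInput.read()); // TEST
```

The handle never contains the value itself: every read or write is a request sent to the side that owns the object, which is why it only looks as if the object lives on the Playwright side.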
Locator & JSHandle
Playwright and Puppeteer define the above identification information as JSHandle.
In addition, Playwright has Locator, which wraps this kind of reference for ease of use, and Locator is normally used instead of JSHandle.
Situations for using JSHandle in Playwright
When executing JavaScript in the browser from Playwright, objects are exchanged between Playwright and the browser using JSHandle, not Locator.
More details are explained in the JavaScript section below.
JavaScript
You can run JavaScript in the browser from Playwright.
Sending and Receiving Values
- Values can be passed from Playwright to JavaScript
- Values can be returned from JavaScript to Playwright
Exchanging data
Since browser objects cannot be passed directly, DOM elements are passed as JSHandle.
page.evaluate()
- Executes JavaScript
- The return value is given with return
- Return values can be scalars, arrays, or JSON-like objects.
- Arguments can be passed to the JavaScript
- Arguments can be scalars, arrays, JSON-like objects, and JSHandle.
- A JSHandle passed as an argument is restored to the JavaScript object in the browser
//////////
// return json
const res1 = await page.evaluate(() => {
  const obj = { val1: 123, val2: 'abc' };
  return obj;
});
// { val1: 123, val2: 'abc' }
console.log(res1);
//////////
// pass json
const res2 = await page.evaluate((arg) => {
  const obj = { val1: arg.num, val2: arg.str };
  return obj;
}, { num: 123, str: 'abc' });
// { val1: 123, val2: 'abc' }
console.log(res2);
locator.evaluate()
- Almost the same as page.evaluate().
- The first argument of the function is the DOM element selected by the locator
// set input value 'test'
await page.locator('input').evaluate((dom: HTMLInputElement, arg) => {
  dom.value = arg;
}, 'test');
page.evaluateHandle()
- Executes JavaScript
- The return value is given with return
- The return value is always a JSHandle
- Values other than JSHandle cannot be returned.
- Arguments can be passed to the JavaScript
- Arguments can be scalars, arrays, JSON-like objects, and JSHandle
- A JSHandle passed as an argument is restored to the JavaScript object in the browser
Example
Get the input from the browser and set its value to "abc".
const res = await page.evaluateHandle(() => {
  return document.querySelector('input');
});
await page.evaluateHandle((arg) => {
  arg.dom.value = arg.val;
}, { dom: res, val: 'abc' });
- document.querySelector('input') refers to the input in the browser
- res is the JSHandle of that input
- Passing res to evaluateHandle() restores the JSHandle to the input object in the browser that it refers to
- In other words, arg.dom is not a JSHandle but the element returned by document.querySelector('input').
locator.evaluateHandle()
- Almost the same as page.evaluateHandle()
- The first argument of the function is the DOM element selected by the locator
debug
Console output
- Output the browser console to Playwright
page.on('console', msg => {
  console.log(msg.text());
});
Pausing
- Put await page.pause() where you want to pause.
- When paused, the Inspector window appears.
- Browser operation is possible while paused.
Find out how to write a selector to select an element
- Launch the Inspector window with await page.pause()
- Click on Pick locator
- On the browser page, click on the element whose selector you want to examine
- A selector expression that selects that element is displayed.
Find out which element of the page a selector matches
- Launch the Inspector window with await page.pause()
- Enter the selector expression in the text box next to Pick locator
- locator('input') etc.
- The matching element is highlighted.
Automatic code generation from operations
- Launch the Inspector window with await page.pause()
- Click on Record
- (Perform the operation you want to turn into code)
- Click on Record
- The operations are converted to code and displayed in the Inspector window
Impressions, etc.
It would be handy to be able to put together a crawler quickly when an idea strikes, but having to look up how to use the tools each time takes away the motivation to build one.
So I started writing this, thinking it would be useful to summarize the minimum necessary functionality, but it turned out to be long.
I wrote at length about JavaScript, but since there are few cases where JSHandle needs to be handled directly, an explanation of evaluate() alone might have been enough.
Also, await page.pause() is very useful.