For a while now, I have been putting together a list of how to use Selenium, puppeteer, and Playwright.
Since that summary is getting long, I will cover the operations that need a longer explanation or a multi-step procedure in separate articles.
This article summarizes how to handle infinite scrolling in Selenium, puppeteer, and Playwright.
scroll
Before we get into the infinite scroll procedure, let's start with the basic method of scrolling.
One way to move the scroll position of a scrollable element is to set Element.scrollTop from JavaScript, but here we will use Element.scrollIntoView() to scroll the container so that its bottom child element becomes visible.
The method using Element.scrollTop is also included at the end of this article for reference.
sample html
We assume a page in which items of the same format are stacked inside a scrollable container.
Element.scrollIntoView() is executed on the bottom item.
    <style>
      .scroll {
        width: 200px;
        height: 200px;
        overflow: scroll;
      }
      .in {
        height: 100px;
      }
    </style>
    <div class="scroll">
      <div class="in">test1</div>
      <div class="in">test2</div>
      <div class="in">test3</div>
    </div>
Selenium
If you get the bottom element in the container and read its webelement.location_once_scrolled_into_view property, the page scrolls until that element is visible.
Since webelement.location_once_scrolled_into_view is a property, simply accessing it is enough.
Internally, Element.scrollIntoView() is called on the element.
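    from selenium.webdriver.common.by import By

    # driver is an already initialized WebDriver.
    # Accessing the property is enough to trigger the scroll.
    driver.find_element(By.CSS_SELECTOR, 'div.in:last-child') \
        .location_once_scrolled_into_view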
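Because a property access with a side effect is easy to overlook when reading code, the same scroll can also be requested explicitly through execute_script. The following is only a minimal sketch, not Selenium's own implementation: the helper name scroll_into_view, the ScriptRunner protocol, and the {block: 'end'} option are assumptions introduced for illustration, and only execute_script is the real WebDriver method.

```python
from typing import Any, Protocol


class ScriptRunner(Protocol):
    """Anything with a WebDriver-style execute_script method (hypothetical type)."""

    def execute_script(self, script: str, *args: Any) -> Any: ...


# JavaScript run in the page; arguments[0] is the element passed from Python.
SCROLL_SCRIPT = "arguments[0].scrollIntoView({block: 'end'});"


def scroll_into_view(driver: ScriptRunner, element: Any) -> None:
    """Ask the browser to scroll the given element into view."""
    driver.execute_script(SCROLL_SCRIPT, element)
```

With a real session this would be called as scroll_into_view(driver, driver.find_element(By.CSS_SELECTOR, 'div.in:last-child')).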
puppeteer
Call Element.scrollIntoView() from JavaScript.
    const selector = 'div.in:last-child';
    await page.waitForSelector(selector);
    await page.$eval(selector, (dom) => {
      dom.scrollIntoView();
    });
Playwright
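Playwright provides locator.scrollIntoViewIfNeeded() for scrolling.
Its name mirrors the non-standard Element.scrollIntoViewIfNeeded() DOM method; according to the Playwright documentation, it scrolls the element into view unless the element is already completely visible.
    await page.locator('div.in:last-child').scrollIntoViewIfNeeded();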
infinite scroll
Infinite scroll is a common pattern in which more elements are appended below the existing ones as you scroll.
Basically, you repeat the scroll operation above in a loop, but since the content is usually finite rather than truly infinite, each iteration checks whether the bottom element is the same as in the previous iteration and stops the loop if it is.
Whether two elements are the same is determined by comparing the DOM elements with the JavaScript strict equality operator ===.
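The loop described above can be sketched independently of any browser library. This is a rough illustration under assumptions: the function scroll_until_stable, its callback parameters, and the max_rounds safety limit are hypothetical names that do not come from Selenium, puppeteer, or Playwright, and the "page" is simulated by an in-memory list.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def scroll_until_stable(
    get_last: Callable[[], T],
    scroll_to: Callable[[T], None],
    is_same: Callable[[T, Optional[T]], bool],
    wait: Callable[[], None],
    max_rounds: int = 100,
) -> int:
    """Scroll until the bottom element stops changing; return the number of scrolls."""
    prev: Optional[T] = None
    for rounds in range(max_rounds):
        current = get_last()
        if is_same(current, prev):
            return rounds  # bottom element unchanged, so nothing new was loaded
        scroll_to(current)
        prev = current
        wait()  # give the page time to append more elements
    return max_rounds  # safety limit for feeds that never end


# Hypothetical stand-in for a page: each scroll appends an item until 5 exist.
items = ["item1", "item2", "item3"]


def fake_scroll(_last: str) -> None:
    if len(items) < 5:
        items.append(f"item{len(items) + 1}")


scrolls = scroll_until_stable(
    get_last=lambda: items[-1],
    scroll_to=fake_scroll,
    is_same=lambda a, b: a == b,
    wait=lambda: None,
)
```

The max_rounds limit is an addition that the loops in this article do not have; it is there because a truly endless feed would otherwise never terminate.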
Selenium
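    import time

    from selenium.webdriver.common.by import By

    prevElem = None
    while True:
        elem = driver.find_element(By.CSS_SELECTOR, "div.in:last-child")
        # Compare with the previous bottom element inside the browser.
        bSame = driver.execute_script(
            "return arguments[0]===arguments[1]", elem, prevElem)
        if bSame:
            break  # no new element was added
        elem.location_once_scrolled_into_view  # scrolls the element into view
        prevElem = elem
        time.sleep(1)  # wait for the next elements to load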
puppeteer
    import { setTimeout } from 'timers/promises';

    let prevElem;
    while (true) {
      const selector = 'div.in:last-child';
      const elem = await page.waitForSelector(selector);
      // Compare the current bottom element with the previous one in the page.
      const bSame = await page.evaluate((_elem, _prevElem) => {
        return _elem === _prevElem;
      }, elem, prevElem);
      if (bSame) {
        break; // no new element was added
      }
      await page.evaluate((dom) => {
        dom.scrollIntoView();
      }, elem);
      prevElem = elem;
      await setTimeout(1000); // wait for the next elements to load
    }
Playwright
    import { setTimeout } from 'timers/promises';

    let prevElem;
    while (true) {
      const selector = 'div.in:last-child';
      // Turn the locator into a JSHandle that can be passed to page.evaluate().
      const elem = await page.locator(selector).evaluateHandle((dom: Element) => dom);
      const bSame = await page.evaluate((arg) => {
        return arg[0] === arg[1];
      }, [elem, prevElem]);
      if (bSame) {
        break; // no new element was added
      }
      await page.locator(selector).scrollIntoViewIfNeeded();
      prevElem = elem;
      await setTimeout(1000); // wait for the next elements to load
    }
Playwright needs some care.
The check for whether the two elements are the same runs in JavaScript, but a locator cannot be passed to JavaScript.
Therefore, locator.evaluateHandle() is called with a function that simply returns the DOM element the locator points to, which yields a JSHandle for that element.
Since a JSHandle can be passed to JavaScript, it can be compared with the previous DOM element.
When Element.scrollTop is used
For reference, here is an example of using Element.scrollTop in Playwright.
The same process applies to Selenium and puppeteer.
- The value Element.scrollTop indicates the position of the visible area measured from the top of the content. Assigning a value to Element.scrollTop moves the visible area to that position. Element.clientHeight represents the height of the visible area and Element.scrollHeight represents the height of the content.
- So, to scroll to the bottom, set Element.scrollTop to Element.scrollHeight - Element.clientHeight.
- Then, the element has been scrolled to the bottom when Element.scrollTop + Element.clientHeight is equal to Element.scrollHeight.
Since this is hard to follow from the description alone, the image below illustrates it.
    import { setTimeout } from 'timers/promises';

    let bContinue = true;
    while (bContinue) {
      // Jump to the current bottom of the scrollable element.
      await page.locator('div.scroll').evaluate(dom => {
        dom.scrollTop = dom.scrollHeight - dom.clientHeight;
      });
      await setTimeout(1000); // wait for the next elements to load
      // Continue while the content extends below the visible area.
      bContinue = await page.locator('div.scroll').evaluate(dom => {
        return dom.scrollTop + dom.clientHeight < dom.scrollHeight;
      });
    }
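The arithmetic behind the scrollTop approach can be checked outside a browser. The sketch below uses hypothetical helper names (bottom_scroll_top, is_at_bottom) and assumed measurements for the sample page: a scrollHeight of 300 for the three 100px items and a clientHeight of 200, which presumes the scrollbar takes up no layout space. The tolerance parameter is an addition not found in the code above; it is included because browsers may report a fractional scrollTop while scrollHeight and clientHeight are rounded, so an exact equality test can miss.

```python
def bottom_scroll_top(scroll_height: float, client_height: float) -> float:
    """scrollTop value that shows the bottom of the content (never negative)."""
    return max(0.0, scroll_height - client_height)


def is_at_bottom(
    scroll_top: float,
    client_height: float,
    scroll_height: float,
    tolerance: float = 1.0,
) -> bool:
    """True when scrollTop + clientHeight reaches scrollHeight within the tolerance."""
    return abs(scroll_height - client_height - scroll_top) <= tolerance


# Assumed measurements for the sample page (not taken from a real browser).
scroll_height, client_height = 300.0, 200.0
target = bottom_scroll_top(scroll_height, client_height)
at_bottom_after_scroll = is_at_bottom(target, client_height, scroll_height)
at_bottom_initially = is_at_bottom(0.0, client_height, scroll_height)
```

Under these assumed numbers the target scrollTop is 100, the check is false at the initial position, and it becomes true once scrollTop reaches the target.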
Impressions, etc.
Since the elements are generated dynamically, you must wait after each scroll until the next elements appear.
I prefer to use Element.scrollIntoView() because it is more intuitive, although Element.scrollTop is more concise.
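On the wait after each scroll: the examples in this article sleep for a fixed second, but polling a condition with a timeout is a common alternative. The following is a minimal sketch under assumptions. The helper wait_until and the simulated predicate are hypothetical and written for illustration only; each library also ships its own wait utilities (for example WebDriverWait in Selenium), which are usually the better choice in real code.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> bool:
    """Poll predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False  # gave up: the condition never became true
        time.sleep(interval)


# Hypothetical stand-in for "new elements have appeared": true on the third check.
calls = {"n": 0}


def new_elements_loaded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(new_elements_loaded, timeout=1.0, interval=0.01)
```

In a scroll loop, the predicate would typically compare the current number of items with the number seen before the scroll, and a False result would signal that the end of the content has been reached.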