
Infinite scrolling with Selenium, puppeteer and Playwright

For a while now, I have been compiling notes on how to use Selenium, puppeteer, and Playwright.

Because that summary is getting long, I am moving operations that need extra explanation or several steps into separate articles.

This article summarizes how to handle infinite scrolling in Selenium, puppeteer, and Playwright.

Scrolling

Before getting into infinite scrolling, let's start with the basic way to scroll.

You can also move the scroll position of a scrollable container (called the window here) directly by setting Element.scrollTop in JavaScript, but here we use Element.scrollIntoView() to scroll the window until its bottom element is visible.

The Element.scrollTop approach is included at the end of this article for reference.

Sample HTML

We assume a page where items of the same format are stacked inside a scrollable container (the window).

To scroll, call Element.scrollIntoView() on the bottom element.

<style>
    .scroll{
        width:200px;
        height: 200px;
        overflow:scroll;
    }
    .in{
        height:100px;
    }
</style>
<div class="scroll">
    <div class="in">test1</div>
    <div class="in">test2</div>
    <div class="in">test3</div>
</div>

Selenium

Get the bottom element in the window and access webelement.location_once_scrolled_into_view; the window scrolls until that element is visible.

Because webelement.location_once_scrolled_into_view is a property, simply accessing it is enough; no method call is needed.

Internally, Element.scrollIntoView() is called on the element.

driver.find_element(By.CSS_SELECTOR, 'div.in:last-child') \
    .location_once_scrolled_into_view

puppeteer

Call Element.scrollIntoView() in JavaScript.

let selector = 'div.in:last-child';
await page.waitForSelector(selector);
await page.$eval(selector, (dom) => { dom.scrollIntoView() });

Playwright

Playwright provides locator.scrollIntoViewIfNeeded() for scrolling.

Internally, Element.scrollIntoViewIfNeeded() is called on the element.

await page.locator('div.in:last-child').scrollIntoViewIfNeeded();

Infinite scrolling

Infinite scrolling is a common pattern in which more elements are appended at the bottom as you scroll.

The basic idea is to repeat the scroll described above in a loop. Since the content may actually be finite rather than infinite, each iteration checks whether the bottom element is the same as in the previous iteration and stops the loop if it is.

Whether two elements are the same is determined in JavaScript with the strict equality operator ===, which for DOM nodes tests whether both refer to the same node.
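The loop described above can be sketched independently of any browser library. The following is a minimal Python illustration, not code from any of the three tools: `scroll_until_stable`, `FakeFeed`, `Item`, and the `max_rounds` safety limit are all hypothetical names introduced here, and `FakeFeed` only imitates a page that appends items when its last item is scrolled to.

```python
from typing import Callable, List, Optional


class Item:
    """Hypothetical stand-in for a DOM element."""

    def __init__(self, text: str) -> None:
        self.text = text


def scroll_until_stable(
    get_last: Callable[[], Item],
    scroll_to: Callable[[Item], None],
    wait: Callable[[], None],
    max_rounds: int = 1000,
) -> int:
    """Scroll to the bottom element until it stops changing.

    Returns the number of scrolls performed.
    """
    prev: Optional[Item] = None
    scrolls = 0
    for _ in range(max_rounds):  # safety limit (an assumption, not in the article)
        last = get_last()
        # Identity check, mirroring the === comparison on DOM nodes.
        if last is prev:
            break
        scroll_to(last)
        scrolls += 1
        prev = last
        wait()  # give the page time to append more items
    return scrolls


class FakeFeed:
    """Hypothetical page that appends a batch when its last item is scrolled to."""

    def __init__(self, initial: int, total: int, batch: int) -> None:
        self.items: List[Item] = [Item(f"item{i}") for i in range(initial)]
        self.total = total
        self.batch = batch

    def last(self) -> Item:
        return self.items[-1]

    def scroll_to(self, item: Item) -> None:
        if item is self.items[-1] and len(self.items) < self.total:
            start = len(self.items)
            count = min(self.batch, self.total - start)
            self.items.extend(Item(f"item{i}") for i in range(start, start + count))


feed = FakeFeed(initial=3, total=8, batch=2)
scrolls = scroll_until_stable(feed.last, feed.scroll_to, wait=lambda: None)
print(scrolls, len(feed.items))  # 4 8
```

Note that the last scroll loads nothing; it is the following iteration, which finds the same bottom element again, that ends the loop.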

Selenium

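import time

from selenium.webdriver.common.by import By

prevElem = None
while True:
    # Get the current bottom element.
    elem = driver.find_element(By.CSS_SELECTOR, "div.in:last-child")
    # Compare it with the previous bottom element inside the browser.
    bSame = driver.execute_script(
        "return arguments[0]===arguments[1]",
        elem, prevElem)

    # No new element was added, so the end has been reached.
    if bSame:
        break

    # Accessing the property scrolls the element into view.
    elem.location_once_scrolled_into_view
    prevElem = elem
    # Wait for the next elements to load.
    time.sleep(1)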

puppeteer

import { setTimeout } from 'timers/promises';

let prevElem;
while (true) {
    const selector = 'div.in:last-child';
    const elem = await page.waitForSelector(selector);
    const bSame = await page.evaluate((_elem, _prevElem) => {
        return _elem === _prevElem;
    }, elem, prevElem);

    if (bSame) {
        break;
    }

    await page.evaluate((dom) => { dom.scrollIntoView() }, elem);
    prevElem = elem;
    await setTimeout(1000);
}

Playwright

import { setTimeout } from 'timers/promises';

let prevElem;
while (true) {
    const selector = 'div.in:last-child';
    const elem = await page.locator(selector).evaluateHandle((dom: Element) => dom);
    const bSame = await page.evaluate((arg) => {
        return arg[0] === arg[1];
    }, [elem, prevElem]);

    if (bSame) {
        break;
    }

    await page.locator(selector).scrollIntoViewIfNeeded();
    prevElem = elem;
    await setTimeout(1000);
}

Playwright needs some care here.

The check for whether two elements are the same runs in JavaScript, but a locator cannot be passed to JavaScript.

Instead, call locator.evaluateHandle() with a function that simply returns the element the locator points to; this yields a JSHandle for that element.

A JSHandle can be passed to JavaScript, so it can be compared with the previous element.

When Element.scrollTop is used

For reference, here is an example of using Element.scrollTop in Playwright.

The same process applies to Selenium/puppeteer.

  • Element.scrollTop is the distance from the top of the content to the top of the window. Assigning a value to Element.scrollTop moves the window to that position.
  • Element.clientHeight is the height of the window, and Element.scrollHeight is the height of the content.
  • So, to scroll to the bottom, set Element.scrollTop to Element.scrollHeight - Element.clientHeight.
  • The window has reached the bottom when Element.scrollTop + Element.clientHeight equals Element.scrollHeight.
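The arithmetic in the list above can be checked with a small Python sketch. The function names and the `tolerance` parameter are assumptions added here for illustration; the tolerance is included because browsers can report a fractional scrollTop, so an exact equality test may never become true. The numbers use the sample page (a 200px window holding three 100px items) and ignore any scrollbar thickness.

```python
def max_scroll_top(scroll_height: float, client_height: float) -> float:
    """Largest reachable scrollTop: content height minus window height."""
    return max(0, scroll_height - client_height)


def is_at_bottom(
    scroll_top: float,
    client_height: float,
    scroll_height: float,
    tolerance: float = 1.0,  # assumption: allow for sub-pixel scroll positions
) -> bool:
    """True when the bottom of the window reaches the bottom of the content."""
    return scroll_height - (scroll_top + client_height) <= tolerance


scroll_height = 3 * 100  # three 100px items
client_height = 200      # window height from the sample CSS

bottom = max_scroll_top(scroll_height, client_height)
print(bottom)                                              # 100
print(is_at_bottom(0, client_height, scroll_height))       # False
print(is_at_bottom(bottom, client_height, scroll_height))  # True
```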

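Since this is hard to follow from the description alone, the figure below illustrates it.

[Figure: relationship between Element.scrollTop, Element.clientHeight, and Element.scrollHeight]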

import { setTimeout } from 'timers/promises';

let bContinue = true;
while (bContinue) {
    await page.locator('div.scroll').evaluate(dom => {
        dom.scrollTop = dom.scrollHeight - dom.clientHeight;
    });

    await setTimeout(1000);

    bContinue = await page.locator('div.scroll').evaluate(dom => {
        return dom.scrollTop + dom.clientHeight < dom.scrollHeight;
    });
}
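Since the same process applies to Selenium, here is a rough Python equivalent of the loop above. It is a sketch under assumptions: it only relies on the driver exposing execute_script(), the element is looked up with document.querySelector() inside the script, and the names `scroll_to_end`, `SCROLL_TO_BOTTOM_JS`, `HAS_MORE_JS`, plus the `max_rounds` and `sleep` parameters, are introduced here for illustration.

```python
import time
from typing import Any, Callable

# Move the window to the bottom of its content.
SCROLL_TO_BOTTOM_JS = """
const el = document.querySelector(arguments[0]);
el.scrollTop = el.scrollHeight - el.clientHeight;
"""

# True while there is still content below the window.
HAS_MORE_JS = """
const el = document.querySelector(arguments[0]);
return el.scrollTop + el.clientHeight < el.scrollHeight;
"""


def scroll_to_end(
    driver: Any,
    selector: str,
    wait_seconds: float = 1.0,
    max_rounds: int = 100,  # safety limit (an assumption, not in the article)
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll the container to the bottom until no new content appears.

    Returns the number of scrolls performed.
    """
    rounds = 0
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_TO_BOTTOM_JS, selector)
        rounds += 1
        sleep(wait_seconds)  # wait for the next elements to load
        if not driver.execute_script(HAS_MORE_JS, selector):
            break
    return rounds


# Hypothetical usage with an existing Selenium driver:
#     scroll_to_end(driver, "div.scroll")
```

As in the Playwright version, the loop stops early if new content takes longer than the wait time to appear, so the wait may need tuning per site.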

Impressions, etc.

Because elements are generated dynamically, you must wait after scrolling until the next elements appear.
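The examples in this article wait a fixed one second. One possible alternative, shown here only as a sketch, is to poll for a condition and give up after a timeout. `wait_until` and its parameters are hypothetical names introduced for this illustration, and the `loaded` function merely simulates content that appears on the third check.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.2,
) -> bool:
    """Poll predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated condition: the "new elements" show up on the third check.
calls = {"n": 0}


def loaded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(loaded, timeout=2.0, interval=0.01))          # True
print(wait_until(lambda: False, timeout=0.05, interval=0.01))  # False
```

With Selenium, the predicate could for example compare the current number of div.in elements with the number seen before scrolling. Selenium also ships its own WebDriverWait helper for this kind of polling.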

I prefer Element.scrollIntoView() because it is more intuitive, although the Element.scrollTop approach is more concise.

www.ekwbtblog.com
