Making sure your web scraper can navigate through every available result poses an interesting challenge. Scraping one page at a time is acceptable, but it becomes difficult once you have dozens or hundreds of pages to cover, or a click-to-load or infinite-scrolling page to handle.
Approximately 65% of e-commerce websites use pagination, which highlights the value of learning this skill. As modern websites become more complex and dynamic, scrapers have to adapt to the changes.
This article will explore and explain pagination in web scraping, present code samples showing how to handle it, and go over some of the challenges that come up along the way.

What is Pagination?
E-commerce platforms, job boards, and social media websites use pagination to handle large datasets. Showing everything on one page would result in slow download times and increased memory usage.
Pagination splits the content across multiple pages, making it easier to manage. Handling pagination in web scraping is not just important but necessary, especially if you want all the possible results rather than only those on the first page. It gives scrapers a way to navigate through pages systematically, ensuring comprehensive data collection.
Types of Pagination
Understanding pagination in web scraping requires recognizing the different forms it can take across various websites.
Pagination comes in many forms, as each website experiments with different ways to keep visitors engaged. However, it can be broken down into three main categories:
- Numbered Pagination: This method lets users navigate through separate pages using numbered links, with the page number usually reflected in the URL (for example, /page/1, /page/2, or ?page=2).
- Click-to-Load Pagination: With this method, users have to click a button, typically labeled “Load More” or “See More”, to reveal additional content. This allows for more controlled loading of data.
- Infinite Scrolling Pagination: With infinite scrolling, content will load automatically as a user scrolls further down the page. This creates a seamless browsing experience without needing to constantly click through pages or scroll down and click a button.

Tackling Pagination in Web Scraping
Mastering pagination in web scraping requires adapting to how different sites structure and load content. What makes pagination different from other web scraping tasks is that most websites try to be overly creative with their structure.
From static and changing URLs to load-more and infinite scroll pages, knowing how each website operates and how to tackle the many forms of pagination will have you ready for any challenges that come your way.
Additionally, a basic scraper treats a paginated listing as a single page and only captures the first batch of results, which is why you may have faced difficulties when using a typical web scraper. We will provide the code necessary to understand and implement pagination in web scraping.
Numbered Pagination
Numbered pagination, often called “Next and Previous Pagination”, “Arrow Pagination”, or “URL-Based Pagination”, uses discrete page links that are displayed at the bottom of the page and allow users to jump between pages.
It is one of the easiest methods to scrape because the URL changes incrementally, making it straightforward to iterate through pages. To scrape a site with numbered pagination, identify the base URL and its page parameter, then increment that parameter in a loop until the last page is reached so the scraper knows where to stop.
We will be using the website “ScrapeThisSite” as practice for this example. If you open the site and click through the pages, you will notice the URL changes ever so slightly with the addition of “?page_num=x”, where x is the current page number.
If you inspect the page and check the “Next” button, you will notice it is an anchor tag with an href attribute that links to the next page, and its aria-label attribute identifies it as the “Next” control while more pages remain. When analyzing a webpage for scraping, using CSS selectors to target child elements or specific attributes allows for precise data extraction.
With all that in mind, here is the script that will present you with all the data available in this collection.
import requests
from bs4 import BeautifulSoup

base_url = "https://www.scrapethissite.com/pages/forms/"
session = requests.Session()
page_num = 1

while True:
    response = session.get(base_url, params={'page_num': page_num})
    if response.status_code != 200:
        break

    soup = BeautifulSoup(response.text, 'html.parser')
    rows = soup.select('tr.team')
    if not rows:
        # No team rows on this page: we have run past the last page
        break

    print(f"Scraping page {page_num}…")
    for row in rows:
        # Each stat sits in a dedicated cell class on this site
        team_name = row.select_one('td.name').get_text(strip=True)
        wins = row.select_one('td.wins').get_text(strip=True)
        losses = row.select_one('td.losses').get_text(strip=True)
        print(f"Team: {team_name}, Wins: {wins}, Losses: {losses}")

    # Stop when the "Next" arrow disappears from the pagination bar
    if not soup.select_one('ul.pagination a[aria-label="Next"]'):
        break
    page_num += 1

print("Done.")
In this code, the scraper keeps incrementing the page_num variable and scraping every page until it reaches the end. However, when using this script, there are a few things to keep in mind: some websites have dynamic page numbers or use JavaScript to load content, making this script ineffective.
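One way to make the loop more robust against shifting page numbers is to follow the “Next” link's href directly instead of trusting a counter. Here is a minimal sketch that reuses the aria-label selector assumed above; verify the selector against your own target site before relying on it:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://www.scrapethissite.com/pages/forms/"
session = requests.Session()

while url:
    response = session.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # ... extract the team rows here, exactly as in the script above ...

    # Follow the "Next" link wherever it points; stop when it disappears
    next_link = soup.select_one('ul.pagination a[aria-label="Next"]')
    url = urljoin(url, next_link['href']) if next_link else None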
Not all numbered pagination is visible in the URL; in such cases, an AJAX call is usually involved. Sometimes numbered pagination is controlled through an API endpoint whose response includes pagination details, allowing for efficient iteration through a pagination loop, as sketched below.
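As an illustration, here is a hedged sketch of looping over a JSON API that reports pagination details in its response. The endpoint and the page, total_pages, and results field names are hypothetical placeholders; adapt them to whatever the network tab of your target site actually shows:
import requests

# Hypothetical endpoint and field names, for illustration only
api_url = "https://example.com/api/items"
page = 1

while True:
    response = requests.get(api_url, params={'page': page})
    response.raise_for_status()
    data = response.json()

    for item in data.get('results', []):
        print(item)

    # The response itself tells us when to stop
    if page >= data.get('total_pages', page):
        break
    page += 1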
Click-to-Load Pagination
Click-to-load pagination, usually seen as a “Load More” button at the bottom of the page, dynamically loads new content on the same page. This requires the scraper to simulate a click event repeatedly to load all available content.
To handle dynamically loaded content and simulate the repeated clicks, tools like Selenium or Playwright can automate the process, clicking the button until no more content is available.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
import time

# Start browser
driver = webdriver.Chrome()
driver.get("https://www.scrapingcourse.com/button-click")

# Allow full page load
time.sleep(5)

# Keep clicking "Load More" while the button exists
while True:
    try:
        # Scroll to the bottom of the page to trigger lazy loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(1)
        load_more_button = driver.find_element(By.XPATH, "//button[contains(text(), 'Load More')]")
        driver.execute_script("arguments[0].scrollIntoView(true);", load_more_button)
        time.sleep(0.5)
        load_more_button.click()
        time.sleep(2)
    except (NoSuchElementException, ElementClickInterceptedException):
        # The button is gone or no longer clickable: all content is loaded
        break

# Wait for the final content to load
time.sleep(3)

# Grab product data after everything has loaded
items = driver.find_elements(By.CLASS_NAME, "card-body")
for item in items:
    name = item.find_element(By.TAG_NAME, "h5").text
    price = item.find_element(By.TAG_NAME, "p").text
    print(f"{name} - {price}")

driver.quit()
Click-to-load pagination in web scraping often involves simulating repeated user actions like button clicks. In the example above, the find_element method is used to locate the “Load More” button, and click() is called to load more content until the button no longer appears.
Be careful when using this script, as too many rapid requests might trigger a CAPTCHA test, which could slow down your operations. If necessary, implement a time delay between requests (as shown below) or use a rotating mobile proxy to reduce the chances of getting a CAPTCHA; more on that later.
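As a small illustration, the fixed time.sleep(2) after each click in the script above can be swapped for a randomized pause. The bounds used here are arbitrary examples, so tune them to the site you are working with:
import random
import time

# Replace the fixed pause after each click with a randomized one so the
# request pattern looks less mechanical (bounds are arbitrary examples)
time.sleep(random.uniform(1.5, 4.0))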
Infinite Scroll Pagination
Unlike numbered pagination or click-to-load pagination, infinite scrolls automatically load more content as the user scrolls down. While this makes it easier for users to navigate, it complicates things for pagination in web scraping due to its reliance on JavaScript and dynamically loaded content. Playwright, which can automate Chromium, Firefox, and WebKit browsers, handles infinite scrolling well.
import asyncio
from playwright.async_api import async_playwright

async def scroll_to_bottom(page):
    while True:
        previous_height = await page.evaluate("document.body.scrollHeight")
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await asyncio.sleep(2)
        new_height = await page.evaluate("document.body.scrollHeight")
        # If the page height stopped growing, no more content is loading
        if new_height == previous_height:
            break

async def scrape_infinite_scroll(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        await scroll_to_bottom(page)
        # Extract data after the page has fully loaded
        content = await page.content()
        print(content)
        await browser.close()

asyncio.run(scrape_infinite_scroll("https://example.com/items"))
The code scrolls down until the page height stops changing, meaning no more content is being loaded and all items are visible on the page for scraping. The main challenge is detecting when to stop scrolling, which is generally not straightforward: a slow network response can make the height appear unchanged even though more content is still on the way.
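One common workaround is to allow a few extra scroll attempts before concluding the page is exhausted. Below is a hedged variant of the scroll_to_bottom function above; the max_retries value of 3 is an arbitrary assumption, not a universal constant:
import asyncio

async def scroll_to_bottom(page, max_retries=3):
    retries = 0
    while retries < max_retries:
        previous_height = await page.evaluate("document.body.scrollHeight")
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await asyncio.sleep(2)
        new_height = await page.evaluate("document.body.scrollHeight")
        if new_height == previous_height:
            # Height unchanged: allow a few retries for slow-loading batches
            retries += 1
        else:
            # New content arrived, so reset the retry counter
            retries = 0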
Additionally, some websites implement lazy loading, where content is not loaded until it is visible in the viewport. While pagination in web scraping enables full data coverage, it introduces several technical challenges.

Challenges with Pagination in Web Scraping
When working with pagination in web scraping, there are several risks to watch out for, such as an IP ban. Some websites block access when too many requests are sent, or they present the user with a CAPTCHA challenge.
If you implement pagination in web scraping, you could encounter a 403 status code, which typically indicates that a bot detection system has blocked you from accessing the website. While there are CAPTCHA solvers you can integrate into your code, a better approach is to avoid triggering a CAPTCHA entirely.
To avoid encountering a CAPTCHA or suffering an IP ban, consider using a proxy server so your traffic appears to come from multiple different sources. You can similarly rotate your User-Agent header to give your scraper the appearance of a real browser.
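Here is a minimal sketch of both ideas using requests; the proxy addresses are placeholders for whatever endpoints your provider gives you, and the User-Agent strings are just examples:
import random
import requests

# Placeholder proxy addresses; substitute your provider's endpoints
proxies_pool = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

# Pick a random proxy and User-Agent for each request
proxy = random.choice(proxies_pool)
response = requests.get(
    "https://example.com/items?page=2",
    headers={"User-Agent": random.choice(user_agents)},
    proxies={"http": proxy, "https": proxy},
)
print(response.status_code)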
Conclusion
Perfecting pagination in web scraping is essential for handling modern websites that present data across multiple pages, segments, or dynamic content blocks. From simple pagination buttons to more complex JavaScript-based loading, each system requires an understanding of the site's URL structure, content-loading behavior, and adaptive pagination techniques to ensure accurate and successful data retrieval.
Using tools like Beautiful Soup for static pages or browser automation platforms like Selenium and Playwright for dynamic content loading, you can tailor your approach to match the site’s architecture.
Key Takeaways:
- Pagination in web scraping is essential when dealing with large datasets spread across multiple pages.
- Different websites implement numeric pagination, click-to-load pagination, and infinite scrolling, each requiring a specific scraping approach.
- Numbered pagination is often the easiest to handle as page numbers are typically visible in the URL.
- Click-to-load and infinite scroll require tools like Selenium or Playwright to simulate user behavior during the scraping process.
- Handling pagination correctly ensures complete data extraction without missing segments of important content.
This guide covered multiple pagination methods, including numbered pagination, click-to-load, and infinite scrolling, with code examples and solutions for each.
While some cases can be handled with simple code, others demand more advanced techniques to overcome obstacles like anti-bot detection and asynchronously loaded content.
By using the scripts provided and keeping in mind the challenges that pop up, such as CAPTCHAs and IP bans, you can scrape a wide range of websites, from e-commerce stores to social media platforms. With the right approach, you can overcome the challenges of pagination in web scraping and achieve comprehensive data extraction.