Understanding the Offset and Cursor Pagination | by Jason Ngan | Dec, 2022

A quick look at pagination algorithms

photo by monstera Feather Pexels

As the amount of data increases, pagination becomes an integral part of software development.

Instead of returning a large chunk of data in a single request, pagination divides and returns data to clients in smaller batches.

Cursor and offset pagination are the two most commonly used algorithms and they carry their respective tradeoffs.

Let’s take an in-depth look at how they work today.

offset pagination takes advantage of OFFSET And LIMIT Command in SQL to paginate data.

execution

Paginate using OFFSET & LIMIT commands in SQL

Having said that we are implementing an API to get the list of user information.

API: GET /api/user/list

request:
page_size: 10,
page_number: 3

In each request, the client passes page_size (offset) and a page_number (range).

  • The page size indicates the number of data to be returned.
  • The page number indicates the current requesting page.
SELECT COUNT(*) AS total FROM user_tab;

The server first queries the users table about the total number of records.

  • This allows customers to understand the total number of pages.
SELECT * FROM user_tab
ORDER BY id DESC
LIMIT 10 OFFSET 20;

The server uses the OFFSET and LIMIT commands to retrieve ten records from the table.

  • since given page_number is 3, offset = 10 * 2 = 20.
response: 
"users": [...],
"paging":
"total_record": 295,
"page": 3,
"total_pages": 30

The server returns paging information to the clients so that they can keep track of the current and available pages.

pros

  • It allows customers to see the total number of pages.
  • It allows clients to jump to a specific page by passing the page number.

Shortcoming

result discrepancy

  • If an item from a previous page is deleted, the data will shift forward, causing some results to be discarded.
  • If an item is added to the previous page, the data will be shifted backwards, causing some results to be duplicated.

offset inefficiency – does not scale well with large datasets

  • Seeks (offset + limit) number of database records before discarding unwanted ones and returning the rest.
  • Therefore, the query time increases significantly as the offset increases.

cursor pagination uses a indicator which refers to a specific database record.

execution

The client provides a cursor that points to a unique database record.
API: GET /api/user/list

request:
cursor: 12345,
page_size: 10

In each request, the client passes cursor and a page_size

  • A cursor refers to a specific unique value in the database.
  • If a cursor is not given, the server fetches from the first record.
SELECT * FROM users
WHERE id <= %cursor
ORDER BY id DESC
LIMIT %

server fetches (limit + 1) records whose ID is smaller than the cursor value.

Note that the limit is equal to the given page size and one.

  • If less than the number of records returned LIMITIt means that we are on the last page.
  • Additional records are not returned to the client. ID Additional records are passed back to the client as next_cursor,
response: 
"users": [...],
"next_cursor": "12335", # the user id of the extra result

pros

static pagination window

  • Since we are fetching from a stable reference point, adding or deleting records will not affect the pagination window.

Scale well with large datasets

  • Cursor is unique and indexed.
  • The database jumps directly to records without iterating through unwanted data. Hence, making it more efficient.

Shortcoming

  • Cursor pagination does not allow the client to visit a specific page.
  • The cursor must come from a unique and sequential column (such as timestamp). Otherwise, some data will be discarded.
  • Limited sorting features. If the requirement is to sort by a non-unique column (such as first name), this would be challenging to implement using cursor pagination. concatenating multiple columns to get a unique key slow time complexity,

The encoded cursor suggests a return encoded base64 string regardless of the underlying pagination solution.

When using offset pagination, we encode page_number And total_page into a base64 string and return it to the clients as a cursor.

response: 
// "page=3

Similarly, we can encode the cursor into a base64 string in cursor pagination before returning it to the client.

response: 
// "next_cursor:1234"
next_cursor: "dcjadfaXMDdQTQ"

Customer can always make a pass cursor and a page_size Without knowing the underlying implementation.

request: 
cursor: "dcjadfaXMDdQTQ",
page_size: 10

This allows servers to implement different underlying pagination solutions while providing a consistent interface to API consumers.

Pagination is an inevitable but tricky topic in software development.

While there isn’t a one-size-fits-all solution, knowing the pros and cons allows us to make better tradeoffs when designing our next API.

I hope you find this helpful, and I’ll see you on the next one!

Leave a Reply