Disadvantages of the standard paging API
Initially, we need to understand why the offset pagination approach is not suitable for large datasets using the following example:
The client provides two parameters - LIMIT for the expected maximum number of results and OFFSET for the page offset. For example, with OFFSET = 400, LIMIT = 20, we return 20 items from the database, throwing out the first 400.
Using LIMIT and OFFSET does not work well on large datasets. As the OFFSET grows, the database still has to read the data up to OFFSET + of the required number of records from the disk, before it drops OFFSET and returns only the expected number of records.
')
If the records arrive in the database at a high speed, the current window becomes unreliable for page access, potentially leading to data loss or the return of duplicates.
The solution to this problem can be the Cursor API, after each request returning a cursor that can be used by the client when requesting the next / previous piece of data.
Cursor API
The Cursor API works by returning a certain marker in the response that refers to a dataset defined by the item. In a subsequent request, this token is used to request results starting with the entries following the marker.
Suppose we have the following GET REST service operation:
GET /api/v1/items/in
For each of the returned records, we need to have a unique sequential ID, which for newly added records will matter more than all existing ones. In some databases, this may be an existing field, for example, a numeric primary key. In the case of using an alphanumeric primary key, an additional field, such as serial / bigserial in PostgreSQL, can act as such a sequential ID.
The value of this ID for the last record found will be used to form the forward cursor, the ID value of the first record found will be used to form the backward cursor.
Request of the first portion of data by the client
When a client makes a request for the first portion of data, it uses a request of the form:
GET /api/v1/items/in?someParam=...&sortBy={sortingFieldName}&size={count}
Here, to complicate things, we added another sorting by a specific field named sortingFieldName.
In this case, the query is executed to the database:
SELECT * FROM items WHERE ...
The client returns the following response:
{ "elements" : [ { "sortingFieldName" : "John", "sequentialId" : 37, ... }, { "sortingFieldName" : "John", "sequentialId" : 38, ... } ], "nextCursor" : "eyJmaWVsZE5hbWUiOiJuYW1lIiwiZmllbGRWYWx1ZSI6IkX0=" }
where in nextCursor Base64 is encoded with the following content:
{ "fieldName" : "sortingFieldName", "fieldValue" : "John", "sequentialId" : 38, "forward" : true }
It is easy to specify that we stopped at the sortingFieldName = John value in the previous page to form the next one - not enough, because several subsequent entries may also have the same value of this field. For this we need a sequential ID.
The nextCursor field is absent in response if we have found fewer than the requested count records (so they are sure that there is no next page).
(UPD: either nextCursor can be returned in any case, if the last query found any non-zero number of records, storing the information about the last record of the set found in the cursor)
The prevCursor field is absent in response after the first request (the cursor was absent in the request)
Request for the next client data
When a client makes a subsequent request, it needs to use the cursor (forward or reverse) from the previous response.
Direct cursor: GET /api/v1/items/in?someParam=...&sortBy={sortingFieldName}&size={count}& cursor=eyJmaWVsZE5hbWUiOiJuYW1lIiwiZmllbGRWYWx1ZSI6IkX0=
In this case, the query is executed to the database:
SELECT * FROM items WHERE ...
And the client returns the response:
{ "elements" : [ { "sortingFieldName" : "Zeppelin", "sequentialId" : 39, ... } ], "nextCursor" : "eyJmaWVsZE5JuYW1lIiwiZmllbGRWYhbWUiOiWx1ZSI6IkFUR3Jvd", "prevCursor" : "eyJmaWVsZE5hbWUiOiJuYW1lIiwiZmllbGRWYWx1ZSI6IkFUR3Jvd" }
Here prevCursor is also Base64-encoded object:
{ "fieldName" : "sortingFieldName", "fieldValue" : "Zeppelin", "sequentialId" : 39, "forward" : false }
Reverse cursor: GET /api/v1/items/in?someParam=...&sortBy={sortingFieldName}&size={count}& cursor=eyJmaWVsZE5hbWUiOiJuYW1lIiwiZmllbGRWYWx1ZSI6IkFUR3Jvd
In this case, the query is executed to the database:
SELECT * from items WHERE ...
After that, we need to reverse the order of the entries as a result, in order to give them to the client with the expected order. Respons has a similar look:
{ "elements": [...], "nextCursor" : "eyJmaWVsZE5hbWUiOiJuYW1lIiwiZmllbGRWYWx1ZSI6IkFUR3JvdXAtd0ZQ", "prevCursor" : "eyJmaWVsZE5h4YW1lIiwiZmllbGRWYWx1ZSI6IkFUR3JvdXAtTGFtN0B1aR1l" }
Packing request parameters in the cursor itselfIn principle, we can pack query parameters (someParam and sortBy fields) into the cursor itself, and then subsequent client requests will have a similar look for forward and backward cursors:
GET /api/v1/items/in?size={count}&cursor={...}
And the cursor itself has the form:
{ "someParam" : ..., "sortBy" : ..., "fieldName" : "sortingFieldName", "fieldValue" : "Zeppelin", "sequentialId" : 39, "forward" : true/false }
Results
We designed the Cursor API, representing an alternative to the standard Paging, devoid of some of its shortcomings.
UPD: according to advice in comments - I add the link to
article where the similar problem is considered.
UPD2: in order not to embarrass the user by returning next / prev elements in response in cases where the next / previous page is empty in essence - the idea arose to get 2 more elements from the database - in this case, the extreme elements serve only to determine the need to return to next / prev fields (their visibility is essentially). In this case, the user in any case receives no more than the expected size elements.