Bulk data export

The data export API serves as a proxy to a portion of accesso's data lake. Callers can use it to retrieve the data they are authorized to access. This guide gives an overview of the export endpoint and some basic best practices for using the API.

Exporting data

To export data, the following steps are performed:

Acquiring a service token, if the caller does not already hold a valid, unexpired one.
Initiating the export and fetching the first page of results.
Fetching remaining pages of results.

1. Acquiring a service token

See the Service Tokens documentation for instructions on how to obtain a service token.

2. Initiating the export and fetching the first page of results

Because an export can return a large amount of data, the export API returns its results in pages of at most 1,000 records each. When an export is requested, only the first page of results is returned; subsequent pages must be requested explicitly.

Note: The -G option on the curl commands below sends the parameters passed with the -d option as query string parameters rather than as a request body. This format is used in this guide to keep the multiple parameters readable.

The following request is used to invoke an export:

curl -X GET -G \
  'https://api.{region}.te2.io/v1/export/{dataType}' \
  -d 'startTime={startTime}' \
  -d 'endTime={endTime}' \
  -d 'venueId={venueId}' \
  -H 'Cache-Control: no-cache' \
  -H 'Authorization: Bearer {token}'

Parameters used above include:

dataType: Data set from which to export. The dataTypes a user can access are determined by configuration. Examples of dataTypes to which consumers are generally granted access include, but are not limited to:
- user_locations: Records of when a user enters/exits a venue and when a user’s device emits location data while within the bounds of a venue.
- user_registrations: Records of when a user registered an account.
- user_tag_updates: Records of user tag changes (set, update, remove).
- user_updates: Records of changes to a user’s profile (set, update, remove).
- ticket_updates: Records of updates to ticket user mapping when receiving orders from Passport and during ticket registration.
startTime: Starting date and time (inclusive) from which to export. Expected to be in ISO-8601 date-time format including timezone (e.g., 2019-10-15T16:29:05.734Z).
endTime: Ending date and time (inclusive) up to which to export. Expected to be in ISO-8601 date-time format including timezone (e.g., 2019-10-30T16:29:05.734Z).
venueId: Optional. ID of a venue. If not provided, the export covers all venues for which you are authorized to retrieve data. Learn how to fetch a list of venues.
token: Service token.
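
As a rough illustration, the request above could be assembled in Python using only the standard library. The helper name build_export_request and the region value used in the test data are assumptions made for this sketch, not part of the API.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(region: str, data_type: str, token: str,
                         start_time: str, end_time: str,
                         venue_id: Optional[str] = None) -> Request:
    """Build (but do not send) the GET request that initiates an export."""
    params = {"startTime": start_time, "endTime": end_time}
    if venue_id is not None:
        # venueId is optional; leaving it out covers every authorized venue.
        params["venueId"] = venue_id

    # urlencode percent-encodes characters such as ':' in the timestamps.
    url = f"https://api.{region}.te2.io/v1/export/{data_type}?{urlencode(params)}"
    headers = {
        "Cache-Control": "no-cache",
        "Authorization": f"Bearer {token}",
    }
    return Request(url, headers=headers, method="GET")
```

Passing the returned object to urllib.request.urlopen would send the request; the response body can then be decoded as JSON.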

A successful request will return the first page of results and a status of 200. The body of the response has the following format:

{
  "data": [
    {
      "property1": "value1",
      "property2": "value2"
    }
  ],
  "queryExecutionId": "{queryExecutionId}",
  "nextToken": "{nextToken}"
}

Fields in the response include:

queryExecutionId: ID of the query. Used to retrieve the next page of results.
nextToken: Token used as a result bookmark. Used to retrieve the next page of results. If null, there are no additional pages of data to export.
data: Array of records as JSON objects. The exact properties of these records depend heavily on the dataType requested.
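
A minimal sketch of reading these fields from a response body in Python follows. The function name parse_export_page is hypothetical, and treating a missing nextToken the same as a null one is an assumption of this sketch rather than documented behavior.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def parse_export_page(body: str) -> Tuple[List[Dict[str, Any]], Optional[str], Optional[str]]:
    """Split a response body into (records, queryExecutionId, nextToken)."""
    payload = json.loads(body)
    records = payload.get("data", [])
    query_execution_id = payload.get("queryExecutionId")
    # A JSON null becomes None, which signals that no further pages exist.
    next_token = payload.get("nextToken")
    return records, query_execution_id, next_token
```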

3. Fetching remaining pages of results

To fetch the remaining pages of results, if any exist, the following request is used:

curl -X GET -G \
  'https://api.{region}.te2.io/v1/export/{dataType}' \
  -d 'queryExecutionId={queryExecutionId}' \
  -d 'nextToken={nextToken}' \
  -H 'Cache-Control: no-cache' \
  -H 'Authorization: Bearer {token}'

Parameters used above include:

dataType: Must match the dataType used when invoking the export.
queryExecutionId: Must match the queryExecutionId returned with the first page of results.
nextToken: Must match the nextToken returned with the previous page of results.
token: Service token.

A successful request will return a response in the same format as the initial page of results.
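
The follow-up request differs from the initial one only in its query parameters, so its URL could be built along these lines. This is a sketch; build_next_page_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_next_page_url(region: str, data_type: str,
                        query_execution_id: str, next_token: str) -> str:
    """Return the URL for the page that follows the one that supplied next_token."""
    query = urlencode({
        "queryExecutionId": query_execution_id,
        "nextToken": next_token,
    })
    return f"https://api.{region}.te2.io/v1/export/{data_type}?{query}"
```

Encoding the values with urlencode matters here because a token may contain characters such as + or =, which would otherwise be misread in a query string.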

Retrieval flow

This is a general example of what an export looks like using the above steps. Your exact setup may differ slightly, but the basic flow will most likely remain the same.

# Fetch a service token.
service_token <- fetch_service_token()

# Fetch the initial page of results.
result_page, query_execution_id, next_token <- fetch_first_page(
    service_token,
    data_type,
    start_time,
    end_time,
    venue_id
)

# Store the first page of results.
results <- result_page

# So long as there are additional pages, keep retrieving data
# and appending the returned data to the result set.
while next_token != null do:
  result_page, query_execution_id, next_token <- fetch_next_page(
      service_token,
      data_type,
      query_execution_id,
      next_token
  )

  results <- results + result_page
end while

# Return your exported data.
return results
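
The same flow can be sketched in Python. To keep the example independent of any HTTP library, the two fetch functions are passed in as callables; export_all and both callable parameters are hypothetical names, and each callable is assumed to return the decoded JSON body of a page.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]


def export_all(fetch_first_page: Callable[[], Page],
               fetch_next_page: Callable[[str, str], Page]) -> List[Dict[str, Any]]:
    """Collect the records from every page of a single export."""
    page = fetch_first_page()
    results = list(page["data"])
    # The queryExecutionId from the first page is reused for every later page.
    query_execution_id = page["queryExecutionId"]
    next_token = page.get("nextToken")

    # A null (None) nextToken means there are no more pages.
    while next_token is not None:
        page = fetch_next_page(query_execution_id, next_token)
        results.extend(page["data"])
        next_token = page.get("nextToken")

    return results
```

For large exports it may be preferable to process each page as it arrives instead of accumulating every record in memory.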

Rules governing export request parameters

A request that meets any of the following conditions is rejected with a status of 422:

an unknown or disallowed dataType is provided
queryExecutionId is not provided, and neither startTime nor endTime is provided
the startTime is after the endTime
the startTime is too far in the past
the endTime is in the future
a queryExecutionId is provided without a nextToken
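
Some of these conditions can be caught on the client before a request is sent. The sketch below is one possible pre-check, not the server's actual validation: find_parameter_problems is a hypothetical helper, it expects timezone-aware datetime values, and it skips the dataType and "too far in the past" rules because their limits are not stated in this guide.

```python
from datetime import datetime, timezone
from typing import List, Optional


def find_parameter_problems(start_time: Optional[datetime] = None,
                            end_time: Optional[datetime] = None,
                            query_execution_id: Optional[str] = None,
                            next_token: Optional[str] = None,
                            now: Optional[datetime] = None) -> List[str]:
    """Return a description of each rule the parameters would break."""
    now = now or datetime.now(timezone.utc)
    problems = []

    if query_execution_id is None and start_time is None and end_time is None:
        problems.append("no queryExecutionId, startTime, or endTime provided")
    if query_execution_id is not None and next_token is None:
        problems.append("queryExecutionId provided without nextToken")
    if start_time is not None and end_time is not None and start_time > end_time:
        problems.append("startTime is after endTime")
    if end_time is not None and end_time > now:
        problems.append("endTime is in the future")

    return problems
```

An empty list means none of the checked rules is broken; the server may still reject the request for reasons this sketch does not cover.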

Best practices

Do:

use the venueId parameter. Pull back only the data you need and no more.

Don’t:

set the endTime to the current time. The export returns streaming data, which can take several minutes to arrive in accesso’s data lake. If you query too close to the present time, you may receive incomplete data for your date range. Set endTime to at least 15 minutes before the current time.
repeatedly execute the same request in a short period. Execute it once and cache the response if you need the data for an extended period of time.
attempt to pull back exceedingly large amounts of data in a single request. Limit your requests to periods of no more than a day or so at a time.
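
The time-related recommendations above can be applied with a few small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the 15-minute lag and one-day window simply mirror the guidance in this section, and all datetime values are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def latest_safe_end_time(now: Optional[datetime] = None,
                         lag: timedelta = timedelta(minutes=15)) -> datetime:
    """Return the most recent endTime that leaves room for data to arrive."""
    now = now or datetime.now(timezone.utc)
    return now - lag


def split_into_windows(start: datetime, end: datetime,
                       step: timedelta = timedelta(days=1)) -> List[Tuple[datetime, datetime]]:
    """Split [start, end] into consecutive windows no longer than step."""
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def to_iso8601(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC, e.g. 2019-10-15T16:29:05.734Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

Because startTime and endTime are both inclusive, a record stamped exactly on a shared window boundary could appear in two adjacent exports, so callers that split a range this way may want to deduplicate the results.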