List Corpora

Overview

List all corpora owned by the authenticated user with pagination support. This endpoint is safe for multi-tenant environments as it automatically filters results to only show corpora you created.

Use this endpoint to build corpus selection UIs, monitor corpus status, track indexing progress, and audit your corpus inventory.

Authentication

Requires a valid JWT bearer token or session authentication. Anonymous requests return 401 Unauthorized.

Query Parameters

page

integer

default:"1"

Page number for pagination. Use with page_size to navigate through large corpus lists.

page_size

integer

default:"20"

Number of corpora per page. The maximum value depends on server configuration (typically 100). Recommendation: use smaller page sizes (20-50) for better performance.
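If page and page_size come from user input, you may want to sanitize them before sending the request. The following is a minimal sketch that assumes a server maximum of 100 (the "typically 100" value above; check your own deployment). The helper name `build_page_params` is hypothetical.

```python
# Hypothetical helper. MAX_PAGE_SIZE = 100 is an assumption based on the
# "typically 100" note above; check your server configuration.
MAX_PAGE_SIZE = 100


def build_page_params(page: int = 1, page_size: int = 20) -> dict:
    """Return query parameters with page >= 1 and 1 <= page_size <= MAX_PAGE_SIZE."""
    return {
        "page": max(1, int(page)),
        "page_size": min(max(1, int(page_size)), MAX_PAGE_SIZE),
    }


print(build_page_params(0, 500))  # {'page': 1, 'page_size': 100}
```

Clamping on the client keeps requests within bounds, but the server's own limit is authoritative.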

Example request

curl -X GET https://{your-host}/api/corpora/ \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN" \
  -G --data-urlencode "page=1" --data-urlencode "page_size=10"

Example response

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd",
      "created_at": "2024-09-01T10:05:03.291Z",
      "updated_at": "2024-09-01T10:06:42.102Z",
      "corpora_name": "support_playbooks",
      "description": "Playbooks for tier-1 engineers",
      "size_on_disk": 10485760.0,
      "index_location": "qdrant_free_collection",
      "is_published": false,
      "index_type": "VSI",
      "indexing_status": "IND",
      "creator": "eb81c1d5-78fe-4f35-b58e-0ff6a3ad5d12"
    }
  ]
}
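When you need typed access to the fields in a response like the one above, a small parsing layer can help. The following is a minimal sketch. The `Corpus` dataclass and the `parse_corpora` and `parse_timestamp` helpers are hypothetical and cover only a subset of fields. Fields that the API reference does not mark as required are read with defaults, and treating a missing `is_published` as `False` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Corpus:
    """A subset of the fields shown in the example response."""
    id: str
    corpora_name: str
    size_on_disk: float
    created_at: datetime
    creator: str
    description: Optional[str] = None
    is_published: bool = False  # assumed default when the field is absent
    index_type: Optional[str] = None
    indexing_status: Optional[str] = None


def parse_timestamp(value: str) -> datetime:
    # Before Python 3.11, datetime.fromisoformat() rejects a trailing "Z",
    # so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_corpora(payload: dict) -> List[Corpus]:
    """Convert the `results` array of a list response into Corpus objects."""
    return [
        Corpus(
            id=item["id"],
            corpora_name=item["corpora_name"],
            size_on_disk=item["size_on_disk"],
            created_at=parse_timestamp(item["created_at"]),
            creator=item["creator"],
            # Fields not marked required in the API reference may be absent.
            description=item.get("description"),
            is_published=item.get("is_published", False),
            index_type=item.get("index_type"),
            indexing_status=item.get("indexing_status"),
        )
        for item in payload["results"]
    ]
```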

Response Structure

count

integer

Total number of corpora owned by you (across all pages).

next

string | null

URL to fetch the next page of results. null if on the last page.

previous

string | null

URL to fetch the previous page of results. null if on the first page.

results

array

Array of corpus objects. See Create Corpus for full field documentation. Key fields:

id - Corpus UUID
corpora_name - Normalized name (lowercase snake_case)
indexing_status - Current processing status (one of PND, IQE, PRS, DEX, DER, IND, CMP, ERR; see the API reference below for their meanings)
size_on_disk - Storage size in bytes
index_type - Indexing strategy (VSI, SMI, DSI)
is_published - Public visibility flag
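Because count covers all pages, you can work out how many pages to expect. You can also read the page number back out of a next or previous URL. The following is a minimal sketch with hypothetical helper names. Treating a link with no page parameter as page 1 is an assumption: Django REST framework's page-number pagination omits ?page=1 from previous links, and this response envelope looks similar, but the source does not confirm that.

```python
import math
from typing import Optional
from urllib.parse import parse_qs, urlparse


def total_pages(count: int, page_size: int) -> int:
    """Number of pages needed to hold `count` corpora at `page_size` per page."""
    return math.ceil(count / page_size) if count else 0


def page_number(url: Optional[str]) -> Optional[int]:
    """Return the page a next/previous URL points to, or None if there is no URL."""
    if url is None:
        return None
    values = parse_qs(urlparse(url).query).get("page")
    # Assumption: a link without a page parameter points at page 1.
    return int(values[0]) if values else 1
```

For example, count = 123 with page_size = 20 gives total_pages(123, 20) == 7.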

Best Practices

Efficient Pagination

Retrieve a large corpus list by requesting pages until next is null:

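import requests


def get_all_corpora(base_url, token, page_size=50):
    """Fetch every corpus owned by the authenticated user, page by page."""
    all_corpora = []
    page = 1

    while True:
        response = requests.get(
            f"{base_url}/api/corpora/",
            headers={"Authorization": f"Bearer {token}"},
            params={"page": page, "page_size": page_size},
            timeout=30,
        )
        response.raise_for_status()  # fail fast on 401 or server errors
        data = response.json()

        all_corpora.extend(data["results"])

        # next is null on the last page
        if not data["next"]:
            break

        page += 1

    return all_corpora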
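As an alternative, you can follow each page's next URL directly and yield corpora lazily instead of building one large list. The following is a minimal sketch. `iter_corpora` and `make_fetcher` are hypothetical names, and the fetch function is injected so that the paging logic does not depend on any HTTP library.

```python
from typing import Callable, Iterator, Optional


def iter_corpora(fetch_json: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield corpus objects one at a time, following `next` until it is null."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page["results"]
        url = page["next"]


def make_fetcher(token: str, timeout: float = 30) -> Callable[[str], dict]:
    """Build a fetch_json callable backed by the requests library."""
    import requests  # imported lazily so iter_corpora itself has no dependency

    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"

    def fetch_json(url: str) -> dict:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response.json()

    return fetch_json


# Usage (not executed here):
# for corpus in iter_corpora(make_fetcher(token), f"{base_url}/api/corpora/?page_size=50"):
#     print(corpus["corpora_name"])
```

Following the server-provided next link avoids computing page numbers yourself, and it carries forward any query parameters that the server includes in that link.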

Filter by Status

The documented query parameters cover only pagination (page and page_size), so filter by status on the client:

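import requests

# Assumes base_url and headers = {"Authorization": f"Bearer {token}"} are defined.
# This inspects only the first page; use get_all_corpora() to cover every page.
response = requests.get(f"{base_url}/api/corpora/", headers=headers, timeout=30)
response.raise_for_status()
corpora = response.json()["results"]

# Get only fully indexed corpora
indexed_corpora = [c for c in corpora if c["indexing_status"] == "IND"]

# Get corpora with errors (ERR = Error, DER = Data Extraction Error)
error_corpora = [c for c in corpora if c["indexing_status"] in ("ERR", "DER")]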
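The filters above test individual status codes. To treat related codes together, you can map each code to a coarse state. The following is a minimal sketch. The grouping is an assumption: the API reference lists the codes and their labels but does not say which codes are terminal. `status_group` is a hypothetical helper.

```python
# Assumed grouping of the documented indexing_status codes; the API reference
# lists the codes but does not define which ones are terminal.
IN_PROGRESS = {"PND", "IQE", "PRS", "DEX"}
SUCCEEDED = {"IND", "CMP"}
FAILED = {"DER", "ERR"}


def status_group(code: str) -> str:
    """Classify an indexing_status code as 'in_progress', 'succeeded', or 'failed'."""
    if code in IN_PROGRESS:
        return "in_progress"
    if code in SUCCEEDED:
        return "succeeded"
    if code in FAILED:
        return "failed"
    # Surface unknown codes (e.g. ones added later) instead of guessing.
    raise ValueError(f"Unknown indexing_status: {code!r}")
```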

Display Corpus Names

Use normalized names consistently:

# ✓ Correct - Use the normalized name from API
display_name = corpus["corpora_name"]

# ✗ Incorrect - Don't try to transform yourself
display_name = user_input.lower().replace(" ", "_")

Why: The server may apply additional normalization beyond simple lowercasing and underscore replacement.
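For the same reason, match corpora against the names the API returned rather than against a locally transformed string. The following is a minimal sketch with a hypothetical `find_corpus_id` helper that looks up a corpus by its server-normalized name.

```python
from typing import Iterable, Optional


def find_corpus_id(corpora: Iterable[dict], name: str) -> Optional[str]:
    """Return the id of the corpus whose server-normalized corpora_name equals name."""
    for corpus in corpora:
        if corpus["corpora_name"] == name:
            return corpus["id"]
    return None  # no exact match; do not retry with a guessed normalization
```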

Check Name Uniqueness

Before creating a new corpus, validate the name:

curl -X GET "https://{your-host}/api/check_corpora_name/?corpora_name=my_corpus" \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN"

Returns 200 OK if the name is available and 400 Bad Request if it is already taken.
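The same check in Python can turn the two documented status codes into a boolean and raise on anything else, such as 401. The following is a minimal sketch. `is_name_available` is a hypothetical helper, and the `get` argument is assumed to behave like `requests.get` (you can pass `requests.get` itself), which keeps the function testable without a network.

```python
from typing import Callable


def is_name_available(get: Callable, base_url: str, token: str, name: str) -> bool:
    """Return True if the name is free (200), False if taken (400); raise otherwise."""
    response = get(
        f"{base_url}/api/check_corpora_name/",
        headers={"Authorization": f"Bearer {token}"},
        params={"corpora_name": name},
        timeout=30,
    )
    if response.status_code == 200:
        return True
    if response.status_code == 400:
        return False
    response.raise_for_status()  # surfaces 401 and other 4xx/5xx errors
    raise RuntimeError(f"Unexpected status: {response.status_code}")
```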

Monitoring & Analytics

Use this endpoint to build monitoring dashboards that track:

Total corpus count over time
Storage usage (size_on_disk aggregation)
Indexing pipeline health (indexing_status distribution)
Published vs private corpus ratio
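The following sketch computes these metrics from a list of corpus objects, such as the list returned by get_all_corpora(). `summarize` is a hypothetical helper. To track counts over time, you would store each snapshot yourself, because the endpoint returns only the current state.

```python
from collections import Counter
from typing import Iterable


def summarize(corpora: Iterable[dict]) -> dict:
    """Aggregate dashboard metrics from a list of corpus objects."""
    corpora = list(corpora)
    total = len(corpora)
    published = sum(1 for c in corpora if c.get("is_published"))
    return {
        "total": total,
        # size_on_disk is reported in bytes
        "total_bytes": sum(c["size_on_disk"] for c in corpora),
        "status_counts": Counter(c.get("indexing_status") for c in corpora),
        "published_ratio": published / total if total else 0.0,
    }
```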

Client examples

Python

import os
import requests

BASE_URL = "https://your-soar-instance.com"
TOKEN = os.environ["SOAR_LABS_TOKEN"]

response = requests.get(
    f"{BASE_URL}/api/corpora/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"page": 1, "page_size": 10},
    timeout=30,
)
response.raise_for_status()
corpora = response.json()["results"]

TypeScript / JavaScript

const BASE_URL = "https://your-soar-instance.com";
const token = process.env.SOAR_LABS_TOKEN!;

async function listCorpora() {
  const url = new URL("/api/corpora/", BASE_URL);
  url.searchParams.set("page", "1");
  url.searchParams.set("page_size", "10");

  const response = await fetch(url, {
    headers: {
      Authorization: `Bearer ${token}`,
    },
  });

  if (!response.ok) {
    throw new Error(`Listing corpora failed: ${response.status}`);
  }

  return response.json();
}

Java

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListCorpora {
    public static void main(String[] args) throws Exception {
        var baseUrl = "https://your-soar-instance.com";
        var token = System.getenv("SOAR_LABS_TOKEN");

        var client = HttpClient.newHttpClient();
        var uri = URI.create(baseUrl + "/api/corpora/?page=1&page_size=10");

        var request = HttpRequest.newBuilder(uri)
            .header("Authorization", "Bearer " + token)
            .GET()
            .build();

        // send() throws IOException and InterruptedException
        var response = client.send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() >= 400) {
            throw new RuntimeException("Listing corpora failed: " + response.statusCode());
        }

        var body = response.body(); // parse with your preferred JSON library
    }
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

page

integer

A page number within the paginated result set.

page_size

integer

Number of results to return per page.

Response

200 - application/json

count

integer

required

Example:

123

results

object[]

required


results.id

string<uuid>

required

results.created_at

string<date-time>

required

The date and time the corpus was created

results.updated_at

string<date-time>

required

Last updated time

results.corpora_name

string

required

Name of the corpus

Maximum string length: 100

results.size_on_disk

number<double>

required

Size of the corpus on disk (in bytes)

results.index_location

string | null

required

Location of the index on remote storage

results.creator

string<uuid>

required

results.description

string | null

Description of the corpus

results.is_published

boolean

Whether the corpus is visible to all users

results.index_type

enum<string>

Type of index used for the corpus

VSI - VectorStoreIndex
SMI - SummaryIndex
DSI - DocumentSummaryIndex

Available options: VSI, SMI, DSI

results.indexing_status

enum<string>

Processing status of the corpus

PND - Pending
IQE - In Queue
PRS - Processing
DEX - Data Extracted Successfully
DER - Data Extraction Error
IND - Indexed
CMP - Completed
ERR - Error

Available options: PND, IQE, PRS, DEX, DER, IND, CMP, ERR

next

string<uri> | null

Example:

"http://api.example.org/accounts/?page=4"

previous

string<uri> | null

Example:

"http://api.example.org/accounts/?page=2"
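To show readable labels in a UI, you can map the enum codes to the labels listed in the index_type and indexing_status descriptions above. The following is a minimal sketch. The label text comes from this reference, and `describe` is a hypothetical helper.

```python
# Display labels copied from the index_type and indexing_status enums above.
INDEX_TYPE_LABELS = {
    "VSI": "VectorStoreIndex",
    "SMI": "SummaryIndex",
    "DSI": "DocumentSummaryIndex",
}

INDEXING_STATUS_LABELS = {
    "PND": "Pending",
    "IQE": "In Queue",
    "PRS": "Processing",
    "DEX": "Data Extracted Successfully",
    "DER": "Data Extraction Error",
    "IND": "Indexed",
    "CMP": "Completed",
    "ERR": "Error",
}


def describe(corpus: dict) -> str:
    """Render a one-line, human-readable summary of a corpus object."""
    index_type = INDEX_TYPE_LABELS.get(corpus.get("index_type"), "Unknown index type")
    status = INDEXING_STATUS_LABELS.get(corpus.get("indexing_status"), "Unknown status")
    return f"{corpus['corpora_name']}: {index_type}, {status}"


print(describe({"corpora_name": "support_playbooks", "index_type": "VSI", "indexing_status": "IND"}))
# support_playbooks: VectorStoreIndex, Indexed
```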
