GET /api/corpora
{
  "count": 123,
  "results": [
    {
      "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
      "created_at": "2023-11-07T05:31:56Z",
      "updated_at": "2023-11-07T05:31:56Z",
      "corpora_name": "<string>",
      "size_on_disk": 123,
      "index_location": "<string>",
      "creator": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
      "description": "<string>",
      "is_published": true,
      "index_type": "VSI",
      "indexing_status": "PND"
    }
  ],
  "next": "http://api.example.org/accounts/?page=4",
  "previous": "http://api.example.org/accounts/?page=2"
}

Overview

List all corpora owned by the authenticated user with pagination support. This endpoint is safe for multi-tenant environments as it automatically filters results to only show corpora you created.
Use this for: Building corpus selection UI, monitoring corpus status, tracking indexing progress, auditing corpus inventory.

Authentication

Requires a valid JWT token or session authentication. Anonymous requests return 401 Unauthorized.

Query Parameters

page
integer
default:"1"
Page number for pagination. Use with page_size to navigate through large corpus lists.
page_size
integer
default:"20"
Number of corpora per page. Maximum value depends on server configuration (typically 100).
Recommendation: Use smaller page sizes (20-50) for better performance.
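To illustrate how page and page_size relate to the count field returned by this endpoint, the number of pages a client should expect can be estimated with a ceiling division. This is a minimal sketch; the helper name total_pages is hypothetical and not part of the API.

```python
import math


def total_pages(count: int, page_size: int = 20) -> int:
    """Return how many pages are needed to list `count` corpora.

    `count` is the value of the "count" field in the response;
    `page_size` defaults to 20, the documented default.
    """
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    return math.ceil(count / page_size)


print(total_pages(123, 20))  # 7
```

A client could use this to render a page selector without walking every page first.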

Example request

curl -X GET https://{your-host}/api/corpora/ \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN" \
  -G --data-urlencode "page=1" --data-urlencode "page_size=10"

Example response

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd",
      "created_at": "2024-09-01T10:05:03.291Z",
      "updated_at": "2024-09-01T10:06:42.102Z",
      "corpora_name": "support_playbooks",
      "description": "Playbooks for tier-1 engineers",
      "size_on_disk": 10485760.0,
      "index_location": "qdrant_free_collection",
      "is_published": false,
      "index_type": "VSI",
      "indexing_status": "IND",
      "creator": "eb81c1d5-78fe-4f35-b58e-0ff6a3ad5d12"
    }
  ]
}

Response Structure

count
integer
Total number of corpora owned by you (across all pages).
next
string | null
URL to fetch the next page of results. null if on the last page.
previous
string | null
URL to fetch the previous page of results. null if on the first page.
results
array
Array of corpus objects. See Create Corpus for full field documentation.
Key fields:
  • id - Corpus UUID
  • corpora_name - Normalized name (lowercase snake_case)
  • indexing_status - Current status (PND, PRS, IND, ERR)
  • size_on_disk - Storage size in bytes
  • index_type - Indexing strategy (VSI, SMI, DSI)
  • is_published - Public visibility flag
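Because next is null on the last page, a client can walk the whole list by following next until it is empty instead of tracking page numbers itself. The sketch below assumes a caller-supplied fetch_page function (hypothetical) that takes a URL and returns the parsed JSON body; it is demonstrated here with in-memory fake pages rather than a real server.

```python
from typing import Callable, Iterator, Optional


def iter_corpora(fetch_page: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every corpus object by following the "next" links."""
    url: Optional[str] = first_url
    while url:
        data = fetch_page(url)
        yield from data["results"]
        # "next" is null (None) on the last page, which ends the loop.
        url = data.get("next")


# Fake pages standing in for real responses (illustrative data only).
fake_pages = {
    "/api/corpora/?page=1": {
        "count": 3,
        "next": "/api/corpora/?page=2",
        "previous": None,
        "results": [{"id": "a"}, {"id": "b"}],
    },
    "/api/corpora/?page=2": {
        "count": 3,
        "next": None,
        "previous": "/api/corpora/?page=1",
        "results": [{"id": "c"}],
    },
}

ids = [c["id"] for c in iter_corpora(fake_pages.__getitem__, "/api/corpora/?page=1")]
print(ids)  # ['a', 'b', 'c']
```

In a real client, fetch_page would typically wrap an authenticated HTTP GET and return the decoded JSON.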

Best Practices

Handle large corpus lists efficiently:
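import requests


def get_all_corpora(base_url, token):
    """Fetch every page of corpora and return them as one list."""
    all_corpora = []
    page = 1

    while True:
        response = requests.get(
            f"{base_url}/api/corpora/",
            headers={"Authorization": f"Bearer {token}"},
            params={"page": page, "page_size": 50},
            timeout=30,
        )
        # Fail fast on 401 or other errors instead of parsing an error body
        response.raise_for_status()
        data = response.json()

        all_corpora.extend(data["results"])

        # "next" is null on the last page
        if not data["next"]:
            break

        page += 1

    return all_corpora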
Client-side filtering for specific corpus states (this filters only the page that was fetched):
# Parse the response once, then filter the current page by status
response = requests.get(f"{base_url}/api/corpora/", headers=headers, timeout=30)
response.raise_for_status()
corpora = response.json()["results"]

# Get only fully indexed corpora
indexed_corpora = [c for c in corpora if c["indexing_status"] == "IND"]

# Get corpora with errors
error_corpora = [c for c in corpora if c["indexing_status"] == "ERR"]
Use normalized names consistently:
# ✓ Correct - Use the normalized name from API
display_name = corpus["corpora_name"]

# ✗ Incorrect - Don't try to normalize the name yourself
display_name = user_input.lower().replace(" ", "_")
Why: The server may apply additional normalization beyond simple lowercasing and underscore replacement.
Before creating a new corpus, validate the name:
curl -X GET "https://{your-host}/api/check_corpora_name/?corpora_name=my_corpus" \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN"
Returns 200 OK if the name is available, 400 Bad Request if it is already taken.
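A client can turn the two documented status codes into a boolean and treat anything else (for example 401 for a missing token) as an error rather than as "taken". This is a small sketch; the function name name_is_available is hypothetical.

```python
def name_is_available(status_code: int) -> bool:
    """Interpret the status code returned by the name check.

    200 means the name is available, 400 means it is already taken.
    Any other status is treated as an unexpected failure.
    """
    if status_code == 200:
        return True
    if status_code == 400:
        return False
    raise RuntimeError(f"Unexpected status from name check: {status_code}")


print(name_is_available(200))  # True
print(name_is_available(400))  # False
```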

Monitoring & Analytics

Use this endpoint to build monitoring dashboards that track:
  • Total corpus count over time
  • Storage usage (size_on_disk aggregation)
  • Indexing pipeline health (indexing_status distribution)
  • Published vs private corpus ratio
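The metrics above can be derived from the corpus objects in results once all pages have been collected. The following is a minimal sketch, assuming the corpora have already been fetched into a list of dicts; summarize_corpora is a hypothetical helper and the sample data is illustrative only.

```python
from collections import Counter


def summarize_corpora(corpora):
    """Aggregate dashboard metrics from a list of corpus objects."""
    total = len(corpora)
    published = sum(1 for c in corpora if c.get("is_published"))
    return {
        "total": total,
        # size_on_disk is reported in bytes; missing values count as 0
        "total_bytes": sum(c.get("size_on_disk") or 0 for c in corpora),
        # e.g. {"IND": 2, "ERR": 1}
        "status_counts": dict(Counter(c["indexing_status"] for c in corpora)),
        "published": published,
        "private": total - published,
    }


sample = [
    {"indexing_status": "IND", "size_on_disk": 1000, "is_published": True},
    {"indexing_status": "IND", "size_on_disk": 2500, "is_published": False},
    {"indexing_status": "ERR", "size_on_disk": 0, "is_published": False},
]
summary = summarize_corpora(sample)
print(summary["total_bytes"])    # 3500
print(summary["status_counts"])  # {'IND': 2, 'ERR': 1}
```

Recording this summary at regular intervals would give the time series for corpus count and storage usage.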

Client examples

import os
import requests

BASE_URL = "https://your-soar-instance.com"
TOKEN = os.environ["SOAR_LABS_TOKEN"]

response = requests.get(
    f"{BASE_URL}/api/corpora/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"page": 1, "page_size": 10},
    timeout=30,
)
response.raise_for_status()
corpora = response.json()["results"]

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

page
integer

A page number within the paginated result set.

page_size
integer

Number of results to return per page.

Response

200 - application/json
count
integer
required
Example:

123

results
object[]
required
next
string<uri> | null
Example:

"http://api.example.org/accounts/?page=4"

previous
string<uri> | null
Example:

"http://api.example.org/accounts/?page=2"