GET /api/corpora/{id}
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "corpora_name": "<string>",
  "size_on_disk": 123,
  "index_location": "<string>",
  "creator": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "description": "<string>",
  "is_published": true,
  "index_type": "VSI",
  "indexing_status": "PND"
}

Overview

Fetch detailed metadata for a single corpus by its unique identifier. This endpoint returns the same comprehensive information as the list endpoint but scoped to one record. Only the corpus owner can retrieve this data.
Use this for: Checking indexing status after creation, refreshing UI state after updates, validating corpus existence before operations.

Authentication

Requires a valid JWT token or session authentication. You must own the target corpus.

Path Parameters

id
UUID
required
The corpus identifier returned during creation or from the list endpoint.
Example: 8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd
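Because the path parameter must be a UUID, a client can reject malformed identifiers before sending a request. The following is a minimal sketch using only the Python standard library; the helper names `normalize_corpus_id` and `corpus_url` are hypothetical and not part of the API.

```python
import uuid


def normalize_corpus_id(raw_id: str) -> str:
    """Return the canonical lowercase UUID string, or raise ValueError."""
    # uuid.UUID raises ValueError for input that is not a well-formed UUID.
    return str(uuid.UUID(raw_id))


def corpus_url(base_url: str, raw_id: str) -> str:
    """Build the detail URL for one corpus, with the trailing slash used in the examples."""
    return f"{base_url.rstrip('/')}/api/corpora/{normalize_corpus_id(raw_id)}/"
```

Validating locally avoids a round trip for identifiers that could never match a corpus.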

Example request

curl -X GET https://{your-host}/api/corpora/8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd/ \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN"

Example response

{
  "id": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd",
  "created_at": "2024-09-01T10:05:03.291Z",
  "updated_at": "2024-09-01T10:08:11.522Z",
  "corpora_name": "support_playbooks",
  "description": "Runbooks feeding LlamaIndex",
  "size_on_disk": 4194304.0,
  "index_location": "qdrant_free_collection",
  "is_published": false,
  "index_type": "VSI",
  "indexing_status": "IND",
  "creator": "eb81c1d5-78fe-4f35-b58e-0ff6a3ad5d12"
}

Response Structure

id
UUID
Unique corpus identifier.
created_at
timestamp
ISO 8601 timestamp when the corpus was created.
updated_at
timestamp
Last modification timestamp. Updates when metadata changes or resources are added.
corpora_name
string
Normalized corpus name (lowercase with underscores).
description
string
User-provided description of the corpus purpose.
size_on_disk
float
Storage size in bytes consumed by this corpus and its resources.
index_location
string
Vector database collection identifier where embeddings are stored.
is_published
boolean
Public visibility flag. true makes the corpus discoverable by other users.
index_type
string
Indexing strategy: VSI (Vector Store Index), SMI (Summary Index), or DSI (Document Summary Index).
indexing_status
string
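Current processing state. The values most clients act on are:
  • PND - Pending (waiting for resources)
  • PRS - Processing (indexing in progress)
  • IND - Indexed (ready for queries)
  • ERR - Error (indexing failed)
The response schema also defines IQE (In Queue), DEX (Data Extracted Successfully), DER (Data Extraction Error), and CMP (Completed).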
creator
UUID
User ID of the corpus owner.
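The fields above map naturally onto a typed client-side model. The sketch below is one possible shape, not an official SDK type: the `Corpus` dataclass and `parse_corpus` helper are hypothetical names, and treating `index_location`, `description`, `is_published`, `index_type`, and `indexing_status` as possibly absent or null is an assumption made so that the parser tolerates sparse payloads.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Corpus:
    """Typed view of the corpus detail response (hypothetical client-side model)."""
    id: str
    created_at: datetime
    updated_at: datetime
    corpora_name: str
    size_on_disk: float
    creator: str
    index_location: Optional[str] = None
    description: Optional[str] = None
    is_published: Optional[bool] = None
    index_type: Optional[str] = None
    indexing_status: Optional[str] = None


def _parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset for older interpreters.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_corpus(payload: Dict[str, Any]) -> Corpus:
    """Convert the decoded JSON body into a Corpus instance."""
    return Corpus(
        id=payload["id"],
        created_at=_parse_timestamp(payload["created_at"]),
        updated_at=_parse_timestamp(payload["updated_at"]),
        corpora_name=payload["corpora_name"],
        size_on_disk=float(payload["size_on_disk"]),
        creator=payload["creator"],
        index_location=payload.get("index_location"),
        description=payload.get("description"),
        is_published=payload.get("is_published"),
        index_type=payload.get("index_type"),
        indexing_status=payload.get("indexing_status"),
    )
```

Passing the decoded body of a successful response to `parse_corpus` yields timezone-aware timestamps and a float byte count, which keeps later size or age calculations free of string handling.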

Common Use Cases

After creating a corpus and uploading resources, poll this endpoint to check when indexing completes:
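import time
import requests

def wait_for_indexing(base_url, token, corpus_id, timeout=300):
    """Poll the corpus until it is indexed, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(
            f"{base_url}/api/corpora/{corpus_id}/",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,  # per-request timeout so a stalled connection cannot hang the loop
        )
        response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
        corpus = response.json()
        status = corpus["indexing_status"]

        if status == "IND":
            print("Corpus ready for queries!")
            return corpus
        if status in ("ERR", "DER"):  # indexing error or data extraction error
            raise RuntimeError(f"Indexing failed with status {status}")

        time.sleep(5)  # Poll every 5 seconds

    raise TimeoutError("Indexing did not complete in time")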
After modifying corpus settings, fetch the latest state to confirm changes:
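# base_url, token, and corpus_id are assumed to be defined by the caller
headers = {"Authorization": f"Bearer {token}"}

# Update the corpus
update = requests.patch(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
    json={"is_published": True},
    timeout=30,
)
update.raise_for_status()  # stop here if the update was rejected

# Verify the change
response = requests.get(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
    timeout=30,
)
response.raise_for_status()
corpus = response.json()

assert corpus["is_published"] is True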
Monitor storage consumption before adding more resources:
corpus = requests.get(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers
).json()

size_mb = corpus["size_on_disk"] / (1024 * 1024)
print(f"Current storage: {size_mb:.2f} MB")

# Warn if approaching limits
if size_mb > 500:
    print("Warning: Large corpus size may affect query performance")
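Ensure the corpus exists and is ready before performing operations:
try:
    response = requests.get(
        f"{base_url}/api/corpora/{corpus_id}/",
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()  # without this call, requests never raises HTTPError
    corpus = response.json()

    # Check if ready for queries
    if corpus["indexing_status"] != "IND":
        raise ValueError("Corpus not fully indexed yet")

    # Proceed with operation (query_corpus is a placeholder for your own function)
    query_corpus(corpus_id)
except requests.HTTPError as e:
    if e.response.status_code == 404:
        print("Corpus not found or access denied")
    else:
        raise  # do not swallow other HTTP errors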
Security Note: Requests return 404 Not Found if the corpus belongs to another user. This prevents leaking corpus existence to unauthorized users.
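Since a 404 covers both "does not exist" and "owned by someone else", client code can collapse the two cases into a single "not available" result. The sketch below illustrates that idea with an injected transport so it runs without a network; the `Transport` signature and the name `get_corpus_or_none` are hypothetical, and raising `RuntimeError` for any other non-200 status is an assumption rather than documented behavior.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport: given a URL and headers, return (status_code, parsed_json_body).
Transport = Callable[[str, Dict[str, str]], Tuple[int, Any]]


def get_corpus_or_none(
    transport: Transport, base_url: str, token: str, corpus_id: str
) -> Optional[dict]:
    """Return the corpus payload, or None when the API answers 404."""
    url = f"{base_url.rstrip('/')}/api/corpora/{corpus_id}/"
    status, body = transport(url, {"Authorization": f"Bearer {token}"})
    if status == 404:
        # Missing and not-owned corpora are indistinguishable by design.
        return None
    if status != 200:
        # Assumption: any other status is unexpected for this endpoint.
        raise RuntimeError(f"Unexpected status {status} for {url}")
    return body
```

In real use the transport would wrap an HTTP client call and return its status code and decoded JSON body. User-facing messages should say "not found or access denied" rather than guessing which of the two applies.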

Client examples

import os
import requests

BASE_URL = "https://your-soar-instance.com"
TOKEN = os.environ["SOAR_LABS_TOKEN"]
CORPUS_ID = "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd"

response = requests.get(
    f"{BASE_URL}/api/corpora/{CORPUS_ID}/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
corpus = response.json()

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

id
string<uuid>
required

A UUID string identifying this Corpora.

Response

200 - application/json
id
string<uuid>
required
created_at
string<date-time>
required

The date and time the corpus was created

updated_at
string<date-time>
required

Last updated time

corpora_name
string
required

Name of the corpora

Maximum string length: 100
size_on_disk
number<double>
required

Size of the corpora on disk (in bytes)

index_location
string | null
required

Location of the index on Remote Storage

creator
string<uuid>
required
description
string | null

Description of the corpora

is_published
boolean

Is the corpora Visible to all users?

index_type
enum<string>

Type of index to be used for the corpora

  • VSI - VectorStoreIndex
  • SMI - SummaryIndex
  • DSI - DocumentSummaryIndex
Available options: VSI, SMI, DSI
indexing_status
enum<string>

Status of the corpora processing

  • PND - Pending
  • IQE - In Queue
  • PRS - Processing
  • DEX - Data Extracted Successfully
  • DER - Data Extraction Error
  • IND - Indexed
  • CMP - Completed
  • ERR - Error
Available options: PND, IQE, PRS, DEX, DER, IND, CMP, ERR
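With eight possible status codes, client code is easier to follow if it reduces them to a few buckets. The sketch below shows one way to do that. The grouping is an assumption, not documented behavior: IND is the only status this page describes as ready for queries, CMP is treated as a success state by assumption, and ERR and DER are treated as failures. The function name `classify_indexing_status` is hypothetical.

```python
# Status codes taken from the enum above; the grouping itself is an assumption.
READY_STATUSES = frozenset({"IND", "CMP"})       # CMP counted as ready by assumption
FAILED_STATUSES = frozenset({"ERR", "DER"})
IN_PROGRESS_STATUSES = frozenset({"PND", "IQE", "PRS", "DEX"})


def classify_indexing_status(code: str) -> str:
    """Map an indexing_status code to 'ready', 'failed', or 'in_progress'."""
    if code in READY_STATUSES:
        return "ready"
    if code in FAILED_STATUSES:
        return "failed"
    if code in IN_PROGRESS_STATUSES:
        return "in_progress"
    # Fail loudly on codes this client does not know about.
    raise ValueError(f"Unknown indexing_status: {code!r}")
```

A polling loop can then stop on "ready" or "failed" and keep waiting on "in_progress". Raising on unknown codes makes a newly added status visible instead of silently polling until timeout.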