Overview
Create a corpus before uploading any resources. Every corpus belongs to the authenticated user and encapsulates indexing configuration (vector index type, publication flag, etc.). Back-end logic normalizes the name into lowercase snake_case and enforces uniqueness per user.
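To preview what a name is likely to become before sending it, the normalization can be approximated on the client. This is only a sketch: the exact back-end rules are not documented here, and the helper name preview_normalized_name and its treatment of punctuation are assumptions. The server response remains the source of truth.

```python
import re


def preview_normalized_name(name: str) -> str:
    """Approximate the lowercase snake_case form described above.

    Assumption: every run of non-alphanumeric characters becomes a single
    underscore. The real back-end rules may differ.
    """
    # Replace each run of characters other than letters and digits with "_".
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop underscores left at either edge, then lowercase the result.
    return collapsed.strip("_").lower()


print(preview_normalized_name("Support Playbooks"))  # support_playbooks
```

Because two different display names can normalize to the same value, a preview like this is mainly useful for spotting likely uniqueness conflicts early.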
Prerequisite: You must be authenticated with a valid JWT or session cookie.
Request Body
corpora_name - Human-friendly name for your corpus. The system automatically converts it to lowercase with underscores (e.g., “Support Playbooks” becomes “support_playbooks”). Must be unique within your user account.
description - Optional context about what content lives in this corpus. Helps you and your team understand the purpose of the corpus.
is_published - Controls whether the corpus is discoverable via public listings. Set to true for shared knowledge bases.
index_type - Indexing strategy for the corpus. Available options:
VSI - Vector Store Index (recommended for semantic search)
SMI - Summary Index
DSI - Document Summary Index
indexing_status - System-managed field that tracks indexing progress. Leave it empty; Soar Labs updates it automatically as ingestion jobs complete. Common status values (the schema at the end of this page also lists IQE, DEX, DER, and CMP):
PND - Pending (newly created)
PRS - Processing (ingestion in progress)
IND - Indexed (ready for queries)
ERR - Error (ingestion failed)
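The field rules above can be checked on the client before a request is sent. The sketch below is illustrative only: build_corpus_payload is a hypothetical helper, and it assumes the 100-character limit in the schema reference applies to corpora_name as submitted. Optional fields are included only when provided, so no server defaults are assumed, and indexing_status is never sent.

```python
from typing import Any, Dict, Optional

INDEX_TYPES = {"VSI", "SMI", "DSI"}  # index type codes listed in this section
MAX_NAME_LENGTH = 100  # assumption: the schema's string limit applies to the name


def build_corpus_payload(
    name: str,
    description: Optional[str] = None,
    is_published: Optional[bool] = None,
    index_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a create-corpus request body, rejecting obviously invalid input."""
    if not name.strip():
        raise ValueError("corpora_name must not be empty")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"corpora_name exceeds {MAX_NAME_LENGTH} characters")
    if index_type is not None and index_type not in INDEX_TYPES:
        raise ValueError(f"index_type must be one of {sorted(INDEX_TYPES)}")

    payload: Dict[str, Any] = {"corpora_name": name}
    # Only send optional fields the caller actually set.
    if description is not None:
        payload["description"] = description
    if is_published is not None:
        payload["is_published"] = is_published
    if index_type is not None:
        payload["index_type"] = index_type
    return payload
```

The resulting dictionary can be passed as the JSON body of the POST request shown in the examples on this page.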
Example request
curl -X POST https://{your-host}/api/corpora/ \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "corpora_name": "Support Playbooks",
    "description": "Runbooks feeding LlamaIndex",
    "is_published": false,
    "index_type": "VSI"
  }'
Response
id - Unique identifier for the corpus. Use this ID in all subsequent operations (uploading resources, querying, etc.).
created_at - ISO 8601 timestamp of when the corpus was created.
updated_at - ISO 8601 timestamp of the last update to corpus metadata.
corpora_name - Normalized corpus name in lowercase snake_case format.
description - User-provided description of the corpus content and purpose.
size_on_disk - Total storage size in bytes. Initially 0.0 for new corpora; updated as resources are ingested.
index_location - Storage location of the vector index (e.g., "qdrant_free_collection"). null until the first resource is indexed.
is_published - Whether the corpus is publicly discoverable.
index_type - The indexing strategy: VSI (Vector Store), SMI (Summary), or DSI (Document Summary).
indexing_status - Current indexing status, such as PND (Pending), PRS (Processing), IND (Indexed), or ERR (Error).
creator - User ID of the corpus creator. Read-only field for ownership tracking.
Example Response
{
  "id": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd",
  "created_at": "2024-09-01T10:05:03.291Z",
  "updated_at": "2024-09-01T10:05:03.291Z",
  "corpora_name": "support_playbooks",
  "description": "Runbooks feeding LlamaIndex",
  "size_on_disk": 0.0,
  "index_location": null,
  "is_published": false,
  "index_type": "VSI",
  "indexing_status": "PND",
  "creator": "eb81c1d5-78fe-4f35-b58e-0ff6a3ad5d12"
}
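A response like the one above can be read into a small typed object so that later calls work with corpus.id rather than raw dictionary keys. This is a minimal sketch: the Corpus class and the parse_corpus and parse_timestamp helpers are illustrative names, not part of any Soar Labs client library, and only a subset of the fields is kept.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Corpus:
    id: str
    corpora_name: str
    index_type: str
    indexing_status: str
    created_at: datetime
    index_location: Optional[str]


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-09-01T10:05:03.291Z."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_corpus(body: str) -> Corpus:
    """Turn the JSON text of a create-corpus response into a Corpus."""
    data = json.loads(body)
    return Corpus(
        id=data["id"],
        corpora_name=data["corpora_name"],
        index_type=data["index_type"],
        indexing_status=data["indexing_status"],
        created_at=parse_timestamp(data["created_at"]),
        index_location=data.get("index_location"),  # null until first indexing
    )
```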
Best Practices
Use GET /api/check_corpora_name/?corpora_name=your_name to check name availability before creating a corpus. This avoids the 400 error returned for duplicate names.
curl -X GET "https://{your-host}/api/check_corpora_name/?corpora_name=support_playbooks" \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN"
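The same availability check could be issued from Python using only the standard library. Treat this as a sketch: name_check_url and check_name are hypothetical helpers, and because the response body of this endpoint is not described in this section, the function returns the status code and raw text rather than guessing at a schema.

```python
import urllib.parse
import urllib.request
from typing import Tuple


def name_check_url(base_url: str, name: str) -> str:
    """Build the availability-check URL with a safely encoded query string."""
    query = urllib.parse.urlencode({"corpora_name": name})
    return f"{base_url.rstrip('/')}/api/check_corpora_name/?{query}"


def check_name(base_url: str, token: str, name: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Call the endpoint and return (HTTP status, raw response text).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    that want to inspect error bodies should catch that exception.
    """
    request = urllib.request.Request(
        name_check_url(base_url, name),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Encoding the query through urlencode matters when the candidate name contains spaces or punctuation, which would otherwise produce an invalid URL.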
Save the returned id field immediately - you’ll need it for:
Uploading resources (POST /api/data/files/, /urls/, /strings/)
Executing queries (POST /api/query/)
Retrieving corpus details (GET /api/corpora/{id}/)
Track the indexing_status field as resources are added:
PND → PRS → IND: Normal progression
ERR: Check resource ingestion logs for failures
Poll GET /api/corpora/{id}/ to monitor status changes.
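The polling advice above might be implemented along these lines. The sketch is deliberately transport-agnostic: wait_for_indexing is a hypothetical helper that takes any callable returning the current indexing_status string (for example, one that issues GET /api/corpora/{id}/ and reads the field). Treating only IND as success and ERR as failure follows the progression described here; the other status codes in the schema can be added through the success and failure parameters.

```python
import time
from typing import Callable, Iterable


def wait_for_indexing(
    fetch_status: Callable[[], str],
    *,
    success: Iterable[str] = ("IND",),
    failure: Iterable[str] = ("ERR",),
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until a terminal status appears or the timeout passes."""
    success_set, failure_set = set(success), set(failure)
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in success_set:
            return status
        if status in failure_set:
            raise RuntimeError(f"Indexing failed with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout} seconds")
        sleep(interval)  # wait before asking again


if __name__ == "__main__":
    # Simulated status sequence standing in for real API calls.
    fake = iter(["PND", "PRS", "IND"])
    print(wait_for_indexing(lambda: next(fake), interval=0))  # IND
```

A fixed interval keeps the example short; a production client may prefer a backoff so that long ingestion jobs do not generate a steady stream of requests.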
Understanding Read-Only Fields
The following fields are managed by Soar Labs and cannot be set directly:
size_on_disk - Updated as resources are indexed
index_location - Assigned when first resource is processed
creator - Automatically set to your user ID
id, created_at, updated_at - System-generated metadata
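When a corpus object fetched from the API is reused as the starting point for a new request, the read-only fields listed above can be filtered out first. This is a sketch of one way to do that; writable_fields is a hypothetical helper, and dropping indexing_status as well is an assumption based on its description as system-managed in the Request Body section.

```python
from typing import Any, Dict

# Read-only fields named in the list above.
READ_ONLY_FIELDS = frozenset(
    {"id", "created_at", "updated_at", "size_on_disk", "index_location", "creator"}
)
# Assumption: also dropped because the API updates it automatically.
SYSTEM_MANAGED_FIELDS = frozenset({"indexing_status"})


def writable_fields(corpus: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of corpus containing only caller-settable fields."""
    blocked = READ_ONLY_FIELDS | SYSTEM_MANAGED_FIELDS
    return {key: value for key, value in corpus.items() if key not in blocked}
```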
Choose VSI (Vector Store Index) for most use cases. It provides the best semantic search capabilities and works well with the advanced RAG retrieval pipeline.
Client examples
Python
import os
import requests

BASE_URL = "https://your-soar-instance.com"
TOKEN = os.environ["SOAR_LABS_TOKEN"]

# Only the name and description are sent; other fields are left to the server.
payload = {
    "corpora_name": "support_playbooks",
    "description": "Runbooks feeding LlamaIndex",
}

response = requests.post(
    f"{BASE_URL}/api/corpora/",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=30,
)
response.raise_for_status()  # raise on 4xx/5xx, e.g. a duplicate name
corpus = response.json()

TypeScript / JavaScript
const BASE_URL = "https://your-soar-instance.com";
const token = process.env.SOAR_LABS_TOKEN!;

async function createCorpus() {
  const response = await fetch(`${BASE_URL}/api/corpora/`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      corpora_name: "support_playbooks",
      description: "Runbooks feeding LlamaIndex",
    }),
  });
  // fetch does not reject on HTTP errors, so check the status explicitly.
  if (!response.ok) {
    throw new Error(`Create corpus failed: ${response.status}`);
  }
  return response.json();
}

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateCorpus {
    public static void main(String[] args) throws Exception {
        var BASE_URL = "https://your-soar-instance.com";
        var token = System.getenv("SOAR_LABS_TOKEN");

        // Request body as a JSON string.
        var json = "{"
            + "\"corpora_name\": \"support_playbooks\","
            + "\"description\": \"Runbooks feeding LlamaIndex\""
            + "}";

        var request = HttpRequest.newBuilder(URI.create(BASE_URL + "/api/corpora/"))
            .header("Authorization", "Bearer " + token)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        var response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 400) {
            throw new RuntimeException("Create corpus failed: " + response.statusCode());
        }
        var body = response.body();
    }
}
Authorizations
Supported schemes: jwtHeaderAuth, jwtCookieAuth, cookieAuth, basicAuth
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
Accepted content types: application/json, application/x-www-form-urlencoded, multipart/form-data
corpora_name - Maximum string length: 100
description - Description of the corpus
is_published - Whether the corpus is visible to all users
index_type - Type of index to use for the corpus. Available options:
VSI - VectorStoreIndex
SMI - SummaryIndex
DSI - DocumentSummaryIndex
indexing_status - Processing status of the corpus. Available options:
PND - Pending
IQE - In Queue
PRS - Processing
DEX - Data Extracted Successfully
DER - Data Extraction Error
IND - Indexed
CMP - Completed
ERR - Error
Response
created_at - string<date-time>, required. The date and time the corpus was created
updated_at - string<date-time>, required
corpora_name - Maximum string length: 100
size_on_disk - Size of the corpus on disk (in bytes)
index_location - Location of the index on remote storage
description - Description of the corpus
is_published - Whether the corpus is visible to all users
index_type - Type of index to use for the corpus. Available options:
VSI - VectorStoreIndex
SMI - SummaryIndex
DSI - DocumentSummaryIndex
indexing_status - Processing status of the corpus. Available options:
PND - Pending
IQE - In Queue
PRS - Processing
DEX - Data Extracted Successfully
DER - Data Extraction Error
IND - Indexed
CMP - Completed
ERR - Error