POST /api/data/strings
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "indexed_on": "2023-11-07T05:31:56Z",
  "indexing_status": "PND",
  "string": "<string>",
  "character_count": 123,
  "corpora": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "title": "<string>"
}

Overview

Strings are lightweight resources for ingesting raw text without creating files. The endpoint accepts batches so you can import multiple snippets in one request. Each record is queued for ingestion, then chunked and indexed for retrieval just like uploaded files.
Perfect for: Code snippets, FAQs, short documents, configuration examples, or any text content that doesn’t require file uploads.
Strings are processed asynchronously. Monitor indexing_status to track when content is ready for querying.

Authentication

Requires a valid JWT token or session authentication. You must be the owner of the target corpus.

Request Body

corpora
UUID
required
ID of the corpus that will own these text strings. Must be a corpus you created and have access to.
strings
array<object>
required
Array of text snippets to ingest. Each object represents one string resource.
Batch size recommendations:
  • Optimal: 10-50 strings per request
  • Maximum: Check your instance configuration (typically 100-200)
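The batch-size guidance above can be applied on the client by splitting a large list before submitting it. The following is a minimal sketch: `chunked` is a hypothetical helper, not part of the API, and the default of 50 is simply the upper end of the "optimal" range quoted above.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 50) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical input: 120 snippets waiting to be imported.
snippets = [{"title": f"Note {i}", "string": f"Body {i}"} for i in range(120)]

# Each element of `batches` can become the "strings" array of one request.
batches = list(chunked(snippets, size=50))
```

Each batch would then be sent as its own POST request with the same `corpora` value.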

Example request

curl -X POST https://{your-host}/api/data/strings/ \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "corpora": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd",
    "strings": [
      {"title": "Escalation policy", "string": "Escalate to L2 after 30 minutes."},
      {"title": "SLA definition", "string": "Critical tickets must be responded to within 10m."}
    ]
  }'

Response

Returns an array of string objects (one for each submitted string):
id
UUID
Unique identifier for the string resource. Use for tracking, retrieval, or deletion.
created_at
timestamp
ISO 8601 timestamp when the string was created.
updated_at
timestamp
Last update timestamp. Changes when indexing status updates.
indexed_on
timestamp | null
Timestamp when indexing completed successfully. null while processing.
indexing_status
string
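Current processing status. The values you will most often see are:
  • PND - Pending (queued for processing)
  • PRS - Processing (chunking and indexing in progress)
  • IND - Indexed (ready for queries)
  • ERR - Error (processing failed)
The response schema at the end of this page lists the full set of status codes.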
title
string
The display title you provided for this string.
string
string
The raw text content being indexed.
character_count
integer
Total character count of the string content. Useful for tracking corpus size.
corpora
UUID
ID of the parent corpus containing this string.

Example Response

[
  {
    "id": "b43bbf35-9819-47a8-8aff-d4eb4b3e8219",
    "created_at": "2024-09-01T10:19:59.709807Z",
    "updated_at": "2024-09-01T10:19:59.709922Z",
    "indexed_on": null,
    "indexing_status": "PRS",
    "title": "Escalation policy",
    "string": "Escalate to L2 after 30 minutes.",
    "character_count": 32,
    "corpora": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd"
  },
  {
    "id": "2f3c3748-173a-4ace-875d-cff12d5d71ee",
    "created_at": "2024-09-01T10:19:59.710042Z",
    "updated_at": "2024-09-01T10:19:59.710057Z",
    "indexed_on": null,
    "indexing_status": "PRS",
    "title": "SLA definition",
    "string": "Critical tickets must be responded to within 10m.",
    "character_count": 49,
    "corpora": "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd"
  }
]
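If you prefer typed objects over raw dictionaries, the response fields described above map naturally onto a small record type. This is only a sketch: `StringRecord`, `parse_timestamp`, and `is_ready` are hypothetical names chosen for illustration, and the field types follow the descriptions on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2024-09-01T10:19:59.709807Z; pass None through."""
    if value is None:
        return None
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class StringRecord:
    id: str
    created_at: datetime
    updated_at: datetime
    indexed_on: Optional[datetime]
    indexing_status: str
    title: Optional[str]
    string: str
    character_count: Optional[int]
    corpora: str

    @property
    def is_ready(self) -> bool:
        """True once the record reports IND (indexed, ready for queries)."""
        return self.indexing_status == "IND"

    @classmethod
    def from_json(cls, raw: dict) -> "StringRecord":
        """Build a record from one element of the response array."""
        return cls(
            id=raw["id"],
            created_at=parse_timestamp(raw["created_at"]),
            updated_at=parse_timestamp(raw["updated_at"]),
            indexed_on=parse_timestamp(raw.get("indexed_on")),
            indexing_status=raw["indexing_status"],
            title=raw.get("title"),
            string=raw["string"],
            character_count=raw.get("character_count"),
            corpora=raw["corpora"],
        )
```

With the decoded response in hand, `[StringRecord.from_json(item) for item in response.json()]` would yield one record per submitted string.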

Important Notes

Batch Validation: The entire request fails if any string in the batch has missing title or string fields. Validate all entries before submitting.
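A client-side pre-check along the following lines can catch such problems before the request is sent. It is a minimal sketch, not the server's actual validation: `validate_batch` is a hypothetical helper, and the rules it applies (both fields present, title of at most 100 characters, corpus ID parseable as a UUID) are taken from the notes and schema on this page.

```python
import uuid
from typing import List

MAX_TITLE_LENGTH = 100  # taken from the schema note "Maximum string length: 100"


def validate_batch(corpus_id: str, strings: List[dict]) -> List[str]:
    """Return human-readable problems; an empty list means the batch looks valid."""
    problems = []
    try:
        uuid.UUID(corpus_id)
    except (ValueError, TypeError, AttributeError):
        problems.append("corpora is not a valid UUID")

    for index, item in enumerate(strings):
        title = item.get("title")
        text = item.get("string")
        if not title:
            problems.append(f"strings[{index}]: missing title")
        elif len(title) > MAX_TITLE_LENGTH:
            problems.append(f"strings[{index}]: title exceeds {MAX_TITLE_LENGTH} characters")
        if not text:
            problems.append(f"strings[{index}]: missing string")
    return problems
```

Submitting only when the returned list is empty avoids losing a whole batch to one malformed entry.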
Track ingestion progress via GET /api/data/strings/?corpora={id}. Each record transitions from PRS to IND when indexing completes.
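One way to track progress is a small polling loop. The sketch below makes several assumptions: `wait_until_indexed` is a hypothetical helper, `fetch_records` is a callable you supply that performs the GET request above and returns the decoded list of records, and both ERR and DER are treated as failures. The interval and timeout defaults are illustrative, not documented values.

```python
import time
from typing import Callable, Iterable, List

FAILED_STATUSES = {"ERR", "DER"}  # assumption: both codes mean the record will not become IND


def wait_until_indexed(
    fetch_records: Callable[[], List[dict]],
    string_ids: Iterable[str],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until every tracked record reports IND; raise on failure or timeout."""
    pending = set(string_ids)
    deadline = clock() + timeout
    while True:
        for record in fetch_records():
            if record["id"] not in pending:
                continue
            status = record["indexing_status"]
            if status in FAILED_STATUSES:
                raise RuntimeError(f"string {record['id']} failed with status {status}")
            if status == "IND":
                pending.discard(record["id"])
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"{len(pending)} string(s) still not indexed")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.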

Client examples

import os
import requests

BASE_URL = "https://your-soar-instance.com"
TOKEN = os.environ["SOAR_LABS_TOKEN"]
CORPUS_ID = "8d0f0a5d-4b5e-4c09-9db6-0e9d2aa8a9fd"

payload = {
    "corpora": CORPUS_ID,
    "strings": [
        {"title": "Escalation policy", "string": "Escalate to L2 after 30 minutes."},
        {"title": "SLA definition", "string": "Critical tickets must be responded to within 10m."},
    ],
}

response = requests.post(
    f"{BASE_URL}/api/data/strings/",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=30,
)
response.raise_for_status()
strings = response.json()

Best Practices

Structure strings for optimal retrieval:
  1. Use descriptive titles - Helps with search context and user navigation
  2. Keep content focused - One topic per string for better semantic matching
  3. Include relevant keywords - Natural language is best, avoid keyword stuffing
  4. Add contextual information - Background details improve answer quality
  5. Format code properly - Use markdown code blocks for syntax highlighting
Example - Good Structure:
{
  "title": "JWT Token Refresh Endpoint",
  "string": "To refresh an expired JWT token, send a POST request to /api/auth/token/refresh/ with your refresh token in the request body. The endpoint returns a new access token valid for 1 hour."
}
Efficient batch importing techniques:
  • Group related content - Import FAQ sets, documentation sections together
  • Use consistent naming - Helps with organization and search
  • Start small - Test with 5-10 strings before bulk importing
  • Monitor progress - Poll status endpoint after each batch
  • Handle failures gracefully - Retry failed batches with corrections
Typical workflow:
  1. Prepare 20-50 strings in a batch
  2. Submit batch via POST request
  3. Wait 5-10 seconds for initial processing
  4. Poll status endpoint until all show IND
  5. Proceed to next batch
Ideal scenarios for string resources:
Developer Documentation:
  • API endpoint descriptions
  • Code examples and snippets
  • Configuration templates
  • Error messages and solutions
Knowledge Base:
  • FAQ answers
  • Policy statements
  • Product specifications
  • Troubleshooting steps
Training Content:
  • Glossary definitions
  • Best practice guidelines
  • Process documentation
  • Quick reference guides
Avoid common issues:
Validation Errors:
  • Ensure every object has both title and string
  • Check title length doesn’t exceed 100 characters
  • Verify corpus ID is valid UUID format
Processing Failures:
  • Avoid extremely long strings (>50,000 chars)
  • Remove special control characters
  • Use UTF-8 encoding for text with unicode
  • Test special characters in small batches first
Performance Issues:
  • Don’t send more than 100 strings per request
  • Wait for previous batch to complete before sending next
  • Implement exponential backoff on errors
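The exponential backoff advice above could be implemented roughly as follows. This is a sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, the base delay, cap, and retry count are illustrative, and `operation` stands for whatever function submits your batch. By default it retries on any exception; in practice you would likely narrow `retry_on` to transient errors only.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`; after each failure wait base * 2**attempt seconds (at most `cap`)."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(min(cap, base * 2 ** attempt))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the defaults, the waits between attempts are 1, 2, 4, and 8 seconds.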

Management Operations

Retrieve all strings in a corpus:
curl -X GET "https://{your-host}/api/data/strings/?corpora={corpus-id}" \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN"
Supports pagination with page and page_size parameters.
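To walk through every page, a loop like the one below may help. This page documents only the `page` and `page_size` parameter names, so the rest is assumption: pages are taken to start at 1, and `fetch_page` is a hypothetical callable you supply that requests one page and returns its list of records (an empty list past the end). Adapt it to the actual response envelope of your instance.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode


def list_url(host: str, corpus_id: str, page: int, page_size: int) -> str:
    """Build the list URL with the corpora, page, and page_size query parameters."""
    query = urlencode({"corpora": corpus_id, "page": page, "page_size": page_size})
    return f"https://{host}/api/data/strings/?{query}"


def iter_strings(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield records page by page until a page comes back shorter than page_size."""
    page = 1  # assumption: page numbering starts at 1
    while True:
        records = fetch_page(page, page_size)
        yield from records
        if len(records) < page_size:
            return
        page += 1
```

Stopping on a short page avoids relying on a total count field, which this page does not document.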
Modify an existing string’s content or title:
curl -X PATCH "https://{your-host}/api/data/strings/{string-id}/" \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Updated Title",
    "string": "Updated content"
  }'
Note: Updates trigger re-indexing of the content.
Remove strings from the corpus:
curl -X DELETE "https://{your-host}/api/data/strings/{string-id}/" \
  -H "Authorization: Bearer $SOAR_LABS_TOKEN"
Warning: Deletion is immediate and removes the string's vector embeddings. It cannot be undone.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

strings
object[]
required
string
string
required

Content of the string

corpora
string<uuid>
required

Corpus to which the string maps

title
string | null

Title of the string

Maximum string length: 100

Response

201 - application/json
id
string<uuid>
required
created_at
string<date-time>
required

The date and time the string was created

updated_at
string<date-time>
required

Last updated time

indexed_on
string<date-time> | null
required
indexing_status
enum<string>
required
  • PND - Pending
  • IQE - In Queue
  • PRS - Processing
  • DEX - Data Extracted Successfully
  • DER - Data Extraction Error
  • IND - Indexed
  • CMP - Completed
  • ERR - Error
Available options: PND, IQE, PRS, DEX, DER, IND, CMP, ERR
string
string
required

Content of the string

character_count
integer | null
required

Number of characters in the string

corpora
string<uuid>
required

Corpus to which the string maps

title
string | null

Title of the string

Maximum string length: 100