Fetch detailed metadata for a single corpus by its unique identifier. This endpoint returns the same comprehensive information as the list endpoint but scoped to one record. Only the corpus owner can retrieve this data.
Use this for: checking indexing status after creation, refreshing UI state after updates, and validating that a corpus exists before operating on it.
After creating a corpus and uploading resources, poll this endpoint to check when indexing completes:
import time

import requests


def wait_for_indexing(base_url, token, corpus_id, timeout=300):
    start = time.time()
    while time.time() - start < timeout:
        response = requests.get(
            f"{base_url}/api/corpora/{corpus_id}/",
            headers={"Authorization": f"Bearer {token}"},
        )
        # Fail fast on 4xx/5xx instead of hitting a KeyError on the error body
        response.raise_for_status()
        corpus = response.json()

        if corpus["indexing_status"] == "IND":
            print("Corpus ready for queries!")
            return corpus
        elif corpus["indexing_status"] == "ERR":
            raise Exception("Indexing failed")

        time.sleep(5)  # Poll every 5 seconds

    raise TimeoutError("Indexing did not complete in time")
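For long-running indexing jobs, a fixed 5-second interval can be replaced with exponential backoff. The following is a minimal sketch, not part of the API: `poll_until_indexed` and its parameters are hypothetical names, and it assumes only the two status values shown above (`"IND"` for indexed, `"ERR"` for failed), treating any other value as still in progress. The fetch step is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def poll_until_indexed(
    fetch_corpus: Callable[[], dict],
    timeout: float = 300.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_corpus() until indexing finishes, doubling the wait each time.

    fetch_corpus is a hypothetical callable that returns the decoded JSON body
    of GET /api/corpora/{corpus_id}/ as a dict.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        corpus = fetch_corpus()
        status = corpus.get("indexing_status")
        if status == "IND":
            return corpus
        if status == "ERR":
            raise RuntimeError("Indexing failed")
        # Give up if the next wait would run past the deadline
        if time.monotonic() + delay > deadline:
            raise TimeoutError("Indexing did not complete in time")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # 1s, 2s, 4s, ... capped at max_delay
```

With `requests`, `fetch_corpus` could be a small closure that performs the GET shown above, calls `raise_for_status()`, and returns `response.json()`.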
Verify Update Success
After modifying corpus settings, fetch the latest state to confirm changes:
# Update the corpus
requests.patch(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
    json={"is_published": True},
)

# Verify the change
corpus = requests.get(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
).json()
assert corpus["is_published"] is True
Check Storage Usage
Monitor storage consumption before adding more resources:
corpus = requests.get(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
).json()

size_mb = corpus["size_on_disk"] / (1024 * 1024)
print(f"Current storage: {size_mb:.2f} MB")

# Warn if approaching limits
if size_mb > 500:
    print("Warning: Large corpus size may affect query performance")
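When corpora vary widely in size, a fixed MB conversion can be awkward to read. The sketch below picks a unit automatically. `format_size` is a hypothetical helper, and it assumes `size_on_disk` is reported in bytes, as the division by `1024 * 1024` above implies.

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count using binary units (1 KB = 1024 B)."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    # Step up one unit at a time until the value drops below 1024
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {units[-1]}"
```

For example, `format_size(corpus["size_on_disk"])` could replace the manual MB conversion in the print statement above.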
Validate Before Operations
Ensure the corpus exists and is ready before performing operations:
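try:
    response = requests.get(
        f"{base_url}/api/corpora/{corpus_id}/",
        headers=headers,
    )
    # requests only raises HTTPError when asked to; without this call
    # the except branch below would never run
    response.raise_for_status()
    corpus = response.json()

    # Check if ready for queries
    if corpus["indexing_status"] != "IND":
        raise ValueError("Corpus not fully indexed yet")

    # Proceed with operation
    query_corpus(corpus_id)
except requests.HTTPError as e:
    if e.response.status_code == 404:
        print("Corpus not found or access denied")
    else:
        raise  # Surface other HTTP errors instead of silently ignoring them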
Security Note: Requests return 404 Not Found if the corpus belongs to another user. This prevents leaking corpus existence to unauthorized users.
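Because a 404 can mean either "does not exist" or "belongs to someone else", client code cannot tell the two cases apart and should handle them identically. The following is a minimal sketch of that pattern. `get_corpus_or_none` is a hypothetical helper, and `session` is assumed to be a `requests.Session` or any object with a compatible `get` method.

```python
from typing import Any, Optional


def get_corpus_or_none(
    session: Any, base_url: str, corpus_id: str, token: str
) -> Optional[dict]:
    """Return the corpus metadata, or None if it is missing or not accessible."""
    response = session.get(
        f"{base_url}/api/corpora/{corpus_id}/",
        headers={"Authorization": f"Bearer {token}"},
    )
    if response.status_code == 404:
        # Missing and not-owned corpora are indistinguishable by design
        return None
    response.raise_for_status()  # Any other 4xx/5xx is a genuine error
    return response.json()
```

Callers can then branch on `None` without inspecting status codes, for example showing the same "Corpus not found or access denied" message in both cases.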