Permanently delete a corpus and all its associated resources. This operation cascades through all files, strings, URLs, and social media imports, removing storage objects and vector embeddings from Qdrant.
Irreversible Operation: This action cannot be undone. All indexed data, resources, and query history will be permanently deleted.
Alternative: If you want to temporarily disable access, set is_published: false via PATCH instead of deleting.
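A successful deletion returns 204 No Content. Here is a minimal sketch of the call, reusing the base_url, headers, and corpus_id values from the other examples on this page:

import requests

# Permanently delete the corpus and everything indexed under it
response = requests.delete(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers
)

if response.status_code == 204:
    print("Corpus and all associated resources deleted")
else:
    response.raise_for_status()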
To keep a record of what is being removed, back up the corpus metadata and its resource listings before deleting:

import json
from datetime import datetime

# Get corpus and all resources
corpus = requests.get(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers
).json()

resources = {
    "corpus": corpus,
    "files": requests.get(
        f"{base_url}/api/data/files/?corpora={corpus_id}",
        headers=headers
    ).json()["results"],
    "strings": requests.get(
        f"{base_url}/api/data/strings/?corpora={corpus_id}",
        headers=headers
    ).json()["results"],
    "urls": requests.get(
        f"{base_url}/api/data/urls/?corpora={corpus_id}",
        headers=headers
    ).json()["results"]
}

# Save backup
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
backup_file = f"corpus_backup_{corpus['corpora_name']}_{timestamp}.json"
with open(backup_file, 'w') as f:
    json.dump(resources, f, indent=2)

print(f"Backup saved to {backup_file}")

# Now safe to delete
requests.delete(f"{base_url}/api/corpora/{corpus_id}/", headers=headers)
Alternative: Soft Delete
Instead of deleting, make the corpus private and add an archive marker:
# Mark as archived instead of deleting
requests.patch(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
    json={
        "is_published": False,
        "description": f"[ARCHIVED {datetime.now().date()}] {corpus['description']}"
    }
)
# Corpus remains accessible but hidden
# Can be unarchived later by removing the marker
Benefits:

- Reversible operation
- Data preserved for future reference
- Can restore if needed (see the sketch after this list)
- Maintains audit trail
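To restore an archived corpus, reverse both changes: republish it and strip the archive marker from the description. A minimal sketch, assuming the [ARCHIVED <date>] prefix format used in the example above:

import re
import requests

# Fetch the current description so the archive marker can be stripped
corpus = requests.get(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers
).json()

# Remove the "[ARCHIVED YYYY-MM-DD] " prefix added when the corpus was archived
restored_description = re.sub(r"^\[ARCHIVED [^\]]*\]\s*", "", corpus["description"])

requests.patch(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers,
    json={
        "is_published": True,
        "description": restored_description
    }
)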
Handling Deletion Failures
Handle 409 Conflict responses gracefully by retrying with exponential backoff:
import time
import requests

def safe_delete_corpus(base_url, headers, corpus_id, max_retries=3):
    """Delete corpus with retry logic for transient failures."""
    for attempt in range(max_retries):
        try:
            response = requests.delete(
                f"{base_url}/api/corpora/{corpus_id}/",
                headers=headers,
                timeout=60
            )
            if response.status_code == 204:
                print("Corpus deleted successfully")
                return True
            elif response.status_code == 409:
                if attempt < max_retries - 1:
                    wait_time = 2 ** attempt  # Exponential backoff
                    print(f"Deletion conflict. Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print("Deletion failed after retries. Contact support.")
                    return False
            else:
                response.raise_for_status()
        except requests.RequestException as e:
            print(f"Error: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                return False
    return False
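Example usage; the boolean return value tells you whether the corpus was actually removed:

if safe_delete_corpus(base_url, headers, corpus_id):
    print("Cleanup complete")
else:
    print("Corpus was not deleted; review the messages above")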
Batch Deletion
Delete multiple corpora safely:
corpus_ids = ["id1", "id2", "id3"]
deleted = []
failed = []

for corpus_id in corpus_ids:
    try:
        # Optional: Verify corpus name before deleting
        corpus = requests.get(
            f"{base_url}/api/corpora/{corpus_id}/",
            headers=headers
        ).json()
        print(f"Deleting {corpus['corpora_name']}...")

        response = requests.delete(
            f"{base_url}/api/corpora/{corpus_id}/",
            headers=headers
        )
        if response.status_code == 204:
            deleted.append(corpus_id)
            print(f"✓ Deleted {corpus['corpora_name']}")
        else:
            failed.append((corpus_id, response.status_code))
            print(f"✗ Failed: {response.status_code}")
    except Exception as e:
        failed.append((corpus_id, str(e)))
        print(f"✗ Error: {e}")

print(f"\nSummary: {len(deleted)} deleted, {len(failed)} failed")
Security: Deletion requests for corpora owned by other users return 403 Forbidden without revealing whether the corpus exists.
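On the client side this means a 403 response to a delete request is ambiguous by design. A minimal sketch of handling it:

response = requests.delete(
    f"{base_url}/api/corpora/{corpus_id}/",
    headers=headers
)

if response.status_code == 403:
    # The API does not distinguish "owned by someone else" from "does not exist"
    print("Permission denied: this corpus is not yours to delete (or does not exist)")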
Production Best Practice: Implement a two-step deletion process: mark for deletion first, then actually delete after a grace period (e.g., 30 days).
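A minimal sketch of that two-step pattern, built on the archive-marker approach shown above; the [PENDING DELETE <date>] label, the helper names, and the 30-day window are illustrative assumptions, not part of the API:

import re
import requests
from datetime import datetime, timedelta

DELETION_GRACE_PERIOD = timedelta(days=30)  # illustrative grace period

def mark_for_deletion(base_url, headers, corpus_id):
    """Step 1: unpublish the corpus and stamp it instead of deleting it."""
    corpus = requests.get(
        f"{base_url}/api/corpora/{corpus_id}/", headers=headers
    ).json()
    requests.patch(
        f"{base_url}/api/corpora/{corpus_id}/",
        headers=headers,
        json={
            "is_published": False,
            "description": f"[PENDING DELETE {datetime.now().date()}] {corpus['description']}"
        }
    )

def purge_if_expired(base_url, headers, corpus_id):
    """Step 2: run periodically; delete only after the grace period has passed."""
    corpus = requests.get(
        f"{base_url}/api/corpora/{corpus_id}/", headers=headers
    ).json()
    match = re.match(r"^\[PENDING DELETE (\d{4}-\d{2}-\d{2})\]", corpus.get("description", ""))
    if match:
        marked_on = datetime.strptime(match.group(1), "%Y-%m-%d")
        if datetime.now() - marked_on >= DELETION_GRACE_PERIOD:
            requests.delete(f"{base_url}/api/corpora/{corpus_id}/", headers=headers)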