Permanently delete a text snippet resource along with its metadata and vector embeddings. This immediately removes the string content from search results and retrieval operations.
Irreversible: Deletion cannot be undone. The string record and its vector embeddings are permanently removed.
Use cases: Retracting outdated guidance, removing obsolete policies, correcting misinformation, or managing corpus content quality.
Version tracking: Include version dates or identifiers in string titles for easy tracking and replacement.
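As a rough illustration of the version-tracking tip, the sketch below assumes titles end with an ISO date in parentheses, such as `Refund Policy (2024-01-15)`. That naming convention and the helper names `parse_versioned_title` and `superseded_ids` are hypothetical; the API does not prescribe a title format.

```python
import re
from datetime import date

# Hypothetical convention: "<name> (YYYY-MM-DD)" at the end of the title.
TITLE_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<version>\d{4}-\d{2}-\d{2})\)$")


def parse_versioned_title(title):
    """Return (name, version_date), or None if the title has no valid version suffix."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    try:
        version = date.fromisoformat(match.group("version"))
    except ValueError:
        return None  # looks like a date but is not a real calendar date
    return match.group("name"), version


def superseded_ids(strings):
    """Return ids of strings that have a newer version under the same name."""
    newest = {}  # name -> (version_date, id)
    older = []
    for item in strings:
        parsed = parse_versioned_title(item["title"])
        if parsed is None:
            continue  # unversioned titles are never treated as superseded
        name, version = parsed
        if name not in newest:
            newest[name] = (version, item["id"])
        elif version > newest[name][0]:
            older.append(newest[name][1])
            newest[name] = (version, item["id"])
        else:
            older.append(item["id"])
    return older
```

The ids returned by `superseded_ids` are candidates for deletion once the replacement string has been confirmed in the corpus.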
Clean Up Duplicates
Find and remove duplicate string content:
    import requests

    # Get all strings in the corpus
    strings = requests.get(
        f"{base_url}/api/data/strings/?corpora={corpus_id}",
        headers=headers
    ).json()["results"]

    # Find duplicates by normalized content (trimmed, case-insensitive)
    seen_content = {}
    duplicates = []
    for string in strings:
        content = string["string"].strip().lower()
        if content in seen_content:
            duplicates.append(string["id"])
            print(f"Duplicate found: {string['title']}")
        else:
            seen_content[content] = string["id"]

    # Delete duplicates, keeping the first occurrence of each
    for dup_id in duplicates:
        requests.delete(
            f"{base_url}/api/data/strings/{dup_id}/",
            headers=headers
        )

    print(f"Removed {len(duplicates)} duplicate strings")
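The listing call above reads only the `results` array of a single response. If the endpoint paginates and includes a `next` URL alongside `results` (a common REST convention, but an assumption here that should be checked against the list endpoint's documentation), a small generator could walk every page before deduplicating. `iter_all_strings` and its `get_json` parameter are hypothetical names:

```python
def iter_all_strings(first_url, get_json):
    """Yield every string record, following `next` links between pages.

    ASSUMPTION: each page looks like {"results": [...], "next": <url or None>}.
    `get_json` is any callable that takes a URL and returns the parsed JSON body.
    """
    url = first_url
    seen_urls = set()
    while url and url not in seen_urls:  # guard against a looping `next` link
        seen_urls.add(url)
        page = get_json(url)
        yield from page.get("results", [])
        url = page.get("next")
```

With `requests`, `get_json` could be `lambda url: requests.get(url, headers=headers).json()`, and the first URL would be the same `f"{base_url}/api/data/strings/?corpora={corpus_id}"` used above.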
Policy Retraction Workflow
Safely retract a policy or guidance string:
    import json
    from datetime import datetime

    import requests

    # Step 1: Verify string content before deletion
    string = requests.get(
        f"{base_url}/api/data/strings/{string_id}/",
        headers=headers
    ).json()
    print(f"Title: {string['title']}")
    print(f"Content: {string['string'][:100]}...")
    print(f"Created: {string['created_at']}")

    # Step 2: Log the deletion for audit trail
    audit_log = {
        "action": "string_deletion",
        "string_id": string_id,
        "title": string["title"],
        "timestamp": datetime.now().isoformat(),
        "reason": "Policy superseded by new guidance"
    }

    # Save audit log (appended as one JSON object per line)
    with open("deletion_audit.json", "a") as f:
        f.write(json.dumps(audit_log) + "\n")

    # Step 3: Delete the string
    response = requests.delete(
        f"{base_url}/api/data/strings/{string_id}/",
        headers=headers
    )
    if response.status_code == 204:
        print("String retracted successfully")
        print("Audit log saved")
    else:
        print(f"Deletion failed with status {response.status_code}")
Compliance note: Maintain audit logs for all content deletions to meet regulatory requirements.
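Because the workflow above appends one JSON object per line, the audit file can be read back line by line for a compliance review. The sketch below is one possible way to do that; `load_audit_entries` and `deletions_for` are hypothetical helpers, and the field names match the `audit_log` dictionary in the example.

```python
import json


def load_audit_entries(path="deletion_audit.json"):
    """Read an append-only audit log that stores one JSON object per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries


def deletions_for(entries, string_id):
    """Return the audit entries recorded for a given string id."""
    return [e for e in entries if e.get("string_id") == string_id]
```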
Selective Corpus Cleanup
Clean up strings based on age or usage. This example filters by creation date:
    from datetime import datetime, timedelta, timezone

    import requests

    # Get all strings
    strings = requests.get(
        f"{base_url}/api/data/strings/?corpora={corpus_id}",
        headers=headers
    ).json()["results"]

    # Delete strings older than 1 year.
    # The cutoff is timezone-aware so it can be compared with the parsed timestamps.
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=365)
    old_strings = []
    for string in strings:
        created = datetime.fromisoformat(string["created_at"].replace("Z", "+00:00"))
        if created.tzinfo is None:
            # Assumption: timestamps without an offset are in UTC
            created = created.replace(tzinfo=timezone.utc)
        if created < cutoff_date:
            old_strings.append(string)

    print(f"Found {len(old_strings)} strings older than 1 year")

    # Delete with confirmation
    if input("Proceed with deletion? (yes/no): ") == "yes":
        for string in old_strings:
            requests.delete(
                f"{base_url}/api/data/strings/{string['id']}/",
                headers=headers
            )
        print(f"Deleted {len(old_strings)} old strings")
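For bulk deletions it can help to record which requests failed rather than assuming every call succeeded. The sketch below treats HTTP 204 as success, as in the retraction workflow; `delete_many` and its `send_delete` parameter are hypothetical names, and passing the request function in keeps the helper testable without a network.

```python
def delete_many(string_ids, send_delete):
    """Delete each id and split the results into successes and failures.

    `send_delete` is any callable that takes a string id and returns the
    HTTP status code of the DELETE request.
    """
    deleted, failed = [], []
    for string_id in string_ids:
        status = send_delete(string_id)
        if status == 204:  # success code used in the retraction example
            deleted.append(string_id)
        else:
            failed.append((string_id, status))
    return deleted, failed
```

With `requests`, `send_delete` could be `lambda sid: requests.delete(f"{base_url}/api/data/strings/{sid}/", headers=headers).status_code`. Because deletion is irreversible, retrying only the ids in `failed` is safer than rerunning the whole cleanup.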
Content quality: Regularly review and remove outdated strings to maintain high-quality retrieval results and corpus relevance.