Permanently delete a URL resource along with its crawled content, metadata, and vector embeddings. This immediately removes the web page content from search results and retrieval operations.
Irreversible: Deletion cannot be undone. The URL record, crawled content, and all vector embeddings are permanently removed.
Use cases: Removing outdated web content, refreshing stale pages, managing dead links, or retracting content from sources that no longer exist.
Refresh strategy: Schedule periodic refreshes for documentation sites that update frequently (e.g., weekly or monthly).
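For reference, here is a minimal sketch of deleting a single URL resource before getting into the bulk workflows below. The endpoint path matches the calls used throughout this page, but base_url, the authentication header, and url_id are placeholders (the header name is an assumption, not a documented value), and the 204 check reflects typical REST behavior rather than a guaranteed response code.

import requests

# Placeholder connection details -- substitute your own deployment URL,
# credentials, and resource ID (the header name here is an assumption)
base_url = "https://your-deployment.example.com"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
url_id = "your-url-id"

# Permanently delete the URL record, its crawled content, and its embeddings
response = requests.delete(
    f"{base_url}/api/data/urls/{url_id}/",
    headers=headers
)

# Most REST APIs return 204 No Content on successful deletion
if response.status_code == 204:
    print(f"Deleted URL resource {url_id}")
else:
    print(f"Deletion failed: {response.status_code} {response.text}")

The later examples assume the same base_url, headers, and a corpus_id for the target corpus are already defined.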
Clean Up Dead Links
Remove URLs that return 404s or other crawl errors:
# Get all URLs in corpus
urls = requests.get(
    f"{base_url}/api/data/urls/?corpora={corpus_id}",
    headers=headers
).json()

# Find URLs with errors
dead_links = []
for url in urls["results"]:
    if url["indexing_status"] == "ERR":
        dead_links.append(url)
        print(f"Dead link found: {url['url']}")

# Delete dead links
if dead_links:
    print(f"\nFound {len(dead_links)} dead links")
    if input("Remove all dead links? (yes/no): ") == "yes":
        for url in dead_links:
            requests.delete(
                f"{base_url}/api/data/urls/{url['id']}/",
                headers=headers
            )
        print(f"Removed {len(dead_links)} dead links")
Batch URL Deletion
Remove multiple URLs matching a specific domain or pattern:
# Get all URLs
urls = requests.get(
    f"{base_url}/api/data/urls/?corpora={corpus_id}",
    headers=headers
).json()["results"]

# Find URLs matching the pattern
domain_to_remove = "old-docs.example.com"
matched_urls = []
for url in urls:
    if domain_to_remove in url["url"]:
        matched_urls.append(url)

print(f"Found {len(matched_urls)} URLs from {domain_to_remove}")

# Delete with confirmation
for url in matched_urls:
    print(f"Deleting: {url['url']}")
    requests.delete(
        f"{base_url}/api/data/urls/{url['id']}/",
        headers=headers
    )

print(f"Removed {len(matched_urls)} URLs")
Content Source Migration
Replace URLs from an old domain with their new-domain counterparts:
# Get all URLs from old domain
urls = requests.get(
    f"{base_url}/api/data/urls/?corpora={corpus_id}",
    headers=headers
).json()["results"]

old_domain = "old.example.com"
new_domain = "new.example.com"
migrated = []

for url in urls:
    if old_domain in url["url"]:
        # Create new URL with updated domain
        new_url_string = url["url"].replace(old_domain, new_domain)

        # Add new URL
        response = requests.post(
            f"{base_url}/api/data/urls/",
            headers=headers,
            json={
                "corpora": corpus_id,
                "urls": [{"url": new_url_string}]
            }
        )

        if response.status_code == 201:
            # Delete old URL after successful addition
            requests.delete(
                f"{base_url}/api/data/urls/{url['id']}/",
                headers=headers
            )
            migrated.append(new_url_string)
            print(f"Migrated: {url['url']} -> {new_url_string}")

print(f"\nMigrated {len(migrated)} URLs to new domain")
Selective Content Pruning
Remove URLs based on age or relevance:
from datetime import datetime, timedelta, timezone

# Get all URLs
urls = requests.get(
    f"{base_url}/api/data/urls/?corpora={corpus_id}",
    headers=headers
).json()["results"]

# Find URLs older than 6 months
# (use a timezone-aware cutoff so it compares cleanly with indexed_on timestamps)
cutoff_date = datetime.now(timezone.utc) - timedelta(days=180)
old_urls = []
for url in urls:
    indexed_on = url.get("indexed_on")
    if indexed_on:
        indexed_date = datetime.fromisoformat(indexed_on.replace("Z", "+00:00"))
        if indexed_date < cutoff_date:
            old_urls.append(url)

print(f"Found {len(old_urls)} URLs indexed over 6 months ago")

# Review and delete
for url in old_urls:
    print(f"\nURL: {url['url']}")
    print(f"Indexed: {url['indexed_on']}")
    if input("Delete this URL? (y/n): ").lower() == "y":
        requests.delete(
            f"{base_url}/api/data/urls/{url['id']}/",
            headers=headers
        )
        print("Deleted")
Sitemap URL Cleanup
Remove all URLs added via sitemap scraping:
# Get all URLs in corpus
urls = requests.get(
    f"{base_url}/api/data/urls/?corpora={corpus_id}",
    headers=headers
).json()["results"]

# Find sitemap URLs
sitemap_urls = [url for url in urls if url.get("scrape_sitemap")]
print(f"Found {len(sitemap_urls)} URLs added via sitemap")

# Delete sitemap URLs
if input("Remove all sitemap URLs? (yes/no): ") == "yes":
    for url in sitemap_urls:
        requests.delete(
            f"{base_url}/api/data/urls/{url['id']}/",
            headers=headers
        )
    print(f"Removed {len(sitemap_urls)} sitemap URLs")
Use case: This cleanup is useful when sitemap scraping pulled in too many irrelevant pages.
Content freshness: Set up automated workflows to periodically refresh documentation URLs (delete old → re-add new) for sources that update frequently.
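A sketch of that delete-then-re-add refresh, reusing the list, delete, and add calls shown above; the documentation URL itself is an illustrative placeholder.

# Refresh a frequently updated page: delete the stale record, then re-add it
doc_url = "https://docs.example.com/getting-started"  # placeholder

# Find the existing record for this URL
urls = requests.get(
    f"{base_url}/api/data/urls/?corpora={corpus_id}",
    headers=headers
).json()["results"]

# Delete any stale copies
for url in urls:
    if url["url"] == doc_url:
        requests.delete(
            f"{base_url}/api/data/urls/{url['id']}/",
            headers=headers
        )

# Re-add the URL so it is crawled and embedded again
requests.post(
    f"{base_url}/api/data/urls/",
    headers=headers,
    json={"corpora": corpus_id, "urls": [{"url": doc_url}]}
)
print(f"Refreshed {doc_url}")

Run this on a schedule (for example, weekly) to keep frequently changing sources current.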
Other resources unaffected: Removing URLs does not affect files or strings in the same corpus. Each resource type is independent.