Privacy Data Transformation

Privacy Data Transformation is CustodianAI's de-identification API. It detects and removes sensitive information from text, documents, and images — before that content reaches an LLM, gets stored, or is shared downstream.

Two approaches in one API

Standard compliance modes

Rule-based and NER-based detection for common PII categories: names, emails, phone numbers, addresses, dates, and IDs. Outputs are mapped to regulatory frameworks.

Mode         Use for
MASKED       General-purpose redaction with ****
PROPRIETARY  Guardian Layer: domain-aware semantic de-identification

GDPR, HIPAA, and CUSTOM modes are coming soon. → See what's planned

Compliance Modes

Guardian Layer

Guardian Layer is CustodianAI's proprietary detection layer. It identifies domain-specific sensitive content beyond standard PII: proprietary terminology, internal identifiers, and contextually sensitive language that compliance rules don't cover.

Guardian Layer


What you can de-identify

  • Text — any string, from a single sentence to a full document
  • CSV files — every cell is processed independently
  • DOCX files — text content replaced in-place, formatting preserved
  • PDF files — word-level redaction, layout preserved
  • TXT files — plain-text in, plain-text out
  • Images — OCR-based detection and redaction (pending public release)

File De-Identification


Authentication

All Privacy Data Transformation endpoints require your API key in the X-API-Key header:

X-API-Key: cai_your_key_here
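As a rough illustration of attaching the header, the sketch below builds (but does not send) a request using Python's standard library. Only the X-API-Key header name, the cai_ key placeholder, and the MASKED mode name come from this page; the URL, the JSON field names "text" and "mode", and the helper build_request are hypothetical placeholders, so check the API reference for the real endpoint and payload.

```python
import json
import urllib.request

# Hypothetical endpoint: this page does not state a URL, so this is a placeholder.
HYPOTHETICAL_URL = "https://api.example.com/v1/deidentify"


def build_request(api_key: str, text: str, mode: str = "MASKED") -> urllib.request.Request:
    """Build a POST request that carries the API key in the X-API-Key header."""
    # The "text" and "mode" field names are assumptions, not the documented schema.
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        HYPOTHETICAL_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,  # header name as documented above
            "Content-Type": "application/json",
        },
    )


req = build_request("cai_your_key_here", "Contact Jane at jane@example.com")
# Sending would be urllib.request.urlopen(req); omitted because the URL is a placeholder.
```

In practice the key would usually be read from an environment variable or secret store rather than written into source code.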

API Reference: Authentication


Character credits

Each request consumes credits equal to the number of characters in the input text. File endpoints count the total characters across all processed cells or pages.
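To estimate usage before sending a request, the counting rule above can be sketched as follows. This assumes a "character" is one Unicode code point (what Python's len returns) and that every CSV cell, header cells included, counts toward the total; the page does not spell out either detail, and the helper names text_credits and csv_credits are illustrative only.

```python
import csv
import io


def text_credits(text: str) -> int:
    """Estimated credits for a text request: one credit per character."""
    # Assumption: characters are counted as Unicode code points.
    return len(text)


def csv_credits(csv_content: str) -> int:
    """Estimated credits for a CSV file: total characters across all cells."""
    # Assumption: header cells count, and delimiters and newlines do not.
    reader = csv.reader(io.StringIO(csv_content))
    return sum(len(cell) for row in reader for cell in row)


print(text_credits("Call Jane at 555-0100"))                   # 21
print(csv_credits("name,email\nJane,jane@example.com\n"))      # 29
```

Treat the result as an estimate; the actual charge is whatever the service reports.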

Character Credits