Privacy Data Transformation
Privacy Data Transformation is CustodianAI's de-identification API. It detects and removes sensitive information from text, documents, and images — before that content reaches an LLM, gets stored, or is shared downstream.
Two approaches in one API
Standard compliance modes
Rule-based and NER-based detection for common PII categories: names, emails, phone numbers, addresses, dates, and IDs. Outputs are mapped to regulatory frameworks.
| Mode | Use for |
|---|---|
| MASKED | General-purpose redaction with **** |
| PROPRIETARY | Guardian Layer: domain-aware semantic de-identification |
GDPR, HIPAA, and CUSTOM modes are coming soon. → See what's planned
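To make the MASKED output shape concrete, here is a toy local illustration in Python. It is not CustodianAI's detector: the single email regex and the `mask_emails` helper are assumptions made for this sketch, and the real service covers many more categories with rules and NER.

```python
import re

# Toy pattern for email addresses only; an assumption for illustration,
# not the detection logic used by the API.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Replace every email-like span with the literal mask ****."""
    return EMAIL_PATTERN.sub("****", text)


masked = mask_emails("Email jane@example.com today")
print(masked)
```

The point is only the shape of the result: the sensitive span is replaced in place and the surrounding text is left untouched.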
Guardian Layer
Guardian Layer is CustodianAI's proprietary detection layer. It identifies domain-specific sensitive content beyond standard PII: proprietary terminology, internal identifiers, and contextually sensitive language that compliance rules don't cover.
What you can de-identify
- Text — any string, from a single sentence to a full document
- CSV files — every cell is processed independently
- DOCX files — text content replaced in-place, formatting preserved
- PDF files — word-level redaction, layout preserved
- TXT files — plain-text in, plain-text out
- Images — OCR-based detection and redaction (pending public release)
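As a rough picture of what "every cell is processed independently" means for CSV input, the sketch below applies a redaction function to each cell in isolation and preserves the row and column layout. The `process_csv` helper and the `redact` callable are hypothetical stand-ins for this example; the actual processing happens server-side.

```python
import csv
import io
from typing import Callable


def process_csv(csv_text: str, redact: Callable[[str], str]) -> str:
    """Apply `redact` to each cell separately, keeping rows and columns intact."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Each cell is handled on its own; no context is shared between cells.
        writer.writerow([redact(cell) for cell in row])
    return out.getvalue()


# Placeholder redaction: mask any cell containing "@" (assumption for the demo).
demo = process_csv(
    "name,contact\nJane,jane@example.com\n",
    lambda cell: "****" if "@" in cell else cell,
)
print(demo)
```

One consequence of per-cell processing is that a value whose sensitivity depends on a neighbouring column is judged without that neighbour.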
Authentication
All Privacy Data Transformation endpoints require your API key in the X-API-Key header:
X-API-Key: cai_your_key_here
→ API Reference: Authentication
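A minimal Python sketch of attaching the key to a request, using only the standard library. The endpoint URL and the JSON body fields (`text`, `mode`) are assumptions for illustration, since this page does not specify them; only the `X-API-Key` header comes from the text above. Check the API Reference for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL is defined in the API Reference.
ENDPOINT = "https://api.example.com/v1/deidentify"


def build_request(api_key: str, text: str, mode: str = "MASKED") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key header."""
    # The body field names are assumed for this sketch.
    payload = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request("cai_your_key_here", "Contact Jane at jane@example.com")
# To send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect headers in tests and keeps the key out of any URL or query string.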
Character credits
Each request consumes credits equal to the number of characters in the input text. File endpoints count the total characters across all processed cells or pages.
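The credit rule can be approximated locally before sending a request. The sketch below assumes that one "character" corresponds to one Unicode code point, as Python's `len` counts it, and that CSV credits are the sum over cell contents; the service's exact counting may differ. Both helper names are hypothetical.

```python
import csv
import io


def text_credits(text: str) -> int:
    """Estimate credits for a text request: one credit per character."""
    return len(text)


def csv_credits(csv_text: str) -> int:
    """Estimate credits for a CSV file: total characters across all cells."""
    reader = csv.reader(io.StringIO(csv_text))
    return sum(len(cell) for row in reader for cell in row)


print(text_credits("Hello, Jane"))
print(csv_credits("name,email\nJane,jane@example.com\n"))
```

Under these assumptions the text example costs 11 credits and the CSV example costs 29 (4 + 5 + 4 + 16), since delimiters and line breaks are not part of any cell.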