Introduction
What is PII?
Personally Identifiable Information (PII) is any data that can be used — alone or in combination with other data — to identify a specific individual.
Some identifiers are direct: a person's full name, email address, phone number, or social security number is enough on its own to identify them. Others are indirect: a job title, city, age, and gender each seem harmless alone, but together they often narrow down to a single person.
This distinction matters because indirect identifiers are easy to overlook in real-world data — especially in unstructured text like support transcripts, clinical notes, or internal documents.
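To make the point about indirect identifiers concrete, here is a minimal sketch that measures how many records become unique as more fields are combined. The records, field names, and the `unique_fraction` helper are all hypothetical and exist only for illustration; they are not part of any CustodianAI API.

```python
from collections import Counter

# Toy records (hypothetical data): no direct identifiers are present.
records = [
    {"job_title": "Nurse", "city": "Leeds", "age": 34, "gender": "F"},
    {"job_title": "Nurse", "city": "Leeds", "age": 41, "gender": "F"},
    {"job_title": "Nurse", "city": "York", "age": 34, "gender": "M"},
    {"job_title": "Engineer", "city": "Leeds", "age": 34, "gender": "F"},
    {"job_title": "Engineer", "city": "York", "age": 29, "gender": "M"},
]


def unique_fraction(rows, fields):
    """Fraction of rows whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[f] for f in fields)] == 1)
    return unique / len(rows)


for fields in (["gender"], ["city", "gender"], ["job_title", "city", "age", "gender"]):
    print(fields, unique_fraction(records, fields))
# ['gender'] 0.0
# ['city', 'gender'] 0.0
# ['job_title', 'city', 'age', 'gender'] 1.0
```

In this toy dataset no single field singles anyone out, yet the four fields together make every record unique.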
Why detection is harder than it looks
Rules-based systems — regular expressions and simple entity lists — catch the obvious cases. But real-world text is messy:
- Names that are also place names or common words ("Jordan left the office")
- Dates written in dozens of formats across different locales
- Medical or financial terminology that is sensitive in context but not on a general PII list
- Internal codenames, project names, or client identifiers that no off-the-shelf model has ever seen
This is where standard compliance tools fall short. They handle the categories they were trained on; they don't know what's sensitive for your organization.
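The failure modes listed above can be reproduced with a deliberately naive rule set. The sketch below is illustrative only: the name list, the date pattern, and the `naive_detect` function are assumptions made for this example and do not reflect how any particular compliance tool is built.

```python
import re

# A deliberately naive rule set: one entity list and one date pattern.
NAME_LIST = {"Jordan", "Paris"}
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def naive_detect(text):
    """Return (label, match) pairs found by the rules above."""
    words = re.findall(r"\b[A-Z][a-z]+\b", text)
    hits = [("NAME", w) for w in words if w in NAME_LIST]
    hits += [("DATE", m) for m in DATE_RE.findall(text)]
    return hits


# A person's name and a date in the one format the pattern expects: both caught.
print(naive_detect("Jordan left the office on 03/04/2024."))
# [('NAME', 'Jordan'), ('DATE', '03/04/2024')]

# A country flagged as a person, and a date in another format missed entirely.
print(naive_detect("She flew to Jordan on 4 March 2024."))
# [('NAME', 'Jordan')]

# An internal codename (hypothetical) that no generic list contains: nothing found.
print(naive_detect("Escalate the Bluebird account."))
# []
```

Adding more patterns helps with the date formats, but not with the ambiguity of "Jordan" or with terms that are only sensitive inside one organization.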
How CustodianAI approaches it
CustodianAI's Privacy Data Transformation handles both sides of the problem: detection and output transformation.
Detection uses named entity recognition (NER) models and rule patterns to identify common PII categories across your text: names, phone numbers, email addresses, physical addresses, dates, IDs, ages, and more.
Transformation controls what happens to detected entities in the output:
| Mode | Output | Status |
|---|---|---|
| MASKED | Replaces PII with asterisks, e.g. `J*** S****`, `(***) ***-****` | Available |
| PROPRIETARY | Guardian Layer: replaces with semantically similar alternatives or redacts in-place | Available |
| GDPR, HIPAA, CUSTOM | Labeled placeholders, HIPAA Safe Harbor, custom replacement maps | Coming soon |
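As a rough illustration of what MASKED-style output looks like, the sketch below reproduces the two example shapes from the table. It is not CustodianAI's implementation: the function names, the `(start, end, label)` entity format, and the masking rules are assumptions inferred only from the examples shown.

```python
import re


def mask_name(name):
    """Keep the first character of each word and mask the rest."""
    return " ".join(w[0] + "*" * (len(w) - 1) for w in name.split())


def mask_digits(value):
    """Replace every digit with an asterisk, keeping punctuation and spacing."""
    return re.sub(r"\d", "*", value)


def apply_mask(text, entities):
    """Mask detected spans; `entities` is a list of (start, end, label) tuples."""
    out = text
    # Work from the end of the string so earlier offsets stay valid.
    for start, end, label in sorted(entities, reverse=True):
        span = out[start:end]
        masked = mask_name(span) if label == "NAME" else mask_digits(span)
        out = out[:start] + masked + out[end:]
    return out


text = "Call John Smith at (555) 123-4567."
name, phone = "John Smith", "(555) 123-4567"
# In practice the spans would come from a detection step; here they are hard-coded.
entities = [
    (text.index(name), text.index(name) + len(name), "NAME"),
    (text.index(phone), text.index(phone) + len(phone), "PHONE"),
]
print(apply_mask(text, entities))
# Call J*** S**** at (***) ***-****.
```

Keeping the first letter and the punctuation preserves the shape of the value, so masked text stays readable while the identifying characters are removed.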
Guardian Layer, exposed as the PROPRIETARY mode, is CustodianAI's proprietary detection mode within Privacy Data Transformation. It goes beyond standard PII to identify domain-specific sensitive content: proprietary terminology, internal identifiers, and contextually sensitive language that compliance rules don't cover.
All modes are available through the same API and can be tuned by domain (General, Medical, Finance, or Custom).
All processing is stateless. Text submitted to the API is not stored, logged, or used for training.
Next steps
Ready to make your first call? Head to the Quickstart.