Introduction

What is PII?

Personally Identifiable Information (PII) is any data that can be used — alone or in combination with other data — to identify a specific individual.

Some identifiers are direct: a person's full name, email address, phone number, or social security number is enough on its own to identify them. Others are indirect: a job title, city, age, and gender each seem harmless alone, but together they often narrow down to a single person.

This distinction matters because indirect identifiers are easy to overlook in real-world data — especially in unstructured text like support transcripts, clinical notes, or internal documents.
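To make the effect of combining indirect identifiers concrete, here is a minimal sketch that measures how many records in a small dataset are singled out by a given set of fields. The records, field names, and the `unique_fraction` helper are all hypothetical and exist only for illustration; they are not part of any CustodianAI API.

```python
from collections import Counter

# Hypothetical records: none of these fields is a direct identifier.
records = [
    {"job_title": "Nurse", "city": "Leeds", "age": 34, "gender": "F"},
    {"job_title": "Nurse", "city": "Leeds", "age": 41, "gender": "F"},
    {"job_title": "Nurse", "city": "York", "age": 34, "gender": "M"},
    {"job_title": "Engineer", "city": "Leeds", "age": 34, "gender": "F"},
    {"job_title": "Engineer", "city": "Leeds", "age": 34, "gender": "M"},
]


def unique_fraction(rows, fields):
    """Return the fraction of rows whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[f] for f in fields)] == 1)
    return unique / len(rows)


# A single field singles out few or no records...
print(unique_fraction(records, ["gender"]))
print(unique_fraction(records, ["city"]))
# ...while all four fields together single out every record in this toy dataset.
print(unique_fraction(records, ["job_title", "city", "age", "gender"]))
```

In this toy dataset, gender alone isolates nobody, while the four fields together isolate every record. Real datasets are larger, but the same narrowing effect is why indirect identifiers deserve attention.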

Why detection is harder than it looks

Rules-based systems — regular expressions and simple entity lists — catch the obvious cases. But real-world text is messy:

Names that are also place names or common words ("Jordan left the office")
Dates written in dozens of formats across different locales
Medical or financial terminology that is sensitive in context but not on a general PII list
Internal codenames, project names, or client identifiers that no off-the-shelf model has ever seen

This is where standard compliance tools fall short. They handle the categories they were trained on; they don't know what's sensitive for your organization.
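The limits of a rules-based approach can be sketched in a few lines. The patterns, the `NAME_LIST` entity list, and the `find_rule_matches` helper below are assumptions made up for this example; they are deliberately simple and do not represent how CustodianAI or any particular compliance tool is implemented.

```python
import re

# Illustrative patterns only; production rules would be far more thorough.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
NAME_LIST = {"Jordan", "Paris"}  # hypothetical entity list of first names


def find_rule_matches(text):
    """Return (label, matched_text) pairs found by the regexes and the name list."""
    hits = [("EMAIL", m.group()) for m in EMAIL.finditer(text)]
    hits += [("DATE", m.group()) for m in ISO_DATE.finditer(text)]
    # Flag any capitalized word that appears in the name list, regardless of context.
    hits += [("NAME", w) for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w in NAME_LIST]
    return hits


# The obvious cases are caught.
print(find_rule_matches("Email jordan@example.com by 2024-03-05."))

# Here "Jordan" is a country, yet it is flagged as a name,
# and the date is missed because it is not in ISO format.
print(find_rule_matches("We flew to Jordan on 5 March 2024."))
```

The second call shows both failure modes at once: a false positive on an ambiguous word and a false negative on an unanticipated date format.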

How CustodianAI approaches it

Privacy Data Transformation handles both sides of the problem — detection and output transformation.

Detection uses named entity recognition (NER) models and rule patterns to identify common PII categories across your text: names, phone numbers, email addresses, physical addresses, dates, IDs, ages, and more.

Transformation controls what happens to detected entities in the output:

Mode	Output	Status
`MASKED`	Replaces PII with asterisks — e.g. `J* S`, `() -**`	Available
`PROPRIETARY`	Guardian Layer — replaces with semantically similar alternatives or redacts in-place	Available
`GDPR`, `HIPAA`, `CUSTOM`	Labeled placeholders, HIPAA Safe Harbor, custom replacement maps	Coming soon
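As a rough illustration of asterisk masking in general, the sketch below replaces letters and digits inside already-detected spans while leaving punctuation and spacing intact. The `mask_spans` function and the span format are hypothetical, and the exact output format of the `MASKED` mode may differ from what this produces.

```python
def mask_spans(text, spans):
    """Replace letters and digits inside each (start, end) span with '*'.

    `spans` is a list of half-open character ranges, assumed to come from
    a separate detection step. Punctuation and whitespace are preserved.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            if chars[i].isalnum():
                chars[i] = "*"
    return "".join(chars)


text = "Call Jo at 555-0100."
name_start = text.index("Jo")
phone_start = text.index("555-0100")
spans = [(name_start, name_start + 2), (phone_start, phone_start + 8)]

print(mask_spans(text, spans))  # Call ** at ***-****.
```

Keeping separators such as hyphens and parentheses lets a reader still recognize the kind of value that was masked without revealing the value itself.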

Guardian Layer, the `PROPRIETARY` mode in the table above, is CustodianAI's proprietary detection and transformation mode within Privacy Data Transformation. It goes beyond standard PII to identify domain-specific sensitive content: proprietary terminology, internal identifiers, and contextually sensitive language that compliance rules don't cover.

All endpoints are available through the same API and can be tuned by domain (General, Medical, Finance, or Custom).

Note: All processing is stateless. Text submitted to the API is not stored, logged, or used for training.

Next steps

Ready to make your first call? Head to the Quickstart.
