
File De-Identification

Privacy Data Transformation can de-identify entire files using Guardian Layer. Upload a file and get a de-identified file back, with its structure and formatting preserved.

All file endpoints use multipart/form-data and require your API key in the X-API-Key header. Character credits are consumed based on the total characters processed.


CSV

Endpoint: POST /api/v1/deidentify/text/proprietary/outputs/csv

Every cell in the CSV is de-identified independently. The structure (rows, columns, headers) is preserved. Returns a CSV file.

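import requests

# Upload the CSV as multipart/form-data; the API key goes in the X-API-Key header.
with open("customer-data.csv", "rb") as f:
    response = requests.post(
        "https://api.custodianai.com/api/v1/deidentify/text/proprietary/outputs/csv",
        headers={"X-API-Key": "cai_your_key_here"},
        files={"file": ("customer-data.csv", f, "text/csv")},
        data={
            "domain": "General",
            "masking_type": "redact",
            "pii_entities": "PERSON,EMAIL_ADDRESS,PHONE_NUMBER",
        },
    )

# Raise on an HTTP error status instead of saving an error body as a CSV.
response.raise_for_status()

# The response body is the de-identified CSV file.
with open("deid_customer-data.csv", "wb") as out:
    out.write(response.content)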
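
Because each cell is de-identified independently and the row and column structure is preserved, a quick local sanity check is to compare the shape of the input and output files. The sketch below is only an illustration: `csv_shape` and `same_shape` are hypothetical helper names, and the "de-identified" text is made-up sample data rather than real API output.

```python
import csv
import io


def csv_shape(text: str) -> list[int]:
    """Return the number of cells in each row of a CSV document."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


def same_shape(original: str, deidentified: str) -> bool:
    """True when both CSVs have the same row count and per-row cell counts."""
    return csv_shape(original) == csv_shape(deidentified)


# Made-up sample data: the second string imitates a "redact" result.
original = "name,email\nAda Lovelace,ada@example.com\nAlan Turing,alan@example.com\n"
deidentified = "name,email\n*****,*****\n*****,*****\n"

if not same_shape(original, deidentified):
    raise ValueError("De-identified CSV does not match the input's shape")
```

In practice the two strings would come from reading customer-data.csv and deid_customer-data.csv. A shape mismatch would suggest the upload or download went wrong rather than a de-identification choice.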

Form parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file | file | - | .csv file |
| domain | string | General | Content domain |
| masking_type | string | transform | redact or transform |
| pii_entities | string | all | Comma-separated entity types, e.g. "PERSON,EMAIL_ADDRESS" |

DOCX

Endpoint: POST /api/v1/deidentify/text/proprietary/outputs/docx

Text content is replaced in-place within the DOCX XML. Formatting, fonts, and document structure are preserved. Returns a DOCX file.

import requests

# Upload the DOCX as multipart/form-data.
with open("report.docx", "rb") as f:
    response = requests.post(
        "https://api.custodianai.com/api/v1/deidentify/text/proprietary/outputs/docx",
        headers={"X-API-Key": "cai_your_key_here"},
        files={"file": ("report.docx", f, "application/vnd.openxmlformats-officedocument.wordprocessingml.document")},
        data={"domain": "General", "masking_type": "redact"},
    )

# Raise on an HTTP error status instead of saving an error body as a DOCX.
response.raise_for_status()

with open("deid_report.docx", "wb") as out:
    out.write(response.content)

PDF

Endpoint: POST /api/v1/deidentify/text/proprietary/outputs/pdf

De-identification is applied word-by-word directly on the PDF. Page layout is preserved. Returns a PDF file.

Note: PDF de-identification works on PDFs with a text layer (i.e., not scanned images). For image-based PDFs, use the Image De-Identification endpoint instead.

import requests

# Upload the PDF as multipart/form-data.
with open("contract.pdf", "rb") as f:
    response = requests.post(
        "https://api.custodianai.com/api/v1/deidentify/text/proprietary/outputs/pdf",
        headers={"X-API-Key": "cai_your_key_here"},
        files={"file": ("contract.pdf", f, "application/pdf")},
        data={"domain": "General", "masking_type": "redact"},
    )

# Raise on an HTTP error status instead of saving an error body as a PDF.
response.raise_for_status()

with open("deid_contract.pdf", "wb") as out:
    out.write(response.content)

TXT

Endpoint: POST /api/v1/deidentify/text/proprietary/outputs/txt

Plain-text de-identification. Returns a .txt file.

import requests

# Upload the text file as multipart/form-data.
with open("notes.txt", "rb") as f:
    response = requests.post(
        "https://api.custodianai.com/api/v1/deidentify/text/proprietary/outputs/txt",
        headers={"X-API-Key": "cai_your_key_here"},
        files={"file": ("notes.txt", f, "text/plain")},
        data={"domain": "General", "masking_type": "transform"},
    )

# Raise on an HTTP error status instead of saving an error body as a .txt file.
response.raise_for_status()

with open("deid_notes.txt", "wb") as out:
    out.write(response.content)
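
The four endpoints above differ only in the final path segment and the MIME type sent with the upload, so one possible way to avoid repeating the request code is a small dispatcher keyed on the file extension. This is a sketch, not part of the API: `resolve_endpoint` and `deidentify_file` are hypothetical helpers, and calling `raise_for_status()` assumes the service signals failures with a non-2xx HTTP status.

```python
from pathlib import Path

BASE_URL = "https://api.custodianai.com/api/v1/deidentify/text/proprietary/outputs"

# MIME types taken from the per-format examples above.
MIME_TYPES = {
    "csv": "text/csv",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "pdf": "application/pdf",
    "txt": "text/plain",
}


def resolve_endpoint(path: str) -> tuple[str, str]:
    """Map a file path to (endpoint URL, MIME type) based on its extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in MIME_TYPES:
        raise ValueError(f"Unsupported file type: {path!r}")
    return f"{BASE_URL}/{ext}", MIME_TYPES[ext]


def deidentify_file(path: str, api_key: str, out_path: str, **form: str) -> None:
    """Upload `path` to the matching endpoint and save the response to `out_path`."""
    # Imported here so resolve_endpoint can be used without requests installed.
    import requests

    url, mime = resolve_endpoint(path)
    with open(path, "rb") as f:
        response = requests.post(
            url,
            headers={"X-API-Key": api_key},
            files={"file": (Path(path).name, f, mime)},
            data=form,  # e.g. domain, masking_type, pii_entities
        )
    # Assumption: errors come back as non-2xx responses.
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)
```

A call such as `deidentify_file("report.docx", "cai_your_key_here", "deid_report.docx", masking_type="redact")` would then behave like the DOCX example. Parameters left out of `form` fall back to the defaults documented below.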

Common parameters

All file endpoints share the same form parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| domain | string | General | Content domain. Only General is currently available; see Coming Soon for Medical, Finance, and Custom |
| masking_type | string | transform | redact replaces with *****; transform replaces with a plausible alternative |
| pii_entities | string | all | Comma-separated entity types to target. Leave empty or pass ALL for everything |
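
Since `pii_entities` is a single comma-separated string and `masking_type` accepts only two values, it can help to build the form fields in one place and catch typos before a request is sent. The helper below is a minimal sketch under those documented rules; `build_form_data` is a hypothetical name, and the client-side validation is an addition for convenience, not something the API requires.

```python
from typing import Iterable, Optional

VALID_MASKING_TYPES = {"redact", "transform"}


def build_form_data(
    masking_type: str = "transform",
    pii_entities: Optional[Iterable[str]] = None,
    domain: str = "General",
) -> dict[str, str]:
    """Build the shared form fields for any file endpoint."""
    if masking_type not in VALID_MASKING_TYPES:
        raise ValueError(f"masking_type must be one of {sorted(VALID_MASKING_TYPES)}")
    # Drop blank entries and surrounding whitespace before joining.
    entities = [e.strip() for e in (pii_entities or []) if e.strip()]
    return {
        "domain": domain,
        "masking_type": masking_type,
        # No entities given: pass ALL, which targets every entity type.
        "pii_entities": ",".join(entities) if entities else "ALL",
    }


form = build_form_data("redact", ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"])
```

The resulting dictionary can be passed as the `data=` argument of `requests.post` in any of the examples above.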