POST /api/v1/deidentify/text/proprietary/outputs/pdf

De-identifies a PDF file word by word using Guardian Layer, preserving page layout. Sensitive words are replaced using PDF redaction annotations. Returns a de-identified PDF.

Note: This endpoint only processes PDFs that have a text layer. Content in scanned PDFs without selectable text is not de-identified. For image-based documents, use the image endpoint instead.


Request

POST /api/v1/deidentify/text/proprietary/outputs/pdf
X-API-Key: cai_your_key_here
Content-Type: multipart/form-data

Form fields

| Field        | Type   | Required | Default   | Description |
| ------------ | ------ | -------- | --------- | ----------- |
| file         | file   | Yes      |           | A .pdf file |
| domain       | string | No       | General   | General, Medical, Finance, or Custom |
| masking_type | string | No       | transform | redact fills changed words with a black rectangle; transform replaces them with alternative text |
| pii_entities | string | No       | all       | Comma-separated entity types. Leave empty for all |
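
The form fields above can be assembled and checked on the client before sending. The following is a minimal sketch, not part of the API: `build_form_data` is a hypothetical helper, and it assumes that the listed values are case-sensitive and that omitting `pii_entities` is equivalent to leaving it empty.

```python
# Values taken from the form-field table; case sensitivity is an assumption.
ALLOWED_DOMAINS = {"General", "Medical", "Finance", "Custom"}
ALLOWED_MASKING_TYPES = {"redact", "transform"}


def build_form_data(domain="General", masking_type="transform", pii_entities=None):
    """Return the non-file form fields as a dict for the `data=` argument of requests.post."""
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"unsupported domain: {domain!r}")
    if masking_type not in ALLOWED_MASKING_TYPES:
        raise ValueError(f"unsupported masking_type: {masking_type!r}")

    data = {"domain": domain, "masking_type": masking_type}
    # Assumption: leaving the field out selects all entity types.
    if pii_entities:
        data["pii_entities"] = ",".join(pii_entities)
    return data


form = build_form_data("Finance", "redact", ["PERSON", "ID_NUMBER"])
```

Validating locally catches a mistyped domain or masking type before the request is made, rather than relying on how the server handles unknown values.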

Response

Returns application/pdf. The filename is prefixed with deid_.
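
To pick a local filename for the returned PDF, a client might do something like the sketch below. It assumes the prefixed filename arrives in a `Content-Disposition` response header, which this page does not state; if the header is absent, the hypothetical helper `output_filename` falls back to prefixing the input name with `deid_`.

```python
from email.message import Message
from pathlib import PurePosixPath


def output_filename(headers, input_name):
    """Choose a filename for the de-identified PDF.

    `headers` is any mapping of response headers (for example response.headers).
    Reading the name from Content-Disposition is an assumption about the server.
    """
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the final path component to avoid writing outside the target directory.
            return PurePosixPath(name).name
    return f"deid_{input_name}"
```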


Example

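import requests

# Upload the PDF as multipart/form-data; requests sets the Content-Type header.
with open("contract.pdf", "rb") as f:
    response = requests.post(
        "https://api.custodianai.com/api/v1/deidentify/text/proprietary/outputs/pdf",
        headers={"X-API-Key": "cai_your_key_here"},
        files={"file": ("contract.pdf", f, "application/pdf")},
        data={
            "domain": "Finance",
            "masking_type": "redact",
            "pii_entities": "PERSON,ID_NUMBER",
        },
    )

# Stop on a 4xx/5xx status so an error body is not saved as a PDF.
response.raise_for_status()

# Save the de-identified PDF returned in the response body.
with open("deid_contract.pdf", "wb") as out:
    out.write(response.content)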

Error responses

| Status | Description |
| ------ | ----------- |
| 400    | File is not a .pdf or is malformed |
| 401    | Missing or invalid API key |
| 403    | Key expired or character limit reached |
| 500    | PDF processing library unavailable |
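
A client can translate these status codes into readable errors before writing anything to disk. This is a minimal sketch: `DeidentifyError` and `check_status` are hypothetical names, the descriptions are copied from the table above, and treating every 2xx status as success is an assumption, since the success code is not documented here.

```python
# Descriptions copied from the error table.
ERROR_DESCRIPTIONS = {
    400: "File is not a .pdf or is malformed",
    401: "Missing or invalid API key",
    403: "Key expired or character limit reached",
    500: "PDF processing library unavailable",
}


class DeidentifyError(Exception):
    """Raised when the endpoint returns a non-success status."""

    def __init__(self, status_code, description):
        super().__init__(f"{status_code}: {description}")
        self.status_code = status_code
        self.description = description


def check_status(status_code):
    """Return None for 2xx (assumed success); raise DeidentifyError otherwise."""
    if 200 <= status_code < 300:
        return
    description = ERROR_DESCRIPTIONS.get(status_code, "Unexpected status")
    raise DeidentifyError(status_code, description)
```

With `requests`, this would be called as `check_status(response.status_code)` before saving `response.content`.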