Detectors
Detectors scan text for PII and return a list of EntitySpan objects. PrivacyLens ships with three detectors — one built-in and two optional.
Built-in: Regex detector
Zero dependencies. Always available. Covers the most common PII types.
| Entity | Example input | Token |
|---|---|---|
| EMAIL | john@example.com | [EMAIL_1] |
| PHONE | 555-123-4567, (415) 555-0198, +14155550198 | [PHONE_1] |
| SSN | 123-45-6789 | [SSN_1] |
The regex detector is enabled by default — no config needed.
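To make the table concrete, here is a minimal TypeScript sketch of how a regex-based detector could turn matches into numbered tokens. The `SketchSpan` shape, the two patterns, and the per-type counter are assumptions made for illustration; they are not PrivacyLens's actual `EntitySpan` fields, shipped patterns, or numbering scheme.

```typescript
// Hypothetical span shape; the real EntitySpan fields may differ.
interface SketchSpan {
  entityType: string;
  start: number; // inclusive offset
  end: number;   // exclusive offset
}

// Illustrative patterns only, not the ones PrivacyLens ships.
const SKETCH_PATTERNS: { entityType: string; source: string }[] = [
  { entityType: "EMAIL", source: "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+" },
  { entityType: "SSN", source: "\\b\\d{3}-\\d{2}-\\d{4}\\b" },
];

// Run every pattern over the text and collect spans sorted by position.
function sketchDetect(text: string): SketchSpan[] {
  const spans: SketchSpan[] = [];
  for (const { entityType, source } of SKETCH_PATTERNS) {
    const re = new RegExp(source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      spans.push({ entityType, start: m.index, end: m.index + m[0].length });
    }
  }
  return spans.sort((a, b) => a.start - b.start);
}

// Replace each span with a token such as [EMAIL_1], counting per entity type.
function sketchRedact(text: string): string {
  const counters: Record<string, number> = {};
  let out = "";
  let cursor = 0;
  for (const span of sketchDetect(text)) {
    if (span.start < cursor) continue; // skip spans overlapping an earlier one
    const n = (counters[span.entityType] || 0) + 1;
    counters[span.entityType] = n;
    out += text.slice(cursor, span.start) + "[" + span.entityType + "_" + n + "]";
    cursor = span.end;
  }
  return out + text.slice(cursor);
}

console.log(sketchRedact("Mail john@example.com, SSN 123-45-6789"));
// prints "Mail [EMAIL_1], SSN [SSN_1]"
```

Keeping detection (`sketchDetect`) separate from replacement (`sketchRedact`) mirrors the idea that a detector only returns spans; what happens to those spans afterwards is a separate step.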
Adding custom patterns
# privacylens.yaml
detectors:
  regex:
    patterns:
      - entity_type: EMPLOYEE_ID
        pattern: 'EMP-\d{4,}'
      - entity_type: CREDIT_CARD
        pattern: '\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
Custom patterns are additive — EMAIL, PHONE, and SSN are still detected.
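Before adding a pattern to the config, it can help to check what it actually matches. The sketch below runs the two patterns from the example as plain JavaScript regex literals; it assumes PrivacyLens applies them with ordinary regex search semantics, and the `findAll` helper is hypothetical, not part of the SDK.

```typescript
// The two custom patterns from the config above, as JavaScript regex literals.
const employeeId = /EMP-\d{4,}/g;
const creditCard = /\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g;

// Hypothetical helper: returns every match of a global pattern, or [] if none.
function findAll(pattern: RegExp, text: string): string[] {
  return text.match(pattern) || [];
}

console.log(findAll(employeeId, "Badge EMP-00421"));     // ["EMP-00421"]
console.log(findAll(employeeId, "TEMP-12345"));          // ["EMP-12345"]
console.log(findAll(creditCard, "4111-1111-1111-1111")); // ["4111-1111-1111-1111"]
console.log(findAll(creditCard, "41111111111111111"));   // [] (17 digits)
```

Two things stand out. The EMPLOYEE_ID pattern has no word boundary, so it also matches inside longer tokens such as `TEMP-12345`; prefixing it with `\b` avoids that. The CREDIT_CARD pattern checks only the shape of the number (16 digits in four groups) and does not validate a checksum.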
Optional: Presidio (50+ entity types)
Microsoft Presidio provides ML-backed detection for names, addresses, credit cards, passport numbers, and other PII, covering 50+ entity types in total.
Install
Enable
Detected entities (sample)
PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, US_SSN, US_PASSPORT, IBAN_CODE, IP_ADDRESS, LOCATION, DATE_TIME, NRP, MEDICAL_LICENSE, and more.
Presidio uses spaCy under the hood. The first run downloads a language model (~50 MB).
Optional: GLiNER (semantic / ML-based)
GLiNER is a lightweight NER model that detects entities by semantic meaning rather than pattern matching. Useful for names and addresses that regex can't reliably catch.
Install
Enable
Default labels detected
person, email, phone, address, organization
The GLiNER model (urchade/gliner_medium-v2.1, ~300 MB) is downloaded from HuggingFace on first use.
Using multiple detectors together
Detectors can be combined. PrivacyLens merges their results and resolves overlapping spans by keeping the longest match.
detectors:
  regex:
    enabled: true
    patterns:
      - entity_type: EMPLOYEE_ID
        pattern: 'EMP-\d{4,}'
  pii:
    enabled: true
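The "keep the longest match" rule can be sketched as a greedy pass over the merged spans. This is one plausible reading, not PrivacyLens's actual implementation: the `MergeSpan` shape and the tie-break (earlier start wins when lengths are equal) are assumptions made for illustration.

```typescript
// Hypothetical span shape; the real EntitySpan fields may differ.
interface MergeSpan {
  entityType: string;
  start: number; // inclusive offset
  end: number;   // exclusive offset
}

// Greedy: visit spans longest first and keep each one that does not
// overlap a span already kept. Ties go to the earlier start (an assumption).
function resolveOverlaps(spans: MergeSpan[]): MergeSpan[] {
  const byLength = spans.slice().sort(
    (a, b) => (b.end - b.start) - (a.end - a.start) || a.start - b.start
  );
  const kept: MergeSpan[] = [];
  for (const candidate of byLength) {
    const overlaps = kept.some(
      (k) => candidate.start < k.end && k.start < candidate.end
    );
    if (!overlaps) kept.push(candidate);
  }
  return kept.sort((a, b) => a.start - b.start);
}

// Text: "Call Jane Doe at jane.doe@example.com"
const merged = resolveOverlaps([
  { entityType: "EMAIL", start: 17, end: 37 },  // whole email address
  { entityType: "PERSON", start: 17, end: 21 }, // "jane", inside the email
  { entityType: "PERSON", start: 5, end: 13 },  // "Jane Doe"
]);
console.log(merged.map((s) => s.entityType)); // ["PERSON", "EMAIL"]
```

The short PERSON span inside the email address is dropped because the longer EMAIL span covers it, while the non-overlapping "Jane Doe" span survives.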
Writing a custom detector
See Writing a Custom Detector.
TypeScript
The TypeScript SDK currently ships the regex detector only. Presidio and GLiNER are Python-only.
import { inspect } from "privacylens";
// Uses regex detector by default
const spans = inspect("SSN: 123-45-6789");
Custom patterns work the same way via privacylens.yaml or loadConfig: