
🔍 X-Ray: Is Your “Redacted” PDF Still Readable?
Ever seen a PDF with text “blacked out” under a rectangle? Often, that text is still there. x-ray is a Python library by Free Law Project that automatically detects these failed redactions.
🚨 The problem:
Instead of removing the text, many documents simply draw a black rectangle on top of it, so the text can still be selected and copied. Free Law Project found this pattern in millions of court PDFs. Their favorite example is a document that exposed Taylor Swift’s personal phone number.
⚙️ How it works:
- Finds rectangles in the PDF
- Detects letters at the same location
- Renders the rectangle as an image
- Checks whether the rendered area is a single color → if so, the letters are hidden underneath
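The four steps above can be sketched in plain Python. This is a simplified illustration, not x-ray's actual code: the `Rect` type, the helper names, and the tiny hand-made pixel grid standing in for a rendered page are all assumptions made for the example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: int
    y0: int
    x1: int
    y1: int


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two boxes overlap with non-zero area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def is_single_color(pixels, box: Rect) -> bool:
    """True when every rendered pixel inside `box` has the same value."""
    seen = {
        pixels[y][x]
        for y in range(box.y0, box.y1)
        for x in range(box.x0, box.x1)
    }
    return len(seen) == 1


def find_bad_redactions(rectangles, letters, pixels):
    """Return (rectangle, hidden_text) pairs for rectangles that hide letters."""
    findings = []
    for rect in rectangles:
        # Steps 1 and 2: letters whose boxes share space with the rectangle.
        covered = [ch for ch, box in letters if intersects(rect, box)]
        # Steps 3 and 4: a uniform rendering means the letters are not visible.
        if covered and is_single_color(pixels, rect):
            findings.append((rect, "".join(covered)))
    return findings


# Hypothetical rendered page: W = white paper, K = black ink, Y = yellow highlight.
W, K, Y = 1, 0, 2
page = [
    [W] * 12,
    [W, K, K, K, K, W, W, Y, K, Y, Y, W],
    [W, K, K, K, K, W, W, Y, Y, K, Y, W],
    [W] * 12,
]
redaction = Rect(1, 1, 5, 3)   # solid black box drawn over "hi"
highlight = Rect(7, 1, 11, 3)  # highlight with the glyphs of "ok" still visible
letters = [
    ("h", Rect(1, 1, 3, 3)),
    ("i", Rect(3, 1, 5, 3)),
    ("o", Rect(7, 1, 9, 3)),
    ("k", Rect(9, 1, 11, 3)),
]

findings = find_bad_redactions([redaction, highlight], letters, page)
for rect, text in findings:
    print(text)  # prints: hi
```

Only the black box is flagged. The highlight also overlaps letters, but its rendered pixels are not uniform, which means the text is visible there and nothing is being hidden.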
uvx --from x-ray xray document.pdf
# Output: {"1": [{"bbox": [...], "text": "secret text"}]}
Also works with remote URLs and as a Python module. Internally uses PyMuPDF. BSD license.
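The JSON output is keyed by page number, so it is easy to post-process. Here is a minimal sketch using only the standard library, assuming the command's output has been captured as a string. The `summarize` helper is hypothetical, and the bbox numbers in the sample are placeholders, not real results.

```python
import json


def summarize(cli_output: str) -> dict[int, list[str]]:
    """Map each page number to the texts found under its rectangles."""
    report = json.loads(cli_output)
    return {
        int(page): [hit["text"] for hit in hits]
        for page, hits in report.items()
    }


# Sample in the shape shown above; the bbox values are made up for illustration.
sample = '{"1": [{"bbox": [10.0, 20.0, 110.0, 32.0], "text": "secret text"}]}'

summary = summarize(sample)
for page, texts in summary.items():
    print(f"page {page}: {len(texts)} bad redaction(s): {texts}")
```

An empty JSON object would produce an empty summary here, so the same helper works for documents with no findings.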
💡 Quick explanation
A PDF is like a layered digital paper. A proper redaction deletes the text. A bad redaction just puts a black box on top — the text is still underneath! This tool makes visible what should have been deleted.
More information at the link 👇
