Have you ever tried extracting text from a photo of a receipt or a scanned contract? If so, you likely know the results can be hit or miss. Perhaps the paper is tilted, the lighting is uneven, or the picture is grainy. Any of these can keep your OCR engine from reading the words correctly.
Document Detection fixes these problems. It finds the document in your image, straightens it out, and cleans it up. This makes it much easier for the computer to read the text.
Why OCR Fails on Real-World Images
OCR tools usually expect perfect pictures. They are trained on flat, bright, and straight text. However, real photos from users are rarely perfect.
For example, users often take photos of bills at odd angles. They also photograph ID cards on messy desks or upload scans where the paper moved.
If you send these raw images to an OCR tool, it will likely fail: the text comes back mixed up, missing parts, or in the wrong order.
Filestack Document Detection handles these messy inputs automatically. It finds the edges of the paper, fixes the angle, and cleans up the image. In the end, you get a clean picture that is ready for text extraction.
How It Works
Document Detection finds the edges of the paper by combining two methods. The first looks for lines and edges in the picture. The second uses a machine learning model trained on thousands of document photos.
The process happens in four steps:
It creates a map of the document using a computer model.
It finds the four corners of the paper.
It changes the angle so the document fills the whole picture.
Finally, it cleans up the noise and improves the contrast, making the darks darker and the lights lighter.
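To make steps two and three more concrete, here is a minimal Python sketch of the geometry involved. It is not Filestack's implementation. It only shows one common way to put four detected corners in a consistent order and to estimate the size of the flattened output. The function names and sample points are hypothetical.

```python
from math import dist

def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates, where y grows downward.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = max(points, key=lambda p: p[0] - p[1])     # largest x - y
    bottom_left = min(points, key=lambda p: p[0] - p[1])   # smallest x - y
    return [top_left, top_right, bottom_right, bottom_left]

def output_size(corners):
    """Estimate the flattened document size from the longer of each pair of edges."""
    tl, tr, br, bl = corners
    width = round(max(dist(tl, tr), dist(bl, br)))
    height = round(max(dist(tl, bl), dist(tr, br)))
    return width, height

# Hypothetical corners of a slightly tilted receipt, in no particular order
detected = [(700, 480), (120, 450), (100, 700), (690, 690)]
corners = order_corners(detected)
print(corners)
print(output_size(corners))
```

A real pipeline would then compute a perspective transform from these four corners to a rectangle of that size. That part is left out here because it needs an image library.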
Let’s upload a sample receipt and test out the modes.
Three Detection Modes
Document Detection offers three modes depending on what you need.
Coordinates Mode
This mode returns numbers that show where the document is, instead of returning an image. The response describes a bounding box: x and y mark the box’s origin (by the usual image convention, its top-left corner), and width and height give its size.
API Call
doc_detection=coords:true
Response
{
  "coords": {
    "x": 106,
    "y": 464,
    "width": 580,
    "height": 231
  }
}
Use coordinates mode when you need to draw boxes on the screen, crop the image yourself, or tell another system where the document is.
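If you plan to crop the image yourself, a small helper can turn the response into a crop box. The sketch below uses the sample response from this section. It assumes that x and y are the top-left corner of the box, which the response itself does not spell out. The helper name crop_box is made up for this example.

```python
import json

# Sample response body from the coordinates mode example above
response_body = '{"coords": {"x": 106, "y": 464, "width": 580, "height": 231}}'

def crop_box(coords):
    """Convert x/y/width/height into a (left, top, right, bottom) tuple.

    Assumes x and y describe the top-left corner of the box.
    """
    left = coords["x"]
    top = coords["y"]
    right = left + coords["width"]
    bottom = top + coords["height"]
    return (left, top, right, bottom)

coords = json.loads(response_body)["coords"]
print(crop_box(coords))  # (106, 464, 686, 695)
```

The (left, top, right, bottom) order matches what Pillow's Image.crop expects, so the tuple can be passed to it directly if you use that library.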
Warped Mode
Unlike coordinates mode, warped mode fixes the angle of the document so it looks flat. The image is straightened, but the system does not clean up the colors or brightness.
API Call
doc_detection=preprocess:false
The result is a new image where the document fills the whole frame. It fixes any twists or tilts from the camera angle. However, the picture looks exactly like the original in terms of color and quality.
You should use warped mode when you want a straight image but plan to clean it up yourself. It is also good if you need to keep the original colors.
Preprocessed Mode
Preprocessed mode does everything at once. It straightens the image and also cleans up graininess and improves the contrast. This creates the best possible picture for reading text.
API Call
doc_detection=preprocess:true
This is the default setting if you do not choose one. The cleaning step reduces noise and makes the text sharp.
Use preprocessed mode for OCR tasks. The cleaning steps make it much easier for the computer to read names and numbers, especially on photos taken in bad lighting.
Full API Examples
You call Document Detection by adding a doc_detection task to a Processing API URL. All requests need a security policy and signature.
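If you have not generated a policy and signature before, the sketch below shows the general idea in Python. It assumes the scheme described in Filestack's security documentation: the policy is a base64url-encoded JSON object, and the signature is an HMAC-SHA256 of that policy using your app secret. The "convert" call name, the function name, and the secret are assumptions for illustration, so check the security docs before relying on them.

```python
import base64
import hashlib
import hmac
import json
import time

def make_policy_and_signature(app_secret, expiry, calls=("convert",)):
    """Build a policy string and its signature (assumed scheme, see lead-in)."""
    policy_json = json.dumps({"expiry": expiry, "call": list(calls)})
    # The policy is the base64url encoding of the JSON document
    policy = base64.urlsafe_b64encode(policy_json.encode("utf-8")).decode("ascii")
    # The signature is a hex HMAC-SHA256 of the encoded policy
    signature = hmac.new(
        app_secret.encode("utf-8"), policy.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return policy, signature

# Hypothetical secret; the policy here expires one hour from now
one_hour_from_now = int(time.time()) + 3600
policy, signature = make_policy_and_signature("YOUR_APP_SECRET", one_hour_from_now)
print(f"security=p:{policy},s:{signature}")
```

Generate these values on your server, since the app secret should never reach the browser or a mobile client.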
Get coordinates
https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=coords:true/HANDLE
Get warped image
https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=preprocess:false/HANDLE
Get preprocessed image
https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=preprocess:true/HANDLE
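In application code, it can help to build these URLs in one place. Here is a minimal Python sketch that reproduces the three URLs above. The function name and the mode labels are made up for this example. POLICY, SIGNATURE, and HANDLE are the same placeholders used in the URLs.

```python
BASE_URL = "https://cdn.filestackcontent.com"

# Task strings taken from the three examples above
MODE_TASKS = {
    "coords": "doc_detection=coords:true",
    "warped": "doc_detection=preprocess:false",
    "preprocessed": "doc_detection=preprocess:true",
}

def doc_detection_url(mode, policy, signature, handle):
    """Return the Processing API URL for one of the three detection modes."""
    task = MODE_TASKS[mode]  # raises KeyError for an unknown mode
    return f"{BASE_URL}/security=p:{policy},s:{signature}/{task}/{handle}"

for mode in MODE_TASKS:
    print(doc_detection_url(mode, "POLICY", "SIGNATURE", "HANDLE"))
```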
Chaining with Resize
Images must be 2000×2000 pixels or smaller. For larger images, you must add a resize step to the URL ahead of the doc_detection task.
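As a rough sketch, the chained URL could be built like this in Python. The resize parameters (width, height, and fit:max, which should shrink large images without enlarging small ones) are my reading of Filestack's resize task, not something this article specifies, so treat them as an assumption and confirm them in the Processing API docs. The function name is hypothetical.

```python
BASE_URL = "https://cdn.filestackcontent.com"

def resize_then_detect_url(policy, signature, handle, max_side=2000):
    """Chain a resize task ahead of doc_detection so the image fits the size limit."""
    tasks = [
        f"security=p:{policy},s:{signature}",
        # Assumed resize syntax: fit the image inside max_side x max_side
        f"resize=width:{max_side},height:{max_side},fit:max",
        "doc_detection=preprocess:true",
    ]
    # Tasks run left to right, so resize comes before detection
    return "/".join([BASE_URL, *tasks, handle])

print(resize_then_detect_url("POLICY", "SIGNATURE", "HANDLE"))
```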
Chaining with OCR
You can also send the clean image directly into Filestack OCR.
It returns the following for our uploaded receipt example (only the “text” part is shown for brevity):
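A chained URL might look like the one this Python sketch builds. The ocr task name is an assumption based on Filestack's OCR documentation rather than on this article, and the function name is made up for the example.

```python
BASE_URL = "https://cdn.filestackcontent.com"

def detect_then_ocr_url(policy, signature, handle):
    """Chain doc_detection into OCR so text is read from the cleaned image."""
    tasks = [
        f"security=p:{policy},s:{signature}",
        "doc_detection=preprocess:true",  # straighten and clean first
        "ocr",  # assumed task name for Filestack OCR
    ]
    return "/".join([BASE_URL, *tasks, handle])

print(detect_then_ocr_url("POLICY", "SIGNATURE", "HANDLE"))
```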
"text": "Manila Automated Fare\nCollection System\nRECEIPT\nTax ID:\nReceipt No:\nDate and Time:\nStation Name:\nPOS ID:\nOperator ID:\nCard ID:\nCard Name:\n004-1-231125-1134-53\n23 NOV 2025 11:34\nGil Puyat\nPOI, POS.A.02.00.03\n100108195\n6378050082232813\nStandard SVC\nAdd value\nAdd Value:\nold Remaining value:\n42.00\n100.00\nNew Remaining value:\n142.00\nAmount Payable:\n100.00\nAmount Received:\n100.00",
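Notice that the labels and their values do not always sit on neighboring lines in this output. A first pass at using the text could simply separate amounts from everything else, as in this Python sketch. It works on an excerpt of the text above, and the function name and the amount pattern are illustrative choices, not part of the API.

```python
import re

# Excerpt of the "text" field above, after JSON decoding
ocr_text = (
    "Add Value:\nold Remaining value:\n42.00\n100.00\n"
    "New Remaining value:\n142.00\nAmount Payable:\n100.00\n"
    "Amount Received:\n100.00"
)

# A line counts as an amount if it is only digits, a dot, and two decimals
AMOUNT = re.compile(r"\d+\.\d{2}")

def split_labels_and_amounts(text):
    """Separate OCR lines into label lines and amount lines, keeping their order."""
    labels, amounts = [], []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        if AMOUNT.fullmatch(line):
            amounts.append(line)
        else:
            labels.append(line)
    return labels, amounts

labels, amounts = split_labels_and_amounts(ocr_text)
print(labels)
print(amounts)
```

Matching each amount to the right label takes more work, because the reading order depends on the receipt layout.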
Common Use Cases
Receipt processing. Users take photos of receipts in dark rooms or at weird angles. Preprocessed mode straightens these out and makes the text pop before you try to read the prices.
ID verification. Driver’s licenses photographed on tables are rarely straight. Coordinates mode can help you show users where to place their card, while preprocessed mode makes it easier to read names and dates.
Contract digitization. Pages that shifted during scanning are straightened, so you capture full paragraphs without cutting off lines.
Mobile document capture. Any app where users upload photos of papers needs this. Generally, the worse the original photo is, the more this tool helps.
Conclusion
Fixing messy real-world photos is very important for modern apps. Document Detection solves this by automatically straightening and cleaning your images. As a result, you get better accuracy and a smoother experience for your users, even if they upload imperfect photos.
Whether you are working with receipts, IDs, or contracts, using these tools helps you handle tricky situations. By cleaning your images before trying to read the text, you save time and fix errors before they happen.
To get started, look at the Processing API documentation to learn how to add this to your work.
This article was published first on the Filestack blog.


