IderaDevTools

Posted on • Originally published at blog.filestack.com

How to Fix Crooked Documents Before OCR Runs

Have you ever tried extracting text from a photo of a receipt or a scanned contract? If so, you likely know the results can be hit or miss. Perhaps the paper is tilted, the lighting is uneven, or the picture is grainy. Any of these problems can cause your OCR engine to misread the words.

Document Detection fixes these problems. It finds the document in your image, straightens it out, and cleans it up. This makes it much easier for the computer to read the text.

Why OCR Fails on Real-World Images

OCR tools usually expect perfect pictures. They are trained on flat, bright, and straight text. However, real photos from users are rarely perfect.

For example, users often take photos of bills at odd angles. They also photograph ID cards on messy desks or upload scans where the paper moved.

If you send these raw images to an OCR tool, it will likely fail. As a result, the text comes back mixed up, missing parts, or in the wrong order.

Filestack Document Detection handles these messy inputs automatically. It finds the edges of the paper, fixes the angle, and cleans up the image. In the end, you get a clean picture that is ready for text extraction.

How It Works

Document Detection combines two methods to find the edges of the paper. The first looks for lines and edges in the picture. The second uses a machine-learning model that has learned from thousands of document photos.

Specifically, the process happens in four steps:

  1. It creates a map of the document using a computer model.

  2. It finds the four corners of the paper.

  3. It changes the angle so the document fills the whole picture.

  4. Optionally, it reduces noise and raises the contrast, making the darks darker and the lights lighter.
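Steps 2 and 3 can be illustrated with a small sketch. Filestack does not describe its internals in this article, so the code below is not its implementation; it only shows one common way to put four detected corners in a consistent order and to choose an output size for the flattened page. The function names and the sample points are made up for illustration.

```python
import math


def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    # Top-left has the smallest x + y; bottom-right has the largest.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # Top-right has the smallest y - x; bottom-left has the largest.
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(corners):
    """Pick an output width and height from the longest opposite edges."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


# Made-up corner positions for a slightly tilted page, in arbitrary order.
detected = [(580, 40), (20, 60), (600, 400), (10, 380)]
corners = order_corners(detected)
size = target_size(corners)
print(corners, size)
```

The sum and difference heuristic is simple but can mislabel corners when a page is rotated close to 45 degrees, which is one reason a learned model is useful alongside plain edge detection.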

Let’s try uploading the receipt below and test out the modes.

Three Detection Modes

Document Detection offers three modes depending on what you need.

Coordinates Mode

This mode returns numbers instead of an image. The numbers describe where the document sits in the original picture. In the sample response below, the position is given as a bounding box: an x and y position plus a width and a height.

API Call

doc_detection=coords:true

Response

{
  "coords": {
    "x": 106,
    "y": 464,
    "width": 580,
    "height": 231
  }
}

Use coordinates mode when you need to draw boxes on the screen, crop the image yourself, or tell another system where the document is.
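If you plan to crop the image yourself, the response first has to become a crop rectangle. The sketch below parses the sample response shown above and assumes that x and y mark the top-left corner of the box. The function name coords_to_box is made up for this example.

```python
import json


def coords_to_box(response_text):
    """Convert the coords object into a (left, top, right, bottom) tuple."""
    coords = json.loads(response_text)["coords"]
    # Assumption: x and y are the top-left corner of the bounding box.
    left, top = coords["x"], coords["y"]
    return (left, top, left + coords["width"], top + coords["height"])


# The sample response from this section, as a JSON string.
sample = '{"coords": {"x": 106, "y": 464, "width": 580, "height": 231}}'
box = coords_to_box(sample)
print(box)  # (106, 464, 686, 695)
```

A (left, top, right, bottom) tuple is the box format that Pillow's Image.crop accepts, so the result can be passed to it directly if you use that library.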

Warped Mode

Unlike coordinates mode, warped mode fixes the angle of the document so it looks flat. The image is straightened, but the system does not clean up the colors or brightness.

API Call

doc_detection=preprocess:false

The result is a new image where the document fills the whole frame. It fixes any twists or tilts from the camera angle. However, the picture looks exactly like the original in terms of color and quality.

You should use warped mode when you want a straight image but plan to clean it up yourself. It is also good if you need to keep the original colors.

Preprocessed Mode

Finally, this mode does everything at once. It straightens the image and also cleans up graininess and improves the contrast. This creates the best possible picture for reading text.

API Call

doc_detection=preprocess:true

This is the default setting if you do not choose one. The cleaning step reduces noise and makes the text sharp.

Use preprocessed mode for OCR tasks. The cleaning steps make it much easier for the computer to read names and numbers, especially on photos taken in bad lighting.

Full API Examples

Document Detection runs as a task in a Processing API URL. You add doc_detection and its options to the URL path, before the file handle. All requests need a security policy and signature.
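The policy and signature have to be generated on your server. The sketch below assumes the scheme described in Filestack's security documentation: the policy is a JSON object encoded as URL-safe Base64, and the signature is a hex-encoded HMAC-SHA256 of that policy made with your app secret. Check the current docs before relying on it. The function name, the secret, and the expiry value are placeholders.

```python
import base64
import hashlib
import hmac
import json


def sign_policy(app_secret, expiry, calls):
    """Return (policy, signature) strings for a Filestack-style security policy."""
    # Assumption: the policy is URL-safe Base64 of a JSON object with an
    # expiry timestamp and the list of allowed calls.
    policy_json = json.dumps({"expiry": expiry, "call": calls})
    policy = base64.urlsafe_b64encode(policy_json.encode("utf-8"))
    # Assumption: the signature is the hex HMAC-SHA256 of the encoded policy.
    signature = hmac.new(app_secret.encode("utf-8"), policy, hashlib.sha256).hexdigest()
    return policy.decode("ascii"), signature


# Placeholder secret and expiry; never ship a real secret in client-side code.
policy, signature = sign_policy("not-a-real-secret", 1900000000, ["read", "convert"])
print(policy)
print(signature)
```

Keeping the expiry short limits how long a leaked URL stays usable.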

Get coordinates

https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=coords:true/HANDLE

Get warped image

https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=preprocess:false/HANDLE

Get preprocessed image

https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=preprocess:true/HANDLE
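The three URLs above differ only in the doc_detection option, so a small helper can build them. This is a minimal sketch: the function name and the mode labels are made up here, and POLICY, SIGNATURE, and HANDLE remain placeholders for your own values.

```python
BASE_URL = "https://cdn.filestackcontent.com"

# Mode labels are this example's own names; the task strings come from the article.
MODES = {
    "coords": "doc_detection=coords:true",
    "warped": "doc_detection=preprocess:false",
    "preprocessed": "doc_detection=preprocess:true",
}


def doc_detection_url(mode, handle, policy, signature):
    """Build a Processing API URL for one of the three detection modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    security = f"security=p:{policy},s:{signature}"
    return "/".join([BASE_URL, security, MODES[mode], handle])


url = doc_detection_url("coords", "HANDLE", "POLICY", "SIGNATURE")
print(url)
```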

Chaining with Resize

Images must be 2000×2000 pixels or smaller. For larger images, add a resize step to the URL before doc_detection:

https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/resize=height:1500/doc_detection=preprocess:true/HANDLE
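If you already know the size of the uploaded image, you can add the resize step only when it is needed. The sketch below builds the task part of the URL. It assumes that a resize with a single dimension keeps the aspect ratio, as the height-only example above suggests, and it uses a width option for wide images. The function name is made up for this example.

```python
MAX_SIDE = 2000  # size limit stated in this section


def processing_tasks(width, height, max_side=MAX_SIDE):
    """Return the task path, adding a resize step only for oversized images."""
    tasks = []
    if width > max_side or height > max_side:
        # Shrink along the longest side so both sides end up within the limit.
        longest = "height" if height >= width else "width"
        tasks.append(f"resize={longest}:{max_side}")
    tasks.append("doc_detection=preprocess:true")
    return "/".join(tasks)


print(processing_tasks(3000, 4000))  # tall photo: resize by height
print(processing_tasks(4000, 3000))  # wide photo: resize by width
print(processing_tasks(1200, 1600))  # small photo: no resize needed
```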

Chaining with OCR

You can also send the clean image directly into Filestack OCR:

https://cdn.filestackcontent.com/security=p:POLICY,s:SIGNATURE/doc_detection=preprocess:true/ocr/HANDLE

For our uploaded receipt example, the response includes the following (only the “text” field is shown for brevity):

"text": "Manila Automated Fare\nCollection System\nRECEIPT\nTax ID:\nReceipt No:\nDate and Time:\nStation Name:\nPOS ID:\nOperator ID:\nCard ID:\nCard Name:\n004-1-231125-1134-53\n23 NOV 2025 11:34\nGil Puyat\nPOI, POS.A.02.00.03\n100108195\n6378050082232813\nStandard SVC\nAdd value\nAdd Value:\nold Remaining value:\n42.00\n100.00\nNew Remaining value:\n142.00\nAmount Payable:\n100.00\nAmount Received:\n100.00",
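The "text" field is a single string with newline separators, so it needs a little parsing before you can use it. The sketch below splits a shortened copy of the receipt text into label lines and money amounts. The function name and the amount pattern are this example's own choices, not part of the Filestack API.

```python
import re

# Matches lines that are only a number with two decimal places, such as 42.00.
AMOUNT = re.compile(r"\d+\.\d{2}")


def split_ocr_text(text):
    """Split OCR text into label lines (ending in a colon) and amounts."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    labels = [line for line in lines if line.endswith(":")]
    amounts = [float(line) for line in lines if AMOUNT.fullmatch(line)]
    return labels, amounts


# A shortened copy of the receipt text shown above.
sample = (
    "Add Value:\nold Remaining value:\n42.00\n100.00\n"
    "New Remaining value:\n142.00\nAmount Payable:\n100.00\n"
    "Amount Received:\n100.00"
)
labels, amounts = split_ocr_text(sample)
print(labels)
print(amounts)
```

In the sample, labels and values do not always sit next to each other, so pairing them by position alone is unreliable. Matching on expected formats, as done here, is usually a safer starting point.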

Common Use Cases

  • Receipt processing. Users take photos of receipts in dark rooms or at weird angles. Preprocessed mode straightens these out and makes the text pop before you try to read the prices.

  • ID verification. Similarly, driver’s licenses on tables are rarely straight. Coordinates mode can help you show users where to place their card, while preprocessed mode makes it easier to read names and dates.

  • Contract digitization. Pages that shifted during scanning are straightened, so you capture full paragraphs without cutting off lines.

  • Mobile document capture. Any app where users upload photos of papers needs this. Generally, the worse the original photo is, the more this tool helps.

Conclusion

Fixing messy real-world photos is very important for modern apps. Document Detection solves this by automatically straightening and cleaning your images. As a result, you get better accuracy and a smoother experience for your users, even if they upload imperfect photos.

Whether you are working with receipts, IDs, or contracts, these tools help you handle tricky inputs. By cleaning your images before trying to read the text, you save time and prevent errors before they reach your users.

To get started, look at the Processing API documentation to learn how to add this to your work.

This article was published first on the Filestack blog.