A lot of effort has gone into mobile receipt processing applications of late, and with good reason. Automated receipt processing with a mobile phone has a variety of truly useful applications, from making expense reports easy to file to letting marketers build a variety of purchase-related marketing programs.
At Snipp we recently launched SnippCheck, our own mobile receipt processing solution, and just wrapped up our first receipt processing campaign with Arm & Hammer Baking Soda (consumers could submit receipts through email or messaging and got back a $10 off coupon at 1-800-FLOWERS.com; see ahoffer.com for details). We wanted to share some of what we learned from applying our mobile receipt processing solution to a real-world campaign. (Note: the learnings here come from a post-mortem of the campaign and not just from the real-time application of our technology during the campaign.)
1. Mobile Receipt Processing Is Hard To Do
Before we go further I want to stress that receipt processing is hard, and that receipt processing with a mobile phone is even harder. Receipts tend to be printed on thermal paper (which fades over time) or with dot-matrix printers (which makes the characters hard to recognize), and they are often manhandled (resulting in creases and crumples that further impair character recognition). Furthermore, taking photos of receipts with a mobile phone creates its own set of issues: camera resolution on mobile phones isn’t great, making images difficult to read; blurriness is quite common because of the need to hold the phone close to the receipt (and the inability of the phone to focus appropriately); and the receipt images tend to be skewed and angled because the photo is taken with a handheld phone. (Technically we knew all this already, so it’s not really a learning, and we’ve already built a pretty cool solution to make mobile receipt processing as foolproof as possible, but it’s worth pointing all this out for your edification.)
2. Not All Receipts Are Made Equal
It’s trite but true. The single biggest variable in determining the success of applying optical character recognition (OCR) to automatically process receipts is image quality, and nothing impacts image quality quite as much as the quality of the original printed receipt. During the Arm & Hammer campaign, we noticed significant variance in the image quality of the incoming receipts, and that variance was directly correlated with the stores the receipts came from. Put differently, the single biggest determinant of whether a receipt could be processed automatically was which retailer it came from. Typically, large retailers like Walmart and Target have clearly printed receipts, while others don’t. And it wasn’t just the clarity of the printing: the formatting and layout of the information also improved readability. Receipts from these retailers had scanning success rates more than twice those of the laggards. The good news was that the top 5 retailers accounted for about 50% of all submitted receipts.
3. What You Look For Makes A Difference
The more focused you are in what you are looking for, the better your OCR success rates will be. OCR success rates tend to be higher when searching just for numbers than when searching for alphanumeric characters. So looking for product UPC codes or SKU numbers (wherever possible) tended to produce better results than when looking for product descriptions.
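To make the "numbers only" idea concrete, here is a minimal sketch of pulling candidate product codes out of raw OCR text. It assumes the receipt prints a full 12-digit UPC-A code including its check digit, which many retailers do not, and the function names `upc_a_is_valid` and `find_upcs` are ours for illustration, not part of SnippCheck. The check digit gives a cheap way to reject digit runs that the OCR misread.

```python
import re


def upc_a_is_valid(code: str) -> bool:
    """Return True if `code` is 12 digits with a correct UPC-A check digit."""
    if len(code) != 12 or not code.isdigit():
        return False
    digits = [int(ch) for ch in code]
    # Positions 1, 3, ..., 11 (indices 0, 2, ..., 10) are weighted by 3.
    odd_sum = sum(digits[0:11:2])
    # Positions 2, 4, ..., 10 (indices 1, 3, ..., 9) are weighted by 1.
    even_sum = sum(digits[1:10:2])
    check = (10 - (odd_sum * 3 + even_sum) % 10) % 10
    return check == digits[11]


def find_upcs(ocr_text: str) -> list[str]:
    """Return every standalone 12-digit run in the text that passes the check."""
    # Lookarounds keep us from matching 12 digits inside a longer number.
    candidates = re.findall(r"(?<!\d)\d{12}(?!\d)", ocr_text)
    return [code for code in candidates if upc_a_is_valid(code)]
```

A single misread digit always changes the check digit, so a code that passes is far more trustworthy than a fuzzy match on a product description.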
4. Image Pre-processing Is An Absolute Must
Ask any self-respecting OCR practitioner and they will tell you that image pre-processing is a key part of the secret sauce in OCR. In the case of receipts there are a few tricks we found particularly helpful:
- Binarization: converting every pixel in the image to black or white to enhance the contrast of the image
- Cropping: asking consumers to circle the product in question, then cropping the image to remove all extraneous data
- Deskewing and line straightening: correcting for photos being taken at an angle
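As an illustration of the first trick only, here is a rough sketch of binarization in plain Python. We are assuming Otsu's method for picking the threshold, which is one common choice and not necessarily what any given OCR pipeline uses, and we represent a grayscale image as a list of rows of 0-255 integers to keep the example dependency-free. Cropping and deskewing are not covered here.

```python
def otsu_threshold(pixels: list[list[int]]) -> int:
    """Pick the 0-255 threshold that maximizes between-class variance (Otsu)."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        # Pixels with value <= level form the "dark" class.
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue
        dark_mean = dark_sum / dark_count
        light_mean = (weighted_total - dark_sum) / light_count
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels: list[list[int]]) -> list[list[int]]:
    """Map every pixel to 0 (black) or 255 (white) using Otsu's threshold."""
    threshold = otsu_threshold(pixels)
    return [[0 if value <= threshold else 255 for value in row] for row in pixels]
```

A global threshold like this tends to struggle with unevenly lit photos, which is one reason a single pre-processing step is rarely enough on its own.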
There is no single silver bullet, so it is far better to rely on a multi-layered, staged approach to applying OCR to receipt processing. Rather than doing a single pass over each receipt, it is better to do several, each time adjusting the image pre-processing or the search terms you are looking for.
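The staged approach can be sketched roughly as a loop over passes, where each pass pairs a pre-processing step with the terms to search for. Everything here is hypothetical scaffolding: `run_staged_ocr` is a name we made up, and `ocr` stands in for whatever OCR engine you use, passed in as a plain function that turns an image into text.

```python
from typing import Callable, Optional, Sequence, Tuple

# One pass = (pre-processing function, search terms to look for in the output).
Pass = Tuple[Callable[[object], object], Sequence[str]]


def run_staged_ocr(
    image: object,
    passes: Sequence[Pass],
    ocr: Callable[[object], str],
) -> Optional[Tuple[int, str]]:
    """Run passes in order; return (pass index, matched term) or None."""
    for index, (preprocess, search_terms) in enumerate(passes):
        # Re-run OCR on a freshly pre-processed copy of the original image.
        text = ocr(preprocess(image)).lower()
        for term in search_terms:
            if term.lower() in text:
                return index, term
    # Every pass failed: the caller decides what happens next.
    return None
```

Ordering the passes from cheapest and strictest (say, an exact product code) to most expensive and loosest keeps the common case fast, and a `None` result is a natural signal to hand the receipt to a person.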
5. OCR Only Gets You So Far
The truth is that you will never get 100% accuracy with OCR as it stands today. Anyone claiming to get above 90% consistently is probably fudging, especially when it comes to receipts. For real-world applications you have to have a manual backup: a real set of eyes that will look at outlier and flagged results. Our SnippCheck solution has been built on that premise: results that fail the OCR are routed to our operations team, which looks through the receipt to ensure there are no false negatives.
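In code, the routing decision might look something like the sketch below. This is not SnippCheck's actual logic: the `OcrResult` fields, the `route` function, and the 0.8 confidence cutoff are all assumptions made up for illustration, and a real engine may not report confidence in this form at all.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    receipt_id: str
    product_found: bool
    confidence: float  # 0.0 to 1.0, from a hypothetical OCR engine


def route(result: OcrResult, min_confidence: float = 0.8) -> str:
    """Decide whether a receipt is approved automatically or sent to a person."""
    # Only a confident positive match is approved without human review.
    if result.product_found and result.confidence >= min_confidence:
        return "auto_approve"
    # Failures and low-confidence matches both go to the operations team,
    # so that a valid receipt is never rejected by the OCR alone.
    return "manual_review"
```

The point of the design is that the OCR is only ever allowed to say "yes" on its own; every "no" or "maybe" gets a second look.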
End Note: KISS
Technology can only take you so far. At the end of the day, the structure of a program involving receipt processing can have a huge impact on the technology’s ability to support the program. For instance, requiring that consumers circle the product on a receipt so the OCR can focus on the right part of the receipt makes a huge difference. Similarly, making intelligent tradeoffs between the amount of data you want to capture and the speed and accuracy of the receipt processing is critical.