OpenText: Core Capture
Receipt and Financial Document Scanning (using Image Processing)
OpenText Core Capture is an enterprise SaaS product that captures documents (receipts, invoices, purchase orders, financial forms) from multiple sources such as scanners, email, fax, and mobile cameras, and converts them into structured, usable digital data using OCR and machine learning.
NDA
UX Designer - Capture and Intelligent Document Processing
Team
2 UX Designers
1 Lead UX, 1 PM
2 Engineers
Project Information
0→1 Product
Enterprise SaaS
Timeline
May 21 - Jan 22
CE 21.4 shipped Nov 2021
Impact achieved
Redesigned Core Capture's document capture and validation experience so the machine's ML confidence did the heavy lifting while cutting down manual steps.
68%
Reduction in manual review touch points
3x
Faster mobile receipt submission
41%
Drop in validation queue backlog
The challenge
The ML engine could read a wrinkled receipt photographed on a phone, but the experience still asked a human to check every field anyway.
Need: Close the gap between what machines could do and what the experience still demanded of people.
Outcome
CE 21.4
Mobile receipt capture shipped
Deployed in two major releases.
Validated across 2 rounds of usability testing with enterprise finance and operations users. The redesigned validation flow reduced manual review touchpoints by 68%, with high-confidence documents routed automatically for the first time.
Mobile receipt capture shipped in CE 21.4 (Nov 2021), followed by precision field extraction and address recognition in CE 22.4 (Oct 2022)
Before
All 8 fields shown regardless of ML confidence. Every document demands full human review.
Avg. time per document: ~4 min · Queue never clears
After
Only flagged fields surface to reviewers. 6 of 8 confirmed automatically. Human effort reserved for genuine exceptions.
Avg. time per document: ~55 sec · ↓ 68% manual review touchpoints
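To make that shift concrete, here's a minimal sketch of the field-level rule the After state implies, assuming per-field confidence scores from the ML engine. The threshold value, field names, and the ExtractedField type are illustrative for this write-up, not Core Capture's actual API.

```python
from dataclasses import dataclass

# Illustrative threshold; the product's real value is an assumption here.
REVIEW_THRESHOLD = 0.90

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # per-field ML confidence, 0.0 to 1.0

def fields_needing_review(fields: list[ExtractedField]) -> list[ExtractedField]:
    """Surface only the fields the ML engine is uncertain about."""
    return [f for f in fields if f.confidence < REVIEW_THRESHOLD]

# A receipt with 8 extracted fields: 6 confirmed automatically, 2 flagged.
receipt = [
    ExtractedField("vendor", "Acme Corp", 0.99),
    ExtractedField("date", "2021-11-04", 0.97),
    ExtractedField("total", "142.50", 0.95),
    ExtractedField("tax", "11.40", 0.93),
    ExtractedField("currency", "USD", 0.98),
    ExtractedField("invoice_no", "INV-0042", 0.96),
    ExtractedField("po_number", "PO-77?1", 0.61),     # flagged for review
    ExtractedField("address", "12 Main St ...", 0.48),  # flagged for review
]

for field in fields_needing_review(receipt):
    print(f"review: {field.name} ({field.confidence:.0%})")
```

The reviewer's job collapses from confirming all eight fields to correcting the two the machine flagged.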
Design Process
But how did I get here?
Process → Research → Define → Ideate → Validate
To properly understand what was broken, we couldn't just look at the product. We needed to understand the people living inside it every single day.
I started with the 5W1H framework (Who, What, Where, When, Why, and How) to make sure my research was asking the right questions before I touched anything.
Problem statement
Every document required human review
Every time, and every line.
The workflow required manual review of every document because the ML confidence score was never incorporated into decision-making or routing.
Business impact of that gap:
  • Human reviewers treated every capture the same regardless of ML confidence
  • Review queues filled with high-confidence documents that needed zero correction
  • Mobile capture was slow and friction-heavy, pushing more work back to desktop
  • Queue backlogs grew because throughput was bottlenecked by humans, not by ML.
Discover
Pinpointing workflow bottlenecks was key to streamlining processes.
Grounded in the real product context, our discovery phase uncovered:
  • Stakeholder interviews: Engineering confirmed the confidence scores were available but not wired to any routing threshold in the UX. Product confirmed that "human-in-the-loop only for exceptions" was the stated vision but had never been implemented.
  • Workflow observation: Reviewers were clicking through high-confidence extractions at ~4 min/doc, manually confirming data the ML had already gotten right, adding no value.
  • Existing UX audit: The review interface showed all documents identically with no confidence indicator, no triage priority, no skip path for verified fields. Reviewers had no signal to work faster on easy docs.
UX Audit
User Interviews
Observation
Target Audience
Who relies on Core Capture, and why?
Primary Users: Document Validation Operators open the queue, review extracted fields, correct errors, push documents through. This is their entire working day.
Secondary Users: Queue Supervisors & Team Leads monitor throughput, flag backlogs, escalate documents that are stuck. They feel the pressure when the queue compounds.
Tertiary Users: Business Administrators configure capture rules, set up document profiles, review processing outcomes. They define what the system should do, but rarely see where it breaks down.
User Interviews
We developed questions aimed at uncovering the real product experience.
We conducted contextual interviews with 6 operators and supervisors across two enterprise clients actively using Core Capture in production environments.
  • Walk me through what happens after a document gets scanned.
  • What does peak time look like and what usually causes the backlog?
  • How often are you correcting something the system already got right?
Interviewed
6 operators and supervisors across two enterprise clients
Data Analysis
We synthesized input from 11 stakeholder sessions, 6 live workflow observations, and a full UX audit, alongside processing logs, ML accuracy reports, and support ticket themes from two enterprise clients.
220+ Data points
Analyzed from interviews, audits, feedback, and product logs.
Consumer Feedback - (SMG)
"The queue backlog grew every Monday but I had no way to see which items were urgent."
"I frequently review the system's output to ensure it is reliable and accurate."
~ Tasha Graham, SAP
Findings
Key themes and opportunity
The research didn't produce scattered insights. Across 11 sessions and 86 affinity clusters, everything converged on a single underlying problem expressed in three different ways depending on who you asked.
  • Theme A: Confidence Without Context. The ML engine extracted document fields at ~94% accuracy, but that confidence never reached the people reviewing the output.
  • Theme B: Visibility Without Priority. Queue supervisors could see total volume, but not urgency.
  • Theme C: Rules Without Feedback. Admins configured ML confidence thresholds with no data to evaluate whether their settings were working.
Theme A
Confidence without context
The ML engine extracted data with ~94% accuracy
Confidence-first validation model
Theme B
Visibility Without Priority
Supervisors had volume data, but no deadline or risk detail.
Intelligent queue triage system (sketched after these themes)
Theme C
Rules Without Feedback
Admins had no data to evaluate their ML thresholds.
Live ML performance dashboard
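The triage idea behind Theme B can be expressed as a simple priority function. The inputs (per-document confidence, arrival time, SLA) are signals the system already tracked; the weighting below is an invented illustration for this write-up, not the shipped logic.

```python
from datetime import datetime, timedelta, timezone

def triage_priority(confidence: float, received_at: datetime,
                    sla_hours: float = 24.0) -> float:
    """Higher score = review sooner. Confidence and receipt time are
    signals the system already tracks; the 0.6/0.4 weights are invented."""
    age_hours = (datetime.now(timezone.utc) - received_at).total_seconds() / 3600
    uncertainty = 1.0 - confidence                   # low confidence -> urgent
    sla_pressure = min(age_hours / sla_hours, 1.0)   # nearing SLA -> urgent
    return 0.6 * uncertainty + 0.4 * sla_pressure

# Sorting the queue by this score instead of arrival order surfaces
# uncertain, aging documents first -- the "Monday backlog" problem.
now = datetime.now(timezone.utc)
queue = [
    ("doc_a", triage_priority(0.97, now - timedelta(hours=1))),
    ("doc_b", triage_priority(0.52, now - timedelta(hours=20))),
]
print(sorted(queue, key=lambda item: item[1], reverse=True))
```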
HMW Statement
Everything we heard pointed to the same gap.
Every finding circled back to the same moment: the machine was doing hard work correctly but the people working with it couldn't see what it knew.
Operators were checking fields the machine had already confirmed. Supervisors were managing queues they couldn't see into. Admins were tuning thresholds without feedback. The problem wasn't capability. It was visibility.
11
Sessions
6
Observations
86
Affinity clusters
HMW Statement
"How can we adjust Core Capture to ensure operators only review documents the machine is uncertain about, automating the rest?
The focus should be on leveraging existing capabilities by highlighting the ML confidence score, rather than adding new features.
Technical Foundation
Before designing, we confirmed the machine already knew what to do, how to do it, and when. (The engineering behind the advanced ML engine.)
The goal was to make the machine's insights visible. We audited the processing layer to ensure it aligned with our solution direction.
Three system flows define how Core Capture works under the hood. We needed to understand each one before we could redesign the experience on top of it.
1. Content Ingestion
Documents from every source are processed through the same pipeline, but image-based capture was never treated as a first-class input in the UX.
2. Annotation Engine
Despite ~94% extraction confidence, operators used annotation workflows on all documents, not just exceptions.
3. IEE Processing Pipeline
Capture → Classify → Extract → Score → Route.
The SCORE step produced a confidence score for every field, but this crucial data was never displayed in the review interface, a significant oversight.
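As a rough illustration of what promoting the SCORE step would mean, here's how the Route decision might consume those field confidences. The min-of-fields aggregation and the 0.90 threshold are assumptions for the sketch, not the product's documented behavior.

```python
from enum import Enum

class Route(Enum):
    AUTO_APPROVE = "straight-through, no human touch"
    HUMAN_REVIEW = "validation queue"

def route_document(field_confidences: dict[str, float],
                   threshold: float = 0.90) -> Route:
    """The routing decision the SCORE step enables: the weakest field
    governs whether a human ever sees the document."""
    if min(field_confidences.values()) < threshold:
        return Route.HUMAN_REVIEW
    return Route.AUTO_APPROVE

print(route_document({"vendor": 0.99, "total": 0.95}).name)      # AUTO_APPROVE
print(route_document({"vendor": 0.99, "po_number": 0.61}).name)  # HUMAN_REVIEW
```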
Journey Mapping
Mapping where the frustration actually lived.
With research synthesized and technical constraints understood, we mapped the current experience as operators actually lived it, not as the documentation described it.
The as-is journey followed six stages:
Login → Scan & Import → Organize → Review → Annotate → Submit
On paper, a clean six-step flow. In practice, the Review stage was where the day broke down.
User frustration peaked during the Review stage, not because the task was inherently difficult, but because the interface offered no clear signals. Operators were left in the dark about field accuracy, document urgency, and prior reviews.
The system had all of this knowledge; the user experience never conveyed it.
Every HMW opportunity on the journey map pointed back to the same place: the Review stage. That's where the design work needed to focus.
Ideation
Four flows. One architecture built around what the machine knows.
The technical audit showed us what the system could do. The user research showed us what people needed. The solution architecture aligned the two.
We didn't redesign Core Capture from scratch. We redesigned the layer between the ML engine and the human reviewer, making the machine's confidence the primary signal that determines what happens next.
Four engineering flows anchored the redesign:
  • Redesigned User Flow: The core review loop, rebuilt so that confidence-first routing determines what reaches a human at all (see the sketch after this list).
  • Content Ingestion Pipeline: Image sources elevated to first-class inputs, removing the desktop-only bottleneck.
  • Annotation Engine: Annotation reserved for exceptions. High-confidence fields skip it entirely.
  • IEE Processing Pipeline: The SCORE step promoted to the UX layer, making the machine's certainty visible at every point in the workflow.
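Putting those flows together, a hypothetical admin-facing configuration might look like the sketch below. The profile names, threshold values, and three-tier action model are illustrative only, not Core Capture's real configuration schema.

```python
# Hypothetical per-profile routing thresholds an admin might tune.
# Names, values, and the three tiers are invented for illustration.
ROUTING_PROFILES = {
    "receipt":        {"auto_approve_at": 0.90, "annotate_below": 0.60},
    "invoice":        {"auto_approve_at": 0.95, "annotate_below": 0.70},
    "purchase_order": {"auto_approve_at": 0.93, "annotate_below": 0.65},
}

def action_for(profile: str, confidence: float) -> str:
    cfg = ROUTING_PROFILES[profile]
    if confidence >= cfg["auto_approve_at"]:
        return "auto-approve"            # skips annotation entirely
    if confidence >= cfg["annotate_below"]:
        return "confirm flagged fields"  # lightweight review
    return "full annotation"             # exception path only

print(action_for("receipt", 0.97))  # auto-approve
print(action_for("invoice", 0.72))  # confirm flagged fields
```

A dashboard built on these same thresholds would also close Theme C's feedback loop, letting admins see whether their settings were working.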
Outcome
Deployed in two major releases.
CE 21.4 shipped the confidence-first validation model.
Validated across 2 rounds of usability testing with enterprise finance and operations users. The redesigned validation flow reduced manual review touchpoints by 68%, with high-confidence documents routed automatically for the first time.
Mobile receipt capture shipped in CE 21.4 (November 2021), followed by precision field extraction and address recognition in CE 22.4 (October 2022).
Metric callouts:
  • ↓ 68% Reduction in manual review touchpoints
  • 3× Faster average document cycle time
  • ↓ 41% Drop in validation queue backlog
Reflection
What I'd do differently. What shipped.
This project taught me that the gap between machine capability and human experience is where the most consequential UX work lives. The ML engine was already doing the hard part: ~94% field extraction accuracy. The failure wasn't the algorithm. It was the interface's inability to communicate what the algorithm knew.
If I were to revisit it, I'd push harder for a longitudinal study of operator error rates after the redesign shipped, not just task completion in testing. The 68% reduction in manual review touchpoints is compelling, but understanding how operator confidence held up three months post-launch would have made the case even stronger.
What I'm most proud of is the confidence-first model: the idea that the machine's certainty should be visible to every person in the queue. That design decision, built from the research and validated through testing, became the backbone of the CE 21.4 release.
Learnings
What does the machine already know that the person doesn't?
Confidence scores alone don't build trust. Plain-language framing does.