June 10, 2026 · 3 min read · Product

Quality Estimation: confidence scores for AI translations

AI translation has become good. Good enough that most of its output ships untouched. The problem is the part that isn't: a flipped negation here, a wrong term there, hidden somewhere in thousands of perfectly fine segments. Until now, finding those meant re-reading everything, which defeats the point of automating translation in the first place.

Quality Estimation in Locize closes that gap: every AI translation now gets an AI-judged confidence score, stored with the translation and surfaced exactly where you work.


A confidence score on every AI translation

With Quality Estimation enabled, an AI evaluator rates each AI-produced translation between 0 and 1 and adds a short reason. It judges accuracy, completeness, fluency, terminology, and whether placeholders and tags survived.

This covers all three places Locize produces AI translations:

  • the Automatic Translation workflow that translates new keys in the background,
  • AI bulk actions in the editor,
  • and the AI assistant on individual segments.

In the editor you see a per-segment confidence indicator, the exact percentage with the evaluator's reasoning on the selected segment, and a new "by AI: needs review" filter that lists every AI translation scoring below the threshold (0.7 by default).
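As a rough illustration of what the "needs review" filter selects, here is a minimal Python sketch. It assumes a segment carries a score between 0 and 1 plus the evaluator's reason; the `Segment` class, its field names, and the `needs_review` function are hypothetical and are not part of any Locize API. Only the 0.7 default comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_THRESHOLD = 0.7  # default threshold named in the article


@dataclass
class Segment:
    # Hypothetical shape of an editor segment, for illustration only.
    key: str
    translation: str
    ai_translated: bool
    confidence: Optional[float] = None  # 0..1, None when no score is stored
    reason: Optional[str] = None        # evaluator's short explanation


def needs_review(segments: List[Segment],
                 threshold: float = DEFAULT_THRESHOLD) -> List[Segment]:
    """Return AI-translated segments whose score is strictly below the threshold."""
    return [
        s for s in segments
        if s.ai_translated and s.confidence is not None and s.confidence < threshold
    ]


segments = [
    Segment("greeting", "Hallo Welt", True, 0.96, "Accurate and fluent."),
    Segment("warning", "Löschen Sie die Datei", True, 0.41, "Negation dropped."),
    Segment("title", "Einstellungen", False),  # human translation, never scored
]

flagged = needs_review(segments)
for s in flagged:
    # Show the exact percentage next to the evaluator's reasoning.
    print(f"{s.key}: {s.confidence:.0%} - {s.reason}")
```

Human translations and unscored segments are skipped on purpose in this sketch, since the filter described above only concerns AI translations that carry a score.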

A pending review on a segment is the Review AI workflow in action: the low-confidence translation arrived as a proposal carrying its confidence, a critique, and a suggested revision the reviewer can apply with one click.

Scores are honest about their lifetime, too: edit a translation by hand, re-translate it, or apply a translation memory match, and the now-stale score disappears automatically. And because scores are editor metadata, they never leak into your published translation files.
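The lifetime rule can be pictured with a small Python sketch. It assumes the score lives next to the translation as metadata and is cleared whenever the text is replaced by something the evaluator has not seen. `ScoredTranslation`, `replace_value`, and `export_published` are hypothetical names chosen for illustration, not Locize internals.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScoredTranslation:
    # Hypothetical editor-side record: the text plus its score metadata.
    value: str
    confidence: Optional[float] = None
    reason: Optional[str] = None

    def replace_value(self, new_value: str) -> None:
        """Manual edit, re-translation, or TM match: the old score no longer applies."""
        self.value = new_value
        self.confidence = None
        self.reason = None


def export_published(entries: Dict[str, ScoredTranslation]) -> Dict[str, str]:
    """Published output carries only key -> text; score metadata is left behind."""
    return {key: entry.value for key, entry in entries.items()}


entries = {
    "warning": ScoredTranslation("Löschen Sie die Datei", 0.41, "Negation dropped."),
}

# A reviewer fixes the translation by hand, so the stale score is dropped.
entries["warning"].replace_value("Löschen Sie die Datei nicht")

published = export_published(entries)
print(published)
```

Keeping the score on the editor-side record and exporting only the text is one simple way to guarantee that no score reaches a published file.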

In our calibration tests the evaluator separated good from deliberately broken translations perfectly. Flipped polarity, dropped negations, swapped tags: all caught, with zero false alarms on the good ones.


Route low confidence straight to humans

Scoring is half the story. The optional Review AI workflow setting acts on it:

  • Low-confidence AI translations are not saved silently. They become pending proposals in the review workflow, even in languages that don't have the regular review workflow enabled.
  • High-confidence translations save directly, untouched.
  • Each routed proposal carries a critique: issue categories (accuracy, fluency, terminology, style) with severity and a note. When the evaluator can clearly do better, it adds a suggested revision the reviewer applies with one click.
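The routing rule in the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Locize implementation: the `Issue` and `Evaluation` classes, the `route` function, the status strings, and the severity labels are all hypothetical, and the sketch assumes routing uses the same 0.7 threshold as the editor filter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    category: str  # accuracy, fluency, terminology, or style
    severity: str  # hypothetical labels such as "minor" or "major"
    note: str


@dataclass
class Evaluation:
    confidence: float                         # 0..1 score from the evaluator
    issues: List[Issue] = field(default_factory=list)
    suggested_revision: Optional[str] = None  # only present when the evaluator offers one


def route(translation: str, evaluation: Evaluation, threshold: float = 0.7) -> dict:
    """Low confidence becomes a pending proposal; everything else saves directly."""
    if evaluation.confidence < threshold:
        return {
            "status": "pending_review",
            "proposal": translation,
            "confidence": evaluation.confidence,
            "critique": evaluation.issues,
            "suggested_revision": evaluation.suggested_revision,
        }
    return {
        "status": "saved",
        "value": translation,
        "confidence": evaluation.confidence,
    }


low = route(
    "Löschen Sie die Datei",
    Evaluation(
        confidence=0.41,
        issues=[Issue("accuracy", "major", "Negation dropped.")],
        suggested_revision="Löschen Sie die Datei nicht",
    ),
)
high = route("Hallo Welt", Evaluation(confidence=0.96))

print(low["status"], high["status"])
```

The proposal keeps the original AI output and the suggested revision side by side, so in this sketch the reviewer's one-click apply amounts to accepting `suggested_revision` in place of `proposal`.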

This is deliberately different from the regular review workflow, which reviews everything in selected languages. The Review AI workflow reviews selectively, by confidence, so your reviewers spend their time exactly where the AI is unsure.


Getting started

New projects have Quality Estimation enabled by default. For existing projects:

  1. Open your project settings, then the "EDITOR, TM/MT/AI, ORDERING" section.
  2. Enable Quality Estimation (requires enabled machine/AI translation).
  3. Optionally enable the Review AI workflow to auto-route low-confidence translations to review.

Scoring reuses whatever AI provider produced the translation: your own OpenAI, Gemini, or Mistral key, or the built-in Locize AI (metered as normal AI-token usage). A styleguide and glossary sharpen the evaluator's judgement the same way they sharpen translations.

Full details in the Quality Estimation documentation.

Trust the AI where it earns it, and know immediately where it doesn't.
