What is quality estimation (QE) for machine translation?

Quality estimation predicts how good a translation is without a human-written reference translation. In Locize, an AI evaluator rates every AI-produced translation with a confidence score between 0 and 1 plus a short reason, judging accuracy, completeness, fluency, terminology, and the preservation of placeholders and tags. The score is stored with the translation and shown in the editor, so reviewers can focus on the segments that actually need attention instead of re-reading everything.

How do I enable quality estimation in Locize?

New projects have it enabled by default. For existing projects it is a setting: open the project settings, go to EDITOR, TM/MT/AI, ORDERING, and enable Quality Estimation (machine/AI translation must already be enabled). From then on, new AI translations from the automatic translation workflow, AI bulk actions, and the AI assistant are scored automatically. Scoring uses the same AI provider that produced the translation: your own OpenAI, Gemini, or Mistral key, or the built-in Locize AI (which consumes AI tokens like any other Locize AI usage).

Can Locize automatically send low-confidence AI translations to human review?

Yes. The additional Review AI workflow setting routes low-confidence AI translations into the existing review workflow as pending proposals instead of saving them silently, even in languages without the regular review workflow enabled. High-confidence translations are saved directly. Each routed proposal carries the confidence score, a critique with issue categories (accuracy, fluency, terminology, style), and often a one-click suggested revision.

How is quality estimation different from the regular review workflow?

The regular review workflow in Locize reviews everything in the languages you select: every change becomes a proposal a reviewer must accept. Quality estimation with the Review AI workflow is selective: it only routes AI translations whose confidence falls below the threshold (0.7 by default), across all languages. The two combine cleanly: in languages with the regular review workflow enabled, everything still goes through review and quality estimation simply enriches the proposals with scores and critique.

June 10, 2026 · 3 min read · Product

Quality Estimation: confidence scores for AI translations

AI translation has become good. Good enough that most of its output ships untouched. The problem is the part that isn't: a flipped negation here, a wrong term there, hidden somewhere in thousands of perfectly fine segments. Until now, finding those meant re-reading everything, which defeats the point of automating translation in the first place.

Quality Estimation in Locize closes that gap: every AI translation now gets an AI-judged confidence score, stored with the translation and surfaced exactly where you work.

A confidence score on every AI translation

With Quality Estimation enabled, an AI evaluator rates each AI-produced translation between 0 and 1 and adds a short reason. It judges accuracy, completeness, fluency, terminology, and whether placeholders and tags survived.

This covers all three places Locize produces AI translations:

the Automatic Translation workflow that translates new keys in the background,
AI bulk actions in the editor,
and the AI assistant on individual segments.

In the editor you see a per-segment confidence indicator, the exact percentage with the evaluator's reasoning on the selected segment, and a new "by AI: needs review" filter that lists every AI translation scoring below the threshold (0.7 by default).

A pending review on a segment is the Review AI workflow in action: a low-confidence translation arrives as a proposal carrying its confidence, a critique, and a suggested revision the reviewer can apply with one click.

Scores are honest about their lifetime, too: edit a translation by hand, re-translate it, or apply a translation memory match, and the now-stale score disappears automatically. And because scores are editor metadata, they never leak into your published translation files.
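The lifetime rule can be sketched as a small data model. This is an illustration only, not Locize's implementation: the Segment class, its fields, and the set_score, set_text, and export methods are hypothetical names. The only behavior taken from this post is that a score belongs to one specific AI translation, disappears once that text changes, and stays out of published files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical editor-side record for one translated segment."""
    key: str
    text: str
    # Quality-estimation metadata; None means "no current score".
    confidence: Optional[float] = None
    reason: Optional[str] = None

    def set_score(self, confidence: float, reason: str) -> None:
        """Attach a score (0 to 1) and a short reason to the current text."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.confidence = confidence
        self.reason = reason

    def set_text(self, new_text: str) -> None:
        """Change the translation and drop the now-stale score.

        Stands in for a manual edit, a re-translation, or an applied
        translation memory match.
        """
        if new_text != self.text:
            self.text = new_text
            self.confidence = None
            self.reason = None

    def export(self) -> dict:
        """Published output carries only key and text, never the score."""
        return {self.key: self.text}


# Usage: a scored AI translation is edited by hand.
seg = Segment(key="cart.empty", text="Ihr Warenkorb ist leer")
seg.set_score(0.93, "accurate and fluent")
seg.set_text("Dein Warenkorb ist leer")
print(seg.confidence, seg.export())
```

Keeping the score next to the text but outside the export path is one simple way to model "editor metadata" in such a sketch.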

In our calibration tests the evaluator separated good from deliberately broken translations perfectly. Flipped polarity, dropped negations, swapped tags: all caught, with zero false alarms on the good ones.
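To make one of those failure modes concrete: a translation preserves its placeholders and tags when it contains the same ones as the source. The evaluator in Locize is an AI model, so the following is not how it works. It is a minimal deterministic sketch, assuming i18next-style {{name}} placeholders and simple HTML-like tags; the function names and regular expressions are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed syntaxes: i18next-style {{placeholders}} and simple <tags>.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")
TAG = re.compile(r"</?([A-Za-z0-9]+)[^<>]*>")


def markup_tokens(text: str) -> Counter:
    """Count placeholders and tags in a string, ignoring their order."""
    tokens = Counter()
    for match in PLACEHOLDER.finditer(text):
        tokens["{{" + match.group(1) + "}}"] += 1
    for match in TAG.finditer(text):
        # Keep the slash so opening and closing tags are counted separately.
        closing = match.group(0).startswith("</")
        tokens[("</" if closing else "<") + match.group(1) + ">"] += 1
    return tokens


def markup_preserved(source: str, translation: str) -> bool:
    """True when both strings contain the same placeholders and tags."""
    return markup_tokens(source) == markup_tokens(translation)


# Usage: the second translation renamed the tag, so the check fails.
print(markup_preserved("Hello <b>{{name}}</b>", "Hallo <b>{{name}}</b>"))
print(markup_preserved("Hello <b>{{name}}</b>", "Hallo <i>{{name}}</i>"))
```

The comparison ignores order on purpose, because word order legitimately changes between languages. It catches dropped, renamed, or duplicated markup, but it cannot catch meaning errors such as a flipped negation; those need a judgement like the one the evaluator provides.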

Route low confidence straight to humans

Scoring is half the story. The optional Review AI workflow setting acts on it:

Low-confidence AI translations are not saved silently. They become pending proposals in the review workflow, even in languages that don't have the regular review workflow enabled.
High-confidence translations save directly, untouched.
Each routed proposal carries a critique: issue categories (accuracy, fluency, terminology, style) with severity and a note. When the evaluator can clearly do better, it adds a suggested revision the reviewer applies with one click.

This is deliberately different from the regular review workflow, which reviews everything in selected languages. The Review AI workflow reviews selectively, by confidence, so your reviewers spend their time exactly where the AI is unsure.
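The routing rule described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the Locize API: Issue, Evaluation, route, and the severity strings are hypothetical names. The 0.7 default threshold and the four issue categories come from this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_THRESHOLD = 0.7  # default named in this post
CATEGORIES = {"accuracy", "fluency", "terminology", "style"}


@dataclass
class Issue:
    category: str  # one of CATEGORIES
    severity: str  # free-form here; the real severity scale is an unknown
    note: str

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown issue category: {self.category}")


@dataclass
class Evaluation:
    confidence: float
    reason: str
    issues: List[Issue] = field(default_factory=list)
    suggested_revision: Optional[str] = None  # not every critique has one


def route(evaluation: Evaluation, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return "proposal" for low-confidence translations, else "save"."""
    return "proposal" if evaluation.confidence < threshold else "save"


# Usage: a translation with a dropped negation goes to review.
low = Evaluation(
    confidence=0.41,
    reason="negation dropped",
    issues=[Issue("accuracy", "major", "source is negated, translation is not")],
)
print(route(low))
```

The sketch treats a score exactly at the threshold as high confidence, matching the wording "below the threshold"; how the real workflow handles that boundary is an assumption here.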

Getting started

New projects have Quality Estimation enabled by default. For existing projects:

Open your project settings, then EDITOR, TM/MT/AI, ORDERING.
Enable Quality Estimation (requires enabled machine/AI translation).
Optionally enable the Review AI workflow to auto-route low-confidence translations to review.

Scoring reuses whatever AI provider produced the translation: your own OpenAI, Gemini, or Mistral key, or the built-in Locize AI (metered as normal AI-token usage). A styleguide and glossary sharpen the evaluator's judgement the same way they sharpen translations.

Full details in the Quality Estimation documentation.

Trust the AI where it earns it, and know immediately where it doesn't.

