Quality Estimation
Availability: Enabled by default for new projects, opt-in for existing ones. Requires machine/AI translation to be enabled.
Quality Estimation (QE) lets an AI evaluator score every AI-produced translation with a confidence estimate between 0 and 1, together with a short reason. The score is stored with the translation and surfaced in the editor, so you immediately see which AI translations are trustworthy and which ones deserve a human look.
- Only AI translations are scored (traditional machine translation is not).
- Scoring works for translations produced by the automatic translation workflow as well as AI bulk actions and the AI assistant in the editor.
- Scoring uses the same AI provider that produced the translation: with your own API key (OpenAI, Gemini, Mistral AI) the scoring runs on your key; with the built-in Locize AI it consumes AI tokens like any other Locize AI usage.
Enable Quality Estimation
New projects have Quality Estimation enabled by default (the built-in Locize AI works out of the box). For existing projects:
- Open your Project settings.
- Go to EDITOR, TM/MT/AI, ORDERING.
- In Cat settings, enable Quality Estimation (the toggle is available once machine translation is enabled).
What you see in the editor
- A per-segment confidence indicator on AI-translated segments.
- A confidence percentage (with the evaluator's reason as a tooltip) next to the quality info of the selected segment.
- A "by AI: needs review" filter in the state filters, listing AI translations scoring below the needs-review threshold (0.7 by default).
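The filter and the percentage display can be pictured with a small sketch. This is only an illustration of the rule described above, not the Locize implementation: the `Segment` shape and the function names are hypothetical, and the only values taken from this page are the 0 to 1 score range and the 0.7 default threshold.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_THRESHOLD = 0.7  # default needs-review threshold named on this page


@dataclass
class Segment:
    """Hypothetical segment shape, not the actual Locize data model."""
    key: str
    text: str
    confidence: Optional[float] = None  # None means the segment carries no score
    reason: str = ""


def needs_review(segments: List[Segment], threshold: float = DEFAULT_THRESHOLD) -> List[Segment]:
    """Keep only scored segments whose confidence is strictly below the threshold."""
    return [s for s in segments if s.confidence is not None and s.confidence < threshold]


def as_percentage(confidence: float) -> str:
    """Render a 0..1 confidence as a whole-number percentage string."""
    return f"{round(confidence * 100)}%"


segments = [
    Segment("title", "Willkommen", 0.92, "faithful and fluent"),
    Segment("cta", "Jetzt kaufen", 0.55, "terminology differs from glossary"),
    Segment("footer", "Impressum"),  # unscored, e.g. written by a human
]
flagged = needs_review(segments)
```

Unscored segments are deliberately left out of the result: in this sketch a missing score means "nothing to say", not "low confidence".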

Low-confidence proposals routed via the Review AI workflow show the confidence, the critique, and a one-click suggested revision directly on the pending review.
When an AI translation is edited by a human, re-translated, or replaced from translation memory, its confidence score is removed automatically. A score always describes exactly the text it was computed for.
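The invalidation rule can be sketched as follows. The `Translation` type and `replace_text` helper are hypothetical names chosen for illustration; the sketch only assumes what is stated above, namely that a score belongs to one exact text and is dropped when that text is replaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Translation:
    """Hypothetical record: a text plus the score that was computed for it."""
    text: str
    confidence: Optional[float] = None
    reason: Optional[str] = None


def replace_text(current: Translation, new_text: str) -> Translation:
    """Return the translation with new text and no score.

    The old score described the old text, so it is never carried over,
    whether the new text comes from a human edit, a re-translation,
    or translation memory.
    """
    return Translation(text=new_text)


scored = Translation("Jetzt kaufen", confidence=0.55, reason="terminology mismatch")
edited = replace_text(scored, "Jetzt bestellen")
```

Making the record immutable keeps text and score from drifting apart: the only way to change the text is to build a new, unscored record.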
Review AI workflow (auto-route low confidence to humans)
With the additional Review AI workflow setting, low-confidence AI translations are not saved silently. Instead they are routed into the review workflow as pending proposals:
- Low-confidence AI translations (score below the threshold, or with a major issue found) become pending reviews, even in languages that don't have the regular review workflow enabled.
- High-confidence AI translations are saved directly.
- Routed proposals carry a critique: issue categories (accuracy, fluency, terminology, style) with severity and a short note. When the evaluator can clearly improve the translation, it also includes a suggested revision you can apply with one click.
- If a language has the regular review workflow enabled, everything still goes through review as usual; Quality Estimation then simply enriches those proposals with score and critique.
This differs from the regular review workflow: the review workflow reviews everything in selected languages, while the Review AI workflow reviews selectively, based on confidence, across all languages.
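The routing rules above can be condensed into a short decision function. This is a sketch under stated assumptions, not the backend logic: the `Issue` and `Evaluation` types, the `route` function, and the "minor" severity label are hypothetical, while the categories, the "major issue" condition, and the 0.7 default come from this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    category: str  # accuracy, fluency, terminology, or style
    severity: str  # "major" is named on this page; "minor" is an assumed label
    note: str = ""


@dataclass
class Evaluation:
    score: float
    issues: List[Issue] = field(default_factory=list)


def route(evaluation: Evaluation, threshold: float = 0.7,
          review_ai: bool = True, regular_review: bool = False) -> str:
    """Return "pending_review" or "saved" for one AI translation."""
    # Regular review workflow: everything in that language goes through review.
    if regular_review:
        return "pending_review"
    low_confidence = (
        evaluation.score < threshold
        or any(issue.severity == "major" for issue in evaluation.issues)
    )
    # Review AI workflow: only low-confidence translations are routed.
    if review_ai and low_confidence:
        return "pending_review"
    return "saved"
```

Note that a single major issue routes a translation to review even when its score is above the threshold.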
Good to know
- The needs-review threshold defaults to 0.7. (Via API, qualityEstimationEnabled also accepts a number between 0 and 1 to use a custom threshold.)
- Scoring roughly doubles the AI calls of a translation run. On very large namespaces the backend caps scoring per language and saves the remaining translations unscored.
- Quality Estimation complements the deterministic checks & issues; it does not replace them. Broken placeholders or untranslated fragments are best caught by those checks.
- Scores are editor metadata: they are not included in your published translation files.
- A Styleguide and Glossary improve the evaluator's judgement the same way they improve translations.
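Because qualityEstimationEnabled can hold either an on/off value or a number, client code that reads the setting may want to normalize it first. The helper below is a hypothetical sketch of one way to do that. How the API treats edge values such as 0 is not stated here, so the handling of those cases is an assumption.

```python
from typing import Optional, Union

DEFAULT_THRESHOLD = 0.7  # default needs-review threshold named on this page


def resolve_threshold(setting: Union[bool, int, float]) -> Optional[float]:
    """Map a qualityEstimationEnabled value to a threshold.

    Returns None when Quality Estimation is off, the default threshold
    for True, and the number itself for a custom threshold.
    """
    # bool must be checked first because bool is a subclass of int in Python.
    if isinstance(setting, bool):
        return DEFAULT_THRESHOLD if setting else None
    if isinstance(setting, (int, float)) and 0 <= setting <= 1:
        return float(setting)  # assumption: any number in [0, 1] is a threshold
    raise ValueError("expected a boolean or a number between 0 and 1")
```

For example, `resolve_threshold(True)` yields the default 0.7, while `resolve_threshold(0.85)` yields a stricter custom threshold.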