Quality Estimation

Availability: Enabled by default for new projects, opt-in for existing ones. Requires machine/AI translation to be enabled.

Quality Estimation (QE) lets an AI evaluator score every AI-produced translation with a confidence estimate between 0 and 1, together with a short reason. The score is stored with the translation and surfaced in the editor, so you immediately see which AI translations are trustworthy and which ones deserve a human look.

  • Only AI translations are scored (traditional machine translation is not).
  • Scoring works for translations produced by the automatic translation workflow as well as AI bulk actions and the AI assistant in the editor.
  • Scoring uses the same AI provider that produced the translation: with your own API key (OpenAI, Gemini, Mistral AI) the scoring runs on your key; with the built-in Locize AI it consumes AI tokens like any other Locize AI usage.

Enable Quality Estimation

New projects have Quality Estimation enabled by default (the built-in Locize AI works out of the box). For existing projects:

  1. Open your Project settings.
  2. Go to EDITOR, TM/MT/AI, ORDERING.
  3. In Cat settings, enable Quality Estimation (the toggle is available once machine translation is enabled).

What you see in the editor

  • A per-segment confidence indicator on AI-translated segments.
  • A confidence percentage (with the evaluator's reason as tooltip) next to the quality info of the selected segment.
  • A "by AI: needs review" filter in the state filters, listing AI translations scoring below the needs-review threshold (0.7 by default).
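The filter and the percentage display can be pictured with a small sketch. The `ScoredSegment` shape and both function names are hypothetical and only illustrate the rule described above (a score strictly below the threshold means "needs review"); this is not the Locize data model or API.

```typescript
// Hypothetical shape of a scored segment, for illustration only.
interface ScoredSegment {
  key: string;
  text: string;
  // Confidence between 0 and 1; absent when the segment has no score.
  confidence?: number;
  // Short reason given by the evaluator.
  reason?: string;
}

// Default needs-review threshold as stated in this page.
const DEFAULT_NEEDS_REVIEW_THRESHOLD = 0.7;

// Returns the scored segments whose confidence is below the threshold.
// Unscored segments are never listed.
function needsReview(
  segments: ScoredSegment[],
  threshold: number = DEFAULT_NEEDS_REVIEW_THRESHOLD
): ScoredSegment[] {
  return segments.filter(
    (s) => s.confidence !== undefined && s.confidence < threshold
  );
}

// Formats a 0..1 confidence as a whole-number percentage string.
function asPercentage(confidence: number): string {
  return `${Math.round(confidence * 100)}%`;
}
```

In this sketch a segment scoring exactly 0.7 is not flagged, because the page describes the filter as listing translations scoring below the threshold.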

Low-confidence proposals routed via the Review AI workflow show the confidence, the critique, and a one-click suggested revision directly on the pending review.

When an AI translation is edited by a human, re-translated, or replaced from translation memory, its confidence score is removed automatically. A score always describes exactly the text it was computed for.
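The invalidation rule can be sketched as follows. The `StoredTranslation` type, the `origin` values, and the function names are assumptions made for illustration; the only behavior taken from the text above is that replacing the text drops the score, and that a score belongs to exactly one text.

```typescript
// Hypothetical stored translation, for illustration only.
interface StoredTranslation {
  text: string;
  origin: "ai" | "human" | "tm";
  confidence?: number;
  reason?: string;
}

// Replaces the text of a translation. The old score is deliberately not
// copied over, because it was computed for the previous text.
function replaceText(
  current: StoredTranslation,
  text: string,
  origin: StoredTranslation["origin"]
): StoredTranslation {
  return { text, origin };
}

// Attaches a fresh score to an AI translation (assumed to happen after
// a new evaluation of the current text).
function attachScore(
  translation: StoredTranslation,
  confidence: number,
  reason: string
): StoredTranslation {
  if (translation.origin !== "ai") {
    throw new Error("only AI translations are scored");
  }
  return { ...translation, confidence, reason };
}
```

A re-translation by AI would, under these assumptions, first lose the old score through `replaceText` and then receive a new one through `attachScore`.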


Review AI workflow (auto-route low confidence to humans)

With the additional Review AI workflow setting, low-confidence AI translations are not saved silently. Instead they are routed into the review workflow as pending proposals:

  • Low-confidence AI translations (score below the threshold, or with a major issue found) become pending reviews, even in languages that don't have the regular review workflow enabled.
  • High-confidence AI translations are saved directly.
  • Routed proposals carry a critique: issue categories (accuracy, fluency, terminology, style) with severity and a short note. When the evaluator can clearly improve the translation, it also includes a suggested revision you can apply with one click.
  • If a language has the regular review workflow enabled, everything still goes through review as usual; Quality Estimation then simply enriches those proposals with score and critique.

This differs from the regular review workflow: the review workflow reviews everything in selected languages, while the Review AI workflow reviews selectively, based on confidence, across all languages.
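The routing rules above can be summarized in a minimal sketch. All type and function names are hypothetical, and the `"minor"` severity level is an assumption (the text only mentions a "major issue"); the actual evaluator output may look different.

```typescript
// Issue categories as listed in this page.
type Category = "accuracy" | "fluency" | "terminology" | "style";
// Assumption: only "major" is named in the text; "minor" is illustrative.
type Severity = "minor" | "major";

interface Issue {
  category: Category;
  severity: Severity;
  note: string;
}

// Hypothetical evaluator result for one AI translation.
interface Evaluation {
  score: number; // 0..1
  issues: Issue[];
  suggestedRevision?: string;
}

type Route = "save-directly" | "pending-review";

// Decides where an AI translation goes when the Review AI workflow is on.
function routeTranslation(
  evaluation: Evaluation,
  options: { threshold: number; regularReviewEnabled: boolean }
): Route {
  // Languages with the regular review workflow review everything.
  if (options.regularReviewEnabled) return "pending-review";
  const hasMajorIssue = evaluation.issues.some((i) => i.severity === "major");
  const lowConfidence = evaluation.score < options.threshold;
  return lowConfidence || hasMajorIssue ? "pending-review" : "save-directly";
}
```

Note that in this sketch a translation with a high score still goes to review when a major issue was found, matching the "score below the threshold, or with a major issue found" rule.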


Good to know

  • The needs-review threshold defaults to 0.7. (Via API, qualityEstimationEnabled also accepts a number between 0 and 1 to use a custom threshold.)
  • Scoring roughly doubles the AI calls of a translation run. On very large namespaces the backend caps scoring per language and saves the remaining translations unscored.
  • Quality Estimation complements the deterministic checks & issues; it does not replace them. Broken placeholders or untranslated fragments are best caught by those checks.
  • Scores are editor metadata: they are not included in your published translation files.
  • A Styleguide and Glossary improve the evaluator's judgement the same way they improve translations.
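Two of the points above lend themselves to a short sketch: how a `qualityEstimationEnabled` value might map to a threshold, and a rough estimate of AI calls per run. Both functions are hypothetical. How the backend actually interprets the setting, the one-call-per-translation model, and the `scoringCap` parameter (the real cap is not stated here) are all assumptions.

```typescript
// Maps a qualityEstimationEnabled value to an effective threshold.
// Assumption: `true` means the default 0.7, `false` means disabled (null),
// and a number between 0 and 1 is used as the custom threshold.
function resolveThreshold(setting: boolean | number): number | null {
  if (typeof setting === "number") {
    if (Number.isNaN(setting) || setting < 0 || setting > 1) {
      throw new RangeError("threshold must be between 0 and 1");
    }
    return setting;
  }
  return setting ? 0.7 : null;
}

// Rough estimate of AI calls for one language in a translation run,
// assuming one call per translation plus one call per scored translation.
// `scoringCap` is a placeholder for the undocumented per-language cap.
function estimateAiCalls(translations: number, scoringCap: number): number {
  const scored = Math.min(translations, scoringCap);
  return translations + scored;
}
```

With these assumptions, a run below the cap makes twice as many calls as an unscored run, while a run above the cap leaves the remainder unscored and grows more slowly.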