Fuzzy-repetitions. Fuzzy what? Come again, please?
What are “fuzzy-repetitions” and how can we deal with them efficiently?
In technical translation and software localization, we often translate many very similar sentences that differ only slightly:
The Ocean product line is available in 4 colors and 3 configurations.
The Sierra product line is available in 3 colors and 3 configurations.
The Ocean product line is available in 3 configurations and 6 colors.
The Desert product range is available in 2 colors.
To continue, select Option 1 and restart.
To continue select “Option 1” and restart!
To continue, select Option 15, then restart.
To continue, please select Option 1 then restart.
How to efficiently manage fuzzy-repetitions?
How to effectively manage fuzzy-repetitions?
How can we effectively manage fuzzy-repetitions?
How to effectively manage near-repetitions?
The procedure lasted 12 minutes and ended around 6:30 pm.
The procedure lasted 12 minutes and ended at 6:30 pm.
The procedure lasted 45 minutes, to end around 10 pm.
The procedure lasted 2 hours and ended around 2:00 pm!
In a translation project, we refer to “fuzzy-repetitions” when working with segments that are:
- very similar but not identical
- not seen as repetitions by CAT tools
Sometimes the differences are very small, such as:
- format change
- different or additional tag
- updated numerical value
- different punctuation
- spelling variation
Differences can also come from a replaced word:
- updated product name or part number
- different date or time
- product variation (color, size, option)
- proper name, country, region
The defining feature of fuzzy-repetitions is that the sentences share an almost identical structure. Once one of them has been translated, the others can be translated very quickly by updating only the differences.
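To make this definition concrete, here is a minimal Python sketch of how near-identical segments could be flagged. It is an illustration only, not how DFR or any CAT tool works internally. It assumes a simple normalization (lowercase, numbers masked as `#`, punctuation dropped) and a word-level similarity ratio from the standard library's `difflib.SequenceMatcher`; the names `normalize`, `similarity`, `find_fuzzy_repetitions` and the 0.7 threshold are hypothetical choices.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

NUMBER = re.compile(r"\d+(?:[.,:]\d+)*")  # 4, 15, 6:30, 1.5 ...
WORD = re.compile(r"[\w#]+")              # word tokens; punctuation is dropped

def normalize(segment):
    """Lowercase, mask numbers with '#' and keep only word tokens (assumed rules)."""
    return WORD.findall(NUMBER.sub("#", segment.lower()))

def similarity(a, b):
    """Word-level similarity ratio between two segments, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_fuzzy_repetitions(segments, threshold=0.7):
    """Return (i, j, score) for each pair of non-identical but similar segments.

    The 0.7 threshold is an arbitrary assumption for this sketch.
    """
    pairs = []
    for i, j in combinations(range(len(segments)), 2):
        if segments[i] == segments[j]:
            continue  # exact repetitions are already handled by CAT tools
        score = similarity(segments[i], segments[j])
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs

segments = [
    "To continue, select Option 1 and restart.",
    "To continue select “Option 1” and restart!",
    "To continue, select Option 15, then restart.",
    "To continue, please select Option 1 then restart.",
    "The Desert product range is available in 2 colors.",
]
for i, j, score in find_fuzzy_repetitions(segments):
    print(i, j, score)
```

With this normalization the first two segments score 1.0, and the four "To continue" variants all pair with each other, while the unrelated Desert sentence pairs with none of them. Comparing every pair is quadratic in the number of segments, which is acceptable for a sketch but not for very large projects.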
Why deal with fuzzy-repetitions?
Because CAT tools do not detect fuzzy-repetitions, these segments add extra work (a larger volume to translate) and create a high risk of inconsistency, with slight variations of the same text being translated differently.
Treating fuzzy-repetitions consists of detecting these segments and setting them aside while the main content is translated. Once the main content has been fully translated and proofread, the fuzzy-repetitions can be quickly translated and updated.
This process improves the three key factors of any translation project: cost, speed and quality.
First benefit: Cost
CAT tools can detect exact repetitions, but not fuzzy-repetitions, which are counted as completely new text in word-count analyses. The similarities between the sentences are entirely ignored, so the savings they could bring are lost.
Some CAT tools, such as memoQ, can analyze and report the level of “homogeneity” within a project. This useful feature lets you estimate the number of fuzzy-repetitions in your texts, but it does not let you benefit from them directly.
The homogeneity analysis simply indicates that translators will benefit from their own translations as they progress through the project. However, the translators still have to process the full text and should legitimately be paid for all of the text assigned to them.
To reduce costs without shortchanging the translators, we must exclude fuzzy-repetitions from the initial translation.
Translators are paid in full for all the text they translate, but the overall translation cost drops because the volume is smaller. Fuzzy-repetitions stay locked until the initial translation is finished, when they can be processed more efficiently.
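The arithmetic behind this saving is simple: locked segments leave the initial volume and are handled in a later pass. The Python sketch below illustrates it with a hypothetical `split_volume` helper and a naive whitespace word count; real CAT tools apply their own counting rules, and no pricing is assumed here.

```python
def split_volume(segments, locked):
    """Return (initial_words, deferred_words) for a list of source segments.

    `locked` is a set of indices of exact or fuzzy repetitions that are set
    aside during the initial translation pass (hypothetical helper).
    """
    initial = deferred = 0
    for i, text in enumerate(segments):
        n = len(text.split())  # naive whitespace word count, an assumption
        if i in locked:
            deferred += n
        else:
            initial += n
    return initial, deferred

segments = [
    "The Ocean product line is available in 4 colors and 3 configurations.",
    "The Sierra product line is available in 3 colors and 3 configurations.",
    "The Ocean product line is available in 3 configurations and 6 colors.",
]
initial, deferred = split_volume(segments, locked={1, 2})
print(initial, deferred)  # 12 24
```

In this small example, two thirds of the words move out of the initial translation and proofreading pass.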
Second benefit: Speed
As with cost, the advantage in terms of speed is mainly due to the reduction of the initial volume to be translated. This smaller volume will be translated and subsequently proofread more quickly. The proofreader will check each sentence only once, and will no longer need to keep searching for similar sentences across the project in order to make consistent corrections (sometimes in several documents).
Third benefit: Quality
During translation, CAT tools can be extremely effective at detecting similarities (total or partial) and providing matches from existing translation memories. Some can even adjust fuzzy matches automatically for minor differences, by updating numerical values, substituting names, or restoring formatting and tags.
But these tools cannot isolate fuzzy-repetitions and process them separately.
When the proofreader corrects a segment, the CAT tool will generally be able to propagate the correction to all perfectly IDENTICAL repetitions within the document or section being reviewed.
But fuzzy-repetitions are completely ignored by this propagation. The proofreader has to search systematically for similar phrases in order to correct them consistently. And when corrections are missed or incomplete, quality control modules are UNABLE to detect these inconsistencies.
Processing fuzzy-repetitions ensures better consistency, because the entire translated text is proofread and checked before it is propagated to similar repetitions and sentences.
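As an illustration of the kind of check that standard QA modules lack, the Python sketch below groups bilingual segments whose sources differ only in numbers, punctuation or case, and flags groups whose translations do not follow the same pattern. It is a hypothetical check (`skeleton` and `inconsistent_groups` are illustrative names), not a feature of memoQ or DFR, and it cannot catch fuzzy-repetitions that differ by whole words. The French translations are illustrative.

```python
import re
from collections import defaultdict

NUMBER = re.compile(r"\d+(?:[.,:]\d+)*")  # 12, 6:30, 1.5 ...
WORD = re.compile(r"[\w#]+")              # word tokens; punctuation is dropped

def skeleton(text):
    """Lowercase, mask numbers with '#' and drop punctuation."""
    return " ".join(WORD.findall(NUMBER.sub("#", text.lower())))

def inconsistent_groups(bilingual):
    """Group (source, target) pairs by source skeleton and return the groups
    whose target skeletons disagree, i.e. candidates for a consistency review."""
    groups = defaultdict(set)
    for source, target in bilingual:
        groups[skeleton(source)].add(skeleton(target))
    return {src: sorted(tgts) for src, tgts in groups.items() if len(tgts) > 1}

bilingual = [
    ("To continue, select Option 1 and restart.",
     "Pour continuer, sélectionnez l'option 1 et redémarrez."),
    ("To continue select “Option 1” and restart!",
     "Pour continuer, choisissez « Option 1 » et redémarrez !"),
    ("The procedure lasted 12 minutes and ended at 6:30 pm.",
     "La procédure a duré 12 minutes et s'est terminée à 18 h 30."),
    ("The procedure lasted 20 minutes and ended at 7:15 pm.",
     "La procédure a duré 20 minutes et s'est terminée à 19 h 15."),
]
for src, variants in inconsistent_groups(bilingual).items():
    print(src, variants)
```

Only the "Option 1" group is flagged: its two sources share a pattern, but one translation uses "sélectionnez" and the other "choisissez". The two procedure sentences follow the same target pattern and pass.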
How to proceed?
Today, many translation agencies are already using various procedures to treat exact repetitions more efficiently.
We went one step further and designed an innovative system to quickly and efficiently process all types of repetitions (both exact and fuzzy-repetitions). The basic workflow is:
- Detecting exact and fuzzy-repetitions
- Protecting (locking) exact and fuzzy-repetitions
- Translating and proofreading all unique text (unprotected)
- Checking quality and consistency of translated text
- Unlocking exact and fuzzy-repetitions
- Translating exact and fuzzy-repetitions using the proofread translation memory
- Final quality control of the translated project
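The detection, locking and unlocking steps can be pictured with a simple greedy grouping, sketched below in Python. This is an assumed approach for illustration, not a description of DFR: each segment is compared only with segments already kept as unique text; if one is similar enough (word-level `difflib` ratio, hypothetical 0.7 threshold), the segment is locked and linked to it, otherwise it joins the text to translate and proofread first. The names `words` and `lock_repetitions` are hypothetical.

```python
import re
from difflib import SequenceMatcher

def words(text):
    """Lowercased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())

def lock_repetitions(segments, threshold=0.7):
    """Greedily split segments into unique text and locked repetitions.

    Returns (to_translate, locked): `to_translate` lists the indices of
    unique segments, and `locked` maps each exact or fuzzy repetition to the
    index of the unique segment it most resembles.
    """
    to_translate, locked = [], {}
    for i, segment in enumerate(segments):
        best, best_score = None, 0.0
        for j in to_translate:  # compare only with segments kept as unique
            score = SequenceMatcher(None, words(segment), words(segments[j])).ratio()
            if score > best_score:
                best, best_score = j, score
        if best is not None and best_score >= threshold:
            locked[i] = best          # lock now, translate later from the proofread TM
        else:
            to_translate.append(i)    # unique text: translate and proofread first
    return to_translate, locked

segments = [
    "To continue, select Option 1 and restart.",
    "To continue, select Option 15, then restart.",
    "To continue, please select Option 1 then restart.",
    "The Desert product range is available in 2 colors.",
    "To continue, select Option 1 and restart.",
]
to_translate, locked = lock_repetitions(segments)
print(to_translate)  # [0, 3]
print(locked)        # {1: 0, 2: 0, 4: 0}
```

Linking each locked segment to a single unique segment means that, once that segment is proofread, its translation is the reference for all of its repetitions.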
In this procedure, the translation of fuzzy-repetitions from the proofread translation memory (step 6) requires adjustments to correct the minor differences between the memory matches and the nearly identical text. This step must always be carried out by a linguist who understands the target language.
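Because this adjustment is the linguist's job, a tool can at most point out what changed between the source of the memory match and the new segment. The Python sketch below lists those token-level differences with `difflib.SequenceMatcher.get_opcodes`; the tokenization (numbers such as `6:30` kept whole, punctuation as separate tokens) and the `differences` name are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Numbers (including 6:30 or 1.5) as one token, then words, then punctuation.
TOKEN = re.compile(r"\d+(?:[.,:]\d+)*|\w+|[^\w\s]")

def differences(tm_source, new_source):
    """List (operation, old text, new text) edits between a TM match source
    and a new segment, so the linguist knows what to update."""
    a, b = TOKEN.findall(tm_source), TOKEN.findall(new_source)
    edits = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if op != "equal":
            edits.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits

tm_source = "The procedure lasted 12 minutes and ended around 6:30 pm."
new_source = "The procedure lasted 45 minutes, to end around 10 pm."
for edit in differences(tm_source, new_source):
    print(edit)
# ('replace', '12', '45')
# ('replace', 'and ended', ', to end')
# ('replace', '6:30', '10')
```

The two number changes could arguably be automated, but the change from "and ended" to ", to end" alters the sentence structure, which is exactly the kind of adjustment that needs a linguist.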
DFR: Kevrenn’s solution to Disable Fuzzy-Repeats
At Kevrenn, we often translate files that need to be updated regularly, and content with many similarities.
We also work on multilingual projects where the benefits of source optimization are multiplied by the number of languages to translate.
We have therefore always been aware of the problems caused by fuzzy-repetitions, and have tried several approaches to address them.
- For many years, we used a long and complex “home-brewed” procedure to optimize our source files before translation.
- Then, we looked for existing solutions on the market, but couldn’t find any.
- So, we designed and developed the DFR solution, which we have been using successfully for several years.
More information about DFR
The DFR toolkit is one of Kevrenn’s optimization solutions.
We use this solution for nearly all translation projects entrusted to us.
But we can also use DFR to optimize your projects before you translate them in-house or outsource them to your regular external translators.
Initially developed for the memoQ translation environment, DFR is now compatible with all the main CAT platforms on the market. A commercial version is also available to translation agencies interested in this unique solution.
Please contact us for more information about this optimization solution or to ask for a demo.