A few basic questions before translating a document
You realize that you need a brochure translated for your next marketing event, so you send the perfectly formatted PDF to your translators and ask them to return it in a few days. It seems like a simple and reasonable request.
But the translators instantly reply with a few questions of their own:
– When exactly do you need it back?
– Which languages do you need?
– Can you provide the source file for the PDF?
You could easily have anticipated the first two questions, and you can answer them immediately… But why does the last question matter? The file you sent looks great. They should be able to work with it, right?
The short answer: your file looks great and could probably be translated “as is”, but the results would not be optimal.
What is PDF, and why do we need a source?
The PDF (Portable Document Format) was created by Adobe so that documents display consistently across different software, hardware, and operating systems. In effect, a PDF is a “digital printout” of your document that looks the same in any PDF reader. Most reader applications also let you comment, share, sign, secure, fill in, redact, search, or even alter the content.
However, PDF readers do not offer the control over page layout and text flow that translation requires. That control is provided by the publishing tool that created the document.
- Note that some tools, like Adobe Illustrator, can save their documents in editable PDF format. These files can be re-opened and edited in Illustrator without losing any formatting data. In that case, the document includes both the output and the source… and can therefore be used as the source document for translation.
Can you find the source?
At this point you start inquiring within your company to locate the source document. Who created it and what was the original format?
The source file may have been created internally or by a third-party vendor, in any of a variety of applications and formats: Adobe InDesign, Photoshop, Illustrator, FrameMaker, MS Word, PowerPoint, Excel, MadCap Flare, OpenOffice, DITA, XML, HTML, or even a scanned paper document.
When the source file is available, it is always better to translate from this original document, especially if you want your translation to look just like the PDF you sent out. Professional translation services should be able to process all native formats, no matter how complex they are, and deliver a translated document that retains the format, layout and quality of the original version.
The source is not available: let’s convert!
If the source file is not available (lost in archiving, or unobtainable from an external vendor or another department), the translation process becomes more complicated and may cost more:
Without its source document, a PDF needs to be converted into an editable document before being translated. How it is converted depends on what kind of document you want back (just the translated text, a bilingual table, or an exact replica of your original layout). There are a number of tools and methods, but each will take extra time to reformat the contents to match your PDF.
The conversion or recreation process should concentrate on three elements:
Text extraction via a PDF converter or OCR
To translate your PDF, you first need to access its text and content. If the PDF still contains live text, a PDF converter can extract it. If the document has been scanned or vectorized, however, it no longer contains any text, only pictures of the written characters (“bitmap” images for a scan, vector outlines for vectorized text). In that case, Optical Character Recognition (OCR) software is used to transform these images back into editable text. This process is time-consuming and should be avoided when possible.
Page layout recreation
To re-create a basic page layout, a simple word processor may suffice. A more complex design, however, will require advanced DTP tools such as Adobe InDesign, Illustrator, Photoshop, or QuarkXPress to closely match the output of your original PDF. The quality of the illustrations will also depend on access to the source images, or on extracting them from the original PDF.
Natural textflow for translations
Translation tools divide the source text into “segments” (usually sentences) in order to analyze documents and store the translated units.
To efficiently translate a PDF, the converted text needs to be “clean” and provide complete sentences that follow the original textflow.
Some PDF conversion tools generate editable documents that “look” great by placing each line of text individually to reproduce the original layout. However, they do not recreate the original textflow needed for translation.
Any break in the natural flow of text (hard line breaks, column and page jumps, hyphenation, headers or footers) makes translation tools ineffective, because they can no longer process full sentences.
With these files, translation memories will find few or no matches, and your translators will have a very hard time following the (lack of) logic in the fragmented text.
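To make the segmentation problem concrete, here is a minimal Python sketch. It assumes a hypothetical brochure whose converted text arrives as separate lines, with a running header and a word hyphenated across a line break, plus a tiny exact-match translation memory. The header text, the sample sentences, and the helper names (`naive_segments`, `rebuild_flow`, `sentence_segments`, `tm_matches`) are all illustrative assumptions; real CAT tools use far richer segmentation rules and fuzzy matching.

```python
import re

# Hypothetical output of a line-based PDF converter: a running header,
# hard line breaks, and a word hyphenated across two lines.
HEADER = "ACME Brochure 2024"  # assumed running header, for illustration only
extracted_lines = [
    HEADER,
    "Our new product line is de-",
    "signed for busy teams. It ships",
    "in three sizes.",
]

# Hypothetical translation memory holding sentences translated in an earlier project.
translation_memory = {
    "Our new product line is designed for busy teams.":
        "Notre nouvelle gamme de produits est conçue pour les équipes pressées.",
    "It ships in three sizes.": "Elle est disponible en trois tailles.",
}


def naive_segments(lines):
    """Treat each extracted line as a segment, as happens when the text flow is lost."""
    return [line.strip() for line in lines if line.strip()]


def rebuild_flow(lines, header):
    """Drop the header, rejoin wrapped lines, and undo end-of-line hyphenation.

    Simplification: every trailing hyphen is treated as a line-break hyphen,
    so a genuine compound split at a line end ("well-" / "known") would be merged.
    """
    text = ""
    for line in naive_segments(lines):
        if line == header:
            continue  # skip the running header
        if text.endswith("-"):
            text = text[:-1] + line  # join the two halves of a hyphenated word
        elif text:
            text += " " + line  # a hard line break becomes an ordinary space
        else:
            text = line
    return text


def sentence_segments(text):
    """Rough sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tm_matches(segments):
    """Count segments that have an exact match in the translation memory."""
    return sum(1 for segment in segments if segment in translation_memory)


if __name__ == "__main__":
    raw = naive_segments(extracted_lines)
    clean = sentence_segments(rebuild_flow(extracted_lines, HEADER))
    print(f"Line-based segments: {tm_matches(raw)} of {len(raw)} matched")
    # → Line-based segments: 0 of 4 matched
    print(f"Rebuilt sentences: {tm_matches(clean)} of {len(clean)} matched")
    # → Rebuilt sentences: 2 of 2 matched
```

Even this toy splitter would stumble on abbreviations such as “e.g.”, and real layouts add columns, footnotes, and captions. That is why rebuilding the textflow after the fact is extra work, and why a clean source file is the better starting point.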
How does conversion impact my project?
No matter which method you choose, converting a PDF to recreate a new editable document for translation will always have an impact on the time, cost and quality of the translation:
- Time: Working from a converted PDF is usually slower than working from the source document since it requires additional steps. And PDF translations are often requested at the last minute, making a tight schedule even tighter.
- Cost: You need to pay to redo a document that your company already paid for. And your translation memories might become less efficient with poor segmentation.
- Quality: Your translated document may have a lower quality than your original PDF, since the page layout had to be manually redone and the translators could not fully benefit from their CAT tools.
Our recommendation: Always keep track of your source.
When generating PDF documents, remember to keep track of your source files. This will not only help you update your documents later on; it is also essential to optimizing the translation process and keeping your costs within budget.
To streamline the translation of a PDF, locate the source document before requesting your translation. Everyone will be happier for it 🙂
And if you don’t have the source available, make sure that your translators use the right conversion process.