How to make DITA work for translation
There’s little debate about the benefits of DITA for the creation of technical content in large organizations. However, translating documents in a DITA environment is not always that straightforward, and can result in errors. Luckily, DITA offers a number of features that enable you to reduce the risk of mistakes.
DITA is the fastest-growing XML standard for technical documentation, and for good reason: it can significantly reduce the time and cost of creating content, especially for companies that manage large amounts of technical content from many different sources. Its biggest strength is that it makes content that has already been written much easier to reuse.
DITA is not only great for technical writing; it is also great for translation. Still, the road to a smooth, accurate translation is full of pitfalls: different language structures, mixed-language documents, content that must not be translated. Here is how DITA handles them.
1. DITA’s internationalization attributes
DITA has a number of attributes you can use to make translation easier.
The first important attribute is xml:lang, which identifies the language (and optionally the locale) of the element content.
xml:lang can be applied to any element, which is handy when your source documents mix languages: translation software can then easily filter out the content in other languages.
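To illustrate, here is a minimal Python sketch of the kind of filtering a translation tool might do. It walks a DITA fragment with the standard library's xml.etree.ElementTree, lets each element inherit xml:lang from its parent, and keeps only the segments in the source language. The function names and the sample paragraph are hypothetical; real translation tools do considerably more than this.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segments_by_language(elem, inherited="und"):
    """Yield (language, text) pairs; an element without xml:lang inherits its parent's."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from segments_by_language(child, lang)
        # Text after a child's end tag belongs to the parent, so it keeps the parent's language.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


def text_for_translation(root, source_lang):
    """Keep only the segments whose primary language subtag matches source_lang."""
    return [text for lang, text in segments_by_language(root)
            if lang.split("-")[0].lower() == source_lang.lower()]


# Hypothetical sample: an English paragraph quoting a German certification mark.
para = ET.fromstring(
    '<p xml:lang="en-US">The label shows the '
    '<q xml:lang="de-DE">Geprüfte Sicherheit</q> mark.</p>'
)

print(text_for_translation(para, "en"))
```

The German quotation is reported with its own language tag, so an English-to-X job can leave it untouched.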
Not all languages are written from left to right, as English is. Arabic, Hebrew, and several other languages are written from right to left, and some multilingual documents mix text segments that run in both directions. This is where the dir attribute comes in handy.
dir tells processors how to render bidirectional text.
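As a rough illustration of how a processor might interpret dir, the Python sketch below gives every element the direction of its nearest ancestor-or-self that sets dir. This is a simplified model assumed for illustration: it treats dir as inherited, defaults to ltr (an assumption about the document's base direction), and ignores the Unicode Bidirectional Algorithm that real renderers apply to the characters themselves. The function name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def directions(elem, inherited="ltr"):
    """Yield (tag, direction) for each element; without dir, an element inherits its parent's."""
    # DITA allows ltr, rtl, lro and rlo; this sketch only passes the value down the tree.
    direction = elem.get("dir", inherited)
    yield elem.tag, direction
    for child in elem:
        yield from directions(child, direction)


# Hypothetical sample: an English step followed by its Arabic counterpart.
section = ET.fromstring(
    "<section>"
    "<p>Open the control panel.</p>"
    '<p xml:lang="ar" dir="rtl">افتح <uicontrol>لوحة التحكم</uicontrol>.</p>'
    "</section>"
)

for tag, direction in directions(section):
    print(tag, direction)
```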
You can use the translate attribute to mark portions of your DITA files that should not be translated. To mark such text as untranslatable, simply set the value of the translate attribute to “no”.
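The Python sketch below shows how an extraction step could honor the attribute before text is sent to translators. It assumes that translate="no" applies to the element's whole subtree unless a descendant sets translate="yes" again; the function name and the sample command are hypothetical.

```python
import xml.etree.ElementTree as ET


def translatable_text(elem, translate="yes"):
    """Return text segments to send to translation, skipping anything under translate="no"."""
    translate = elem.get("translate", translate)  # inherited unless overridden
    segments = []
    if translate != "no" and elem.text and elem.text.strip():
        segments.append(elem.text.strip())
    for child in elem:
        segments.extend(translatable_text(child, translate))
        # A child's tail text belongs to the parent, so the parent's setting applies.
        if translate != "no" and child.tail and child.tail.strip():
            segments.append(child.tail.strip())
    return segments


# Hypothetical sample: a command that must reach the reader exactly as typed.
cmd = ET.fromstring(
    '<cmd>Type <codeph translate="no">systemctl restart nginx</codeph> '
    "and press Enter.</cmd>"
)

print(translatable_text(cmd))
```

Only the surrounding instruction is extracted; the command itself never reaches the translator.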
2. Tagging terminology
DITA also provides a clear way to tag terminology. Technical or specialized terms, acronyms, abbreviations, and other jargon that must be defined to be understood can be identified semantically.
For example, the term element identifies words that may have or require extended definitions or explanations.
The terms used in a document are normally defined in a separate glossary section. This allows translation systems to lock or unlock terms, automatically extract terms from existing documents or glossaries, and run automatic quality control.
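Because terms are tagged semantically, extracting them is straightforward. The Python sketch below collects the distinct term elements from a topic and reports any that have no glossary entry, the kind of check a terminology QA step might run. The glossary format (a plain list of strings), the function names, and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def extract_terms(root):
    """Return each distinct <term> text in document order, with whitespace normalized."""
    texts = (" ".join("".join(t.itertext()).split()) for t in root.iter("term"))
    return list(dict.fromkeys(text for text in texts if text))


def missing_from_glossary(root, glossary):
    """List tagged terms that have no glossary entry (case-insensitive comparison)."""
    known = {entry.lower() for entry in glossary}
    return [term for term in extract_terms(root) if term.lower() not in known]


# Hypothetical sample topic and glossary.
para = ET.fromstring(
    "<p>Set the <term>torque wrench</term> to the specified <term>torque</term>. "
    "Recalibrate the <term>torque wrench</term> every year.</p>"
)
glossary = ["Torque wrench"]

print(extract_terms(para))
print(missing_from_glossary(para, glossary))
```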
3. The conref attribute: handle with care!
DITA makes use of topics that should be short enough to be easily readable, but long enough to make sense on their own. These topics are then reused and repurposed in various ways. This is the true strength of DITA. Reuse of content from other topics or maps is handled by means of the content referencing (conref) attribute. A fragment of content in one topic or map can be pulled by reference into any other topic or map where the content is allowed.
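To make the mechanism concrete, here is a deliberately simplified conref resolver in Python. The conref value format file.dita#topicid/elementid follows DITA's addressing syntax; the in-memory library dictionary, the function names, and the sample topics are hypothetical. The sketch ignores attribute merging, conkeyref, conrefend ranges, and conrefs nested inside the pulled content. Resolving references like this before handoff is roughly what flattening conrefs for translation means.

```python
import copy
import xml.etree.ElementTree as ET


def find_target(ref, library, current):
    """Locate the element addressed by a conref value such as 'file.dita#topicid/elementid'."""
    file_part, _, fragment = ref.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    source = library[file_part] if file_part else current  # '#topic/elem' means same file
    topic = next(e for e in source.iter() if e.get("id") == topic_id)
    if not element_id:
        return topic
    return next(e for e in topic.iter() if e.get("id") == element_id)


def resolve_conrefs(doc, library):
    """Replace every element that carries @conref with a copy of its target (single pass)."""
    for parent in list(doc.iter()):
        for index, child in enumerate(list(parent)):
            ref = child.get("conref")
            if not ref:
                continue
            replacement = copy.deepcopy(find_target(ref, library, doc))
            replacement.tail = child.tail  # keep the text that followed the reference
            parent[index] = replacement
    return doc


# Hypothetical shared topic and a task that reuses one of its warnings.
library = {
    "shared.dita": ET.fromstring(
        '<topic id="shared"><title>Shared warnings</title><body>'
        '<note id="esd">Wear an antistatic wrist strap before you touch the board.</note>'
        "</body></topic>"
    )
}
task = ET.fromstring(
    '<topic id="install"><title>Install the board</title><body>'
    '<note conref="shared.dita#shared/esd"/>'
    "<p>Insert the board into slot 1.</p>"
    "</body></topic>"
)

resolve_conrefs(task, library)
print(ET.tostring(task.find("body/note"), encoding="unicode"))
```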
However, when a content block is sent for translation, it is not necessarily presented together with the content it will later be combined with. In other words, the block is pulled out of its context. This can result in grammatical errors, especially when you translate into highly inflected and gender-sensitive languages such as German and most Slavonic languages, where the form of a noun changes depending on its case or context.
The conref attribute is frequently used with uicontrol elements (used to mark up the names of buttons, entry fields, menu items, etc.) and wintitle elements (used for the names of windows, dialog boxes, or other user interface elements).
Conref is also commonly used for product names, audience names, or roles, so that they can be replaced with customer-specific terminology wherever they occur in the text.
So, how can you avoid grammatical errors when using the conref attribute?
- Translating in DITA means finding the right trade-off between more reuse and safe reuse. That’s why it’s always safer to use conref only for grammatically complete sentences or phrases. Such text should be able to stand alone, without relying on the surrounding text in the sentence.
- Avoid using the conref attribute for common nouns, for example to insert individual words into a sentence.
- When using proper nouns, such as Ford Focus, make sure that the proper noun is the subject of the sentence and in the nominative case, which usually does not require inflection.
- Especially for inflected languages, a possible workaround is to resolve (flatten) conrefs before translation, so that translators see the complete text and reuse does not compromise translation quality.
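These guidelines lend themselves to a simple automated review. The Python sketch below flags conrefs placed on inline elements, where they most likely drop a single word or phrase into a sentence, and leaves block-level reuse such as whole notes or paragraphs alone. The set of inline element names is an assumption you would tune to your own content model; this is a heuristic review aid, not a grammar checker, and a flagged item such as a proper noun used as the subject may well be fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: inline elements whose conrefs typically insert a word or
# phrase mid-sentence, where inflected languages may need a different form.
INLINE_TAGS = {"ph", "keyword", "term", "uicontrol", "wintitle"}


def risky_conrefs(root):
    """Return (tag, conref) for conrefs on inline elements, which deserve a manual review."""
    return [
        (elem.tag, elem.get("conref"))
        for elem in root.iter()
        if elem.get("conref") and elem.tag in INLINE_TAGS
    ]


body = ET.fromstring(
    "<body>"
    # Safer: a whole, self-contained warning is reused.
    '<note conref="warnings.dita#warnings/battery"/>'
    # Riskier: a single common noun is reused inside a sentence.
    '<p>Remove the <ph conref="vars.dita#vars/cover"/> from the housing.</p>'
    "</body>"
)

print(risky_conrefs(body))
```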
4. Write standalone units of text
It is recommended to write self-contained units of text as much as possible. Avoid relative references such as “as discussed in the previous chapter” or “see image below”: when these text units are taken out of context, such references become meaningless.
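Relative references are easy to find automatically. This Python sketch scans English source text for a few common patterns with a regular expression; the phrase list is a hypothetical starting point, not an exhaustive rule set.

```python
import re

# Hypothetical starting list of context-dependent phrasings; extend it for your style guide.
RELATIVE_REFERENCE = re.compile(
    r"\b(?:as (?:discussed|described|shown|mentioned)"
    r"(?: in the (?:previous|next|following) (?:chapter|section|topic)| above| below| earlier)"
    r"|see (?:the )?(?:image|figure|table|section) (?:above|below))\b",
    re.IGNORECASE,
)


def find_relative_references(text):
    """Return every phrase in text that only makes sense in its original position."""
    return [match.group(0) for match in RELATIVE_REFERENCE.finditer(text)]


sample = (
    "As discussed in the previous chapter, close the valve first. "
    "See image below for the correct valve position."
)

print(find_relative_references(sample))
```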
5. Mind the spaces
DITA has dozens of useful inline elements, some of which are borrowed from HTML. However, heavy use of inline elements can easily lead to spacing mistakes. Make sure spaces are placed correctly around inline elements, so that words don’t run together.
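Missing spaces around inline elements can be caught before content goes to translation. The Python sketch below reports inline elements glued directly to a neighboring word, as in Click<uicontrol>OK</uicontrol>to, which would render as “ClickOKto”. It is a hypothetical check intended for space-delimited source languages such as English; it would misfire on languages like Chinese or Japanese, which do not separate words with spaces.

```python
import xml.etree.ElementTree as ET


def missing_spaces(root):
    """Report inline elements glued to the word before or after them."""
    problems = []
    for parent in root.iter():
        before = parent.text or ""  # text that precedes the first child
        for child in parent:
            inner = "".join(child.itertext())
            after = child.tail or ""
            label = f"<{child.tag}>{inner}</{child.tag}>"
            if before[-1:].isalnum() and inner[:1].isalnum():
                problems.append(f"no space before {label}")
            if inner[-1:].isalnum() and after[:1].isalnum():
                problems.append(f"no space after {label}")
            before = after  # the tail precedes the next sibling
    return problems


para = ET.fromstring(
    "<p>Click<uicontrol>OK</uicontrol>to close the "
    "<wintitle>Settings</wintitle> dialog box.</p>"
)

for problem in missing_spaces(para):
    print(problem)
```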
6. Provide reference material
Translators often have to translate stand-alone topics, which can keep them from seeing the whole picture. To avoid this, provide some reference material about the topic, for example the published document or the website.
7. Annotate where needed
Another way to give translators background on the topic, or specific instructions on how something should be translated, is to add annotations to the text. Annotations are a great way to give instructions about specific text strings.
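DITA provides a draft-comment element for this kind of note. Assuming your team uses it for translator instructions (whether such comments actually reach translators depends on your tools and settings), a Python sketch like the one below could collect them, together with the id of the enclosing element, into a list to send along with the files. The function name and sample topic are hypothetical.

```python
import xml.etree.ElementTree as ET


def translator_notes(root):
    """Collect (context id, comment text) pairs for every <draft-comment>."""
    notes = []

    def walk(elem, context_id):
        context_id = elem.get("id", context_id)  # nearest enclosing id
        for child in elem:
            if child.tag == "draft-comment":
                text = " ".join("".join(child.itertext()).split())
                notes.append((context_id, text))
            else:
                walk(child, context_id)

    walk(root, None)
    return notes


topic = ET.fromstring(
    '<topic id="pairing"><title>Pair the headset</title><body>'
    '<p id="hold-power">Press and hold the <uicontrol>Power</uicontrol> button.'
    '<draft-comment author="writer">Power is the label printed on the device; '
    "match the label on the localized product.</draft-comment></p>"
    "</body></topic>"
)

for context_id, note in translator_notes(topic):
    print(f"{context_id}: {note}")
```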
Translating DITA: yes, you can!
Content reuse is DITA’s true forte. However, this reuse demands extra attention from translators, especially in languages that, unlike English, are heavily inflected. When pulled out of context, content blocks can be reused the wrong way, resulting in grammatical errors. That’s why attributes like conref should be used with some common sense. Finding the right trade-off between more reuse and safe reuse is a continuous exercise.
DITA also provides excellent mechanisms to remove doubt and tell translators what to translate and what not to translate. This is especially handy for multilingual documents and for content with highly specialized terminology.
We believe translating in DITA offers a lot of opportunities, provided you keep a few tips in mind. If you have questions about DITA best practices for translation, let us know. We’ll be glad to help.