
Uploaded to ResearchGate by Mark Shuttleworth on 29 June 2015.

Translation Memory Systems: Technology in the service
of the Translation Professional

Mark Shuttleworth, Imperial College London
Elina Lagoudaki, Imperial College London

Introduction
Translation Memory (TM) technology has been with us for a good fifteen years now.
Heralded in many respects as the answer to the translator’s dreams, it has achieved
considerable success in real terms, limited mainly by the level of take-up among
professional translators and translation companies.

The basic concept behind the technology is simple: as users translate, the application
‘remembers’ their translation sentence by sentence, and then ‘reminds’ them of precisely
what they wrote whenever a sentence recurs. More formally, TM is a
computer application that allows users to store previous translations along with their
originals in a database (or an index) and to re-use them in new translation projects,
whenever similar source text is encountered.

How does it work? Initially, the program splits the pair of texts (original + translation) into
‘segments’ (i.e. sentences, phrases or words) and then aligns them as the text is
translated segment by segment. A pair of aligned segments – i.e. the source segment
and its translation – is what we often call a ‘translation unit’, or TU. Each translation unit
is then stored and indexed in a database/index (or ‘translation memory’) in an organised
way with a variety of information – such as the date and time of translation, the identity
of the translator, and so on – attached. When the user starts a new translation – typically
of an updated version of the original text – which has some source segments identical or
similar to the ones existing in the database, the system recognises them, retrieves the
past translations for those segments and suggests them to the user.
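The workflow just described (segment, align, store translation units with metadata, retrieve on recurrence) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular commercial tool works: the sentence splitter is a naive regular expression, source and target texts are assumed to align one-to-one, the sample sentences are invented, and the names `TranslationUnit`, `TranslationMemory`, `add_document` and `lookup` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TranslationUnit:
    """A source segment, its translation and some metadata (a 'TU')."""
    source: str
    target: str
    translator: str
    created: datetime


def split_sentences(text: str) -> list[str]:
    # Naive segmentation rule (assumption): a sentence ends at ., ! or ?
    # followed by whitespace. Real tools use configurable rule sets.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


@dataclass
class TranslationMemory:
    # Index keyed on the source segment; a later translation of the same
    # source segment overwrites the earlier one.
    units: dict[str, TranslationUnit] = field(default_factory=dict)

    def add_document(self, source_text: str, target_text: str, translator: str) -> None:
        src, tgt = split_sentences(source_text), split_sentences(target_text)
        # Assumes a 1:1 correspondence between source and target sentences.
        if len(src) != len(tgt):
            raise ValueError("segment counts differ; manual alignment needed")
        now = datetime.now(timezone.utc)
        for s, t in zip(src, tgt):
            self.units[s] = TranslationUnit(s, t, translator, now)

    def lookup(self, segment: str) -> TranslationUnit | None:
        # Exact lookup only; fuzzy matching would be a separate step.
        return self.units.get(segment.strip())


tm = TranslationMemory()
tm.add_document("Save the file. Close the window.",
                "Salvare il file. Chiudere la finestra.", "translator_a")
print(tm.lookup("Close the window.").target)  # Chiudere la finestra.
```

Keying the store on the exact source string keeps exact lookup trivial; the price is that anything short of an identical segment needs a separate similarity search.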

Terminology is typically handled in a similar but slightly different manner, with the TM
tool connecting to a separate terminology database for automatic term look-up.
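Automatic term look-up can be pictured as a scan of each segment for source terms that the terminology database knows about. In this sketch the termbase is simply a Python dictionary with lower-case keys (an assumption; real termbases are separate databases with much richer entries), its two entries are borrowed from the Italian example TU given later in this article, and `find_terms` is a hypothetical name.

```python
import re

# Hypothetical termbase: lower-case source term -> approved target term.
TERMBASE = {
    "business principles": "Principi commerciali",
    "employee": "dipendente",
}


def find_terms(segment: str, termbase: dict[str, str]) -> list[tuple[str, str]]:
    """Return (term, translation) pairs for termbase entries found in the segment."""
    lowered = segment.lower()
    hits = []
    # Check longer terms first so that multi-word entries are listed first.
    for term in sorted(termbase, key=len, reverse=True):
        # Word boundaries stop 'employee' from matching inside 'employees'.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((term, termbase[term]))
    return hits


print(find_terms("Every employee is expected to act in accordance with "
                 "the Business Principles", TERMBASE))
# [('business principles', 'Principi commerciali'), ('employee', 'dipendente')]
```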

Thus TM cannot by any stretch of the imagination be likened to machine translation, its
better-known but less widely used cousin, as the translator remains in complete control
of the translation, rather than simply correcting and polishing a rough version that the
system has produced. Also unlike machine translation, TM systems are not limited to a
specific language pair but can be used to translate between any pair of languages. In
essence, TM technology offers the user a database tool for accessing past translations;
the database is empty upon its first use but expands rapidly as users fill it with
their translations, and the bigger the TM database gets, the more valuable it becomes.

Intrinsic to the technology is the ability to distinguish between matches that are identical
(‘exact’ or ‘100%’) and those that are only similar (‘fuzzy’). An example of the former
might be as follows:

Segment to be translated: ‘Every employee is expected to act in accordance with the Business Principles’

TU that exists in the database:
EN (UK) ‘Every employee is expected to act in accordance with the Business Principles’
IT (IT) ‘Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali’

Some tools ignore the question of formatting for the purposes of determining whether or
not a particular match is exact, while others permit the user to require a complete
replication of the formatting of the original in order to qualify as 100%.
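The two policies for exact matches can be contrasted with a small comparison function. The sketch assumes, purely for illustration, that formatting is carried as inline HTML-style tags such as `<b>`; TM tools use their own internal markup, and the function name `is_exact_match` is hypothetical.

```python
import re

# Matches inline formatting tags such as <b>, </b> or <i class="x">.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def is_exact_match(new_segment: str, stored_source: str,
                   require_formatting: bool = False) -> bool:
    if require_formatting:
        # Strict policy: text and formatting must both be identical.
        return new_segment == stored_source
    # Lenient policy: compare the text with all formatting tags removed.
    return TAG.sub("", new_segment) == TAG.sub("", stored_source)


stored = "Every employee is expected to act in accordance with the Business Principles"
new = "Every employee is expected to act in accordance with the <b>Business Principles</b>"
print(is_exact_match(new, stored))                           # True
print(is_exact_match(new, stored, require_formatting=True))  # False
```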

A fuzzy match, then, is a match which is anything less than exact, however that is
defined. For example:

Segment to be translated: ‘Key milestones in the development of our company can be found in the following section.’

TU that exists in the database:
EN (UK) ‘Key milestones in the development of Intracom can be found in the following
sections.’
IT (IT) ‘I punti salienti della storia di Intracom sono delineati nelle seguenti sezioni.’

Typically, for the convenience of the user, a TM tool will highlight the changes that have
occurred: in the example above, ‘Intracom’ has become ‘our company’ and ‘sections’ has become ‘section’.
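One simple way to score a fuzzy match and to locate the changes is a word-level sequence comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the proprietary similarity measures of commercial tools, so the percentages it produces will not necessarily agree with any particular product; the function names are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(new_segment: str, stored_source: str) -> float:
    """Similarity of two segments as a percentage, compared word by word."""
    return 100 * SequenceMatcher(None, new_segment.split(), stored_source.split()).ratio()


def highlight_changes(new_segment: str, stored_source: str) -> list[str]:
    """List the word-level edits that turn the stored source into the new segment."""
    old, new = stored_source.split(), new_segment.split()
    changes = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
        if op != "equal":  # op is 'replace', 'delete' or 'insert'
            changes.append(f"{op}: {' '.join(old[i1:i2])!r} -> {' '.join(new[j1:j2])!r}")
    return changes


new = "Key milestones in the development of our company can be found in the following section."
stored = "Key milestones in the development of Intracom can be found in the following sections."
print(round(fuzzy_score(new, stored)))  # 83
print(highlight_changes(new, stored))
# ["replace: 'Intracom' -> 'our company'", "replace: 'sections.' -> 'section.'"]
```

Because this naive tokenisation splits only on whitespace, punctuation stays attached to words, which is why ‘sections.’ and ‘section.’ count as different words here.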

Use
There are a number of scenarios in which TM technology has a particularly clear
application, although its use is not completely restricted to these areas. These scenarios
are as follows:

• Scenario 1: new translation projects which bear a similarity (terminological or more
general) to previous translations – for example, texts with a large amount of
terminology repeated throughout the document
• Scenario 2: a large project carried out simultaneously by several translators – here
the use of TM will help to overcome the problem of how to achieve terminological
consistency
• Scenario 3: the constant revision or updating of existing documents – TM makes this
possible within a very short timescale
• Scenario 4: the translation of websites and desktop publishing files and the
localisation of software

Bearing these four scenarios in mind, one can say that certain text types are ideal
candidates for TM use:

• Repetitive texts such as technical (e.g. manuals, technical documentation), financial and
legal documents (MS Word documents)
• E-content (HTML and XML files)
• Software (e.g. menu items, error messages, etc.: Java properties, Windows resource
files, etc.)
• Text contained in complex formats (such as DTP files: FrameMaker, Illustrator,
Interleaf, PageMaker, etc.)

Conversely, the technology is not generally suitable for:

• Literary texts
• Short texts (one-paragraph documents, slogans, etc.)
• One-off projects or, more generally, small volumes of translation work

Benefits and limitations


The use of TM systems brings with it a number of well-documented benefits. Because
exact and fuzzy matching are so effective, users can experience a significant
productivity gain. As TM databases grow in size, translators start to enjoy
unparalleled access to ideas and solutions from previous translations, and also find that
they never need to translate the same sentence (or similar sentences) from scratch twice.
In addition, use of the technology leads to terminological consistency, uniformity of style
and a general improvement in translation quality.

It would be misleading to claim that all reactions have been totally positive, of course, as
there are critics who point to a number of limitations in the technology as it is
implemented in most currently available commercial TM systems: its unsuitability for
non-repetitive texts, the inflexibility of matching only at sentence level, the
difficulty of retrieving contextual information and the time it takes to build up useful TMs.

Overview of currently available TM systems


The situation has changed considerably since the near-simultaneous launch of IBM
TranslationManager and TRADOS Translator’s Workbench in 1992. Since that time,
those pioneering tools have been joined (and, in the case of the former, replaced) by a
host of other systems, so that some thirty different TM systems are now available
on the market.

The best-known TM packages currently available are Déjà Vu, WordFast, SDL Trados,
STAR Transit, MultiTrans and Omega-T. Besides the most obvious difference of price
(WordFast is cheap and Omega-T free, while the others sell at full commercial prices),
the packages differ, for example, in terms of

• the text editing environment: some tools use MS Word while others offer their own
text editor (usually in a tabular format)
• the granularity of segmentation (which can occur at sentence, phrase or word level)
• the indexing method (indexing of segments vs. full-text indexing)
• the structure of the resources repository (physical TM database vs. virtual database
vs. index)
• the match retrieval techniques used (character-string-based matching vs.
linguistically enhanced matching)
• the level of automation offered
• the ease with which a particular tool can be integrated with a machine translation
engine

However, since all of them perform essentially the same core tasks, which one a
particular user selects is largely a matter of personal preference.

Current research in TM technology


New socio-economic conditions brought about by globalisation and the advancement of
global communications, as well as the content explosion on the Web, are having an
impact on the language services industry, which is now called upon to cope with high
demand for multilingual documentation and content in a great variety of formats. This
increasing volume of translation, combined with the pressure to produce translations in
shorter timescales, has made the deployment of translation technology applications
such as Translation Memory systems a necessity, and has intensified research into
developing new tools and improving existing ones in response to the new
challenges faced by modern translation professionals.

Apart from TM system developers, several other research bodies (such as many
universities and some European institutions) have realised the importance of these
systems and their potential benefits to translation activity, and have joined forces
with the industry to enhance and accelerate research into TM technology. Current
research is heading in several directions, depending on the interests of each
research team. However, the most important areas, as indicated by reports on the weaker
aspects of modern TM systems, have been:

• the development of more user-friendly TM tools;
• the expansion of the scope of use of a TM tool;
• functionality expansion;
• enhancement of access to linguistic resources via a unified TM platform;
• optimised leveraging of previously translated content – improved fuzzy matching;
• the standardisation and efficient exchange of TM resources (through the
refinement of the TMX and SRX standards and the compliance of all TM systems
with them).
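To give a concrete sense of what TMX exchange involves, the sketch below writes the Italian example TU from earlier in this article as a minimal TMX 1.4 document using Python’s standard `xml.etree.ElementTree`. The header values (tool name, version, data type) are placeholders chosen for illustration, `to_tmx` is a hypothetical helper, and SRX, which describes segmentation rules rather than stored translations, is not covered.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace with the reserved 'xml' prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs: list[tuple[str, str]], srclang: str = "en-GB", tgtlang: str = "it-IT") -> str:
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tm-sketch",       # placeholder tool name
        "creationtoolversion": "0.1",      # placeholder version
        "segtype": "sentence",
        "o-tmf": "tm-sketch",              # placeholder original TM format
        "adminlang": "en-GB",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(to_tmx([("Every employee is expected to act in accordance with the Business Principles",
               "Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali")]))
```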

Recently, the efforts of certain research teams have brought to light very
significant developments in TM technology. These developments have addressed some
of the inherent limitations and problems of traditional TM systems and
have opened the way to a new generation of systems.

A common request expressed regularly by translators has been for the TM system to
show some context for the match that it suggests. This is considered important
functionality, as translators rely heavily on the context in which words and phrases are
used before they decide on the correct translation. Until recently, no commercial TM
system could offer this functionality: because such systems split the original source and
target texts into segments, their databases contained only a collection of disconnected
segments rather than running text. Translators working with traditional TM systems normally
resort to concordance tools in order to get some additional information on the suggested
matches that will help them choose the right translation. Things changed with a new
approach, called the full-text approach, adopted by some tools (such as MultiTrans and
LogiTrans). Instead of segmenting the texts at the outset, these tools store them as full
bitexts and index them in the TM database using the character-string-in-bitext (CSB)
technique. Once the bitexts are in the database, they are aligned at paragraph level. This
approach has the advantage of retaining and displaying the context (the full paragraph in
which the match is found) for any match retrieved and suggested to the user.
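The practical benefit of keeping whole bitexts shows up in a concordance-style search that returns the full aligned paragraph around each hit. This is a simplified illustration, not the CSB technique itself: it assumes the bitexts have already been aligned at paragraph level, uses a plain substring scan where a real system would use an index, and the names `Bitext`, `search_in_context` and the document name `code_of_conduct` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bitext:
    name: str
    # Aligned (source paragraph, target paragraph) pairs, in document order.
    paragraphs: list[tuple[str, str]]


def search_in_context(bitexts: list[Bitext], query: str) -> list[tuple[str, str, str]]:
    """Return (document, source paragraph, target paragraph) for each paragraph containing query."""
    q = query.lower()
    return [(doc.name, src, tgt)
            for doc in bitexts
            for src, tgt in doc.paragraphs
            if q in src.lower()]


code_of_conduct = Bitext("code_of_conduct", [
    ("Every employee is expected to act in accordance with the Business Principles.",
     "Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali."),
])
for name, src, tgt in search_in_context([code_of_conduct], "business principles"):
    print(name, "|", src, "|", tgt)
```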

Another challenge that is still under the spotlight of TM research is the improvement of
the match retrieval techniques so that the system offers better match recall and
precision. In order to enable the system to find all matches available in one’s TM
database for a queried source segment (match recall), some developers (such as Atril,
the company that produces Déjà Vu) have adopted a character-string-based technique
which uses character string matching algorithms to look for a match not only in
segments but also in sub-parts of the segments. In this way, the possibility of finding
more matches increases considerably. In terms of enabling the system to find the correct
matches for the queried source segment (match precision), a few linguistically enhanced
matching techniques have been developed addressing that challenge. Traditionally, TM
systems have been treating the segments – of any length – as a sequence of character
strings, therefore the match retrieval algorithms were trying to match the ‘surface’
appearance of source segments with the appearance of segments available in the TM
repository. A new generation of TM systems, in order to offer better match precision,
have introduced linguistic information to the segments, so that the system can look for
matches based not only on the appearance of segments but also on the linguistic
information they contain. In the case of Masterin® for example, each segment in the TM
database is annotated with grammatical information and constitutes a ‘translation
pattern’. So, during the match search, matches are sought by a deep-structure pattern
recognition method in addition to character string recognition techniques.
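Sub-segment matching for better recall can be sketched by looking for the longest run of words that a new segment shares with each stored segment, even when the segments as a whole are too different to count as fuzzy matches. This is an illustrative stand-in built on `difflib`, not Atril’s algorithm; the query sentence is invented, `subsegment_matches` is a hypothetical name, and the three-word minimum is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def subsegment_matches(query: str, stored_sources: list[str],
                       min_words: int = 3) -> list[tuple[str, str]]:
    """Return (stored segment, shared word run) for runs of at least min_words words."""
    q = query.split()
    results = []
    for source in stored_sources:
        s = source.split()
        # Longest contiguous block of identical words in the two segments.
        m = SequenceMatcher(None, q, s).find_longest_match(0, len(q), 0, len(s))
        if m.size >= min_words:
            results.append((source, " ".join(q[m.a:m.a + m.size])))
    return results


stored = [
    "Every employee is expected to act in accordance with the Business Principles",
    "Key milestones in the development of Intracom can be found in the following sections.",
]
print(subsegment_matches("Every employee is expected to read the following section.", stored))
# [('Every employee is expected to act in accordance with the Business Principles',
#   'Every employee is expected to')]
```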

A further, equally important challenge faced by TM researchers is the maximum
deployment of existing resources for the generation of matches, even if no exact or fuzzy
match can be found in the TM database. The solutions implemented to address this
challenge have been borrowed from Machine Translation systems. Déjà Vu X, for
instance, uses Example-Based Machine Translation techniques to put together
two sub-segments that exist in two different segments stored in the TM database in
order to form a new suggested match. Masterin®, following a similar approach,
constructs and suggests a fuzzy match from the available resources in its database
(‘Knowledge Base’) by applying translation heuristics.
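The idea of assembling a suggestion from sub-segments of different TUs can be illustrated with a deliberately simplified greedy procedure. The sketch assumes that sub-segment alignments have already been extracted from two stored TUs (the chunk table and its English and Italian entries are invented for the example) and simply concatenates chunk translations, ignoring agreement and word-order adjustments; it is not the algorithm used by Déjà Vu X or Masterin®, and `assemble` is a hypothetical name.

```python
from typing import Optional

# Hypothetical sub-segment alignments taken from two stored TUs:
#   "Click Save to store the document." / "Fare clic su Salva per memorizzare il documento."
#   "Click Print to print the page."    / "Fare clic su Stampa per stampare la pagina."
CHUNKS = {
    "Click Save": "Fare clic su Salva",
    "to store the document.": "per memorizzare il documento.",
    "Click Print": "Fare clic su Stampa",
    "to print the page.": "per stampare la pagina.",
}


def assemble(segment: str, chunks: dict[str, str]) -> Optional[str]:
    """Cover the segment left to right with the longest known chunks; None if impossible."""
    words = segment.split()
    out, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest candidate first
            candidate = " ".join(words[i:j])
            if candidate in chunks:
                out.append(chunks[candidate])
                i = j
                break
        else:
            return None  # no stored chunk starts at word i
    return " ".join(out)


print(assemble("Click Print to store the document.", CHUNKS))
# Fare clic su Stampa per memorizzare il documento.
```

The new sentence never occurs in either stored TU, yet it is fully covered by one chunk from each, which is the situation the paragraph above describes.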

Future directions of research


Research in TM systems still has a long way to go, as translators’ requirements
change constantly and increase in line with the demands of the profession. In
addition, translation professionals seem to have achieved a certain level of
sophistication as computer users and greater familiarity with TM systems,
which makes them more demanding in terms of what they expect of future
TM systems. The expansion and optimisation of the functionality of TM systems will
continue to be among the top priorities on the research agenda of TM developers.
The scope of use of a TM system also needs to be expanded so that the tool can assist
in the translation of general texts in addition to technical texts. For this to be possible,
TM systems need to rely less on the repetition rate exhibited by the source text and
more on the language resources they contain in their database.

Greater weight is expected to be given to language resources, in terms of both access
via the TM system and maximum deployment. Language resources such as glossaries
and dictionaries will perhaps be sold as add-ins to the TM application, so that translators
will not have to wait long before their termbase reaches a level where it can offer
them valuable help with translation problems. Furthermore, future TM tools will
probably be able to integrate language resources (such as parallel or aligned bilingual
corpora, and glossaries and dictionaries, whether online or on CD-ROM) into their TM
database efficiently, easily, quickly and on a large scale. The Web is also expected to
contribute to the improved utility of TM tools. In particular, thanks to the evolution of the
Semantic Web, researchers and developers will be looking into ways to exploit the Web as
a vast resource of bilingual (or monolingual) corpora by developing capabilities to extract
texts from the Web and store them in a TM database so that translators can use them
as reference material.

In terms of the acquired resources, future research will concentrate on improving the
algorithms for search and retrieval of matches, so that the system offers more relevant
results more quickly and with more useful linguistic and contextual information.
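One well-established way to make fuzzy retrieval faster on a large TM is to narrow the search with an inverted index before computing any expensive similarity score. The sketch below indexes character trigrams, keeps only the stored segments that share the most trigrams with the query, and scores just those candidates with `difflib.SequenceMatcher`; the class name `FuzzyIndex`, the 0.7 threshold and the candidate limit are illustrative assumptions rather than values taken from any real system.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def trigrams(text: str) -> set[str]:
    padded = f"  {text.lower()} "  # padding lets word starts and ends form trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class FuzzyIndex:
    def __init__(self) -> None:
        self.sources = []                  # stored source segments, by id
        self.postings = defaultdict(set)   # trigram -> ids of segments containing it

    def add(self, source: str) -> None:
        seg_id = len(self.sources)
        self.sources.append(source)
        for gram in trigrams(source):
            self.postings[gram].add(seg_id)

    def search(self, query: str, threshold: float = 0.7,
               max_candidates: int = 50) -> list[tuple[float, str]]:
        # Step 1: count shared trigrams using only the posting lists.
        shared = defaultdict(int)
        for gram in trigrams(query):
            for seg_id in self.postings.get(gram, ()):
                shared[seg_id] += 1
        candidates = sorted(shared, key=shared.get, reverse=True)[:max_candidates]
        # Step 2: compute the full similarity score only for those candidates.
        results = []
        for seg_id in candidates:
            score = SequenceMatcher(None, query, self.sources[seg_id]).ratio()
            if score >= threshold:
                results.append((round(score, 2), self.sources[seg_id]))
        return sorted(results, reverse=True)


idx = FuzzyIndex()
idx.add("Every employee is expected to act in accordance with the Business Principles")
idx.add("Key milestones in the development of Intracom can be found in the following sections.")
print(idx.search("Key milestones in the development of our company can be found in the following section."))
```

The trigram counts are cheap to compute from the posting lists, so the character-by-character comparison runs on a short list of promising segments instead of the whole memory.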

Finally, there will be a higher degree of convergence between TM systems and Machine
Translation systems in the future, as both types of systems share common problems, for
which the solutions lie in the combination of the two technologies. A TM system with
carefully implemented Machine Translation capabilities seems to be the obvious way
towards the ideal translation support tool.
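A common way of combining the two technologies, offered here only as an illustration of the convergence described above, is to fall back on machine translation when the TM has no sufficiently close match. In this sketch the MT engine is a hypothetical function passed in by the caller (`fake_mt` merely labels its input), the similarity measure is the word-level `difflib` ratio, and the 75% threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def suggest(segment: str, memory: dict[str, str], mt: Callable[[str], str],
            threshold: float = 0.75) -> tuple[str, Optional[int], str]:
    """Return (origin, match percentage or None, suggested translation)."""
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.split(), source.split()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return "TM", round(100 * best_score), memory[best_source]
    # No good enough TM match: ask the MT engine for a draft instead.
    return "MT", None, mt(segment)


def fake_mt(text: str) -> str:
    # Hypothetical stand-in for a call to a real MT engine.
    return f"[MT draft of: {text}]"


memory = {"Every employee is expected to act in accordance with the Business Principles":
          "Ogni dipendente è tenuto ad agire in conformità ai Principi commerciali"}
print(suggest("Every employee is expected to act in accordance with the Business Principles",
              memory, fake_mt))
print(suggest("Close the window.", memory, fake_mt))
```

Keeping the MT engine behind a plain function parameter leaves the translator-facing logic unchanged whichever engine is plugged in, and the returned origin label lets the user see whether a suggestion came from the TM or from MT.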

Conclusion
TM technology was developed with a view to serving the needs of translation
professionals within a constantly changing and demanding global environment.
Translators have a choice whether or not to use TM systems, depending on the nature
of their work and how much weight they attribute to potential benefits deriving from TM
use. However, due to the ignorance and misinformation which have triggered many of
the misconceptions around TM systems, it is natural that some translators have
developed a fear of this technology. The simplest way to combat this fear is to keep an
open mind and seek to be informed about these systems. There may well turn out to be
solutions out there that can take care of what each translator considers grunt work
and thus free them to focus on the creative part of translation.
