Machine Translation Technologies


MACHINE TRANSLATION TECHNOLOGIES
LECTURE 6
DEFINITION
Machine translation (MT), not to be confused with computer-aided
or machine-aided translation, is a subfield of
computational linguistics that investigates the use of software
to translate text or speech from one language to another.
Machine translation is the process of automatically
translating texts from one language to another with the help of
artificial intelligence and without human
intervention.
THE ESSENCE OF MACHINE TRANSLATION
On a basic level, MT performs mechanical substitution of
words in one language for words in another, but that alone
rarely produces a good translation because recognition of
whole phrases and their closest counterparts in the target
language is needed. Not all words in one language have
equivalent words in another language, and many words have
more than one meaning.
HISTORY OF MACHINE TRANSLATION

• The first researcher in the field, Yehoshua Bar-Hillel, began his
research in 1951. Bar-Hillel organised the first International
Conference on Machine Translation in 1952. Later he expressed
doubts that general-purpose fully automatic high-quality machine
translation would ever be feasible.
• MT research programs were launched in Japan and Russia (1955).
SYSTRAN
SYSTRAN, an American software company founded by Dr. Peter
Toma in 1968 in California, is one of the oldest machine translation
companies. SYSTRAN has done extensive work for the United States
Department of Defense and the European Commission.
The company worked on translation of Russian to English text for
the United States Air Force during the Cold War. Russian scientific
and technical documents were translated using SYSTRAN at Wright-
Patterson Air Force Base, Ohio. The quality of the translations,
although only approximate, was usually adequate for understanding
content.
HISTORY OF MACHINE TRANSLATION
• Trados (founded in 1984) was the first to develop and market Translation Memory technology (1989),
though this is not the same as MT.
• The first commercial MT system for Russian / English / German-Ukrainian was developed at
Kharkov State University (1991).
• By 1998 anyone could buy a PC program for translating in one direction between
English and a major European language.
• MT on the web started with SYSTRAN offering free translation of small texts
(1996) and then providing this via AltaVista Babelfish.
• The first mobile phone with built-in speech-to-speech translation functionality for
English, Japanese and Chinese was introduced in 2009.
MACHINE TRANSLATION TECHNOLOGIES

• Rule-based machine translation
• Statistical machine translation
• Neural machine translation
• Hybrid machine translation
PROCESS OF TRANSLATION

• The translation process may be described as:
• Decoding the meaning of the source text
• Re-encoding this meaning in the target language.
RULE-BASED TRANSLATION
• Language experts develop built-in linguistic rules and bilingual dictionaries
for specific industries or topics. Rule-based machine translation uses these
dictionaries to translate specific content accurately. The steps in the process
are:
• The machine translation software parses the input text and creates an
intermediate representation.
• It converts the representation into the target language, using the grammar rules
and dictionaries as a reference.
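The two steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the English-to-Spanish direction, the tiny LEXICON dictionary, the part-of-speech tags and the single word-order rule. Real rule-based systems use far richer grammars and dictionaries.

```python
# Toy rule-based translator (illustrative sketch, not a real MT engine).
# Hypothetical bilingual dictionary: source word -> (translation, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "big": ("grande", "ADJ"),
    "car": ("coche", "NOUN"),
    "dog": ("perro", "NOUN"),
}


def parse(sentence):
    """Step 1: build an intermediate representation as (word, tag) pairs."""
    tokens = sentence.lower().split()
    return [(tok, LEXICON[tok][1] if tok in LEXICON else "UNK") for tok in tokens]


def apply_rules(tagged):
    """Grammar rule (assumed): English ADJ NOUN becomes NOUN ADJ in Spanish."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the pair that was just reordered
        else:
            i += 1
    return out


def generate(tagged):
    """Step 2: look up each word; unknown words pass through unchanged."""
    return " ".join(LEXICON[w][0] if w in LEXICON else w for w, _ in tagged)


def translate(sentence):
    return generate(apply_rules(parse(sentence)))


print(translate("the red car"))      # el coche rojo
print(translate("the red bicycle"))  # el rojo bicycle
```

The second call shows the typical weakness of this approach: "bicycle" is missing from the dictionary, so it is neither translated nor reordered.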
INTERLINGUAL MACHINE TRANSLATION
• Interlingual machine translation is one instance of the rule-based machine
translation approach. In this approach, the source text, i.e. the text to
be translated, is transformed into an interlingua, i.e. a "language-neutral"
representation that is independent of any language. The target
text is then generated from the interlingua.
• The only interlingual machine translation system that has been made
operational at the commercial level is the KANT system (Nyberg and
Mitamura, 1992), which is designed to translate Caterpillar Technical English
(CTE) into other languages.
RULE-BASED TRANSLATION: PROS AND CONS
• Rule-based machine translation can be customized to a specific industry or
topic.
• It is predictable and provides quality translation.
• It produces poor results if the source text has errors or uses words not present
in the built-in dictionaries.
• The only way to improve it is by manually updating dictionaries regularly.
STATISTICAL MACHINE TRANSLATION
• Statistical MT generates translations based on statistical models whose parameters
are derived from bilingual text analysis. Statistical machine translation uses
machine learning to translate text. The machine learning algorithms analyze large
amounts of human translations that already exist and look for statistical patterns.
The software then makes an informed guess when asked to translate a new source
text. It makes predictions on the basis of the statistical likelihood that a specific
word or phrase in the source language corresponds to a given word or phrase in the target language.
• Statistical methods are based on language corpora. Where such corpora are
available, good results can be achieved when translating similar texts, but such corpora
are still rare for many language pairs.
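As a rough illustration of "looking for statistical patterns", the sketch below estimates word translation probabilities by counting how often words co-occur in aligned sentence pairs. The three-sentence English-German corpus and all names are hypothetical, and plain co-occurrence counting is a deliberate simplification of the alignment models used in real statistical MT systems.

```python
from collections import Counter, defaultdict

# Hypothetical parallel corpus of (source sentence, human translation) pairs.
PARALLEL_CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train(corpus):
    """Estimate P(target word | source word) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for source, target in corpus:
        for s in source.split():
            for t in target.split():
                counts[s][t] += 1
    table = {}
    for s, counter in counts.items():
        total = sum(counter.values())
        table[s] = {t: n / total for t, n in counter.items()}
    return table


def best_translation(table, word):
    """Return the most probable target word, or the word itself if unseen."""
    candidates = table.get(word)
    if not candidates:
        return word
    return max(candidates, key=candidates.get)


TABLE = train(PARALLEL_CORPUS)
print(best_translation(TABLE, "book"), TABLE["book"]["buch"])  # buch 0.5
```

With only three sentence pairs, words such as "house" remain ambiguous (equal counts for "das" and "haus"), which is why statistical methods need large corpora.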
STATISTICAL MACHINE TRANSLATION
• The first statistical machine translation software was CANDIDE
from IBM.
• Google Translate, introduced in 2006, was originally released as a
statistical machine translation service. For language pairs that did not
include English, the input text was first translated into English and
then into the selected language. Because SMT predicts translations
from statistical patterns rather than grammar rules, it had poor grammatical accuracy.
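The idea of translating through English as an intermediate (pivot) language can be sketched as follows. This is only an illustration: the French and German word tables are hypothetical, and the word-for-word lookup stands in for a full statistical engine.

```python
# Pivot translation sketch: French -> English -> German.
# Both word tables are hypothetical and tiny on purpose.
FR_TO_EN = {"chat": "cat", "noir": "black"}
EN_TO_DE = {"cat": "Katze", "black": "schwarz"}


def translate_words(words, table):
    """Word-for-word lookup; unknown words pass through unchanged."""
    return [table.get(w, w) for w in words]


def pivot_translate(text):
    english = translate_words(text.split(), FR_TO_EN)  # step 1: into English
    german = translate_words(english, EN_TO_DE)        # step 2: into the target
    return " ".join(german)


print(pivot_translate("chat noir"))  # Katze schwarz
```

The output keeps the French word order and ignores German adjective endings, a small-scale example of the grammatical weaknesses mentioned above. Any error made in the first step is also carried into the second.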
STATISTICAL MACHINE TRANSLATION: PROS AND CONS
• Statistical methods require training on millions of words
for every language pair.
• With sufficient data the machine translations are accurate.
HYBRID MACHINE TRANSLATION

• Hybrid machine translation uses two or more machine translation models within
one piece of software. This machine translation process commonly uses rule-based
and statistical machine translation subsystems. The final translation
output is the combination of the output of all subsystems.
HYBRID MACHINE TRANSLATION
• Rules post-processed by statistics: Translations are performed using a
rule-based engine. Statistics are then used in an attempt to adjust or correct the
output from the rules engine.
• Statistics guided by rules: Rules are used to pre-process data in an attempt
to better guide the statistical engine. This approach offers considerably more power,
flexibility and control when translating.
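The first variant, rules post-processed by statistics, can be sketched as below. The German-to-English dictionary and the bigram counts are hypothetical: the rule engine proposes every dictionary candidate for each word, and a statistical step then picks the candidate that most often follows the previous output word.

```python
# Hybrid MT sketch: a rule-based engine proposes candidates,
# and target-language statistics choose among them.
# Hypothetical dictionary: "schloss" is ambiguous (castle / lock).
RULE_DICTIONARY = {
    "altes": ["old"],
    "kaputtes": ["broken"],
    "schloss": ["castle", "lock"],
}

# Hypothetical bigram counts taken from an English corpus.
BIGRAM_COUNTS = {
    ("old", "castle"): 12,
    ("old", "lock"): 3,
    ("broken", "lock"): 7,
    ("broken", "castle"): 1,
}


def rule_engine(sentence):
    """Rule-based step: list the candidate translations for each word."""
    return [RULE_DICTIONARY.get(w, [w]) for w in sentence.lower().split()]


def statistical_postprocess(candidates):
    """Statistical step: keep the candidate seen most often after the previous word."""
    output = []
    for options in candidates:
        previous = output[-1] if output else None
        best = max(options, key=lambda o: BIGRAM_COUNTS.get((previous, o), 0))
        output.append(best)
    return " ".join(output)


def hybrid_translate(sentence):
    return statistical_postprocess(rule_engine(sentence))


print(hybrid_translate("altes Schloss"))     # old castle
print(hybrid_translate("kaputtes Schloss"))  # broken lock
```

The rules alone cannot decide between "castle" and "lock"; the statistics resolve the ambiguity from context.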
NEURAL MACHINE TRANSLATION
• Neural machine translation uses artificial intelligence to learn languages, and to
continuously improve that knowledge using a specific machine learning method
called neural networks.
Neural network
• A neural network is an interconnected set of nodes inspired by the human brain. It
is an information system in which input data passes through several layers of interconnected
nodes to generate an output. Neural machine translation software uses neural
networks trained on enormous datasets. Each layer transforms the representation of the
source text a little further until the output layer produces the target text.
NEURAL MACHINE TRANSLATION
• Neural machine translation (NMT) is not a drastic step beyond what has been
traditionally done in statistical machine translation (SMT). Its main departure
is the use of vector representations ("embeddings", "continuous space
representations") for words and internal states. The structure of the models is
simpler than phrase-based models.
• NMT is a state-of-the-art approach that takes advantage of deep learning
methods and produces cleaner translation output than
classic machine translation solutions. Common examples of NMT systems are DeepL and
Mirai Translate.
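To make the idea of vector representations more concrete, the sketch below compares hypothetical three-dimensional word embeddings with cosine similarity. The vectors are invented for illustration; real NMT systems learn embeddings with hundreds of dimensions during training.

```python
import math

# Hypothetical 3-dimensional embeddings (real ones are learned, not hand-written).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary word whose vector is most similar."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


print(nearest("cat"))  # kitten
```

Because related words end up close together in the vector space, the model can generalize from words it has seen to similar words, something a table of discrete word counts cannot do.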
NEURAL MACHINE TRANSLATION: PROS AND CONS
• Neural networks consider the whole input sentence at each step when producing the
output sentence.
• NMT can learn directly, in an end-to-end fashion, the mapping from input text to
associated output text.
• MT is unable to pick up on cultural nuances, contextual content clues, and local slang.
• Content can feel a bit robotic, choppy, and not culturally aligned.
• NMT translations can be unreliable, unpredictable or even unintelligible, with no
guarantees of accuracy or consistency.
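The first point in the list above, considering the whole input sentence at each output step, is commonly realized with an attention mechanism. The sketch below shows plain dot-product attention over a few encoder states. The two-dimensional vectors are hypothetical, and real systems use learned, much larger representations.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Weight every input position by its dot product with the decoder query."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical encoder states, one per input word, and one decoder query.
ENCODER_STATES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
QUERY = [2.0, 0.5]

weights, context = attend(QUERY, ENCODER_STATES)
print(weights)  # one weight per input word; every word contributes
print(context)  # weighted mix of all encoder states
```

Every input word receives a non-zero weight, so each output step is informed by the entire sentence, with the most relevant positions contributing the most.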
GOOGLE TRANSLATE
• There’s barely anyone who hasn’t heard of it: Google Translate has been
incorporated in almost every product of the Google ecosystem and has
reached high-quality levels—not only for the most common language pairs
but also for less popular ones.
• Ever since Google brought about the “MT neural revolution” in 2016,
machine translation results have significantly improved in terms of quality,
consistency, and productivity.
DEEPL
• German-based DeepL was launched in 2017 to further develop Linguee, the
world’s biggest database of human translations. The engineers at DeepL
applied the newest Deep Learning technique (hence the company’s name) to
get state-of-the-art machine translation software by training the models on
Linguee’s data. The results are similar to (or better than, depending on
language pair, field, and evaluation system) the ones from Google,
which is still often considered the benchmark for translation quality.
DeepL itself claims to be “the world’s best machine translation.”
SYSTRAN TRANSLATE
• The first company ever to offer machine translation for commercial purposes
(founded in 1968), Systran keeps following the latest technologies and
introducing some interesting innovations itself—the latest being pure neural
machine translation (PNMT). Systran’s free engine, Systran Translate, allows
users to translate their texts “on the go” into more than 140 language
combinations while trusting the power of the open-source community.
MICROSOFT TRANSLATOR

• Much like Google Translate, Microsoft Translator is integrated into
Microsoft’s own search engine, Bing. Moreover, lots of Microsoft products
now include the possibility to translate documents (Office), messages (MS
Teams, Skype), or posts (LinkedIn) between 90 languages and dialects using
their home-brewed MT system, also based on the newest neural network
technology with an attention-based model.
AMAZON TRANSLATE
• Amazon Translate is another online machine translation system from Big Tech.
It is quite young (launched in 2017), yet very powerful. Considering the power
of the parent company, it is not surprising that Amazon Translate has
achieved impressive results in the short time since its release.

• To use Amazon’s machine translation engine, you need an AWS login.


CAN MACHINE TRANSLATION REPLACE HUMAN TRANSLATION?
• Machine translation can replace human translation in a few instances where it
makes sense and is required in high volumes.
• For example, many service-related companies use machine translation to
help customers via an instant chat feature or quickly respond to emails.
However, if you translate more in-depth content, such as web pages or
mobile applications, the translation may be inaccurate. It is important to have
a human translator edit the content before use.
MACHINE TRANSLATION VS. HUMAN TRANSLATION
• It is no longer necessary to decide whether to use MT or
human translation when beginning a project. The concept
of post-editing, that is, the editing of machine-translated
content by a human linguist, is increasingly becoming
accepted by translation professionals.
