Custom Machine translation, a new revolution on the way? (Post-editing & Machine Translation)

Thread poster: Philippe Locquet

Philippe Locquet

Portugal
Local time: 10:24
English to French

Nov 6, 2018

In the translation race, one word has to stand out: quality.

Indeed, having terminology adapted to an end client’s needs is ever more important. That is why many LSPs are implementing quality strategies involving steps such as:
_Managing tailor-made glossaries
_Managing tailor-made Translation Memories
_Assigning the same team of translators to this client
_Having fluent editors capable of giving life to the text
And that is without even mentioning efforts in the DTP field.

Yet in this quality process there is a newly emerging player: Custom Machine Translation.
It is a very different take from what we are generally used to with machine translation. Rather than being a way to replace the translator, CMT (Custom Machine Translation) is starting a small revolution of its own. Allow me to explain:
When Translation Memory was invented, it gave translators CAT tools that helped them work faster and also improved quality, by keeping translations consistent even when different translators worked on the same material.

CMT is entering the scene with a similar approach. How?

Imagine that an LSP starts working for XYZ Soft Drinks. The client comes with its TMs and glossaries and negotiates a massive translation partnership for two years.
The LSP builds a team of translators and sets up the terminology assets (TMs and glossaries). Now it can also set up a CMT: create and train a machine translation engine specifically for this client, to be used solely for XYZ Soft Drinks material. This is done using the provided TMs and glossaries, so the suggestions given by the CMT will adhere to the style used before when translating for this client.
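Training an engine on the client's TMs usually starts with getting the aligned segment pairs out of them. As a minimal sketch, assuming the TMs are exported as TMX 1.4 files, the pairs can be extracted in Python as below. The sample content is invented for the fictional XYZ Soft Drinks client, and whether a particular CMT provider wants its training data in this shape is an assumption, not something stated in the post.

```python
import xml.etree.ElementTree as ET

# ElementTree stores the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text: str, src: str, tgt: str) -> list[tuple[str, str]]:
    """Extract (source, target) segment pairs from a TMX document.

    Languages are compared on their primary subtag, so "en" matches "en-US".
    Translation units missing either language, or with an empty segment, are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens inline markup; a real pipeline would handle tags more carefully.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if segs.get(src) and segs.get(tgt):
            pairs.append((segs[src], segs[tgt]))
    return pairs


# Illustrative TMX content (hypothetical data for the fictional client).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Serve chilled.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Servir frais.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Untranslated draft.</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(tmx_to_pairs(SAMPLE_TMX, "en", "fr"))
```

The second unit has no French side, so it is skipped rather than passed to training as a half-empty pair.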
The translation team now has 3 types of suggestions available to work with:
_XYZ Soft Drinks TM
_XYZ Soft Drinks Glossary
_And XYZ Soft Drinks CMT
As the team translates, they get relevant information that helps them deliver quality translations faster.
After each project, the final versions can be fed back into the CMT to improve its suggestions.
For the translator, this will likely mean getting suggestions from the glossary and TM as usual, and, below a 70% match threshold, getting suggestions from the CMT (as with regular MT).
Indeed, as was the case when TMs started to be used, CMT improves the consistency of style and terminology across translators. It can also tremendously improve the speed at which the work is done, and it will prove relevant as a source of tailored suggestions.
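The suggestion order described above (TM first, CMT below a 70% match) can be sketched as follows. This is a hedged illustration, not how any CAT tool or Kantan MT actually scores matches: `difflib.SequenceMatcher.ratio()` stands in for a CAT tool's fuzzy-match percentage, glossary lookup is left out, and `fake_cmt` is a hypothetical placeholder for a real engine call.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

FUZZY_THRESHOLD = 0.70  # the 70% match threshold mentioned in the post


def best_tm_match(source: str, tm: dict[str, str]) -> tuple[float, Optional[str]]:
    """Return (score, target) for the closest TM entry, with score in [0, 1]."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target


def suggest(source: str, tm: dict[str, str],
            cmt_translate: Callable[[str], str]) -> tuple[str, str]:
    """Return (origin, suggestion): the TM match if it clears the threshold, else the CMT output."""
    score, target = best_tm_match(source, tm)
    if target is not None and score >= FUZZY_THRESHOLD:
        return f"TM {score:.0%}", target
    return "CMT", cmt_translate(source)


def fake_cmt(text: str) -> str:
    # Hypothetical stand-in for a client-specific engine; a real setup would call the provider.
    return f"[CMT] {text}"


tm = {"Serve chilled.": "Servir frais."}
print(suggest("Serve chilled.", tm, fake_cmt))
print(suggest("Shake well before opening.", tm, fake_cmt))
```

The first call returns the exact TM match; the second has no close TM entry, so it falls through to the CMT placeholder.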

What is your take on this subject?

There are a few CMT providers out there; the one that stands out to me (for being user-friendly and effective) is Kantan MT.
If you want to know more about it, you can check their website here: https://kantanmt.com
If you are an LSP and would like training on the topic, check this training page: https://www.proz.com/translator-training/course/16573-give_extra_power_to_your_tms_with_kantan_mt

Hope you'll all find this topic interesting.

My bests 😊

Michael Beijer

United Kingdom
Local time: 10:24
Member (2009)
Dutch to English

best custom MT = GT4T/RyS Translation Workflow Automation Package + DeepL/Google Translate (neural)

Nov 6, 2018

Thanks for the interesting post!

If you ask me, the two best custom MT systems currently available and relevant to freelancers are:

(1) GT4T (https://gt4t.net/en/quick-tasks/ ) +
(2) ‘RyS Translation Workflow Automation Package 2018’ (https://appstore.sdl.com/language/app/rys-translation-workflow-automation-package-2018/822/ ),

combined with:

DeepL and/or
Google Translate (neural) and/or
Microsoft Translator (neural)

GT4T and RYS both contain a very clever feature that allows you to fix MT results with your own glossary. This basically means that you can start with the already very good output of, say, DeepL, and if a particular term is consistently wrong, you just add its preferred replacement to your special glossary (called ‘Simple Glossary’ in GT4T, and ‘Search-and-replace list’ in the RYS plugin). It's fast and easy and extremely useful.
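The core of the feature described above can be approximated as a whole-word substitution applied to the MT output. This minimal Python sketch is not GT4T's or the RYS plugin's actual code; the function name and the example entry are made up for illustration, and it assumes each entry starts and ends with a letter or digit so that the `\b` word boundaries behave.

```python
import re


def apply_post_tweaks(mt_output: str, tweaks: dict[str, str]) -> str:
    """Replace consistently wrong MT terms with the preferred ones.

    Matching is exact and whole-word; longer entries are applied first so a
    multi-word entry is not pre-empted by a shorter entry it contains.
    """
    for wrong in sorted(tweaks, key=len, reverse=True):
        preferred = tweaks[wrong]
        # \b stops an entry from matching inside a longer word; the callable
        # replacement stops re.sub from interpreting backslashes in it.
        mt_output = re.sub(r"\b" + re.escape(wrong) + r"\b",
                           lambda _m, p=preferred: p, mt_output)
    return mt_output


# Illustrative entry (made up, not from Michael's list): a patent 'conclusie'
# that MT renders as 'conclusions' where the preferred term is 'claims'.
tweaks = {"conclusions": "claims"}
print(apply_post_tweaks("The invention is defined in the conclusions.", tweaks))
```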

Michael

[Edited at 2018-11-06 16:52 GMT]

[Edited at 2018-11-06 16:55 GMT]

Kay-Viktor Stegemann
Germany
Local time: 11:24
English to German

In memoriam

And grammar?

Nov 6, 2018

Michael Beijer wrote:
GT4T and RYS both contain a very clever feature that allows you to fix MT results with your own glossary. This basically means that you can start with the already very good output of, say, DeepL, and if a particular term is consistently wrong, you just add its preferred replacement to your special glossary (called ‘Simple Glossary’ in GT4T, and ‘Search-and-replace list’ in the RYS plugin). It's fast and easy and extremely useful.

I don't know this tool, but it sounds to me as if it could work better into English than into other languages, since grammar in English is so simple. Search and replace won't work as easily when you work into German (and probably most other languages), since you cannot simply replace one term with another without considering case, tense, mood, and so on. DeepL is rather impressive for German because it gets the grammar right (well, often), as opposed to most other MT stuff, but do GT4T and RYS also get the grammar right when replacing one term with another?

Daniel Frisano

Italy
Local time: 11:24
Member (2008)
English to Italian

There is a fine line...

Nov 6, 2018

... between posting and advertising, and it looks like it's getting finer by the day...

[Edited at 2018-11-06 18:03 GMT]

Philippe Locquet

Portugal
Local time: 10:24
English to French

TOPIC STARTER

Thanks!

Nov 6, 2018

Thanks for bringing this up. This adds an interesting layer to existing public MTs. That's good for freelancers indeed.
The CMTs I mentioned allow for full customization, by building an MT engine from the ground up (be it an SMT or an NMT). Radical! Of course, the regular freelancer won't be getting the service, but I think it's only a matter of time before freelancers have to learn how to work with these tools, which LSPs will connect them to.

My bests

Philippe Locquet

Portugal
Local time: 10:24
English to French

TOPIC STARTER

Practice makes perfect

Nov 6, 2018

Kay-Viktor Stegemann wrote:

I don't know this tool, but it sounds to me as if it could work better into English than into other languages, since grammar in English is so simple. Search and replace won't work as easily when you work into German (and probably most other languages), since you cannot simply replace one term with another without considering case, tense, mood, and so on. DeepL is rather impressive for German because it gets the grammar right (well, often), as opposed to most other MT stuff, but do GT4T and RYS also get the grammar right when replacing one term with another?

MT engine results depend on:

1/ How clever the programming is. The type of MT can make a difference in how it performs.
The industry is now moving towards Transformer models, a neural MT architecture based on attention mechanisms (rather than on the recurrent networks used in earlier NMT). This is what gives the best results at the moment.
2/ The amount and quality of data used to train the engine. The same engine will produce different results depending on the quality and quantity of data each language pair received. This is why some MTs are known to be better in the legal field, others at EU translations, etc.

Vast subject, isn't it?

My bests

Kay-Viktor Stegemann
Germany
Local time: 11:24
English to German

In memoriam

That is no answer

Nov 6, 2018

I was not asking about the MT; I was asking about the tools Michael mentioned, which work on top of an MT system. And I would like to know whether these tools are able to recognize and apply complex grammar rules in the course of their search-and-replace functions or not. A generic optimistic answer is not helpful.

[Edited at 2018-11-06 19:28 GMT]

Philippe Locquet

Portugal
Local time: 10:24
English to French

TOPIC STARTER

Principles

Nov 6, 2018

Kay-Viktor Stegemann wrote:

I was not asking about the MT; I was asking about the tools Michael mentioned, which work on top of an MT system. And I would like to know whether these tools are able to recognize and apply complex grammar rules in the course of their search-and-replace functions or not.

The principles I described above are exactly what will define how MTs perform. The same general rule will apply.
For the freelancer using them, there are two options:
_Check out the information describing how the tool was built to understand if it will work well (grammar and all) for the use intended.
_Or simply try and see.

Hope this clarifies

Michael Beijer

United Kingdom
Local time: 10:24
Member (2009)
Dutch to English

In my language (Dutch into English) it works surprisingly well.

Nov 6, 2018

You can create either a ‘Pre-tweaking list’ or a ‘Post-tweaking list’. The former substitutes terms before the text is sent to MT; the latter substitutes them afterwards. Here is an example of one of my post-tweaking lists, to show you the kind of thing DeepL keeps getting wrong in my jobs and which the RYS plugin can easily fix (without affecting the overall grammar or quality of the segment at all):

[Screenshot: example of an RYS post-tweaking list]

(higher quality screenshot @ http://beijer.uk/screenshots/RYS-MT-post-tweaking-program.png )

I have no idea how they work, but my wild guess is, with post-tweaking:
DeepL returns its translation, and the program simply looks for any word in the MT output that matches a word in your list and then replaces it with the one in your list. If it's a noun, as it usually is in my case, the replacement usually won't affect the overall quality of the segment.
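Michael's guess about the two list types can be sketched as a three-step pipeline: substitute in the source, send it to MT, substitute in the MT output. In this Python sketch, `fake_mt` is a hypothetical stand-in with a canned answer (not a DeepL API call), matching is exact and whole-word as his description suggests, and the Dutch/English entries are invented for the example.

```python
import re
from typing import Callable


def substitute(text: str, table: dict[str, str]) -> str:
    """Exact whole-word substitution, longest entries first."""
    for term in sorted(table, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(term) + r"\b",
                      lambda _m, r=table[term]: r, text)
    return text


def translate_with_tweaks(source: str, machine_translate: Callable[[str], str],
                          pre: dict[str, str], post: dict[str, str]) -> str:
    tweaked_source = substitute(source, pre)    # pre-tweaking: edit the source first
    raw_mt = machine_translate(tweaked_source)  # the MT engine only sees the tweaked source
    return substitute(raw_mt, post)             # post-tweaking: edit the MT output


def fake_mt(text: str) -> str:
    # Hypothetical stand-in for an MT engine: a fixed lookup table.
    canned = {"De conclusies zijn nieuw.": "The conclusions are new."}
    return canned.get(text, text)


print(translate_with_tweaks("De eisen zijn nieuw.", fake_mt,
                            pre={"eisen": "conclusies"},
                            post={"conclusions": "claims"}))
```

The design point is ordering: a pre-tweak changes what the engine sees (and so can change its whole output), while a post-tweak only touches the words it matches in the finished translation.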

Michael

PS: some other RYS screenshots, re their other SDL app (‘RyS Termbase & Translation Assembler’, which I use as a super-charged Studio termbase replacement):

http://beijer.uk/screenshots/RyS.png
http://beijer.uk/screenshots/RYS1.PNG
http://beijer.uk/screenshots/RYS3.PNG

info @ https://appstore.sdl.com/language/app/rys-termbase-translation-assembler/766/

[Edited at 2018-11-06 19:47 GMT]

Kay-Viktor Stegemann
Germany
Local time: 11:24
English to German

In memoriam

So it's just a simple search and replace without grammar

Nov 6, 2018

Michael Beijer wrote:

DeepL returns its translation, and the program simply looks for any word in the MT output that matches a word in your list and then replaces it with the one in your list. If it's a noun, as it usually is in my case, the replacement usually won't affect the overall quality of the segment.

That's what I thought, and that's why I assume that it might work well into English, but not into German (or other languages with more complex grammar). The problem in German is that one term (noun or other) can appear in many different notations, for example for the different cases in German or within compound words, so that any simple search-and-replace algorithm will not be able to recognize the term when it is not spelled exactly as in your list, and neither to replace it with the correct form of the target term.
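Kay-Viktor's objection can be made concrete. The Python sketch below uses plain exact whole-word substitution (an assumption about how such lists behave, based on Michael's guess, not a test of GT4T or RYS) with a made-up glossary entry 'Datei' → 'Dokument'. It shows both problems: the article does not follow the new noun's gender, and inflected or compound forms are not matched at all.

```python
import re


def naive_replace(text: str, term: str, preferred: str) -> str:
    """Exact whole-word replacement with no morphological analysis."""
    return re.sub(r"\b" + re.escape(term) + r"\b", lambda _m: preferred, text)


# 'Datei' is feminine and 'Dokument' is neuter, so the article should become 'Das'.
print(naive_replace("Die Datei wurde gespeichert.", "Datei", "Dokument"))
# -> Die Dokument wurde gespeichert.

# Plural and compound forms do not match the exact entry, so they are left untouched.
print(naive_replace("Alle Dateien und der Dateiname bleiben.", "Datei", "Dokument"))
# -> Alle Dateien und der Dateiname bleiben.
```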

Philippe Locquet

Portugal
Local time: 10:24
English to French

TOPIC STARTER

Hence, makes sense

Nov 6, 2018

Thanks Michael,

My guess is that you are right, because there is no way to directly "tweak" DeepL at the moment.

That's why I think customizable MT brings new things to the table: you can adapt it to your liking (at the MT level):
_You can upload glossaries to help the MT with term selection
_You can upload TMs so the MT will adapt its style and terminology
_You can upload exclusion word lists
_You can get reports to further tune terminology
_The MT adapts while you translate, so it gets better as it learns from the human translator
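The glossary and reporting features listed above are provider-specific, and the sketch below does not reproduce Kantan MT's reports. It only illustrates the general idea of a terminology check in Python: flag each segment whose source contains a glossary term but whose translation lacks the required target term. The names, the substring matching, and the sample data are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TermIssue:
    segment_index: int
    source_term: str
    expected_target: str


def terminology_report(segments: list[tuple[str, str]],
                       glossary: dict[str, str]) -> list[TermIssue]:
    """Flag segments whose source has a glossary term but whose target lacks the required term.

    Matching is case-insensitive substring search: crude (it misses inflected
    forms and can over-match inside longer words), but enough for a first pass.
    """
    issues = []
    for i, (source, target) in enumerate(segments):
        src_lower, tgt_lower = source.lower(), target.lower()
        for src_term, tgt_term in glossary.items():
            if src_term.lower() in src_lower and tgt_term.lower() not in tgt_lower:
                issues.append(TermIssue(i, src_term, tgt_term))
    return issues


# Hypothetical segments and glossary for the fictional client.
segments = [
    ("Serve chilled.", "Servir frais."),
    ("Serve chilled with ice.", "Servir bien froid avec des glaçons."),
]
glossary = {"chilled": "frais"}
for issue in terminology_report(segments, glossary):
    print(issue)
```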

So in time, this can help avoid the pitfalls of third-party MT, since you are building your own engine, tuned to your liking.

(BTW, I know for sure these functions exist in Kantan MT since I've used them; they may exist in other tools in different forms. The training materials you create for that purpose do not require programming skills to handle, in Kantan anyway.)

It will be interesting to see where all this goes. At any rate, many LSPs are in the process of asking their top reviewers to edit CMT output to tune it to their needs. It seems CMT could become a big player on the scene. We'll see.

Thanks for the helpful information


Daniel Frisano

Italy
Local time: 11:24
Member (2008)
English to Italian

Question

Nov 7, 2018

A simple yes/no question to all MT fans: Would you read a novel or an article written by a machine?

Mirko Mainardi

Italy
Local time: 11:24
Member
English to Italian

Freelance Post-Editors

Nov 7, 2018

Fi2 n Co wrote:

That's good for freelancers indeed.

Indeed, if the freelancers you're referring to are "post-editors", not translators...

Samuel Murray

Netherlands
Local time: 11:24
Member (2006)
English to Afrikaans

Going a bit off-topic here...

Nov 7, 2018

Daniel Frisano wrote:
A simple yes/no question to all MT fans: Would you read a novel or an article written by a machine?

Article: no (except for gisting)
Novel: yes (if the text makes sense, even if the original meaning is lost)

Anton Konashenok

Czech Republic
Local time: 11:24
French to English

Customising the custom solution is a big project

Nov 8, 2018

In the last year or maybe a bit longer, one of my clients, a very large agency, has been regularly approaching me with offers to post-edit the output of their customised MT, with engines specifically trained for individual end customers (or so I was told). I don't normally do post-editing, but I am willing to give a client the benefit of the doubt, so I asked for samples. All of them, bar none, were total garbage, easier to retranslate from scratch than to fix. When I gave my feedback, the project managers still tried to evangelise me anyway, claiming that their 'metric' showed these translations to be '96% human' or something like that, and that in the long run it would save me a great deal of time compared to straight translation.

In my opinion, this situation is symptomatic of the whole MT industry. Firstly, the proponents of MT are inventing quantitative measures of MT that are calculated automatically by comparing MT output with human text, which is already a deeply flawed concept: quite often, changing just one word in a long phrase might totally pervert its meaning or make it stylistically unacceptable. Secondly, the reference translations used to train the MT engine and to assess the quality of its output are typically produced by mediocre translators, because top-flight professionals are few and their services cost a lot, and employing them (if they agree to get involved) would defeat the very purpose of MT (saving money). Just like with dictionaries, and maybe even to a greater extent, the training corpus of an MT engine is only as good as its worst entry, and this is why using only the very best translators is indispensable. MT proponents are stubbornly refusing to accept it. Compare it to the human world: whatever we teach, the students at the end of the course are typically still well below the level of their teacher.
In extremely rare cases when students exceed the teacher right away (rather than after many years of practice), it usually means they have gained extra insight elsewhere. In artificial intelligence in general, and in MT in particular, there is an additional complicating factor - AI systems still have a loooong way to go before they will be able to *understand* anything rather than fake the understanding, which - surprise surprise! - is also one of the major factors distinguishing a good human student from a mediocre one.
These qualitative arguments may be continued ad nauseam. To the MT community, I will just give a quantitative educated guess based on more than three decades of professional translation work: if you want to produce a custom MT engine of reasonably good quality, your training corpus should include on the order of 1 million words, all produced by top specialists in the given field, with as few repetitions as possible.
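Anton's first objection, that automatic scores comparing MT output with a human reference can miss a one-word change that reverses the meaning, is easy to illustrate. The toy Python metric below is a word-level edit-distance similarity (1 minus the word error rate against a single reference); it is not BLEU and not the agency's metric, but purely surface-based scores in general can behave similarly.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distance from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(prev[j] + 1,             # delete the hypothesis word
                          curr[j - 1] + 1,         # insert the reference word
                          prev[j - 1] + (h != r))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(hypothesis: str, reference: str) -> float:
    """1 - word error rate, computed against a single reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    return 1.0 - word_edit_distance(hyp, ref) / len(ref)


reference = "The patient must not take this medicine with alcohol"
hypothesis = "The patient must take this medicine with alcohol"
print(f"{similarity(hypothesis, reference):.1%}")  # prints 88.9%
```

The hypothesis gives the opposite instruction, yet it differs from the reference by a single deleted word and still scores 88.9%.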

[Edited at 2018-11-08 15:16 GMT]
