Software misnomers and how to use MT
Thread poster: Phil Hand
heikeb
heikeb  Identity Verified
Member (2003)
English to German
+ ...
:-) Sep 22, 2012

OK, Phil, try this one:

Winkler lack. So goes it off in Hannover, and many a anticipates evil. "Get but time someone the Winkler", shouts Claudia Roth and beckons in the hall. Oh dear! After all is Werner Winkler the man been, the after Roths Selbstausrufung zur top candidates Anwärterin with his basis appliation for this provided has, that the green primary election dynamic itself at all in movement placed. But Werner Winkler, Ortsverbandschef in the Swabian Waiblingen, dives then but suddenly of in the Hanoverian Apostelhalle. An bit cheezy views he of. Perhaps is it the nervousness, well up of the stage and "please smile" fürs large Bewerberfoto. The gives already time applause of the perhaps three hundred Zuschauern.


My lookup tool was the online site Linguee.de. This lookup tool can probably provide better results than "normal" tools as it uses statistics to find the most common meaning of words, but it can also mislead ("auf" with first translation "of"??). But the above section illustrates several other issues no word lookup tool can handle.

- German separable verbs. "goes it off" - "geht es los" > "losgehen" = to start. "views he of" - "sieht er aus" > "aussehen" = to look (a certain way).
And German is full of those verbs. In this example, the separable prefix is pretty close to the verb, but often there are 10 or more words between the verb and its prefix, because in certain constructions the prefix is placed at the very end of the sentence while the verb stays close to the beginning.
- The online tool does not distinguish between capitalization, probably because many Germans ignore capitalization rules completely when writing informally. This results in "mal" being interpreted as "Mal" and translated as "time". However, the site's first translation for "mal" is "times" - not better at all. The actual meaning of this little "flavoring" term is missing completely.
- Germans write compounds as single words. Any word lookup tool must at least be able to analyze compounds because there is no way that all possible compounds could be found in a dictionary. And that algorithms for compounds are not always without pitfalls can be seen in the Word spell-checker. "Bewerberfoto" is one example the lookup tool couldn't handle.
- The tool also didn't find several inflected forms such as "Zuschauern" at the end. A bit surprising since it seemed to find generally all standard inflections, but not in this case. "Fürs" is a very common contraction, which wasn't found either.
- Idiomatic expressions consisting of several words. The above example doesn't really have any of those, but it's obvious that a word lookup tool would have difficulties with those.
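
To make the compound point in the list above concrete, here is a minimal Python sketch of dictionary-based compound splitting. Everything in it is an assumption for illustration: the toy lexicon, the list of linking elements and the name split_compound are made up, and nothing here reflects how Linguee or any MT system actually works.

```python
from functools import lru_cache
from typing import List, Optional

# Hypothetical toy lexicon (lower-cased); a real tool would load a full dictionary.
LEXICON = {
    "bewerber": "applicant",
    "foto": "photo",
    "apostel": "apostle",
    "halle": "hall",
    "ort": "place",
    "verband": "association",
    "chef": "chief",
}

# Common German linking elements between compound parts (assumed, not exhaustive).
LINKERS = ("", "s", "es", "n", "en")


def split_compound(word: str) -> Optional[List[str]]:
    """Split a compound into known lexicon entries, or return None if impossible."""

    @lru_cache(maxsize=None)
    def solve(rest: str):
        if rest in LEXICON:
            return (rest,)
        # Try the longest possible first part, then recurse on the remainder.
        for i in range(len(rest) - 1, 1, -1):
            head = rest[:i]
            if head not in LEXICON:
                continue
            tail = rest[i:]
            for linker in LINKERS:
                if tail.startswith(linker):
                    parts = solve(tail[len(linker):])
                    if parts:
                        return (head,) + parts
        return None

    result = solve(word.lower())
    return list(result) if result else None


def gloss(word: str) -> str:
    """Gloss a compound part by part; unknown words are returned unchanged."""
    parts = split_compound(word)
    if parts is None:
        return word
    return " + ".join(LEXICON[p] for p in parts)


if __name__ == "__main__":
    for w in ("Bewerberfoto", "Ortsverbandschef", "Zuschauern"):
        print(w, "->", gloss(w))
```

Even this toy version shows the pitfall: the splitter accepts any segmentation the lexicon allows, so with a large dictionary it will happily produce wrong splits unless candidates are ranked somehow.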

OK, forgot the Google Translate version (translation and source below). But I must say in defense of MT, GT is by far the worst MT tool for English>German I know.

Winkler missing. So it starts in Hanover, and some guessed it wrong. "Get some someone to Winkler," calls Claudia Roth and waves in the room. Oh dear. After all, Werner Winkler has been the man who according to Roth's self-proclaimed leading candidate-candidate provided its base application that the green primary election dynamics at all into motion. But Werner Winkler, head of the local chapter Swabian Waiblingen dips, but then suddenly in the Hanoverian Apostle Hall. A bit cheesy he looks. Maybe it's the nervousness, or "cheese" up on stage for the big picture candidates. This is ever the applause of perhaps three hundred spectators.

Here's Systran's version:
Winkler is missing. Thus it goes loosely into Hanover, and some suspects already bad. “Get nevertheless times someone to the Winkler”, calls Claudia Roth and signs into the hall. Oje. Werner Winkler was nevertheless the man, who ensured after Roths self proclaiming to the leading candidate candidate with its basis application that the green primary election dynamics sat down at all in motion. But Werner Winkler, Ortsverbandschef in the Swabian Waiblingen, dips then nevertheless suddenly on in the Hanoverians apostle hall. It looks a little käsig. Perhaps it is nervousness, therefore up on the stage and “please smile” for the large applicant photo. That gives already applause from that perhaps three hundred spectators.

And this is Reverso (which takes the easy way out and lets the reader pick the correct translations themselves):
Winkler is absent(lacking). So it is off in Hannover, and some already anticipates bad person(evil). " Get, nevertheless, somebody the Winkler ", calls Claudia Roth and waves in the hall. Oje. Nevertheless, Werner Winkler has been the man(husband) who has cared after Roths selfproclamation to the leading candidate's pretender with his(its) basis application for the fact that the green primary election dynamism started to move actually. But, nevertheless, then Werner Winkler, local association chief in the Swabian Waiblingen, emerges suddenly in the Hanoverian apostle's hall. A little bit cheesily he(it) looks. Perhaps it is the nervousness, also scuffle on the stage and " request smile " for big applicant's photo. This already gives applause of perhaps 300 spectators.

My translation of the German source:
Winkler is missing. This is how it starts in Hanover, and already some are having misgivings. "Somebody go and get Winkler", calls Claudia Roth and waves into the hall. Oh dear! After all, Werner Winkler was the one who, after Roth's self-proclamation as aspirant for top-candidate, mobilized the primary election dynamics of the Green party with his candidature for the Green grassroots movement. But then Werner Winkler, leader of the local chapter of Waiblingen in Swabia, suddenly shows up after all in the Apostle Hall in Hanover. He looks a bit pale. Maybe it's nervousness, therefore up on the stage and "say cheese" for the big photo with all candidates. This earns them already the first applause of the maybe three hundred spectators.

Source:
Winkler fehlt. So geht es los in Hannover, und mancher ahnt schon Böses. "Hol doch mal jemand den Winkler", ruft Claudia Roth und winkt in den Saal. Oje. Immerhin ist Werner Winkler der Mann gewesen, der nach Roths Selbstausrufung zur Spitzenkandidaten-Anwärterin mit seiner Basis-Bewerbung dafür gesorgt hat, dass die grüne Urwahl-Dynamik sich überhaupt in Bewegung setzte. Aber Werner Winkler, Ortsverbandschef im schwäbischen Waiblingen, taucht dann doch plötzlich auf in der Hannoveraner Apostelhalle. Ein bisschen käsig sieht er aus. Vielleicht ist es die Nervosität, also rauf auf die Bühne und "Bitte lächeln" fürs große Bewerberfoto. Das gibt schon mal Beifall von den vielleicht dreihundert Zuschauern.






[Edited at 2012-09-22 07:48 GMT]


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 07:14
Chinese to English
TOPIC STARTER
MT still doing no better than look up Sep 22, 2012

Hi, Heike.

Well, I have a couple of comments there. First, I asked for a comparison between a look-up tool and an MT tool. You haven't supplied it, so I will.

Original text:
http://www.spiegel.de/politik/deutschland/gruene-bewerber-fuer-spitzenkandidatur-stellen-sich-in-hannover-vor-a-857219.html
Winkler fehlt. So geht es los in Hannover, und mancher ahnt schon Böses. "Hol doch mal jemand den Winkler", ruft Claudia Roth und winkt in den Saal. Oje. Immerhin ist Werner Winkler der Mann gewesen, der nach Roths Selbstausrufung zur Spitzenkandidaten-Anwärterin mit seiner Basis-Bewerbung dafür gesorgt hat, dass die grüne Urwahl-Dynamik sich überhaupt in Bewegung setzte.

Aber Werner Winkler, Ortsverbandschef im schwäbischen Waiblingen, taucht dann doch plötzlich auf in der Hannoveraner Apostelhalle. Ein bisschen käsig sieht er aus. Vielleicht ist es die Nervosität, also rauf auf die Bühne und "Bitte lächeln" fürs große Bewerber-Foto. Das gibt schon mal Beifall von den vielleicht dreihundert Zuschauern.

GT translation:

Winkler missing. So it starts in Hanover, and some guessed it wrong. "Get some someone to Winkler," calls Claudia Roth and waves in the room. Oh dear. After all, Werner Winkler has been the man who according to Roth's self-proclaimed leading candidate-candidate provided its base application that the green primary election dynamics at all into motion.

But Werner Winkler, head of the local chapter Swabian Waiblingen dips, but then suddenly in the Hanoverian Apostle Hall. A bit cheesy he looks. Maybe it's the nervousness, or "cheese" up on stage for the big-picture candidates. This is ever the applause of perhaps three hundred spectators.


Now, I have deliberately not read your version yet, because I don't want to spoil my perfect lack of comprehension. I've got to tell you that once again, the look up is performing no worse than the MT. Obviously the word-by-word look up that you generated was incomprehensible. But I've now read the MT version, and I still don't have the foggiest idea what this story is about. "Maybe it's the nervousness, or "cheese" up on stage for the big-picture candidates"?! Not the foggiest idea.

Honestly, I put this challenge out there, I thought someone might come back with a reasonable counterexample, but I'm 3-0 as of now.

And that's without even going into all the problems with your look-up approach. To start with, you didn't do a proper look-up, because you've got inflections in there.

Secondly, remember that the point of a word-by-word look up is not to create a text. That's the illusion that MT creates - that it can create a target language text. One of the positive features of a look-up is that it doesn't pretend to make a text.

Thirdly - everything you've said about the problems of a look-up tool is right. If we were to try to build an effective, functioning text look-up tool, we would solve those problems.

Fourth - even without a good look-up tool available, you made some spectacularly poor look-ups. The only result for "fehlt" is that it's a part of the verb "fehlen", for which the top definition is "to be absent" or "to be missing". So I think you've not been quite fair in terms of the look ups. Let me try (still without reading your translation, so I have no idea what this story is about).

Winkler/ to be missing./ So/ go/ it/ off/ in/ Hannover,/ and/ many a/ forbode/ already/ evil./ "(excl.)/ but/ times/ someone/ the/ Winkler./" shout/ Claudia Roth/ and/ beckon/ in/ the/ hall./ oh dear./ for all that/ is/ Werner Winkler/ the/ man/ is,/ the/ to/ Roth's/ self-proclaimed/ to/ top candidate-/ aspirant/ with/ his/ base-/ application/ for it/ provide/ has,/ that/ the/ green/ election-/ dynamic/ itself/ at all/ in/ motion/ set.
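
For reference, the dumbest form of such a look-up can be sketched in a few lines of Python. The mini-dictionary is hypothetical (its glosses are simply the ones used in the stream above), and look_up_stream is an invented name, not an existing tool. Unknown words are passed through untouched, and no attempt is made to build a sentence.

```python
import re

# Hypothetical mini-dictionary: inflected form -> (lemma, first-listed gloss).
DICTIONARY = {
    "fehlt": ("fehlen", "to be missing"),
    "so": ("so", "so"),
    "geht": ("gehen", "go"),
    "es": ("es", "it"),
    "los": ("los", "off"),
    "in": ("in", "in"),
}

# Leading punctuation, the word itself, trailing punctuation.
TOKEN = re.compile(r"^(\W*)(.*?)(\W*)$")


def look_up_stream(text: str) -> str:
    """Replace each word by its first dictionary gloss and join with slashes."""
    glosses = []
    for token in text.split():
        lead, core, trail = TOKEN.match(token).groups()
        entry = DICTIONARY.get(core.lower())
        # Words without an entry (names, unknown forms) are left as they are.
        gloss = entry[1] if entry else core
        glosses.append(lead + gloss + trail)
    return "/ ".join(glosses)


if __name__ == "__main__":
    print(look_up_stream("Winkler fehlt. So geht es los in Hannover"))
```

The output is deliberately a stream of glosses, not a text; the reader does the interpreting.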


I cannot tell whether I'm looking with a prejudiced eye, but I honestly think that I can get a little bit of an idea from my look-up, while I got nothing from the GT version.

Here's what I think it means from my look up version:
Winkler is missing, it's all happening in Hannover, many people are worried, but whatever happens, Winkler is the most important candidate for the Greens in this election, and that's what Roth says, too.

OK, I'm now going to read Heike's translation, to find out what it actually means...

...

OK - Heike, I have to tell you, I don't understand your version that well. "Roth's self-proclamation" is weird. Anyway, I had failed to understand from either the look up or from GT that both Winkler and Roth are candidates. That's huge, vital semantic information. It wasn't there in the look-up, and it wasn't there in the MT version.

I've yet to see MT outperform the dumbest word-by-word look up - and this is for German! One of the most intensely studied languages in the world. If you tried this kind of MT test in my pair, where the basis of good training texts is much thinner, the look up tool would win hands down every time.


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 07:14
Chinese to English
TOPIC STARTER
Systran seems to do best Sep 22, 2012

You updated your post as I was writing mine there, so we've got some cross-over.

That Systran version seems a little better. I guessed from the Systran version that Roth was a candidate, which I had not understood from any other version (except your human translation).

So, maybe, sometimes, Systran could outperform word-by-word look up. Hallelujah, that's a result for how many millions of hours of research?

I'm afraid the MT community has set their sights very low, and failed to achieve even those low targets. I'll take a dictionary every day of the week and twice on Sundays.


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 01:14
Multiplelanguages
+ ...
knowledge-based (semantic) MT Sep 22, 2012

Heike Behl, Ph.D. wrote:
There is a lot of semantic and grammar knowledge in MT. That's why it's important to have user-definable dictionaries.


Thanks Heike for pointing this out.

I'm giving a presentation this coming week at the Proz.com Virtual conference (I recall it is a session on Thursday) about the old and new generations of MT and questions and concerns raised about them.
The video of the presentation is done and will be shown, and I believe the session includes a live Question/Answer time at the end.
The talk mentions the different types of MT systems, including semantic/knowledge based MT systems and when they were in fashion, comparing them to various new types of stat-based systems and rule-based systems.

That presentation aims to answer many questions in this thread as well and dozens to hundreds of other threads over the past 15 years on this and related points.

Jeff


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 01:14
Multiplelanguages
+ ...
Systran non-customized or customized Sep 22, 2012

@Phil,

There are non-customizable versions of MT (the online portals) and customizable versions (open-source software and paid software programs). I explain these and give examples in my talk on MT at the ProZ Virtual Conference next week (mentioned in post above).

All of your examples seem to come from non-customizable MT versions. This is why the vocabulary and terminology do not correspond to what you expect.

I have always used and always recommend using (and have given MT tutorials on it) "customizable" versions of MT systems to override the general standard vocabulary in the system with customer/user required terminology. It's the same as doing a general domain specific translation or performing the same translation task with customer-specific or sub-domain specific terminology.

Using any of the online portals will always disappoint and is not worth using for any type of comparison tests, because they simply are not designed to offer the possibility to enrich the system with specific terminology.

All MT software product review articles I have written (Systran, Reverso, PROMT, etc) are based on testing customizable software which costs in the range of 50-500 $USD depending on the level and number of features included.

Jeff


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 01:14
Multiplelanguages
+ ...
MT business model - free vs paid features Sep 22, 2012

Phil Hand wrote:

So, maybe, sometimes, Systran could outperform word-by-word look up. Hallelujah, that's a result for how many millions of hours of research?

I'm afraid the MT community has set their sights very low, and failed to achieve even those low targets. I'll take a dictionary every day of the week and twice on Sundays.


@Phil,

It's not the case that millions of hours of research have resulted in the development of such limited vocabulary.

It's the result of a business model. Other software publishers sometimes do the same with their software/services.

Many (but not all) MT providers give away for "free" on online portals a baseline version of their MT system, with a limited standard dictionary (about 100,000 words) which cannot be customized at all. You get the minimum.

And then they provide "paid features" in their commercial software which give extended dictionaries, domain specific dictionaries, user-dictionary customizable features, etc.

They all put the maximum effort in the sets of paid features and just simply update the free system with minimal, limited updates on an ongoing basis.

That is why "some" MT providers do not provide free, very limited versions on their websites because they only want to focus on showing the customized and customizable extended versions.

Jeff


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 07:14
Chinese to English
TOPIC STARTER
Don't see the relevance to my challenge Sep 22, 2012

Jeff - I appreciate that a customised MT system will do certain jobs better than a non-customised system.

But I think you're missing the point of the challenge I made:

"I'd love to see a comparison of an MT tool against a tool that did exactly that - dictionary look-up on every word. I bet that the look-up tool would perform just as well as MT..."

Sure, you can make a big topic-specific glossary. That's always going to be helpful. My point is that having made that glossary, the "MT" part of "customised MT" is then going to add no value at all. If you use your topic glossary directly as a part of your dictionary, and do what I suggested, dictionary look-up on every word in a text, you'll get strings from which a human reader can extract X% of the information in the original text.

My contention is that if you use your customised MT, a human reader will be able to extract at most X% of the information in the original text.

That's my claim, and it's a fairly strong claim. It should be fairly straightforward to disprove, because it surely can't be true that after all these years, MT has still not succeeded in creating any value at all, can it?


 
heikeb
heikeb  Identity Verified
Member (2003)
English to German
+ ...
@Phil Sep 22, 2012

Phil Hand wrote:

Thirdly - everything you've said about the problems of a look-up tool is right. If we were to try to build an effective, functioning text look-up tool, we would solve those problems.


First of all: You underestimate these problems vastly. I guess that's why you don't give MT systems credit for being able to handle them much better than word lookup tools. It would take years of research and development to get a tool to be able to handle all these issues only half-way satisfactorily.

Furthermore, if you would add that kind of functionality to a word lookup tool, it wouldn't be a word lookup tool anymore, but an MT system. In order to be able to cope with those issues, you need a grammatical, syntactical sentence analysis that is able to figure out that in a sentence like this:
"... taucht dann doch noch auf in der..."
the two bold words ("taucht" and "auf") are actually part of one and the same word "auftauchen". Only then can you get the correct word lookup. And as I mentioned, these words can be separated from each other by many more words, subordinate clauses, relative clauses, etc. This is one of the main reasons GT, trying to match longer strings instead of trying to make syntactical and grammatical sense of individual words, has so many problems with German.
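
As a rough illustration of what re-attaching a separated prefix involves, here is a deliberately naive Python sketch. The word lists and the name find_separable_verbs are hypothetical, and the approach (scan to the right of a finite verb for a prefix that forms a known verb) is a stand-in for real syntactic analysis, not a description of any actual MT system.

```python
from typing import List, Tuple

# Hypothetical toy data: finite form -> infinitive of the base verb.
FINITE_FORMS = {"taucht": "tauchen", "geht": "gehen", "sieht": "sehen"}

# Hypothetical dictionary of separable verbs with one gloss each.
SEPARABLE_VERBS = {
    "auftauchen": "to turn up",
    "losgehen": "to start",
    "aussehen": "to look",
}

# A few common separable prefixes (assumed, far from complete).
PREFIXES = {"auf", "los", "aus", "an", "ab", "ein", "mit", "vor", "zu"}


def find_separable_verbs(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (verb index, prefix index, infinitive) for each naive match."""
    hits = []
    for i, token in enumerate(tokens):
        base = FINITE_FORMS.get(token.lower())
        if base is None:
            continue
        # Scan everything to the right of the verb for a matching prefix.
        for j in range(i + 1, len(tokens)):
            candidate = tokens[j].lower()
            if candidate in PREFIXES and candidate + base in SEPARABLE_VERBS:
                hits.append((i, j, candidate + base))
                break
    return hits


if __name__ == "__main__":
    sentence = "Aber Werner Winkler taucht dann doch plötzlich auf in der Hannoveraner Apostelhalle"
    for verb_at, prefix_at, infinitive in find_separable_verbs(sentence.split()):
        print(verb_at, prefix_at, infinitive, "=", SEPARABLE_VERBS[infinitive])
```

The weakness is easy to see: "auf" is also an ordinary preposition, so with a full dictionary this scan would also pair "geht" with the preposition in "geht auf die Bühne" and report "aufgehen". Telling the two apart is exactly where the sentence analysis comes in.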

The same is true for the rest of the issues. The matching of source with target words happens in MT systems as well, except for systems like GT, which try to match longer strings in huge bilingual corpora. But it is the smallest, easiest, most mechanical function of them all. It's everything else that is necessary for a translation that is difficult to solve.

German might be one of the most studied languages, but it is also one of the most complex to handle with MT (speaking only of the languages usually handled with MT: FIGPS, Japanese, Chinese; I can't say anything about rarer languages, but I wouldn't be surprised if this were still true).


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 07:14
Chinese to English
TOPIC STARTER
Still not comparing like with like Sep 23, 2012

Heike, you're not getting the point that saying "it's difficult to achieve perfection" doesn't constitute an argument. Yes, of course it's difficult to achieve perfection in either MT or even just in the simpler task of dictionary use. You're right to say that perfect dictionary use actually requires a lot of linguistic knowledge. That's why I've at each point suggested using the very dumbest dictionary tool - just word-by-word, nothing else. And it seems that word by word might actually be as good as MT.

Heike Behl, Ph.D. wrote:

Furthermore, if you would add that kind of functionality to a word lookup tool, it wouldn't be a word lookup tool anymore, but an MT system. In order to be able to cope with those issues, you need a grammatical, syntactical sentence analysis that is able to figure out that in a sentence like this:
"... taucht dann doch noch auf in der..."
the two bold words ("taucht" and "auf") are actually part of one and the same word "auftauchen". Only then can you get the correct word lookup. And as I mentioned, these words can be separated from each other by many more words, subordinate clauses, relative clauses, etc.


Quite. So we have two choices: 1) produce a text that can seem like real sentences, but might well have misinterpreted these grammatical relations, and thus could send very incorrect semantic signals. Or 2) produce not a text, but a stream of dictionary results that readers can interpret at their own risk.

2) seems to me to be as effective and more honest.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 01:14
Member (2006)
English to Afrikaans
+ ...
Some comments on my example Sep 23, 2012

Samuel Murray wrote:
Afrikaans: 'n Begraafplaas word ook soms 'n kerkhof genoem, omdat begraafplase aanvanklik meestal in die hof (= tuin of erf) van 'n kerk aangelê is. Binne die begraafplase vind 'n mens soms 'n sogenaamde kerkhofhuis met begroetings- of afskeidsruimte waar daar kort voor die teraardebestelling aan die oorledene die laaste eer betoon word.

Google Translate: A cemetery is sometimes called a graveyard, because cemeteries initially mostly in the court (= garden or yard) of a church laid out. Within the cemeteries one finds sometimes a called kerkhofhuis with greetings or goodbyes space where there shortly before the burial pay the last respects to the deceased.

Dictionary (word for word): A cemetery become also sometimes a graveyard called, because cemeteries initially mostly in the court ( = garden or plot) of a church have-flair-for is. Inside the cemeteries find a human sometimes a so-called graveyard-house with greetings- or separation-space where there short in-front the interment on the deceased the last honour show become.

My translation: A cemetery is also sometimes called a "kerkhof" (church court), because in the old days cemeteries were often found inside a church yard ("hof" is an old word for garden or yard). One thing often found within a cemetery is a so-called "kerkhofhuis" (cemetery house) with special rooms where those attending the burial can say their farewells and pay their last respects.


Allow me to point out some issues:

1. GT correctly realised that "genoem word" (is called) is the verb, whereas a dictionary translates "word" as a separate word (i.e. become), which has nothing to do with the actual verb. To explain in English, a rough dictionary translator will create the impression that "in cognito" means "inside a cognito", because in = inside.

2. The primary dictionary interpretation of "aangelê" is that it is the verbal form of the noun "aanleg", which means flair or skill. GT correctly realises that it is simply the past tense of "aanlê", which means to lay out (i.e. to create a layout).

3. Both GT and the dictionary method were fooled by the position of the Afrikaans verb "aangelê is", not realising that the verb in English should be directly after "cemeteries".

4. I'm surprised that GT mistranslated "sogenaamde" (so-called).

5. GT correctly translated "mens" (literally "human") as "one".

6. I cheated a bit with the dictionary -- "kerkhofhuis" is not in the dictionary, but "kerkhof" is and so is "huis".

7. I'm surprised that the dictionary's primary translation of "afskeid" was "separation", because my (presumably contextless) first impression of the word would have been the same as GT's (which is correct). The dictionary translation of "afskeid" is terribly wrong.

8. GT correctly realised that "voor die" means "before the" and not "in-front-of the".

9. GT correctly translates "betoon word" (to show) as a single verb, whereas the dictionary would translate "word" (become) as a separate word.

10. GT correctly translates "laaste eer" as "last respects", whereas the dictionary translates "eer" literally, and the literal translation is "honour", which IMO is sufficiently off the mark to be called a miss.

11. The dictionary translation of "teraardebestelling" (interment) is literally more correct, but I think I would have used "burial" (GT's translation) myself as well.

12. Both GT and the dictionary opted for a literal translation of "aanvanklik", whereas I would have gone for a figurative meaning that would imply "in days of old" or "in centuries past", which is what is implied by the Afrikaans (but only implied).

13. Both GT and the dictionary translated "begroeting" as "greeting", but the Afrikaans word specifically means (implies very strongly) the greeting one gives to a person who is leaving, not coming.

Samuel


[Edited at 2012-09-23 19:43 GMT]


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 07:14
Chinese to English
TOPIC STARTER
Thanks, Samuel Sep 23, 2012

I think that kind of detailed work is really interesting, and very useful in showing us what value the tools can offer.

One thing that has emerged from this is the importance of identifying what are sometimes called "lexemes" - the correct unit for dictionary look up. A lexeme is often a word, but it can be a phrase; in German it can even be part of a word. This is also an issue in my pair, because Chinese is written without spaces.

My rants and my little bet notwithstanding, it's clear that completely dumb word-by-word look up is deficient in many ways. If a computer can correctly analyse a sentence into lexemes, then that would be a considerable benefit.
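
To show what lexeme identification looks like when there are no spaces at all, here is a minimal Python sketch of forward maximum matching, a classic baseline for Chinese word segmentation. The tiny lexicon and the name segment are assumptions for illustration only; real segmenters use much larger dictionaries and statistical disambiguation.

```python
from typing import List, Set

# Hypothetical toy lexicon of known lexemes.
LEXICON = {"机器", "翻译", "机器翻译", "很", "难", "词典"}


def segment(text: str, lexicon: Set[str] = LEXICON, max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: always take the longest known lexeme."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                pieces.append(piece)
                i += size
                break
    return pieces


if __name__ == "__main__":
    print("/ ".join(segment("机器翻译很难")))
```

Because the longest match wins, "机器翻译" (machine translation) is looked up as one unit and not as "机器" (machine) plus "翻译" (translation). The same greediness is also the method's known weakness on ambiguous strings.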


 
heikeb
heikeb  Identity Verified
Member (2003)
English to German
+ ...
Honest mistakes? Sep 23, 2012

Phil Hand wrote:

2) seems to me to be as effective and more honest.


Why should the mistakes of one system be more "honest" than those of another? But I think I know what you mean by that.

However, I think your overall argument is a bit off.
On the one hand, you agree that a word lookup tool (WLT) would need to have certain functionalities added in order to deliver actually useful information, i.e. be effective (which would turn the WLT into an MT system, though).
On the other hand, you keep insisting that WLTs are "as effective and more honest" as/than MT systems (which already offer these functionalities, even if they are not always successful).


Some facts remain:
With both systems, there is obviously a good probability that the outcome is incorrect or misleading. Therefore, this argument is not about achieving perfection.
But with MT (even at its current state), there is at least a reasonably good chance that the system is able to handle the discussed problematic issues correctly, whereas WLTs will never get those right. Never ever. (Unless they start to incorporate more features of MT systems and will cease to be WLTs.)


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 07:14
Chinese to English
TOPIC STARTER
I want to get a bit more granular than that Sep 23, 2012

Heike Behl, Ph.D. wrote:

On the one hand, you agree that a word lookup tool (WLT) would need to have certain functionalities added in order to deliver actually useful information, ie. be effective (which would turn the WLT into an MT system, though).


No, I wouldn't accept that, and the whole point of this thread is to get beyond this conception of "MT or nothing".

There are several clearly distinguishable jobs to be done in MT, and some of them computers can do well, some of them they can't.

They include:

1) Identifying words/lexemes in the source text.
2) Parsing the source text.
3) Selecting likely target equivalents based on context.
4) Assembling target equivalents into a target text.

(Of course, purely statistical MT tools like GT do this all in one step - or one could say that they don't do any of this.)

We've been talking about (1), which I think is quite a useful concept. It would be very useful for a computer to do, and I think it's quite achievable. With a well constructed dictionary, I think a computer could perform very well on (1). But I wouldn't say that would "turn it into an MT system". It would turn it into a word identification system.

My objections to MT grow larger as we move through the list - I think that when you get to (4), you're well beyond the pale, because computers do not have the ability to construct decent texts.

Now, historically, these functions are today elided together because grammar-based MT failed, and statistical MT is now seen as the best chance for a breakthrough. And statistical MT ignores these steps.

But I don't really want to talk about MT at all - I want to talk about smart translator's tools. Complete MT programs are rubbish translator's tools, IMO. But partial tools - perhaps a tool that did (1) and (3) from my list - that could be very useful.


 
heikeb
heikeb  Identity Verified
Member (2003)
English to German
+ ...
:-) Sep 23, 2012

Phil Hand wrote:
But I don't really want to talk about MT at all - I want to talk about smart translator's tools. Complete MT programs are rubbish translator's tools, IMO. But partial tools - perhaps a tool that did (1) and (3) from my list - that could be very useful.


For (1), you'd already need a more sophisticated morphology tool than most WLTs probably offer, plus disambiguation and some kind of parsing. In the case of German, you'd even need a perfectly parsed source sentence to be able to identify separable verbs, for instance. These abilities are not as easy as they might seem. Not all problems MT systems have stem from a lack of context or an inability to transfer words correctly into the target language. And this is not due to the inability of computers, but to the complexity of languages, which refuse to be locked down in a few simple rules. If your initial analysis of the source sentence contains errors, you're already in trouble. It would already take a considerable amount of time to develop those aspects for a WLT.
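
A small sketch of why this is harder than it looks, assuming a naive heuristic that only checks whether the last word of a clause is a known separable prefix. The table `SEPARABLE` and the function `rejoin_separable` are hypothetical and hold just two verbs for illustration.

```python
# Hypothetical mini-table: (finite verb form, detached prefix) -> infinitive.
SEPARABLE = {
    ("geht", "los"): "losgehen",
    ("sieht", "aus"): "aussehen",
}


def rejoin_separable(clause):
    """Naive heuristic: treat the clause-final word as a detached prefix
    and pair it with any earlier word that forms a known separable verb.
    Returns the infinitive, or None if no pairing is found."""
    tokens = clause.rstrip(".!?").split()
    if len(tokens) < 2:
        return None
    prefix = tokens[-1].lower()
    for token in tokens[:-1]:
        lemma = SEPARABLE.get((token.lower(), prefix))
        if lemma:
            return lemma
    return None


print(rejoin_separable("So geht es los."))              # prefix is clause-final
print(rejoin_separable("Er sieht etwas käsig aus."))    # prefix is clause-final
print(rejoin_separable("So geht es los in Hannover."))  # prefix is not final
```

The first two calls find the verb, but the third returns None because the prefix is followed by more words, as in the newspaper sentence quoted earlier. Without a real parse of the clause, the heuristic has no way of knowing where to look.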

"I don't want to talk about MT at all" - Why did you post this then under the topic Machine Translation?

OK, if you want to talk about useful translator tools...
I wouldn't want to use either MT or word lookup tools for my translation work as gleaning the useful bits and pieces would take way more time than finding the correct translation somewhere else or typing the correct sentence from scratch. Particularly since there might not even be any useful bits and pieces.

What exactly do you find useful with WLT for translators? Even if they had the features you wish for?


 
Rolf Keller
Rolf Keller
Germany
Local time: 01:14
English to German
What do MLTs do? Sep 29, 2012

Heike Behl, Ph.D. wrote:

"inflected forms", "capitalization", "compounds"

"any word lookup tool must at least be able to {cope with}"


Must? Why? The translator can take care of such things by modifying the terms manually.

What exactly do you find useful with WLT for translators?


1: You can benefit from your personal passive vocabulary, which is more extensive than your active one.
2: You can search several resources in one go.

1 & 2 --> Less time involved and better translations.
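
A rough sketch of what point 2 means in practice, assuming each resource is just an in-memory mapping from a term to its candidate translations. The names `lookup_all` and `RESOURCES` and all entries are made up for illustration; real resources would be online dictionaries, glossaries or databases.

```python
def lookup_all(term, resources):
    """Query every resource for the term and collect the hits per resource.

    Resources without a hit are left out of the result.
    """
    key = term.casefold()  # case-insensitive lookup
    results = {}
    for name, entries in resources.items():
        hits = entries.get(key)
        if hits:
            results[name] = hits
    return results


# Hypothetical resources with invented entries.
RESOURCES = {
    "general": {"losgehen": ["to start", "to set off"]},
    "personal_glossary": {"losgehen": ["to kick off"]},
    "technical": {"welle": ["shaft"]},
}

for source, candidates in lookup_all("Losgehen", RESOURCES).items():
    print(source, candidates)
```

The tool only collects candidates side by side; choosing among them is left to the translator, which is where the passive vocabulary from point 1 comes in.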

Last year I wrote a small article (in German) on this topic: http://www.tw-h.de/rolf_keller/Multifultor.1.1.0.0/Flyer.pdf

BTW, you called Linguee "a WLT". Actually, Linguee is a database. As I see it, a WLT is a piece of software that can access this and many other databases. (Software misnomers ...)


 