Statistics about language length
Thread poster: Madeleine MacRae Klintebo
Madeleine MacRae Klintebo
Madeleine MacRae Klintebo  Identity Verified
United Kingdom
Local time: 17:25
Swedish to English
+ ...
Mar 6, 2008

This request might sound a bit weird, but I'm trying to make the people who develop our sites a bit more "language conscious".

I've been Googling for days trying to find statistics giving the length of text in various languages. I know I've previously read threads here on Proz in which people have stated that X language is Y% longer/shorter than Z language. Somehow I don't seem to be able to come up with the correct search words to find these threads.

I'm trying to put together a case which I intend to put to my company's website designers. I'm somewhat fed up with being told: your translation is TOO long. OK, often I can accommodate, but what really got me was translating a one word label (original 6 ch, translation 7 ch) and a button (original 13 ch, translation 14 ch). These two items (label and button) had to go on the same line, but the designers had not left a single pixel extra for us translators to use. So my translation was TOO long...

I'm now on a mission to change the attitude of our designers (not easy, but Mount Everest was conquered). To do this I need some statistics about the length of languages, particularly in relation to English (the source language of all our sites and other material). Just to back me up.

The languages I'm most interested in are (in relation to English): Swedish, Dutch, German, French, Italian, Spanish, Russian and Japanese.

If anyone can give me statistics about how these languages "score" in relation to English I would very much appreciate it. Or just point me to a site giving me this info.

Thanks,
Mads


 
patyjs
patyjs  Identity Verified
Mexico
Local time: 10:25
Spanish to English
+ ...
Don't have a link I'm afraid... Mar 6, 2008

but I did have an interesting conversation, just last night in fact, with a client who mentioned that translating Spanish into French makes the text much longer, a surprise to me, and the opposite of what happens when you translate into English.

It might be an idea, if no-one can come up with a link, to:

1. Use the United Nations resources which have texts in several language combinations and do word counts on the languages that interest you;

2. Try using a variety of translation glossaries where straight translations are given, not definitions, and do word counts.
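The counting suggested in the two steps above could be sketched roughly as follows. This is only a minimal illustration, not an established method: the function names `length_stats` and `expansion` are made up for the example, and the sample pair is just the 'saber'/'knowledge' case from this thread. Real use would feed in full parallel texts.

```python
import re

def length_stats(text: str) -> dict:
    """Count words, characters, and characters without whitespace."""
    words = re.findall(r"\w+", text)
    return {
        "words": len(words),
        "chars": len(text),
        "chars_no_spaces": len(re.sub(r"\s", "", text)),
    }

def expansion(source: str, target: str) -> dict:
    """Percentage change from source to target for each measure.

    Assumes the source is non-empty (otherwise this divides by zero).
    """
    src, tgt = length_stats(source), length_stats(target)
    return {key: round(100.0 * (tgt[key] - src[key]) / src[key], 1) for key in src}

# Sample pair taken from this thread: Spanish 'saber' vs. English 'knowledge'.
print(expansion("saber", "knowledge"))
```

Reporting words and characters separately matters because, as several replies note, the two measures can move in opposite directions for the same language pair.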

I sometimes have a similar problem when images are pasted into a Word doc with text written inside a shape. In order to make 'knowledge' fit where 'saber' once was, you either have to make the font a lot smaller, which makes it look odd in comparison to the rest of the image, or make the shape which contains it larger, which has the same effect. To make it look presentable I often end up redoing the whole image in PowerPoint and then pasting it back.

Sorry I can't be more helpful. Interesting topic though.

Good luck!

Paty


 
JennyC08 (X)
JennyC08 (X)
Local time: 12:25
German to French
+ ...
Localization industry Mar 6, 2008

Hello,

I am not sure how much this will help you, but in the software localization industry the usual advice is to allow 30 to 50% extra space for text expansion.
You can refer to this Microsoft article:

http://msdn2.microsoft.com/en-us/library/ms894183.aspx
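As a rough illustration of how such an allowance might be checked against real strings, here is a minimal sketch. The names `fits` and `overflows` and the sample English/German pairs are hypothetical, and character count is only a crude stand-in for rendered pixel width, which depends on the font.

```python
def fits(source: str, translation: str, allowance: float = 0.30) -> bool:
    """True if the translation is no longer than the source plus the allowance."""
    budget = len(source) * (1.0 + allowance)
    return len(translation) <= budget

def overflows(pairs, allowance: float = 0.30):
    """Return the (source, translation) pairs that exceed the allowance."""
    return [(src, tgt) for src, tgt in pairs if not fits(src, tgt, allowance)]

# Hypothetical UI strings, for illustration only.
ui_strings = [
    ("Password", "Passwort"),
    ("Save", "Speichern"),
    ("Settings", "Einstellungen"),
]
for src, tgt in overflows(ui_strings):
    print(f"{src!r} -> {tgt!r}: {len(src)} -> {len(tgt)} chars")
```

Short labels are the ones most likely to be flagged, since a percentage allowance on a four-letter word amounts to barely one extra character.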

HTH

Caroline


 
Madeleine MacRae Klintebo
Madeleine MacRae Klintebo  Identity Verified
United Kingdom
Local time: 17:25
Swedish to English
+ ...
TOPIC STARTER
Thank you Mar 6, 2008

Paty:

I'd thought of doing this, only I'm somewhat lazy and hoped the statistics were already out there in cyberspace.

Caroline:

Even if it doesn't give the percentage for each language, this should be a great tool to use for knocking some sense into the heads of non-linguists.

"Translation is a difficult task" - hear, hear.

Thanks again,
Madeleine


 
Gennady Lapardin
Gennady Lapardin  Identity Verified
Russian Federation
Local time: 19:25
Italian to Russian
+ ...
IT -> RU -> ENG Mar 6, 2008

The same text will be longest in Italian, then Russian, and shortest in English, by character count. I agree with the 30-50 per cent allowance. As for word count, I am not sure that the trend persists.
Trados should be the best tool for such statistics (ratio of word count to character count).
The subject of the text also matters. In scientific, engineering, IT and all finance fields the English texts should be much shorter than in any other language (by word and character count), since in those topics English is generally the source language.
Legal texts (e.g. insurance agreements, mamma mia) - no-o-o, the English originals are much longer.

[Edited at 2008-03-06 22:43]


 
Ken Cox
Ken Cox  Identity Verified
Local time: 18:25
German to English
+ ...
my experience.... Mar 6, 2008

...with German to English and Dutch to English is that the *character counts* will be within a few percent of each other, depending on the subject and the writing styles of the original author and the translator. With a Dutch source text, the English character count (with my writing style) may be as much as 5% higher or lower than the Dutch character count, depending on how verbose the author of the Dutch text is (and some are quite verbose). With a German source text, my English character count is usually slightly higher than the German character count, but this naturally varies from document to document, and sometimes the English count is less than the German count.

I doubt that you will find much info on this on the Web, but you could do some research by comparing counts of equivalent multilingual texts. A good starting point would be EU publications, which are available in all official EU languages. You will also find lots of French/English texts on Canadian (government) sites, German/French on Swiss sites, and Dutch/French on Belgian sites.

Most German federal government sites are now at least partially bilingual (English/German), and a fair amount of German legislation is available on the Web in English translation.

[Edited at 2008-03-06 23:39]

[Edited at 2008-03-06 23:41]


 
Heinrich Pesch
Heinrich Pesch  Identity Verified
Finland
Local time: 19:25
Member (2003)
Finnish to German
+ ...
Not suitable in your case I'm afraid Mar 7, 2008

These statistics are not suitable for single words or short expressions. Usually Finnish runs 10% shorter than German and about the same length as English, but with single phrases anything between -50% and +100% might occur.
Once a customer wanted me to shorten strings for a digital TV receiver. In English and German the word was 7 chars long, but for Finnish they had left only 3 chars. I refused and lost the customer.
I have seen a Finnish translation of a German manual which was 30 % shorter than the source. And I have seen an English translation of a Finnish original which was 20 % longer than my German translation. So anything is possible. It depends on style.

When memory space is so dirt cheap nowadays I really don't understand why designers can't be more generous.
Cheers
Heinrich


 
Zamira B.
Zamira B.  Identity Verified
United Kingdom
Local time: 17:25
Member (2006)
English to Russian
+ ...
Russian Mar 7, 2008

Madeleine MacRae Klintebo wrote:

The languages I'm most interested in are (in relation to English): Swedish, Dutch, German, French, Italian, Spanish, Russian and Japanese.


It has been said on a Russian forum that on average one page holds 300 words in English but only 250 words in Russian, as Russian words are almost always longer than English ones and we have no articles in Russian; i.e. translation from English into Russian needs 15-20% more space. My own experience confirms that statement: whenever I am given a one-page English>Russian translation I end up with 1.15-1.2 pages in Russian.

Zamira


 
James McVay
James McVay  Identity Verified
United States
Local time: 12:25
Russian to English
+ ...
Russian to English Mar 7, 2008

I agree with Zamira. My personal experience is that I produce, on average, 1.2 English words for each Russian word in the source text. However, from eyeball comparisons, I can tell you that my translation is usually shorter than the original Russian text.

 
Christine Andersen
Christine Andersen  Identity Verified
Denmark
Local time: 18:25
Member (2003)
Danish to English
+ ...
Texts 20% but single words are often much longer from Danish to English Mar 7, 2008

One of the agencies I work for reckons there are on average 20% more words in an English text than in Danish, and adjusts its rates accordingly.
But if you are translating single words as in headings and labels, you often, though not always, need far more space in English. The same applies to the Swedish I see too. There are more spaces between the words for a start!

E.g. -
Priskategori becomes
price category

Dækningsbidrag becomes
Contribution margin

The Scandinavian
xxxyyy word structure becomes
nnn of xxx in English... or even
nnnn of xxxx
And so on, for lots of logical and illogical reasons

Occasionally, reducing the size of the lettering (Arial from 12 to 11.5 or whatever) solves the problem, especially if the titles are simply enlarged to fill the available space, but it depends on how flexible the overall design is.

The layout of your body text is a very critical factor too.

Solid paragraphs do not take up much more space in English than in Danish or Swedish, but parts of a contract, for instance, with many short clauses and empty spaces between them, can take up far more space in the target language. Clauses that take up one line in Danish will just run over in English - into two lines!

Or abbreviations are used, which have to be explained in full in the target text...
DI becomes
the Confederation of Danish Industries ....
at least the first time it is mentioned.

And so on.
Even where the target language takes up less space than the source, there is a need for a certain flexibility to produce a document that looks presentable.

DTP folk and layout designers simply have to take it into consideration.

Happy translating! That's another reason why we can't leave it all to the machines.



 
Ken Cox
Ken Cox  Identity Verified
Local time: 18:25
German to English
+ ...
agree with Christine Mar 7, 2008

Intelligent advice IMO.

Also, as an interested outsider when it comes to graphic design, it seems to me that, as a basic principle, it's better to allow lots of white space so you have room to play with instead of trying to fill all the available space. Even newspaper publishers (except the Wall Street Journal) have come to see the value of white space.

[Edited at 2008-03-07 10:25]


 
hazmatgerman (X)
hazmatgerman (X)
Local time: 18:25
English to German
EU bitexts Mar 7, 2008

@ Ken Cox and @ asker: I agree that EU texts are basically suitable for statistical analyses but have one caveat and one suggestion (looking at the bright side...) to make. The caveat is that EU *legislation* language versions need to be pagination neutral for publication in the OJ, which means some texts may have been fit into available space. The suggestion is to use the tmx files recently made available by the EU and base statistics on phrase-by-phrase analysis. But beware: my pair en-de yields a very large aggregate file and contains thousands of duplicates that must be deleted. I also found that native speakers of English tend to produce shorter translations from the German than I do, and that my German renderings tend to be a bit shorter than theirs. However, that's only my subjective experience and not necessarily true elsewhere. But it may point to comparing original rather than translated texts.
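For anyone who wants to try the phrase-by-phrase approach, here is a minimal sketch using Python's standard XML parser. It assumes a TMX file with the usual `tu`/`tuv`/`seg` structure; the helper names `read_pairs` and `char_ratio`, the default language codes, and the tiny sample fragment are hypothetical. A very large aggregate file would probably need a streaming parser instead of loading everything at once.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_pairs(tmx_source, src_lang="en", tgt_lang="de"):
    """Yield unique (source, target) segment pairs, skipping duplicates."""
    seen = set()
    for tu in ET.parse(tmx_source).getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "en-gb" -> "en".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        pair = (segs.get(src_lang), segs.get(tgt_lang))
        if pair[0] and pair[1] and pair not in seen:
            seen.add(pair)
            yield pair

def char_ratio(pairs):
    """Total target characters divided by total source characters."""
    src_total = sum(len(src) for src, _ in pairs)
    tgt_total = sum(len(tgt) for _, tgt in pairs)
    return tgt_total / src_total if src_total else float("nan")

# Hypothetical fragment with one duplicated unit, for illustration only.
SAMPLE = b"""<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>contribution margin</seg></tuv>
    <tuv xml:lang="de"><seg>Deckungsbeitrag</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>contribution margin</seg></tuv>
    <tuv xml:lang="de"><seg>Deckungsbeitrag</seg></tuv></tu>
</body></tmx>"""

pairs = list(read_pairs(io.BytesIO(SAMPLE)))
print(len(pairs), "unique pair(s), ratio", round(char_ratio(pairs), 2))
```

Deduplicating on the exact (source, target) pair is the simplest choice; repeated segments would otherwise be weighted by how often they happen to occur in the file.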
Regards


 
Henry Hinds
Henry Hinds  Identity Verified
United States
Local time: 10:25
English to Spanish
+ ...
In memoriam
Spanish Mar 7, 2008

There is no reliable statistic, but for many kinds of work about 15% more space should be allowed for Spanish than for English. This is especially important in published materials to avoid a space crunch. It is mostly due to the structure of Spanish as a Romance language.

That said, I have found that many legal documents can often come to about an equal word count in Spanish and English, based on my experience with US English and Mexican legal language going both ways.

There does not appear to be any appreciable difference in word length, only in number of words and thus space.

Individual writing styles can have a big influence as well, and I have trained myself to be efficient in that respect, especially in Spanish. Some of my clients are not.


 
Madeleine MacRae Klintebo
Madeleine MacRae Klintebo  Identity Verified
United Kingdom
Local time: 17:25
Swedish to English
+ ...
TOPIC STARTER
Thank you everyone Mar 7, 2008

Your replies have been very interesting and thought provoking. I'll definitely use this info when I, hopefully, get a chance to join the designers and developers at the "beginning" of the process rather than at the end.

One reason our translations often turn out longer has to do with the field we translate in. We are translating material about financial markets. In this field new inventions are usually developed in English and "everyone knows" what a "rollover" is. My colleagues and I have to translate not only the term, but also the whole concept. And then fit this translation into a space without a single extra pixel...

Madeleine


 
José Henrique Lamensdorf
José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 13:25
English to Portuguese
+ ...
In memoriam
Portuguese Mar 7, 2008

In translation from English into Portuguese, the character count is expected to "swell" by somewhere between 0 and 20%. Apparently the subject plays a role in this, and it would be theoretically possible to narrow that range into sub-ranges by subject.

As a sworn translator in Brazil, where mandatory government-set rates apply to the target text, clients often ask me for an estimate. These sworn translations always include an introduction, usually between 250 and 500 chars counting spaces, and all local acronyms and other things have to be spelled out. So I apply 20% on top of the character count without spaces in English, and have thus reached 95% accuracy or higher often enough to amaze me.
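The rule of thumb above could be written down roughly like this. It is only a sketch of the arithmetic as described: the function name `estimate_target_chars` is made up for the example, and the 20% default applies to this particular English-to-Portuguese sworn-translation setting, not to translation in general.

```python
def estimate_target_chars(english_text: str, swell: float = 0.20) -> int:
    """Estimate target length as the English count without spaces plus `swell`."""
    chars_no_spaces = sum(1 for ch in english_text if not ch.isspace())
    return round(chars_no_spaces * (1.0 + swell))

# Hypothetical source sentence, for illustration only.
print(estimate_target_chars("hello world"))
```

Counting the source without spaces while adding a fixed percentage is what lets a single factor absorb the introduction and the spelled-out acronyms mentioned above.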


 