How to guess language in cyrillic script?
Thread poster: Jan Sundström
Jan Sundström  Identity Verified
Sweden
Local time: 02:10
English to Swedish
+ ...
Jan 24, 2008

Hi all,

Is there a guide or concise overview of how to distinguish between different languages when you have a paper in your hand written in Cyrillic script?

Sometimes I get documents where I can't tell whether they are in Russian, Azeri, Mongolian, Macedonian or [insert exotic language here]...

It would be very useful to have a quick reference chart of what to look for to identify the language.

If we received the documents as files on the computer it would be easy to cut/paste a sentence and search on the internet, or use language guessing software.
But these are mostly diplomas or forms with handwritten entries, stamps, stickers etc, which makes it cumbersome to scan, OCR etc.

I found this extensive alphabet list:
http://en.wikipedia.org/wiki/List_of_Cyrillic_letters

But I'm looking for a set of hard-and-fast rules that I can use on the spot. Like: "if you see the letter Y, you can be sure it's language X".

Is there any website or guide for this, or am I wishing for the impossible?!

/Jan


 
Rossi Ignatova  Identity Verified
Local time: 01:10
Spanish to Bulgarian
+ ...
Possibly helpful link Jan 24, 2008

Hi Jan,

You may wish to try this link

http://www.library.yale.edu/cataloging/music/cyrillic.htm

Kind regards,

Rossi Ignatova


 
Marek Daroszewski (MrMarDar)  Identity Verified
Local time: 02:10
English to Polish
+ ...
Language identifier Jan 24, 2008

You might want to try this site:

http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser

It works for the few languages I have tried out of curiosity.

Best,
Marek


 
mjbjosh
Local time: 02:10
English to Latvian
+ ...
Depends on the writer Jan 24, 2008

I am not familiar with all the languages that you named (also, I think Azeri uses a modified Latin alphabet), but I think it depends on the writer. For example, when I write in Russian, I tend to use a "t" that resembles the Greek "t" rather than the handwritten Cyrillic one that looks like a Latin "m". The same goes for "d": I use a Greek-style "d", whereas the handwritten Cyrillic form looks rather like the Latin "g".

[Edited at 2008-01-24 21:44]


 
esperantisto  Identity Verified
Local time: 04:10
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
I doubt that simple hard rules can be derived. Jan 25, 2008

a) Even within the same language, there may be huge historical differences: the pre-revolutionary Russian script differs drastically from the modern one.
b) Slavic languages that use Cyrillic natively and each developed it in their own way (although, of course, under the strong influence of Russian) are one matter; the non-Slavic languages of the ex-USSR plus Mongolian are another: their scripts were derived from Russian and are more uniform on the one hand but more complex on the other.

Well, learn languages, not scripts! It's just like for Latin.

However, many languages have specific letters. Just a couple of tips:

1. If you see Ўў, this may be Belarusian, Uzbek or some language of the Far North of the Russian Federation. I know nothing about the latter, but for the first two:
a) if you also see Ии, that's Uzbek;
b) otherwise, it's Belarusian.

Note: If it's a text from the 1920s, Ў may also be Ossetian, but I doubt you'll encounter it.

2. If Ӕӕ, Ossetian.

3. If Її, Ukrainian (or Ruthenian, but it's a minor language with no official status, not recognized as a separate language in Ukraine).

4. If Ӂӂ, Moldovan (Romanian).
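
If the text does happen to be available electronically, the four tips above can be sketched as a short Python function. This is only an illustration, not a reliable detector: the function name and the returned labels are made up here, the 1920s and northern-language cases are ignored, and the input is assumed to be typed in real Cyrillic characters.

```python
import unicodedata


def guess_by_marker_letters(text: str) -> str:
    """Apply tips 1-4 in order; return a label or 'unknown' (labels are illustrative)."""
    # NFC folds "base letter + combining mark" into single letters such as ў and ї.
    letters = set(unicodedata.normalize("NFC", text).lower())
    if "ў" in letters:
        # Tip 1: Ў together with И points to Uzbek, Ў without И to Belarusian.
        return "Uzbek" if "и" in letters else "Belarusian"
    if "ӕ" in letters:
        return "Ossetian"                   # tip 2
    if "ї" in letters:
        return "Ukrainian (or Ruthenian)"   # tip 3
    if "ӂ" in letters:
        return "Moldovan (Romanian)"        # tip 4
    return "unknown"


print(guess_by_marker_letters("Україна"))
```

A short sample can easily mislead the check in tip 1: a Belarusian sentence cannot be mistaken for Uzbek this way, but an Uzbek snippet that happens to lack И would be reported as Belarusian.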


 
Radica Schenck  Identity Verified
Germany
Local time: 02:10
English to Macedonian
+ ...
F7 for texts in soft copy Jan 26, 2008

If you receive the text in, say, Word, select a word from the text, then press F7 and you will get a pop-up message like this: "There is no Thesaurus available for (e.g.) Macedonian"...


As for the table on Wikipedia, it's also a very good source: it tells you, for example, that the Macedonian alphabet is the only alphabet that has the letters Ќ and Ѓ...

The same goes for Ћ and Ђ in Serbian...
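
For a soft copy, the "only this alphabet has these letters" idea can also be sketched as a small lookup table in Python. This is a rough sketch under assumptions: the table holds only the two pairs mentioned here, and the names `UNIQUE_LETTERS` and `languages_with_unique_letters` are invented for the example.

```python
import unicodedata

# Letters that, per the Wikipedia table, occur in only one alphabet.
# Only the two pairs named in this post are listed; extend as needed.
UNIQUE_LETTERS = {
    "Macedonian": set("ќѓ"),
    "Serbian": set("ћђ"),
}


def languages_with_unique_letters(text: str) -> list[str]:
    """Return every language whose unique letters appear in the text."""
    found = set(unicodedata.normalize("NFC", text).lower())
    return sorted(lang for lang, marks in UNIQUE_LETTERS.items() if found & marks)


print(languages_with_unique_letters("Ѓорѓи и Ќире"))
```

An empty list only means that none of the listed letters occurred, not that the text is in some other particular language.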

Good luck!


 
Victor Quero  Identity Verified
Local time: 02:10
Serbo-Croat to Spanish
+ ...
Some hints Jan 31, 2008

1. Only Ukrainian and Belarusian use the letter I i.

2. If you find the letter Є є, it's Ukrainian. 100% sure, since no other language uses it (unless it's a text in Old Church Slavonic, but that would be very odd, and you would recognize it by the medieval look and the letter Ѣ ѣ). Also, only Ukrainian uses Ï ï, and it uses neither Ъ ъ nor Ы ы.

Note: There's a small language called Rusyn, which some consider a dialect of Ukrainian. If you find Є є and Ï ï but also Ъ ъ and Ы ы, it must be Rusyn.

3. If you find a language with both I i and Ў ў, it's Belarusian. It uses neither Ъ ъ nor Щ щ.

4. Among Slavic languages, the letter J j is only used by Serbian and Macedonian. (There's a small dialect of Sami which also uses it, but you would recognize it by some letters with a comma-like symbol attached: Ӊ ӊ, Ҋ ҋ, Ӆ ӆ).

5. Besides J j, only Serbian and Macedonian have the distinctive letters Љ љ and Њ њ.

6. If you find a text with Ћ ћ and Ђ ђ, you can be 100% sure it's Serbian.

7. If you find a text with J j plus Ѓ ѓ and Ќ ќ, you can be 100% sure it's Macedonian.

8. I don't know much about non-Slavic languages which use Cyrillic, but they are often characterized by 'unusual' letters like Ә ә or Ä ä, and by modifications like Ғ ғ, Ұ ұ (the latter found in Kazakh).

9. If none of the distinctive letters mentioned above appears (I, Є, J, Ў, Љ, Ћ, Ќ, nor Ә, Ғ, Ұ), then it's most likely Russian or Bulgarian.

10. To tell Russian from Bulgarian: Bulgarian uses the letter Ъ ъ very often, while in Russian it's only used in some specific cases. Bulgarian does not use Ë ë, but the combination ьо instead, which is very unusual in Russian (I would say it is only possible in certain foreign words). Unfortunately, Ë ë in Russian is most often written as simply E e.
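
When a sample can be typed in or is already on the computer, the ten hints can be sketched as a chain of checks in Python. Treat this as a rough sketch under assumptions rather than a tested detector: the function name, the returned labels and the 1% cut-off for Ъ are all invented for the example (the hints only say "very often"), and the code matches real Cyrillic code points, so Latin lookalikes of I and J will not match.

```python
import unicodedata

# Cyrillic code points for the two letters that look like Latin "i" and "j".
I_DOTTED = "\u0456"  # Ukrainian/Belarusian I
JE = "\u0458"        # Serbian/Macedonian J

# Hypothetical cut-off: share of Ъ among Cyrillic letters above which we lean Bulgarian.
HARD_SIGN_SHARE = 0.01


def guess_cyrillic_language(text: str) -> str:
    """Walk through hints 1-10; labels and thresholds are illustrative only."""
    t = unicodedata.normalize("NFC", text).lower()
    letters = set(t)

    def has_any(chars: str) -> bool:
        return any(c in letters for c in chars)

    if has_any("ћђ"):                      # hint 6
        return "Serbian"
    if has_any("ѓќ"):                      # hint 7
        return "Macedonian"
    if has_any(JE + "љњ"):                 # hints 4 and 5
        return "Serbian or Macedonian"
    if has_any("єї"):                      # hint 2 and the Rusyn note
        return "Rusyn" if "ъ" in letters and "ы" in letters else "Ukrainian"
    if I_DOTTED in letters:                # hints 1 and 3
        return "Belarusian" if "ў" in letters else "Ukrainian or Belarusian"
    if "ў" in letters:                     # distinctive per hint 9, but no verdict given
        return "unknown"
    if has_any("әғұ"):                     # hint 8
        return "non-Slavic"
    # Hints 9 and 10: only Russian and Bulgarian are left.
    if "ё" in letters:
        return "Russian"
    if "ьо" in t:
        return "Bulgarian"
    cyrillic = [c for c in t if "\u0400" <= c <= "\u04ff"]
    if cyrillic and t.count("ъ") / len(cyrillic) > HARD_SIGN_SHARE:
        return "Bulgarian"
    return "Russian or Bulgarian"


print(guess_cyrillic_language("България е държава в Югоизточна Европа"))
```

The order of the checks matters: the most specific letters are tested first, and the Russian/Bulgarian tie-break runs only when nothing distinctive was found. A longer sample gives the Ъ share a better chance of meaning something.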

Hope it helps...

[Edited at 2008-01-31 12:05]


 

