A free term extraction tool?
Thread poster: eileenPL
eileenPL
Local time: 04:27
Polish to English
Jan 28, 2018

Hi all,

Could anyone recommend a free tool for extracting terms and phrases that can be used offline?

Regards,
E.



[Edited at 2018-01-28 21:21 GMT]


 
MuMyongCy (X)
South Korea
Local time: 11:27
English to Korean
it's not for Online though, Jan 29, 2018

https://www.proz.com/forum/sdl_trados_support/322396-word_cloud.html

 
Danesh
Local time: 05:57
English to Persian (Farsi)
Okapi Rainbow Jan 29, 2018

Rainbow is a GUI application for launching various utilities related to translation and localization tasks, such as text extraction (to XLIFF, OmegaT projects, RTF, etc.) and merging, pre-translation, encoding conversion, term extraction, file format conversion, quality verification, translation comparison, search and replace on filtered text, pseudo-translation, and much more. Using the framework's pipeline mechanism, you can use Rainbow to create chains of steps that perform a custom set of tasks specific to your needs.
http://okapiframework.org/wiki/index.php?title=Rainbow


 
Samuel Murray  Identity Verified
Netherlands
Local time: 04:27
Member (2006)
English to Afrikaans
@Eileen Jan 29, 2018

eileenPL wrote:
Could anyone recommend a free extraction tool for terms and phrases...?


From what data sources will you be extracting?


 
Rossana Triaca  Identity Verified
Uruguay
Local time: 23:27
English to Spanish
In addition to Okapi... Jan 29, 2018

you can also use the old free version of Xbench.

But the tool I like the most is AntConc; it's not exactly a term extraction tool, but the word list and, particularly, the "keyness" lists are great for this purpose, and the video tutorials are easy to follow for newcomers to corpus analysis.

For specialized texts, I find that having a baseline like the Brown corpus is invaluable!
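
For readers unfamiliar with keyness, here is a minimal Python sketch of the general idea: words are ranked by how much more often they occur in your text than in a reference corpus, using the log-likelihood statistic. This is not AntConc's code; the function names `tokenize` and `keyness`, the `min_freq` parameter, and the tiny sample texts are assumptions made for illustration. In practice the reference would be a large general corpus such as the Brown corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def keyness(target_text, reference_text, min_freq=2):
    """Rank words that are overused in the target relative to the reference.

    Scores are log-likelihood values; higher means more characteristic
    of the target text. min_freq is a hypothetical noise threshold.
    """
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    total = n_target + n_reference

    scores = {}
    for word, a in target.items():
        if a < min_freq:
            continue
        b = reference.get(word, 0)
        # Only keep words that are relatively more frequent in the target.
        if a / n_target <= b / n_reference:
            continue
        expected_target = n_target * (a + b) / total
        expected_reference = n_reference * (a + b) / total
        ll = a * math.log(a / expected_target)
        if b:
            ll += b * math.log(b / expected_reference)
        scores[word] = 2 * ll
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative sample data only.
specialized = "the stent was placed and the stent was checked and the stent held"
general = "the cat sat on the mat and the dog was there and the bird sang"

for word, score in keyness(specialized, general):
    print(f"{word}\t{score:.2f}")
```

Words near the top of the list are candidates for a glossary; common function words tend to drop out because they are just as frequent in the reference.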


 
eileenPL
Local time: 04:27
Polish to English
TOPIC STARTER
.doc files Jan 29, 2018

Samuel Murray wrote:

eileenPL wrote:
Could anyone recommend a free extraction tool for terms and phrases...?


From what data sources will you be extracting?


I'd like to extract phrases from .doc files. I'm a Trados user, but when I think of manually copying terms from files into the translation memory... it'd take ages!



Thanks for all the replies!

[Edited at 2018-01-29 20:07 GMT]


 
MuMyongCy (X)
South Korea
Local time: 11:27
English to Korean
try mine. Jan 30, 2018

eileenPL wrote:
I'd like to extract phrases from .doc files. I'm a Trados user, but ...

It works well with SDL Trados Studio project files, which means you can use it for all the file formats that SDL Trados Studio accepts.

regards


 
David Turner  Identity Verified
Local time: 04:27
French to English
You could try my PhraseMiner tool: Jan 30, 2018

eileenPL wrote:

Could anyone recommend a free extraction tool for terms and phrases...?


http://asap-traduction.com/PhraseMiner

From a Word document, it extracts "internal" or "intra-document" fuzzy matches, sentences containing 5 or more common consecutive words, sentences that are subsegments of longer sentences, terms of two or more words that appear two or more times (filtered using stop-word lists), sentences containing two or more of these extracted terms, etc.
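
To illustrate the general technique of extracting repeated multi-word terms with a stop-word list, here is a minimal Python sketch. It is not PhraseMiner's actual algorithm; the function name `candidate_terms`, its parameters, and the short `STOP_WORDS` set are assumptions made for this example, and it works on plain text rather than on a Word document.

```python
import re
from collections import Counter

# A deliberately short, hypothetical stop-word list; a real one would be longer.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on",
    "with", "that", "this", "be", "are", "was", "or", "as", "by", "at",
}


def candidate_terms(text, min_words=2, max_words=4, min_count=2,
                    stop_words=STOP_WORDS):
    """Return (term, count) pairs for repeated multi-word candidates.

    A candidate is a run of min_words..max_words consecutive words that
    neither starts nor ends with a stop word and occurs at least
    min_count times in the text.
    """
    counts = Counter()
    # Split on sentence-like boundaries so candidates never span two sentences.
    for sentence in re.split(r"[.!?;:\n]+", text.lower()):
        # Letters only (any script), allowing internal hyphens and apostrophes.
        tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", sentence)
        for n in range(min_words, max_words + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in stop_words or gram[-1] in stop_words:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


sample = (
    "The translation memory is large. Update the translation memory daily. "
    "A term base complements the translation memory."
)
for term, count in candidate_terms(sample):
    print(f"{count}\t{term}")
```

Because the token pattern matches letters from any script, the same sketch should also handle Unicode text such as Russian, provided a suitable stop-word list is supplied.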

David Turner


 
Arianne Farah  Identity Verified
Canada
Local time: 22:27
Member (2008)
English to French
There's a new free app in the SDL store Jan 30, 2018

It works quite well. It takes a little tweaking at first to remove common words, but you can save those custom exclusion dictionaries and reuse them, so you'll only have to exclude 'that', 'which', 'will', 'allow', etc. once. You set the minimum word length and the minimum number of times a word must be repeated, and it generates a word cloud that you can then whittle down before creating an sdlxliff from it to populate a glossary. It sounds like a lot of steps, but it's quite painless.


http://appstore.sdl.com/language/app/projecttermextract/817/
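
The workflow described above (minimum word length, minimum repetition count, reusable exclusion list) can be approximated outside Trados with a short Python sketch. This is not the SDL app's code; the function names and the one-word-per-line exclusion file format are assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path


def load_exclusions(path):
    """Read a saved exclusion list (one word per line); empty set if missing."""
    file = Path(path)
    if not file.exists():
        return set()
    lines = file.read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def save_exclusions(path, words):
    """Save the exclusion list so it can be reused on the next project."""
    Path(path).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")


def frequent_words(text, min_length=4, min_count=2, exclusions=frozenset()):
    """Return (word, count) pairs for single words that pass all filters."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(
        token for token in tokens
        if len(token) >= min_length and token not in exclusions
    )
    return [(word, c) for word, c in counts.most_common() if c >= min_count]


sample = "That valve will allow flow. The valve will close. Which valve?"
excluded = {"that", "which", "will", "allow"}
for word, count in frequent_words(sample, exclusions=excluded):
    print(f"{count}\t{word}")
```

Like the app, this sketch only counts single words; it does not find multi-word phrases.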


 
MuMyongCy (X)
South Korea
Local time: 11:27
English to Korean
functionality limited. Jan 31, 2018

Arianne Farah wrote:

It works quite well - takes a little tweaking at first to remove common words, but you can ...
http://appstore.sdl.com/language/app/projecttermextract/817/


It cannot extract multi-word terms (two- or three-word phrases).
It only sees single words.

[Edited at 2018-01-31 10:18 GMT]


 
stevebpdx
United States
'Bilingual' Term Extraction? - Rainbow May 1, 2018

Does Rainbow do 'Bilingual' Term Extraction?

It doesn't appear that it does. Just source term extraction.


 
DZiW (X)
Ukraine
English to Russian
https://www.wordfast.net/wiki/PlusTools May 1, 2018

Besides AntConc, for .doc files I prefer the free and flexible PlusTools, although it also requires filtering of weighted words and collocations.

David, it's not very convenient to have to request software by email, but does it support Russian (Unicode/UTF)?
TY


 
Jerome KURES
Canada
Local time: 22:27
English to French
2 efficient tools for repeated multi-term extraction Apr 12, 2023

https://libraryguides.mcgill.ca/text-mining/AntConc
http://www.nactem.ac.uk/software/termine/


 

