CatGuru video request: term base import/recognition speed comparison
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
Sep 28, 2012

Dear Dominique,

As a big fan of your informative videos I'd like to request a video in which you compare import speed, recognition speed and recognition accuracy of large datasets (say 500,000 term pairs) of the 'famous mainstream CAT tools'.

It would also be worthwhile to take CAT tools like Transit NXT (with MS SQL database engine) and CafeTran into account.

Since this will be a very complex benchmarking project (advanced features like stemming, case-awareness etc. make it difficult to compare), I could give you a hand on Transit and CafeTran.

Perhaps make a second video too: How fast are global changes in the dataset possible? What features are available for cleaning, merging and optimisation? Enough work to fill the long and cold Finnish*) autumn evenings.

Looking forward to a positive decision,

Hans

*) I actually wrote 'Finish' first.



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 12:13
Member (2009)
Dutch to English
1. speed with large datasets + 2. data management Sep 28, 2012

Sounds very interesting Hans.

Speed with large datasets. The other day, just for the hell of it, I tried to import a large TMX into DVX2. I work in memoQ, so was surprised when it took 20 minutes, on my very fast computer. Since I actually have about 30 different TMs in memoQ at the moment, the prospect of moving all my stuff over to DVX2 in order to try it out is not looking that great.

Data management. I think Hans' second suggestion is actually even more interesting: 'What features are available for cleaning, merging and optimisation?' I would love to see how other tools compare to memoQ, for example. As I have already said many times on the memoQ list, and in messages to Kilgray Support, data management really needs to be improved in memoQ. It is a great TEnT, with a very nice UI and all manner of features that I love, but data management is simply poor.

Michael



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Dr. Grauert & CAT tools Sep 28, 2012

Michael Beijer wrote:

Sounds very interesting Hans.


Thanks for your kind words. I now feel encouraged to go even one step further and present an idea that I have had for years now.

When comparing printers people often use the famous Dr. Grauert letter:

http://de.wikipedia.org/wiki/Dr.-Grauert-Brief

Why not create such a benchmark text ourselves in order to test the quality of fuzzy matching and subsegment leverage, in order to compare our CAT tools?

What would be a good language combination for this?

Hans



Dominique Pivard  Identity Verified
Local time: 14:13
Finnish to French
niche vs. mainstream Sep 28, 2012

Hans Lenting wrote:
As a big fan of your informative videos I'd like to request a video in which you compare import speed, recognition speed and recognition accuracy of large datasets (say 500,000 term pairs) of the 'famous mainstream CAT tools'.

I try to make videos that will appeal to as many people as possible. Very few freelance translators have termbases of that size, so most wouldn't relate to such a subject.

Besides, aspects such as recognition accuracy depend on languages to some extent. I will probably do one about Finnish as source, because I have a vested interest.

Hans Lenting wrote:
It would also be worthwhile to take CAT tools like Transit NXT (with MS SQL database engine) and CafeTran into account.

Again, Transit is a niche tool from my perspective, and I don't even have access to it. CafeTran is also a niche tool, but it looks truly interesting to me, so I will cover it one day, in one way or the other.

Hans Lenting wrote:
Since this will be a very complex benchmarking project (advanced features like stemming, case-awareness etc. make it difficult to compare), I could give you a hand on Transit and CafeTran.

In my experience, as soon as you start to compare more than two tools, you have to focus on a very narrowly defined subject. Your "complex benchmarking" would result in a 30-45 minute video. The attention span of people nowadays is very short: anything longer than 5 minutes must be truly interesting if you want them to watch until the end.

Hans Lenting wrote:
Perhaps make a second video too: How fast are global changes in the dataset possible? What features are available for cleaning, merging and optimisation? Enough work to fill the long and cold Finnish*) autumn evenings.

Too esoteric for me, but maybe it would be something for you?



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Nice vs. mainstream Sep 28, 2012

Dominique Pivard wrote:

In my experience, as soon as you start to compare more than two tools, you have to focus on a very narrowly defined subject. Your "complex benchmarking" would result in a 30-45 minute video. The attention span of people nowadays is very short:


Sorry, what were you saying?

Too esoteric for me, but maybe it would be something for you?


Yeah, who knows. (Winter evenings are cold in Holland too.)



Egidijus Slepetys  Identity Verified
Local time: 14:13
German to Lithuanian
My personal experience with Transit XV (Termstar) Sep 29, 2012

When adding a term to the termbase (75k term records, MS SQL 2000, Win 7 64, SSD), the hourglass cursor shows up only for a tiny moment: I just notice that the term was added, with no interruption of my work at all.

This was not the case with the MS Access database - adding a term took much longer (~1s - this is too long for me personally) and the hourglass cursor froze in one position - that was not perfect.

Inserting terms from the dictionary into the translation is just like pasting them - no delay at all (Hans, if you are having problems with it, then this is really strange).

My advice to Hans - try out Transit in a REAL Windows 7 environment - not on a Virtual Machine.

Could you please record a video showing the behavior of Termstar?

Egidijus



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
I *did* use Transit on a PC ... Sep 29, 2012

Egidijus Slepetys wrote:

My advice to Hans - try out Transit in a REAL Windows 7 environment - not on a Virtual Machine.



Thanks for the good advice, but before I bought my iMac I used Transit XV for years on several PCs. With Access and with SQL. The slow response existed there too.



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
The result of extensive benchmarking tests /SPOILER Sep 29, 2012

Hans Lenting wrote:

As a big fan of your informative videos I'd like to request a video in which you compare import speed, recognition speed and recognition accuracy of large datasets (say 500,000 term pairs) of the 'famous mainstream CAT tools'.



Since Dominique is not taking up the challenge, I'll report the results of my own experiments with importing terms into several CAT tools. The overall winner is CafeTran, which allows immediate opening of TMs and glossaries, without the need to import them. (IIRC Wf Classic can do the same trick – but I don't want to translate in Word anymore.)

Second best is memoQ, which took only a few minutes to import the test file. Places 3 and 4 were shared by Déjà Vu and Transit NXT (with SQL). The slowest imports were those of Wordfast Pro and Swordfish (in fact I had to cancel them after many hours of testing).

All tests were performed on a very fast iMac with SSD and 16 GB RAM, Core i7.

Note that I didn't compare ease of making global changes to terminology (e.g. because of a new spelling, a new preferred translation etc.). For instance in Transit (much praised for its advanced terminology module) these changes are quite complicated and very time consuming. AAMOF they are not feasible during the translation process. In CafeTran these changes can be made in seconds, after which the glossary has to be reloaded, which also takes a few seconds.



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 12:13
Member (2009)
Dutch to English
@Hans: Sep 29, 2012

CafeTran sounds really interesting, it's just too bad that every time I open it to have a look, I can't seem to figure out how to do anything. It has some great features – I just wish Igor had tried to copy memoQ's UI instead of inventing his own from scratch.

Michael

PS: One small thing that is really annoying, at least on Win7, is the Open File dialogue box. Whenever I try and navigate to a file to be translated, or a TM, I have to switch from using my Wacom tablet stylus to my mouse (well, my Contour RollerMouse), because my double-clicks no longer work to open folders using the tablet stylus.

PPS: Interesting idea about the 'Dr. Hans letter'. I suppose you would have to have one in every language someone might want to test. However, once one existed, MOUSE tool vendors could of course cheat by optimising their tool to handle that one page particularly well... Also, there would be the problem of selecting a type of text; technical, legal, literary, etc. Good luck!

[Edited at 2012-09-29 11:22 GMT]



Egidijus Slepetys  Identity Verified
Local time: 14:13
German to Lithuanian
What is slow? Sep 29, 2012

The slow response existed there too.


0,1s, 0,5s, 1s?



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Different interfaces Sep 29, 2012

Michael Beijer wrote:

CafeTran sounds really interesting, it's just too bad that every time I open it to have a look, I can't seem to figure out how to do anything. It has some great features – I just wish Igor had tried to copy memoQ's UI instead of inventing his own from scratch.


Perhaps you want to move this thread to the CafeTran forum? Nevertheless, a quick reply here. What is it that you prefer here:



And miss here:




PS: One small thing that is really annoying, at least on Win7, is the Open File dialogue box. Whenever I try and navigate to a file to be translated, or a TM, I have to switch from using my Wacom tablet stylus to my mouse (well, my Contour RollerMouse), because my double-clicks no longer work to open folders using the tablet stylus.



Could this be Java related?

Hans



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Slow is: not being able to proceed with work Sep 29, 2012

Egidijus Slepetys wrote:

The slow response existed there too.


0,1s, 0,5s, 1s?


Up to 5 secs



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 12:13
Member (2009)
Dutch to English
@Hans: Sep 29, 2012

Well, a few things:

• One thing about the UI that is puzzling, for example, is that I can't seem to figure out how to quickly skip around through segments in CT. In memoQ you just click on a src or trgt segment and you are in it. How do you do this in CT?

• The segment numbers in CT are strangely placed. It's hard to tell if they refer to the segment above or below.

• In memoQ, I think the TM, TB, and MT hits (all in the Translation results window) are better integrated.

• In memoQ, clicking on 'Project home' brings you to a very nice overview of translations, LiveDocs, TMs, TBs, and settings. There seems to be no equivalent in CT.

• I love the simplicity of the open (or closed) tabs for documents you are translating in memoQ.

• Hits are highlighted in memoQ. CT doesn't seem to have this.

• In memoQ you can quickly filter on phrases in both the source and target column, not to mention on pretty much anything else you can think of (commented segment, match rate, proofread, auto-joined/split, not started, pre-translated, etc).

Sorry, I'll ask any more questions that I might have in the CT forum!

Michael



Egidijus Slepetys  Identity Verified
Local time: 14:13
German to Lithuanian
5 secs ???? Sep 29, 2012

that is way too much!!!
I'm having a 0,1-0,5 sec delay - I wouldn't even call it delay (but anything over 1 sec annoys me).



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
75,000 terms is not enough to note the delay Sep 30, 2012

Hi Egidijus,

Egidijus Slepetys wrote:

that is way too much!!!



Exactly.

It may not sound like much, but let me assure you, it takes the 'flow' out of your translation process.


I'm having a 0,1-0,5 sec delay - I wouldn't even call it delay (but anything over 1 sec annoys me).


Just wait until you've tripled your number of terms ... You'll notice it – I'm sure about that.

Hans

