CatGuru video request: term base import/recognition speed comparison
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
Sep 28, 2012

Dear Dominique,

As a big fan of your informative videos I'd like to request a video in which you compare import speed, recognition speed and recognition accuracy of large datasets (say 500,000 term pairs) of the 'famous mainstream CAT tools'.

It would also be worthwhile to take CAT tools like Transit NXT (with MS SQL database engine) and CafeTran into account.

Since this will be a very complex benchmarking project (advanced features like stemming, case-awareness etc. make it difficult to compare), I could give you a hand on Transit and CafeTran.

Perhaps make a second video too: How fast are global changes in the dataset possible? What features are available for cleaning, merging and optimisation? Enough work to fill the long and cold Finnish*) autumn evenings ;-).

Looking forward to a positive decision,

Hans

*) I actually wrote 'Finish' first ;-)


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 20:42
Member (2009)
Dutch to English
+ ...
1. speed with large datasets + 2. data management Sep 28, 2012

Sounds very interesting Hans.

Speed with large datasets. The other day, just for the hell of it, I tried to import a large TMX into DVX2. I work in memoQ, so was surprised when it took 20 minutes, on my very fast computer. Since I actually have about 30 different TMs in memoQ at the moment, the prospect of moving all my stuff over to DVX2 in order to try it out is not looking that great.

Data management. I think Hans' second suggestion is actually even more interesting: 'What features are available for cleaning, merging and optimisation?' I would love to see how other tools compare to memoQ, for example. As I have already said many times on the memoQ list, and in messages to Kilgray Support, data management really needs to be improved in memoQ. It is a great TEnT, with a very nice UI and all manner of features that I love, but data management is simply poor.

Michael


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Dr. Grauert & CAT tools Sep 28, 2012

Michael Beijer wrote:

Sounds very interesting Hans.

Thanks for your kind words ;-). I now feel encouraged to go even one step further and present an idea that I have had for years now.

When comparing printers people often use the famous Dr. Grauert letter:

http://de.wikipedia.org/wiki/Dr.-Grauert-Brief

Why not create such a benchmark text ourselves in order to test the quality of fuzzy matching and subsegment leverage, in order to compare our CAT tools?

What would be a good language combination for this?

Hans


 

Dominique Pivard  Identity Verified
Local time: 22:42
Finnish to French
niche vs. mainstream Sep 28, 2012

Hans Lenting wrote:
As a big fan of your informative videos I'd like to request a video in which you compare import speed, recognition speed and recognition accuracy of large datasets (say 500,000 term pairs) of the 'famous mainstream CAT tools'.

I try to make videos that will appeal to as many people as possible. Very few freelance translators have termbases of that size, so most wouldn't relate to such a subject.

Besides, aspects such as recognition accuracy depend on languages to some extent. I will probably do one about Finnish as source, because I have a vested interest.

Hans Lenting wrote:
It would also be worthwhile to take CAT tools like Transit NXT (with MS SQL database engine) and CafeTran into account.

Again, Transit is a niche tool from my perspective, and I don't even have access to it. CafeTran is also a niche tool, but it looks truly interesting to me, so I will cover it one day, in one way or another.

Hans Lenting wrote:
Since this will be a very complex benchmarking project (advanced features like stemming, case-awareness etc. make it difficult to compare), I could give you a hand on Transit and CafeTran.

In my experience, as soon as you start to compare more than two tools, you have to focus on a very narrowly defined subject. Your "complex benchmarking" would result in a 30-45 minute video. The attention span of people nowadays is very short: anything longer than 5 minutes must be truly interesting if you want them to watch until the end.

Hans Lenting wrote:
Perhaps make a second video too: How fast are global changes in the dataset possible? What features are available for cleaning, merging and optimisation? Enough work to fill the long and cold Finnish*) autumn evenings ;-).

Too esoteric for me, but maybe it would be something for you?


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Nice vs. mainstream Sep 28, 2012

Dominique Pivard wrote:

In my experience, as soon as you start to compare more than two tools, you have to focus on a very narrowly defined subject. Your "complex benchmarking" would result in a 30-45 minute video. The attention span of people nowadays is very short:


Sorry, what were you saying?

Too esoteric for me, but maybe it would be something for you?


Yeah, who knows ;-). (Winter evenings are cold in Holland too.)


 

Egidijus Slepetys  Identity Verified
Local time: 22:42
German to Lithuanian
My personal experience with Transit XV (Termstar) Sep 29, 2012

When adding a term to the termbase (75k term records, MS SQL 2000, Win 7 64, SSD), the cursor clock shows up only for a tiny moment: I just notice that the term was added, with no interruption of my work at all.

This was not the case with the MS Access database: adding a term took much longer (~1s, which is too long for me personally) and the cursor clock froze in one position, which was not ideal.

Inserting terms from the dictionary into the translation is just like pasting them - no delay at all (Hans, if you are having problems with it, then this is really strange).

My advice to Hans - try out Transit in a REAL Windows 7 environment - not on a Virtual Machine.

Could you please record a video showing the behavior of Termstar?

Egidijus


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
I *did* use Transit on a PC ... Sep 29, 2012

Egidijus Slepetys wrote:

My advice to Hans - try out Transit in a REAL Windows 7 environment - not on a Virtual Machine.



Thanks for the good advice, but before I bought my iMac I used Transit XV for years on several PCs. With Access and with SQL. The slow response existed there too.


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
The result of extensive benchmarking tests /SPOILER Sep 29, 2012

Hans Lenting wrote:

As a big fan of your informative videos I'd like to request a video in which you compare import speed, recognition speed and recognition accuracy of large datasets (say 500,000 term pairs) of the 'famous mainstream CAT tools'.



Since Dominique is not taking up the challenge, I'll report the results of my own experiments with importing terms into several CAT tools. The overall winner is CafeTran, which allows immediate opening of TMs and glossaries, without the need to import them. (IIRC Wf Classic can do the same trick – but I don't want to translate in Word anymore.)

Second best is memoQ, which took only a few minutes to import the test file. Places 3 and 4 were shared by Déjà Vu and Transit NXT (with SQL). The slowest imports (in fact I had to cancel them after many hours of testing) were those of Wordfast Pro and Swordfish.

All tests were performed on a very fast iMac with SSD, 16 GB RAM and a Core i7.

Note that I didn't compare the ease of making global changes to terminology (e.g. because of a new spelling, a new preferred translation etc.). For instance, in Transit (much praised for its advanced terminology module) these changes are quite complicated and very time-consuming. AAMOF they are not feasible during the translation process. In CafeTran these changes can be made in seconds, after which the glossary has to be reloaded, which also takes a few seconds.
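
For anyone who wants to reproduce this kind of timing in a tool-independent way, here is a minimal sketch in Python. It does not reflect how any of the CAT tools above work internally: it assumes a plain in-memory dictionary with exact, case-sensitive matching (no stemming), and the function names and the synthetic term pairs are made up purely for illustration. It only shows how import time and recognition time can be measured separately as the number of term pairs grows.

```python
import time


def make_term_pairs(n):
    """Generate n synthetic source/target term pairs (hypothetical test data)."""
    return [(f"quellterm{i}", f"doelterm{i}") for i in range(n)]


def timed_import(pairs):
    """Load the pairs into an in-memory dict; return (termbase, seconds)."""
    start = time.perf_counter()
    termbase = dict(pairs)
    return termbase, time.perf_counter() - start


def timed_recognition(termbase, segment):
    """Look up every whitespace-separated word; return (hits, seconds)."""
    start = time.perf_counter()
    hits = {w: termbase[w] for w in segment.split() if w in termbase}
    return hits, time.perf_counter() - start


if __name__ == "__main__":
    segment = "bitte quellterm42 und quellterm7 pruefen"
    # Sizes are illustrative: 75k and 500k are the figures mentioned in this thread.
    for size in (75_000, 250_000, 500_000):
        termbase, import_s = timed_import(make_term_pairs(size))
        hits, lookup_s = timed_recognition(termbase, segment)
        print(f"{size:>7} terms: import {import_s:.3f}s, "
              f"lookup {lookup_s:.6f}s, hits {len(hits)}")
```

A real comparison would have to feed the same term file to each tool and time the tool itself; a dictionary lookup like this one stays fast regardless of size, so it is useful only as a baseline.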


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 20:42
Member (2009)
Dutch to English
+ ...
@Hans: Sep 29, 2012

CafeTran sounds really interesting, it's just too bad that every time I open it to have a look, I can't seem to figure out how to do anything. It has some great features – I just wish Igor had tried to copy memoQ's UI instead of inventing his own from scratch.

Michael

PS: One small thing that is really annoying, at least on Win7, is the Open File dialogue box. Whenever I try and navigate to a file to be translated, or a TM, I have to switch from using my Wacom tablet stylus to my mouse (well, my Contour RollerMouse), because my double-clicks no longer work to open folders using the tablet stylus.

PPS: Interesting idea about the 'Dr. Hans letter'. I suppose you would have to have one in every language someone might want to test. However, once one existed, MOUSE tool vendors could of course cheat by optimising their tool to handle that one page particularly well... Also, there would be the problem of selecting a type of text; technical, legal, literary, etc. Good luck!

[Edited at 2012-09-29 11:22 GMT]


 

Egidijus Slepetys  Identity Verified
Local time: 22:42
German to Lithuanian
What is slow? Sep 29, 2012

The slow response existed there too.


0,1s, 0,5s, 1s?


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Different interfaces Sep 29, 2012

Michael Beijer wrote:

CafeTran sounds really interesting, it's just too bad that every time I open it to have a look, I can't seem to figure out how to do anything. It has some great features – I just wish Igor had tried to copy memoQ's UI instead of inventing his own from scratch.


Perhaps you want to move this thread to the CafeTran forum? Nevertheless, a quick reply here. What is it that you prefer here:

[Screenshot: Screen Shot 2012-09-29 at 2.17.22 PM.png]

And miss here:

[Screenshot: Screen Shot 2012-09-29 at 2.23.01 PM.png]


PS: One small thing that is really annoying, at least on Win7, is the Open File dialogue box. Whenever I try and navigate to a file to be translated, or a TM, I have to switch from using my Wacom tablet stylus to my mouse (well, my Contour RollerMouse), because my double-clicks no longer work to open folders using the tablet stylus.



Could this be Java related?

Hans


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Slow is: not being able to proceed with work Sep 29, 2012

Egidijus Slepetys wrote:

The slow response existed there too.


0,1s, 0,5s, 1s?


Up to 5 secs


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 20:42
Member (2009)
Dutch to English
+ ...
@Hans: Sep 29, 2012

Well, a few things:

• One thing about the UI that is puzzling, for example, is that I can't seem to figure out how to quickly skip around through segments in CT. In memoQ you just click on a src or trgt segment and you are in it. How do you do this in CT?

• The segment numbers in CT are strangely placed. It's hard to tell if they refer to the segment above or below.

• In memoQ, I think the TM, TB, and MT hits (all in the Translation results window) are better integrated.

• In memoQ, clicking on 'Project home' brings you to a very nice overview of translations, LiveDocs, TMs, TBs, and settings. There seems to be no equivalent in CT.

• I love the simplicity of the open (or closed) tabs for documents you are translating in memoQ.

• Hits are highlighted in memoQ. CT doesn't seem to have this.

• In memoQ you can quickly filter on phrases in both the source and target column, not to mention on pretty much anything else you can think of (commented segment, match rate, proofread, auto-joined/split, not started, pre-translated, etc).

Sorry, I'll ask any further questions that I might have in the CT forum!

Michael


 

Egidijus Slepetys  Identity Verified
Local time: 22:42
German to Lithuanian
5 secs ???? Sep 29, 2012

that is way too much!!!
I'm having a 0,1-0,5 sec delay - I wouldn't even call it delay (but anything over 1 sec annoys me).


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
75,000 terms is not enough to note the delay Sep 30, 2012

Hi Egidijus,

Egidijus Slepetys wrote:

that is way too much!!!



Exactly :-).

It sounds like little, but let me assure you, it takes the 'flow' out of your translation process.


I'm having a 0,1-0,5 sec delay - I wouldn't even call it delay (but anything over 1 sec annoys me).


Just wait until you've tripled your number of terms ... You'll note it – I'm sure about that.

Hans


 