ABBYY Aligner
Thread poster: kirstinerennie
kirstinerennie
Local time: 06:50
Jan 15, 2015

Hello,

My translation firm is looking to align a very large amount of work. I had already posted a question regarding this in the Across forum, assuming that it would have to be done in a CAT tool.

Another member responded saying that he thought it would be easier/quicker in something like ABBYY Aligner. My question is: has anyone ever used ABBYY Aligner, and if so, what are the advantages of using it instead of aligning translations in a CAT tool? Is it significantly faster?

Any help would be much appreciated!

Kirstine



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 06:50
Member (2009)
Dutch to English
Just typed, and lost a very long post :-( Jan 15, 2015

No time to retype it all.

I suggest first having a look at the free LF Aligner: http://sourceforge.net/projects/aligner/
You might also want to consult András Farkas: http://www.farkastranslations.com/alignment.php (the expert on aligning large amounts of text).

Also have a look at AlignFactory (probably the best commercial aligner on the market today): http://www.terminotix.com/index.asp?content=brand&brand=1&lang=en

Always test a sample with several aligners. Each text is different, and there is no aligner that will do a good job on all text types.

Michael


FarkasAndras
Local time: 07:50
English to Hungarian
Gah Jan 15, 2015

I also typed out a long post that the proz software destroyed instead of posting. Serves me right for typing in the browser instead of a text editor.
Anyway, how much material are we talking about (thousands of segment pairs, tens of thousands, or perhaps hundreds of thousands or millions)? And how good a result do you want (is 95% correct pairing good enough, or do you want 100.00% correct)? Are you okay with discarding subpar documents or sections of text, or do you want every last sentence extracted?
Your choices depend on these factors. In any case, you need an aligner with a good autoaligner algorithm. Most CAT tools' aligners fail at this hurdle. Then what sort of a review/edit you do after autoalignment depends on your needs.

[Edited at 2015-01-15 15:00 GMT]



Mikhail Zavidin
Ukraine
Local time: 08:50
English to Russian
intelligent and fast Jan 15, 2015

For my language pair, it seemed quite intelligent and fast.
I had a feeling that it grasped the meaning of each sentence when aligning the pairs. However, it did make some alignment mistakes.
To be frank, I haven't seen a better aligner so far, though I can't say that I have used a lot of them.
You can now try the ABBYY Aligner 2.0 trial, though it only works for 15 days and has some restrictions.
The version I have used is 1.0.



Samuel Murray  Identity Verified
Netherlands
Local time: 07:50
Member (2006)
English to Afrikaans
@Kirstine Jan 15, 2015

kirstinerennie wrote:
My translation firm is looking to align a very large amount of work. I had already posted a question regarding this in the Across forum, assuming that it would have to be done in a CAT tool.


No, it typically can't be done "in" a CAT tool, although some CAT tools are accompanied by alignment programs. For very large amounts of work, the aligners that I have seen that come with CAT tools may not be suitable, though.

LF Aligner is a freeware option that tends to get good reviews. I haven't really used it myself, though. It is mainly a non-GUI aligner; the latest version does have a GUI, but it's not the most user-friendly GUI that I have seen.

If you want high quality TMs that you can trust 100%, then you'd have to check the alignment manually, and that is when it becomes necessary to have an alignment GUI that is easy to use.

Another member responded saying that he thought it would be easier/quicker in something like ABBYY Aligner.


The product brief looks impressive, but you won't be able to tell whether it is "good" with your very large amount of work, because the trial version is limited to 1000 segments (or 50). For EUR 100 it is not cheap. The installer for the trial version is 300 MB (ouch!).

==Added:

Okay, I tested it. It is a "smart" aligner, which is a good thing, but it misses a crucial function: the ability to insert blank cells. It can move cells up and down only if there is a blank cell above or below that cell, but if there is no blank cell, and there is a misalignment, then you can't fix it using the keyboard shortcuts, but must fix it by manually copy/pasting text from one cell into another. That is *bad*.

There is a batch function (not tested), but I just dragged and dropped two files (EN and AF) into it. It doesn't support AF, but it recognised the AF file as NL, which is good enough. It only merges cells if you select them, and unfortunately the shortcut for merging is in an odd position, but it's not the end of the world. There is no simple shortcut for moving between whole cells, but the down and up arrows move between cells easily. It does not seem possible to delete a cell without deleting the entire row. Ctrl+Z works! PgDn and PgUp move through the file (also a good thing).

The screenshot in the product brief showed that the program will mark possible misalignments in colour, but it didn't do it for me.

[Edited at 2015-01-15 15:45 GMT]


kirstinerennie
Local time: 06:50
TOPIC STARTER
Thank you Jan 16, 2015

Hi everyone,

Thanks so much for all your helpful replies, very much appreciated.

We are currently looking at all the options as it's a very big alignment project (around 2 million words!).

Thanks again,

Kirstine


FarkasAndras
Local time: 07:50
English to Hungarian
2M words Jan 16, 2015

kirstinerennie wrote:

Hi everyone,

Thanks so much for all your helpful replies, very much appreciated.

We are currently looking at all the options as it's a very big alignment project (around 2 million words!).

Thanks again,

Kirstine


That's a pretty big alignment project. If 2M words is in one language, that'll probably work out to about 200K segment pairs. That's already in the size range where I'd normally do an autoalignment with only a partial manual review (more a series of spot checks looking for potential quality improvement tricks than an actual review, re-running the autoalignment if I find something system-level and fixable). Still, a full manual review is not outside the realm of possibility. It's just a huge job. I reckon I have probably done manual review on 100K+ segments so far, for my personal use, for a hobby project (aligning public domain literary works) and for paying clients (translators who needed TMs made from translated documents)... But then I'm probably quite unusual in terms of my tolerance for certain types of monotonous work and being able to review/fix alignments quickly.

As to tools, I personally would use LF Aligner. But then I wrote it so I'm obviously partial. Alignfactory and ABBYY seem to get good reviews, although if it works as Samuel describes I would say ABBYY is out.

[Edited at 2015-01-16 20:26 GMT]


xxx2nl  Identity Verified
Netherlands
Local time: 07:50
Transit Alignment tool Jan 17, 2015

The Transit Alignment tool offers interesting ways to improve the alignment result, e.g. by use of your dictionaries.

Quick Start: https://transitnxt.wordpress.com/2013/11/27/aligning-files-in-transit-nxt/

Full manual: http://tinyurl.com/qf56j5q

Use internal word list
Transit NXT uses an internal word list to assess the probability of the source and target segments being correctly matched.
The alignments are saved in the file align.adc under config\global in your Transit NXT installation folder.
If Transit NXT finds that the source-language segment contains an entry from the internal word list, it searches for the translation of the term in the target-language segment.

Use project dictionaries
Transit NXT uses the current TermStar dictionary to assess the probability of the source and target segments being correctly matched.
If Transit NXT finds that the source-language segment contains a term that has been added to the current dictionary, it searches for the translation of the term in the target-language segment.

Resource files mode (with comparison of markup segments)
Transit NXT compares markup segments during alignment, instead of text segments.
Use this option when aligning files with string IDs, perhaps for localisation projects.
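For illustration only: pairing by string ID boils down to a join on the keys of the two resource files, which is why it works so well for localisation files and says nothing about sentence order. The sketch below assumes a simple `key=value` format (in the style of Java `.properties` files); `parse_resources` and `pair_by_id` are hypothetical helper names, and this is not a description of how Transit NXT implements the feature.

```python
def parse_resources(text):
    """Parse hypothetical key=value resource lines into an insertion-ordered dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def pair_by_id(source_text, target_text):
    """Pair source and target strings that share the same string ID."""
    source = parse_resources(source_text)
    target = parse_resources(target_text)
    pairs = [(key, source[key], target[key]) for key in source if key in target]
    # IDs that exist only in the source file are returned for manual review.
    unmatched = [key for key in source if key not in target]
    return pairs, unmatched


en = "btn.ok=OK\nbtn.cancel=Cancel\nmsg.saved=File saved"
af = "msg.saved=Lêer gestoor\nbtn.ok=Goed"
pairs, unmatched = pair_by_id(en, af)
# pairs     -> [('btn.ok', 'OK', 'Goed'), ('msg.saved', 'File saved', 'Lêer gestoor')]
# unmatched -> ['btn.cancel']
```

Note that the target file's line order does not matter at all here; the IDs do all the work.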


FarkasAndras
Local time: 07:50
English to Hungarian
standard feature Jan 17, 2015

2nl wrote:

Use internal word list
Transit NXT uses an internal word list to assess the probability of the source and target segments being correctly matched.
The alignments are saved in the file align.adc under config\global in your Transit NXT installation folder.
If Transit NXT finds that the source-language segment contains an entry from the internal word list, it searches for the translation of the term in the target-language segment.

Use project dictionaries
Transit NXT uses the current TermStar dictionary to assess the probability of the source and target segments being correctly matched.
If Transit NXT finds that the source-language segment contains a term that has been added to the current dictionary, it searches for the translation of the term in the target-language segment.

That's been a standard feature of many autoalignment algorithms for many, many years. It's kind of an obvious method to use, so it's certainly not a selling point for any one algorithm. In many cases it's a drawback.
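A minimal sketch of the dictionary method in general terms: for each source word that has a dictionary entry, check whether one of its translations occurs in the candidate target segment. It assumes single-word entries and a naive word tokenizer; `dictionary_score` is a hypothetical name, and real engines (Transit, hunalign) combine this kind of evidence with other scores in their own ways. The `0.0` returned when no dictionary word is present shows the drawback mentioned above: without a suitable dictionary, the method has nothing to go on.

```python
import re


def tokenize(text):
    """Lower-cased word tokens; a deliberately naive tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def dictionary_score(source, target, dictionary):
    """Share of dictionary words in `source` whose translation occurs in `target`.

    `dictionary` maps a source word to a set of acceptable target words.
    Returns 0.0 when the source segment contains no dictionary words,
    i.e. the dictionary provides no evidence either way.
    """
    source_words = tokenize(source)
    target_words = tokenize(target)
    hits = [word for word in source_words if word in dictionary]
    if not hits:
        return 0.0
    matched = sum(1 for word in hits if dictionary[word] & target_words)
    return matched / len(hits)


en_af = {"red": {"rooi"}, "house": {"huis"}}
print(dictionary_score("The red house", "Die rooi huis", en_af))   # 1.0
print(dictionary_score("The red house", "Die rooi motor", en_af))  # 0.5
```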
Alignment history lesson coming up, skip if uninterested: many different efforts were made to get away from this dictionary-based method in order to be able to align texts in language pairs where no good dictionary is readily available (for the text pair in question, in the right format, to the person running the alignment). Perhaps the most widely used one is the Gale-Church algorithm developed in 1993. It is based on segment length: longer segments tend to correspond to longer segments, and shorter ones to shorter ones. If you go through the whole text trying to equalize the segment lengths, things start to fall into place. Some algorithms try to find identical strings in the two texts to use as anchors (e.g. proper names), some run the texts through an MT engine or deploy other tricks. Quite a few use a combination of methods. Hunalign, which testing has shown to be one of the best algorithms, uses a combination of the dictionary method and the Gale & Church algorithm. (It can actually run Gale & Church to get a rough alignment, automatically extract a dictionary from the aligned texts, and then do a second alignment run with the freshly made dictionary.) LF Aligner uses hunalign as its alignment engine and comes with dictionaries for a wide range of language pairs. You can also add your own dictionary.
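To make the length-based idea concrete, here is a compact sketch of Gale & Church-style alignment: dynamic programming over "beads" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2 segment groupings), where each bead's cost penalizes a mismatch in total character length. It is written from the published description, loosely following the cost formula used by NLTK's `gale_church` module; it is not code from hunalign or LF Aligner. The constants (mean ratio 1, variance 6.8 and the bead priors) are the values commonly cited from the 1993 paper and should be treated as tunable assumptions.

```python
import math

# Bead types (source segments, target segments) and their prior probabilities,
# using the figures commonly cited from Gale & Church (1993); treat as assumptions.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN_RATIO = 1.0  # expected target characters per source character
VARIANCE = 6.8    # variance of that ratio, per character


def bead_cost(src_len, tgt_len, bead):
    """Negative log probability of a bead, given its total character lengths."""
    mean = (src_len + tgt_len / MEAN_RATIO) / 2.0
    delta = 0.0
    if mean > 0:
        delta = (src_len * MEAN_RATIO - tgt_len) / math.sqrt(mean * VARIANCE)
    # Two-tailed probability of a length mismatch at least this large:
    # 2 * (1 - Phi(|delta|)) == erfc(|delta| / sqrt(2)).
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    if tail <= 0.0:
        return math.inf
    return -(math.log(tail) + math.log(PRIORS[bead]))


def gale_church(src_lens, tgt_lens):
    """Align two lists of segment lengths; return beads as (src_ids, tgt_ids)."""
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees a cell's cost is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), (di, dj)
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from the end to recover the cheapest sequence of beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


# Two short source sentences were merged into one target sentence:
print(gale_church([20, 25, 60], [46, 59]))
# -> [([0, 1], [0]), ([2], [1])]
```

With real texts you would pass character counts, e.g. `gale_church([len(s) for s in src], [len(t) for t in tgt])`. Note that nothing here looks at the words themselves, which is exactly why the method works without a dictionary, and also why a two-pass approach that adds dictionary evidence can improve on it.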


2nl wrote:
Resource files mode (with comparison of markup segments)
Transit NXT compares markup segments during alignment, instead of text segments.
Use this option when aligning files with string IDs, perhaps for localisation projects.

That's a neat trick, which is also employed by several other aligners. LF Aligner doesn't do this (I tried to integrate an open source alignment engine that does this, but couldn't get the engine to work and abandoned the idea). In the case of XML files or similar, it could be very useful if it's implemented well. Its usefulness is limited to specific file types, though. E.g. it might 'work' with HTML files in that it might correctly pair up paragraphs... but most autoaligners will do that anyway. The really important bit is correctly pairing up sentences, and HTML markup probably won't help you do that at all.

