Alignment tools and process
Thread poster: infactglobal
infactglobal
Local time: 17:33
Oct 6, 2012

Hello proz community,
One of my threads was turning into a discussion that merited its own thread, so here I am starting it.

I would like to know what alignment tools you use for big projects, how you navigate the alignment process and what you've learned from trial and error.

From what I understand, alignment can be an automatic process. I'm used to WinAlign and, although it is partly automatic, it requires so much post-processing that I don't consider it automatic.

I've recently discovered FarkasAndras' open source alignment tool and plan on using it soon. But there is also AlignFactory and hunalign... And probably more!

So your opinions, questions and comments about alignment tools are welcome here... The more we amateurs know the more we can improve!

Thanks in advance!
C



Vadim Kadyrov  Identity Verified
Ukraine
Local time: 18:33
Member (2011)
English to Russian
+ ...
The thing is that Oct 6, 2012

any alignment tool you use gives good results ONLY if the text you are processing is well formatted.

When you have two files which are 1) identical in terms of formatting and/or 2) simple text files (no pictures, only a few different styles, etc.), WinAlign will be quite sufficient.

If your text is badly formatted, you will have to correct the links between the two texts manually.



Eileen Cartoon  Identity Verified
Local time: 17:33
Italian to English
Another issue with aligning Oct 6, 2012

Another issue with aligning is that the sentences must match. I translate from Italian to English and quite often split an Italian sentence into 2 or even 3 in English. This can throw alignment way off.

Eileen


FarkasAndras
Local time: 17:33
English to Hungarian
+ ...
autoaligner Oct 6, 2012

Eileen Cartoon wrote:

Another issue with aligning is that the sentences must match. I translate from Italian to English and quite often split an Italian sentence into 2 or even 3 in English. This can throw alignment way off.

Eileen


The point of autoaligners is to eliminate or at least mitigate this problem. They use quite clever algorithms (I can go into this if anyone's interested) to determine which sentence goes with which, and they will realize that you split up a sentence into 3 and they will merge them back into one segment so that everything pairs up correctly again. They don't get it right 100% of the time, but they can come astonishingly close. Generally, if the alignment is pushed out of sync (by a sentence that the translator split up and the autoaligner failed to identify, etc.), then it will be out of sync for one or two segments, maybe 5. Then as it works its way down the text, the autoaligner finds some good anchor points and gets back on track.
The upshot of this is that if you use an autoaligner:
- You have much less manual correction to do, so the whole job is done much quicker. As in, an order of magnitude quicker.
- You can autoalign massive collections of texts, accept that some segments will be misaligned and simply not bother with manual correction. This is the only feasible option when you have hundreds of thousands - or even millions - of sentence pairs to draw on.
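
To make the idea concrete, here is a minimal Python sketch of length-based alignment with dynamic programming. It is not the actual algorithm of hunalign or any other tool mentioned in this thread; real autoaligners use considerably more evidence than sentence length. The function name `align_by_length` and the penalty values are made up for illustration. The sketch only shows how a 1:3 split can be detected and merged back into one segment, and how a skipped sentence costs the alignment a penalty without derailing everything after it.

```python
def align_by_length(src, tgt, max_group=3, merge_penalty=5, skip_penalty=60):
    """Pair up two lists of sentences using character lengths only.

    Returns a list of (source_sentences, target_sentences) groups.
    The penalties are arbitrary illustration values, not tuned settings.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # best[i][j] = lowest cost of aligning the first i source
    # sentences with the first j target sentences
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    # Allowed group shapes: 1:1, 1:k and k:1 (merges), 1:0 and 0:1 (skips)
    shapes = [(1, 1)]
    shapes += [(1, k) for k in range(2, max_group + 1)]
    shapes += [(k, 1) for k in range(2, max_group + 1)]
    shapes += [(1, 0), (0, 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in shapes:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    # A sentence with no counterpart on the other side
                    cost = skip_penalty
                else:
                    s_len = sum(len(s) for s in src[i:ni])
                    t_len = sum(len(t) for t in tgt[j:nj])
                    # Similar total length is cheap; merging costs a little extra
                    cost = abs(s_len - t_len) + merge_penalty * (di + dj - 2)
                if best[i][j] + cost < best[ni][nj]:
                    best[ni][nj] = best[i][j] + cost
                    back[ni][nj] = (i, j)

    # Walk back from the end to recover the chosen groups
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    groups.reverse()
    return groups
```

Calling `align_by_length(italian_sentences, english_sentences)` on two lists of sentence strings returns the groups in document order; a group such as `(["one long sentence"], ["part 1", "part 2", "part 3"])` is a split that has been merged back into a single segment.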


WinAlign doesn't have an autoaligner (only a very primitive and rarely used feature that attempts to work somewhat as an autoaligner). Therefore, I'd discourage everyone from using WinAlign.
MemoQ's built-in aligner includes an autoaligner, and so does AlignFactory and a couple of others.
I wrote an open-source aligner based on the hunalign autoalignment engine.

[Edited at 2012-10-06 17:58 GMT]


septima
Local time: 17:33
What exactly do you want to do? Oct 6, 2012

The more we amateurs know the more we can improve!


Maybe if you actually said what you want to do, it would be possible to give you some better advice. You describe yourself as an "amateur" above, but your profile indicates that you are a language services provider of some sort. Your other thread is about selling a translation memory. So, as I understand it, you're interested in finding efficient solutions to create automatically aligned TMs and sell them. Is that the plan?

I think there are some people already doing that


infactglobal
Local time: 17:33
TOPIC STARTER
what I want to do Oct 7, 2012

septima wrote:

The more we amateurs know the more we can improve!


Maybe if you actually said what you want to do, it would be possible to give you some better advice. You describe yourself as an "amateur" above, but your profile indicates that you are a language services provider of some sort. Your other thread is about selling a translation memory. So, as I understand it, you're interested in finding efficient solutions to create automatically aligned TMs and sell them. Is that the plan?

I think there are some people already doing that


Indeed, I am an LSP but a new one, so I'm learning all there is out there. I used to be a freelance translator and well, little by little, have grown into my current business. So that explains that.

What I need with alignment is a little complicated but I think I can get it with an autoaligner. My client (the same one as in my previous thread) has about 50,000 documents (averaging 1500 words each) in French, English, German and Italian.

Ten years ago I started working in the French to English pair for them. Then, three years ago I started working with the English to French pair. The English is written/translated by the German-speaking company. So in a nutshell, I work with a very poor quality English translation (since it was translated by non-natives... and YES, I've been trying to get this to change, but to no avail) into French and from French into English.

I have built quite a database (more than 80,000 segments now) in both directions and I made a mistake once by reversing and merging the TMs. So now in my French to English TM I have some segments that are very poor quality. And since the English I get from the client is so bad, my French translations don't have technical meaning. (The client reads the French, doesn't understand what I am technically talking about, reads the English to figure out where the problem is and understands why the French is the way it is... The client then rewrites the French to make it technically correct.)

This process is of course incredibly time-consuming for everyone, as I spend more time trying to understand the English than actually translating into French, and my client has to spend enormous time correcting what I couldn't have done any better. They are trying to improve the process and their request for my TM (it being better quality than the German to English TM) is part of this improvement process.

But since my TM contains mistakes because of my reversing and merging with the English to French TM, I thought it would actually serve the client better to align the French (and technically correct documents) with the English documents. I would then post-process to make sure segments corresponded and then proof the English to render a satisfactory TM.

This could be the half-way I was looking for in my previous thread. The client would be happy because he would obtain a strong TM in French and English. And I would be able to show my client that sharing the TM is possible under certain contractual circumstances.

What would be even better is to give them a French-German TM, and a German-English TM as all their documents follow the same format, whatever language they are in.

This is what I want to do with an alignment tool.

Just for info, do these autoaligners take repetitions into account? For instance, with this client, in 50,000 documents I'll probably find 50% repetitions. It would be great to align all the docs and have repeated segments simply ignored, so that I don't have to validate or proof a segment twice.

Thanks for everyone's input.

C



Siegfried Armbruster  Identity Verified
Germany
Local time: 17:33
Member (2004)
English to German
+ ...
Not sure if alignment will solve your problem Oct 7, 2012

Don't underestimate the effort required and the issues you will have to solve when you plan and perform your alignment process. Mass alignment is a specialist task. It took me a long time to get from absolute crap results to something that is worth using.

As I understand it, you already do have a TM but it contains a lot of segments in the wrong language. The advantage of your TM is that the segments are already aligned. Therefore you could skip the "alignment step" and start directly with cleaning/checking your TM.

I don't recall which TM software you are using, but most tools offer an export function to TMX format. You could load the TMX in Olifant and clean your TM. Doing this will also give you some experience (for one step of many) for your future "alignment projects".


FarkasAndras
Local time: 17:33
English to Hungarian
+ ...
align vs filter Oct 7, 2012

infactglobal wrote:

Just for info, do these autoaligners take repetitions into account? For instance, with this client, in 50,000 documents I'll probably find 50% repetitions. It would be great to align all the docs and have repeated segments simply ignored, so that I don't have to validate or proof a segment twice.


As a general rule, they leave repetitions (duplicate segments) in.
LF Aligner has a duplicate filter that you can switch on to get rid of duplicates; if you use some other aligner, you may or may not be able to find a workaround to do it.
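
If your aligner has no such filter, one possible workaround is to deduplicate the aligned output yourself. Below is a minimal Python sketch, assuming the aligned segments are available as (source, target) pairs, for example read from a tab-separated export. It is not LF Aligner's implementation, and the function name `drop_duplicates` is made up for illustration.

```python
def drop_duplicates(pairs):
    """Keep the first occurrence of each (source, target) pair, in original order."""
    seen = set()
    unique = []
    for source, target in pairs:
        # Collapse runs of whitespace so that copies differing only in
        # spacing are treated as the same segment pair
        key = (" ".join(source.split()), " ".join(target.split()))
        if key not in seen:
            seen.add(key)
            unique.append((source, target))
    return unique
```

Note that this only removes pairs where both sides repeat. The same source sentence with two different translations is kept twice, which is probably what you want when the translations still have to be proofed.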

As Siegfried says, it's generally not a good idea to align documents that were translated with a CAT. It's best to use the bilingual documents or TMs produced by the CAT if at all possible. So, if you translated these files with Trados, you might have the bilingual Word files, ttx files or sdlxliff files. Then you can clean them up into a new TM and get what you need. It's going to take ages with 50,000 files, of course, but some parts of the process may be automated depending on what folder structures/file naming schemes were used. Or, say, take the TMX you have and see if there is some property that allows you to filter out the TUs (segments) you need: the creationID (translator name), the dates, "note" text fields added, whatever.
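
As a rough illustration of the TMX filtering idea, here is a Python sketch that keeps only the TUs created by a given user and/or before a given date. It assumes the export stores `creationid` and `creationdate` as attributes on each `<tu>` element, which you should check in your own file first. The function names and the user ID in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def filter_units(root, creation_id=None, created_before=None):
    """Remove <tu> elements that fail the criteria; return how many are kept.

    root is the parsed <tmx> element. A criterion left as None is not applied.
    """
    body = root.find("body")
    for tu in list(body.findall("tu")):
        keep = True
        if creation_id is not None and tu.get("creationid") != creation_id:
            keep = False
        # TMX dates look like 20121006T175800Z, so plain string comparison
        # orders them correctly. Units without a date are kept.
        if created_before is not None and tu.get("creationdate", "") >= created_before:
            keep = False
        if not keep:
            body.remove(tu)
    return len(body.findall("tu"))


def filter_tmx_file(tmx_in, tmx_out, **criteria):
    """Read a TMX file, filter its units and write the result to a new file."""
    tree = ET.parse(tmx_in)
    kept = filter_units(tree.getroot(), **criteria)
    # ElementTree does not write back a DOCTYPE line, if the input had one
    tree.write(tmx_out, encoding="utf-8", xml_declaration=True)
    return kept
```

For example, `filter_tmx_file("big_tm.tmx", "clean_tm.tmx", creation_id="TRANSLATOR_A", created_before="20100101T000000Z")` would write only the matching units to `clean_tm.tmx`. Always work on a copy of the export, never on the only copy of the TM.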


infactglobal
Local time: 17:33
TOPIC STARTER
clarifications Oct 7, 2012

Siegfried Armbruster wrote:

Don't underestimate the effort required and the issues you will have to solve when you plan and perform your alignment process. Mass alignment is a specialist task. It took me a long time to get from absolute crap results to something that is worth using.


I'm sure there are specialists out there who could do the alignment and that might actually be the way to go, I'm not sure yet. But I will attempt a mass alignment myself to test the process. In any case, once the alignment is done (i.e. all segments correspond to each other) I'll need to go through and validate the English, because only a specialist in the client's technology will know what's technically right and what's not.

As I understand it, you already do have a TM but it contains a lot of segments in the wrong language. The advantage of your TM is that the segments are already aligned. Therefore you could skip the "alignment step" and start directly with cleaning/checking your TM.


I have two TMs: French into English and English into French. Neither TM contains "other languages", but both contain English written by non-natives, so they are not good translations. Plus, since the English is so bad (written by the German-speaking parent company), the French isn't technically stable either. That's why I thought of aligning the French and English documents, using the French as the source (and reliable) language. Afterwards, it would be interesting to combine the English - French TM with other languages in the company, but that's just an idea...

I don't recall which TM software you are using, but most tools offer an export function to TMX format. You could load the TMX in Olifant and clean your TM. Doing this will also give you some experience (for one step of many) for your future "alignment projects".


I use Trados, but as stated above, neither language is "trustworthy" so I think alignment is the only way to go. I've aligned documents before, through WinAlign, so I know what the process entails, and I'm a quick learner, so I don't see why I can't try to climb the alignment mountain (like the little train that could!!!)

Thanks for the input!
C



Richard Hill  Identity Verified
Mexico
Local time: 11:33
Member (2011)
Spanish to English
Superalign Oct 7, 2012

This is my alignment tool of choice, which allows you to work on multiple files.

http://sourceforge.net/projects/superalign/


septima
Local time: 17:33
Cleaning Oct 7, 2012

Well, if you want to "clean up" that Big TM, there are some simpler things you could do before considering alignment.

For a start, you could gather together all the original FR documents you yourself translated into EN (quality). Make a Trados project from them and pre-translate them with the Big TM. The matches will be all your quality FR -> EN translations in the TM. You then save those to a "clean" TM separately.

In similar ways you could "filter" out other translations, just by pre-translating from the originals against your Big TM or another one.

---

As regards alignment, all bets are off until we know what scenario you have there. If you have a bunch of perfect mirror-image DOCs with little to no heavy formatting, then batch alignment could be okay. But if we're talking about a mixed scenario, with any PDFs, or unmirrored changes, additions, formatting etc., you could have a big task on your hands.


infactglobal
Local time: 17:33
TOPIC STARTER
If you try and fail, try again, if you fail again, ask a specialist... Oct 9, 2012

I guess that sums it up pretty well... I will try to align these documents... hoping that they will be "easy" (yes, I am laughing)! I will certainly learn how to align better by trying and might apply what I've learned to the next batch of files, but if it's just too much, I'll ask a specialist.

I guess that's the way to go for me.

Thanks again to all proz contributors...

C

