Best way to align a multilingual file?
Thread poster: Lori Cirefice

Lori Cirefice  Identity Verified
France
Local time: 13:22
French to English
Jun 7, 2013

I have a brochure with text in five different languages (English + 4). The original English source has now been updated, and all four translations need to be updated accordingly. Much of the text is the same and only some parts have changed, so I want to align the old version to get 4 x EN-target TMs.
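Aligning the old version pays off because every English segment that did not change becomes an exact TM match. Here is a minimal sketch of that check. It assumes the old and new English texts have already been split into one segment per list item (the segments below are invented for illustration) and uses Python's standard `difflib`:

```python
import difflib


def classify_segments(old_segments, new_segments):
    """Label each segment of the new English source as 'unchanged' (its old
    translation can be reused as an exact match) or 'changed' (needs work)."""
    matcher = difflib.SequenceMatcher(a=old_segments, b=new_segments, autojunk=False)
    labels = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'delete' opcodes have j1 == j2, so removed segments add nothing here
        for segment in new_segments[j1:j2]:
            labels.append(("unchanged" if tag == "equal" else "changed", segment))
    return labels


old = ["Our company", "Product ABC is fast.", "Product XYZ is reliable."]
new = ["Our company", "Product ABC is faster than ever.", "Product XYZ is reliable."]
for label, segment in classify_segments(old, new):
    print(f"{label:<10} {segment}")
```

A CAT tool does this matching itself once the TMs exist. The sketch only shows why the aligned old version is worth having.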

I usually use WF +Tools for alignment projects, but here that would involve a lot of copying and pasting to get the desired result, and I'm hoping to avoid that tedious job.

Are there any alignment tools out there that can handle a multilingual file in a single run?


 

SDL Community  Identity Verified
United Kingdom
Local time: 13:22
English
What do you mean by a multilingual file? Jun 7, 2013

Hi Lori,

You have a brochure in five languages... so you have a single file that contains the text for all five languages? What format is the file?

I don't know of any multilingual alignment tool, but perhaps there is another way of looking at this if the file format is easily accessible?

Regards

Paul


 

Lori Cirefice  Identity Verified
France
Local time: 13:22
French to English
TOPIC STARTER
A single file Jun 7, 2013

The file I want to align is a single, dead pdf file with all 5 languages. Each page has all 5 languages, for example page 1 is the company presentation in 5 languages, page 2 is about their product ABC with description in 5 languages, page 3 is about their product XYZ with description in 5 languages, etc.

I was planning to OCR it of course, to get a .doc file.


 

SDL Community  Identity Verified
United Kingdom
Local time: 13:22
English
Multilingual PDF Jun 7, 2013

Hi Lori,

OK, possibly the worst starting point then!

I'd be interested to see what others think, but this is what I'd do... and you probably know this already, which is why you're looking for other ideas ;-)

  1. OCR the file
  2. Clean it up to make sure there isn't an abundance of useless tags
  3. Now you have a docx with all the content
  4. Create five copies and hide the text in all languages bar one... so each copy has a different language unhidden
  5. Using Studio (I know you may not have it, but I can only base this suggestion on something I know, and it might give you some better ideas...), I would create a project (the language doesn't matter) and add all five files.
  6. I'd copy source to target for all of them and then use the SDLXLIFF for Microsoft Office converter to get Excel files for each file (it's just a drag and drop for the whole project, so it doesn't take long)
  7. The Excel files will contain columns of plain text with one segment per line, one column for the source and one for the target
  8. I'd then merge the Excel files so I have one file containing a column for each language
  9. The file won't be big (you only have three pages), so I would then run down the columns manually and make sure they are all aligned properly
  10. Once you have this, there are any number of ways to get the contents into several TMs, and then you can go back and translate the updated files to get what you want... maybe using the same text-hiding mechanism I started with.
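Steps 8 and 10 can be sketched in Python. The sketch assumes each language's segments have already been exported as a plain list; the Excel export itself is not reproduced here. It also assumes a tab-separated two-column file is an acceptable import format for your TM tool, which is worth checking first. The function names and sample segments are hypothetical:

```python
import csv
import io


def merge_columns(columns_by_lang):
    """Step 8: zip per-language segment lists into rows, one column per language.
    Unequal lengths mean the manual check in step 9 is still needed."""
    lengths = {lang: len(segs) for lang, segs in columns_by_lang.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ, fix alignment first: {lengths}")
    langs = list(columns_by_lang)
    rows = [list(row) for row in zip(*(columns_by_lang[lang] for lang in langs))]
    return langs, rows


def pair_tables(langs, rows, source="EN"):
    """Step 10: one tab-separated source/target table per language pair."""
    src = langs.index(source)
    tables = {}
    for tgt, tgt_lang in enumerate(langs):
        if tgt == src:
            continue
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        writer.writerow([source, tgt_lang])  # header row
        for row in rows:
            writer.writerow([row[src], row[tgt]])
        tables[f"{source}-{tgt_lang}"] = buf.getvalue()
    return tables


columns = {
    "EN": ["Our company", "Product ABC"],
    "FR": ["Notre société", "Produit ABC"],
    "DE": ["Unser Unternehmen", "Produkt ABC"],
}
langs, rows = merge_columns(columns)
for name, text in pair_tables(langs, rows).items():
    print(name)
    print(text)
```

Writing each table to a UTF-8 `.txt` file gives one EN-target file per pair. The length check in `merge_columns` is only a guard. Lists of equal length can still be misaligned, so it does not replace the manual pass in step 9.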

It feels like I wrote a lot, and it feels complicated, but in practice I think the steps are quite simple. Maybe this will prompt some better ideas... or you might just decide copy and paste is faster ;-)

Regards

Paul


 

Natron
Japan
Local time: 20:22
English to Japanese
Are the languages styled differently? Jun 7, 2013

To be honest, I was also confused at first about what to call this type of document. I originally thought they were called bilingual documents since both languages are in the same document, but I found out that term is mostly used for properly aligned bilingual RTF files. I wonder what these documents are actually called.

How many pages is this brochure? To be honest, if it's not that long, manually copying and pasting might save your sanity. If it's really long, maybe you can find someone to outsource the tedious copy/paste job to. I know you're expecting a lot of matches, and I can understand wanting to get everything into a nice TM, but if you only have a PDF, automating this process is not going to be easy.

It's going to come down to how good your OCR program is. Are the different languages styled differently in any way, e.g. color or size? How are the languages laid out in the document? Are they in tables or organized into columns? Also, expect to clean up a lot of tags after the OCR conversion to .doc.

LF Aligner is a wonderful open-source alignment tool, but I don't think it supports more than two languages. Its creator, FarkasAndras, is a user here. He may chime in with some better advice, but PDF is pretty much the worst-case scenario.


 

Lori Cirefice  Identity Verified
France
Local time: 13:22
French to English
TOPIC STARTER
Columns Jun 7, 2013

Thanks, Paul, for your detailed suggestions. I don't work with Studio, but you gave me a good idea. I hadn't considered just "hiding" the various languages; I was thinking about copying and pasting each language into a new file for alignment. Hiding should be faster, and I should be able to end up with five monolingual files that way.

Natron, the text is roughly organized into columns, and there are also some tables. The font attributes are fairly uniform, so I don't see how I could use them to separate the wheat from the chaff.
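Since the fonts don't help, vocabulary is one crude alternative signal. Each paragraph of the OCR output can be assigned to whichever language's common function words it contains most often. The toy sketch below assumes one paragraph per item of OCR text. Its five short word lists are hypothetical and far too small for real use, where a proper language-identification library would be the better choice:

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny function-word lists (not taken from any tool)
FUNCTION_WORDS = {
    "EN": {"the", "and", "of", "to", "is", "for", "with", "our", "are"},
    "FR": {"le", "la", "les", "et", "des", "est", "pour", "avec", "nos", "une"},
    "DE": {"der", "die", "das", "und", "ist", "mit", "für", "ein", "unsere", "zu"},
    "ES": {"el", "la", "los", "y", "es", "para", "con", "nuestros", "una", "del"},
    "IT": {"il", "la", "gli", "e", "è", "per", "con", "nostri", "una", "della"},
}


def guess_language(paragraph):
    """Return the language whose function words occur most often, or None."""
    tokens = re.findall(r"\w+", paragraph.lower())  # \w is Unicode-aware
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in FUNCTION_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def split_by_language(paragraphs):
    """Sort paragraphs into one list per guessed language ('??' = unknown)."""
    buckets = defaultdict(list)
    for paragraph in paragraphs:
        buckets[guess_language(paragraph) or "??"].append(paragraph)
    return dict(buckets)


ocr_paragraphs = [
    "The company and its products are known for quality.",
    "La société et ses produits sont connus pour la qualité.",
    "Das Unternehmen und seine Produkte sind für Qualität bekannt.",
    "La empresa y sus productos son conocidos por la calidad.",
    "L'azienda e i suoi prodotti sono noti per la qualità.",
]
for lang, paras in split_by_language(ocr_paragraphs).items():
    print(lang, paras)
```

Short headings and table cells such as "Product ABC" contain no function words, so they land in `??` or get misfiled. Manual review would still be needed.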


 

Artem Vakhitov  Identity Verified
Estonia
English to Russian
I have no kind words to say about such documents... Jun 9, 2013

... and not only as a translator, but as a reader, too. Most of them are extremely hard to read.
Sorry that I cannot be of any help to you in this case, I'm just expressing my sympathy.


 

FarkasAndras
Local time: 13:22
English to Hungarian
LF Aligner and many languages Jul 5, 2013

Natron wrote:

LF Aligner is a wonderful open-source alignment tool, but I don't think it supports more than two languages. Its creator, FarkasAndras, is a user here. He may chime in with some better advice, but PDF is pretty much the worst-case scenario.


This is almost certainly too late for the OP but here's some info for reference:

LF Aligner is probably the only aligner that supports multi-language alignment. You can feed it the same text in a dozen languages and generate a 12-column table and a 12-language TMX file. Then (assuming that your CAT handles TMX correctly) you can create a TM with the desired language combination and import the TMX. Your CAT will pick the languages that are set in the TM and import those.
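When a CAT tool imports a multilingual TMX, it picks out the two languages set in the TM. That step can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration only. It assumes plain segments without inline tags and matches languages on the primary subtag, so `en-GB` counts as `en`. The header values in `to_tmx` and the sample TMX are placeholders:

```python
import xml.etree.ElementTree as ET

# xml:lang as ElementTree names it (the "xml" prefix is predefined)
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary_lang(tuv):
    """Primary language subtag of a <tuv>; TMX 1.4 uses xml:lang, TMX 1.1 used lang."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.split("-")[0].lower()


def extract_pair(tmx_text, src, tgt):
    """(source, target) pairs for one language combination; translation
    units lacking either language are skipped."""
    src, tgt = src.lower(), tgt.lower()
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segs[primary_lang(tuv)] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def to_tmx(pairs, src, tgt):
    """Minimal bilingual TMX 1.4 document (placeholder header values)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-pair-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, source_text), (tgt, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


SAMPLE = """<tmx version="1.4"><header creationtool="x" creationtoolversion="1"
 segtype="sentence" o-tmf="x" adminlang="en" srclang="en" datatype="plaintext"/><body>
<tu><tuv xml:lang="en-GB"><seg>Our company</seg></tuv><tuv xml:lang="fr-FR"><seg>Notre société</seg></tuv>
<tuv xml:lang="de-DE"><seg>Unser Unternehmen</seg></tuv></tu>
<tu><tuv xml:lang="en-GB"><seg>Product ABC</seg></tuv><tuv xml:lang="de-DE"><seg>Produkt ABC</seg></tuv></tu>
</body></tmx>"""

en_fr = extract_pair(SAMPLE, "en", "fr")
print(en_fr)
print(to_tmx(en_fr, "en", "fr"))
```

In practice you would simply import the multilingual TMX into each TM as described. The sketch only shows why one many-language file can feed any number of bilingual TMs.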
Of course LF Aligner can't pick the various texts out of a messy multilingual pdf like the OP's file. No widely available software can. You would need to move the texts into separate files before running LF Aligner.
LF Aligner autoaligns each language with all the others and provides a GUI for reviewing the alignments. If the file is longer than a couple of dozen sentences, these will speed up the work tremendously compared to doing it manually in Excel as suggested by Paul.
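Autoaligners typically score candidate sentence pairings and pick the cheapest overall path through both texts. LF Aligner's own algorithm is not reproduced here; as far as I know, it relies on hunalign. The sketch below is instead a simplified length-based dynamic-programming aligner in the spirit of Gale and Church. Its cost function and penalties are made up, and it allows 1-1, 1-2 and 2-1 pairings plus skipped sentences:

```python
import math

# Allowed pairings: (source sentences, target sentences) consumed per step
MOVES = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]


def align(src, tgt, merge_penalty=0.5, skip_penalty=3.0):
    """Length-based alignment of two sentence lists. A pairing costs
    |log(len_src / len_tgt)|, plus merge_penalty for 1-2/2-1 pairings;
    an unmatched sentence costs skip_penalty (all values hypothetical)."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def length(sentences):
        return sum(len(s) for s in sentences) + 1  # +1 avoids division by zero

    # All moves go down/right, so row-major order finalizes each cell before use
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di and dj:
                    step = abs(math.log(length(src[i:ni]) / length(tgt[j:nj])))
                    if (di, dj) != (1, 1):
                        step += merge_penalty
                else:
                    step = skip_penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)

    # Walk back from the end to recover the cheapest sequence of pairings
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    return beads[::-1]


en = ["Our company was founded in 1990.",
      "Product ABC is fast and reliable. It ships worldwide."]
fr = ["Notre société a été fondée en 1990.",
      "Le produit ABC est rapide et fiable.",
      "Il est livré dans le monde entier."]
for pair in align(en, fr):
    print(pair)
```

The time saving comes from the human only checking and correcting proposed pairs instead of building them by hand. Real aligners also use dictionary and similarity cues that a pure length model lacks.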


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:22
Member (2006)
English to Afrikaans
Select manually, I'm afraid Jul 5, 2013

Lori Cirefice wrote:
I have a brochure with text in five different languages (English + 4)... so I want to align the old version to get 4 x EN-target TMs.


In many OCR programs, you can select the text blocks that you want to be converted. Most modern OCR programs do the text box selection for you, but I'm sure it must still be possible to select manually. This means that you'll have to perform the OCR five times, each time selecting the block of text on each of the pages to be recognised and converted. The good news is that this is mouse-work, and that you don't need to use the mouse very accurately, so you won't get tired arms.


 

Lori Cirefice  Identity Verified
France
Local time: 13:22
French to English
TOPIC STARTER
How it turned out Jul 5, 2013

Thanks again everyone, I will definitely check out LF Aligner in the future.

So here is what I ended up doing: I manually deleted all but one language from the OCR'd file, five times over, so I ended up with five monolingual docs. Then I aligned them as usual with +Tools.

Unfortunately, there were formatting issues with the OCR'd file that made alignment really difficult... things were in different locations across the various files (because of the original columns and general layout), so on top of the OCR quirks, aligning turned out to be VERY time-consuming.

After completing the EN-FR and EN-DE, I gave up on my initial idea and went with plan B (sending the dead pdf as a reference document for ES and IT).


 

