Translation Memory Without Alignment
Thread poster: Worder Company
Worder Company
Russian Federation
Russian to English
Feb 11, 2014

I would be grateful for your comments on this topic.

I have received 10 files, each as an original and a translation. The originals are PDFs and the translations are Word documents. The files contain a lot of graphics (brochures, etc.). The translations were done without any CAT tool, from PDFs converted into Word. I need to identify any matches between these files and the translated files.

I basically need two capabilities:

1) Concordance search through the files

2) Identification of any matches higher than 70%
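The two capabilities above can be illustrated with a minimal sketch in Python. It assumes the source segments have already been extracted as plain text and tagged with a file number; the `memory` list and both function names are hypothetical. The score comes from the standard library's `difflib.SequenceMatcher`, a character-based similarity ratio, which is not the match value that Trados calculates.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.70):
    """Return (score, file_number, stored_segment) for entries scoring at least threshold."""
    hits = []
    for file_number, stored in memory:
        # Character-based similarity between 0.0 and 1.0 (not a Trados score).
        score = SequenceMatcher(None, segment.lower(), stored.lower()).ratio()
        if score >= threshold:
            hits.append((round(score, 2), file_number, stored))
    # Best matches first.
    return sorted(hits, reverse=True)


def concordance(term, memory):
    """Return every (file_number, stored_segment) whose text contains the term."""
    return [(n, s) for n, s in memory if term.lower() in s.lower()]


# Hypothetical memory: (file number, source segment) pairs.
memory = [
    (1, "The pump must be switched off before maintenance."),
    (2, "Store the device in a dry place."),
]
```

Calling `fuzzy_matches("The pump must be switched off before cleaning.", memory)` returns the entry from file 1 together with its score, so the file number tells you which document to open.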

Solution that I have used before:

1. (Trados 11) I create a TM with the following settings (in the tab "Fields and Settings"): Name: something like "Number in the Register"; Type: Number.

2. I create a set of numbered folders and place the original and the translation of each file in its own folder.

3. I create a separate project for file No. 1 in Trados. Settings: (Pre-Translate) Copy source to target if no match has been found. I open the file and click on the number of the first segment, then I scroll down and Ctrl+click to select all segments. Then I change the status of all segments to "translated".

4. I update the main memory (via the batch task "update main memory"). Settings: (tab "Translation Memories and Automated Translation", sub-tab "Update") for this TM (the one created in step 1), I type the number "1" under the word "Value" in this window.

When I open a file that has some matches with file No. 1, the matches are shown in the matches window in the following way: original text on both sides, with the text (Number in the Register: 1) to the right.

Thus, I know that I have to open file No. 1 (original or translation) and copy and paste the match.
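The idea behind these steps (store each source segment on both sides of a translation unit and tag it with the number of its file) can also be sketched outside Trados as a small TMX writer. This is only an illustration under stated assumptions: the segments are assumed to be already split into sentences, the language codes are guesses based on a Russian to English pair, and the property name `x-file-number` is hypothetical. Whether Trados maps such a property to the "Number in the Register" field on import has not been verified.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(segments, file_number, src_lang="ru-RU", tgt_lang="en-US"):
    """Build a TMX tree in which each unit holds the source text on both sides."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en-US",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for text in segments:
        tu = ET.SubElement(body, "tu")
        # Hypothetical property carrying the number of the file in the register.
        prop = ET.SubElement(tu, "prop", type="x-file-number")
        prop.text = str(file_number)
        # Source copied to target, as in the "copy source to target" step.
        for lang in (src_lang, tgt_lang):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

A tree built with `build_tmx(segments, 1)` can be saved with `tree.write("file_01.tmx", encoding="utf-8", xml_declaration=True)`; the output file name is only an example.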

Benefits:

I have done this procedure for 5 projects where the customer had no TM in place. The files were less than 50 pages each. It enabled me to quickly identify any matches and incorporate them. It took me a couple of hours to create these TMs while I was doing other work on the PC; I was just clicking a couple of times every 10 minutes. In other words, it means a couple of clicks per file versus aligning the text of each file by hand.

Problems:

Now I have to handle files of 300 pages each. Just opening one file in Trados takes 2 hours (I have only 4 GB of RAM, which makes the process longer).



How do you tackle this issue? (Options such as aligning by hand or by other means, or copying the materials into one file and marking particular parts of it with colors or other means, are not feasible due to time constraints.)


pdf conversion Feb 12, 2014

I suspect many of your problems are the result of messy/noisy PDF files/conversions and the resulting huge files.

Studio's built-in PDF converter, while very good (I purchased the stand-alone version to get better performance), tends to insert tags for the most trivial reasons (word spacing is a particular annoyance). The resulting Trados file may easily contain thousands more tags than it otherwise would. For this reason, with a 300-page file, you might want to consider a separate tool to do the conversion, e.g. ABBYY FineReader. At a minimum, you will want to clean such a file of superfluous tags to make it easier to handle. Consider a tool like Code Zapper, which will jettison unneeded tags and will also export/re-import the heavy graphics temporarily while you translate. Also consider breaking up the Word file into several pieces. As far as alignment goes, the new aligner in Studio 2014 is reportedly very good. LF_Aligner is also an excellent product.


Worder Company
TOPIC STARTER
it takes too much time to create a project with the file Feb 12, 2014

Tobias, thank you for your answer.

1. Yes, I have to handle PDFs with a lot of graphics. Of course, I convert the PDFs into Word and delete the graphics manually.

2. I don't think I need to clean out any tags, since I don't see any tags when I open the file in Trados.

3. I can create the TMX in Trados for this memory (with the subsequent steps as above) or by other means and then import it into Trados, specifying the number for the file before importing (under "apply field values", click "edit" and enter the number before the final stage of the TMX import). Since creating the TMX in Trados is easier (I don't have to go through several operations in another program), I have been making the TMX in Trados.

4. The problem right now is creating the TMX, and the fastest way to do it.

So, to summarize:

1. I do the FineReader conversion.
2. I manually remove the graphics from the resulting Word file.
3. I don't see any tags in Trados.

The problem is that it takes too much time to open the file in Trados, mark the segments as translated and upload them to the memory (a 200-page file takes 1 hour or more to create in Trados!).

Colleagues, what is the fastest way to create the TMX (excluding WinAlign in Trados 14 and 11, since I think these are the slowest and most cumbersome)?

The question was primarily about the full scheme for building the memory as described, not just the stage of creating the TMX. So, if you use another method, it would be great to hear about it (even if you decide to comment on this post after a while, since I'll be tracking this thread).

[Edited at 2014-02-12 21:06 GMT]


Worder Company
TOPIC STARTER
other programs Feb 12, 2014

By the way, if you use a similar method based on a program other than Trados, please feel welcome to share.


Worder Company
TOPIC STARTER
h Feb 18, 2014

There may be cases when you want to delete the segments created from a particular file.

For instance, you have created the memory from 50 files. Then you want to delete all segments for file 15 for some reason.

To do this:

1. Open the Translation Memories view. Open your memory.
2. Create "filter 15": click the Add Filter button at the top of the filters window (if you can't see it, restore the default display settings), then click Add on the right and type in 15.
3. Click OK, then click the Save button at the top of this window.
4. On the left, right-click the memory, select Batch Delete, select "filter 15" and click Delete.

All segments for file 15 have now been deleted.
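The same clean-up can be sketched on a TMX export instead of in the Trados interface. The snippet below is only an illustration: it assumes each translation unit carries its file number in a property named `x-file-number`, which is a hypothetical name, and the `SAMPLE` data is invented for the example.

```python
import xml.etree.ElementTree as ET


def drop_file_segments(tmx_root, file_number, prop_type="x-file-number"):
    """Remove every <tu> whose file-number property equals file_number."""
    body = tmx_root.find("body")
    removed = 0
    # Iterate over a copy, because units are removed from body inside the loop.
    for tu in list(body.findall("tu")):
        prop = tu.find(f"prop[@type='{prop_type}']")
        if prop is not None and prop.text == str(file_number):
            body.remove(tu)
            removed += 1
    return removed


# Invented two-unit TMX body: one unit from file 15, one from file 16.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><prop type="x-file-number">15</prop><tuv xml:lang="ru-RU"><seg>A</seg></tuv></tu>
<tu><prop type="x-file-number">16</prop><tuv xml:lang="ru-RU"><seg>B</seg></tuv></tu>
</body></tmx>"""
```

After `root = ET.fromstring(SAMPLE)`, the call `drop_file_segments(root, 15)` removes the unit from file 15 and leaves the unit from file 16 in place.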

