How to update previous translation memory?
Thread poster: Harklas

Harklas
Local time: 13:50
Feb 16, 2010

OmegaT 2.0.5_02

I start a project called Test1, import the file "weekdays.ods" and translate Monday as Moon Day. Then I "Create Translated Document" and close the project.

I start a project called Test2, import the file "weekdays2.ods", and in folder TM I put the translation memory created by Test1.

It now suggests that I translate Monday to Moon Day, but I go direct to Tuesday and translate it as "Tyr's Day", after which I "Create Translated Document" and close the project.

I start a project called Test3, import the file "weekdays3.ods", and in folder TM I put the translation memory created by Test2.

It now suggests that I translate Tuesday to Tyr's Day like I did last time, but I go to Monday to see what it suggests. It suggests nothing: the TM I put in Test2 wasn't in the TM produced by Test2. :(

I need to somehow merge the translation memory created by Test1 and Test2, so I can have one growing TM for all projects, and not an ever-growing folder with one TM file for each project I have done ...

I'd be most grateful for a solution. :)


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:50
Member (2006)
English to Afrikaans
+ ...
@Harklas Feb 17, 2010

Harklas wrote:
It suggests nothing, the TM I put in Test 2 wasn't in the TM produced by Test2. ... I need to somehow merge the translation memory created by Test1 and Test2, so I can have one growing TM for all projects, and not an ever-growing folder with one TM file for each project I have done.


Background information:

1. OmegaT has no TM merge function built-in.
2. The TMs that you place in the /tm/ folder will be consulted during translation.
3. The TMs that appear in the root of your project folder contain all of the segments (and only those segments) that appear in your source and target files.
4. The TM that OmegaT reads from *and* writes to, is called project_save.tmx, and it is in your project's /omegat/ folder.

Applicable to your situation:

1. You can merge TMs using another program, if you like.
2. Or, you can re-use the TM called "project_save.tmx". Simply replace the empty one that is created at the start of each new project with the old one that you had saved from previous translations. Remember to close the project in OmegaT before replacing the file (otherwise OmegaT will overwrite it).
3. Unfortunately, the project_save.tmx file must have that name, and no other name. So if you want to save it somewhere (e.g. under a client's name), you can rename it to client123_project_save.tmx, but whenever you put it in a new project's /omegat/ folder, it needs to be renamed "project_save.tmx" before OmegaT will use it.
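
The copy-and-rename step in points 2 and 3 can be sketched in Python. This is only an illustration of the manual procedure, not something OmegaT provides: the function name `seed_project_memory` and the backup name `project_save.before-seed.bak` are made up for this example, and it assumes the project is closed in OmegaT while it runs.

```python
import shutil
from pathlib import Path


def seed_project_memory(saved_tmx, project_dir):
    """Copy a saved TMX into a project's omegat folder as project_save.tmx.

    Assumption: the project is NOT open in OmegaT, otherwise OmegaT may
    overwrite the copied file.
    """
    saved_tmx = Path(saved_tmx)
    omegat_dir = Path(project_dir) / "omegat"
    if not omegat_dir.is_dir():
        raise FileNotFoundError(f"No omegat folder in {project_dir}")

    target = omegat_dir / "project_save.tmx"  # the only name OmegaT uses
    if target.exists():
        # Hypothetical backup name: keep the file OmegaT created, in case
        # it already holds translations.
        shutil.copy2(target, omegat_dir / "project_save.before-seed.bak")

    shutil.copy2(saved_tmx, target)
    return target
```

For example, `seed_project_memory("client123_project_save.tmx", "Test2")` would place the saved memory at `Test2/omegat/project_save.tmx`.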

What would be really nice is if OmegaT could detect multiple TMX files in the /omegat/ folder, and merge them when it detects them, into a single project_save.tmx file.


 

Didier Briel  Identity Verified
France
Local time: 13:50
Member (2007)
English to French
+ ...
To merge, you can use TMXMerger Feb 17, 2010

Samuel Murray wrote:
1. You can merge TMs using another program, if you like.

For that, you can use TMXMerger.
You can get it from OmegaT Resources

It's a simple command line tool.

Didier


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:50
Member (2006)
English to Afrikaans
+ ...
LOL @ Didier Feb 17, 2010

Didier Briel wrote:
It's a simple command line tool.


If there is a list of oxymorons from the open-source world, this item must be in the top 5. Reworded: the more cryptic something is, the easier it is to use. :)

I found this article very informative (if a little geeky):
http://www.burgaud.com/open-command-window-here/

For that, you can use TMXMerger.
You can get it from http://www.omegat.org/en/resources.html


Have you ever used TMXMerger, Didier? It seems very useful. How does it work?


 

Didier Briel  Identity Verified
France
Local time: 13:50
Member (2007)
English to French
+ ...
Simple means: no bells and whistles Feb 17, 2010

Samuel Murray wrote:

Didier Briel wrote:
It's a simple command line tool.


If there is a list of oxymorons from the open-source world, this item must be in the top 5. Reworded: the more cryptic something is, the easier it is to use. :)


Pardon my French.
I didn't mean to write it is easy to use. I meant it has no bells and whistles, even for a command line tool.


Have you ever used TMXMerger, Didier?


Yes, a few times.
I don't often need to merge TMXs.

It seems very useful. How does it work?

Supposing one knows how to use a command line:
java -jar TMXMerger-1.0.jar first-tmx-to-merge second-tmx-to-merge final-merged-tmx
Actually, if you just launch TMXMerger with no TMX (i.e., java -jar TMXMerger-1.0.jar), it gives you reasonable instructions.
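
For anyone who prefers to script this, here is a minimal Python sketch that builds the same command line and runs it. The jar name and the argument order are taken from the command above; the function names are invented for this example, and it assumes `java` is on the PATH and the jar is in the current folder.

```python
import shutil
import subprocess


def tmxmerger_command(jar, first_tmx, second_tmx, merged_tmx):
    """Build the argument list for: java -jar <jar> <first> <second> <merged>."""
    return ["java", "-jar", jar, first_tmx, second_tmx, merged_tmx]


def run_tmxmerger(jar, first_tmx, second_tmx, merged_tmx):
    """Run TMXMerger through Java; raises if java is missing or the tool fails."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH")
    command = tmxmerger_command(jar, first_tmx, second_tmx, merged_tmx)
    return subprocess.run(command, check=True)


# Example call (not executed here):
# run_tmxmerger("TMXMerger-1.0.jar", "old.tmx", "new.tmx", "merged.tmx")
```

Passing the arguments as a list avoids quoting problems when file names contain spaces.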

Didier


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:50
Member (2006)
English to Afrikaans
+ ...
Thanks, Didier Feb 17, 2010

Didier Briel wrote:
Supposing one knows how to use a command line:
java -jar TMXMerger-1.0.jar first-tmx-to-merge second-tmx-to-merge final-merged-tmx
Actually, if you just launch TMXMerger with no TMX (i.e., java -jar TMXMerger-1.0.jar), it gives you reasonable instructions.


I did launch it using a command-line window, but it gave me no instructions whatsoever. I also tried adding the usual -h, --h, -help, --help, /?, /h and /help switches, all to no avail.

I must add that I did not use "java -jar". It is not obvious to me to do that -- with most command-line utilities in Windows, you just type the name of the program. I'm not sure why Java should be different. Using "java -jar" would be obvious to people who regularly use or have previously launched Java programs from the command line.


 

Harklas
Local time: 13:50
TOPIC STARTER
Merged TMX shrank to half size ... Feb 17, 2010

Didier Briel wrote:

Supposing one knows how to use a command line:
java -jar TMXMerger-1.0.jar first-tmx-to-merge second-tmx-to-merge final-merged-tmx

Thanks! I used this Java tool for the two memories. It worked smoothly: I put the two memories and the Java tool in the same folder, held SHIFT while right-clicking in the window (I have Vista), and then typed the line you gave me.

But while my customer's old TMX was 9246 KB and my new TMX was 11 KB, the merged TMX is 4169 KB.

It looks like half of the old TMX was destroyed somewhere in the process. :(

Any experience with this?

In the meantime, I'll see if Murray's workaround will cause the same result ...


 

Didier Briel  Identity Verified
France
Local time: 13:50
Member (2007)
English to French
+ ...
Size is no indication of the content of a TMX Feb 17, 2010

Harklas wrote:
But while my customer's old TMX was 9246 KB and my new TMX was 11 KB, the merged TMX is 4169 KB.

It looks like half of the old TMX was destroyed somewhere in the process. :(

Any experience with this?

Where does your customer's TMX come from?

If it is from Trados, for instance, it usually contains a huge list of fonts, etc., which is useless (in OmegaT), but takes a lot of space. This is deleted when merging TMXs.

As you are under Windows, you can use Olifant to check the number of Translation Units of your TMXs, which is the important thing.
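
If Olifant is not available, counting translation units can also be sketched in a few lines of Python. This assumes a well-formed TMX file whose translation units are plain `<tu>` elements without an XML namespace; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def count_translation_units(tmx_path):
    """Count the <tu> elements in a TMX file, whatever its size on disk."""
    total = 0
    # iterparse streams the file, so large TMX files need not fit in memory.
    for _event, element in ET.iterparse(tmx_path, events=("end",)):
        if element.tag == "tu":
            total += 1
            element.clear()  # release the unit's children once counted
    return total
```

Comparing the counts of the two input files with the count of the merged file shows whether units were lost, independently of the file size.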

Didier


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:50
Member (2006)
English to Afrikaans
+ ...
Olifant, of course! Feb 17, 2010

Didier Briel wrote:
As you are under Windows, you can use Olifant to check the number of Translation Units of your TMXs, which is the important thing.


I forgot about Olifant! Well, you can open one TMX file in Olifant, and then select "Import" and import the second TMX file, and this will give you the same result as merging. It would just be nice if one could use a small application.

The size reduction isn't necessarily a problem. I just tested it by merging two nearly identical TMs. The two TMs had roughly 800 TUs each, 550 KB each. The combined TM using Olifant was roughly 1600 TUs large (950 KB), but after removing duplicates, it was roughly 800 TUs again (obviously). Merging the same two TMs using TMXMerger gave me a file that was only 250 KB large, but it contained nearly the same number of TUs as the Olifant file that had duplicates removed. This means that TMXMerger automatically removes duplicates when merging.

The downside of TMXMerger was that the user ID and creation date of each TU were removed. At the time when TMXMerger was written, OmegaT did not support these attributes. Perhaps it's time for an updated version of TMXMerger that supports these attributes...
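
To illustrate what merging with duplicate removal means, here is a rough Python sketch. It is not how TMXMerger works internally; it assumes that two TUs are duplicates when their language and segment-text pairs are identical, that both files are well-formed TMX without an XML namespace, and that the header of the first file can be reused. Whole `<tu>` elements are copied, so attributes such as the user ID and creation date are kept. The function names are invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu):
    """Identify a TU by its (language, segment text) pairs."""
    pairs = []
    for tuv in tu.findall("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        pairs.append((lang.lower(), text))
    return tuple(sorted(pairs))


def merge_tmx(first_path, second_path, merged_path):
    """Write the union of both files' TUs, skipping exact duplicates."""
    tree = ET.parse(first_path)  # the first file's header is reused as-is
    body = tree.getroot().find("body")
    second_body = ET.parse(second_path).getroot().find("body")
    candidates = body.findall("tu") + second_body.findall("tu")

    for tu in list(body):
        body.remove(tu)

    seen = set()
    for tu in candidates:
        key = tu_key(tu)
        if key not in seen:  # the first occurrence wins
            seen.add(key)
            body.append(tu)

    # Note: any DOCTYPE line in the input is not written back.
    tree.write(merged_path, encoding="utf-8", xml_declaration=True)
    return len(seen)
```

The return value is the number of TUs in the merged file, which can be compared with the TU counts of the two inputs.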


 

Harklas
Local time: 13:50
TOPIC STARTER
And Olifant counts Translation Units! Feb 17, 2010

Thanks a lot! I downloaded it from here: http://sourceforge.net/projects/okapi/files/Olifant%20(Stable)/ (the latest one, #22)

And it showed me that the TUs had indeed increased.

When I added up the two TMs, I got a figure 10 lines higher than the line count of the merged TM.

But that's a diff I'm sure I and the customer can live with. Bet those were just duplicates.

As for the other issues, user ID etc, let's just see if we get any feedback ...

Thanks a lot; it was indeed beneficial for our business to find you guys. :)


 
Updating previous translation memory Nov 16, 2010

Samuel Murray wrote:

Background information:

1. OmegaT has no TM merge function built-in.
2. The TMs that you place in the /tm/ folder will be consulted during translation.
3. The TMs that appear in the root of your project folder contain all of the segments (and only those segments) that appear in your source and target files.
4. The TM that OmegaT reads from *and* writes to, is called project_save.tmx, and it is in your project's /omegat/ folder.

Applicable to your situation:

1. You can merge TMs using another program, if you like.
2. Or, you can re-use the TM called "project_save.tmx". Simply replace the empty one that is created at the start of each new project with the old one that you had saved from previous translations. Remember to close the project in OmegaT before replacing the file (otherwise OmegaT will overwrite it).


I found this explanation very helpful. However, simply copying the previous project_save.tmx file to the /omegat folder resulted in no matches. Does it need to be copied to both the /omegat folder and to the /tm folder of the new project?


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:50
Member (2006)
English to Afrikaans
+ ...
Was OmegaT closed when you copied? Nov 16, 2010

avastor wrote:
I found this explanation very helpful. However, simply copying the previous project_save.tmx file to the /omegat folder resulted in no matches.


Was OmegaT closed (i.e. not running) when you copied the file? If not, then OmegaT will overwrite the copied file with a blank file as soon as you start doing something in OmegaT.


 
Tags in OmegaT Nov 18, 2010

Samuel Murray wrote:
Was OmegaT closed (i.e. not running) when you copied the file? If not, then OmegaT will overwrite the copied file with a blank file as soon as you start doing something in OmegaT.

Thanks, I'm sure I must have done that or something equally idiotic. I wonder if I might pick your brain further. After first loading a .docx file for translation, there are many long strings of tags surrounding various bits of text. As it's not clear what these tags are, and as the translated text will usually be completely different in terms of both the number of words and the word order, I'm wondering how best to deal with these.
For example, word word word word word word word word word.
In every segment there are many such tags. How to deal with these in the translation? Thanks again in advance.


 

Didier Briel  Identity Verified
France
Local time: 13:50
Member (2007)
English to French
+ ...
Use the "Latest" version Nov 19, 2010

avastor wrote:
After first loading a .docx file for translation, there are many long strings of tags surrounding various bits of text. As it's not clear what these tags are, and as the translated text will usually be completely different in terms of both the number of words and the word order, I'm wondering how best to deal with these.
For example, {/w23}{/w16}{w24}{w25}{w26/} word word word {/w31}{/w24}{w32}{w33}{w34/}{w35/} word {/w39}{/w32}{w40}{w41} word word word word word. word word word word word.
In every segment there are many such tags. How to deal with these in the translation?

I have replaced the "less than" and "greater than" characters in your example with curly braces to make them visible.
The "Latest" version has a tag reduction feature.
Your example would become {t0/} word word word {t1/} word {t2/} word word word word word. word word word word word.
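
The effect on the example can be imitated with a toy Python sketch. It is only an illustration on the brace-style rendering used above, not OmegaT's actual tag reduction: it assumes every run of adjacent `{wN}`-style tags can be replaced by one numbered placeholder, and the function name is made up.

```python
import re
from itertools import count

# One or more adjacent tags such as {w24}, {/w16} or {w26/}.
TAG_RUN = re.compile(r"(?:\{/?w\d+/?\})+")


def collapse_tag_runs(segment):
    """Replace each run of adjacent tags with a single {tN/} placeholder."""
    numbers = count()
    return TAG_RUN.sub(lambda _match: "{t%d/}" % next(numbers), segment)
```

Applied to the quoted segment, the three tag runs are numbered in order of appearance, starting at 0.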

Some of the unnecessary tags are also caused by Word itself (because of spell checking, for instance).
I recommend reading HOWTO: Translating Word 2007 (Office Open XML, .docx) files in OmegaT.

Didier


 

Anders Dalström
Germany
Local time: 13:50
Member (2008)
English to Swedish
+ ...
Olifant - segments from imported tmx are all blank Apr 11, 2011

Thanks for the useful info in this thread. I have a problem, though. I have just tried to use Olifant to merge two TMX files: one provided by the client, containing previous translations, and the other created by OmegaT after I finished the current translation. The segments from the new (second) TM all show as blank, i.e. I can see the segments from the TM the client sent me (1900 segments), but the 200 new segments that I translated are all blank. I have tried all three OmegaT TM files (level 1, level 2, omegat). What am I doing wrong? How do I correct this? Grateful for any and all suggestions/help.

Thanks,
Anders


 