OmegaT ignores most of the source text :(
Thread poster: Harklas
Harklas
Local time: 03:31
Jun 15, 2010

I got four docx:s and one .tmx. I set up an OmegaT project and translated the files in OmegaT. It worked well (except for some minor issues dealt with in other threads), and when I ran "Create Translation" I got 4 corresponding docx:s where they are supposed to be. However, I now see that only a portion of the text was caught by OmegaT. There are some target-language sentences here and there in otherwise predominantly English texts; the rest was never caught by OmegaT.

I'll try with first converting the original docx:s to odt and do it all again. Maybe that makes it easier for OmegaT to find the words in there ... any idea why this happens?



Dragomir Kovacevic  Identity Verified
Italy
Local time: 03:31
Italian to Serbian
+ ...
DOCX is a bit complex Jun 15, 2010

Harklas,
pay attention to tags: all the specific tags that OmegaT creates for Open XML documents have to be transferred into the target. Try this:

- make a new project for the same material
- take only the level1 tmx that OmegaT made for this specific project and copy it into your TM repository, i.e. the place your TM folder points to. The level1 tmx is the one with pure text.
- when working on segments with the "tag forest" produced by the docx processing, first hit "insert source tags" at the very beginning of the target segment. However, where a segment has real formatting tags such as those for "bold" or "italic", it may be better to translate the segment as it comes, without "insert source tags".
- validate tags regularly during translation. With such a large number of docx-specific tags, OmegaT is under strain.
- save and compile often.
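
The TMX-copying step in the list above could be scripted roughly as follows. This is only a sketch under assumptions not stated in the thread: that OmegaT writes a file ending in `-level1.tmx` into the root of the old project folder, and that the new project uses the default `tm` subfolder. The function name `copy_level1_tmx` and both folder arguments are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def copy_level1_tmx(old_project: Path, new_project: Path) -> List[Path]:
    """Copy every *-level1.tmx from old_project into new_project/tm.

    Assumption: the level1 export sits in the root of the old project
    folder and the new project keeps its memories in a "tm" subfolder.
    """
    tm_dir = new_project / "tm"
    tm_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for tmx in sorted(old_project.glob("*-level1.tmx")):
        target = tm_dir / tmx.name
        shutil.copy2(tmx, target)  # keeps the file's timestamps
        copied.append(target)
    return copied
```

If the project's TM folder has been pointed somewhere else, pass that location instead and adjust the `tm` subfolder name accordingly.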

Dragomir


Harklas
Local time: 03:31
TOPIC STARTER
Thanks! Jun 15, 2010

But I'm afraid another CAT tool is on the wishlist...


Samuel Murray  Identity Verified
Netherlands
Local time: 03:31
Member (2006)
English to Afrikaans
+ ...
On DOCX Jun 16, 2010

Harklas wrote:
I got four docx:s and one .tmx. I set up an OmegaT project and translated the files in OmegaT. It worked well (except for some minor issues dealt with in other threads), and when I ran "Create Translation" I got 4 corresponding docx:s where they are supposed to be. However, I now see that only a portion of the text was caught by OmegaT. There are some target-language sentences here and there in otherwise predominantly English texts; the rest was never caught by OmegaT.


If a CAT tool claims to support a certain file format, but that file format is closed (or proprietary), then you should be aware that the program will never be able to support more than 99% of documents in that format. So too with DOCX. It is possible that you have discovered a feature in DOCX that the OmegaT developers haven't come across themselves, and the best thing to do (if you can) is to share the file with them using their bug tracking system. If your files are confidential, then that's too bad.

Dragomir thinks that your problem lies with tag soup, but it sounds to me as if OmegaT had failed to extract some of the text from the DOCX files.
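
One rough way to check whether text really went missing is to count the words in the DOCX itself and compare the result with what OmegaT shows. A DOCX file is a ZIP archive of XML parts, and the visible text sits in `<w:t>` elements. The sketch below is independent of OmegaT and says nothing about how OmegaT's own filter works; the function name `docx_word_counts` is made up for illustration, and it assumes the common (transitional) WordprocessingML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT = f"{{{W_NS}}}t"
# Paragraphs, tabs and line breaks are treated as word boundaries.
SEPARATORS = {f"{{{W_NS}}}{tag}" for tag in ("p", "tab", "br")}


def docx_word_counts(docx):
    """Return {part name: rough word count} for the XML parts under word/.

    `docx` may be a file path or a binary file-like object.
    """
    counts = {}
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            pieces = []
            for elem in ET.fromstring(zf.read(name)).iter():
                if elem.tag in SEPARATORS:
                    pieces.append(" ")
                elif elem.tag == TEXT and elem.text:
                    # Runs are joined without spaces, so a word that the
                    # tag soup split across runs is still counted once.
                    pieces.append(elem.text)
            words = len("".join(pieces).split())
            if words:
                counts[name] = words
    return counts
```

If a large share of the words turns up in parts other than `word/document.xml` (headers, footnotes and so on), that may hint at where the untranslated text lives.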

Were you more successful using the ODT format (converted from DOCX) than with the DOCX files?


Harklas
Local time: 03:31
TOPIC STARTER
Honestly, Jun 16, 2010

I had run out of patience, energy and time so I didn't do a proper all-out translation with everything converted to .odt ... I did convert them though, and started a new OmegaT project using those files and had a look at what was on the screen, and it looked like a lot of text was still missing (but there were definitely lines in there that weren't before so it seemed to work better at least). In retrospect, I realise it could also be that it looked like about the same amount of text because making it .odt got rid of the extreme tag-soup I had with .docx, which sort of inflated the sheer size of the text mass on screen. I'll go back and investigate when I'm in less of a hurry than this week.

But do you mean that .docx being proprietary means no other software has the proper keys to it? Does that hold for proprietary CAT tools as well, or do Trados and similar companies sort of buy code from Microsoft so they can handle their files correctly?


Harklas
Local time: 03:31
TOPIC STARTER
I wish Jun 16, 2010

the people behind OpenOffice would somehow adopt this and merge it with their package. OOo Draw, for example, is not far behind Illustrator and makes OOo much more valuable than MS Office ... sorry if this was a bit OT


Samuel Murray  Identity Verified
Netherlands
Local time: 03:31
Member (2006)
English to Afrikaans
+ ...
How to compare project sizes Jun 16, 2010

Harklas wrote:
I did convert them though, and started a new OmegaT project using those files...


If you had added the DOCX project's TM (omegat1) to the /tm/ folder of the ODT project, you could have reused a lot of your translations. In fact, you could simply have added the ODT files to the /source/ folder of your first project, and the TM would have been reused.

In retrospect, I realise it could also be that it looked like about the same amount of text because making it .odt got rid of the extreme tag-soup I had with .docx, which sort of inflated the sheer size of the text mass on screen.


In the project folder is a statistics file that tells you exactly how many words of text OmegaT thought the project had in it. Compare the statistics file of the first project with that of the second project if you want to compare the two -- simply looking at what you can see on screen isn't a very accurate method of comparing projects.
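
The layout of the statistics file isn't described in this thread, so the sketch below takes a different route to a similar comparison: it counts the translation units and source words in a TMX file, which can be run on the TMX of each project. It only covers segments that were actually translated, so it is a supplement to the statistics file, not a replacement. The function name `tmx_summary` is hypothetical, and the first `<tuv>` in each `<tu>` is assumed to be the source segment.

```python
import xml.etree.ElementTree as ET


def tmx_summary(tmx):
    """Return (translation units, source words) for a TMX file.

    `tmx` may be a file path or a binary file-like object.
    Assumption: the first <tuv> of each <tu> holds the source text.
    """
    units = 0
    words = 0
    for tu in ET.parse(tmx).getroot().iter("tu"):
        units += 1
        seg = tu.find("tuv/seg")
        if seg is not None:
            # itertext() also picks up text around inline tag elements.
            words += len("".join(seg.itertext()).split())
    return units, words
```

Running it on the TMX from the DOCX project and on the one from the ODT project gives two pairs of numbers that are easier to compare than two screens full of segments.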

But do you mean that .docx being proprietary means no other software has the proper keys to it? Does that hold for proprietary CAT tools as well, or do Trados and similar companies sort of buy code from Microsoft so they can handle their files correctly?


No, I'm not referring to CAT tools that have paid a fee to Microsoft for information about the format. I'm talking about CAT tools that try to figure out things for free.



Samuel Murray  Identity Verified
Netherlands
Local time: 03:31
Member (2006)
English to Afrikaans
+ ...
Adopt what? Jun 16, 2010

Harklas wrote:
I wish the people behind OpenOffice would somehow adopt this and merge it with their package.


Adopt what?


Harklas
Local time: 03:31
TOPIC STARTER
Thanks for the tips! Jun 16, 2010

It'd indeed be better if one had more time for each project ...

Adopt OmegaT or some other CAT, I mean. So you could have CAT with OOo level of functionality. But I'm far from a programmer, so I might be talking out of my @$$



Dragomir Kovacevic  Identity Verified
Italy
Local time: 03:31
Italian to Serbian
+ ...
OOo level of functionality Jun 16, 2010

It is unclear what you mean by this "OOo level of functionality".

If you are thinking of a CAT tool that works within OOo, then you can try Anaphraseus, which is open source, like OmegaT.

If you mean the actual functionality of OOo: it is a very nice suite that many of us use. I use it much more than MS Office, but it is still under development.

First of all, you have to be patient when working with open-source applications, which depend much more on their users than paid ones do.

Dragomir



Vito Smolej
Germany
Local time: 03:31
Member (2004)
English to Slovenian
+ ...
simple checking question Jun 19, 2010

Both Samuel and Dragomir have assumed you don't have any tag errors.

Have you checked for that (you know, the tag validation window, i.e. Ctrl+T)? From my own experience, this is one of the most frequent causes of screwed-up target documents. I mean, do not feel bruised (g) if you have done it.

Regards

Vito


Harklas
Local time: 03:31
TOPIC STARTER
I must repent Jun 24, 2010

and give some credit back to OmegaT

Yesterday I got 5 .ppt files, and I didn't have time to go look for some other CAT tool, so I just took a shot.

This time, I was careful to, before doing anything else, open the .ppt:s in OOo and save them as Open Presentation. Then I started the OmegaT project.

What I saw this time (and which I probably missed last time) is that OmegaT presents the documents one at a time. To get to document #2, I had to go down to the last segment and click Enter there. I didn't find any other way to switch between documents.

Now, after translating it all in OmegaT, I made target documents, checked them, and apart from some normal stuff (my language being a bit longer, thereby messing with layout), it looked PERFECT. After brushing up the .odp:s in OOo, I saved them as .ppt.

It all worked perfectly.

So my lesson here is (which you have already tried to make me understand, sorry for being a bit obtuse): leave Microsoft as early as possible, and go back to Microsoft as late as possible.
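
For more than a handful of files, the "convert first, convert back last" routine could be batch-run instead of opening each file by hand. The sketch below is an assumption-heavy illustration: it relies on LibreOffice's `soffice` program being on the PATH and on its `--headless`, `--convert-to` and `--outdir` options, whereas the conversions in this thread were done manually in OOo. The function names and folder names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def convert_command(source: Path, target_format: str, out_dir: Path) -> List[str]:
    """Build a headless soffice command that converts one file."""
    return [
        "soffice", "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_all(folder: Path, pattern: str, target_format: str, out_dir: Path) -> None:
    """Convert every file in `folder` matching `pattern`; stop on the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(folder.glob(pattern)):
        subprocess.run(convert_command(source, target_format, out_dir), check=True)
```

A call such as `convert_all(Path("originals"), "*.ppt", "odp", Path("source"))` would cover the step before the OmegaT project is created, and the same call with `"*.odp"` and `"ppt"` would cover the step after the translated files have been checked.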

Now I suddenly feel OmegaT is as useful as some paid applications we bought, so I might stick around here for a while and keep bugging you



Didier Briel  Identity Verified
France
Local time: 03:31
Member (2007)
English to French
+ ...
Project/Project Files Jun 24, 2010

To get to document #2, I had to go down to the last segment and click Enter there. I didn't find any other way to switch between documents.

Simply use Project/Project Files, and navigate from the Project Files window.

Didier



Samuel Murray  Identity Verified
Netherlands
Local time: 03:31
Member (2006)
English to Afrikaans
+ ...
@Harklas Jun 24, 2010

Harklas wrote:
What I saw this time (and which I probably missed last time) is that OmegaT presents the documents one at a time. To get to document #2, I had to go down to the last segment and click Enter there. I didn't find any other way to switch between documents.


Press Ctrl+L to bring up the Project List. Click on the file you want to go to. Press Escape to dismiss the Project List.

So my lesson here is (which you have already tried to make me understand, sorry for being a bit obtuse): leave Microsoft as early as possible, and go back to Microsoft as late as possible.


For PowerPoint, I'd say, do as little post-formatting in OpenOffice Impress as possible. If your client wants a PowerPoint file, then do the translation in OmegaT, but convert the translation back to PowerPoint as soon as you can, and then do the corrections in PowerPoint. Sometimes you can fix something in OpenOffice that isn't broken in PowerPoint, so if you do most fixing in PowerPoint itself, you'll do the least amount of unnecessary re-formatting.


Harklas
Local time: 03:31
TOPIC STARTER
But if there is no MS Office on this computer Jun 24, 2010

I guess it's better to stick to .odp for as long as possible?
