Problems with project files and segmentation
Thread poster: Viivi

Local time: 06:32
Aug 17, 2010

Hello again!

I am a beginner with OmegaT, and I apologise for my lack of tech-savviness. Anyway, I have now managed to get the glossaries to work and did a short test translation, which worked just fine.

Now I have a doc-file with lots of text boxes, pictures and whatnot. I converted it to an odt-file in Open Office Writer, created a new project in OmegaT and this is roughly what I get:

< f0 >U< /f0 >< f1 >n< /f1>< f2>a< /f2>< f3>u< /f3>< f4>t< /f4>< f5>h< /f5>< f6>o< /f6>< f7>r< /f7>< f8>i< /f8>< f9>z< /f9>< f10>e< /f10>< f11>d < /f11>< f12>c< /f12>< f13>ha< /f13>< f14>ng< /f14>< f15>e< /f15>< f16>s < /f16>< f17>o< /f17>< f18>r < /f18>< f19>mo< /f19>< f20>d< /f20>< f21>ifi< /f21>< f22>c< /f22>< f23>a< f24>ti< /f24>< f25>o< /f25>< f26>n < /f26>< f27>t< /f27>< f28>o < /f28>< f29>t< /f29>< f30>h< /f30>< f31>i< /f31>< f32>s < /f32>< f33>s< /f33>< f34>ys< /f34>< f35>t< /f35> ...

(I had to add some spaces after the < so you can see what it looks like to me.)

That particular segment/sentence is supposed to be about unauthorized changes etc. It is obvious I cannot translate like this.

Does anybody have any ideas what is wrong with my documents/project?

I am using OmegaT 2.1.7_1. The source language is EN-US and the target language is Finnish. I have four doc-files, which I have converted into odt-files. I have not yet tried to add glossaries or translation memories to the project; I simply added the project files. I am assuming the source files are too fancy somehow...? I suspect they might even have originally been something other than Word documents.

Is there any way to translate these with the help of OmegaT?


esperantisto  Identity Verified
Local time: 06:32
Member (2006)
English to Russian
Nothing wrong with OmegaT, actually Aug 17, 2010

Is your file really .doc? The text looks typical of .docx (MS Word 2007), which is a really lousy format. If you can obtain the original .docx file, do so and try translating it with the latest build of OmegaT (1.8.0). Otherwise, if you have Microsoft Office, try exporting the file to RTF and converting it back to .doc, then to ODT.

Also search the OmegaT Yahoo! group; the topic of tag reduction has been discussed there.
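To make the problem concrete, here is a minimal Python sketch that counts the formatting placeholders in a segment and shows the text hidden between them. It is not part of OmegaT and is not how OmegaT handles tags; the function names `strip_format_tags` and `count_format_tags` are made up for this illustration, and the pattern tolerates the extra spaces that were added to the sample posted above.

```python
import re

# Matches OmegaT-style formatting placeholders such as <f0> or </f12>.
# Optional whitespace is allowed because the sample in this thread was
# posted with spaces inserted inside the tags.
TAG_PATTERN = re.compile(r"<\s*/?\s*f\d+\s*>")


def strip_format_tags(segment: str) -> str:
    """Return the segment with every <fN> and </fN> placeholder removed."""
    return TAG_PATTERN.sub("", segment)


def count_format_tags(segment: str) -> int:
    """Return how many opening and closing placeholders the segment has."""
    return len(TAG_PATTERN.findall(segment))


# First few tags of the segment quoted in the original question.
sample = "< f0 >U< /f0 >< f1 >n< /f1>< f2>a< /f2>< f3>u< /f3>< f4>t< /f4>< f5>h< /f5>"

print(strip_format_tags(sample))   # Unauth
print(count_format_tags(sample))   # 12
```

This is only a diagnostic: deleting the tags from the translation in OmegaT would also discard the formatting they stand for, so the real fix is still to clean up the source file before importing it.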


Susan Welsh  Identity Verified
United States
Local time: 23:32
Member (2008)
Russian to English
A couple of notes Aug 17, 2010

These things are known in the trade as "tag soup." Apart from what esperantisto wrote, let me add that you also get a lot of this junk if a document has been converted from a PDF, especially when there are lots of graphic elements, text boxes, etc., which yours has. Note that esperantisto, in referring to the build, left out the 2, which might confuse you.

If your document is not a .docx, but rather a conversion from PDF, and esperantisto's instructions don't help, you should ask the client for the original file. (This is one reason that people charge extra for translating from PDFs.) Even though I have ABBYY PDF Converter, which does a pretty good job, I have found that converting PDFs is no good for translating in a CAT tool, because of the tag soup. I usually strip it down to "text" (via Adobe Reader), translate it, and then reformat it. But something with as many graphics as yours has would be quite time consuming for me, given my level of expertise with Word/OOo Writer.
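For the ODT files in this thread, the same "strip it down to text" idea can be sketched in Python. This is a different technique from the Adobe Reader route described above, and it rests on assumptions: it relies only on the fact that an .odt file is a ZIP archive whose body text lives in `content.xml`, and the function name `odt_to_plain_text` and the file name in the usage comment are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for text elements in OpenDocument content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def odt_to_plain_text(odt_file) -> str:
    """Return the paragraph and heading text of an ODT file, one per line.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    lines = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() joins the text of all nested spans, so the
            # character-level formatting runs collapse into plain text.
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)


# Hypothetical usage:
# print(odt_to_plain_text("source.odt"))
```

The sketch is deliberately rough. Text inside a frame or text box anchored within a paragraph may appear twice, tabs and repeated spaces stored as separate elements are dropped, and all layout is lost, so the translation still has to be reformatted by hand afterwards.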

Good luck!


esperantisto  Identity Verified
Local time: 06:32
Member (2006)
English to Russian
The last resort Aug 17, 2010

If no fancy formatting is required, reset everything to the style default formatting (select text, press Ctrl+M in OOo Writer).

