Marc Baas Italy Local time: 21:45 Dutch to English + ...
Feb 18, 2011
Hi all,
I'm completely new to OmegaT, but not to CAT software.
Last night I downloaded the stable version to try it out and run some tests on a project I am currently working on. It is a complicated project, and I definitely need a tool to check what I have to translate and what I don't.
I noticed two things that are a bit confusing to me.
1) The segmentation that OmegaT does seems very, very fragmented to me. I always prefer to have segments of at least full sentences, because more often than not, due to different word orders in languages, one needs to know the full sentence to know how to translate it.
OmegaT, in my case, does not leave a single sentence intact (tried with several different files). Is there an 'easy' way (don't have much time to spare at the moment) in which I can tell OmegaT to leave my sentences the way they are and not fragment them?
2) The other thing I noticed is that commas disappear from my source text as a result of the segmentation, which (if I cannot manage to resolve this) would make the tool useless to me. Obviously it would take too much time to first edit with OmegaT, and then compare source and target texts manually to try to find the commas that are missing. That is, when first having to compare the OmegaT source text with the original one for missing punctuation.
I hope this is just a setting, but if it is a bug, I will have to keep an eye out for something else that does not change my source text when I load it into the tool.
Aside from these two fundamental issues, it does look like a very nice tool though.
One last point: did anyone try the latest version, and can you comment on the reliability of that one for practical use in translating?
Any comments are highly appreciated.
Marc
Didier Briel France Local time: 21:45 Member (2007) English to French + ...
Some things to check
Feb 18, 2011
Marc Baas wrote:
1) The segmentation that OmegaT does seems very, very fragmented to me. I always prefer to have segments of at least full sentences, because more often than not, due to different word orders in languages, one needs to know the full sentence to know how to translate it.
By default, OmegaT segmentation rules are quite conservative.
So, if your text is split more finely than at the logical ends of sentences, the cause could be:
- Your source text. If it contains hard returns in the middle of sentences, OmegaT will treat the text on either side of each return as a separate sentence. What is the format of your source text?
- Abbreviations. If your source text contains numerous abbreviations ending with '.', and these abbreviations are not in OmegaT's segmentation rules, you have to add them.
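To illustrate the abbreviation point, here is a minimal Python sketch of rule-based sentence splitting with an exception list. It is not OmegaT's code: the `segment` function, the `BREAK` pattern, and the `ABBREVIATIONS` list are all hypothetical names chosen for the example, and it assumes a break is simply sentence-ending punctuation followed by whitespace.

```python
import re

# Hypothetical exception list; a medical text would need many more entries.
ABBREVIATIONS = ["e.g.", "i.e.", "approx.", "Dr."]

# Candidate break: '.', '?' or '!' followed by at least one whitespace character.
BREAK = re.compile(r"[.?!]\s+")


def segment(text, abbreviations=ABBREVIATIONS):
    """Split text at candidate breaks, skipping breaks that follow an abbreviation."""
    segments = []
    start = 0
    for match in BREAK.finditer(text):
        # Text of the current segment up to and including the punctuation mark.
        before = text[start:match.start() + 1]
        if any(before.endswith(abbr) for abbr in abbreviations):
            continue  # exception rule: do not break after a known abbreviation
        segments.append(before)
        start = match.end()  # the whitespace between sentences is dropped
    tail = text[start:]
    if tail:
        segments.append(tail)
    return segments


sample = "Give approx. 5 mg daily. See Dr. Smith, i.e. the lead author."
print(segment(sample))                    # with the exception list
print(segment(sample, abbreviations=[]))  # without it: far more fragments
```

With the exception list the sample stays as two full sentences; with an empty list every abbreviation becomes a break, which is the kind of fragmentation described above.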
OmegaT, in my case, does not leave a single sentence intact (tried with several different files). Is there an 'easy' way (don't have much time to spare at the moment) in which I can tell OmegaT to leave my sentences the way they are and not fragment them?
Yes, it's called paragraph mode. In Project > Properties, uncheck the Enable Sentence-Level Segmenting box.
If, after that, there is still some segmentation, it comes from your source text.
2) The other thing I noticed is that commas disappear from my source text as a result of the segmentation,
By design, OmegaT's segmentation doesn't remove text; it only splits it.
The only text removed from the Editor (but *not* from the target text) is the whitespace between sentences.
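If you want to confirm this for your own document, a quick check is to compare the original text with the segments after ignoring whitespace. The following is only a sketch: `segments_preserve_text` is a hypothetical helper, and it assumes you have the source text and the list of segments available as plain strings (for example, copied out by hand).

```python
import re


def strip_whitespace(text):
    """Remove all whitespace so that only the visible characters are compared."""
    return re.sub(r"\s+", "", text)


def segments_preserve_text(source, segments):
    """Return True if the segments contain exactly the source text, whitespace aside."""
    return strip_whitespace(source) == strip_whitespace("".join(segments))


source = "First, mix the solution. Then, wait ten minutes."
good = ["First, mix the solution.", "Then, wait ten minutes."]
bad = ["First mix the solution.", "Then, wait ten minutes."]  # comma lost

print(segments_preserve_text(source, good))
print(segments_preserve_text(source, bad))
```

A lost comma, or any other dropped character, makes the comparison fail, whereas splitting alone does not.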
which (if I cannot manage to resolve this) would make the tool useless to me. Obviously it would take too much time to first edit with OmegaT, and then compare source and target texts manually to try to find the commas that are missing. That is, when first having to compare the OmegaT source text with the original one for missing punctuation.
I hope this is just a setting, but if it is a bug, I will have to keep an eye out for something else that does not change my source text when I load it into the tool.
I've never heard of such a behaviour (removing commas, or any other text, for that matter).
Again, what is the format of your source documents?
One last point: did anyone try the latest version, and can you comment on the reliability of that one for practical use in translating?
Most people experienced with OmegaT use the "latest" version, which is generally at least as stable as the "standard" one.
Didier
Marc Baas Italy Local time: 21:45 Dutch to English + ...
TOPIC STARTER
Thanks
Feb 18, 2011
Thanks Didier for the very elaborate and detailed explanation. I very much appreciate your effort.
My source text is an MS Office document, which it seems to read fine, both the 2007 format and the older one. I tried to see if there are differences between that one and the .odt format (OpenOffice) but could not see any differences in the tool itself.
So for that matter I was very pleasantly surprised by how easily it reads these formats (as opposed to some of the commercial tools around).
I will definitely follow your advice and check things there. Like I said before, I really lack the time at the moment to dive deep into the software. I just urgently needed a tool that did some checking for me. Which, by the way, it did perfectly.
As you point out, the source text could well be the cause. I'm working on a medical document that is literally crammed with abbreviations, formulas and such. So it could be that this is the main culprit.
Thanks again, and I will also download the latest version and install that one, so I have the most current version.
Marc
Samuel Murray Netherlands Local time: 21:45 Member (2006) English to Afrikaans + ...
My segadder script
Feb 18, 2011
Didier Briel wrote:
Marc Baas wrote:
1) The segmentation that OmegaT does seems very, very fragmented to me.
So, if your text is split more finely than at the logical ends of sentences, the cause could be:
- Abbreviations. If your source text contains numerous abbreviations ending with '.', and these abbreviations are not in OmegaT's segmentation rules, you have to add them.
If you find that you need to add abbreviations to the segmentation rules quickly, and you're using MS Windows, you can use my segadder script, which somewhat automates the process of adding abbreviations to the segmentation rules:
Didier Briel France Local time: 21:45 Member (2007) English to French + ...
Formulas, drawings, etc., can break the text
Feb 18, 2011
Marc Baas wrote:
My source text is an MS Office document, which it seems to read fine, both the 2007 format and the older one.
OmegaT can only read the "2007" (i.e., .docx) format, not the legacy (i.e., .doc) one.
As you point out, the source text could well be the cause. I'm working on a medical document that is literally crammed with abbreviations, formulas and such. So it could be that this is the main culprit.
Yes, if there are "things" in the middle of the text, they might well break it.
Didier