OmegaT segmentation
Thread poster: Alain Alameddine

Alain Alameddine  Identity Verified
Lebanon
Local time: 08:53
Member (2009)
English to French
Jun 24, 2009

I'm translating doctor/patient conversations with a lot of repetition, and I'm trying to use OmegaT for the first time.

The sentences do not end with a full stop, so I end up with segments made of hundreds of sentences. Example:

"so in the meantime am i gonna be using any sort of cream or anything
or it just gonna be pills
we can give you a cream
something over-the-counter is fine
we wanna attack this as i say with a pill
but for your relief of the burning and that sensation of being uncomfortable we can use something like a blistex
i can give you something over-the-counter but it is basically the same thing"
(and much more)

All of this is considered to be only one segment. Do you know how I can "tell" OmegaT to segment them differently, say, at each "Enter"?


 

Samuel Murray  Identity Verified
Netherlands
Local time: 07:53
Member (2006)
English to Afrikaans
I assume you're translating TXT files Jun 24, 2009

Alain Alameddine wrote:
All of this is considered to be only one segment. Do you know how I can "tell" OmegaT to segment them differently, say, at each "Enter"?


1. Go to Options > File Filters...
2. Select "Text Files" and click Options...
3. You'll see an options dialog with this information in it:

Segment source text into paragraphs on:
* Line breaks
* Empty lines
* Never

4. Select the first option.
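To make the difference between the three options concrete, here is a rough Python sketch of what each setting does to a transcript like the one above. It only illustrates the idea and is not OmegaT's code; the mode names are made up for the example. With no full stops and no empty lines, only the "Line breaks" behaviour yields one paragraph per utterance.

```python
import re


def split_paragraphs(text: str, mode: str) -> list[str]:
    """Illustrative sketch of the three paragraph-segmentation options.

    mode is one of the hypothetical names "line_breaks", "empty_lines", "never".
    """
    if mode == "line_breaks":
        # Every line break ends a paragraph.
        parts = text.split("\n")
    elif mode == "empty_lines":
        # Only one or more blank lines end a paragraph.
        parts = re.split(r"\n\s*\n", text)
    elif mode == "never":
        # The whole text is a single paragraph.
        parts = [text]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Drop empty pieces and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


transcript = (
    "so in the meantime am i gonna be using any sort of cream or anything\n"
    "or it just gonna be pills\n"
    "we can give you a cream\n"
)

print(len(split_paragraphs(transcript, "line_breaks")))  # 3
print(len(split_paragraphs(transcript, "empty_lines")))  # 1
```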

Of course, if you do have a single sentence spanning more than one line (with a line break in the middle), it'll come out in OmegaT as two segments, if you select the above option. Keep in mind that you can edit your source text in the /source/ folder at any time and just reload (F5) the project, in case you want to correct a mis-segmentation manually.

Oh, and unfortunately any change to the segmentation settings affects the entire project, so you can't enable and disable the above option for parts of your source file. If your source file mixes both types of segmentation, then I suggest pre-editing the source file or splitting it into consistent sections.
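If you go the pre-editing route, the edit can be scripted. The Python sketch below assumes a file whose prose paragraphs are already separated by empty lines and which also contains a block of one-utterance-per-line dialogue; it inserts empty lines inside that block so the "Empty lines" option can be used for the whole file. The function name and the idea of passing the block's line numbers by hand are illustrative choices, not an OmegaT feature.

```python
def double_space_range(text: str, start: int, end: int) -> str:
    """Put an empty line between lines start..end-1 (0-based) of text.

    Hypothetical helper for one-utterance-per-line sections, so that the
    whole file can be segmented with the "Empty lines" option. Lines
    outside the range are left untouched.
    """
    lines = text.split("\n")
    # Keep only the non-empty lines of the chosen section.
    section = [ln for ln in lines[start:end] if ln.strip()]
    spaced = "\n\n".join(section)
    return "\n".join(lines[:start] + [spaced] + lines[end:])


mixed = (
    "This introduction is one paragraph that\n"
    "wraps over two lines.\n"
    "\n"
    "we can give you a cream\n"
    "something over-the-counter is fine\n"
    "we wanna attack this as i say with a pill\n"
)

fixed = double_space_range(mixed, 3, 6)
# Paragraphs as the "Empty lines" option would see them.
print(fixed.strip().split("\n\n"))
```

In practice you would read the file from /source/, write the result back (keeping a copy of the original), and reload the project with F5.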

[Edited at 2009-06-24 22:10 GMT]


 

Alain Alameddine  Identity Verified
Lebanon
Local time: 08:53
Member (2009)
English to French
TOPIC STARTER
Thanks! Jun 24, 2009

Alright, great, exactly what I was looking for :)

I certainly have a lot more to learn, but is it really that helpful? I mean it finds repetitive terms and suggests matches, but those are always partial (a few similar words in a different sentence). So the time I take to choose the match and delete additional words is equal/superior to the time it would've taken me to actually type the correct word. Right?

Or has "adding terms to the glossary" (which I don't know how to do) got anything to do with that?


 

Susan Welsh  Identity Verified
United States
Local time: 01:53
Member (2008)
Russian to English
Why CATs are useful / OmegaT in particular Jun 24, 2009

Alain Alameddine wrote:

Is it really that helpful? I mean it finds repetitive terms and suggests matches, but those are always partial (a few similar words in a different sentence). So the time I take to choose the match and delete additional words is equal/superior to the time it would've taken me to actually type the correct word. Right?

Or has "adding terms to the glossary" got anything to do with that?


I have never really used any other CAT tool except OmegaT, and am by no means the most experienced user. So take this for whatever it's worth:

If the issue is typing speed vs. inserting a fuzzy match, then I would say that is not the purpose of CAT tools, unless you have large chunks of copy that are identical to what's in the TM. The purpose is not to improve typing speed, but to facilitate translation and make it more consistent. If I don't remember how I translated "shoo-iss-mu" 100 pages ago, I will be reminded, either by the TM or by the glossary--if I had put it in the glossary and inflection does not get in the way. (Forgive me for showing off one of my 20 or so words of Arabic; I was born and spent my childhood in Beirut, and get nostalgic when I see someone from Lebanon on Proz.)

You don't get 100% matches very often in any CAT tool.

As for the glossary: it's not one of OmegaT's best features, because it's rather labor-intensive to "feed" it. Some other CAT tools do better. But it is improving in OmegaT, and development work is being done on it. A script called "tokenizers" has been provided which allows the program to recognize inflected variants of a word, which previously it could not do. (Unfortunately, for reasons I never figured out, I could only get that to work for Russian, never for German. If you're tech-literate, I'm sure you can do it.)
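Why inflection gets in the way of glossary matching can be shown with a toy example. The Python sketch below compares exact word matching with matching after a very crude suffix-stripping step (naive_stem is a hypothetical helper). It only illustrates the idea; real language-specific tokenizers are far more sophisticated, and this sketch makes no attempt to reproduce them.

```python
import re


def naive_stem(word: str) -> str:
    """Strip a few common English suffixes (deliberately crude)."""
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three letters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_in_segment(term: str, segment: str, stem: bool = False) -> bool:
    """Return True if the single-word glossary term occurs in the segment."""
    words = re.findall(r"[\w'-]+", segment.lower())
    if stem:
        # Compare stems, so "pill" also finds "pills".
        return naive_stem(term.lower()) in {naive_stem(w) for w in words}
    return term.lower() in words


segment = "or it just gonna be pills"
print(term_in_segment("pill", segment))             # False
print(term_in_segment("pill", segment, stem=True))  # True
```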

I do not translate highly repetitive material, fortunately or unfortunately. For me, the benefits of a CAT are mainly 1) there's almost no danger of dropping copy, because target and source segments are right there, side by side; 2) the formatting of the target document is easily replicated if the source is supplied in a format that the CAT tool supports, or can easily be converted into one. OmegaT copes easily with web pages, for example, including graphics and all, which I understand is not necessarily the case for some other CATs.

Good luck,
Susan


[Edited at 2009-06-25 10:30 GMT]


 

Alain Alameddine  Identity Verified
Lebanon
Local time: 08:53
Member (2009)
English to French
TOPIC STARTER
Alright, thanks! Jun 24, 2009

And lol @ "shou iss mou" :)

 

Samuel Murray  Identity Verified
Netherlands
Local time: 07:53
Member (2006)
English to Afrikaans
Two answers Jun 25, 2009

Alain Alameddine wrote:
...but is it really that helpful? I mean it finds repetitive terms and suggests matches, but those are always partial (a few similar words in a different sentence). So the time I take to choose the match and delete additional words is equal/superior to the time it would've taken me to actually type the correct word.


It depends on the length of the sentence and the percentage of overlap (called the match percentage). You can't set OmegaT to display only matches above a certain percentage, so unfortunately the top 5 matches are always displayed, but you can see the match percentage just below each match. If I were you, for short sentences, I'd ignore all matches below 80% or even 90%. Also, fuzzy matching is more useful for long sentences than for short ones.
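To get a feel for how match percentages behave, the Python sketch below scores translation memory entries by word overlap and drops anything under a threshold. It uses difflib.SequenceMatcher from the standard library as a stand-in; OmegaT calculates its own percentages differently, so the numbers will not be the ones OmegaT displays. The helper names are hypothetical.

```python
from difflib import SequenceMatcher


def match_percentage(source: str, tm_source: str) -> int:
    """Word-level similarity between two sentences, as a whole percentage."""
    a, b = source.lower().split(), tm_source.lower().split()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def useful_matches(source: str, tm_sources: list[str], threshold: int = 80):
    """Return (percentage, TM source) pairs at or above threshold, best first."""
    scored = [(match_percentage(source, s), s) for s in tm_sources]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)


tm = [
    "we can give you a pill",
    "something over-the-counter is fine",
    "we can give you a cream",
]
for pct, text in useful_matches("we can give you a cream", tm):
    print(f"{pct}%  {text}")
```

In a six-word sentence, changing a single word already pulls this score down to 83%, which is one reason a fixed cut-off matters more for short segments.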

One advantage of using a CAT tool is that you can search your previous translations for phrases and words to see how you translated them before. In OmegaT, you press Ctrl+F (with or without selecting a word first).
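The same kind of search can also be run outside OmegaT, directly on a translation memory file. OmegaT stores translations in TMX, an XML format; the Python sketch below searches the translation units of a TMX document for a phrase in either language. The sample memory, the search_tmx name and the en/fr language codes are assumptions for the example; in a real project you would read the TMX file from the project folder (typically omegat/project_save.tmx) instead of using a string.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def search_tmx(tmx_text: str, phrase: str, src_lang: str = "en", tgt_lang: str = "fr"):
    """Return (source, target) pairs where either side contains phrase."""
    root = ET.fromstring(tmx_text)
    needle = phrase.lower()
    hits = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang.split("-")[0]] = "".join(seg.itertext())
        src, tgt = texts.get(src_lang), texts.get(tgt_lang)
        if src and tgt and (needle in src.lower() or needle in tgt.lower()):
            hits.append((src, tgt))
    return hits


SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN-US" adminlang="EN-US" datatype="plaintext"
          segtype="sentence" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="EN-US"><seg>we can give you a cream</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>nous pouvons vous donner une crème</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN-US"><seg>or it just gonna be pills</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>ou ce sera juste des comprimés</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(search_tmx(SAMPLE_TMX, "cream"))
```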

Or has "adding terms to the glossary" (which I don't know how to do) got anything to do with that?


There is no interaction between the glossary and TM matches in OmegaT. The glossary feature helps remind you that your client wanted you to translate certain words in certain ways. You don't create a glossary from within OmegaT, though -- you have to put a text file in the /glossary/ folder and edit it in a text editor. The User Manual explains glossaries in more detail.
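For reference, OmegaT glossary files are plain text with one entry per line: the source term and the target term separated by a tab, optionally followed by a comment column. Check the User Manual for the file extension and encoding your version expects. The Python sketch below parses such text and lists the entries that occur in a segment, roughly what the glossary pane shows. The sample terms and function names are made up for the example, and the matching is exact whole-word matching, so inflected forms are missed.

```python
import csv
import io
import re


def load_glossary(tsv_text: str) -> list[tuple[str, str, str]]:
    """Parse tab-separated lines: source term, target term, optional comment."""
    entries = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        comment = row[2].strip() if len(row) > 2 else ""
        entries.append((row[0].strip(), row[1].strip(), comment))
    return entries


def glossary_hits(segment: str, entries) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose source term appears as whole words."""
    text = segment.lower()
    return [
        (src, tgt)
        for src, tgt, _ in entries
        if re.search(r"\b" + re.escape(src.lower()) + r"\b", text)
    ]


GLOSSARY = "cream\tcrème\nover-the-counter\tsans ordonnance\tmedication\n"
entries = load_glossary(GLOSSARY)
print(glossary_hits("something over-the-counter is fine", entries))
```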

In OmegaT, you can insert TM matches with a shortcut, but not glossary matches.


 

Alain Alameddine  Identity Verified
Lebanon
Local time: 08:53
Member (2009)
English to French
TOPIC STARTER
Less helpful than I thought Jun 26, 2009

But thanks, Samuel! :)

 