OmegaT Segmentation
Thread poster: Jeremy Fuller

Jeremy Fuller
Japanese to English
Feb 25, 2008


I am just getting started in the translation field and I have been testing some CAT programs lately.

I am currently using OmegaT 1.7, and I would like to make two translation segments into one.

I can't find an option in OmegaT that allows me to do this. Is there some sort of tag that I can add to the text in the target file that specifies segmentation points?

For example, is there something like a tag that I can use?

The text is 2 sentences in Japanese (OmegaT makes the sentences separate segments). However, I want to make the English translation 1 sentence. So, I would like the segmentation to match.


Not possible Feb 25, 2008

Jeremy Fuller wrote:


I am just getting started in the translation field and I have been testing some CAT programs lately.

I am currently using OmegaT 1.7, and I would like to make two translation segments into one.

As yet, there is no such segment split/merge function in OmegaT.

You can, of course, make two sentences out of one within a segment, or spread a single sentence over two segments:

Source: This is a segment and it has one sentence.
Target: This is a segment. It originally had one sentence.

Source: This is a segment.
Target: This is a segment and

Source: It has two sentences.
Target: it originally had two sentences.

For languages with similar syntax, this works on most occasions, and even when it doesn't, it can still be done; it simply results in more or less nonsensical TMs for the segments concerned (but the target text is perfectly serviceable).

Another thing you can do is to edit the source text: insert or remove the full stops where you wish to change the segmentation behaviour.

Although a segment split/merge function would certainly be desirable, it's quite possible in practice to live without it and still achieve the desired result. If you find that you frequently make changes to the sentence structure, though, you might consider using paragraph rather than sentence segmentation.



Not easily done, but it is possible, sometimes Feb 25, 2008

Jeremy Fuller wrote:
I am currently using OmegaT 1.7, and I would like to make two translation segments into one.

If the two sentences are in two different paragraphs, then there is no way you can do it without actually editing the source document.

If the two sentences are in the same paragraph, you can do it, but it is a bit daunting if you haven't done it before. What you must do is add the first sentence to the segmentation rules as an exception. Often you can just add the two or three words closest to the sentence break, but it may be safer to add the first sentence in its entirety.

So, go to Options > Segmentation > Add, and add a dummy language with the language code ".*". Then move that language to the top, and click Add in the bottom set of options to add a rule.

The "break/exception" tickbox should not be ticked. The "pattern before" item should contain the first sentence, with spaces changed to "\s". The "pattern after" can be just "\s" (though I'm not sure whether that works for Japanese).

So, if your text is this: "This is a dog. This is a cat.", then your segmentation rules will be this:

(no tick) | This\sis\sa\sdog\. | \s

Take a look also at how the abbreviations in the English rules are done.
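To make the logic of such an exception rule easier to follow, here is a minimal Python sketch of rule-based segmentation. It is not OmegaT's actual code: the `segment` function, the rule tuples, and the "first matching rule wins" behaviour are assumptions made for illustration, and Python's `re` module stands in for OmegaT's regular expressions.

```python
import re

def segment(text, rules):
    """Split text wherever the first matching rule is a break rule.

    Each rule is (is_break, pattern_before, pattern_after). This is a
    simplified model for illustration, not OmegaT's implementation.
    """
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            # "before" must match the text ending at pos,
            # "after" must match the text starting at pos.
            if re.search(f"(?:{before})$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides; later rules are ignored
    segments.append(text[start:].strip())
    return [s for s in segments if s]

# A generic break rule: full stop before, whitespace after.
DEFAULT_RULES = [(True, r"\.", r"\s")]
# The exception from the example above: (no tick) | This\sis\sa\sdog\. | \s
EXCEPTION = (False, r"This\sis\sa\sdog\.", r"\s")

text = "This is a dog. This is a cat. This is a bird."
print(segment(text, DEFAULT_RULES))
# ['This is a dog.', 'This is a cat.', 'This is a bird.']
print(segment(text, [EXCEPTION] + DEFAULT_RULES))
# ['This is a dog. This is a cat.', 'This is a bird.']
```

In this model the exception only has an effect because it is listed before the generic break rule, which is the reason for moving the dummy language to the top.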

[Edited at 2008-02-25 11:03]


Non-breakable space is your friend Feb 25, 2008

I'm too lazy to monkey around with the segmentation rules for just one or two segments. I simply open the source file (I work mostly with and glue two pieces of text with a non-breakable space (Ctrl+Space, if you did not know that). Then I reload the project and enjoy.
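A rough Python sketch of why this can work, assuming the break rule looks for "\s" after the full stop and that "\s" covers only ASCII whitespace (as it does by default in Java regular expressions). The `re.ASCII` flag is used here to imitate that assumption, and the helper name is hypothetical.

```python
import re

NBSP = "\u00a0"  # non-breakable space

# "Pattern after" of a typical break rule, restricted to ASCII whitespace.
after_rule = re.compile(r"\s", re.ASCII)

def breaks_after_first_sentence(text):
    """Return True if the break rule would fire after the first full stop."""
    pos = text.index(".") + 1
    return bool(after_rule.match(text, pos))

normal = "This is a dog. This is a cat."
glued = "This is a dog." + NBSP + "This is a cat."

print(breaks_after_first_sentence(normal))  # True
print(breaks_after_first_sentence(glued))   # False
```

If the regular expression engine treats the non-breakable space as whitespace, the trick would not help, so this depends on the engine's definition of "\s".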


Jeremy Fuller
Japanese to English
problem with the period? Feb 27, 2008

Thanks to everyone for their replies.

I tried using Samuel's trick, but I can't seem to make it work. Not sure if this is because I am using the trick on a Japanese sentence or if I am just doing it wrong.

Or, it is possible that there is a problem with the "\" symbol. On Japanese operating systems it tends to change into the yen mark depending on the program and the computer's mood.

The force break didn't seem to work either, although I am glad to have that information.

In the end, I erased the Japanese period at the end of the first sentence and replaced it with an English period.

Then my two sentences became one segment in OmegaT.
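For anyone trying the same thing, here is a small Python sketch of two possible reasons an English-style rule might not match Japanese text. This is only a guess at the cause, not a confirmed diagnosis, and Python's `re` module stands in for OmegaT's regular expressions. The sample sentences are made up.

```python
import re

jp = "これは犬です。これは猫です。"
pos = jp.index("。") + 1  # position just after the first Japanese full stop

# English-style patterns: escaped ASCII full stop before, whitespace after.
ascii_before = re.compile(r"\.$")
ascii_after = re.compile(r"\s")
print(bool(ascii_before.search(jp[:pos])))  # False: "。" is not "."
print(bool(ascii_after.match(jp[pos:])))    # False: no space follows "。"

# Patterns written for the Japanese text: the sentence itself, nothing required after.
jp_before = re.compile("これは犬です。$")
jp_after = re.compile("")
print(bool(jp_before.search(jp[:pos])))     # True
print(bool(jp_after.match(jp[pos:])))       # True

# A backslash shown as a yen glyph is still U+005C; a real yen sign is U+00A5.
print(hex(ord("\\")), hex(ord("¥")))        # 0x5c 0xa5
```

If the yen mark on screen is only the font's rendering of a backslash, the rule should still behave as a backslash escape; if an actual yen sign (U+00A5) was typed, it would not.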



