New interactive aligner in OmegaT
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Local time: 07:08
Member (2006)
English to Afrikaans
+ ...
Sep 9, 2016

OmegaT Aligner

There is now a file aligner built into OmegaT. It will align two files of any type that OmegaT has a filter for, and use OmegaT's segmentation rules based on selected languages of the files.

The aligner has visual assistance features that most other aligners don't have, such as pattern highlighting (e.g. highlight all numbers, or highlight certain words) and marking individual segment pairs as "accepted" (green) or "needs review" (red).

All features can be used with the mouse, and most features can also be used with quick keyboard shortcuts. Segments can also be moved by dragging them with the mouse.

If you don't want all aligned segment pairs to be added to the exported TMX file, simply untick the ones you don't want (e.g. segment pairs with the same source and target text are automatically unticked, but you can tick them again).

Pinpoint alignment allows you to bring two cells into alignment even if they are very far from each other -- just press space and click with the mouse. The "realign-pending" feature allows you to perform automatic re-alignment at any time, on any row that you haven't marked as accepted yet (without affecting rows that are marked as accepted).

1. In OmegaT go to Tools > Align Files. The aligner will ask you to select your source and target files, and fill in their language codes. Then click OK.





2. Then, in what is labelled "Step 1: Adjust alignment parameters", you can choose whether to have the files segmented by sentence or paragraph, and whether any tags should be removed from the files.



You can also access OmegaT's segmentation rules from this screen, and adjust the segmentation rules, which will instantly be reflected in the aligner. For example, if you notice that misalignments are caused by a certain abbreviation, you can add that abbreviation to OmegaT's segmentation rules, and the aligner will refresh itself according to the new segmentation rule.

Some file types in OmegaT have additional options, which you can also access from this screen. For example, if you're aligning HTML files, but you don't want URLs to be included in the alignment, you can deselect that option in OmegaT's file filter settings directly from this dialog, and the aligner will refresh itself according to the new filter setting.

There are also some extra options for hardcore alignment fans, such as being able to select the comparison mode, algorithm, calculator and counter, but for most of us, these options can be left in peace.

To segment by paragraph, deselect the "Segment" tick box. To adjust the segmentation rules, click the "Rules" button.

No individual segments can be moved, merged, split or edited during step 1. You can't go back to step 1 from step 2, so make sure you are satisfied with your settings at step 1 before you move to step 2. When you're ready, move to step 2 by clicking "Continue".

3. Then, in what is labelled "Step 2: Make manual corrections", you can manually fix the alignment.



In both step 1 and 2 you can specify a highlight pattern as a regular expression. For example, if you're translating a file about Trados, and you want the word "Trados" as well as all numbers highlighted, simply add [0-9]|Trados to the pattern dialog. This makes it a lot easier to visually see which segments are out of alignment.
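
To get a feel for what that pattern matches before typing it into the aligner, here is a rough sketch in Python. It is only a stand-in: OmegaT itself uses Java regular expressions, and the `highlight` helper and the sample sentence are made up for illustration. For a pattern this simple, the two regex flavours behave the same way.

```python
import re

# The pattern from the post: any single digit, or the literal word "Trados".
PATTERN = re.compile(r"[0-9]|Trados")


def highlight(text: str) -> str:
    """Wrap every match in square brackets to mimic the visual highlighting."""
    return PATTERN.sub(lambda m: f"[{m.group(0)}]", text)


# Hypothetical sample sentence, not taken from any real file.
print(highlight("Trados 2015 was released in 2014."))
# prints: [Trados] [2][0][1][5] was released in [2][0][1][4].
```

Note that [0-9] matches one digit at a time, so each digit is highlighted separately. That makes no visible difference when adjacent digits are all coloured, but [0-9]+ would treat each whole number as a single match.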

The aligner will automatically mark some segment pairs as accepted (green), and will automatically mark some segment pairs as "keep" (ticked).

The aligner shows alternating segment pairs in alternating shades of grey. It is important to realise that individual rows do not represent individual segment pairs -- if multiple consecutive rows all have the same shade of grey, then they are all in one segment pair.

Use the arrow keys to move between cells. Use Shift+arrow to select multiple cells. You can also move between cells using Enter and Shift+Enter (moves column-wise) and Tab and Shift+Tab (moves row-wise).

To the right of the aligner, there are five buttons: move cell up, move cell down, split cell, merge cells, and edit cell. If any action is not possible with the selected cell or cells, the button for that action is greyed out. Generally, moving a cell or merging/splitting it will affect only that cell and immediately surrounding cells, without affecting the rest of the cells. There is no function to insert or delete cells, and there is no "undo" function.

If you use "merge" while only one cell is selected, the aligner will merge that cell with the one below it. If you select multiple cells and use "split", the aligner will attempt to split the text smartly. If you use "split" when only one cell is selected, a dialog will open up in which you can place your cursor at the point that you want the text to be split.

The keyboard shortcuts for these functions are:
 merge = M
 split = S
 edit = E
 move up = U
 move down = D

Segment pairs can be marked as "Accepted" (green) or "Needs Review" (red). The aligner assumes that if a segment pair is marked as "Accepted", then you consider that segment pair to be correctly aligned. This is important for the "realign pending" feature, which attempts to re-align all non-accepted segments.

The keyboard shortcuts for these functions are:
 mark as accepted = A
 mark as needs-review = R
 clear accept/review mark = C
 "realign pending" = Ctrl+R

At the end of the alignment process, only segment pairs marked as "keep" will be exported to the TMX file. You can change the keep/not-keep status of any segment pair by pressing the keyboard shortcut "K".
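
The interplay between the accepted/needs-review marks and the keep tick can be sketched as a small data model. This is only an illustration of the behaviour described in this post, not OmegaT's actual code: the names `Status`, `SegmentPair`, `new_pair`, `realign_candidates` and `pairs_to_export` are all hypothetical, and so is the sample text.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNMARKED = "unmarked"          # no mark (what "C" resets a pair to)
    NEEDS_REVIEW = "needs review"  # red
    ACCEPTED = "accepted"          # green


@dataclass
class SegmentPair:
    source: str
    target: str
    status: Status = Status.UNMARKED
    keep: bool = True


def new_pair(source: str, target: str) -> SegmentPair:
    """Pairs with identical source and target start out unticked."""
    return SegmentPair(source, target, keep=(source != target))


def realign_candidates(pairs: list[SegmentPair]) -> list[SegmentPair]:
    """'Realign pending' leaves accepted pairs alone and touches the rest."""
    return [p for p in pairs if p.status is not Status.ACCEPTED]


def pairs_to_export(pairs: list[SegmentPair]) -> list[SegmentPair]:
    """Only pairs ticked as 'keep' end up in the TMX file."""
    return [p for p in pairs if p.keep]


# Hypothetical sample data.
pairs = [new_pair("Hello", "Hallo"), new_pair("2016", "2016")]
pairs[0].status = Status.ACCEPTED
print([p.source for p in realign_candidates(pairs)])
print([p.source for p in pairs_to_export(pairs)])
```

The point of the sketch is that the two flags are independent: the accepted mark controls what "realign pending" may touch, while the keep tick controls what gets exported.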



If you see two segments that should be aligned to each other, but they are so far from each other that using "move up" or "move down" would be bothersome, you can use pinpoint alignment. To do this, click the one segment in the one column, press Space, and click the other segment in the other column. The aligner will then align those two segments into a segment pair and mark it as accepted (green).

If you press the Reset button, all your hard work will be lost. The Save TMX button is for saving all "keep" segment pairs to a TMX file. You can generate a new TMX file as many times as you want.

If you made changes to the file filters and/or segmentation rules during the alignment, you will be asked at the very end of the alignment process whether you would like to save those changes. The default reply is "yes", so take care to select the option you actually want when exiting the aligner.

The suggested file name for the exported TMX file contains the names of the two files that were aligned, so if you want the file name to mention language codes, remember to add them when saving the TMX file.

Notes:

It is not possible to save your alignment. You can only export your alignment as a TMX file. There is no functionality for stopping halfway and then continuing the next day.

The accepted (green) and "keep" (ticked) segment pairs are not protected from accidental re-misalignment, so be careful what you select before you press a button. In fact, some non-keep segment pairs may end up being marked as "keep" (ticked) if you perform merge/split operations on other segments that have been marked as "keep".

The aligner uses the segmentation settings in OmegaT for the languages that you select at the start of the alignment, and it uses those same language codes when creating the TMX file. Therefore, when selecting the languages of the files, make sure you select language codes that correspond to the segmentation rules that you want to use.

It is sometimes a good idea to use "realign pending" after you've used pinpoint alignment, because pinpoint alignment does not automatically try to re-align all segments that were affected by the pinpoint alignment.


 

Dan Lucas  Identity Verified
United Kingdom
Local time: 06:08
Member (2014)
Japanese to English
Intriguing Sep 9, 2016

Samuel Murray wrote:
There is now a file aligner built into OmegaT. It will align two files of any type that OmegaT has a filter for, and use OmegaT's segmentation rules based on selected languages of the files.

Thank you for this detailed post Samuel. I have tried AlignFactory Light (generally solid, but expensive and doesn't feel like it has been updated for years), the SDL Studio 2015 aligner (so annoying it had me tearing my hair out) and MemoQ's aligner (less infuriating than that of SDL but less competent than AlignFactory) and none of them suited me. Maybe OmegaT should be next on the list.

I have been feeling well-disposed toward you since you recommended Agent Ransack, which I didn't think would be much use (since I also use dtSearch and Directory Opus) but I looked at it out of curiosity and found that it is in fact an excellent utility.

Dan


 

Samuel Murray  Identity Verified
Netherlands
Local time: 07:08
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Dan Sep 9, 2016

Dan Lucas wrote:
I have been feeling well-disposed toward you since you recommended Agent Ransack...


Well, my favourite aligner is still PlusTools. It has unfixed bugs but it deals well with poorly matched files.

OmegaT's built-in aligner is really only suitable for pairs of files where the one is a translation of the same version of the other. In other words, it's not really suitable if the one file contains a lot of extra text or omits a lot of text or has different text or has some of its text in a different order. This is due to the lack of "insert" and "delete" and the inability to move blocks of text across other blocks of text. The inability to save your alignment halfway and continue later also limits the types of documents that can be aligned with it.

On the other hand, merging/splitting/moving a single segment only affects that particular segment and has no effect on previous or subsequent segments. You can also select multiple segments and move them up/down as a single block, but you can't move them very far.


 

Michael Immoff Arsenault  Identity Verified
Canada
Local time: 01:08
Member
French to English
+ ...
Nice post / explanation Sep 9, 2016

Thanks for this post Samuel,

I've been using it for a while. OmegaT is just getting better and better, and it's still so fast.


 

Mathew Call
United States
Can it handle subtitles files? Nov 2, 2016

I'm looking for a more efficient way to translate English subtitles to non-English subtitles. For example, after a video has been captioned in English, if you download the SRT caption file, this is what it might look like:

1
00:00:01,439 --> 00:00:03,389
Your choice...

2
00:00:03,389 --> 00:00:05,129
for higher taxes...

3
00:00:05,129 --> 00:00:06,910
for workin' Joes...

4
00:00:06,910 --> 00:00:08,630
Spread your income.

5
00:00:08,630 --> 00:00:11,059
Keep what's yours.

6
00:00:11,059 --> 00:00:13,459
A trillion in new spending.

7
00:00:13,459 --> 00:00:16,299
Freeze spending, eliminate waste.

8
00:00:16,300 --> 00:00:18,440
Pain for small business.

9
00:00:18,440 --> 00:00:20,250
Economic growth.

10
00:00:20,250 --> 00:00:21,450
Risky.

11
00:00:21,450 --> 00:00:23,020
Proven.

12
00:00:23,020 --> 00:00:24,820
For a stronger America,

13
00:00:24,820 --> 00:00:26,200
McCain.

14
00:00:26,200 --> 00:00:29,960
JOHN MCCAIN: I'm John McCain,
and I approve this message.

The numbers are time codes indicating when the text should appear and disappear. Can the align feature automatically insert my human translation over the English in the correct places? Like auto-align and insert? (I did update to 4.0 to try this new alignment feature but didn't get the results I was looking for.)


 

Didier Briel  Identity Verified
France
Local time: 07:08
Member (2007)
English to French
+ ...
There is an SRT filter in OmegaT Nov 2, 2016

Mathew Call wrote:
I'm looking for a more efficient way to translate English subtitles to non-English subtitles. For example, after a video has been captioned in English, if you download the SRT caption file, this is what it might look like:

1
00:00:01,439 --> 00:00:03,389
Your choice...

2
00:00:03,389 --> 00:00:05,129
for higher taxes...


Can the align feature automatically insert my human translation over the English in the correct places? Like auto-align and insert? (I did update to 4.0 to try this new alignment feature but didn't get the results I was looking for.)

That's not what an aligner does. An aligner will take an English version of the SRT file and, say, a French version of the same SRT file, and produce a translation memory from these files.

If what you want is to translate the SRT file, simply load the SRT file in OmegaT and translate it.
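
To make the distinction concrete, here is a rough Python sketch of what "producing a translation memory from two SRT files" amounts to. It is not how OmegaT's aligner works internally: it naively pairs cue 1 with cue 1, cue 2 with cue 2 and so on, which only holds if both files have exactly the same cues. The function names `read_cues` and `build_tmx`, the tool name in the header, and the French lines are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_cues(srt_text: str) -> list[str]:
    """Return the text of each cue, dropping the cue number and time codes."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # lines[0] is the cue number, lines[1] the time codes, the rest is text.
        if len(lines) >= 3:
            cues.append(" ".join(line.strip() for line in lines[2:]))
    return cues


def build_tmx(src_cues, tgt_cues, src_lang, tgt_lang) -> ET.Element:
    """Pair cues by position (an assumption) and wrap them in a minimal TMX."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "srt-pairing-sketch", "creationtoolversion": "0.1",
        "segtype": "paragraph", "o-tmf": "srt", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(src_cues, tgt_cues):
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


EN_SRT = """1
00:00:01,439 --> 00:00:03,389
Your choice...

2
00:00:03,389 --> 00:00:05,129
for higher taxes...
"""

# Made-up French translation, for illustration only.
FR_SRT = """1
00:00:01,439 --> 00:00:03,389
Votre choix...

2
00:00:03,389 --> 00:00:05,129
pour des impôts plus élevés...
"""

tmx = build_tmx(read_cues(EN_SRT), read_cues(FR_SRT), "en", "fr")
print(ET.tostring(tmx, encoding="unicode"))
```

The result is a translation memory, not a translated SRT file: the time codes are thrown away, which is why an aligner is the wrong tool for putting a translation back into the subtitle file.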

Didier


 

Alessandra Armenise  Identity Verified
United Kingdom
Local time: 06:08
English to Italian
+ ...
Aegisub Nov 16, 2016

Mathew Call wrote:

I'm looking for a more efficient way to translate English subtitles to non-English subtitles. For example, after a video has been captioned in English, if you download the SRT caption file, this is what it might look like:

The numbers are time codes indicating when the text should appear and disappear. Can the align feature automatically insert my human translation over the English in the correct places? Like auto-align and insert? (I did update to 4.0 to try this new alignment feature but didn't get the results I was looking for.)



Hello Mathew,
Have you tried Aegisub?
There is a translator assistant in it which is quite helpful; the only problem is that it replaces the English version once you have translated it.

Hope this helps!
Alessandra


 

Almudena González  Identity Verified
Spain
Local time: 07:08
English to Spanish
+ ...
Can't find alignment tool Feb 5, 2017

Hello everybody, I've just installed OmegaT and am trying to familiarize myself with it. Is it possible that the version I downloaded doesn't have the Alignment option in the Tools menu? How can I download a later version?


Thanks in advance,

Almudena González.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 07:08
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
The aligner is in the "beta" version Feb 5, 2017

Almudena González wrote:
How can I download a later version?

The green "download" button on the front page is for an older version, the "standard" version. The aligner is in the "latest" version (labelled "beta", for historical reasons), and currently it's at version 4.1.0. You can get that here:
https://sourceforge.net/projects/omegat/files/OmegaT%20-%20Latest/


 

CafeTran Training (X)
Netherlands
Local time: 07:08
Basic? Feb 5, 2017

Samuel Murray wrote:

The aligner has visual assistance features that most other aligners don't have, such as pattern highlighting (e.g. highlight all numbers, or highlight certain words) and marking individual segment pairs as "accepted" (green) or "needs review" (red).


I think that most aligners indicate whether a match is perfect or needs human input.

[Edited at 2017-02-05 17:51 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Local time: 07:08
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@CafeTran Feb 5, 2017

CafeTran Training wrote:
Samuel Murray wrote:
The aligner has visual assistance features that most other aligners don't have, such as ... marking individual segment pairs as "accepted" (green) or "needs review" (red).

I think that most aligners indicate whether a match is perfect or needs human input.


The "needs review" is actually short for "needs further review". A segment that is marked as "needs review" was seen by the user, but needs to be reviewed [more]. A segment that has not been seen at all, is simply unmarked.

In other words, there are three states: not marked as anything, marked as "needs [further] review", and marked as "accepted".

Most aligners that allow me to mark segments' acceptance status allow me to mark segments in two states only, namely "not reviewed" and "reviewed and accepted". There's no way to distinguish between a segment that has been seen but not yet been accepted, and a segment that hasn't been seen yet.

Those aligners assume that when the user reviews a segment, he will immediately review it fully, and finish it off at the same time, before moving to another segment. The OmegaT aligner makes it possible to distinguish visually between segments that the user has not yet evaluated, and segments that the user has only done a quick-and-dirty evaluation of.

In the OmegaT aligner, segments that have auto-detected perfect alignment (e.g. segments consisting of numbers only) are automatically marked green. Segments that were not auto-detected as perfectly aligned aren't marked as red -- instead, they are unmarked. Marking as red is done by the user, manually.

That said, the OmegaT aligner still needs work. The "needs review" status does not stick after pending realignment and spot realignment, and you can't save an alignment halfway through, so the "needs review" status is not very useful.



[Edited at 2017-02-05 19:35 GMT]


 

Almudena González  Identity Verified
Spain
Local time: 07:08
English to Spanish
+ ...
Thanks Feb 5, 2017

Thanks for the tip, Samuel.

 

HannahN
Germany
Aligner not displayed - sorry found answer above Sep 8, 2017

Thank you for the detailed posts.

I'd love to use the OmegaT Aligner, but it isn't displayed in my (German) version of OmegaT. Does anyone have an idea what to do about that??

Thanks and have a lovely weekend...


 

