Split multi-sentence TMX segments into single sentence segments
Thread poster: Samuel Murray
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Jun 27, 2016

Hello everyone

I have a TM from a client that was segmented by paragraph, so many segments contain more than one sentence. I would like to split these segments up so that, if the source and target fields have the same number of sentences, they are split into separate segments of one sentence each. Does anyone know of a tool that can do this? I use Windows 7. Unfortunately, my CAT tool does not do this automatically, as some others do.

Thanks
Samuel


 
Michael Beijer
United Kingdom
Member (2009)
Dutch to English
hmm Jun 27, 2016

You could try importing your TMX into a CAT tool that can edit TMX files (such as memoQ or CafeTran) and seeing what happens (the paragraphs might get segmented into individual sentences), or fiddling with the segmentation rules.

Michael


 
Minh Nguyen
Vietnam
English to Vietnamese
Export to excel Jun 27, 2016

You can use my tool to export the TMX to a two-column Excel file, copy each column to a new workbook, and then use any aligner to create a new sentence-segmented TM.
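For readers who want to see what a two-column export involves, here is a minimal sketch in Python using only the standard library. It is not the tool linked below; the function names and the sample TMX are made up for illustration, and it assumes each translation unit holds exactly one source and one target `<seg>`. Inline markup inside a segment is flattened to plain text.

```python
import csv
import io
import xml.etree.ElementTree as ET

def tmx_to_rows(tmx_text):
    """Return a list of (source, target) text pairs, one per <tu>."""
    root = ET.fromstring(tmx_text)
    rows = []
    for tu in root.iter("tu"):
        # itertext() flattens inline tags, so only the plain text survives.
        segs = ["".join(seg.itertext()) for seg in tu.findall("./tuv/seg")]
        if len(segs) == 2:  # skip units that are not a simple bilingual pair
            rows.append((segs[0], segs[1]))
    return rows

def rows_to_tsv(rows):
    """Serialize the pairs as tab-separated text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

# Made-up sample data, not taken from any real TM.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello. How are you?</seg></tuv>
<tuv xml:lang="af"><seg>Hallo. Hoe gaan dit?</seg></tuv></tu>
</body></tmx>"""

print(rows_to_tsv(tmx_to_rows(SAMPLE)), end="")
```

The tab-separated output can be pasted into a spreadsheet, after which each column can be saved separately for an aligner.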

http://www.eng-vietranslator.net/2016/06/how-to-convert-tmx-to-excel-file-format.html


 
Rodolfo Raya
English to Spanish
Re-segment manually Jun 27, 2016

Hi,

Re-segmenting something that has paragraph segmentation should be done manually. You may have to split sentences or reorder translations.

You can use Stingray, http://www.maxprograms.com/products/stingray.html, for adjusting segmentation in a TMX file.

Regards,
Rodolfo


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Or: extract only multi-sentence segments Jun 27, 2016

Thanks, everyone, for your ideas. Something that would help me a great deal is a tool that can extract all segments that contain more than one sentence. Yes, I can convert the TM to a two-column format, split that into two separate files, and then use an aligner to recreate the TM (which I would have to check manually), but my TM is huge and not all segments are multi-sentence segments.

 
Michael Beijer
United Kingdom
Member (2009)
Dutch to English
ABBYY Aligner 2.0 Freelance (free version) ... Jun 27, 2016

… can split and join segments of a TMX. Maybe also worth a look.

https://www.abbyy.com/aligner/buy/


 
esperantisto
Member (2006)
English to Russian
SITE LOCALIZER
OmegaT Jun 27, 2016

May I remind you that OmegaT converts paragraph-segmented TMX files to sentence-segmented on-the-fly?

[Edited at 2016-06-27 15:01 GMT]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Yes, some CAT tools do this on the fly Jun 27, 2016

esperantisto wrote:
May I remind you that OmegaT converts paragraph-segmented TMX files to sentence-segmented on-the-fly?


Yes, some CAT tools do this on the fly, but I need the TMX file itself to become subsegmented, because I'm not using a CAT tool that can do that.
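
The requirement from the first post (split a unit only when source and target have the same number of sentences) could be sketched in Python roughly as follows. This is an illustration under several assumptions, not an existing tool: sentence boundaries are guessed with a naive punctuation rule, every unit is assumed to have exactly two `<tuv>` elements, inline tags inside split segments are discarded, and all function names are hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Naive boundary: sentence-final punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    return [s for s in SENTENCE_END.split(text.strip()) if s]

def resegment(tmx_text):
    """Split each <tu> into one <tu> per sentence when the counts match."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    new_units = []
    for tu in list(body):
        segs = tu.findall("./tuv/seg")
        parts = [split_sentences("".join(s.itertext())) for s in segs]
        n = len(parts[0]) if parts else 0
        # Keep the unit untouched unless it is a bilingual pair whose
        # two sides both contain the same number (>= 2) of sentences.
        if len(segs) != 2 or n < 2 or len(parts[1]) != n:
            new_units.append(tu)
            continue
        for i in range(n):
            clone = copy.deepcopy(tu)  # preserves attributes such as xml:lang
            for seg, sentences in zip(clone.findall("./tuv/seg"), parts):
                for child in list(seg):  # drop inline markup (assumption)
                    seg.remove(child)
                seg.text = sentences[i]
            new_units.append(clone)
    body[:] = new_units
    return ET.tostring(root, encoding="unicode")

# Made-up sample: the first unit is split, the second is left alone
# because its two sides have different sentence counts.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello. How are you?</seg></tuv>
<tuv xml:lang="af"><seg>Hallo. Hoe gaan dit?</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>One. Two.</seg></tuv>
<tuv xml:lang="af"><seg>Een en twee.</seg></tuv></tu>
</body></tmx>"""

print(resegment(SAMPLE))
```

Equal sentence counts do not guarantee that the sentences correspond one to one (a translator may have reordered them), so the output would still need a manual check.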


 
CafeTran Training (X)
Netherlands
Filter via regular expression Jun 27, 2016

Samuel Murray wrote:

Something that would help me a great deal is a tool that can extract all segments that contain more than one sentence.


You can use a regular expression to filter on segments that contain multiple sentences.
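
As a rough illustration of such a filter outside CafeTran, here is a small Python sketch. The pattern is an assumption of mine, not the expression used in CafeTran: it treats sentence-final punctuation followed by whitespace and more text as a sign of a second sentence. The function names are hypothetical.

```python
import re

# Hypothetical pattern: ".", "!" or "?" followed by whitespace and
# at least one more non-space character, i.e. text continues afterwards.
MULTI_SENTENCE = re.compile(r"[.!?]\s+\S")

def is_multi_sentence(text):
    return MULTI_SENTENCE.search(text) is not None

def filter_multi(pairs):
    """Keep the (source, target) pairs where either side looks multi-sentence."""
    return [(s, t) for s, t in pairs
            if is_multi_sentence(s) or is_multi_sentence(t)]

# Made-up sample data.
pairs = [
    ("Hello. How are you?", "Hallo. Hoe gaan dit?"),
    ("Version 2.5 is out.", "Weergawe 2.5 is uit."),
]
for source, target in filter_multi(pairs):
    print(source, "|", target)
```

A pattern this simple also flags abbreviations followed by a space (for example "p. 5"), so the filtered list is a set of candidates rather than a definitive answer.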

Here is the project (CafeTran can handle TMX files like projects):



Filtered:



You can then Split and Merge left and right (source and target).

It's of course also possible to write these filtered segments to a new file (and delete them from the current one), and to duplicate or triplicate them.

Then, using Find and Replace with regular expressions, you can remove all second (third, etc.) sentences from every segment (left and right). Repeat this in another copy of the file for all first and third (fourth, etc.) sentences.
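
The same idea (each copy of the file keeps only one sentence per segment) can be sketched with Python's `re` module. The two patterns and the helper names below are my own assumptions for illustration; they are not the expressions used in CafeTran.

```python
import re

# First sentence plus the whitespace that follows it.
FIRST_SENTENCE = re.compile(r"^.*?[.!?]\s+", re.S)
# Everything after the first sentence, starting at the separating whitespace.
AFTER_FIRST = re.compile(r"(?<=[.!?])\s+.*$", re.S)

def drop_first(text):
    """Remove the first sentence; text with a single sentence is unchanged."""
    return FIRST_SENTENCE.sub("", text, count=1)

def keep_first(text):
    """Remove the second and all later sentences."""
    return AFTER_FIRST.sub("", text, count=1)

def keep_nth(text, n):
    """Keep only sentence number n (1-based); return "" if there is none."""
    for _ in range(n - 1):
        shorter = drop_first(text)
        if shorter == text:  # fewer than n sentences
            return ""
        text = shorter
    return keep_first(text)

paragraph = "A one. B two. C three."  # made-up sample
for n in (1, 2, 3):
    print(n, keep_nth(paragraph, n))
```

Applying `keep_nth` with the same `n` to the source and target of every filtered segment gives the n-th copy of the file described above.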

Here is a demonstration of removing the first sentence of a paragraph:



Of course, this doesn't take abbrev. into account; you'll have to enhance the regular expression for that, if that's possible at all.
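
One possible way to handle abbreviations is to keep the regular expression simple and add a second step that glues a chunk back onto the next one when it ends in a known abbreviation. The Python sketch below assumes a small, hypothetical abbreviation list that you would have to extend per language; it is a workaround, not a complete solution.

```python
import re

# Hypothetical list; a real one depends on the languages in the TM.
ABBREVIATIONS = {"e.g.", "i.e.", "mr.", "mrs.", "dr.", "prof.",
                 "no.", "p.", "abbrev."}
BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    """Split on punctuation plus whitespace, then undo splits after abbreviations."""
    sentences = []
    pending = ""
    for chunk in BOUNDARY.split(text.strip()):
        if not chunk:
            continue
        # Note: re-joining normalizes the separating whitespace to one space.
        pending = f"{pending} {chunk}" if pending else chunk
        last_word = pending.split()[-1].lower().lstrip("([\"'")
        if last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep collecting
        sentences.append(pending)
        pending = ""
    if pending:  # text ended on an abbreviation
        sentences.append(pending)
    return sentences

print(split_sentences("See p. 5 for details. Ask Dr. Smith."))
```

An abbreviation that really does end a sentence would be merged with the next sentence by this approach, which is one reason a manual check remains necessary.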

[Edited at 2016-06-27 17:11 GMT]


 

