MemoQ extensions to segmentation rules
Thread poster: Piotr Bienkowski

Piotr Bienkowski  Identity Verified
Poland
Local time: 00:23
Member (2005)
English to Polish
Apr 18, 2012

Is there a help topic that explains thoroughly the MemoQ extensions to segmentation rules (I mean all those strings with # at both ends and words with underscores in the middle...). I can't find any!

I am not happy with the way MemoQ segments Polish originals, although it appears to have some segmentation rules for Polish.

Will appreciate your help.

Regards,

Piotr


 

Yasmin Moslem  Identity Verified
Egypt
Local time: 01:23
English to Arabic
Segmentation Rules Apr 18, 2012

Dear Piotr,

You wrote: "I mean all those strings with # at both ends and words with underscores in the middle..."


For example, the first rule, #end##!#[\s]+#cap#, can be read as follows:

#end# end of segment mark
#!# segment here
[\s]+ white space, one or more
#cap# capital letter


Both #end# and #cap# (as well as the other #...# names) are defined under the "Custom lists" tab.
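
To make the reading above concrete, here is a minimal Python sketch that emulates the rule with ordinary regular expressions. It is not memoQ's engine: the contents of the "end" and "cap" lists are assumed for illustration, and the helper names (to_regex, segment) are hypothetical.

```python
import re

# Illustrative stand-ins for memoQ's custom lists (assumed contents).
CUSTOM_LISTS = {
    "end": [".", "!", "?"],
    "cap": list("ABCDEFGHIJKLMNOPQRSTUVWXYZĄĆĘŁŃÓŚŹŻ"),
}

RULE = r"#end##!#[\s]+#cap#"


def to_regex(part):
    """Replace each #name# with an alternation of that list's entries."""
    def expand(match):
        entries = CUSTOM_LISTS[match.group(1)]
        return "(?:" + "|".join(re.escape(e) for e in entries) + ")"
    return re.sub(r"#(\w+)#", expand, part)


def segment(text, rule=RULE):
    """Cut the text wherever the rule's #!# mark falls."""
    before, after = (to_regex(part) for part in rule.split("#!#"))
    # Consume what precedes the break; only look ahead at what follows it.
    pattern = re.compile(f"(?:{before})(?={after})")
    pieces, start = [], 0
    for match in pattern.finditer(text):
        pieces.append(text[start:match.end()].strip())
        start = match.end()
    pieces.append(text[start:].strip())
    return [piece for piece in pieces if piece]


print(segment("First sentence. Second one? yes. Third."))
```

In this sketch the question mark does not trigger a break because it is followed by a lowercase letter, so the sample splits into three pieces rather than four.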


Most importantly, could you please give us some examples of segmentation you do not like, and show how you expect them to be segmented?

Kind regards,
Yasmin


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 00:23
Member (2005)
English to Polish
TOPIC STARTER
This helps a bit, but Apr 18, 2012


I am looking for a comprehensive list that explains the use of ALL such symbols.

Examples of wrong segmentation:


wykonując badania geologiczne terenu i przeprowadzając uzgodnienia z Zarządem Fabryki Mydła S.A. Przygotowywany jest również projekt budowlany inwestycji, trwają pierwsze prace przygotowawcze przed wszczęciem właściwej procedury środowiskowej.


New segment should start after S.A.


Może podlegać nieznacznym zmianom po przeprowadzeniu oceny o oddziaływaniu projektu na środowisku i uzyskaniu pozwolenia na budowę, które planowane są odpowiednio na czerwiec 2012 r. i listopad 2012 r. Prace powinny rozpocząć się we grudniu 2012 r. i zakończyć w sierpniu 2013 r. Realizacja zadania jest ważna z uwagi na powiększenie miejsca składowania pustych kontenerów przeładowywanych w większej liczbie po zakupie urządzeń przeładunkowych i zwiększenia zdolności przeładunkowej.


New segments should start before Prace and before Realizacja.

P.S. I changed the company name in my quotes. I hope my client will not take me to court ;)


 

Yasmin Moslem  Identity Verified
Egypt
Local time: 01:23
English to Arabic
abbreviation + end of segment + capital letter Apr 18, 2012

Dear Piotr,

In this case, you can go to the Segmentation Rules section of your project, click "Edit" and accept creating a copy; then select the copy and click "Edit".

On the "Segmentation" tab, select the first rule in the "Rules" pane, #end##!#[\s]+#cap#. Then move to the "Exceptions" pane, select the second exception, [\s]#abbr_onlyabbr##!#[\s]+#cap#, and click "Delete". Finally, click "OK" to save the settings.

Now, try to reimport the document.

For your information, the exception that you have deleted, [\s]#abbr_onlyabbr##!#[\s]+#cap#, reads as follows:


[\s] white space
#abbr_onlyabbr# any abbreviation from the list of the same name under the "Custom lists" tab
#!# segment here
[\s]+ white space, one or more
#cap# capital letter


Also for your information, memoQ segmentation rules consist of:
1- custom list names, found under the "Custom lists" tab.
2- regular expressions, which require some prior knowledge. You should find the main ones here:
http://kilgray.com/memoq/50/help-en/index.html?regular_expressions.html
Also, here is a useful video by Denis Hay: http://vimeo.com/36075095
3- the mark #!#, which means "segment here" and separates what should come before the break from what should come after it.
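
The interplay between a break rule and an exception can be sketched in Python as follows. This is an emulation under stated assumptions, not memoQ's implementation: the list contents are made up for illustration, the function names are hypothetical, and an exception is assumed to cancel a break only when its #!# mark lands on the same position as the rule's.

```python
import re

# Illustrative list contents only; memoQ's real custom lists may differ.
CUSTOM_LISTS = {
    "end": [".", "!", "?"],
    "cap": list("ABCDEFGHIJKLMNOPQRSTUVWXYZĄĆĘŁŃÓŚŹŻ"),
    "abbr_onlyabbr": ["S.A.", "im.", "r."],
}

RULES = [r"#end##!#[\s]+#cap#"]
EXCEPTIONS = [r"[\s]#abbr_onlyabbr##!#[\s]+#cap#"]


def break_positions(text, rule):
    """Return the offsets where the rule's #!# mark falls in the text."""
    def to_regex(part):
        return re.sub(
            r"#(\w+)#",
            lambda m: "(?:" + "|".join(map(re.escape, CUSTOM_LISTS[m.group(1)])) + ")",
            part,
        )
    before, after = (to_regex(part) for part in rule.split("#!#"))
    pattern = re.compile(f"(?:{before})(?={after})")
    return {match.end() for match in pattern.finditer(text)}


def segment(text, rules=RULES, exceptions=EXCEPTIONS):
    """Apply all rules, then drop every break that an exception also marks."""
    breaks = set()
    for rule in rules:
        breaks |= break_positions(text, rule)
    for exception in exceptions:
        breaks -= break_positions(text, exception)
    pieces, start = [], 0
    for pos in sorted(breaks):
        pieces.append(text[start:pos].strip())
        start = pos
    pieces.append(text[start:].strip())
    return [piece for piece in pieces if piece]


sample = "Uzgodnienia z Zarządem Fabryki Mydła S.A. Przygotowywany jest projekt."
print(segment(sample))                 # exception active: one segment
print(segment(sample, exceptions=[]))  # exception deleted: two segments
```

With the exception in place, the sketch keeps the sample as one segment; with the exception removed, it breaks after "S.A.", which is the effect the deletion described above is meant to have.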


HTH,
Yasmin



[Edited at 2012-04-18 14:45 GMT]


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 00:23
Member (2005)
English to Polish
TOPIC STARTER
The rules don't work consistently Apr 24, 2012

Today I had this Polish piece of text:

Klaster obliczeniowy został uruchomiony na potrzeby konsorcjum utworzonego w kwietniu 2007 r. przez Politechnikę Śląską, Centrum Onkologii (Instytut im. Marii Skłodowskiej-Curie Oddział w Gliwicach), Śląski Uniwersytet Medyczny (wówczas Śląską Akademię Medyczną) oraz Uniwersytet Śląski.

The segment was broken after "im." and I had to merge it. I did not modify the seg rules at all, so I did not introduce the change you suggested, and yet sometimes text is segmented in these places, and sometimes not. Oh well, maybe it is because "im" in Polish is also an actual word in addition to being an abbreviation ("im.").
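
The ambiguity suspected here can be illustrated with a small Python sketch. It does not explain what memoQ actually did with this text; it only shows that a list-based check cannot tell the abbreviation "im." from the pronoun "im" at the end of a sentence. The abbreviation list, the second sample sentence ("Daj im. Potem wróć.") and the function name are assumptions made for the example.

```python
import re

CAPITALS = "A-ZĄĆĘŁŃÓŚŹŻ"


def abbreviation_hits(text, abbreviations):
    """List (abbreviation, next word) pairs where a capitalised word follows."""
    # Longest entries first so that a longer abbreviation wins over a shorter one.
    alternation = "|".join(
        re.escape(a) for a in sorted(abbreviations, key=len, reverse=True)
    )
    pattern = re.compile(rf"(?<!\S)({alternation})(?=\s+([{CAPITALS}]\w*))")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]


text = (
    "Konsorcjum utworzono w kwietniu 2007 r. przez Centrum Onkologii "
    "(Instytut im. Marii Skłodowskiej-Curie). Daj im. Potem wróć."
)
for abbreviation, next_word in abbreviation_hits(text, ["im.", "r."]):
    print(abbreviation, "->", next_word)
```

Both occurrences of "im." are reported, although only the first is an abbreviation. Any rule or exception keyed on the list alone has to treat them the same way, so one of the two ends up segmented wrongly.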

I know that maybe I grumble too much, and I can always split or merge (well, almost always: I can't do that in online MemoQ projects), but I would like MemoQ to get it right the first time, and I am much more familiar with "standard" regexes than with MemoQ's extensions.

Regards,

Piotr


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 00:23
Member (2005)
English to Polish
TOPIC STARTER
And the seg rules are really good for nothing when... Apr 29, 2012

I want to align HTML files in LiveDocs. Paragraphs are stuck together even if the end-of-paragraph tag and the start-of-paragraph tag are separated by a "real" CRLF, e.g.:


z tego względu istnieją podstawy do
wyłączenia takich urządzeń z zakresu niniejszej dyrektywy.{/p}
{p}(7) W
odniesieniu do urządzeń ciśnieniowych objętych konwencjami
międzynarodowymi,


The parts before and after (7) are lumped together. Angle brackets were changed to braces because the forum interprets them as tags and strips them.
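
The expected behaviour can be stated as a standard regex. The Python sketch below runs outside memoQ and is only an illustration of the break that is wanted: it splits at a closing paragraph tag followed by an opening one and joins the soft line wraps inside each paragraph. The function name is hypothetical, and the sample uses real angle brackets in place of the braces shown above.

```python
import re


def split_at_paragraph_boundary(html):
    """Split at </p> ... <p> and collapse line wraps inside each part."""
    parts = re.split(r"</p>\s*<p\b[^>]*>", html, flags=re.IGNORECASE)
    return [" ".join(part.split()) for part in parts]


sample = (
    "z tego względu istnieją podstawy do\r\n"
    "wyłączenia takich urządzeń z zakresu niniejszej dyrektywy.</p>\r\n"
    "<p>(7) W\r\n"
    "odniesieniu do urządzeń ciśnieniowych objętych konwencjami\r\n"
    "międzynarodowymi,"
)
for paragraph in split_at_paragraph_boundary(sample):
    print(paragraph)
```

The split pattern treats the CRLF between the tags as insignificant white space, while the line wraps inside a paragraph are folded into single spaces.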


 

ahmadwadan.com  Identity Verified
Kuwait
Local time: 02:23
English to Arabic
Similar issue here... Oct 30, 2016

Hello,

I have a similar issue.

I hope you can help me with a segmentation rule for company names like the one below, so that MemoQ treats it as ONE segment instead of two:

XXX Company K.S.C. (Closed)

instead of:

XXX Company K.S.C.
(Closed)

Thank you
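
One way to think about this case is as an exception that cancels a break when the abbreviation is followed by white space and an opening parenthesis. The Python sketch below shows that logic with ordinary regexes. Both patterns are assumptions: the thread does not say which memoQ rule causes the break, and nothing here has been tested in memoQ.

```python
import re

# Assumed break rule: end mark, then white space, then "(" or a capital letter.
BREAK = re.compile(r"[.!?](?=\s+[(A-Z])")
# Assumed exception: the abbreviation K.S.C. followed by white space and "(".
EXCEPTION = re.compile(r"\bK\.S\.C\.(?=\s+\()")


def segment(text):
    """Break at BREAK matches unless EXCEPTION marks the same position."""
    blocked = {match.end() for match in EXCEPTION.finditer(text)}
    pieces, start = [], 0
    for match in BREAK.finditer(text):
        if match.end() in blocked:
            continue  # the exception wins at this position
        pieces.append(text[start:match.end()].strip())
        start = match.end()
    pieces.append(text[start:].strip())
    return [piece for piece in pieces if piece]


print(segment("XXX Company K.S.C. (Closed) signed the deal. It starts in May."))
```

In memoQ terms, the equivalent would presumably be an entry in the "Exceptions" pane built from the same parts (the abbreviation, the #!# mark, white space and an escaped opening parenthesis), but that is an untested assumption.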


 

