MemoQ extensions to segmentation rules
Thread poster: Piotr Bienkowski

Piotr Bienkowski  Identity Verified
Poland
Local time: 12:01
Member (2005)
English to Polish
+ ...
Apr 18, 2012

Is there a help topic that thoroughly explains the MemoQ extensions to segmentation rules (I mean all those strings with # at both ends and words with underscores in the middle)? I can't find one!

I am not happy with the way MemoQ segments Polish originals, although it appears to have some segmentation rules for Polish.

Will appreciate your help.

Regards,

Piotr


 

Yasmin Moslem  Identity Verified
Egypt
Local time: 12:01
English to Arabic
Segmentation Rules Apr 18, 2012

Dear Piotr,

Piotr Bienkowski wrote:

I mean all those strings with # at both ends and words with underscores in the middle...


For example, the first rule, #end##!#[\s]+#cap#, can be read as follows:

#end# end of segment mark
#!# segment here
[\s]+ white space, one or more
#cap# capital letter


Both #end# and #cap# (as well as the other #...# names) can be found on the "Custom lists" tab.
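To make this reading concrete, here is a minimal Python sketch that expands such a rule into an ordinary regular expression and breaks text at #!#. It is not how memoQ implements its rules. The contents of the `end` and `cap` lists are assumptions chosen for illustration (the real lists are under "Custom lists"), and `expand_rule` and `split_text` are hypothetical helper names.

```python
import re

# Assumed list contents, for illustration only; the real lists are
# defined under "Custom lists" in memoQ and may differ.
CUSTOM_LISTS = {
    "end": [".", "!", "?"],
    "cap": list("ABCDEFGHIJKLMNOPQRSTUVWXYZĄĆĘŁŃÓŚŹŻ"),
}

def expand_rule(rule):
    """Split a rule at #!# and expand every #name# into an alternation."""
    def expand(part):
        return re.sub(
            r"#(\w+)#",
            lambda m: "(?:" + "|".join(re.escape(x) for x in CUSTOM_LISTS[m.group(1)]) + ")",
            part,
        )
    before, after = rule.split("#!#")
    return expand(before), expand(after)

def split_text(text, rule):
    """Break text wherever the 'before' part is directly followed by the 'after' part."""
    before, after = expand_rule(rule)
    pattern = re.compile("(?:" + before + ")(?=" + after + ")")
    segments, start = [], 0
    for m in pattern.finditer(text):
        segments.append(text[start:m.end()].strip())
        start = m.end()
    segments.append(text[start:].strip())
    return segments

print(split_text("To jest zdanie. A to drugie! Koniec.", r"#end##!#[\s]+#cap#"))
# → ['To jest zdanie.', 'A to drugie!', 'Koniec.']
```

Note that the break is placed right after the end mark; the white space and the capital letter are only looked at, not consumed, which mirrors the position of #!# in the rule.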


Most importantly, could you please give us some examples of segmentation you do not like, and show how you expect them to be segmented?

Kind regards,
Yasmin


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 12:01
Member (2005)
English to Polish
+ ...
TOPIC STARTER
This helps a bit, but Apr 18, 2012

I am looking for a comprehensive list that explains the use of ALL such symbols.

Examples of wrong segmentation:


wykonując badania geologiczne terenu i przeprowadzając uzgodnienia z Zarządem Fabryki Mydła S.A. Przygotowywany jest również projekt budowlany inwestycji, trwają pierwsze prace przygotowawcze przed wszczęciem właściwej procedury środowiskowej.


New segment should start after S.A.


Może podlegać nieznacznym zmianom po przeprowadzeniu oceny o oddziaływaniu projektu na środowisku i uzyskaniu pozwolenia na budowę, które planowane są odpowiednio na czerwiec 2012 r. i listopad 2012 r. Prace powinny rozpocząć się we grudniu 2012 r. i zakończyć w sierpniu 2013 r. Realizacja zadania jest ważna z uwagi na powiększenie miejsca składowania pustych kontenerów przeładowywanych w większej liczbie po zakupie urządzeń przeładunkowych i zwiększenia zdolności przeładunkowej.


New segments should start before Prace and before Realizacja.

P.S. I changed the company name in my quotes. I hope my client will not take me to court ;-)


 

Yasmin Moslem  Identity Verified
Egypt
Local time: 12:01
English to Arabic
abbreviation + end of segment + capital letter Apr 18, 2012

Dear Piotr,

In this case, you can go to the Segmentation Rules section of your project, click "Edit" and accept creating a copy; then select the copy and click "Edit".

On the "Segmentation" tab, select the first rule in the "Rules" pane, #end##!#[\s]+#cap#, then move to the "Exceptions" pane, select the second rule, [\s]#abbr_onlyabbr##!#[\s]+#cap#, and click "Delete". Then click "OK" to save the settings.

Now, try to reimport the document.

For your information, the exception rule you have just deleted, [\s]#abbr_onlyabbr##!#[\s]+#cap#, reads as follows:


[\s] white space
#abbr_onlyabbr# any abbreviation on the list of the same name on the "Custom lists" tab
#!# segment here
[\s]+ white space, one or more
#cap# capital letter
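
The interplay between the main rule and this exception can be sketched in Python as follows. This is an illustration of the logic only, not memoQ's implementation; the character classes and the abbreviation list are assumptions (the real items are under "Custom lists"), and `break_positions` and `segment` are hypothetical helper names.

```python
import re

# Assumed list contents, for illustration only.
END = r"[.!?]"
CAP = r"[A-ZĄĆĘŁŃÓŚŹŻ]"
ABBR_ONLYABBR = ["S.A.", "r.", "im."]  # hypothetical abbreviation list
ABBR = "(?:" + "|".join(re.escape(a) for a in ABBR_ONLYABBR) + ")"

# #end##!#[\s]+#cap#: break after an end mark followed by space + capital
BREAK_RULE = re.compile("(?:" + END + r")(?=[\s]+" + CAP + ")")
# [\s]#abbr_onlyabbr##!#[\s]+#cap#: ...but not right after a listed abbreviation
EXCEPTION = re.compile(r"[\s]" + ABBR + r"(?=[\s]+" + CAP + ")")

def break_positions(text, use_exception=True):
    """Offsets where a new segment starts, after exceptions are removed."""
    candidates = {m.end() for m in BREAK_RULE.finditer(text)}
    if use_exception:
        candidates -= {m.end() for m in EXCEPTION.finditer(text)}
    return sorted(candidates)

def segment(text, use_exception=True):
    segments, start = [], 0
    for pos in break_positions(text, use_exception):
        segments.append(text[start:pos].strip())
        start = pos
    segments.append(text[start:].strip())
    return segments

text = ("uzgodnienia z Zarządem Fabryki Mydła S.A. "
        "Przygotowywany jest również projekt budowlany inwestycji.")
print(len(segment(text, use_exception=True)))   # → 1 (no break after "S.A.")
print(len(segment(text, use_exception=False)))  # → 2 (break after "S.A.")
```

With the exception in place, a listed abbreviation followed by a capital letter never ends a segment; with the exception deleted, it always does. Neither setting can tell an abbreviation in mid-sentence from one that really ends a sentence.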


Also for your information, memoQ segmentation rules consist of:
1- custom list names, found on the "Custom lists" tab.
2- regular expressions, which require some background knowledge. You should find the main ones here:
http://kilgray.com/memoq/50/help-en/index.html?regular_expressions.html
Also, here is a useful video by Denis Hay: http://vimeo.com/36075095
3- the mark #!#, which means "segment here" and separates what must come before the break from what must come after it.


HTH,
Yasmin



[Edited at 2012-04-18 14:45 GMT]


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 12:01
Member (2005)
English to Polish
+ ...
TOPIC STARTER
The rules don't work consistently Apr 24, 2012

Today I had this Polish piece of text:

Klaster obliczeniowy został uruchomiony na potrzeby konsorcjum utworzonego w kwietniu 2007 r. przez Politechnikę Śląską, Centrum Onkologii (Instytut im. Marii Skłodowskiej-Curie Oddział w Gliwicach), Śląski Uniwersytet Medyczny (wówczas Śląską Akademię Medyczną) oraz Uniwersytet Śląski.

The segment was broken after "im." and I had to merge it. I did not modify the seg rules at all, so I did not introduce the change you suggested, and yet sometimes text is segmented in these places, and sometimes not. Oh well, maybe it is because "im" in Polish is also an actual word in addition to being an abbreviation ("im.").

I know that maybe I grumble too much, and I can always split or merge (well, almost always: I can't do that in online MemoQ projects), but I would like MemoQ to get it right the first time. Also, I am much more familiar with "standard" regexes than with MemoQ's extensions.

Regards,

Piotr


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 12:01
Member (2005)
English to Polish
+ ...
TOPIC STARTER
And the seg rules are really good for nothing when... Apr 29, 2012

I want to align HTML files in LiveDocs. Paragraphs are stuck together even if the end-of-paragraph tag and the start-of-paragraph tag are separated by a "real" CRLF, e.g.


z tego względu istnieją podstawy do
wyłączenia takich urządzeń z zakresu niniejszej dyrektywy.{/p}
{p}(7) W
odniesieniu do urządzeń ciśnieniowych objętych konwencjami
międzynarodowymi,


The parts before and after (7) are lumped together. Angle brackets were changed to braces because the forum interprets them as tags and strips them.
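
The expected behaviour, one unit per paragraph element with the line breaks inside it ignored, can be sketched in Python as follows. This only illustrates the expectation; it says nothing about how LiveDocs or memoQ's HTML filter actually work. The sample string reconstructs the excerpt above with angle brackets, the enclosing opening and closing tags are added as an assumption, and `split_paragraphs` is a hypothetical helper name.

```python
import re

def split_paragraphs(html):
    """Return the text of each <p>...</p> element, with inner line breaks collapsed."""
    paragraphs = re.findall(r"<p\b[^>]*>(.*?)</p>", html, flags=re.DOTALL | re.IGNORECASE)
    return [" ".join(p.split()) for p in paragraphs]

sample = (
    "<p>z tego względu istnieją podstawy do\r\n"
    "wyłączenia takich urządzeń z zakresu niniejszej dyrektywy.</p>\r\n"
    "<p>(7) W\r\n"
    "odniesieniu do urządzeń ciśnieniowych objętych konwencjami\r\n"
    "międzynarodowymi,</p>"
)

for unit in split_paragraphs(sample):
    print(unit)
```

Here the paragraph ending in "dyrektywy." and the paragraph starting with "(7)" come out as two separate units.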


 

ahmadwadan.com  Identity Verified
Kuwait
Local time: 13:01
English to Arabic
+ ...
Similar issue here... Oct 30, 2016

Hello,

I have a similar issue.

I hope you can help me with a segmentation rule for company names like the one below, so that MemoQ treats it as ONE segment instead of two:

XXX Company K.S.C. (Closed)

instead of:

XXX Company K.S.C.
(Closed)

Thank you


 

