Splitting segments after paragraph marks
Thread poster: Sophie Borel

Sophie Borel
France
English to French
Oct 7

Hello,

Could someone help me with the segmentation of XML files?
I'd like to segment the file after paragraph marks. I tried to change the TM segmentation rules, but I don't know how to tell Studio to split segments after paragraph marks (¶). Can somebody help?

Thanks,
Sophie


[Edited at 2019-10-08 07:52 GMT]


 

Anthony Rudd
German to English
Segmentation Oct 8

Project Settings → Language Pairs → Translation Memory … → Settings → Language Resources → Segmentation Rules

Add Segmentation Rule


 

Sophie Borel
France
English to French
TOPIC STARTER
Details Oct 8

Thanks Anthony,

Do you know if I have to add a segmentation rule for source language only or for target and source languages?

Thanks


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Sophie Oct 8

Sophie Borel wrote:
I'd like to segment the file after paragraph marks (¶).


Firstly, let's just make sure that you really mean segmentation by paragraph marks, i.e. "¶", and not simply segmentation by paragraph. Do you have these characters (i.e. "¶") in the XML file and you want to split the text by those marks, or are you actually just viewing the XML file in a viewer that shows the paragraph breaks as "¶" symbols?

Let's assume that your XML file actually contains "¶" characters, i.e. that if the text is "The cat¶sat on¶the mat", then you want there to be three segments, namely "The cat¶", "sat on¶" and "the mat". To accomplish this, you have to add a segmentation rule that has "¶" as the "Before break" text and nothing as the "After break" text.
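To make the "Before break" / "After break" idea concrete, here is a minimal Python sketch. It only imitates the logic described above (break where the text to the left matches one pattern and the text to the right matches another); it is not how Studio itself is implemented. The function name split_after is made up for illustration, and the sketch assumes the "Before break" pattern has a fixed length, because Python look-behinds require that.

```python
import re

def split_after(text: str, before_break: str, after_break: str = "") -> list[str]:
    """Split text at every position preceded by before_break and
    followed by after_break (an empty after_break matches anywhere)."""
    lookahead = f"(?={after_break})" if after_break else ""
    boundary = re.compile(f"(?<={before_break}){lookahead}")
    segments, start = [], 0
    for match in boundary.finditer(text):
        pos = match.start()
        # Ignore a boundary at the very end: it would create an empty segment
        if start < pos < len(text):
            segments.append(text[start:pos])
            start = pos
    segments.append(text[start:])
    return segments

print(split_after("The cat¶sat on¶the mat", "¶"))
# ['The cat¶', 'sat on¶', 'the mat']
```

Each segment keeps its trailing "¶", which matches the three segments described above.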

I'm no expert, but AFAIK, in Trados 2019, you can set segmentation rules in two places, namely as part of a translation memory's settings and as a language resources file. Then, when you create a project, you can specify either a translation memory or a specific language resources file.

If you want to set the segmentation rule as part of a translation memory's settings, then you can access those settings in a number of ways, but one way is to go to the "Translation Memories" pane, right-click the translation memory > Settings > Language Resources > Segmentation Rules > Edit > Add. Then name the rule something like "Paragraph mark break", click Advanced View, put ¶ in the "Before break" field and leave the "After break" field empty.

You can also reach this TM editing dialog from the "Projects" pane: right-click the relevant project > Project Settings > Language Pairs > (choose the correct language pair) > Translation Memory and Automated Translation > (click the relevant TM) > Settings (if not greyed out) > Language Resources > Segmentation Rules > Edit > Add.

If you want to set the segmentation rule as part of a language resource file, go to File > New > New Language Resource Template > Segmentation Rules > Edit > Add. Add the segmentation rule, and then save the language resource file somewhere. The language resource file will then show up in the "Translation Memories" pane, where you can edit it if you want to. When you create a new project, you can choose the language resource file at step 3: All Language Pairs > Language Resources, then select the file from the drop-down or browse for it. You can also add the language resource file to an existing project by right-clicking the project > Project Settings > Language Pairs > All Language Pairs > Translation Memory and Automated Translation > Language Resources.

Sophie Borel wrote:
Do you know if I have to add a segmentation rule for source language only or for target and source languages?


As far as I know, that is only necessary if you want to do alignment.

Also, the option to select a language resources file is only available under "All Language Pairs".


 

Sophie Borel
France
English to French
TOPIC STARTER
paragraph Marks Oct 8

Hi Samuel,

Yes, I do mean segmentation by paragraph marks.
Thanks for your reply. I tried to add the TM segmentation rule, but I still get the same segmentation...

For example, I have

(h3)The product Benefits(/h3)
(ul)
(li)Complete product solutions(/li)
(li)quality at high speeds(/li)
(li)User-friendly(/li)
(li)Flexible and productive(/li)
(li)products for every application(/li)
(li)Small-or large-formats, on-demand (/li)
(li)Low cost of ownership(/li)
(li)Solutions that achieves a competitive edge(/li)
(/ul)
Above all, our brand manages..[...].

All of this ends up in ONE segment...
It would really be helpful to split that properly to use the TM efficiently, but I can't find a solution here...

[Edited at 2019-10-08 13:29 GMT]

[Edited at 2019-10-08 13:31 GMT]


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Sophie Oct 8

Sophie Borel wrote:
I do mean segmentation by paragraph marks.


But the example that you show below has no paragraph marks in it. (-:

Sophie Borel wrote:
For example, I have
(h3)The product Benefits(/h3)


Did you use round brackets here to avoid the pointy brackets (i.e. < and >), which break the ProZ.com forum software, or does your text actually use round brackets?

What is the file type?


 

Sophie Borel
France
English to French
TOPIC STARTER
@Samuel Oct 8

Well, they disappeared in the forum, but there are paragraph marks every time there's a line break, at the end of each line...
And yes, I replaced the < and > signs with round brackets so that the tags would appear in the post...
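The distinction Samuel asked about earlier (a literal "¶" character versus a line break that an editor merely displays as "¶") can be pictured with a small Python sketch. The sample string is hypothetical and reuses three items from the list in my earlier post; this is only an illustration of the idea and says nothing about how Studio reads the file internally.

```python
# Hypothetical sample: three list items separated by real line breaks
text = "Complete product solutions\nUser-friendly\nFlexible and productive"

# If the editor only *displays* line breaks as "¶", the character itself is
# not in the text, so a rule that looks for a literal "¶" never fires.
print("¶" in text)  # False

# Splitting after each line break instead keeps one item per piece
pieces = text.splitlines(keepends=True)
print(pieces)
# ['Complete product solutions\n', 'User-friendly\n', 'Flexible and productive']
```

If that is what is happening, a rule with "¶" as the "Before break" text would have nothing to match, which would be consistent with the segmentation not changing.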


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Sophie Oct 8

Sophie Borel wrote:
...


Well, I guess it may be a file filter issue, but I have reached the limit of what I can do without seeing the actual file. Sorry.


 

NeoAtlas
Spain
English to Spanish
Exclude Oct 9

Sophie Borel wrote:
(h3)The product Benefits(/h3)
(ul)
(li)Complete product solutions(/li)

(li)Solutions that achieves a competitive edge(/li)
(/ul)

Are the tags (h3) (/h3) (ul) (/ul) (li) (/li) listed under Project Settings > File Types > [your XML] > Advanced > Embedded content?
If so, select each tag and choose Edit > Advanced > Exclude.
☛ The tags mentioned above can be excluded, but (b) (/b) should not be, so be careful if you change the settings.
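As a rough picture of why this matters, the Python sketch below treats h3, ul and li as tags that end a segment and leaves b inside the text. It is only an illustration of the general idea (block-level tags break segments, inline tags stay in them), not Studio's actual embedded-content processing. The function name, the STRUCTURAL tuple and the bold item in the sample are assumptions made up for the example, and the sample uses real angle brackets.

```python
import re

# Tags assumed to end a segment; <b> is deliberately left out (inline tag)
STRUCTURAL = ("h3", "ul", "li")

def segments(markup: str) -> list[str]:
    """Split markup at every opening or closing structural tag."""
    names = "|".join(STRUCTURAL)
    parts = re.split(rf"</?(?:{names})>", markup)
    # Drop the empty and whitespace-only pieces left between adjacent tags
    return [part.strip() for part in parts if part.strip()]

sample = (
    "<h3>The product Benefits</h3>\n"
    "<ul>\n"
    "<li>Complete product solutions</li>\n"
    "<li><b>User-friendly</b></li>\n"  # hypothetical bold item
    "</ul>"
)

print(segments(sample))
# ['The product Benefits', 'Complete product solutions', '<b>User-friendly</b>']
```

The heading and each list item come out as separate segments, while the bold tags stay with their text.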

[Edited at 2019-10-10 07:16 GMT]


 

Sophie Borel
France
English to French
TOPIC STARTER
Solution found! Oct 10

Thanks for your help!

I finally found out how to resolve this issue thanks to a link given by Paul Filkin from SDL Client Services:
multifarious.filkin.com/.../

Hope it will help others with the same issue!


 

