Translate Joomla! export (import jDiction XLIFF file)
Thread poster: Languageman

Languageman  Identity Verified
United Kingdom
German to English
Jun 2, 2016

Hi,

I've recently invested in a new website built on Joomla! because it has good support for multilingual sites. The plan is to use the jDiction plugin to export to XLIFF in order to translate the site with an existing TM.

Unfortunately, when jDiction creates source segments it does not segment on punctuation; instead it puts the entire content of each page into a single segment. The file imports fine, but because each segment contains so much text, most of them get no TM matches (only the headings work as you'd hope). An example of how this looks when imported into memoQ is shown at the bottom of this post.

Can anyone advise how I can get memoQ to show me segments at sentence level so I can use my TMs? Or recommend an alternative way of extracting the Joomla! pages for translation?

Thanks and kind regards,

Stephen

Import_j_Diction_XLIFF.png


 

Stanislav Okhvat
English to Russian
Use XLF filter options Jun 2, 2016

Hi Stephen,

You should use the Import with Options command instead and activate the cryptic "Segment text if no <seg-source> is present for a 'trans-unit'" option. Also add the Regex tagger as a cascading filter with the expression <[^>]+> in order to turn HTML tags into memoQ tags (or do this after import, using Regex tagger on the imported document).
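For anyone unsure what the expression <[^>]+> actually catches, here is a minimal Python sketch. It only illustrates the pattern itself, not how memoQ's Regex tagger works internally; the sample text and the helper name split_tags are made up for the illustration.

```python
import re

# Same expression as suggested for the Regex tagger:
# a "<", then one or more characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]+>")

def split_tags(text):
    """Return (tags, plain_text) for a segment that contains HTML markup."""
    tags = TAG_PATTERN.findall(text)      # every piece that would become a tag
    plain = TAG_PATTERN.sub("", text)     # what is left as translatable text
    return tags, plain

# Invented sample resembling a page exported as one big segment.
sample = "<h4>Our products</h4><p>We make <b>valves</b>. Contact us.</p>"
tags, plain = split_tags(sample)
print(tags)   # ['<h4>', '</h4>', '<p>', '<b>', '</b>', '</p>']
print(plain)  # Our productsWe make valves. Contact us.
```

Note that the pattern treats every tag the same way, so structural tags such as <p> and inline tags such as <b> all end up as tags inside the segment rather than as segment boundaries.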

Best regards,
Stanislav


 

Languageman  Identity Verified
United Kingdom
German to English
TOPIC STARTER
Part way there Jun 2, 2016

Hi Stanislav,

Thanks for the suggestions. I tried the two points in several combinations (the "cryptic" option only, Regex tagger only, and the "cryptic" option plus Regex tagger).

All resulted in warnings along the lines of: "Segmentation in source and target content may have resulted in a different number of segments in the following trans-unit elements:"

Some of the target segments were split better, but others were not, and the changes to source and target were not the same (see image at end of post).

I can think of a couple of things that might be contributing to the continuing problems:

1/ The XLIFF contains identical text in source and target for some reason, not blank targets.
2/ I haven't set up the Regex Tagger correctly (first time using this) - I copied the settings in the image at the bottom

Thanks for any further suggestions you can offer.

Best wishes, Stephen

2016_06_02_20_54_10_memo_Q_Joomla_Chinese_Omflo.jpg
2016_06_02_21_37_05_memo_Q_Joomla_Chinese_Omflo.png


 

Stanislav Okhvat
English to Russian
Re: Part way there Jun 3, 2016

Hello again, Stephen,

The settings for the cascading filter look correct.

In my opinion, the problem lies in two areas:
1) The XLIFF file is set up in a specific way which causes memoQ to display these warnings.
2) The segments contain structural HTML elements such as Level 4 headings (h4), paragraphs (p), etc., all in the same segments. If you used the HTML cascading filter, memoQ would split the segments by structural boundaries. However, there is no way to use the HTML cascading filter after the XLIFF filter (only Regex tagger is supported as the cascading filter for the XLIFF filter), so we cannot split the text by boundaries of HTML structural elements. The only way to ensure proper segmentation is for you to split the segments manually in memoQ.
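To make "splitting by structural boundaries" concrete, here is a rough Python sketch of the idea applied to a single page-sized segment. It is not a memoQ feature and not something jDiction provides; the tag list, the function name split_on_blocks and the sample text are assumptions chosen for the illustration, and real page content may need a proper HTML parser.

```python
import re

# Assumed set of block-level closing tags to treat as boundaries.
BLOCK_CLOSE = re.compile(r"(</(?:h[1-6]|p|li|div)\s*>)", re.IGNORECASE)

def split_on_blocks(html):
    """Split markup into chunks, each ending at a block-level closing tag."""
    # With one capture group, split() alternates content and closing tag,
    # and always ends with the (possibly empty) trailing content.
    parts = BLOCK_CLOSE.split(html)
    chunks = []
    for i in range(0, len(parts) - 1, 2):
        chunk = (parts[i] + parts[i + 1]).strip()
        if chunk:
            chunks.append(chunk)
    tail = parts[-1].strip()
    if tail:
        chunks.append(tail)
    return chunks

# Invented sample: a heading and two paragraphs in one segment.
sample = "<h4>Our products</h4><p>We make valves.</p><p>Contact us.</p>"
for chunk in split_on_blocks(sample):
    print(chunk)
```

Each chunk corresponds roughly to a place where you would otherwise split the segment by hand in memoQ.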

You can send me the document privately at stasokhvat AT gmail DOT com so I can have a better look.

Also, it may be possible to find a toolkit for XLIFF file processing that will clear the translation (target) from your files. Ideally, this is how the records in your XLIFF file should look (note that the target is empty):

<trans-unit id='p137'>
<source>RIL – Mega PP </source>
<target></target>
</trans-unit>
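If no ready-made toolkit turns up, emptying the targets can be sketched with Python's standard library. This is only an illustration under assumptions: the sample file below is invented, it assumes the XLIFF 1.2 namespace, and the function name clear_targets is hypothetical. Work on a copy of the export, and check that jDiction still accepts the result when you import the translation back.

```python
import xml.etree.ElementTree as ET

# Assumption: the export uses the XLIFF 1.2 namespace. Registering it keeps
# the output free of generated "ns0:" prefixes.
ET.register_namespace("", "urn:oasis:names:tc:xliff:document:1.2")

def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def clear_targets(xliff_text):
    """Empty every <target> element; return (new_xml, number_cleared)."""
    root = ET.fromstring(xliff_text)
    # Collect first, then modify, so the tree is not changed while iterating.
    targets = [e for e in root.iter() if local_name(e.tag) == "target"]
    for target in targets:
        for child in list(target):
            target.remove(child)   # drop inline elements inside the target
        target.text = None         # drop the copied source text
    return ET.tostring(root, encoding="unicode"), len(targets)

# Invented sample with the source text duplicated in the target.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example" source-language="en" target-language="zh" datatype="html">
    <body>
      <trans-unit id="p137">
        <source>RIL – Mega PP </source>
        <target>RIL – Mega PP </target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

cleaned, count = clear_targets(SAMPLE)
print(count)
print(cleaned)
```

The serializer may write the emptied element in its short form rather than as an explicit open and close pair; both are equivalent XML.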

Best regards,
Stanislav Okhvat
TransTools – Useful tools for every translator


 

