Is anyone else having problems with MemoQ's handling of html or xml?
Thread poster: Thomas T. Frost

Thomas T. Frost  Identity Verified
Member (2014)
French to Danish
+ ...
Aug 10, 2015

I used MemoQ 2015 to translate a page on my own website, and I have other pages to translate. However, after exporting the first translation, I noticed a number of problems with MemoQ's handling of html:

1. Code formatting destroyed

Html and xml code have two levels of formatting:
1. The formatting of the code made by the programmer so that he or she can keep an overview of the code. Line breaks and indentation are used for this, as well as html comments.
2. The formatting rendered to the end user, typically by a browser.

Html and xml are line-oriented formats. Editors typically only display the first 80 characters or so of a line, although some have 'wrap line' options. Hence, many programmers try to keep their line lengths to what can be seen on the screen, just as they do in any other programming language.

MemoQ does not respect any of the level 1 formatting (of the code). Here are some examples:

Example 1

Source:


Export:


Example 2

Source:


Export:


Example 3

Source:


Export:


The carefully organised formatting intended to make the code easy to maintain is destroyed by MemoQ.

Another problem appears in the translated text itself: MQ outputs a single line as long as necessary to hold all the text within a given set of tags. Such a line cannot be displayed in full on the programmer's screen, and he or she has to insert line breaks manually to be able to read the text.

Their default html filter has the following option checked:
"Break segment at preserved newline characters: Check this check box to make memoQ start a new segment whenever it encounters a newline character in the HTML text, so that the newline character will be preserved in all cases."

But that's not how MemoQ behaves.

I reported this to Kilgray support on 25 July. So far, they haven't even admitted that it's a bug.
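For illustration only, here is a minimal Python sketch of the behaviour I would expect: only the text between tags is replaced, while tags, comments, line breaks and indentation pass through untouched. The function name `translate_text_nodes` and the dictionary of translations are invented for the example; this is not how MemoQ or any other CAT tool works internally, and the simple regex ignores complications such as `>` inside attribute values or the contents of script and style blocks.

```python
import re

# Tags and comments; the capturing group makes re.split keep them in the result.
MARKUP = re.compile(r"(<!--.*?-->|<[^>]+>)", re.DOTALL)


def translate_text_nodes(source, translations):
    """Replace text between tags, keeping all markup and whitespace as it was."""
    out = []
    for index, part in enumerate(MARKUP.split(source)):
        core = part.strip()
        # Odd indices are tags/comments; whitespace-only text is layout.
        if index % 2 == 1 or not core:
            out.append(part)
            continue
        # Keep the original leading and trailing whitespace around the text.
        lead = part[: len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        out.append(lead + translations.get(core, core) + trail)
    return "".join(out)
```

Calling it with a source string and a mapping such as `{"Welcome": "Velkommen"}` returns the same code with only the text changed.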

2. Html symbols not preserved

In example 3 above, one can see that the html symbol &copy; has been replaced with ©. That is indeed how it should be rendered, but keeping such characters as symbols in the code can prevent poor rendering caused by software at the other end that does not respect all standards correctly. In any case, it is not the translator's or the CAT tool's job to decide to change the symbols. There is an option to export all characters that can be represented as symbols as symbols, but that outputs é as &eacute; etc., not just those characters that were symbols originally.

Kilgray has not admitted that this is not desirable either.
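To show why the information is lost, and one way it could be kept, here is a small Python sketch. Decoding with the standard `html.unescape` makes an entity and a literal character indistinguishable afterwards. The two helper functions, `protect_entities` and `restore_entities`, are hypothetical names of my own; they swap each entity for a numbered placeholder before the text is processed and put the original back afterwards. It is an assumption about how a tool could do it, not a description of any existing filter.

```python
import html
import re

# Named, decimal and hexadecimal character references.
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")
OPEN, CLOSE = "\u27e6", "\u27e7"  # placeholder brackets unlikely to occur in text


def protect_entities(text):
    """Swap each entity for a numbered placeholder; return text and the entities."""
    found = []

    def stash(match):
        found.append(match.group(0))
        return f"{OPEN}{len(found) - 1}{CLOSE}"

    return ENTITY.sub(stash, text), found


def restore_entities(text, found):
    """Put the original entities back in place of the placeholders."""
    pattern = re.escape(OPEN) + r"(\d+)" + re.escape(CLOSE)
    return re.sub(pattern, lambda m: found[int(m.group(1))], text)


# After decoding, nothing records which "é" was written as an entity.
flattened = html.unescape("caf&eacute; and café")
```

With the placeholders, only the characters that were entities in the source come back as entities; a literal é stays a literal é.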

3. Code page changed

I noticed that even though the source had a code page declaration, MemoQ changed it to UTF-8 in the export.

Their default filter does not have the following option checked:

"Use this codepage even if there is a different declaration in the file: Check this check box to enforce the import codepage selection. Use this when you suspect that the encoding declaration in the HTML file is incorrect or inconsistent. This check box is not checked by default."

So this looks like another bug, but they haven't admitted that yet.

UTF-8 may well be a better choice; it's just not the CAT tool's role to decide that.
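As a rough sketch of what I mean by respecting the declaration: read the charset from the meta tag, decode with it, and write the result back in the same encoding. The helper names `declared_charset` and `roundtrip` are made up for this example, the UTF-8 fallback is my own assumption, and a real filter would also have to consider byte order marks and HTTP headers.

```python
import re

# Matches both <meta charset="..."> and the older http-equiv form.
CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def declared_charset(raw, default="utf-8"):
    """Return the charset named in the file's meta tag, or a default."""
    match = CHARSET.search(raw[:4096])  # the declaration sits near the top
    return match.group(1).decode("ascii") if match else default


def roundtrip(raw, edit):
    """Decode with the declared charset, apply edit(), encode the same way."""
    charset = declared_charset(raw)
    text = raw.decode(charset)
    # Characters the code page cannot hold become numeric references.
    return edit(text).encode(charset, "xmlcharrefreplace")
```

Here `edit` stands for whatever changes the text, for instance the translation step; the file leaves in the code page it arrived in.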

4. Tag syntax changed

By default, they will change tags such as <br> to <br />. That is correct syntax for xml but not html.

However, unchecking the option

"Enforce empty tags: Check this check box to treat old-style tags as empty tags. Normally, these would be imported as opening tags, but with this setting memoQ will import them as XML-style empty tags in all cases – so you won't get rogue XML warnings when confirming segments in the document."

fixes this even though the explanation is totally cryptic and incomprehensible.
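For files that have already been exported with the xml-style tags, a clean-up along the following lines might help. It is only a sketch: the function name `to_html_void_tags` is my own, the list of elements is the usual set of html tags that have no closing tag, and the regex assumes that no attribute value contains `<` or `>`.

```python
import re

# html elements that never take a closing tag.
VOID = ("area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr")

# An xml-style self-closed void tag: name, attributes, optional space, "/>".
SELF_CLOSED = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID), re.IGNORECASE
)


def to_html_void_tags(markup):
    """Rewrite <br /> as <br>, <img ... /> as <img ...>, and so on."""
    return SELF_CLOSED.sub(r"<\1\2>", markup)
```

Anything that is not one of the listed tags is left exactly as it was.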

Am I the only one bothered by this, and does any other CAT tool do these things properly?

Fortunately, this happened on my own website, but I would have been very unhappy as a paying client to receive the mess MemoQ created in return, and I would require the translator to clean it up and re-establish the original formatting. Depending on the number of files, this clean-up task could take several hours. It took me 1-2 hours to clean up just one page on my own site.

What this means is that MemoQ is useless for html and xml, as I could not return such a shambles to a client.

Has anyone else had such problems?

How do other CAT tools handle html?

[Edited at 2015-08-10 17:34 GMT]


VIP9N
Local time: 18:08
Russian to English
+ ...
Just use another CAT Aug 10, 2015

Dear Thomas T. Frost,

A similar issue has already been discussed here (I guess) http://www.proz.com/forum/memoq_support/285709-processing_of_internal_vs_structural_tags.html

See, for example, I use Memo for Excel, as it works best with Excel files. But I had to use Deja for XML files. I have even sometimes used Trados Studio to import and segment (cut) horribly made (by some secretary or manager) PPTX files, then exported the result as a bilingual file, copied the left column into a separate file, translated that file, exported the translation, copied the translated column back into the right column of the original bilingual file, reimported the bilingual file and exported the finished translation. Yeah, sometimes it takes a lot of work.

So, once you have read the link above, try another CAT. Think of CATs as a plumber's set of tools. Don't limit yourself, and good luck.



Thomas T. Frost  Identity Verified
Member (2014)
French to Danish
+ ...
TOPIC STARTER
Which CAT tool can handle html properly? Aug 11, 2015

VIP9N, thanks for your suggestions.

Using another CAT tool is obviously a solution, except that given how much they cost, I cannot go around buying one after another because there is a bug in one I've already paid a small fortune for, so I do indeed have to limit myself in that aspect, unless you know about an unlimited source of funds for buying CAT tools.

Anyway, you suggested Déjà Vu (I suppose that's what you refer to with "Deja"), so I'll have a look at that, thanks.

I don't recognise the problem described in the other forum topic you refer to.


esperantisto  Identity Verified
Local time: 18:08
Member (2006)
English to Russian
+ ...
OmegaT Aug 11, 2015

No problems with OmegaT (however, you might need to tweak HTML filter settings if files are not in UTF-8).

BTW, what you describe can hardly be called a bug; here I agree with Kilgray's support. At least, the examples provided are valid HTML code.



Thomas T. Frost  Identity Verified
Member (2014)
French to Danish
+ ...
TOPIC STARTER
OmegaT / a bug Aug 11, 2015

Thanks for the OmegaT suggestion, and that is furthermore freeware. I'll definitely try that.

We don't agree about this not being a bug. The mess MemoQ returns is hopeless for the html programmer to deal with, and I could not return that to a client. As a client, I would refuse it myself and deduct the cost of putting the formatting back in order from the translator's invoice if the translator didn't deal with it him- or herself. A CAT tool's role is to translate. Nothing else. Not to change the formatting or code. I've worked 20 years as an IT specialist, and I know all about the importance of programming code being written in a way that is easy to get an overview of. It's as important as the final formatting of the web page or any other document. By the same logic, you could argue that MemoQ may remove all the header tags in a document, as the resulting text is still valid.

Besides, inserting a slash in br and img tags is not valid in html. It is tolerated but not valid.



Adrien Esparron
Local time: 16:08
Member (2007)
German to French
+ ...
Omega T / WF Anywhere Aug 11, 2015

Both are free and work without problems. I also tried Wordfast Classic with your Danish index.html, and the code formatting was respected.

Have fun!



Thomas T. Frost  Identity Verified
Member (2014)
French to Danish
+ ...
TOPIC STARTER
Thanks Aug 11, 2015

Many thanks, Adrien. I'm downloading OmegaT right now, and I'll have a look at Wordfast too. It doesn't harm to have alternatives.

I guess this is not the first time in history that freeware does the job better than expensive bloatware.

Another example just popping up is Microsoft's FrontPage successor called Expression Web 4. I bought it for website maintenance only to discover it's full of bugs that were never fixed, and contrary to FrontPage, the only language it can spellcheck is US English. Why would Americans in Seattle see any need to spellcheck any other language? Synchronising files on the server with files on my PC didn't work, and the process took minutes. I found a free FTP sync tool that did the same in seconds and which worked. Microsoft ended up dumping Expression Web, i.e. making it unsupported freeware, rather than fixing all the bugs for those poor sods that had paid for it.


esperantisto  Identity Verified
Local time: 18:08
Member (2006)
English to Russian
+ ...
OmegaT is free software Aug 11, 2015

Thomas T. Frost wrote:

Thanks for the OmegaT suggestion, and that is furthermore freeware.


OmegaT is not freeware, it is free software.



Thomas T. Frost  Identity Verified
Member (2014)
French to Danish
+ ...
TOPIC STARTER
Freeware Aug 11, 2015

esperantisto wrote:

OmegaT is not freeware, it is free software.


You are correct according to a purist definition, but according to https://en.wikipedia.org/wiki/Freeware , " "freeware" is a loosely defined category and it has no clear accepted definition, although FSF asks that free software (libre; unrestricted and with source code available) should not be called freeware".

In common language, freeware simply means software that is free. That's what both the Oxford and Merriam-Webster dictionaries tell you. That's what is essential in this case.

In any case, that is a side issue with no relevance for the problem at hand.


VIP9N
Local time: 18:08
Russian to English
+ ...
What about the right choice using trial versions? Aug 11, 2015

Thomas T. Frost wrote:
... except that given how much they cost, I cannot go around buying one after another because there is a bug ...


Well, I would say that before buying a tool, you have to try it first and determine whether it fits your purposes. As far as I know, each of these tools has a trial period, which should be enough to check and find out.

You see, in my opinion, DéjàVu was made by IT people for IT people from the very beginning. And I'm not surprised that it might be good for your task of translating your website. However, that applies to your particular case. In my priorities, MemoQ remains number one. Deja is in second place. I must admit that Deja has made good progress during the last year, and its capability for autosuggesting the right words and expressions is truly impressive, but there are still areas where it has to catch up with Memo.

...I don't recognise the problem described in the other forum topic you refer to...


Too bad. I am far from being a programmer, just a translator. Probably, my experience with XML files (which were not corrupted by Deja) is not what you were looking for. In this case - my apologies to you. And good luck.



Thomas T. Frost  Identity Verified
Member (2014)
French to Danish
+ ...
TOPIC STARTER
Trial versions Aug 12, 2015

About trial versions, it is practically impossible to test them for all potential problems in a month. Maybe you could if you had nothing else to do, but let's stick to real life.

It also takes quite some time to get to know each CAT tool well enough to be able to produce a result or even to be able to use their features to a moderate degree. I downloaded a Trados trial version a few days ago, but I haven't been able to figure out how to make it produce the target html file yet, and the user interface generally seems complicated and not particularly intuitive to me. Nothing is written to the target folder I specified when I export.

OmegaT appears to be less complicated, though.

Investing hours in getting to know a CAT tool's trial version only really makes sense if one might potentially buy it, but as I only just bought a MemoQ licence in June, my CAT tool budget is exhausted for the next year or two.

Déjà Vu may perhaps be a future candidate for replacing MemoQ.

If OmegaT, as suggested by Adrien, can do the job, then at least my immediate html problem is solved, although I don't appreciate that OmegaT keeps source and target text in one and the same column, but that's less of an inconvenience than having my html files mangled by MemoQ. I haven't found an option to get two-column layout as in MemoQ and Memsource.

MemoQ support claims it is "impossible" not to destroy the html layout. Interesting, since OmegaT appears to be able to do it. Html editors such as FrontPage and its successor Expression Web allow either wysiwyg editing or source editing, and one can switch between the modes as much as one likes without any html layout being destroyed. But it seems to be beyond the capabilities of MemoQ's developers, who seem to be in denial about the problems they cause.

The example you referred to is this:

{para styleclass="Normal"}
{text styleclass="Normal"}This{/text}
{text styleclass="Bold"}is{/text}
{text styleclass="Normal"}text{/text}
{/para}

What sort of scripting language is this? Xml? I only understand html and css, not xml. In any case, the problem described is different from my problems, but thanks for trying, at least.

