
Is it normal that exported TMs also contain HTML commands?
Thread poster: Darius Porebski
Darius Porebski
Germany
Local time: 07:03
English to German
+ ...
Jul 19, 2006

Hello everybody,

After exporting a TM (using TW 6.5.5.438), I opened it in an ordinary text editor (WordPad). To my surprise, many of the entries contain HTML commands, for example (sample sentence with HTML):

"Verdrehen Sie die Nocken so\-{\expndtw-2 weit, bis die 90° Kantungen an }{\expndtw-1 den eingezeichneten Linien an\-}{\expndtw-5 liegen.}"

I know that the source file was a Word document (.doc) that was not formatted well (many frames, hyphenation sometimes done with a manual "-", etc.).

So would it be possible that Trados adopts all the "bad formatting" and puts it into its TM?

Is there a way to "clean" the database so that it contains only the segments? (I tried to clean it manually, but that is a waste of time; there are too many.)

Besides that, there is a related problem. As mentioned, I exported the TM as TMX and then tried to import the TMX into Across crossTank. crossTank immediately treated many of the HTML commands (such as "}{\expndtw-1") as something like "stop commands". For the sample sentence above, crossTank would therefore import only part of the segment: "Verdrehen Sie die Nocken so weit, bis die 90° Kantungen an" (the first HTML command in the sentence did not make crossTank stop).

Does anyone have a hint?

Thanks in advance for your help,
Darius

P.S. The HTML commands are invisible when running a concordance search in TW; the segment appears as a whole sentence.

[Edited at 2006-07-19 17:27]


xxxOlaf
Local time: 07:03
English to German
Trados always exports tags Jul 20, 2006

Darius Porebski wrote:
So would it be possible that Trados adopts all the "bad formatting" and puts it into its TM?


Yep, Trados exports everything that's not nailed down. There's no option to remove tags or font information. IMO, their version of TMX is not fully TMX compliant. They should remove all Trados-specific formatting to ensure that other CAT tools can read TMX files generated by Trados without problems.


Is there a way to "clean" the database so that it contains only the segments? (I tried to clean it manually, but that is a waste of time; there are too many.)


There are some third-party tools created by translators and agencies, but none of them works 100% reliably. You might try a tool such as Funduc's Search & Replace, or any other replace program that supports regular expressions (wildcard characters), and search and replace all tags in the exported file.
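As a rough illustration of the regular-expression approach, here is a minimal Python sketch that strips the RTF-style codes from the sample segment quoted above. The function name `strip_rtf_codes` and the three patterns are assumptions made for this example, not part of Trados or any existing tool. The sketch follows the RTF convention that a single space after a control word is only a delimiter, so "so\-{\expndtw-2 weit" is joined to "soweit"; adjust the pattern if the export behaves differently. It does not handle escaped braces, `\'hh` character escapes, or Trados's own TMX tag elements, so test it on a copy of the export first.

```python
import re

# Optional hyphen control symbol (\-): invisible unless a line breaks there.
OPTIONAL_HYPHEN = re.compile(r"\\-")
# Control word: backslash, letters, optional signed number, then at most
# one space, which RTF treats as a delimiter rather than as text.
CONTROL_WORD = re.compile(r"\\[A-Za-z]+(?:-?\d+)? ?")
# Group braces that wrap formatting runs.
GROUP_BRACES = re.compile(r"[{}]")


def strip_rtf_codes(segment: str) -> str:
    """Return the segment text with RTF-style codes removed (hypothetical helper)."""
    text = OPTIONAL_HYPHEN.sub("", segment)
    text = CONTROL_WORD.sub("", text)
    text = GROUP_BRACES.sub("", text)
    return text


sample = (
    r"Verdrehen Sie die Nocken so\-{\expndtw-2 weit, bis die 90° Kantungen an }"
    r"{\expndtw-1 den eingezeichneten Linien an\-}{\expndtw-5 liegen.}"
)
print(strip_rtf_codes(sample))
# Verdrehen Sie die Nocken soweit, bis die 90° Kantungen an den eingezeichneten Linien anliegen.
```

The same three patterns can be used in any search-and-replace tool that supports regular expressions, replacing each match with nothing.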

Olaf


Vito Smolej
Germany
Local time: 07:03
Member (2004)
English to Slovenian
+ ...
Yes, it is normal... Jul 20, 2006

and I would dare to say it's not HTML; it's more like RTF formatting gizmos.

Why would you want to clean them? They contain the formatting information. If you change a segment in the TM, you will hardly be able to pretranslate the original source, if at all.

In some cases special characters (quotes, hash signs) are transliterated as well. In any case, all this is done in such a way that the content of the segments (both ways) is represented correctly.

As for porting to other applications, I would go back to the original file, throw out all italics and bold, change all the formats to standard, and retranslate. Then hopefully most of the strange HTML-ish entries will be gone.



[Edited at 2006-07-20 15:12]


Darius Porebski
Germany
Local time: 07:03
English to German
+ ...
TOPIC STARTER
I don't really see the reason for storing formatting etc. in TMs Jul 21, 2006

Thanks for your answers!

I really think that the formatting commands contained in the TM are a problem for other CAT tools (as mentioned, I tried to import the Trados TM into Across, which split the segments at the commands).

From my point of view, Trados shouldn't store these formatting commands. They are completely useless. It should analyze a segment in a document (regardless of whether it is bold, italic, etc.), store the segment itself (only the content, nothing more), translate it (or at least offer a concordance hint), and finally put the translated, properly formatted sentence into the target text box in the document.

Let's assume one gets two docs, one from client A and one from client B. Both docs contain sentence X (or segment X), but client A's sentence is bold and client B's is italic. Sentence X is also in the TM (but underlined). When analyzing the docs, Trados should, and surely will, find a 100% match in both docs regardless of the formatting of the source sentence.

This is why I think all formatting in the TM is completely useless. What counts is the sentence itself, nothing more. Of course Trados should handle the formatting through Word etc., but a stored segment should be free of formatting; only its content is important.

Or am I wrong?

Best regards,
Darius

