https://www.proz.com/forum/sdl_trados_support/90040-strange_html_file.html

Strange html-file?
Thread poster: Heinrich Pesch
Heinrich Pesch  Identity Verified
Finland
Local time: 10:39
Member (2003)
Finnish to German
+ ...
Nov 22, 2007

Yesterday a customer sent me six pages as PDF. I had done a large job for them last summer, when they had delivered the tables in Excel and the texts as RTF. This time the text and the tables came together in one file.
I converted the files into a doc file and counted 2100 words (Word statistics). But I was not sure whether the customer would accept the doc format, so I asked if he could send the DTP file directly. It turned out to be PageMaker. Because I cannot handle PageMaker files (Trados and SDLX cannot import them directly, I believe), I asked for an HTML export.
When I analysed this HTML file in Workbench, I got a result of about 10 000 words in total, 61 % repetitions and 4900 new words.

I always believed the Trados word count would be lower than Word's, because WB does not count numbers, so this result astonished me. I knew I could not trust it, because 10 000 words on 6 pages is far too many.

So I finally created a project in SDLX from my doc file, manually confirmed the segments that contain only numbers, and got down to 1500 words.
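
The drop from 2100 to 1500 words once number-only segments are set aside can be sanity-checked outside any CAT tool. The following is only a rough sketch, not how Word, Trados or SDLX actually count: it assumes a "word" is any whitespace-separated token and that a number-only token is made of digits plus common numeric punctuation. The function name `count_words` and the sample string are made up for illustration.

```python
import re

# Assumption: a "number-only" token starts with an optional sign and a digit,
# followed only by digits and common numeric punctuation (e.g. 12.5, 1,450, 3/4).
NUMERIC = re.compile(r"^[+-]?\d[\d.,:/%-]*$")

def count_words(text, skip_numbers=False):
    """Count whitespace-separated tokens, optionally ignoring number-only tokens."""
    tokens = text.split()
    if skip_numbers:
        tokens = [t for t in tokens if not NUMERIC.match(t)]
    return len(tokens)

# Hypothetical table row, loosely modelled on a pump data sheet.
sample = "Pump head 12.5 m at 1,450 rpm"
print(count_words(sample))                      # all tokens
print(count_words(sample, skip_numbers=True))   # number-only tokens skipped
```

Running this over the text extracted from the doc file gives two figures to compare, which shows how much of the difference comes from the tables alone.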

When I look at the HTML file in TE, the segmentation is very strange. The same happens in SDLX if I use the HTML file, and the statistics report more than 10 000 untranslated words.

I always thought translating HTML was child's play. What could be wrong?

(SDL Trados 2006)

Heinrich

[Edited at 2007-11-22 16:34]


 
Margreet Logmans (X)  Identity Verified
Netherlands
Local time: 09:39
English to Dutch
+ ...
Tags not recognised? Nov 22, 2007

Hi Heinrich,

All I can think of is that, probably because of all these conversions, the tags and formatting are not recognised correctly.

I also found this article, perhaps it is of some use to you:
http://ell.proz.com/translation-articles/articles/22/1/How-to-collect-stories-with-PageMaker-for-Mac-(and-PC-too)-without-Trados-Story-Collector

Like you, I always thought HTML is child's play - let's not get worried yet.

Good luck!

Margreet


 
Vito Smolej
Germany
Local time: 09:39
Member (2004)
English to Slovenian
+ ...
SITE LOCALIZER
"When I analysed this html-file in Workbench" Nov 23, 2007

It would be safer to import it into TagEditor and then look at the ttx. I would guess the HTML codes got counted as well, and the real stuff is 4900 words (minus the HTML codes of course).
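
One way to test the guess that markup is inflating the count is to compare a naive count over the raw file with a count over the visible text only. The sketch below uses Python's standard `html.parser`; it is not what Workbench or TagEditor do internally, and the class name `TextExtractor`, both helper functions and the sample cell are invented for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, ignoring tags and the contents of <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def visible_word_count(html):
    """Count whitespace-separated words in the text between tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Simplification: text chunks are joined with a space, so a word that is
    # split by an inline tag would be counted as two words.
    return len(" ".join(parser.chunks).split())

def raw_word_count(html):
    """Naive count over the whole file, markup included."""
    return len(html.split())

# Hypothetical table header cell with heavy markup, as a DTP export might produce.
sample = ('<td class="hdr" style="font-size: 9pt"><span lang="en">Pump</span> '
          '<span lang="en">head</span></td>')
print(raw_word_count(sample), visible_word_count(sample))
```

If the two figures for the real file are far apart, the markup is a likely suspect; if they are close, the inflated count probably comes from somewhere else, such as the segmentation.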

Regards

Vito


 
Heinrich Pesch  Identity Verified
Finland
Local time: 10:39
Member (2003)
Finnish to German
+ ...
TOPIC STARTER
Looks terrible Nov 23, 2007

Vito Smolej wrote:

It would be safer to import it into TagEditor and then look at the ttx. I would guess the HTML codes got counted as well, and the real stuff is 4900 words (minus the HTML codes of course).

Regards

Vito


I did look at it in TE, and it looks terrible. The segmentation is all wrong. What I do not understand is why there are segments consisting only of numbers (from the tables) and why the table headers are split. A table header could be "pump head", and Trados makes two segments of it: "pump" and "head". Abbyy FineReader at least did a better job on the PDF.

The customer will send me another format today, let's see what he comes up with.


 

