Translating an HTML-like file, leaving the original
Thread poster: xxxdrkoljan
xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
Oct 1, 2008

I have to translate a file full of HTML tags. The problem is, I have to leave the original text following the translation. Here's an example:

Code:
"chunk_1[english]"   "text
text
text
"
"chunk_2[english]" "text text text"

"chunk_3[english]" "text

text"




..has to become...


Code:
"chunk_1"   "translated text
translated text
translated text
"
"chunk_1[english]" "text
text
text
"
"chunk_2" "translated text translated text translated text"
"chunk_2[english]" "text text text"

"chunk_3" "translated text

translated text"

"chunk_3[english]" "text

text"



In other words, the total number of lines doubles: each translated chunk is followed by its English original, and the key of the translated chunk drops the '[english]' part.
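The doubling step described above can be sketched in Python. This is only an illustration that assumes the file looks exactly like the example (a quoted key ending in [english], some spaces or tabs, then a quoted value with no double quotes inside it). The name double_chunks is made up for this sketch. It writes an untranslated copy in front of each chunk, so the copy can then be translated in place.

```python
import re

# One chunk as in the example above: "key[english]", whitespace, "value".
# Assumption: the value may span several lines but contains no double quote.
CHUNK = re.compile(
    r'"(?P<key>[^"\[\]]+)\[english\]"(?P<gap>[ \t]+)"(?P<text>[^"]*)"'
)


def double_chunks(source: str) -> str:
    """Insert a copy of every chunk, keyed without [english], before it."""
    def repl(m):
        # The copy keeps the same spacing and text; only the key changes.
        copy = f'"{m["key"]}"{m["gap"]}"{m["text"]}"'
        return copy + "\n" + m.group(0)

    return CHUNK.sub(repl, source)
```

The copy still contains English text at this point; it is the part to overwrite with the translation, while the [english] chunk stays untouched.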

I'd greatly appreciate it if someone could advise me on the easiest way to accomplish this. Tag protection is preferable. If possible, I would also like to use the Microsoft Office 2007 spelling functionality (like SDLX does).

Thanks in advance.



Lori Cirefice  Identity Verified
France
Local time: 13:46
French to English
taking a stab at this Oct 1, 2008

If I have understood you right, try this ...

Translate in your preferred CAT tool (wordfast, trados) and just don't clean up the bilingual file.

Do a find and replace to remove all delimiters at the end, and you should have your source/target segments in the same file. You might have to apply styles as well, I suppose the source would remain hidden, even if you remove the delimiters??

Or maybe try some 2-column approach in excel??


xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
Thanks but... Oct 4, 2008

I have tried your suggestion with TagEditor. The bilingual ttx file contained both translation and original, but there were a few problems:

1) The code became such a mess that it would be practically impossible to clean it up with Find/Replace. I need the file to have the exact same formatting as the original.

2) The translation comes after the original; I need it the other way round.

3) TagEditor divides the text into sections by sentences (English sentence - translated sentence and so on), but I need it to be like: translated chunk (which might be several paragraphs) - English chunk and so on.

Any other suggestions?

I could just manually copy/paste each chunk BEFORE starting the translation, but there are about 770 of them, so that would take some time if I did it all at once. I don't mind doubling each chunk just before I translate it, though.

Unfortunately, neither TagEditor nor SDLX allows editing the original text. Pushing the original chunk into one of the translation sections would probably screw up the translation memory. As far as I am aware, translating with Microsoft Word 2007 is the only case where you can edit the original text. However, if I open the HTML file with Word, I just see plain text with all the tags unprotected, probably because the "html" tags are missing. If I add those, it starts working, but the code becomes a mess after editing and saving the file in Word.

When creating a project in Synergy, there is an option to convert the file into RTF and edit in Word. But no matter which option I choose, Synergy converts to TTX and opens TagEditor - could it be a Vista problem?

Any help will be much appreciated.

[Edited at 2008-10-04 11:48]



Antonín Otáhal
Local time: 13:46
Member (2005)
English to Czech
+ ...
Several steps Oct 4, 2008

requiring some programming

(1) Put numbered "marks" after each chunk (from what you say, this step is not possible to automate, or is it?) say:

chunk1
§§§001§§§
chunk2
§§§002§§§
etc.

(2) having done that, it should be possible to create a "string array" with (chunk 1, chunk2, ....)

(3) translate as usual, leaving the new marks untouched

(4) replace the 1st mark with first field of your array, the 2nd one with the second field, etc.
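Steps (2) and (4) above could look roughly like this in Python. It is a sketch only: it assumes the marks have the §§§NNN§§§ form shown above and sit on their own line after each chunk, and the function names are invented for the illustration.

```python
import re

# A numbered mark such as §§§001§§§; the digits are captured.
MARK = re.compile(r"§§§(\d+)§§§")


def collect_originals(marked: str) -> dict:
    """Step 2: remember the chunk that precedes each mark, by mark number."""
    originals = {}
    # With one capture group, split() yields [chunk, number, chunk, number, ..., tail].
    pieces = MARK.split(marked)
    for chunk, number in zip(pieces[0::2], pieces[1::2]):
        originals[number] = chunk.strip("\n")
    return originals


def restore_originals(translated: str, originals: dict) -> str:
    """Step 4: replace each mark with the original chunk saved under its number."""
    return MARK.sub(lambda m: originals[m.group(1)], translated)
```

The originals are collected before translation starts, so they are still available afterwards even though the translated file no longer contains them.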

HTH

Antonin

[Edited at 2008-10-04 13:45]



xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
A few questions... Oct 4, 2008

Thanks a lot for your suggestions, Antonin, it sounds like a workable solution. However, not everything is clear to me.

(1) Put numbered "marks" after each chunk (from what you say, this step is not possible to automate, or is it?) say:

chunk1
§§§001§§§
chunk2
§§§002§§§
etc.


I think it is possible to automate, but I don't know how. Each chunk starts with
Code:
"[english]

, maybe that can be used somehow? Maybe there is some Find/Replace tool that can gradually change the Replace With field?
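A replace whose replacement text changes with every hit can be done with a small script. The Python sketch below is an illustration under assumptions: it takes the chunk layout from the example in the first post (quoted key ending in [english], whitespace, quoted value without inner double quotes), and add_marks is a made-up name. Python's re.sub accepts a function as the replacement, and that function can count.

```python
import itertools
import re

# Assumed chunk layout, taken from the example in the first post.
CHUNK = re.compile(r'"[^"\[\]]+\[english\]"[ \t]+"[^"]*"')


def add_marks(source: str) -> str:
    """Append a numbered mark (001, 002, ...) on a new line after each chunk."""
    counter = itertools.count(1)
    return CHUNK.sub(
        lambda m: f"{m.group(0)}\n§§§{next(counter):03d}§§§",
        source,
    )
```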

(2) having done that, it should be possible to create a "string array" with (chunk 1, chunk2, ....)


How do you do that in, say, TagEditor?

(4) replace the 1st mark with first field of your array, the 2nd one with the second field, etc.


How? I mean, after I have finished translating, the original text will be gone, no?



Antonín Otáhal
Local time: 13:46
Member (2005)
English to Czech
+ ...
A few answers Oct 4, 2008



I think it is possible to automate, but I don't know how. Each chunk starts with
Code:
"[english]

, maybe that can be used somehow?



If you are saying that each chunk starts with

"english

and this does not occur elsewhere, then yes.


You cannot write a program for the procedure I briefly sketched within TagEditor. However, an HTML file is just a plain-text file, which you can process as I indicated using, say, Visual Basic.

Your problem seems a bit specific, so the code to be used will have to be specific - you will have to write it yourself or pay a programmer to do it for you.

Antonin


xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
Okaaay... Oct 4, 2008

Um.. any other suggestions?

All I really need is some way to automatically double all of the roughly 770 chunks, which all follow the same pattern. Or to be able to do that manually while translating. A program that can be configured to automatically place the source after the translation would be best, of course.

Maybe some way to translate the HTML in Word without messing up the code and with tag protection?

[Edited at 2008-10-05 09:57]



Samuel Murray  Identity Verified
Netherlands
Local time: 13:46
Member (2006)
English to Afrikaans
+ ...
My stab at this Oct 5, 2008

drkoljan wrote:
"chunk_1[english]" "text
text
text
"
"chunk_2[english]" "text text text"

"chunk_3[english]" "text

text"


I assume that every segment starts with:

\"chunk_[0-9]@\[english\]\"[ ]@\"

...and every segment ends with:

"^l

...and that if a line ends, but not on a ", then it is still part of the source text.

Then this is what I would try, in MS Word:

Find: ^p
Replace all: ^l

Find: (\"chunk_[0-9]@\[)(english)(\]\"[ ]@\")(*)(\"^l)
Replace: \1russian\3\4\5\1\2\3\4\5^l
(with wildcards enabled)

Find: ^l
Replace all: ^p

This should take care of the duplication (and you'll just have to delete the text [russian] afterwards if your client doesn't want it).
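For anyone who prefers a script to Word, the duplication step above translates fairly directly into a Python regular expression. This is a sketch, not a tested equivalent of the Word procedure: Word's @ becomes +, Word's * becomes a non-greedy .*?, and a plain newline stands in for ^l, so the two ^p/^l conversion passes are not needed. It rests on the same assumptions about how segments start and end, and unlike the Word replace string it does not add an extra line break after each pair. The function name is invented.

```python
import re

# Groups mirror the Word pattern: key start, language, key end + opening
# quote, segment text, closing quote + line break.
SEGMENT = re.compile(
    r'("chunk_[0-9]+\[)(english)(\]"[ ]+")(.*?)("\n)',
    re.DOTALL,  # let the segment text run across line breaks
)


def duplicate_with_label(text: str, label: str = "russian") -> str:
    """Put a copy of each segment, relabelled [russian], before the original."""
    def repl(m):
        head, lang, mid, body, tail = m.groups()
        return f"{head}{label}{mid}{body}{tail}{head}{lang}{mid}{body}{tail}"

    return SEGMENT.sub(repl, text)
```

As with the Word version, the [russian] label would still have to be deleted afterwards if the client does not want it.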

It doesn't tell the CAT tool not to try to translate the English text, though. If you can translate a segmented RTF with tw4winExternal text, then here's what you can do. Do the above, then select all the text, and change it to tw4winExternal. Then do this:

Find: ^p
Replace all: ^l

Find: (\"chunk_[0-9]@\[russian\]\"[ ]@\")(*)(\"^l)
Replace: \1\2\3
(with wildcards enabled)
(with replace style "Default paragraph style")

Find: \"chunk_[0-9]@\[russian\]\"[ ]@\"
Replace: (nothing)
(with wildcards enabled)
(with replace style "tw4winExternal")

Find: \"^l
Replace: (nothing)
(with wildcards enabled)
(with replace style "tw4winExternal")

Find: ^l
Replace all: ^p

I haven't tested this but it should work. If you don't understand what this means, send your file to me and I'll do the change. I'm not sure if SDLX understands tw4winExternal. If you have extra tags, you can manually protect them with tw4winInternal, I guess.

If possible, I would be happy if I could use the Microsoft Office 2007 spelling functionality (like SDLX does) as well.


If you were using Wordfast, then the target text would automatically be in the right language. If you need the above to be in the right language already, simply add the language to the item above where you change the style to "Default paragraph style" when you do find/replace.


xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
Regarding the suggestion... Oct 10, 2008

Unfortunately, the method suggested by Samuel doesn't work for my document: the example I provided in the first post was made without looking at the file, just to give a basic idea of what the thing looks like, and the actual code is a bit different. Fortunately, Samuel's post gave me the idea of using a text editor to double the chunks. Here's what I used with the "Alternative Find & Replace for Writer" extension in OpenOffice.org Writer:

Find what:
Code:
\t\t"\[english\][::BigBlock::]\n



Replace with:
Code:
\t\t"&\e\b&\e



Now I have an HTML file ready to be translated with any CAT tool.

TagEditor does not quite suit my needs, though: it messes up the code, fails to automatically place some tags, and doesn't work very well with the Word spellchecker.

I am looking for a way to translate the HTML file in Microsoft Word now. Any suggestions?

I've heard you can use Wordfast's +Tools to "tag" the HTML file and then translate as ordinary text - can I do that, and then translate with the Trados add-in? (I don't really like the design of Wordfast)

Thanks in advance.



Heinrich Pesch  Identity Verified
Finland
Local time: 14:46
Member (2003)
Finnish to German
+ ...
Duplicate the code Oct 11, 2008

I would rewrite the file so that each paragraph is doubled and then translate one of each. That means I would translate the file normally, clean it and then copy each translated paragraph in front of the original paragraph in the source file. There is hardly any chance of finding an automatic approach but that should be done manually.
Cheers
Heinrich


xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
Eh... Oct 11, 2008

Heinrich Pesch wrote:

I would rewrite the file so that each paragraph is doubled and then translate one of each. That means I would translate the file normally, clean it and then copy each translated paragraph in front of the original paragraph in the source file. There is hardly any chance of finding an automatic approach but that should be done manually.
Cheers
Heinrich


Thanks for the suggestion, but if you re-read my previous post, you can see I have already done this. What I am looking for now is a way to translate the "rewritten" file.



Antonín Otáhal
Local time: 13:46
Member (2005)
English to Czech
+ ...
Translating html in Word Oct 11, 2008

In principle, you say TagEditor is not good enough for translating your html file. Right? If this is the case, I am 99% sure (Word + Workbench) will not be any better either and you will have to look elsewhere. There are really quite a few CAT tools out there...

That 1% stands for an option I can imagine in pure theory, like writing some code to protect html tags in Word, but that would be more or less writing a new version of "Trados for Word" and hardly a practicable solution.

HTH

Antonin


xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
I didn't mean "pure" Word Oct 11, 2008

Antonín Otáhal wrote:

In principle, you say TagEditor is not good enough for translating your html file. Right? If this is the case, I am 99% sure (Word + Workbench) will not be any better either and you will have to look elsewhere. There are really quite a few CAT tools out there...

That 1% stands for an option I can imagine in pure theory, like writing some code to protect html tags in Word, but that would be more or less writing a new version of "Trados for Word" and hardly a practicable solution.

HTH

Antonin


Please, look here:

I've heard you can use Wordfast's +Tools to "tag" the HTML file and then translate as ordinary text - can I do that, and then translate with the Trados add-in? (I don't really like the design of Wordfast)


This is what I'm looking for. If this works, I will be able to translate like in TagEditor but in a more comfortable environment.



Samuel Murray  Identity Verified
Netherlands
Local time: 13:46
Member (2006)
English to Afrikaans
+ ...
Tagged RTF Oct 11, 2008

Nicholas Husbeek wrote:
I am looking for a way to translate the HTML file in Microsoft Word now. Any suggestions?


Well, you can run the PlusTools "tagger" and see if it works. It has an automatic function for standard HTML. You'll just have to remember to translate every second paragraph. You can also use MS Word's find/replace to replace the English text's style with a style called tw4winExternal. See the Wordfast manual for more info.

OmegaT can handle HTML files also, but for you I'd probably try paragraph segmentation instead of sentence segmentation because OmegaT translates non-unique segments with the same text. Or, translate a copy of the file in OmegaT. Then take the original and put some sort of text in all English segments (eg add ### to it). Replace the file in OmegaT with this new file, select "Create translated documents" and OmegaT will auto-translate the new file (but leave the English intact because they have no 100% matches in the TM). Then remove ### again.
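Adding and removing the ### text does not have to be done by hand. The Python sketch below is only an illustration with invented names. It assumes the doubled file uses the layout from the first post, that ### occurs nowhere else in the file, and that paragraph segmentation is used, so marking the start of every non-empty line inside an [english] chunk is enough to change each segment.

```python
import re

GUARD = "###"  # assumed not to occur anywhere else in the file

# An English chunk: opening part (key, gap, opening quote), body, closing quote.
ENGLISH = re.compile(r'("[^"\[\]]+\[english\]"[ \t]+")([^"]*)(")')


def add_guard(text: str) -> str:
    """Prefix each non-empty line of every [english] chunk body with GUARD."""
    def repl(m):
        opening, body, closing = m.groups()
        guarded = "\n".join(
            GUARD + line if line.strip() else line
            for line in body.split("\n")
        )
        return opening + guarded + closing

    return ENGLISH.sub(repl, text)


def remove_guard(text: str) -> str:
    """Strip every GUARD again once the translated file has been created."""
    return text.replace(GUARD, "")
```

add_guard would be run on the original before replacing the file in OmegaT, and remove_guard on the translated file that OmegaT creates.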


xxxdrkoljan
Latvia
Local time: 14:46
English to Russian
+ ...
TOPIC STARTER
Progress Oct 13, 2008

Well, I've found that you can tag the HTML file with +Tools and then translate with Translator's Workbench, which is what I want. However, after I had used +Tools once or twice, the tagging function started failing: it just hangs Word. The problem persisted even after reinstalling MS Office and +Tools.

Is there any alternative to +Tools to tag the HTML file? If there isn't, I'll just reinstall my OS.

Also, if someone knows a better way to translate a tagged HTML file with Word, or anything close to that, please share. All I need is tag protection, Word spelling+grammar checker, and translation memory.

Thanks in advance.

[Edited at 2008-10-13 19:23]



