HTML-codes in XLS files - your workarounds please?
Thread poster: Wolfgang Jörissen

Wolfgang Jörissen  Identity Verified
Belize
Dutch to German
May 27, 2009

That is, if there are any. It is one of those large files containing a whole database, with long text fields full of the usual HTML tags and the like. Any ideas?



Wolfgang Jörissen  Identity Verified
Belize
Dutch to German
TOPIC STARTER
CSV May 27, 2009

I think I've made some progress in the meantime. I saved the whole thing as CSV and used the HTML filter to import it. The export looks OK at first sight. The only thing I still have to do is adjust the segmentation rules and mark the semicolon as a segment end.
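A note on why the semicolon works as a segment end here: the CSV export presumably uses a semicolon as the field separator (Excel takes it from the regional list-separator setting, which is a semicolon on German systems), so every semicolon outside quotes is a cell boundary. As a minimal Python sketch, assuming a semicolon-delimited export with standard double-quote quoting, the `csv` module splits the rows into one string per cell:

```python
import csv
import io


def cells_as_segments(csv_text: str, delimiter: str = ";") -> list[str]:
    """Split a delimiter-separated export into one string per cell.

    Quoted cells may contain the delimiter or line breaks; the csv
    module keeps them intact, unlike a plain split on ';'.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    segments = []
    for row in reader:
        segments.extend(row)
    return segments


sample = 'ID;Text\n1;"<p>Hallo; Welt</p>"\n'
print(cells_as_segments(sample))  # ['ID', 'Text', '1', '<p>Hallo; Welt</p>']
```

In the sample, the semicolon inside the quoted cell stays part of the text. A segmentation rule that breaks at every semicolon would split that cell too, so it is worth checking whether the texts themselves contain semicolons.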

If you have other suggestions, they are still welcome.

[Edited at 2009-05-27 14:18 GMT]



Endre Both  Identity Verified
Germany
Local time: 19:32
Member (2002)
English to German
Mark HTML codes red so they import as codes May 27, 2009

Wolfgang Jörissen wrote:
Saved the whole thing to csv and used the html filter to import this. The export looks ok at first sight.


Wow, great idea (as long as the file is simple enough so that no loss of information happens during the XLS-CSV-XLS roundtrip).

Some time ago I resorted to painting the HTML codes red (using a VBA macro) so that they would show up as numbered codes in DV, but your solution is much neater.
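The macro itself is not posted in this thread, so the following is not it. As a hedged sketch of the same idea in Python: the function finds the tags in a cell's text and returns 1-based (start, length) pairs, which is the form Excel VBA's `Range.Characters(Start, Length)` takes, so a macro could then colour each span with `.Font.Color = RGB(255, 0, 0)`. The regex is a simplifying assumption: anything between angle brackets counts as a tag.

```python
import re

# Simplifying assumption: a "tag" is any run of non-bracket characters
# enclosed in angle brackets, e.g. <p>, </b>, <a href="x">.
TAG_RE = re.compile(r"<[^<>]+>")


def tag_spans(cell_text: str) -> list[tuple[int, int]]:
    """Return (start, length) for each tag, with 1-based starts
    as used by VBA's Range.Characters(Start, Length)."""
    return [(m.start() + 1, m.end() - m.start()) for m in TAG_RE.finditer(cell_text)]


print(tag_spans("Hello <b>world</b>!"))  # [(7, 3), (15, 4)]
```

Each pair maps directly onto one `Characters(start, length)` call, so the macro only needs a loop over the returned spans per cell.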

Endre



Wolfgang Jörissen  Identity Verified
Belize
Dutch to German
TOPIC STARTER
Seems to work, but... May 27, 2009

Endre Both wrote:

Wow, great idea (as long as the file is simple enough so that no loss of information happens during the XLS-CSV-XLS roundtrip).


Well, not so great after all, but it seems I got away with it. The XLS-CSV-XLS round trip as such went fine (actually I ended up with a tab-separated TXT file), but after I imported and exported it through the HTML filter, it got messed up: the HTML filter does not recognize the tabs and line breaks. Opening the TXT file in Word and replacing those with placeholders did the job.
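The Word replacement step can also be scripted. A minimal Python sketch, assuming the placeholder strings `{CRLF}`, `{LF}` and `{TAB}` (hypothetical choices; they only have to be absent from the real text and survive the HTML filter as plain text). It keeps Windows row ends (CR LF) apart from bare LF line feeds, since Excel typically writes in-cell line breaks as a bare LF, so the restore step puts each one back as it was:

```python
# Hypothetical placeholders; "\r\n" must be handled before "\n",
# which the dict's insertion order guarantees.
PLACEHOLDERS = {
    "\r\n": "{CRLF}",  # row ends in a Windows text export
    "\n": "{LF}",      # bare line feeds, e.g. line breaks inside a cell
    "\t": "{TAB}",     # cell separators
}


def protect(text: str) -> str:
    """Replace tabs and line breaks with visible placeholders."""
    for token in PLACEHOLDERS.values():
        if token in text:
            raise ValueError(f"placeholder {token!r} already occurs in the text")
    for raw, token in PLACEHOLDERS.items():
        text = text.replace(raw, token)
    return text


def restore(text: str) -> str:
    """Undo protect() on the exported file."""
    for raw, token in PLACEHOLDERS.items():
        text = text.replace(token, raw)
    return text


sample = "1\t<p>Zeile 1\nZeile 2</p>\r\n"
print(protect(sample))  # 1{TAB}<p>Zeile 1{LF}Zeile 2</p>{CRLF}
```

When reading and writing the file, open it with `newline=""` so Python does not silently convert CR LF to LF; the encoding depends on which text format Excel was asked to save, so that part is left as an assumption.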

But I'd be interested in your VBA macro anyway. :-)



Jaroslaw Michalak  Identity Verified
Poland
Local time: 19:32
Member (2004)
English to Polish
Not sure if DVX supports it... May 27, 2009

Have you tried saving it in Excel as html? Maybe that would fool the filter enough to interpret the code in cells as HTML...


RieM  Identity Verified
United States
Local time: 13:32
English to Japanese
a totally wacky way.... May 27, 2009

Instead of trying to preserve the codes as tags, I send the individual HTML codes to a TM and insert them as I go (selecting them in the AutoSearch portion window and pressing Ctrl+R), or just let AutoSearch find them.

I know this sounds very wacky and might not work for your particular project, but it works for me: I use these codes over and over in one of my projects, which is neither HTML nor Excel but contains some HTML-like codes, and the word order differs so much between the source and target languages anyway. I have total control over where the codes go and can insert them easily.

Interested? Here is how to create such a TM, though you can choose whatever method you like:

Import the file as is:

Run an SQL filter: click the field where "All Rows" is displayed, scroll to "SQL Statement", and type:

Source LIKE "*<*"
(Sorry, the BBCode doesn't work here for some reason; I will post the SQL statement later.)


This leaves only the segments that contain HTML tags. Déjà Vu doesn't lock the source text, so you are free to edit it and weed out unwanted text. The goal is to create a simple list in which each source segment contains just one code item, such as:

(I can't seem to escape some of the symbols... argh)

When done, pseudotranslate, select all rows, and send them to a TM!

Of course, the original project file is no longer useful. Just discard it and start anew.

Good luck!
Rie

PS: Some time ago I ran into a problem where Déjà Vu didn't import "phony" Excel files and truncated some text when a cell contained too many words or too many hard returns. So be careful!


[Edited at 2009-05-27 17:34 GMT]

[Edited at 2009-05-27 17:42 GMT]

[Edited at 2009-05-27 17:54 GMT]



RieM  Identity Verified
United States
Local time: 13:32
English to Japanese
follow-up May 27, 2009

I'm so sorry to flood your inbox with my poor attempts. Who said BBCode is simpler than HTML?!

The SQL command part (in the case above, "LIKE") seems case-sensitive.

Here is THE correct statement:

Source LIKE '*<*'

The text to search is surrounded by two single quotes. * is a wildcard.
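To see what this filter keeps: with * as the wildcard, '*<*' matches any source text that contains a "<" anywhere. The Python sketch below reproduces that selection on a hypothetical list of segments using `fnmatch`, which uses the same * wildcard; it only illustrates the pattern's meaning and is not how Déjà Vu evaluates SQL internally. `fnmatchcase` is used so the result does not depend on the operating system's case handling.

```python
from fnmatch import fnmatchcase


def like_filter(segments: list[str], pattern: str = "*<*") -> list[str]:
    """Keep the segments whose whole text matches a *-wildcard pattern,
    mirroring what Source LIKE '*<*' selects."""
    return [s for s in segments if fnmatchcase(s, pattern)]


# Hypothetical segments for illustration.
segments = ["Plain text", "<p>Intro</p>", "Line one<br>Line two", "2 > 1"]
print(like_filter(segments))  # ['<p>Intro</p>', 'Line one<br>Line two']
```

Note that "2 > 1" is dropped: the pattern only looks for the opening bracket, which is enough to catch every tag.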

...Then, after filtering and cleaning, the source contains:

<some codes>

<some codes>

<some codes>
....
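Outside Déjà Vu, a list like this (one code per segment, pseudotranslated) could also be generated as a TMX file, which TM tools generally import. A minimal Python sketch, assuming TMX 1.4, the same simplified tag regex as above, and hypothetical language codes en-US/ja-JP and tool name: it collects each distinct tag once and writes a translation unit whose target is identical to its source, which is what the pseudotranslation step produces. Whether a particular tool accepts this file unchanged is untested.

```python
import re
import xml.etree.ElementTree as ET

TAG_RE = re.compile(r"<[^<>]+>")  # simplifying assumption about what a tag is


def unique_tags(segments: list[str]) -> list[str]:
    """Collect each distinct tag once, in order of first appearance."""
    seen: dict[str, None] = {}
    for seg in segments:
        for tag in TAG_RE.findall(seg):
            seen.setdefault(tag, None)
    return list(seen)


def tags_to_tmx(tags: list[str], src: str = "en-US", tgt: str = "ja-JP") -> str:
    """Build a TMX 1.4 document in which every target equals its source."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tag-list-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for tag in tags:
        tu = ET.SubElement(body, "tu")
        for lang in (src, tgt):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            # Pseudotranslation: the target segment is the source tag itself.
            ET.SubElement(tuv, "seg").text = tag
    return ET.tostring(tmx, encoding="unicode")


segs = ["<p>Intro</p>", "Line one<br>Line two", "<p>Again</p>"]
print(tags_to_tmx(unique_tags(segs)))
```

ElementTree escapes the angle brackets inside each `seg` automatically, so the tags are stored as text rather than as markup.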



Wolfgang Jörissen  Identity Verified
Belize
Dutch to German
TOPIC STARTER
No such option May 27, 2009

Jabberwock wrote:

Have you tried saving it in Excel as html? Maybe that would fool the filter enough to interpret the code in cells as HTML...


I thought about that as well, but when I checked the options under "Save As...", I found that my Excel version (2002) does not allow that.

But... saving to text and replacing tabs and line breaks with placeholders actually seems to work; it only needs some fine-tuning.

@Rie: Thanks for your suggestion. I tried something similar some time ago, but DVX gave me fuzzy matches that were more annoying than helpful. So you really look them up via SQL, pseudotranslate them, and it works? Interesting approach!
In this case, though, it will probably not be an option, because not all translators on this particular project will use DVX, so errors are bound to happen if we do not get rid of the tags.



RieM  Identity Verified
United States
Local time: 13:32
English to Japanese
Then.... May 27, 2009

Maybe... This just popped into my head, not tested...

Save the Excel file as tab-delimited text, open it in a text editor, and

replace each TAB with two custom tags, starting with a closing one

</Wolfgang><Wolfgang>

Add to the beginning of the file
<Wolfgang>

Add to the end of the file
</Wolfgang>

These custom tags serve as the external tags.

Then create a custom XML filter in DV using this file; it's not difficult. If you have Trados, you can use that as well. Indeed, Trados might work better for this purpose.
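The text-editor steps of this untested idea can be sketched in Python. Two assumptions beyond the recipe, flagged in the code: the recipe as written yields several top-level elements, and a well-formed XML document needs a single root, so the sketch adds a hypothetical `<doc>` wrapper; and the HTML inside the cells must itself be well-formed XML (for example `<br/>` rather than `<br>`, and `&` escaped), otherwise any XML filter will reject the file.

```python
WRAP = "Wolfgang"  # the custom element name from the recipe
ROOT = "doc"       # hypothetical root element, added for well-formedness


def tsv_to_xml(tsv_text: str, tag: str = WRAP, root: str = ROOT) -> str:
    """Wrap every tab-separated field in <tag>...</tag>.

    As in the recipe, each tab becomes </tag><tag>, and the file is
    opened with <tag> and closed with </tag>. Row line breaks are left
    inside the elements, exactly as the recipe does.
    """
    body = tsv_text.replace("\t", f"</{tag}><{tag}>")
    return f"<{root}><{tag}>{body}</{tag}></{root}>"


def xml_to_tsv(xml_text: str, tag: str = WRAP, root: str = ROOT) -> str:
    """Reverse tsv_to_xml() on the translated file."""
    head, tail = f"<{root}><{tag}>", f"</{tag}></{root}>"
    if not (xml_text.startswith(head) and xml_text.endswith(tail)):
        raise ValueError("file does not have the expected wrapper")
    inner = xml_text[len(head):len(xml_text) - len(tail)]
    return inner.replace(f"</{tag}><{tag}>", "\t")


sample = "ID\tText\n1\t<b>Hi</b>"
print(tsv_to_xml(sample))
```

In the sample, "Text" and "1" end up in the same element because the row break stays inside it. Combining this with the placeholder step Wolfgang used, or also treating row ends as boundaries, would keep rows apart. The reverse step assumes the translated file comes back with the wrapper unchanged, not pretty-printed.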



Rie

PS: My wacky method works for me, but I understand the fuss about fuzzy matching. I always keep "Enable fuzzy terminology lookups" turned off, but you still have to be a bit careful.

PPS: Hmm, I just don't like BBCode, or this site's preview function, which converts the HTML codes to literals... sigh.






[Edited at 2009-05-27 20:18 GMT]

