Help! What is the best CAT tool for embedded text (in diagrams, OLE) in Word?
Thread poster: Claudia Alvis

Claudia Alvis  Identity Verified
Peru
Local time: 01:54
Spanish
Dec 7, 2007

Hello,

I have a large .doc document I need to translate. The document has several embedded diagrams (Document Objects) whose text boxes contain text that has to be translated. I know that Trados doesn't handle embedded objects in Word, so I thought of using SDLX to get the text out of the embedded diagrams, along with an accurate word count. But I ran a test with a 217-word segment plus a 100-word embedded diagram, and both analyses gave me the same word count: 217. I don't want to translate each diagram separately, because it would take me forever and I might also miss something important.

Is there a CAT tool that handles embedded objects in Word properly? Or is there a way to tell Trados or SDLX to "read" those objects? The kind of object I'm talking about is this: {EMBED Word Document.8\s}. If I right-click on it, I get the Document Object window and it shows up as a separate document.

I was also wondering whether, if a CAT tool can't do the trick, I could batch-save all those objects, work on them, and then update them like a TOC. I tried doing that manually, and Trados got the right word count, but I don't know how to batch-save them.
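
On the batch-saving idea: one possible route, not tested against this particular file, relies on the newer .docx format being an ordinary ZIP package in which Word normally stores embedded objects under `word/embeddings/`. The Python sketch below assumes the .doc has first been re-saved as .docx (Word 2007 or later); the function name `extract_embedded_objects` is purely illustrative. It only copies the objects out. Putting translated objects back and refreshing them in Word is not covered.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Assumption: Word keeps embedded objects in this folder of the .docx package.
EMBED_PREFIX = "word/embeddings/"


def extract_embedded_objects(docx_path, out_dir):
    """Copy every embedded object out of a .docx into out_dir.

    Returns the list of saved file paths. Hypothetical helper, not part
    of any CAT tool.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Skip everything outside the embeddings folder, and folder entries.
            if not name.startswith(EMBED_PREFIX) or name.endswith("/"):
                continue
            # Names inside a ZIP always use forward slashes.
            target = out_dir / PurePosixPath(name).name
            target.write_bytes(package.read(name))
            saved.append(target)
    return saved
```

Calling `extract_embedded_objects("manual.docx", "objects")` (both names are placeholders) would write each embedded file into the `objects` folder. Objects embedded inside other objects are not unpacked by this sketch, so each extracted file would have to be processed again, and some objects may come out as `oleObjectN.bin` containers rather than directly openable documents.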

I'd appreciate ANY help.

Thanks.


[Edited at 2007-12-08 06:17]



Antonín Otáhal
Local time: 08:54
Member (2005)
English to Czech
Transit could make it Dec 8, 2007

I have used Star Transit XV for translating embedded Excel and "editable images" in Word and PowerPoint without problems. From what you say, you have something like "Word embedded in Word" here, which I have never come across, but I suppose it would work as well. Note that you need at least the Smart version to import Word into Transit.

HTH
Antonin



Heinrich Pesch  Identity Verified
Finland
Local time: 09:54
Member (2003)
Finnish to German
Did you try TagEditor? Dec 8, 2007

At least text boxes are handled well in TE, but when I tried SDLX, the program got stuck right at the conversion stage.
(Fortunately) I have no experience with these embedded objects.
One workaround could be to convert to PDF and scan these back into Word. Or simply copy the contents to another Word document without embedding.



Claudia Alvis  Identity Verified
Peru
Local time: 01:54
Spanish
TOPIC STARTER
All kinds of objects Dec 8, 2007

Actually, on reviewing some of the documents, I've found that they contain all kinds of objects, such as Excel embedded in Word embedded in Word: that is, an Excel table embedded in a Word file that is itself embedded in the main document. And I've only just started on the file; I don't want to think about what's coming up next.

Heinrich, TagEditor would definitely not work. I mean, it doesn't even work with simple objects, let alone this kind of 'monster'.


xxxBrandis
Local time: 08:54
English to German
multiple choice... Dec 8, 2007

Hi! What is embedded in pictures or diagrams can be graphics or alphanumeric text. In the second case, I would use a picture extraction tool like SnagIt to pull the diagrams into a folder and process them from there in Paintbrush, Photoshop or Fireworks, without changing the dimensions of the diagrams. Process the rest of the .doc content, clean up the bilingual file, and re-embed the processed diagrams using the replace function. Best regards, Brandis


Claudia Alvis  Identity Verified
Peru
Local time: 01:54
Spanish
TOPIC STARTER
Graphics and diagrams Dec 8, 2007

Hello Brandis,

To be honest, I'm not too concerned about the embedded pictures (I'll worry about them later); the problem right now is extracting the text from the embedded objects (Excel, Word) in the files. I've been wondering whether there's a tool like SnagIt that works with embedded Office objects, so that I could work on the files without having to worry about breaking the tags.

I'm also worried about the word count, because in just a couple of "double-embedded" Excel tables, I've found more than 500 words that neither Trados nor SDLX are counting.
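
To cross-check a CAT tool's count, the words can also be counted straight from the file's XML. The sketch below is an illustration under assumptions, not how Trados or SDLX count: it assumes a .docx file (a .doc would have to be re-saved first), reads only the main part `word/document.xml`, and treats any whitespace-separated token as a word. Text boxes are included because their text sits in the same XML part; embedded Word or Excel objects are separate files inside the package and are not counted. The names `count_words` and `_collect` are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
BREAKS = {W + "tab", W + "br", W + "cr"}


def _collect(elem, parts):
    """Walk the XML tree and gather visible text in document order."""
    if elem.tag == W + "p":
        parts.append(" ")               # a paragraph starts a new word
    elif elem.tag == W + "t":
        parts.append(elem.text or "")   # run text; adjacent runs are joined
    elif elem.tag in BREAKS:
        parts.append(" ")               # tabs and line breaks separate words
    for child in elem:
        _collect(child, parts)
    if elem.tag == W + "p":
        parts.append(" ")               # a paragraph end also separates words


def count_words(docx_path, part="word/document.xml"):
    """Count whitespace-separated tokens in one XML part of a .docx."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read(part))
    parts = []
    _collect(root, parts)
    return len(re.findall(r"\S+", "".join(parts)))
```

The result will not necessarily match Word's own statistics. Files saved by newer Word versions may store a text box twice (once as a drawing and once as a fallback), so this count can come out higher than expected.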


xxxBrandis
Local time: 08:54
English to German
It works as it had worked for me... Dec 8, 2007

Claudia Alvis wrote:

Hello Brandis,

To be honest, I'm not too concerned about the embedded pictures (I'll worry about them later); the problem right now is extracting the text from the embedded objects (Excel, Word) in the files. I've been wondering whether there's a tool like SnagIt that works with embedded Office objects, so that I could work on the files without having to worry about breaking the tags.

I'm also worried about the word count, because in just a couple of "double-embedded" Excel tables, I've found more than 500 words that neither Trados nor SDLX are counting.

Using SnagIt (this is not an advertisement), one can extract all the pictures from a website, from a book and similarly from a document. Save them to a separate folder. Tag the document, clean it up, and replace the originals with the processed graphics; you have to maintain the original proportions to retain the document's formatting. SnagIt is free to try for 30 days, I think, and is fully functional during that time. I found it to be great in such cases. There is also an ECM plug-in, which is somewhat expensive but does a grand job. Best regards, Brandis

[Edited at 2007-12-08 06:25]



Heinrich Pesch  Identity Verified
Finland
Local time: 09:54
Member (2003)
Finnish to German
seems no tool will do it all Dec 8, 2007

But my advice about converting to PDF and scanning would at least allow you to count the text.
Probably nobody would be prepared to pay the thousands of euros that a tool capable of handling files as complicated as this one would cost. That's probably why such a tool has not been developed.
You could ask Jost Zetzsche, who routinely researches all possible translation tools. Perhaps there is a tool for big translation agencies that can handle these objects.

I would create a new file and copy and paste the objects one by one. After translation they do not have to be embedded, because the content will not change.
Have you tried opening the file in OpenOffice Writer?
There should be a function for "flattening" those objects.

Good luck!



Antonín Otáhal
Local time: 08:54
Member (2005)
English to Czech
Transit really can do it smoothly Dec 8, 2007

Heinrich, your statement "seems no tool will do it all" looks like a bit of an overstatement.

Antonin



Samuel Murray  Identity Verified
Netherlands
Local time: 08:54
Member (2006)
English to Afrikaans
I know of no such tool Dec 8, 2007

Claudia Alvis wrote:
I have a large .doc document I need to translate. The document has several embedded diagrams (Document Objects) whose text boxes contain text that has to be translated.


If there were only text boxes, then OmegaT could do it. And there is a macro in the Files section of the Wordfast Yahoo group that extracts all the text box text, lets you translate it, and puts it all back in one go. But... you talk about embedded objects, so I gather it ain't just simple text boxes with text in them, right?

I don't want to translate each diagram separately, because it would take me forever and I might also miss something important.


It may come to that. Still, isn't there a way to select all the text except the embedded objects and delete it, leaving only the embedded stuff?

I was also wondering whether, if a CAT tool can't do the trick, I could batch-save all those objects, work on them, and then update them like a TOC.


Hmm, that would be interesting to play around with. Tis a pity I don't have access to your document...



Gillian Scheibelein  Identity Verified
Germany
Local time: 08:54
Member (2003)
German to English
a suggestion... Dec 8, 2007

Hi Claudia,

You can send me the file and I'll try a Transit import. We have a new filter that allows all types of Office documents to be imported into a single project. I can then copy the extracted text into a Word file, and you can count and translate it. It is worth a try. Transit is excellent at extracting text from ppt and xls files, even hidden text!

Cheers,
Jill



Peter Linton  Identity Verified
Local time: 07:54
Member (2002)
Swedish to English
Create a PDF Dec 8, 2007

Whatever tools you find, Heinrich Pesch's advice about creating a PDF file is very good. That way you can be reasonably sure of displaying all the text, even if hidden by Word. It thus provides a good way of checking that you have not missed anything.

I recently had a good example of the problem: a DOC file with several apparently empty pages whose text appeared only if you right-clicked on each empty page. So the final word count was twice as big as the value Word (and the customer) expected.

In this case, I converted the PDF file into a Word DOC (using OmniPage 15), and all the hidden text suddenly appeared.

In hindsight, I should have checked the settings in Word Tools/Options/Show Picture placeholders and Field codes.



Claudia Alvis  Identity Verified
Peru
Local time: 01:54
Spanish
TOPIC STARTER
Transit XV Dec 8, 2007

Thanks everyone for your suggestions and offers, you're very generous especially you-know-who.

I also want to thank Antonín and Gillian for pointing me in the right direction. Transit XV does recognize and work with embedded objects, and with objects embedded in objects that are themselves embedded, which is no small feat. A generous colleague has let me use his copy of Transit (I know, I know), and even though he's on vacation, we spent all morning in a trial-and-error session and I think we finally have it figured out.

First of all, I have to say that I am fairly impressed with Transit. I'd been looking for an alternative to Trados and I think Transit might be it. I'm posting what I did even though, with Transit, it doesn't seem so complicated anymore; it might help somebody else, because Transit was not the end of the process.


  • I prepared the project with Transit. Under File Type I chose MS-Word, then went to Options and checked 'Process objects' in the 'Embedded Objects' group box.
  • I added a couple of sample files I had previously prepared: one with a 'diagram within a Word object within Word' and the other with a 'table as an Excel object within a Word object within Word'.
  • Transit managed to "read" the text from the embedded objects. I pseudo-translated some of it, then exported the files.
  • But when I opened the exported files, the text in the objects hadn't changed. I mean, the normal text was translated, but the text inside the tables and diagrams was still in the original language.
  • It turned out that I had to manually 'Convert' them. For instance, with the diagram with text boxes embedded as a Word object, I had to right-click on the code {EMBED Word Document.8\s} and select Document Object > Convert > Convert to > Microsoft Office Word Document. With the table as an Excel object embedded in a Word object embedded in the document, I had to do one extra thing. I right-clicked on the table, then selected Document Object > Open. Once I got into the Word object, I right-clicked on the table, then selected Worksheet Object > Convert > Convert to > Microsoft Office Excel Worksheet.
  • So 'Convert' was the way to update the objects. As I had suspected, that part has to be done manually, but Transit saved me a lot of time.


Since it's such a complex text, it's very likely that I'll come across many more of these jewels.

Brandis, Heinrich, Peter, the reason I didn't want to resort to converting the file to PDF is that I didn't want to modify the code, let alone replace it with just the text. My file has plenty of bookmarks, codes and links, so I was afraid that if I modified something, I would ruin the file. And those codes are a nightmare to fix.

My concern now is how to charge for doing this, as I don't really know how long the whole process will take me and I'm basically learning as I go. I had never worked on a Word document as heavily coded as this one.

Thanks

Claudia



Antonín Otáhal
Local time: 08:54
Member (2005)
English to Czech
Regarding the "intellectual property rights" issue when testing Transit Dec 9, 2007

When I was considering a purchase of Transit, they gave me a full version for one month as a trial, so perhaps your admitted usage of someone else's licence may not be that bad.

Now that you mention it, I do recall that the embedded objects require a kind of "refresh" step after they are exported from Transit. Sorry I did not forewarn you. Fortunately, that kind of job does not come my way that often, so I had happily forgotten about it. I mainly use this feature for Excel within PowerPoint, and the "after-processing" stage with ppt files (after exporting them from Transit and before sending them out to the customer) is usually quite extensive anyway...

Antonin

[Edited at 2007-12-09 00:05]



Heinrich Pesch  Identity Verified
Finland
Local time: 09:54
Member (2003)
Finnish to German
Good to know Dec 9, 2007

But what is the price-tag? Which version of Transit is needed for this?

I do not understand very much of document structures, so I would like to ask, is it really worth the trouble?
I believe embedded objects are pieces of other applications somewhere in a document hierarchy. When person A embeds an Excel file in a Word file and person B makes changes to the Excel file, the changes appear the next time person A opens his Word file.
But what if person A gives the Word document to person C for translation? If translator C uses Transit and preserves all links to the original file structure, and person B later changes his Excel file, then when person A opens the translated Word file, will B's changes affect the translation, so that the untranslated version of the Excel file replaces the translated one?

Why is it really necessary to preserve the links when translating?

Regards
Heinrich

PS: If one can convert something to the Transit format, it should be possible to do the actual translation in Word using the PlusToys macro, convert back to Transit format, and then go from Transit back to the original.

