
PDFs in Studio 2009 are full of tags
Thread poster: Claudia Reynaud

Claudia Reynaud  Identity Verified
United States
Local time: 03:06
English to Spanish
Jan 15, 2011

This is the second time I've prepared projects that include searchable PDF files, and every single word is enclosed in a tag. It makes the source file completely unreadable. Is there any way to hide these tags, or am I doing something wrong when setting up the project?

I am new to Trados, so any help will be greatly appreciated!

Thanks in advance!

Regards,
Claudia Reynaud


 

Laurent KRAULAND (X)  Identity Verified
France
Local time: 09:06
French to German
Formatting Jan 15, 2011

These tags are formatting tags and usually appear in interfaces like TagEditor (i.e. TTX files).

I don't have a solution as I parted with Trados back in 2009, but you could read this for some suggestions: http://www.translationtribulations.com/2011/01/dining-on-tag-salad.html


 

Pablo Bouvier  Identity Verified
Local time: 09:06
German to Spanish
PDFs in Studio 2009 are full of tags Jan 15, 2011

Claudia Reynaud wrote:

This is the second time I've prepared projects that include searchable PDF files, and every single word is enclosed in a tag. It makes the source file completely unreadable. Is there any way to hide these tags, or am I doing something wrong when setting up the project?

I am new to Trados, so any help will be greatly appreciated!

Thanks in advance!

Regards,
Claudia Reynaud


http://www.proz.com/forum/cat_tools_technical_help/189271-new_codezapper_version_is_available.html


 

Emma Goldsmith  Identity Verified
Spain
Local time: 09:06
Member (2010)
Spanish to English
Use external OCR Jan 15, 2011

I understand you are using Studio to process the PDFs. If it is a basic text PDF, Studio might manage, but by far the best workaround is to convert your PDF with an OCR application (ABBYY, etc.). Then open it in Word and get rid of all the formatting you can (delete all formatting if possible), or use CodeZapper, which Pablo recommends.
Then open your Word file in Studio.


 

SDL Community  Identity Verified
United Kingdom
Local time: 09:06
Member (1970)
English
Another solution is... Jan 15, 2011

Claudia Reynaud wrote:
This is the second time I've prepared projects that include searchable PDF files, and every single word is enclosed in a tag. It makes the source file completely unreadable. Is there any way to hide these tags, or am I doing something wrong when setting up the project?


Hi Claudia,

As mentioned already, this may be a case of needing a better PDF tool than the filter provided in Studio. However, if you don't have one, a useful workaround is this:

1. Open the document in Studio (no need for a TM) and immediately save the source file.
2. This will give you a Word document.
3. Open that document in MS Word and select all the content with CTRL+A.
4. Press CTRL+spacebar; this should remove all the manual character formatting.
5. Now open the formatting-free Word document in Studio and translate it instead.
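For readers curious what step 4 roughly amounts to inside the file, here is a minimal Python sketch using only the standard library. It assumes you already have the XML text of `word/document.xml` from the .docx archive (a .docx is a ZIP file); `strip_run_formatting` and `q` are hypothetical names, not part of Word, Studio or any SDL API. It removes each run's character properties (`w:rPr`) and then merges neighbouring text-only runs, since formatting changes between runs are, roughly speaking, what show up as inline tags in a CAT tool. It only approximates Word's CTRL+spacebar.

```python
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML vocabulary (w:p, w:r, w:t, w:rPr, ...).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Qualified ElementTree name for a w: element, e.g. q("r")."""
    return f"{{{W}}}{tag}"


def strip_run_formatting(document_xml: str) -> str:
    """Hypothetical helper: drop direct character formatting from every run,
    then merge neighbouring runs that hold nothing but text."""
    root = ET.fromstring(document_xml)
    for para in list(root.iter(q("p"))):
        # Step 1: remove each run's w:rPr (font, size, scale, spacing, ...),
        # including runs nested in hyperlinks.
        for run in list(para.iter(q("r"))):
            rpr = run.find(q("rPr"))
            if rpr is not None:
                run.remove(rpr)
        # Step 2: glue consecutive text-only runs (direct children of the
        # paragraph) into a single run.
        previous = None
        for child in list(para):
            is_text_run = (
                child.tag == q("r") and len(child) == 1 and child[0].tag == q("t")
            )
            if not is_text_run:
                previous = None  # tabs, breaks, fields, hyperlinks end a chain
            elif previous is None:
                previous = child
            else:
                prev_t = previous[0]
                prev_t.text = (prev_t.text or "") + (child[0].text or "")
                prev_t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
                para.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Writing the result back into a .docx is deliberately left out: serializing with ElementTree may rename namespace prefixes that Word relies on, so for real files Word itself or a dedicated tool is the safer choice.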

The advantage of this is that you will get clean text. The disadvantage is that you will have to format the document afterwards to look like the PDF version. However, I think it would be reasonable to charge more for this process if you are not given the true source document in the first place... or am I dreaming?

Regards

Paul


 

Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 14:06
Member (2004)
English to Thai
Text editor Jan 16, 2011



As mentioned already, this may be a case of needing a better PDF tool than the filter provided in Studio. However, if you don't have one, a useful workaround is this:

1. Open the document in Studio (no need for a TM) and immediately save the source file.
2. This will give you a Word document.
3. Open that document in MS Word and select all the content with CTRL+A.
4. Press CTRL+spacebar; this should remove all the manual character formatting.
5. Now open the formatting-free Word document in Studio and translate it instead.

Paul

I use a text editor, e.g. Notepad, to strip the formatting from Word text here. It works better. You can also try opening the Word file from step 2 above in SDLX [SDL Edit] and removing all formatting, then translating the resulting *.itd file in Studio 2009 as usual.
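Pasting into Notepad amounts to keeping only the text. A minimal standard-library Python sketch of the same idea, again assuming the XML text of `word/document.xml` as input (`docx_xml_to_plain_text` is a hypothetical helper, not an SDL or Word feature): it emits one line per paragraph, keeps text and tab characters from runs, and drops all formatting. It does not treat text boxes or other nested content specially.

```python
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML vocabulary.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(tag: str) -> str:
    """Qualified ElementTree name for a w: element."""
    return f"{{{W}}}{tag}"


def docx_xml_to_plain_text(document_xml: str) -> str:
    """Return one line of plain text per w:p paragraph, formatting dropped."""
    root = ET.fromstring(document_xml)
    lines = []
    for para in root.iter(q("p")):
        pieces = []
        # Only look inside runs: a w:tab under w:pPr/w:tabs is a tab-stop
        # definition, not a tab character.
        for run in para.iter(q("r")):
            for el in run:
                if el.tag == q("t"):
                    pieces.append(el.text or "")
                elif el.tag == q("tab"):
                    pieces.append("\t")
        lines.append("".join(pieces))
    return "\n".join(lines)
```

Table cells come out as one line per cell paragraph, so tables have to be rebuilt by hand afterwards, just as with Notepad.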

Soonthon Lupkitaro


 

Jonathan Hopkins  Identity Verified
Germany
Local time: 09:06
German to English
Studio PDF filter Jan 16, 2011

Paul's idea looked like a good one to me, so I thought I'd try it out. However, for some reason unbeknownst to me, no text is saved to the Word file. Does anyone know what the problem might be? Using Studio's filter produces tagged text in the editor, but the saved Word file contains no text. Here's what the respective files look like: [screenshots not preserved]


 

Claudia Reynaud  Identity Verified
United States
Local time: 03:06
English to Spanish
TOPIC STARTER
Thanks, everyone!!! Jan 16, 2011

I found an option in Studio under PDF / Settings that supposedly prepares files without tags... no luck there. I tried Emma's suggestion of OCRing with ABBYY, and I also tried a PDF to Word converter I have, but this document is full of formatting, charts and huge tables, so that didn't work either. So... I went ahead and started translating the document with all the tags (clearing the tags in the target), but the tables in the saved target document are all jumbled (and I had already translated them).

I'm on a very tight deadline, so it seems I'll just have to go ahead and overwrite the whole thing :'(... but I'll try your other suggestions as soon as I'm done with this project and let you know how it goes.

So much for my recent investment in Trados! Oh, well...

Thanks a million to all of you!!!
Claudia


 

Jerzy Czopik  Identity Verified
Germany
Local time: 09:06
Member (2003)
Polish to German
Pardon my French... Jan 16, 2011

but did you really expect wonders?
There is no miracle-working software in the world; as you say yourself, ABBYY and other converters cannot deliver a decently formatted file in a fully automated mode.
PDF conversion is one of the most discussed topics here on ProZ, and there is no ready-made solution for it.
But regardless of how you convert a PDF, manual pre- and post-processing is always necessary. Of course I don't know how big your PDF is, but converting it to plain text and reformatting it manually in Word still seems to be the best option. IMHO, if the automatic conversion (which in practice still relies on manually selecting and defining recognition areas in the OCR tool) does not really work, this is the only way to go.

To get rid of the tags between letters, use CodeZapper, or at least select the whole text in Word (CTRL+A), press CTRL+D, select ONE single font (e.g. Arial) and press OK. Press CTRL+D again, switch to the Character Spacing tab, select 100% for Scale and set Spacing to Normal. Press OK and save the file. This way most of the tags will be removed, while the core formatting of the document stays unchanged.
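Jerzy's recipe (one font, 100% scale, normal spacing) works because runs that end up with identical character properties no longer need a tag between them. Below is a minimal Python sketch of that idea at the XML level, assuming the XML text of `word/document.xml` as input; `normalize_and_merge` and its helpers are hypothetical names, and this is not a reimplementation of CodeZapper. Other properties such as bold or italic are left alone, so runs that genuinely differ stay separate.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Qualified ElementTree name for a w: element."""
    return f"{{{W}}}{tag}"


def normalize_run(run: ET.Element, font: str) -> None:
    """One font, 100% scale, normal spacing; bold, italic etc. stay."""
    rpr = run.find(q("rPr"))
    if rpr is None:
        rpr = ET.Element(q("rPr"))
        run.insert(0, rpr)
    # w:w is the character scale; without it Word uses 100%.
    for tag in ("rFonts", "w", "spacing"):
        for el in rpr.findall(q(tag)):
            rpr.remove(el)
    fonts = ET.Element(q("rFonts"))
    for attr in ("ascii", "hAnsi", "cs"):
        fonts.set(q(attr), font)
    # The schema expects rFonts right after an optional rStyle.
    index = 1 if len(rpr) and rpr[0].tag == q("rStyle") else 0
    rpr.insert(index, fonts)


def props_key(run: ET.Element) -> tuple:
    """Hashable summary of a run's w:rPr, used to compare neighbours."""
    rpr = run.find(q("rPr"))
    if rpr is None:
        return ()
    return tuple((c.tag, tuple(sorted(c.attrib.items()))) for c in rpr)


def is_text_run(el: ET.Element) -> bool:
    """True for a w:r whose only content besides w:rPr is one w:t."""
    content = [c for c in el if c.tag != q("rPr")]
    return el.tag == q("r") and len(content) == 1 and content[0].tag == q("t")


def normalize_and_merge(document_xml: str, font: str = "Arial") -> str:
    root = ET.fromstring(document_xml)
    for para in list(root.iter(q("p"))):
        for run in para.findall(q("r")):
            normalize_run(run, font)
        # Merge neighbouring text runs whose properties are now identical.
        previous = None
        for child in list(para):
            if not is_text_run(child):
                previous = None
            elif previous is not None and props_key(previous) == props_key(child):
                prev_t = previous.find(q("t"))
                prev_t.text = (prev_t.text or "") + (child.find(q("t")).text or "")
                prev_t.set(XML_SPACE, "preserve")
                para.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")
```

For example, three runs `Hel` (Times New Roman, condensed spacing), `lo ` (90% scale) and a bold `world` come out as two runs: `Hello ` and a bold `world`.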


 

Claudia Reynaud  Identity Verified
United States
Local time: 03:06
English to Spanish
TOPIC STARTER
Yes, I did expect that!! Jan 16, 2011

Being new to Trados, I thought I could work on searchable PDFs without all the hassle of converting or copying/pasting and then formatting... I guess I'm sort of naive.

I'll try your suggestion. Thanks so much for your help!

Claudia Reynaud


 

Jerzy Czopik  Identity Verified
Germany
Local time: 09:06
Member (2003)
Polish to German
So you are a victim of marketing Jan 16, 2011

Sorry to hear that, but please believe me: there is no tool in the world that will convert a PDF to Word and leave you with a perfect result that needs no further work.
Please try using the Word file you got from ABBYY or your other converter and remove the unnecessary formatting as I described above. This way you will really have far fewer tags than before.


 

Signe Golly  Identity Verified
Denmark
Local time: 09:06
English to Danish
Paul saves my behind twice in two days! Feb 21, 2011

SDL Support wrote:

Claudia Reynaud wrote:
This is the second time I've prepared projects that include searchable PDF files, and every single word is enclosed in a tag. It makes the source file completely unreadable. Is there any way to hide these tags, or am I doing something wrong when setting up the project?


Hi Claudia,

As mentioned already, this may be a case of needing a better PDF tool than the filter provided in Studio. However, if you don't have one, a useful workaround is this:

1. Open the document in Studio (no need for a TM) and immediately save the source file.
2. This will give you a Word document.
3. Open that document in MS Word and select all the content with CTRL+A.
4. Press CTRL+spacebar; this should remove all the manual character formatting.
5. Now open the formatting-free Word document in Studio and translate it instead.

The advantage of this is that you will get clean text. The disadvantage is that you will have to format the document afterwards to look like the PDF version. However, I think it would be reasonable to charge more for this process if you are not given the true source document in the first place... or am I dreaming?

Regards

Paul


Another star for you, Paul.
I opened a Word file containing some simple tables to be translated in Trados 2009, and it was a MESS, with about a gazillion seemingly pointless tags within single segments. Thanks to your advice, I was able to remove the formatting and save an unformatted version of the source file, which will make the translation so much easier. Plugging everything back into the original format should be a breeze, since there really isn't much to the tables (at least not that's visible to the naked eye).


 

Anthony Kehoe  Identity Verified
Local time: 16:06
Japanese to English
PDF Sep 8, 2012

I have a problem in that the client sends me the TTX file with more tags than text, and wants the same in return.
God, but I hate PDF.


 

Emma Goldsmith  Identity Verified
Spain
Local time: 09:06
Member (2010)
Spanish to English
Unnecessary tags in TTX files Sep 8, 2012

Kimpatsu wrote:

I have a problem in that the client sends me the TTX file with more tags than text, and wants the same in return.


If your client is sending you TTX files that were originally PDFs converted to Word with an OCR tool, and your files are full of rogue tags, then you need to train your client to do this efficiently. You may be interested in a blog article I wrote, "How to get rid of a tag soup in Trados Studio".

Personally, I would ask my client to let me create the TTX file so that I can see what the original file looks like and work on it if necessary.

If you're concerned about tags in general (not tags within words and other junk tags), then I'm afraid they're an essential part of the environment when you work outside Word. Make sure you're working in Studio and not in TagEditor (the old, user-unfriendly application that came with Trados 2007); you'll get used to the tags and see why they're essential.

The WYSIWYG mode in Studio's Editor view isn't advisable: it gives you a clear workspace, but it is all too easy to enter text on the "wrong" side of a tag, or to delete one by mistake. So I advise keeping all tags and formatting visible.

HTH,
Emma


 

