How do we translate PDF with SDL Trados 2011
Thread poster: Secilia Njau

Secilia Njau  Identity Verified
United States
Local time: 14:59
Member (2011)
English to Swahili
+ ...
Apr 18, 2012

Hi,

I was wondering if I could get some help on how to translate PDF documents with SDL Trados Studio 2011.

1. Is it possible to translate them as we do with Word documents in SDL Trados? How?

2. I tried converting to Word, but the Word documents I get are not editable!

I would really appreciate any advice.



Andrzej Lejman  Identity Verified
Local time: 12:59
German to Polish
+ ...
Please use the "search" function of the forum Apr 18, 2012

This topic has been discussed some 17 831 times.

A.



SDL Community  Identity Verified
United Kingdom
Local time: 12:59
English
Probably an image Apr 18, 2012

Hi,

Studio can handle PDF documents if they are based on text in the first place, although the quality of the PDF itself can often lead to very taggy documents. So sometimes it's better to save the source so you get a Word document, then clean that up and remove the tags first (processes for this have been discussed quite a bit so I won't try to repeat them here).

In your case, however, it sounds as though the PDF is actually based on an image rather than text, and this is probably why you can't get an editable Word document either. So you need to print the document and then scan it with an OCR tool to create a document containing translatable text.

Depending on your budget and/or how much time you want to spend on tidying up the document, there is quite a bit of software out there that will do this for you. I sometimes use FreeOCR, which is free... not great, but it does a fast and reasonable job of parsing the text into a text file for me. At the other end of the scale is something like ABBYY FineReader, which is an excellent tool for getting the text and the proper format of the document... but of course this is not free.

I hope this helps a little.

Regards

Paul



Henning Holthusen  Identity Verified
Philippines
Local time: 19:59
English to German
+ ...
DON'T Apr 22, 2012

It's not a good idea to work on converted PDFs in Studio, unless you REALLY clean them up properly in advance.
The problem is that you don't really know what the finished text will look like, and that can lead to some bad surprises or simply the recognition that you have to clean up the document and/or start the translation over again - in Word, where you can adjust formatting as you go.
Recently, for example, a portion of a table got put into the document header.

I don't think that PDF conversion can be made to work with Studio, because it is just too much of a risk that you end up with text that is where it is not supposed to be, or invisible (i.e. in a badly formatted table).

I hope SDL doesn't waste any resources on trying to get this functionality to work. It is unneeded and for documents with any level of complexity it will always be risky to use.



SDL Community  Identity Verified
United Kingdom
Local time: 12:59
English
Interesting thoughts... Apr 22, 2012

Henning Holthusen wrote:

I hope SDL doesn't waste any resources on trying to get this functionality to work. It is unneeded and for documents with any level of complexity it will always be risky to use.



You'd be surprised how many translators have told me they only purchased Studio so they could handle PDF files..! Much work, often carried out without a CAT at all, comes from PDF documents, so I think there is a very valid reason to try to handle this as well as we can.

But you are correct that the tagging can be problematic depending on the quality of the source. This is simple to resolve, though. Just roundtrip the file before you translate and look at the quality of the output. If it's taggy then clean all the tags and translate the cleaned up file... then spend time formatting (if needed) for the client (and charge accordingly I'd say..).

If it's not too taggy then translate immediately and you're done.
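
For anyone who wants to put a rough number on "taggy" rather than judging by eye, here is a minimal sketch. It is not something Studio provides. It assumes the bilingual file is XLIFF 1.2 based (as .sdlxliff files are generally understood to be) and counts only the standard XLIFF 1.2 inline elements inside `<source>`. The function name `tag_density` and the `SAMPLE` document are made up for illustration, and what counts as "too taggy" is left to your own judgement.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Inline elements defined by XLIFF 1.2. <mrk> is left out on purpose:
# segmentation markers are not formatting tags.
INLINE_TAGS = {"g", "x", "bx", "ex", "bpt", "ept", "ph", "it"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def tag_density(xliff_text):
    """Return (inline_tags, words, tags_per_100_words) over all <source> elements."""
    root = ET.fromstring(xliff_text)
    tags = 0
    words = 0
    for source in root.iter(f"{{{XLIFF_NS}}}source"):
        # Count every inline element nested anywhere inside this <source>.
        tags += sum(
            1
            for el in source.iter()
            if el is not source and local_name(el.tag) in INLINE_TAGS
        )
        # itertext() yields the text with all markup removed.
        words += len("".join(source.itertext()).split())
    per_100 = 100.0 * tags / words if words else 0.0
    return tags, words, per_100


# Hypothetical two-segment file, only here to show the function in use.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.pdf" source-language="en-US" datatype="x-sample">
    <body>
      <trans-unit id="1">
        <source>This <g id="1">is</g> <g id="2">a</g><x id="3"/> taggy sentence.</source>
      </trans-unit>
      <trans-unit id="2">
        <source>This one is clean.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

print(tag_density(SAMPLE))
```

On the sample this finds 3 inline tags over 9 words. Only `<source>` is counted so that a file which also carries a segmented copy of the source is not counted twice. Comparing the figure before and after cleaning up the converted Word file gives a rough idea of whether the cleanup was worth it.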

Regards

Paul



Henning Holthusen  Identity Verified
Philippines
Local time: 19:59
English to German
+ ...
Not the right way Apr 23, 2012

SDL Community wrote:

You'd be surprised how many translators have told me they only purchased Studio so they could handle PDF files..! Much work, often carried out without a CAT at all, comes from PDF documents, so I think there is a very valid reason to try to handle this as well as we can.



I translate legal documents and therefore get a lot of PDF documents. Those tend to have complex formatting, however, which often screws up the resulting conversion somewhat.
I just don't think it's a good idea to work on such documents directly in the Studio environment, unless you clean it up extensively in advance or check the resulting translation meticulously against the original PDF.

The problem is that you don't really know what the converted document you are working on will look like until you save the target document, and you may lose bits of text or get a really badly laid out and even effectively un-rescuable document if the conversion went badly enough (especially if the document has a lot of columns and tables). If, for example, the translated sentence is longer than the source sentence, that can create a bit of a formatting headache which is immediately visible with Word/Workbench but not with Studio.

The majority of PDFs with simple formatting will not be a problem, I suppose, but I don't like to run risks like that.



Emma Goldsmith  Identity Verified
Spain
Local time: 12:59
Member (2010)
Spanish to English
Check target file before translating Apr 23, 2012

Henning Holthusen wrote:

The problem is that you don't really know what the converted document you are working on will look like until you save the target document


I recommend saving the target document as soon as you open the file in the editor window, before you start to translate. Obviously the "target file" will still be in your source language, but you will be able to see all potential formatting problems and so you can decide whether to go ahead and work on the file or whether it would be better to convert the PDF using OCR before you open it in Studio. That way there are no risks involved and you won't get any nasty surprises at the end.

My 2 cents.


FarkasAndras
Local time: 12:59
English to Hungarian
+ ...
What else is there Apr 23, 2012

Henning Holthusen wrote:

I just don't think it's a good idea to work on such documents directly in the Studio environment, unless you clean it up extensively in advance or check the resulting translation meticulously against the original PDF.


I think we can all agree that this is self-evident.

I don't see how this took you to the conclusion that PDF in Studio is a bad idea, though. In my opinion, if you want to create a .doc that follows the formatting of the pdf, the best way to process PDFs is through Studio. I mean, what else do you do? Recreate the document from scratch?

My method is this:
- Import PDF to Studio, see if there are too many tags and generate a "target" doc right away.
- Open the "target" .doc and see if the layout is roughly right (it always is).
- Clean up the .doc as necessary and import it to Studio.
- Work on the .doc.sdlxliff and clean it up at the end as usual. Ignore the pdf.sdlxliff.

BTW Paul, can you ask the developers to fix spaces in the PDF import filter? Studio tries to get word spacing "right" by entering two spaces and specific-width (extra narrow and extra wide) spaces between words. This is the source of 90% of surplus tags in my pdf files, and it should be trivially easy to fix in the import filters. In running text, anything between certain limits should be treated as a normal single space.
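
To illustrate the kind of space normalisation meant here, a minimal sketch follows. It is not how Studio or its PDF filter works. The set of space characters the converter actually emits is not documented in this thread, so the character range below is an assumption, and `normalise_spaces` is a made-up name. No-break spaces are deliberately left alone because they can be meaningful in running text.

```python
import re

# Ordinary space plus the fixed-width spaces U+2002..U+200A
# (en, em, three/four/six-per-em, figure, punctuation, thin, hair).
# Assumption: these are the variants a converter might use to mimic
# the PDF's visual word spacing.
_SPACE_RUN = re.compile("[\u0020\u2002-\u200a]+")


def normalise_spaces(text):
    """Collapse any run of the space variants above into one ordinary space."""
    return _SPACE_RUN.sub(" ", text)


# Thin spaces, a double space and an em space all become single spaces.
print(normalise_spaces("word\u2009\u2009spacing  is\u2003odd"))
```

Run over the text of a converted document before import, something like this would remove the spacing variants that otherwise turn into tags. Line breaks and tabs are untouched, so paragraph structure is preserved.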



Henning Holthusen  Identity Verified
Philippines
Local time: 19:59
English to German
+ ...
Better in Word Apr 23, 2012

FarkasAndras wrote:

I don't see how this took you to the conclusion that PDF in Studio is a bad idea, though. In my opinion, if you want to create a .doc that follows the formatting of the pdf, the best way to process PDFs is through Studio. I mean, what else do you do? Recreate the document from scratch?


I convert the PDF with FineReader or Solid PDF Converter and translate it with Workbench in Word.
The advantage is that any formatting issues that lead to disastrous layout problems or (critically!) disappearing text are immediately obvious and can be addressed at once.
If you do it in Studio, the worst case scenario is that after a long translation process you end up with a document with a layout that is screwed up beyond your ability to repair it or invisible text or text in the wrong place which you fail to spot.

These kinds of problems are of course going to present themselves mostly with documents with complex formatting such as tables and columns.

[Edited at 2012-04-23 10:43 GMT]


FarkasAndras
Local time: 12:59
English to Hungarian
+ ...
Not if you are a smart CAT user Apr 23, 2012

Henning Holthusen wrote:

If you do it in Studio, the worst case scenario is that after a long translation process you end up with a document with a layout that is screwed up beyond your ability to repair it

If you read my little description, you'll notice that this can't happen with the right workflow. BTW exporting the target file as a test before starting work is something you must do in Studio every time, including .doc source files.

Minor flaws are always possible, but they are fixed during the meticulous proofreading that you must always do with PDF files, whatever method you used to translate them.

The advantage of your method is that you can see what's wrong right away as you're translating... the downside is that the likelihood of something going wrong is much higher (there is a reason why SDL dumped Workbench). I'll stick with Studio.



Antonín Otáhal
Local time: 12:59
Member (2005)
English to Czech
+ ...
infix Apr 23, 2012

Repeating myself on this forum, I point out this solution:

http://iceni.cachefly.net/infix/DemoMovies/Translation/Translation.swf


In principle, it stores the "PDF-creating" info in an XML format and lets you translate the language content in the XML file, after which the translated PDF file is re-created.

Of course, it does not work for "scanned image" PDF files.

Antonin


FarkasAndras
Local time: 12:59
English to Hungarian
+ ...
Could be nice if you don't have Studio... Apr 23, 2012

Antonín Otáhal wrote:

Repeating myself on this forum, I point out this solution:

http://iceni.cachefly.net/infix/DemoMovies/Translation/Translation.swf


In principle, it stores the "PDF-creating" info in an XML format and lets you translate the language content in the XML file, after which the translated PDF file is re-created.

Of course, it does not work for "scanned image" PDF files.

Antonin


...but this thread is about how to handle pdf files if you work with Studio and I hardly think infix is relevant in that scenario.



Antonín Otáhal
Local time: 12:59
Member (2005)
English to Czech
+ ...
infix + studio Apr 23, 2012


FarkasAndras wrote:

...but this thread is about how to handle pdf files if you work with Studio and I hardly think infix is relevant in that scenario.


As long as you can translate xml files using the Studio (which you can), this workflow is probably the best route anyway. I have never used the pdf tool of Studio myself, but (or should I say because?) I have seen a number of programs trying to reverse-engineer pdf files and I am pretty sure their shortcomings are inherent, not technical.

So using the tool, even if it is added to the Studio package for no extra fee, does not make much sense (in my eyes).

Antonin


FarkasAndras
Local time: 12:59
English to Hungarian
+ ...
Infix vs Studio Apr 23, 2012

Antonín Otáhal wrote:

I have never used the pdf tool of Studio myself, but (or should I say because?) I have seen a number of programs trying to reverse-engineer pdf files and I am pretty sure their shortcomings are inherent, not technical.


Perhaps you should reserve judgement until you try it then. It is really good. Not perfect by any means, but good enough to be very useful a lot of the time.

Antonín Otáhal wrote:

As long as you can translate xml files using the Studio (which you can), this workflow is probably the best route anyway.

One problem I can see is that you get a PDF file at the end (right?)
So far, my clients have always asked for Word files (which they can post-edit if they wish). Even if you wanted to submit your final translation in PDF, getting a PDF from your CAT would make life difficult. It'd be much easier to work with a doc and generate a pdf at the end. If you find errors when proofreading the final pdf, you have to go back to your CAT to fix them. And what do you do when the layout is not right, which happens often (the target text may end up being shorter or longer than the source text, and you get funky page breaks or you might need to resize the column widths in a table to make things fit correctly etc.)?

So no, I don't think I'd ever consider using this workflow. It is enticing, but too limiting and risky.

[Edited at 2012-04-23 15:09 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 12:59
English
Not so easy... Apr 24, 2012

FarkasAndras wrote:

BTW Paul, can you ask the developers to fix spaces in the PDF import filter? Studio tries to get word spacing "right" by entering two spaces and specific-width (extra narrow and extra wide) spaces between words. This is the source of 90% of surplus tags in my pdf files, and it should be trivially easy to fix in the import filters. In running text, anything between certain limits should be treated as a normal single space.


Hi Farkas,

We actually use a third-party tool for this... Solid Converter... as you probably know, so this isn't something we can simply do ourselves. Two questions though:

1. This doesn't happen with every file, does it?
2. Can you give me a sample I can use to repro? Then we'll put a fix into the works.

Regards

Paul

