PDF file in Trados Studio 2014 - How do I proceed? (New to Trados)
Thread poster: Paula DeFilippo

Paula DeFilippo  Identity Verified
United States
Local time: 06:49
Member (2015)
French to English
Mar 7, 2015

I am a bit sleep-deprived, as I just stayed up all night struggling with a huge file I had translated using my newly-acquired Trados Studio 2014. I completed the file, then discovered that a simple little paragraph mark at the end of the source document was preventing Trados from processing the translation. I wasted a lot of time trying to work around it, then ended up manually transferring each line to the template supplied by the client, tags and all. Now I have received a file in PDF form with charts, graphs, etc., and I have decided to humbly ask for advice and assistance from more experienced translators before jumping into this one!
Any advice for me on how to translate this PDF file in Trados would be greatly appreciated and I would be indebted to you forever! It appears to be a true PDF file and not a scan, but I don't even know how to determine that small fact (I am a newbie). The client has specified that I should not OCR the file, so that seems like an issue as well. Thank you from the bottom of my heart!

[Edited at 2015-03-07 15:29 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 12:49
English
I think I would have to say... Mar 7, 2015

Paula DeFilippo wrote:

The client has specified that I should not OCR the file, so that seems like an issue as well.



... that your client is a little uneducated in this area. PDF files are not translatable formats. They are formats created to allow a user to read the content on any computer without requiring the native application that was used to create the original source. Nobody creates material directly into a PDF. So they would do you a big favour, and themselves, if they were to listen to you and provide you with the original source instead.

If you don't OCR then these are your options (or variations on these I think):

  1. Retype the text and return a text-only file to them
  2. Run it through a CAT tool, which will probably convert the PDF to DOCX or text anyway. The DOCX will be a "best attempt" at matching the formatting in the PDF
  3. Use something like InFix which will attempt to extract the text to a TXT or XML format. You then translate this file instead and finally import it back into the PDF. Potentially the best results and the client gets the PDF back.

The sort of content you have now will make the first option impossible without scanning the images to add into the Text. The second option will attempt to extract the images and put them into the DOCX so you can tidy it up afterwards. The last option would be better if it works... worth a try using the free trial first.

If the DOCX is the preferred route for you then I would open the PDF in Studio and immediately save the source file (or the target if you don't fill it with translations from your TM). This will give you the DOCX. Now tidy up the DOCX to get rid of all the incorrect paragraph breaks, silly kerning that adds unneeded tags, etc. Then translate the tidied-up DOCX instead.
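
As a rough illustration of the tidy-up step, here is a minimal Python sketch that rejoins paragraphs which a PDF conversion has split mid-sentence. It works on a plain list of paragraph strings rather than on the DOCX itself. The function name `merge_broken_paragraphs` and the rule "a paragraph only ends at sentence-final punctuation" are assumptions made for this example, not anything Studio does. Headings without final punctuation will be merged into the text that follows them, so the result still needs a manual check.

```python
# Punctuation treated as a legitimate paragraph ending (an assumption for this sketch).
SENTENCE_END = (".", "!", "?", ":", ";")


def merge_broken_paragraphs(paragraphs):
    """Join paragraphs that were broken before the sentence finished."""
    merged = []
    carry = ""  # text collected so far for the paragraph being rebuilt
    for para in paragraphs:
        text = para.strip()
        if not text:
            continue  # drop empty paragraphs left over from the conversion
        carry = f"{carry} {text}" if carry else text
        if carry.endswith(SENTENCE_END):
            merged.append(carry)
            carry = ""
    if carry:
        merged.append(carry)  # keep any trailing text that never reached an ending
    return merged
```

Fewer spurious paragraph breaks should also mean fewer split segments when the tidied file is opened for translation.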

But as I said at the start ... ask your client to stop being silly and give you the source file. I hope you charge them for all the messing around!
Regards

Paul
SDL Community Support



Siegfried Armbruster  Identity Verified
Germany
Local time: 12:49
Member (2004)
English to German
Thank you for your clear words Mar 8, 2015

SDL Community wrote:

Use something like InFix which will attempt to extract the text to a TXT or XML format. You then translate this file instead and finally import it back into the PDF. Potentially the best results and the client gets the PDF back.

We are using Infix PDF Editor for these cases and it actually gives excellent results. It is not too difficult to learn, but it means additional work and you should charge extra for it.

SDL Community wrote:

But as I said at the start ... ask your client to stop being silly and give you the source file. I hope you charge them for all the messing around!

In my opinion the best solution.



    Emma Goldsmith  Identity Verified
    Spain
    Local time: 12:49
    Member (2010)
    Spanish to English
    Silly clients and silly instructions Mar 8, 2015

    SDL Community wrote:
    ask your client to stop being silly and give you the source file.

    I wish it were that easy.
    Some files have changed hands so many times that it's impossible to locate the editable source file.
    Some files are scanned to protect their content from further editing (no one thinks of the poor translator).
    Some files are scanned because they've been signed, and, again, the signature is more important than the translator.
    Some files are still sent by fax, and my client can't go back to a government department to ask for the original.

    Paula DeFilippo wrote:

    The client has specified that I should not OCR the file, so that seems like an issue as well.



    Agencies instruct translators not to use OCR because they've seen too many OCR-contaminated translations and know how much work is involved in correcting this. Typical problems:

    1. The delivered translation is poorly formatted. Text runs in never-ending single-character columns / margins are off the page / footers are inserted into the main text, etc.

    2. Typical OCR interpretation errors: Alphanumerical strings - Part XO3dl001 - or an unusual date format 11Dec2011 are particularly error-prone. Number errors are common too. A 5 can look like a 6 or an 8 in poor quality scans.

    3. Scanned noise is inserted as rogue commas or apostrophes.

    4. Signatures over printed names cause errors in those names.

    All these problems can be overcome by translators who know how to format Word documents properly, and take time to check the translated file very thoroughly. But agencies don't always hire such meticulous translators.

    Their solution: instruct translators not to use OCR.
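
For anyone who does end up checking OCR output, the alphanumeric problem in point 2 lends itself to a simple automated first pass. The Python sketch below flags tokens that mix digits with letters that resemble digits, such as the "XO3dl001" example above. The function name `suspicious_tokens` and the list of look-alike letters are assumptions for illustration only. It narrows down where to look and does not replace checking against the original PDF.

```python
import re

# Letters that are easily confused with digits in OCR output (assumed list):
# O/o ~ 0, I/l ~ 1, S ~ 5, B ~ 8, Z ~ 2.
LOOKALIKES = set("OoIlSBZ")

# A token is any unbroken run of ASCII letters and digits.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def suspicious_tokens(text):
    """Return tokens that contain both a digit and a look-alike letter."""
    flagged = []
    for token in TOKEN_RE.findall(text):
        has_digit = any(ch.isdigit() for ch in token)
        has_lookalike = any(ch in LOOKALIKES for ch in token)
        if has_digit and has_lookalike:
            flagged.append(token)
    return flagged
```

A date such as 11Dec2011 is not flagged by this rule because none of its letters are in the look-alike list, so dates and plain numbers still need to be compared by eye.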



    Irene Johnson  Identity Verified
    France
    Local time: 12:49
    Member (2014)
    French to English
    PDF and OCR shouldn't be problems Mar 8, 2015

    Hi Paula,

    First, it's not true that you can't use a PDF file in Trados. Trados will read true PDF files. It won't read the PDF files that are made from scanned documents, though. As long as the PDF contains actual text instead of an image of text, you should be okay. Just check the layout afterwards to make sure Trados didn't mix things up.
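
The quickest manual test is to open the PDF in a reader and try to select or search for a word. If that works, there is a text layer. For checking many files, the same idea can be roughly approximated in Python using only the standard library. The sketch below looks for font resources, either in the raw file or inside Flate-compressed streams, as a stand-in for "contains actual text". The function name `looks_like_text_pdf` and the use of `/Font` as the signal are assumptions for illustration. A scan that has already been made searchable by OCR also contains fonts, and encrypted files may hide them, so treat the answer as a hint only.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_text_pdf(data):
    """Rough heuristic: True if the PDF bytes reference any font resource."""
    if b"/Font" in data:
        return True
    # Newer PDFs may keep object dictionaries inside compressed streams.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image), skip it
        if b"/Font" in inflated:
            return True
    return False
```

To try it, read the file in binary mode and pass the bytes to the function, for example `looks_like_text_pdf(open("file.pdf", "rb").read())`.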

    Second, the reason clients don't want you to use OCR is that some people who use it aren't aware of/don't pay attention to/don't know how to fix OCR errors.

    In my opinion, as long as you do a professional job and give your client satisfaction, the means you use to do it isn't really your client's business, as long as you're not breaching any confidentiality or other agreements. Unless your client has access to your computer, they won't know you used OCR.

    If you have very good OCR software (I recommend ABBYY FineReader), you can certainly use it. Then proofread the text you get from your OCR to make sure it didn't make any mistakes. For complex documents, you'll have to rework the page setup in Word afterwards. Also, to eliminate some of the problems with OCR, such as frames, paragraph formatting and tags, use TransTools, which you can get here: http://www.translatortools.net/about.html

    If you're forced to work without CAT tools, you'll likely spend about twice as long on your translation, and you should bill accordingly.

    My two cents.



    Paula DeFilippo  Identity Verified
    United States
    Local time: 06:49
    Member (2015)
    French to English
    TOPIC STARTER
    Thank you all so much! Mar 8, 2015

    Thank you all for your kind responses - I now have a better understanding of the entire issue! I have been so frustrated, and I felt like there was something I was missing, so I wasted hours and hours trying to figure it out on my own - too stubborn/embarrassed to ask for help. Thank you all for taking the time to help me. I hope some day to be able to do the same for you!


    SDL Community  Identity Verified
    United Kingdom
    Local time: 12:49
    English
    All very good points Emma... Mar 8, 2015

    Emma Goldsmith wrote:

    SDL Community wrote:
    ask your client to stop being silly and give you the source file.

    I wish it were that easy.
    Some files have changed hands so many times that it's impossible to locate the editable source file.
    Some files are scanned to protect their content from further editing (no one thinks of the poor translator).
    Some files are scanned because they've been signed, and, again, the signature is more important than the translator.
    Some files are still sent by fax, and my client can't go back to a government department to ask for the original.



    ... and I didn't mean to ignore this. I think when Paula said they told her not to OCR I just put everything she said together and came to my narrow conclusion.

    If the scenario is one of the above then clearly OCR may be the only sensible way to go, or just tell the client you can't do it. If they make it too difficult what can they expect!!

    Regards

    Paul
    SDL Community Support



    NeoAtlas
    Spain
    Local time: 12:49
    English to Spanish
    Thanks Siegfried… Mar 9, 2015

    Siegfried Armbruster wrote:
    We are using Infix PDF Editor for these cases and it actually gives excellent results. It is not too difficult to learn, but it means additional work and you should charge extra for it.


    I was actually thinking of getting one licence of Infix Pro and I needed some feedback.

    Here is a nice video showing the translation workflow with this tool: http://www.iceni.com/infixcompare.htm (follow the link on the line "Translate PDFs via XML import / export (movie)").

    Regards,

    ... Jesús Prieto ...

