Scanned PDF File - Do I have to copy this to Word?
Thread poster: Ashley Wans

Ashley Wans  Identity Verified
United States
Local time: 02:19
Spanish to English
Apr 9, 2011

Hi all,

I am new to Trados and have several projects scanned into PDFs, which the client wants delivered as clean and unclean Trados files. Is there any way to get Trados to read these documents, or do I need to type the source text into Word? I have tried adjusting the Project Settings, but the editing screen still shows only the file name and will not load any of the actual document content.

Any help with this would be greatly appreciated. I am looking through the "Help" section on PDFs to try to find more info on this.



Manuela Ribecai  Identity Verified
Italy
Local time: 11:19
Member (2010)
English to French
You can use OCR software Apr 9, 2011

I don't use Trados, but you don't have to retype everything.
At worst, if Trados doesn't recognize the PDF documents, you can use an OCR program (OmniPage, for example) to convert them into .doc files, which Trados will recognize.

I hope this helps, despite my poor English,

Manuela



Ashley Wans  Identity Verified
United States
Local time: 02:19
Spanish to English
TOPIC STARTER
Your English is great. Apr 9, 2011

Thanks! I was actually just finding this answer through Google as you posted it.



Emma Goldsmith  Identity Verified
Spain
Local time: 11:19
Member (2010)
Spanish to English
A word of warning Apr 9, 2011

When you convert from PDF to Word using OCR and then want to process the file in TagEditor or Studio, you must be careful to get rid of the unnecessary tags created in the process.
There are many posts on this topic, for example: http://www.proz.com/forum/sdl_trados_support/196297-tags_tags_everywhere.html#1720615
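Word stores text in "runs" of identically formatted characters, and OCR output often breaks a sentence into many small runs with slightly different fonts or spacing; formatting changes between runs are what usually surface as tags in a CAT tool. The Python sketch below is only a rough diagnostic, not a Trados or Studio feature: it counts the runs in each paragraph of a .docx file so you can spot heavily fragmented paragraphs before importing. It assumes the standard .docx layout (a ZIP archive whose main text lives in `word/document.xml`); the function names and the threshold of 10 are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the main WordprocessingML part of a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def count_runs(document_xml) -> list[int]:
    """Return the number of runs (w:r) in each paragraph (w:p) of document.xml.

    Only runs that are direct children of a paragraph are counted; runs nested
    inside hyperlinks or fields are skipped, which is fine for a rough check.
    """
    root = ET.fromstring(document_xml)
    return [len(para.findall(f"{W}r")) for para in root.iter(f"{W}p")]


def count_runs_in_docx(path: str) -> list[int]:
    """Open a .docx file (a ZIP archive) and count runs per paragraph."""
    with zipfile.ZipFile(path) as archive:
        return count_runs(archive.read("word/document.xml"))


def fragmented_paragraphs(path: str, threshold: int = 10) -> list[int]:
    """Indexes of paragraphs whose run count exceeds a hypothetical threshold."""
    return [i for i, n in enumerate(count_runs_in_docx(path)) if n > threshold]
```

For example, `fragmented_paragraphs("converted.docx")` would list the paragraphs worth cleaning up (or retyping) before the file goes into TagEditor or Studio.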



Pavel Tsvetkov  Identity Verified
Bulgaria
Local time: 12:19
Member (2008)
English to Bulgarian

MODERATOR
Use FineReader and save target as *.txt Apr 9, 2011

The best OCR program on the market is called FineReader, but never export the recognized text as a *.doc, because it will be full of formatting that Trados cannot handle. Instead, export as a txt file and apply your own formatting in Word if you have to.

I hope that helps!
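If you export plain text as suggested, the .txt file may keep the scanned page's line breaks, which can leave sentences split across separate segments unless the lines are joined first. Below is a minimal Python sketch of that clean-up, assuming paragraphs in the export are separated by blank lines (check how your OCR program actually exports); the function name is hypothetical. It also rejoins words split with a hyphen at a line end, which will wrongly merge genuine compounds such as "well-known" if they happen to break at a line end, so review the result.

```python
import re


def unwrap_ocr_text(raw: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs (blank lines = paragraph breaks)."""
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    cleaned = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if text.endswith("-") and line[:1].islower():
                # Hyphen at line end + lowercase next line: treat it as one word
                # split by the line break and rejoin the two halves.
                text = text[:-1] + line
            elif text:
                text += " " + line
            else:
                text = line
        cleaned.append(text)
    return "\n\n".join(cleaned)
```

The result can then be pasted into Word and formatted there before it goes into Trados.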



Ashley Wans  Identity Verified
United States
Local time: 02:19
Spanish to English
TOPIC STARTER
Thanks for the recommendation Apr 9, 2011

I grabbed the 15-day trial from the ABBYY website. If it works, I will go ahead and invest in the full version. Thanks for the input!



Ashley Wans  Identity Verified
United States
Local time: 02:19
Spanish to English
TOPIC STARTER
Creating clean and unclean files when using OCR Apr 9, 2011

The ABBYY software is pretty good, but in some places it misreads the document and misspells words.

I have a related question for those of you who use Trados. When a client requests both clean and unclean files, do you correct all the spelling errors in the source text before loading it into Trados, or do you leave them and just make sure your translation is correct? Since one version is "unclean", is it acceptable to leave the misspellings in the source text, or should they all be fixed, since the client will see it?



Oliver Walter  Identity Verified
United Kingdom
Local time: 10:19
Member (2005)
German to English
About spelling errors Apr 9, 2011

Ashley Wans wrote:
The ABBYY software is pretty good, but in some places it misreads the document and misspells words.

Yes, I use an earlier version of FineReader (FR) and find it quite good. Spelling errors from the OCR process: it's up to you to correct these, either with the editing facilities of FR or in the word processor (probably MS Word) into which you export the text.

You may be able to reduce the number of spelling errors the OCR program makes by adjusting the contrast and brightness of the image on which the OCR is done. My FR version can't process PDFs directly, so I typically print them on paper and use a scanner to produce such images (I have very little such work, so I haven't bought a more recent version of FR). If you want to adjust contrast and brightness but FR itself can't do that (I don't know whether it can), you could copy the image off the screen and adjust it in an image editor.
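To illustrate what a contrast and brightness adjustment does to a scan, here is a minimal Python sketch, assuming an 8-bit grayscale image flattened into a plain list of pixel values (0 = black, 255 = white). The function name and parameters are hypothetical; an image editor applies the same kind of per-pixel mapping to the whole image, though its exact formula may differ.

```python
def adjust_contrast_brightness(pixels, contrast=1.0, brightness=0):
    """Apply a linear contrast/brightness change to 8-bit grayscale values.

    contrast > 1 pushes values away from mid-gray (128); brightness shifts
    every value up or down. Results are rounded and clamped to 0..255.
    """
    out = []
    for p in pixels:
        value = (p - 128) * contrast + 128 + brightness
        out.append(max(0, min(255, round(value))))
    return out


# Faint gray text (100) on a light gray background (200), contrast raised:
print(adjust_contrast_brightness([100, 200], contrast=1.5))
```

Raising the contrast widens the gap between faint text and the background, which can make characters easier for the OCR program to tell apart; push it too far and thin strokes may disappear, so it is worth trying a few settings.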


I have a related question for those of you who use Trados. When a client requests both clean and unclean files, do you correct all the spelling errors in the source text before loading it into Trados, or do you leave them and just make sure your translation is correct? Since one version is "unclean", is it acceptable to leave the misspellings in the source text, or should they all be fixed, since the client will see it?

Spelling errors in the original document: I would leave them there but, of course, translate what the text should have said. When I do this, I also send my client a short separate document (or comments in the email) pointing out those errors.

Oliver



Mack Tillman
Local time: 11:19
German to English
Online conversion from PDF to .doc or other formats Apr 9, 2011

Hi,

try experimenting with these sites:
www.online-convert.com: converted one of my PDF files with quite good quality.
www.zamzar.com
or just search Google:
http://www.google.com/search?num=100&hl=de&newwindow=1&safe=active&q=free%20online%20file%20conversion&aq=f&aqi=g1&aql=&oq=


The nice thing about these online tools is that you don't have to install any software on your own system, which only clutters your computer's registry further.

Spelling Mistakes
I agree with Oliver Walter. I tend either to comment on spelling mistakes using Word's own comment function or to draw up a short document listing all the mistakes and any questions that arise. Depending on the deadline, I either send these along with the translation or wait until I've collected several and then send them to my customer for comment.



Ashley Wans  Identity Verified
United States
Local time: 02:19
Spanish to English
TOPIC STARTER
Thanks Oliver and Mack Apr 9, 2011

I think I will probably go with the option of drawing up a sheet indicating where the errors are. Thanks to you both for your input.


Emma Goldsmith  Identity Verified
Spain
Local time: 11:19
Member (2010)
Spanish to English
spelling errors Apr 9, 2011

I'm afraid I don't agree.
If the spelling errors are indeed errors from the OCR (yes, there are usually lots of them) and your client has requested unclean file delivery, then you should correct the spelling in the source text too, because otherwise the unclean file (and the updated TM) will be of no use to the client.

If, however, you are talking about spelling mistakes in the original PDF, then I agree with Oliver and Mack: you should just comment on them.



Pavel Tsvetkov  Identity Verified
Bulgaria
Local time: 12:19
Member (2008)
English to Bulgarian

MODERATOR
Spelling Errors Apr 10, 2011

You should correct spelling mistakes manually in the OCR software, before exporting to txt. FineReader walks you through the errors and also magnifies the source so you can see each error in context.

It makes no sense to work with a text full of errors: they cannot be corrected within Trados, and they will ultimately end up in your TM and skew match percentage calculations later on.
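To see why OCR errors left in the TM distort match percentages, here is a rough Python illustration. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity score; Trados computes fuzzy matches with its own algorithm, so the numbers will differ, but the effect is the same in kind: a TM entry stored with an OCR error no longer gives a 100% match for the correctly spelled sentence. The function name and example sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def match_percentage(new_segment: str, tm_source: str) -> int:
    """Rough 0-100 similarity between a new source segment and a TM source.

    difflib's character-based ratio is only a stand-in; it is not the
    fuzzy-match formula Trados uses.
    """
    return round(SequenceMatcher(None, new_segment, tm_source).ratio() * 100)


correct = "El paciente presenta fiebre alta."
ocr_garbled = "El pacierite presenta fiebre alta."  # "n" misread as "ri"

print(match_percentage(correct, correct))      # identical text: 100
print(match_percentage(correct, ocr_garbled))  # below 100 because of the OCR error
```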

I always make sure I correct the files (OCR-produced or not) before starting to translate them.

Good luck!



Ashley Wans  Identity Verified
United States
Local time: 02:19
Spanish to English
TOPIC STARTER
Agreed. Apr 10, 2011

I did end up correcting all the errors first. I actually closed Trados twice, because while translating I spotted errors that I had missed in my initial proofread of the documents. It was rather tedious work, but I think I would have been doing the client a disservice by sending them a file full of errors.

That said, next time I will know to adjust my rates to account for the extra time this takes. Lesson learned!



