working with .pdf and .jpg files
Thread poster: Ilze Klotina

Ilze Klotina
Latvia
Local time: 08:40
Japanese to Latvian
+ ...
Sep 16, 2010

I would be grateful for some help concerning the following:

I work as a freelancer, so I often don't really "see" the documents I have to translate. For example, many documents are sent to me as .jpg or .pdf files, i.e., scanned. My brother is a programmer, so I asked him whether I can convert those files somehow. He said that it is possible, but the result won't be as good as I would like. The problem is that translating scanned files takes a lot of time: the source text is not editable, and I have to create the translation as a separate text document. If it is only a one-page certificate, that's fine... but I have also had some quite large scanned files, and it really tires me out...
I work with OpenOffice, and the latest version has an option to edit .jpg and .pdf files in OpenOffice Draw. However, it is very slow: if the file is big, I have to wait 20 minutes for the program to convert it. And it is impossible to use OmegaT with those files, so it takes a lot of time anyway...

Do any of you know how to convert scanned files to make them editable?


 

Susan Welsh  Identity Verified
United States
Local time: 01:40
Member (2008)
Russian to English
+ ...
converting PDFs Sep 16, 2010

There is discussion of this in various other forums in the archives, as it is not an issue for OmegaT as such. No CAT tool can work directly with PDFs, to my knowledge.

You can convert PDFs to plain text using Adobe Reader, but the results can be quite poor, especially for something with complex formatting, like a certificate.

I use ABBYY PDF Transformer, which converts PDFs to .rtf or .txt (it says it converts to .doc, but it's not a "real" Microsoft .doc file, it's really .rtf, as I understand it). The results are quite variable. You have the option of converting the file through an OCR procedure, which works the best. (The better, more expensive tool for OCR is ABBYY Finereader.) For a rather simply formatted document, this works okay. For a certificate, or anything with lots of graphics and tables and text boxes, I have not found it satisfactory.

I now take the advice of various people from the OmegaT group, and convert the PDF to plain text, then reformat it. It's less grief all around.

Search in the forum archives for "converting PDFs," and you'll find plenty more advice.

Good luck. PDFs are a pain.


 

Graeme Waller  Identity Verified
Finland
Local time: 08:40
Finnish to English
+ ...
One workaround Sep 16, 2010

ilzeilze wrote:

I would be grateful for some help concerning the following:

I work as a freelancer, so I often don't really "see" the documents I have to translate. For example, many documents are sent to me as .jpg or .pdf files, i.e., scanned. My brother is a programmer, so I asked him whether I can convert those files somehow. He said that it is possible, but the result won't be as good as I would like. The problem is that translating scanned files takes a lot of time: the source text is not editable, and I have to create the translation as a separate text document. If it is only a one-page certificate, that's fine... but I have also had some quite large scanned files, and it really tires me out...
I work with OpenOffice, and the latest version has an option to edit .jpg and .pdf files in OpenOffice Draw. However, it is very slow: if the file is big, I have to wait 20 minutes for the program to convert it. And it is impossible to use OmegaT with those files, so it takes a lot of time anyway...

Do any of you know how to convert scanned files to make them editable?


Clients often send me scanned PDF files as source files. First I ask them if they can send me a doc, rtf or txt file. If they cannot, I use OCR (Optical Character Recognition) software to get a doc or txt source file, tidy it up to match the original as closely as possible and run a spell check to fix spelling mistakes. Where there are a lot of complex graphics, I set the OCR software to produce text only, unformatted. I then closely proofread / check my source files against the original file.
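
As a small aid for the proofreading step described above, here is a minimal sketch of a script that lists tokens mixing letters and digits (such as "c0nvert"), a pattern OCR errors often produce. The function name `flag_suspicious_tokens` is made up for this example, and the heuristic is only an assumption about what is worth a second look: it will also flag legitimate tokens such as "A4", and it does not replace a spell check or a comparison against the scan.

```python
import re

# A token is "suspicious" if it contains at least one letter and at least one digit.
# [^\W\d_] matches any Unicode letter, so accented text is covered as well.
MIXED_TOKEN = re.compile(r"\b(?=\w*[^\W\d_])(?=\w*\d)\w+\b")


def flag_suspicious_tokens(text):
    """Return (line_number, token) pairs for tokens that mix letters and digits."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in MIXED_TOKEN.finditer(line):
            flagged.append((line_number, match.group()))
    return flagged


if __name__ == "__main__":
    sample = "The c0nvert step\nis fine on page 12\nA4 paper"
    for line_number, token in flag_suspicious_tokens(sample):
        print(f"line {line_number}: check '{token}'")
```

The output is only a list of places to look at by hand in the OCR text.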

All this is very time-consuming but in some cases works quite well. I explain to the client about the necessary preprocessing and charge them at a higher word rate. If they felt unable to pay the higher rate, I would just have to turn down the work.

Sometimes my OCR software will not accept the pdf file (I have not yet had to process jpegs). I have also found using OpenOffice Draw terribly slow and frustrating. On at least one order I ended up typing in the source file.

If anyone has a workaround, I would be very grateful to hear it too.

By the way, there might be an OCR program with your printer software. I have two OCR programs, both of which came with hardware.

[Edited at 2010-09-16 21:31 GMT]

[Edited at 2010-09-17 12:18 GMT]


 

Isaac Verdú  Identity Verified
Venezuela
Local time: 03:40
English to Spanish
+ ...
Not exactly cheap, but it can be done. Sep 16, 2010

SDL Trados Studio 2009 works directly with PDFs. It's not quite as comfortable as working with editable formats: it takes a bit longer to convert the document for translation, there are a lot of tags in the editor window, and the output is a .doc file that you will have to clean up. Still, it's a fairly painless process.

However, I haven't done this with scanned files, so I'm not sure whether Trados itself performs OCR. What I do know is that Acrobat has a built-in OCR option, which creates a new "text-based" PDF document while keeping everything else as it is. I suppose you could do this, then process it with Trados, and be translating a lot sooner.


 

John Fossey  Identity Verified
Canada
Local time: 23:40
Member (2008)
French to English
ABBYY Transformer Sep 17, 2010

I have been using ABBYY Transformer for a couple of years, with very good results. However, I have found that it's best not to let it use its "automatic" setup feature but to spend a few minutes going through the documents, manually identifying where the graphics and text are. With some practice I find it reasonably quick - maybe 1 minute or less per page.

Only very occasionally - with a document that is very heavy on images and graphics - can it not produce a document that's quite close to the original. In that case I will only select the text and produce a text-only document for the client.

For .jpg and other image files, I have the free version of CutePDF installed as a printer, so will open the image in whichever program suits and print it to a .pdf file, then transform the .pdf with ABBYY Transformer.


 

Susan Welsh  Identity Verified
United States
Local time: 01:40
Member (2008)
Russian to English
+ ...
ABBYY Transformer @John Sep 17, 2010

John Fossey wrote:

I have found that it's best not to let it use its "automatic" setup feature but to spend a few minutes going through the documents, manually identifying where the graphics and text are.

John, I always use the "manual" setup, but the converted document comes out with the text and graphics already marked off in boxes. I have never figured out what I was supposed to do with them, since they are already there, and--almost always--correct. Yet the document, when opened in Word, is almost invariably messed up in one way or another. What am I missing?


 

Olieslagers
French Polynesia
Local time: 19:40
Dutch to French
+ ...
ABBYY PDF transformer Sep 17, 2010

I find ABBYY Transformer a great tool for converting PDFs into RTF, even for graphics, tables and text boxes. But for a good result it is absolutely necessary to use the manual setup and also to check for typos before starting to translate. I have noticed that OCR tools have difficulty telling the end of a line from the end of a sentence...
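
The end-of-line problem mentioned above can be reduced before translation with a small cleanup script. What follows is a minimal sketch, not a feature of ABBYY or OmegaT: the function name `unwrap_ocr_text` is made up, and it assumes that blank lines in the OCR output mark real paragraph breaks and that a trailing hyphen followed by a lowercase letter means a word was split across lines. Both assumptions will sometimes be wrong (genuinely hyphenated compounds, for example), so the result still needs checking against the scan.

```python
import re


def unwrap_ocr_text(text):
    """Join hard-wrapped lines inside each paragraph; blank lines separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                # Word split across a line break: drop the hyphen and glue the halves.
                joined = joined[:-1] + line
            elif joined:
                # Ordinary line break inside a sentence: replace it with a space.
                joined += " " + line
            else:
                joined = line
        cleaned.append(joined)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The certifi-\ncate was issued\nin Riga.\n\nSecond para-\ngraph here."
    print(unwrap_ocr_text(sample))
```

With the line breaks removed inside paragraphs, a CAT tool that segments by sentence is less likely to cut a sentence in the middle.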

 

Ilze Klotina
Latvia
Local time: 08:40
Japanese to Latvian
+ ...
TOPIC STARTER
got ABBYY FineReader Sep 20, 2010

Thank you for the suggestions.
I also checked other forum topics and finally bought ABBYY FineReader Pro. It works great but quite slowly: converting one page takes about 10 min. Is that normal?


 

Quang Ngo
Local time: 12:40
English to Vietnamese
+ ...
Try Nitro PDF Reader to extract text and images Mar 2, 2011

I would suggest trying Nitro PDF Reader, a powerful tool that can extract text and images from .pdf files. You can download Nitro PDF Reader (100% free) at http://www.nitroreader.com/download/. For translating .pdf files, I use this program to extract the source text into Notepad, and then transfer it to a Word file using copy and paste. After some minor modifications, I can start translating using OmegaT. Hope this will help.
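
Since OmegaT can also read plain text files, the extracted text could be saved straight into a project instead of going through Notepad and Word. This is a minimal sketch under stated assumptions: the function name `save_for_omegat` is made up, the project is assumed to use OmegaT's default `source` folder, and the text is assumed to be already extracted by some other tool. It skips the Word step described above, so any formatting work done there would have to happen afterwards.

```python
from pathlib import Path


def save_for_omegat(text, project_dir, name):
    """Write extracted text as a UTF-8 .txt file into the project's source folder."""
    source_dir = Path(project_dir) / "source"  # assumed default OmegaT layout
    source_dir.mkdir(parents=True, exist_ok=True)
    target = source_dir / f"{name}.txt"
    # Normalise Windows line endings so the file is consistent everywhere.
    target.write_text(text.replace("\r\n", "\n"), encoding="utf-8")
    return target


if __name__ == "__main__":
    path = save_for_omegat("First line\r\nSecond line", "my_project", "certificate")
    print(f"Saved to {path}")
```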

123Translations
 

Jeremy Rhoads
United States
Local time: 01:40
Spanish to English
PDF editor Jun 19

Hello! Here's a great online resource, https://www.altopdf.com/, which lets you edit and convert PDF files.

 

Milan Condak  Identity Verified
Local time: 07:40
English to Czech
The services are now free Jun 24

Jeremy Rhoads wrote:

Hello! Here's a great online resource, https://www.altopdf.com/, which lets you edit and convert PDF files.


The links are organized into 3 groups and 16 URLs:

MANAGE PDF

Merge PDF
Rotate PDF
Split PDF
Compress PDF
Protect PDF
Unlock PDF
Extract Pages

CONVERT

JPG to PDF
PDF to JPG
PNG to PDF
PDF to PNG

OFFICE & PDF

Word to PDF
PDF to Word https://www.altoconvertpdftoword.com/ (just now: 23,330 Documents Converted)
PDF to Excel
PPT to PDF
PDF to PPT

I converted 3 PDFs and compared the DOCX output with that of other conversion methods. The DOCX files are smaller.
By the time you read this, the count of converted documents will be higher. The services are currently free of charge.

https://www.pdffiller.com/en/terms_of_services.htm

16. YOU AGREE TO PAY THE FEES OWED FOR YOUR USE OF SERVICES

Milan


 

Milan Condak  Identity Verified
Local time: 07:40
English to Czech
Two examples Jun 25

Jeremy Rhoads wrote:

Hello! Here's a great online resource, https://www.altopdf.com/, which lets you edit and convert PDF files.


Conversion of PDF to DOCX and XLSX (in Czech)

http://www.condak.cz/nove/2018-06/24/cs/00.html

PT Magazine-folha_56

01 PT Magazine
02 Conversion of PDF to DOCX
03 PC Translator - translation of DOCX and HTML
04 Conversion of PDF to XLSX

Milan


 

Milan Condak  Identity Verified
Local time: 07:40
English to Czech
Comparison of converted files in OmegaT and DGT-OmegaT Jun 27

Milan Condak wrote:

I converted 3 PDFs and compared the DOCX output with that of other conversion methods. The DOCX files are smaller.



http://www.condak.cz/nove/2018-06/23/cs/00.html

Translation of DOCX files in DGT-OmegaT

Tag removal mode (untagged file view)

Milan


 

DZiW
Ukraine
English to Russian
+ ...
instead Jun 27

Some 10 minutes a page? Even my old low-end Core2Duo processes an English 300 dpi A4 page in 10 seconds at most. It depends on the file, the software, and the PC; just ask your brother to check where the bottleneck might be.
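
One way to look for such a bottleneck is to time the conversion page by page and see whether every page is slow or only a few. This is a minimal sketch: `time_pages` and `slowest_pages` are names made up for this example, and `convert_page` stands for whatever conversion step is being measured (for instance, a call to a command-line OCR tool), which is not shown here.

```python
import time


def time_pages(pages, convert_page):
    """Run convert_page on each page and record how long each call takes."""
    timings = []
    for page in pages:
        start = time.perf_counter()
        convert_page(page)
        timings.append((page, time.perf_counter() - start))
    return timings


def slowest_pages(timings, count=3):
    """Return the `count` slowest (page, seconds) pairs, slowest first."""
    return sorted(timings, key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    def pretend_convert(page):
        # Stand-in for a real conversion call; one page is made artificially slow.
        time.sleep(0.2 if page == "page2.jpg" else 0.01)

    results = time_pages(["page1.jpg", "page2.jpg", "page3.jpg"], pretend_convert)
    for page, seconds in slowest_pages(results):
        print(f"{page}: {seconds:.2f} s")
```

If all pages take about the same time, the software settings or the PC are the more likely cause; if only some pages are slow, the files themselves (resolution, size, heavy graphics) deserve a look.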

Rendering a PDF/JPG may pose a lot of issues for complicated languages, typefaces, and texts, and the result depends heavily on the scan quality and settings. So, instead of mitigating the problems, it's preferable to ask the client for an editable version--or warn(!) and charge for the extra DTP fuss (OCRing, proofreading, formatting, and tag-souping).

In view of NDAs, I'm rather reluctant to use online converters, but for other readable PDFs, the secure and free CAT tool Wordfast Anywhere (freetm.com) also works nicely in my language pairs.

Cheers

[Edited at 2018-06-27 21:02 GMT]


 

Milan Condak  Identity Verified
Local time: 07:40
English to Czech
Wordfast Anywhere includes OCR Jun 28

DZiW wrote:

Some 10 minutes a page?


The conversion in "altopdf" is fast. What is slow is writing my documentation.

DZiW wrote:
for other readable PDFs a secure and free CAT WordFast Anywhere (freetm.com) also works nicely in my language pairs.


We are in the OmegaT forum.

Wordfast Anywhere (WFA) is a free service of Wordfast LLC. WFA can convert non-readable PDFs and pictures, too.

I was at the Luhačovice spa seven years ago.
Here is my presentation on OCRing a TIFF in Wordfast Anywhere and translating a water price list from a café.

http://www.condak.net/wfa/2011-10-03/cs/00.html

Wordfast Anywhere in Luhačovice

Translating a TIFF

01 WFA in Luhačovice
02 WFA settings and TIFF upload
03 OCR and download for correcting wrong characters
04 Upload of the corrected text
05 Translation CS > EN
06 Concordance search
07 Downloading the translated file
08 Opening in WFC and comparing the HTML

Copyright 03.10.2011

Milan Condak
Czech Wordfast Trainer


 

