Can Swordfish process PDF documents?
Thread poster: Thomas Johansson

Thomas Johansson  Identity Verified
Peru
Local time: 13:47
English to Swedish
+ ...
Apr 21, 2011

I will receive a PDF file with approx. 40,000 words and have been asked to process it with a CAT tool while generating a TM (for future versions). Is this something I can do with Swordfish?

Also, I got the impression Swordfish is written in Java (though I am not sure). Is it by any chance slow to work with or does it perform well?

Thomas

[Edited at 2011-04-21 19:12 GMT]


 

Rodolfo Raya  Identity Verified
Local time: 15:47
English to Spanish
No PDF Apr 21, 2011

Hi,

Swordfish doesn't support PDF files. You will have to use an OCR tool to extract the text into a more suitable format (.docx, for example).

Java is not slow; it is as fast as C++. Speed depends mostly on your hardware (memory and processor).

Regards,
Rodolfo


 

Laurent KRAULAND (X)  Identity Verified
France
Local time: 20:47
French to German
+ ...
PDF = pain in the back Apr 22, 2011

Hi Thomas,
while I certainly understand that clients may have their reasons for requesting translations from PDF files, it must be said once again that PDF was designed as a non-editable format.

And if the document is as you described it, there must be an original in an editable and CAT-compatible format somewhere.


 

Thomas Johansson  Identity Verified
Peru
Local time: 13:47
English to Swedish
+ ...
TOPIC STARTER
It is in an "editable" format Apr 22, 2011

Well, it is in an "editable" format, at least in the sense that I can copy the text and paste it into, say, a Word document if I like. So OCR shouldn't really be needed. (I am not sure whether "editable" is the right word here, but it is in one of those modern PDF formats that started appearing a few years ago, where you can e.g. highlight text, copy it and paste it into some other file.)

Given this, is Swordfish still not able to process this? (I would prefer to deliver the translation back to the client as a PDF file, i.e. in the same format as the source file.)

Or otherwise, what CAT tool could process PDF files of this sort?

Thomas


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 20:47
Member (2005)
English to Polish
+ ...
PDF files are best served by OCR Apr 22, 2011

PDF files are best served by OCR, unless your client has tools to convert PDF into some DTP formats that Swordfish can import/export.

OCR processing has its drawbacks, and you have to know the quirks of your OCR software. Many clients do not like the way FineReader formats documents, for example, so I learned to mark blocks for recognition manually to avert their rage.

I once prepared a 24-page PDF for translation manually; it had graphics, tables and occasional Greek characters. Afterwards it looked like the original PDF, but it took me more than two days!

Regards,

Piotr


 

Milos Prudek  Identity Verified
Czech Republic
Local time: 20:47
English to Czech
+ ...
Print into PDF May 6, 2011

"I would prefer to deliver the translation back to the client as a PDF file, i.e. in the same format as the source file."


Here is the workflow:
- Use OCR to convert PDF to MS Word (or read the text and translate it)
- Translate the MS Word file with any CAT
- Print the MS Word file into PDF (OpenOffice can do this)
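
The first and last steps can be scripted. What follows is only a rough sketch under several assumptions: the PDF already has a selectable text layer (so plain text extraction may be enough instead of OCR), Poppler's pdftotext is installed, and LibreOffice's soffice is used for the PDF export (I am not sure the same switches work in every OpenOffice version). The function names are made up for illustration, and extracted plain text will lose the layout.

```python
import shutil
import subprocess


def build_extract_command(pdf_path: str, txt_path: str) -> list[str]:
    """Command line for Poppler's pdftotext.

    Only useful when the PDF has a text layer; scanned PDFs still need OCR.
    """
    return ["pdftotext", "-layout", pdf_path, txt_path]


def build_pdf_export_command(doc_path: str, out_dir: str) -> list[str]:
    """Command line for LibreOffice's headless document-to-PDF conversion."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", out_dir, doc_path]


def run_if_available(command: list[str]) -> bool:
    """Run the command only if its executable is on PATH.

    Returns True when the tool was found and exited with status 0.
    """
    if shutil.which(command[0]) is None:
        return False
    return subprocess.run(command, check=False).returncode == 0
```

With hypothetical file names, run_if_available(build_extract_command("source.pdf", "source.txt")) would pull the text out before translation, and run_if_available(build_pdf_export_command("translated.docx", ".")) would produce the PDF afterwards. The translation itself still happens in the CAT tool in between.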


 

