Pages in topic:   [1 2] >
Reject a job based solely on possible technical problems?
Thread poster: Elisabeth Maurland

Elisabeth Maurland
United States
Local time: 14:40
Member (2013)
Norwegian to English
+ ...
Nov 18, 2016

Last month I had a project of about 15,500 words. I enjoyed it and didn't feel rushed, and it took quite a few days. The document was an MS Word file, and I used Wordfast Pro 4. I am fairly new to this tool.
When I was finally done, I saved it and converted it back to MS Word, and it looked great – at first. The further down into the file I looked, the more jumbled it got. Figures and margins had been moved, text was seemingly missing or in odd places.
It was basically unreadable and useless.

I spent a full day making changes and saving and resaving, loading and reloading, and finally I was able to send off a version that had a couple of minor grammar mistakes that I didn't dare touch, but at least it was readable.

Now I have been asked to do a revision of the project (lots of additions), plus another, new one. I had told the agency that the problem might be that the original file was organized with a mix of section breaks and page breaks, so the new documents only have section breaks (plus one column break). But I am not 100% sure if this was the (only) problem. Maybe the Word files had other issues as well, different margin settings in different sections etc. There are lots of pictures and tables, so there could be multiple different settings.

So I am not sure I dare take this job on. I can't see translating 18,500 words only to have the files ruined at the end. I would like the job – but is it worth the risk, or is there a way to go about it? I feel somewhat confident with MS Word, but I don't know if there is a way for me to ascertain the feasibility of this task.

So there are three questions, which should maybe be in three different forums:

- Should I (would you?) take the risk of taking a job that I need and would love to do, but which has caused major problems in the past, hoping that it's fixed or fixable?
- Is this a problem that people often run into when translating Word files in CAT tools?
- Can it be (easily) fixed? (Already asked in a WF forum elsewhere, but unanswered so far.)

I can also discuss it with the agency, but I don't really know what to tell them. They have already fixed the one issue, but I don't know if there is more.

Any input would be great!


xxxToon Theuwis  Identity Verified
Belgium
Local time: 21:40
English to Dutch
+ ...
Originally PDF? Nov 18, 2016

Never worked with WFP, but when a pdf is converted into Word it often comes with terrible formatting that usually causes problems in the exported translated file. If it is an original Word document it should be less of a problem or no problem at all.

In Trados you can tell by the number of tags in the text. I don't know how this shows up in WFP.

There are tools to remove tags automatically, don't know them by heart. Others might give suggestions about this. That is, if pdf is the actual problem here.



Christine Andersen  Identity Verified
Denmark
Local time: 21:40
Member (2003)
Danish to English
+ ...
Ask them to check the formatting Nov 18, 2016

I don't know anything about Wordfast Pro - I never got it to work, though I liked WF Classic several versions ago.

But it is perfectly reasonable to ask the client to tidy up formatting before you begin.

CAT tools should not have problems with standard Word formatting, but if it has been converted via PDF or some other program, there can be weird boxes and section breaks in it, or spacing that results in a dreadful mess of tags. Set everything with left-hand margins and NOT with straight right-hand margins for a start - you can always tidy up the finished result, or even better, let the client's DTP department do it.

Checking before you start translating would be my advice!

I would very probably accept the job, but insist on a properly formatted source file.

If features like automatic section numbering and forming the Contents page are used, make sure they work.
If there are columns and layout features, check how they are done, and ask the client to set them up properly.
Etc.

Formatting issues can turn an attractive job into a nightmare, so you are right to be cautious.



Elisabeth Maurland
United States
Local time: 14:40
Member (2013)
Norwegian to English
+ ...
TOPIC STARTER
That is possible! Nov 18, 2016

They first sent me PDF files of over 200 pages with lots of pictures and wanted me to give them a quote. I asked for Word files then, and got them, but I guess I don't know which came originally first. Well, originally it would have to be Word or some other word processor? I didn't know you could make a Word file out of a PDF. Then there should be an original Word file somewhere??

There are large amounts of tags in parts of the text (at least, there were in the first project – I can't see all of it easily in the new text, since WFPro moves slowly with such a large document).

Thanks!



Oscar Martin
Spain
Local time: 21:40
English to Spanish
+ ...
Clean up tags Nov 18, 2016

Translator Tools lets you clean this tag soup (http://www.translatortools.net/download.html).

However, ask the client to remove all unnecessary tags.

When converting from pdf to doc/docx, most of the programs try to reproduce the layout as much as possible.

In most cases, the converter adjusts the character spacing or width letter by letter, so you can end up with every letter in a segment wrapped in its own tags.

If possible, save the file as docx, import the file into memoQ and then export it. The memoQ filter for docx files will remove most of these unnecessary tags.

This will change the layout as the same text will occupy more space than before.

A DTP expert can fix these issues but it will take some time for 200 pages.
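
A rough way to gauge how bad the tag soup is, before committing to a job, is to count formatting runs per paragraph in the .docx itself. The sketch below is only an illustration: it assumes a standard .docx (a zip archive whose main text is in `word/document.xml`), and the function names and the threshold of one run per word are made-up assumptions, not anything Wordfast, memoQ, or CodeZapper actually use.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_run_stats(docx):
    """Return a (run_count, text) tuple for each non-empty paragraph.

    `docx` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    stats = []
    for para in root.iter(W + "p"):
        # Count only runs that actually carry text
        runs = [r for r in para.iter(W + "r")
                if any(t.text for t in r.iter(W + "t"))]
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text.strip():
            stats.append((len(runs), text))
    return stats


def suspicious_paragraphs(docx, max_runs_per_word=1.0):
    """Flag paragraphs with more text runs than words (hypothetical threshold)."""
    flagged = []
    for run_count, text in paragraph_run_stats(docx):
        words = max(len(text.split()), 1)
        if run_count / words > max_runs_per_word:
            flagged.append((run_count, text))
    return flagged
```

Calling `suspicious_paragraphs` on the source file (or a copy of it) lists the paragraphs that are likely to show up as tag-heavy segments in a CAT tool. It is a crude count: paragraphs inside text boxes are also counted as part of the paragraph that holds the box, and it says nothing about section breaks or margins.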



B D Finch  Identity Verified
France
Local time: 21:40
Member (2006)
French to English
+ ...
Inspect document Nov 18, 2016

Have you tried using the Document Inspector (in MS Word) on your original source file? I always use this to remove stuff like metadata and hidden text that might cause trouble. It's also worth splitting large source files into several smaller files before loading them into Wordfast. It makes it easier to identify and isolate formatting problems.

If the problem is formatting linked to conversion into Word from PDF (note that Wordfast Pro does handle PDF format if it's not too complex), then the text will be arranged in text boxes. Ask them to format all textboxes to "resize to fit text" and so that they are allowed to break across pages; you might still need to do some adjustment to text boxes in the finished document. If you work directly from PDF files, you'll need to do all this yourself and should allow for it in your price.

[Edited at 2016-11-18 18:33 GMT]



Jorge Payan  Identity Verified
Colombia
Local time: 14:40
Member (2002)
German to Spanish
+ ...
Useful tool for suppressing rouge format markings (even those inserted by PDF conversion to Word) Nov 18, 2016

I suggest you try CodeZapper. This useful collection of Word macros includes PDFFix and PDFTidy, which will suppress most of the rogue format markings in a Word file converted from PDF before you open it in Studio or any other CAT tool. You can also convert all text boxes to plain text, extract the images and reinsert them after translation, etc., etc., for just EUR 20. I am a very satisfied customer.


José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 16:40
English to Portuguese
+ ...
It might have been created with a DTP app Nov 18, 2016

Elisabeth Maurland wrote:

They first sent me PDF files of over 200 pages with lots of pictures and wanted me to give them a quote. I asked for Word files then, and got them, but I guess I don't know which came originally first. Well, originally it would have to be Word or some other word processor? I didn't know you could make a Word file out of a PDF. Then there should be an original Word file somewhere??

There are large amounts of tags in parts of the text (at least, there were in the first project – I can't see all of it easily in the new text, since WFPro moves slowly with such a large document).

Thanks!


Word is a text processor. Its original paradigm is the typewriter. Over time, it gained a truckload of bells and whistles to turn it into a makeshift desktop publishing (DTP) tool, and so it remains. Evidence that it is a word processor is that you can't, say, do the layout on page 20, then do it on 13, then on 1, and finally on page 27, without messing up the entire publication.

DTP apps - InDesign, its father PageMaker, QuarkXPress and FrameMaker at the pro level, and Serif PagePlus, MS Publisher, and Scribus at the amateur level - are based on the art studio (paste-up) paradigm. They are intended for publication layout, often using text from other sources.

Any program capable of printing to a PostScript (it's a standard, not a brand) printer can create a PDF. And many programs convert a PDF into a Word file, not always so effectively. It only "looks" okay, until you change one letter, and the picture on page 22 flies away to page 60, or simply vanishes.

I've come to the conclusion that translating PDF files converted into Word can be effective only if it's about flowing text. If the layout is just a bit more complex, severe troubles will be lurking everywhere.

That's why I prefer to translate PDF files using Infix, though it DOES require DTP skills to fix the layout after translation. I assume the creative part is past, as I received the original, so there is no point in using the initial DTP app. That's why I do it on the PDF.

A while ago I prepared a walk-through of the PDF translation process with a now OLD version of Infix on this page.



Lianne van de Ven  Identity Verified
United States
Local time: 15:40
Member (2008)
English to Dutch
+ ...
Spot on! Nov 20, 2016

José Henrique Lamensdorf wrote:

Word is a text processor. Its original paradigm is the typewriter. Over time, it gained a truckload of bells and whistles to turn it into a makeshift desktop publishing (DTP) tool, and so it remains. Evidence that it is a word processor is that you can't, say, do the layout on page 20, then do it on 13, then on 1, and finally on page 27, without messing up the entire publication.

DTP apps - InDesign, its father PageMaker, QuarkXPress and FrameMaker at the pro level, and Serif PagePlus, MS Publisher, and Scribus at the amateur level - are based on the art studio (paste-up) paradigm. They are intended for publication layout, often using text from other sources.

Any program capable of printing to a PostScript (it's a standard, not a brand) printer can create a PDF. And many programs convert a PDF into a Word file, not always so effectively. It only "looks" okay, until you change one letter, and the picture on page 22 flies away to page 60, or simply vanishes.

I've come to the conclusion that translating PDF files converted into Word can be effective only if it's about flowing text. If the layout is just a bit more complex, severe troubles will be lurking everywhere.

That's why I prefer to translate PDF files using Infix, though it DOES require DTP skills to fix the layout after translation. I assume the creative part is past, as I received the original, so there is no point in using the initial DTP app. That's why I do it on the PDF.

A while ago I prepared a walk-through of the PDF translation process with a now OLD version of Infix on this page.


I don't think there is any way around it, José is spot on. I see this on a regular basis. Word does a rather poor job of converting pdf to word, the lay-out is ad-hoc. When the target language is somewhat different in length (# of words), the entire layout may shift. Once the document has been made editable in the source language, someone needs to redo the formatting and lay-out to create a logical structure. Once that is done, WFP or any other CAT tool should be able to produce a good result. Bottom line: it needs to be done anyway, so either client needs to do it (so you are not responsible), or you need to charge extra for this step.

[Edited at 2016-11-20 05:50 GMT]



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 16:40
English to Portuguese
+ ...
Some clarification Nov 20, 2016

Lianne van de Ven wrote:

I don't think there is any way around it, José is spot on. I see this on a regular basis. Word does a rather poor job of converting pdf to word, the lay-out is ad-hoc. When the target language is somewhat different in length (# of words), the entire layout may shift. Once the document has been made editable in the source language, someone needs to redo the formatting and lay-out to create a logical structure. Once that is done, WFP or any other CAT tool should be able to produce a good result. Bottom line: it needs to be done anyway, so either client needs to do it (so you are not responsible), or you need to charge extra for this step.


AFAIK Word does not convert PDF files into its own. Other programs do it.

There are two major categories of PDF files:
a) "live" - aka editable, distilled (where an "O" is a character); and
b) "dead" - aka non-editable, scanned (where an "O" character is a circle).

Translating each category involves a different workflow.

DEAD PDFs call for OCR (Optical Character Recognition) to generate TEXT, which CAT tools can use to accelerate translation. If OCR is not used, they are equivalent to hard copy for translation. OCR relies heavily on the quality of the scan, which involves issues including, but not limited to, resolution, sharpness, fonts used (cursive, gothic, ornate, and other unusual fonts can't be OCR-ed reliably), layout complexity, and software capabilities.

The major issue with these is when OCR turns dead PDFs into a laid-out word processor (e.g. MS Word) file. If the layout is more complex than streaming text, this file, at its best, is like a movie set: everything looks like real life, but the moment you replace anything, the rest of it gets displaced, and so on successively. Translation usually causes text to swell or shrink, so it's a mess.

This calls for DTP; however, I see too many translators doing it for free (i.e. only charging their per-word translation fees), and using inadequate DTP tools (viz. MS Word). It's like painstakingly rebuilding the shattered movie set with duct tape for free.

The right solution would be to translate, and then have a skilled DTP operator - some translators are; most are not - re-create the layout with proper DTP software, of course, charging for it.

I did it for almost a quarter century. At first, originals came in hard copy - and I scanned them into PDFs - or scanned PDFs. I'd do OCR and translate TEXT. Then I'd use PageMaker (InDesign's father) to place each scanned page as a template on its corresponding place. I'd format the translated text blocks, placing them exactly over their respective sources, redraw all lines, crop pictures from the scan and align them in place, and thus assemble the entire translated publication. Then I'd simply delete the original page - the template - from the background, and get a pristine translated publication, which I could distill to a PDF, assuming my client wouldn't want PageMaker files.

In spite of the amazing speed I could do it after so many years, this HAS a cost, and I always charged for it - on top of the translation - accordingly.

LIVE PDFs contain the text itself. That's why some CAT tools like Trados, Wordfast Anywhere, memoQ, and even Google Translate can get into them and let you translate the text. However, CAT tools CANNOT handle the layout issues that arise from text shrinking/swelling in translation.

Again, this requires DTP, though no longer as extensive as for dead PDFs now that live PDF editing tools have come up. Among such tools are Infix, NitroPDF, and more recently Adobe Acrobat (full) itself. This is why, when I first saw Infix, I suggested that their developers devise some way to facilitate translation, which they did over the years.

Infix merely exports the text from a live PDF, so it can be translated as TEXT, using the CAT tool/word processor of your choice and then imports the translation back into the PDF, properly laid out, for the necessary adjustments with the tools provided. This is merely streamlining the process of going into a live PDF with a CAT tool to translate it, and then using any PDF editor (including the same Infix) to fix the layout.

Of course, fixing the layout involves WORK, so it should have a cost of its own, not included in translation alone. The advantage here is that PDF covers ALL sources, so it is no longer necessary to have the software used to generate it.

After I gave the idea to Iceni, and they developed it, I became a beta-tester for the Infix translation features. I had to set a price for the post-translation DTP service using Infix.

Now we need a little diversion. Over the past few decades, I haven't changed my per-word translation rates. Why? Because progress in IT has increased my productivity as a translator sufficiently to cover inflation, cost of living, etc. So, with these modern contrivances, I can generate the same (or better) relative level of income with the same effort, keeping the per-word rates unchanged. I produce much more and faster, so I can charge the same per unit, and make much more money in absolute ($) terms.

I hate to charge per hour. Of course I do it when it involves selling my available time, like in interpreting jobs. I get the same per hour, regardless of my interpretee speaking at a snail's pace or at rapid-fire speed. However, when I do DTP, it would be unfair. If a client hires me to do a job using PageMaker, I'll have it finished in a snap, so I won't get compensation for all the years invested in developing such skill. If they hired me to do the very same job using QuarkXPress, they'd be grossly overcharged for the hours I'd spend reading manuals and help screens!

So I made a marketing decision. In PageMaker I always charged a per-page rate for DTP, regardless of whether the page contained just one large title, like "Introduction" or a very complex table or flowchart. I worked on the average. The marketing decision was that I'd charge HALF of that PageMaker per-page rate for the DTP work in fixing my translation layout into an existing live PDF.

This would motivate me to master the technique using Infix, as well as to contribute as a beta-tester so they'd improve its efficiency. Most of all, it would motivate clients to hire such services, as the post-translation DTP work would cost them half of what it would using conventional DTP apps.

Of course, there are still some stubborn translation clients who provide a, say, live PDF file, and demand insistently that they require a neatly laid-out DOC, DOCX, or RTF translated file from it, which is often impossible. Using the same line of reasoning, my per-page rate to do DTP work using MS Word is TWICE my PageMaker rate. So far this has been an effective deterrent.

The problem is with translators who burn the midnight oil desperately trying to do "impossible" DTP work with a Word processor... for free! (IMHO a really bad marketing decision, but it's their choice, not mine.)



Elisabeth Maurland
United States
Local time: 14:40
Member (2013)
Norwegian to English
+ ...
TOPIC STARTER
This is great! Nov 20, 2016

I have not yet had time to look into these various options (Infix, CodeZapper), but all of you have given me a much better insight into what may have caused the problem, what might be done about it, and what I can/cannot do about it – basically answered all my questions.
Fantastic with all the details and experience, José! (And of course, I forgot about the possibility that it had been done with a DTP. I decided long ago not to take jobs that required DTP work.)

I had to give them an answer on Friday, though, and I asked them to supply me with a properly formatted source file (thank you, Christine!), and then I would happily do it. By then they had asked me to lower my price significantly, which I won't do, so I might not have to deal with this at all. Either way, this is great information that I will need in the future.

Thank you all!



Lianne van de Ven  Identity Verified
United States
Local time: 15:40
Member (2008)
English to Dutch
+ ...
Codezapper specifically for word Nov 20, 2016

Elisabeth Maurland wrote:

I have not yet had time to look into these various options (Infix, CodeZapper)....

Thank you all!


I just wanted to clarify that Codezapper is not just for pdf to word files. I use it on every word file before importing it into Studio (or other software) if a client does not prepare files. It gets rid of redundant (rogue) formatting that was created when the word file was composed or edited and that may show up in the bilingual file as unnecessary tags. I always save a copy of the original file but I have never seen a problem as a result of zapping code.


MikeTrans
Germany
Local time: 21:40
Member (2005)
Italian to German
+ ...
My solution was a 'pay for use' with Adobe for best PDF conversions, but... Nov 20, 2016

Hi Elisabeth,
I hope the above advice has helped you out and that you were (or will be) able to accept similar projects without major trouble.
Because good PDF conversion tools are still rare, and certainly do not achieve all possible conversions in a way that only little after-work is needed, I still have to include an hourly rate payment for projects that need deeper conversion work to be done (generally for PDFs). This also 'educates' my clients to send me Word documents (or other formats) in good quality.

It is costly, but I think the Adobe Acrobat software does a neat and trouble-free conversion of most HTML, Word, and Excel files. Other than buying the whole software, which I will consider, there is the possibility of using it on a pay-per-use basis for around $50, for x number of documents or similar. I don't know if this service is still available, but it was very convenient for me some years ago when I really needed to convert a lot of PDFs in really *good* quality. When some of my clients sent me 'bad' Word documents, I asked them to send me the original PDFs instead, and they were surprised when I sent back 1:1 translated PDFs. Such a scenario is also worth considering instead of messing around with bad formats.
With Infix PDF Editor you can do very special workflows in combination with supported CAT tools (I don't know about Wordfast), so it's worth checking their homepage to see if it fits your case.

Good quality conversions or not: a lot of after-work still needs to be done during or after conversion to make a file fit for a CAT tool and to exclude major trouble afterwards, like what you have experienced. I'm still waiting for a tool that handles such conversions well. Infix is very special, but it is more a PDF editor, suited for working with PDF documents in the first place, than a conversion tool. Adobe Acrobat is good, but it doesn't eliminate after-work either.

Greetings,
Mike



Susan Welsh  Identity Verified
United States
Local time: 15:40
Member (2008)
Russian to English
+ ...
I second that: Codezapper Nov 21, 2016

Jorge Payan wrote:

I suggest you try CodeZapper. This useful collection of Word macros includes PDFFix and PDFTidy, which will suppress most of the rogue format markings in a Word file converted from PDF before you open it in Studio or any other CAT tool. You can also convert all text boxes to plain text, extract the images and reinsert them after translation, etc., etc., for just EUR 20. I am a very satisfied customer.


I'm not sure what it does for rouge formatting, but it works very well for rogue formatting!



Elisabeth Maurland
United States
Local time: 14:40
Member (2013)
Norwegian to English
+ ...
TOPIC STARTER
CodeZapper for Mac? Nov 21, 2016

It looks like it doesn't work on a Mac?
