Word count with 2 different languages in source document
Thread poster: Ana Lopez

Ana Lopez  Identity Verified
Mexico
Local time: 07:48
Member (2013)
English to Spanish
+ ...
Jun 4, 2014

Hello!!

I'm working on a PDF document that has German and English in two "columns", and I only have to translate the English part. Do you know any way I can count ONLY the English words?

Trados has statistics, but I don't know if there is a tool to count by language.

The only way I can think of is to count them manually. Do you know anything faster?

Thank you.



Jack Doughty  Identity Verified
United Kingdom
Local time: 13:48
Member (2000)
Russian to English
+ ...
Convert to Word Jun 4, 2014

You can convert it to Word using OCR. Abbyy FineReader and Abbyy PDF Converter come to mind.


Ana Lopez  Identity Verified
Mexico
Local time: 07:48
Member (2013)
English to Spanish
+ ...
TOPIC STARTER
Can Word count by language? Jun 4, 2014

Thanks! I already converted it to Word. However, since the columns are mixed with images, I cannot just "select" the English column. That is why I'm asking whether there is any other way than marking it page by page. Maybe there isn't, just asking.


Tony M  Identity Verified
France
Local time: 14:48
Member
French to English
+ ...
Are languages set? Jun 4, 2014

When you did the conversion using OCR, were you able to set the languages of the relevant bits?

If the text DOES have its 'language' attributes correctly set, then you can do an ordinary word count in Word; then use Find and Replace to search for 'any character' with the language attribute set to (say) German, and replace all with nothing.

Then do another word count, and this will be the EN words without the German ones; in fact, you don't even need to have done the preliminary word count, I was just thinking of subtracting the EN from the total, since TOTAL – EN = German, of course!

Naturally, if the language attribute was NOT correctly set in the first place, this won't work; but at least you'll know for next time.
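For anyone who would rather not delete text from the document, the same idea can be sketched in Python using only the standard library. This is a rough sketch under stated assumptions, not what Word itself does: it assumes the converted file is a .docx, that the OCR tool wrote the language directly on each run as a `w:lang` element, and that counting whitespace-separated words is close enough. The function names `count_words_by_language` and `count_docx` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def count_words_by_language(document_xml):
    """Count whitespace-separated words per run-level w:lang value.

    Runs without an explicit w:lang (language inherited from a style
    or from the document defaults) are counted under "unset".
    """
    root = ET.fromstring(document_xml)
    counts = Counter()
    for run in root.iter(f"{{{W_NS}}}r"):
        lang_el = run.find("w:rPr/w:lang", NS)
        lang = lang_el.get(f"{{{W_NS}}}val") if lang_el is not None else None
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        counts[lang or "unset"] += len(text.split())
    return counts


def count_docx(path):
    """Hypothetical helper: read the main document part of a .docx file."""
    with zipfile.ZipFile(path) as docx:
        return count_words_by_language(docx.read("word/document.xml"))
```

Calling `count_docx("converted.docx")` (the file name is only an example) returns something like a tally per language code, such as `en-US` and `de-DE`. The figures are approximate: a word split across two runs is counted twice, and a large "unset" total means the language attribute was not written on the runs, in which case this approach will not help either.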

BTW, you say that the images are stopping you from selecting all the EN column, but why? Are they in merged cells or something? You ought to be able to process your table in such a way as to unmerge all the cells, which will probably push all the images into the l/h column or something, but will leave you with two clean columns you can select properly.

You are SURE it is in a proper Word table? OCR conversions have a nasty habit of 'organizing' (well, that's not what I call it...) text into newspaper-style columns, in which case you'll have a harder job on your hands trying to sort it out. It might even be simpler to convert everything to single-column and remove all column breaks from the document, and then see what you have left...



Tony M  Identity Verified
France
Local time: 14:48
Member
French to English
+ ...
Failing that... Jun 4, 2014

...if the original document really is organized neatly into two columns, why not just do another 'dummy' OCR run on it, selecting ONLY the EN column as you go through? You'll then have a document at the end of it that contains ONLY the EN you need to translate. You might even be able to use this for your translation; at worst, it will be a useful intermediate stage for your word count.

[Edited at 2014-06-04 20:58 GMT]



Ana Lopez  Identity Verified
Mexico
Local time: 07:48
Member (2013)
English to Spanish
+ ...
TOPIC STARTER
I'll try the option Jun 4, 2014

I'll try a dummy OCR conversion in Abbyy, identifying only English as the language, and see how it goes with the find & replace.

Thank you so much Tony M.!!



zabit2005  Identity Verified
Turkey
Local time: 15:48
Member (2014)
English to Turkish
+ ...
Paste only text Jun 5, 2014

Hi.

Try copying everything with Ctrl+A, Ctrl+C and then pasting it as text only into a blank Word document. That way you can get rid of the images.
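Once the text is plain, a rough split by language is also possible outside Word. The sketch below is not a feature of Word or Trados; it swaps in a simple stopword heuristic, guessing each line's language from a handful of very common English and German words and then adding up the words per guess. The word lists and the function names are illustrative assumptions, and lines with no hint words end up under "unknown" for manual checking.

```python
import re

# Small, illustrative hint lists (assumptions, far from complete)
EN_HINTS = {"the", "and", "of", "to", "is", "for", "with", "that", "this", "are"}
DE_HINTS = {"der", "die", "das", "und", "ist", "nicht", "mit", "für", "ein", "eine"}

# Runs of letters only (digits and underscores excluded), Unicode-aware
WORD_RE = re.compile(r"[^\W\d_]+")


def guess_language(line):
    """Return "en", "de" or "unknown" based on which hint list matches more."""
    words = [w.lower() for w in WORD_RE.findall(line)]
    en_hits = sum(w in EN_HINTS for w in words)
    de_hits = sum(w in DE_HINTS for w in words)
    if en_hits > de_hits:
        return "en"
    if de_hits > en_hits:
        return "de"
    return "unknown"


def count_words_by_guess(text):
    """Add up whitespace-separated words per guessed language, line by line."""
    counts = {"en": 0, "de": 0, "unknown": 0}
    for line in text.splitlines():
        counts[guess_language(line)] += len(line.split())
    return counts
```

This only gives an estimate. Short lines such as headings or captions often contain no hint words at all, so the "unknown" total shows how much still needs to be checked by hand.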



[Edited at 2014-06-05 01:14 GMT]

