Word/character count for bilingual word file
Thread poster: egya123

egya123
Latvia
Latvian to English
+ ...
Aug 12, 2015

Hi,

I need to count the characters/words of just one language in a bilingual file. I would like to know whether, and how, this can be done without creating a separate single-language document, as copying every segment of the language I am interested in takes too much time (the document has many pages). The file is an original Word (.doc) document that was not created with Trados or any other program.

Perhaps you know of a program that could do it (Trados, etc.), or have some other suggestions. I would really appreciate your answer. Thank you in advance.



Cilian O'Tuama  Identity Verified
Local time: 20:03
German to English
+ ...
Different formats? Aug 13, 2015

If the two languages are formatted differently, you could search for all text in one particular format, delete it, and then just count the remaining text.


Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 02:03
Member (2004)
English to Thai
+ ...
Bilingual file format? Aug 13, 2015

egya123 wrote:
I need to count the characters/words of just one language in a bilingual file. I would like to know whether, and how, this can be done without creating a separate single-language document, as copying every segment of the language I am interested in takes too much time (the document has many pages). The file is an original Word (.doc) document that was not created with Trados or any other program.


I cannot picture the format of your bilingual MS Word file. I guess it is like a WordFast Classic bilingual file (in which the tags are formatted as hidden text) or some other bilingual text arranged in paragraphs. If so, it is quite easy to count the words as follows:
1. Select All in Word.
2. Convert the text to a table, separating at tabs or at the bilingual file tags.
3. Delete the language you do not want to count (source or target).
4. Use the MS Word word count function.
5. Count the tag words alone and subtract that number from the result of step 4.
The result is the exact count.
Note: If the source and target texts are on different lines, use an MS Excel macro to split them, then count with Word as above.
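
If the file can be saved as plain text with source and target separated by a tab on each line (an assumption about the layout, not something the original poster confirmed), the "delete one column and count" idea in the steps above could be approximated with a short Python script. This is only a sketch: `count_column` and the sample text are made up for illustration, and it does not handle the tag subtraction in step 5.

```python
def count_column(text, column, sep="\t"):
    """Count words and characters in one column of separator-delimited text.

    column=0 is the text before the first separator on each line,
    column=1 is the text after it, and so on.
    """
    words = 0
    chars = 0
    for line in text.splitlines():
        cells = line.split(sep)
        if column >= len(cells):
            continue  # this line has no cell in the requested column
        cell = cells[column].strip()
        words += len(cell.split())  # whitespace-separated words
        chars += len(cell)          # characters, including inner spaces
    return words, chars


# Hypothetical two-line sample: Latvian on the left, English on the right.
sample = "Labdien, pasaule!\tHello, world!\nTas ir tests.\tThis is a test."
print(count_column(sample, 1))
```

Word's own counter may treat punctuation and hyphens a little differently, so the totals may not match Word's exactly.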

Soonthon L.


valerius  Identity Verified
Latvia
Local time: 21:03
English to Latvian
+ ...
Deleting the unnecessary is all I can think of as well Aug 13, 2015

From what I have understood from the topic author's other posting (http://www.proz.com/forum/right_to_left_language_technical_forum/289493-word_character_count_for_bilingual_word_file.html ), it is a Word document which happens to be in multiple languages.
Now, assuming that is the case, I can only think of deleting the paragraphs you are not interested in (make a backup copy of the two-language file first if needed) and then counting what remains with the word/character count function. I understand that, depending on the file, this may be time-consuming and less than optimal. Maybe someone more clever can come up with something involving language auto-recognition and styles...

To help you locate the "other" language sentences, you can mark all text as the language you are interested in and then run spell check, which should take you to the next "other" language instance.

Another option, probably, would be to translate the document with Trados 2007, leaving untouched the segments you are not interested in, and then take the word count from the cleanup log (Cleaned) AFTER the translation is done, if this works in your case.

[Edited at 2015-08-13 19:31 GMT]



Tony M  Identity Verified
France
Local time: 20:03
Member
French to English
+ ...
IF the original doc was created properly! Aug 13, 2015

It would be reasonable to expect that the language attribute would have been correctly set for each language.

IF this is the case, then you can simply use a copy of your file to do a search & replace on 'any character' + the language attribute set to the language you don't want, put nothing in the replace box, then perform 'replace all'. This will simply delete all text in your unwanted language, from which point you can just do a straight word count.
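
For anyone comfortable with scripting, the same language attribute can also be read directly. This is a rough sketch and rests on assumptions: the .doc has first been saved as .docx, and each run carries an explicit `w:lang` setting (runs that inherit their language from a style or the document default show up as "unmarked" here). `words_by_language` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def words_by_language(document_xml):
    """Count words per explicit w:lang value found on the runs of each paragraph."""
    root = ET.fromstring(document_xml)
    counts = Counter()
    for para in root.iter(f"{{{W}}}p"):
        current_lang, buffer = None, ""
        for run in para.iter(f"{{{W}}}r"):
            lang_el = run.find("w:rPr/w:lang", NS)
            lang = lang_el.get(f"{{{W}}}val") if lang_el is not None else None
            lang = lang or "unmarked"
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if lang != current_lang:
                # Language changed: count what was collected so far.
                if buffer.split():
                    counts[current_lang] += len(buffer.split())
                current_lang, buffer = lang, ""
            buffer += text  # adjacent runs in one language may split a word
        if buffer.split():
            counts[current_lang] += len(buffer.split())
    return counts


# Hypothetical minimal document body with one English and one Latvian paragraph.
sample = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:r><w:rPr><w:lang w:val="en-GB"/></w:rPr>'
    '<w:t>Hello world</w:t></w:r></w:p>'
    '<w:p><w:r><w:rPr><w:lang w:val="lv-LV"/></w:rPr>'
    '<w:t>Labdien, pasaule, sveiki</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
print(words_by_language(sample))
```

For a real file, the XML could be obtained with `zipfile.ZipFile("copy.docx").read("word/document.xml")`, where `copy.docx` is a placeholder name. Text boxes, footnotes and headers are not covered by this sketch.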

Of course, the original document may not have been correctly formatted! I guess it all depends how it originated; had it been cobbled together by copying and pasting chunks from (say) an EN document into (say) a FR document, then the languages of those individual documents MIGHT have been set correctly and it will work. However, if this is just a document in which people have typed in the 2 different languages, chances are they won't have bothered to change the language between each chunk.

If not, is there any OTHER attribute that differentiates the text, like colour or font, for example? If one of the languages uses a foreign script, chances are you could differentiate on the basis of the font that was used.
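
If the two languages really do use different scripts, the text could even be separated without relying on formatting at all. A minimal sketch, assuming the text has been exported as plain text and both languages separate words with spaces (so it would not suit Thai or Chinese, for example); `words_in_script` and the sample sentence are made up for illustration, and the script is judged from the Unicode character names.

```python
import unicodedata


def words_in_script(text, script_prefix):
    """Count words whose letters all belong to one script.

    script_prefix is the start of the Unicode character name,
    for example "LATIN", "CYRILLIC", "ARABIC" or "HEBREW".
    """
    count = 0
    for word in text.split():
        letters = [ch for ch in word if ch.isalpha()]
        if letters and all(
            unicodedata.name(ch, "").startswith(script_prefix) for ch in letters
        ):
            count += 1
    return count


# Hypothetical mixed sample: English and Russian words in one string.
sample = "Hello world Привет мир and goodbye"
print(words_in_script(sample, "LATIN"))
print(words_in_script(sample, "CYRILLIC"))
```

This will not help when both languages share a script, as Latvian and English do.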

Good luck!

Oh, and by the way, this has been discussed before in the forums; if you try a search, you may find some better suggestions.



egya123
Latvia
Latvian to English
+ ...
TOPIC STARTER
Thanks Aug 14, 2015

Thank you all for your suggestions. For now it remains a manual job, but I will also try checking other forums.
