How to count word frequencies/occurrences across documents?
Thread poster: stereojones
Oct 6, 2011

Hey There,
I am new to the forum, so please accept my apologies if I have posted this in the wrong section.

I am urgently looking for a search program that can count keyword occurrences or repetitions across a large number of PDF files. I need statistics per document, i.e. the total number of occurrences of a few keywords in each PDF file.

To make the setup clearer: I have folders and subfolders of PDF documents, roughly 100 pages each. Since there are many of them and I am searching for 20 keywords, opening each one to search for those words makes no sense. I am also not interested in where the words appear within the texts; all I need to know is how often they are used in each document. Ideally, the program would search through all the files automatically rather than one file at a time, and present a table or statistic with the total number of occurrences of each keyword.

Does anyone know of a program that could do the job? Any other ideas on how to get this done efficiently are very welcome.

Thanks to everyone.



David Turnbull
United Kingdom
Italian to English
Search in Adobe Reader Oct 6, 2011

Easily done with Adobe Reader's search across multiple PDF files:

http://www.online-tech-tips.com/computer-tips/how-to-search-for-text-inside-multiple-pdf-files-at-once/
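If a per-file table of counts is what is needed, a small script is another option. Below is a minimal sketch in Python, not a tested tool: it assumes the third-party pypdf package is installed for text extraction, and the function names (count_keywords, pdf_text, build_table, write_csv) are made up for illustration. It walks a folder and its subfolders, counts whole-word, case-insensitive matches of each keyword in every PDF, and writes one CSV row per file.

```python
import csv
import re
import sys
from collections import Counter
from pathlib import Path


def count_keywords(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword in text."""
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def pdf_text(path):
    """Extract the text of one PDF (assumption: pypdf is installed)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(str(path))
    # extract_text() can yield nothing for pages without a text layer
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_table(root, keywords):
    """Return one row per PDF under root: [file path, count per keyword...]."""
    rows = []
    for path in sorted(Path(root).rglob("*.pdf")):
        counts = count_keywords(pdf_text(path), keywords)
        rows.append([str(path)] + [counts[kw] for kw in keywords])
    return rows


def write_csv(root, keywords, out=sys.stdout):
    """Write the per-file keyword counts as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["file"] + list(keywords))
    writer.writerows(build_table(root, keywords))
```

Calling write_csv("path/to/folder", ["keyword1", "keyword2"]) prints the table, which can be redirected to a file and opened in a spreadsheet. Scanned PDFs without a text layer will show zero counts unless they are run through OCR first, and the "*.pdf" pattern may miss files with an uppercase extension on case-sensitive file systems.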

