Google-based KWIC (keyword in context) tools?
Thread poster: Olaf (X)

Olaf (X)
Local time: 09:36
English to German
Feb 10, 2010

I'd like to use Google as a monolingual corpus to generate KWIC (keyword in context) lists. So far I have found only one tool that lets me do that: WebCorp (http://www.webcorp.org.uk/), which has several limitations I don't like. Does anybody know of any other tools or scripts that could be used for this purpose?

Olaf

[Subject edited by staff or moderator 2010-02-11 08:29 GMT]


 

Bilbo Baggins
Catalan to English
A couple of options Feb 10, 2010

Hi Olaf

Many years ago I used WebCorp and even wrote about it (for the ATA), but it was limited by its slowness, although it seems to have improved somewhat. Have you tried the advanced search features?

The only tools I know that might do something similar to what you need are:

Rollyo: you can create your own restricted search engine, limited to specific sites. www.rollyo.com/
PERC: a true corpus available online, but pre-built. http://www.corpora.jp/~perc04/

Another way to compile a quick corpus from keywords is Webbootcat, but it isn't free, and you need a concordancer to search the texts (although it may have an online concordance feature). http://www.sketchengine.co.uk/

There's also Corpus-Eye, but you can't concordance the web with it, only limited, specific parts of it: http://corp.hum.sdu.dk/cqp.en.html

That's about the limit of my knowledge, but if you give more details of what you want to achieve, I may be able to be more precise.





[Edited at 2010-02-10 23:14 GMT]


 

Olaf (X)
Local time: 09:36
English to German
TOPIC STARTER
Thanks for the links Feb 11, 2010

Bilbo Baggins wrote:

Many years ago I used WebCorp and even wrote about it (for the ATA), but it was limited by its slowness, although it seems to have improved somewhat. Have you tried the advanced search features?

Hi Bilbo,

Yes, I did, but it didn't make a difference. Unfortunately, WebCorp doesn't seem to support queries for languages using non-Latin alphabets.
I'll check out the other links that you mentioned.

Thanks,
Olaf


[Edited at 2010-02-11 10:10 GMT]


 

Bilbo Baggins
Catalan to English
Re PERC and others Feb 11, 2010

PERC: I should have mentioned that it's an English-language corpus. The same goes for Corpus-Eye.

It seems to me that Rollyo might be the best option, and maybe Webbootcat.

With the first one, you can select URLs to roll your own search engine. You get a Google-like display (not a KWIC) with the search term highlighted.

With the second one, you enter keywords, then select from the resulting URLs, and these are used to create a corpus in TXT format that you can concordance.
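Once you have a plain-text corpus like that, a basic KWIC list is easy to produce yourself: each hit of the keyword is printed in the middle of a fixed-width window of surrounding text. Below is a minimal sketch in Python, assuming the corpus is UTF-8 text; the function name `kwic` and the default 30-character window are hypothetical choices of mine, not features of any of the tools mentioned above.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one KWIC line per case-insensitive whole-word match of `keyword`.

    Each line shows up to `width` characters of left context (right-aligned),
    the matched keyword in brackets, then up to `width` characters of right context.
    """
    # Collapse runs of whitespace (including newlines) so every hit fits on one row.
    flat = " ".join(text.split())
    # In Python 3, \b and IGNORECASE are Unicode-aware, so this also works for
    # space-separated non-Latin scripts such as Cyrillic or Greek.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    sample = "The corpus is small. A corpus\nof web pages can still be useful."
    for line in kwic(sample, "corpus", width=15):
        print(line)
```

To run it on a downloaded corpus, something like `kwic(open("corpus.txt", encoding="utf-8").read(), "keyword")` would do (the file name is only an example). Whole-word matching via `\b` will not split words in scripts written without spaces, such as Chinese or Japanese, so those would need a tokenizer first.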


 
