
Off topic: A New Twist in Word Comparisons
Thread poster: Vito Smolej

Vito Smolej
Germany
Local time: 20:30
Member (2004)
English to Slovenian
+ ...
Mar 11, 2009

From Science / Random Samples vol 323

Sung-Hou Kim, a chemist at the University of California, Berkeley, usually explores gene and protein relationships. Now, Kim and colleagues are using similar algorithms to probe literary ties.

Books, genes, and proteins can all be represented as strings of letters, which Kim's software analyzes to tease out underlying patterns. It's filed the Koran with other religious texts rather than with philosophical tracts as other literary comparison programs often do. Recently, the team cast fresh doubt on whether Shakespeare penned Pericles, Prince of Tyre, in which some scholars have detected the Bard's hand.



Unlike programs that simply compare word frequencies, Kim's approach first strips the text of punctuation and spaces, transforming the book into a single string of letters. Their algorithm records the first eight letters in a string and then advances this "window" one letter and repeats. It then looks at the frequency with which two letters appear next to one another. "I'm just stunned" at how well it works, says Kim, who thinks that's because the eight-letter windows often span multiple words, thereby picking up common syntax patterns.
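
The windowing step described above can be sketched in a few lines of Python. This is only an illustration based on the article's description, not Kim's actual software: the function names `letters_only` and `window_counts` are made up for this example, and lowercasing the text is an assumption the article does not mention. How the team then compares the resulting frequencies between books is not described in enough detail to reproduce, so the sketch stops at counting.

```python
from collections import Counter


def letters_only(text):
    """Drop punctuation, digits and spaces, keeping one continuous string of letters.

    Lowercasing is an assumption made for this sketch.
    """
    return "".join(ch for ch in text.lower() if ch.isalpha())


def window_counts(text, width=8):
    """Count every run of `width` letters, advancing the window one letter at a time."""
    s = letters_only(text)
    # Texts shorter than `width` letters yield no windows at all.
    return Counter(s[i:i + width] for i in range(len(s) - width + 1))


if __name__ == "__main__":
    counts = window_counts("To be, or not to be.")
    for window, n in counts.items():
        print(window, n)
```

For the sample line, the first window is "tobeorno", which already runs across four words; this is the effect Kim points to when he says the windows pick up syntax patterns.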

The team has also found that the software can classify evolutionary relationships among hundreds of viruses, a feat that conventional tools struggle with because viruses share so few genes in common. Next, they hope to adapt the technique to analyze everything from musical patterns to ancient languages.

[Edited at 2009-03-11 19:33 GMT]



chica nueva
Local time: 08:30
Chinese to English
word'strings Mar 18, 2009

hello'vito:

how'are'you!i'quite'like'this'idea'because'it'saves'a'lot'of'space'and'uses'fewer'keys。what'do'you'think?

lai'an

[Edited at 2009-03-18 04:22 GMT]



Vito Smolej
Germany
Local time: 20:30
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
the metaphor is pretty straightforward Mar 18, 2009

The method looks at the text as the genomic material, i.e. as a sequence of codons, without externally enforced semantics. Anyhow - whatever - as long as it works (g).

Regards

Vito

[Edited at 2009-03-18 06:16 GMT]



chica nueva
Local time: 08:30
Chinese to English
externally enforced semantics/syntax: punctuation, spaces Mar 20, 2009

Hello Vito

Semantics/Syntax: Yes, you are right. AFAIK in the old days Chinese and Hebrew didn't use to have punctuation. Is that your understanding? Also, there are no spaces between words in some language scripts.

Codons: BTW what is your understanding of a codon? one letter, or a string of 8? I wonder why Kim chose eight letters. There are 26 letters in the English alphabet. What if he had used Chinese or Korean characters as codons ...

Applications: Would it work on prosody and metre I wonder? For example, I am not convinced of the distinction between stress-timed and syllable-timed languages. I wonder whether this method can verify the difference. To me (English being 'stress-timed', and Chinese so-called 'syllable-timed') they seem much the same (but that could just be me, imposing L1 patterns on L2)

Lesley

[Edited at 2009-03-20 03:40 GMT]



Vito Smolej
Germany
Local time: 20:30
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
some explanations Mar 20, 2009

...Codons: BTW what is your understanding of a codon?

I meant genetic codons (three base pairs > one codon > the corresponding amino acid, or stop or nonsense, see http://en.wikipedia.org/wiki/Genetic_code)

...I wonder why Kim chose eight letters.

My gut feeling is that it was just to speed up the analysis (subsampling of the complete text sequence).

...I am not convinced of the distinction between stress-timed and syllable-timed languages. I wonder whether this method can verify the difference.

Well, there's for sure one way to find this out (g).

Regards

Vito

[Edited at 2009-03-20 05:32 GMT]



chica nueva
Local time: 08:30
Chinese to English
text, fabric, literature, 'Pattern' Mar 24, 2009

文 wen in the 《文心雕龙》"The Literary Mind and the Carving of Dragons" can be translated as Pattern, sometimes interpreted as 'text' apparently.

Then again, the word 'text' comes from Latin textus = fabric, pp. of texere = to weave.

Somehow, it doesn't surprise me that Kim is investigating patterns in literature in this way.

Link to post on Wen Xin Diao Long "The Literary Mind and the Carving of Dragons":
http://www.proz.com/forum/chinese/129977-question_about_tv_show_the_water_margin_水滸傳-page3.html#1087931

[Edited at 2009-03-24 22:16 GMT]

