Unique strings

It seems reasonable to suppose that each language has a small set of characteristic "dead giveaway" strings, and that spotting one of them is enough to identify the language. That isn't so, not least because clearly foreign words are common in almost all languages. An approach based on unique strings doesn't weight evidence well enough, and it focusses attention on too few pieces of evidence.
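The failure is easy to reproduce with a minimal sketch. The word lists below are hypothetical, chosen only for illustration, and the function names are likewise assumptions rather than anything from the original case study. The identifier returns the language of the first giveaway word it meets, so a single borrowed word decides the outcome.

```python
# Hypothetical "dead giveaway" words per language (illustrative only).
GIVEAWAYS = {
    "english": {"the", "and"},
    "french": {"le", "est", "vous"},
    "german": {"der", "und", "nicht"},
}


def identify_by_giveaway(text):
    """Return the language of the first giveaway word seen, or None."""
    for word in text.lower().split():
        for language, words in GIVEAWAYS.items():
            if word in words:
                return language
    return None


def count_giveaways(text):
    """Count giveaway words per language, to show the evidence being ignored."""
    counts = {language: 0 for language in GIVEAWAYS}
    for word in text.lower().split():
        for language, words in GIVEAWAYS.items():
            if word in words:
                counts[language] += 1
    return counts


sentence = "Le Monde and the Guardian ran the story"
print(identify_by_giveaway(sentence))  # french
print(count_giveaways(sentence))       # {'english': 3, 'french': 1, 'german': 0}
```

The sentence is English, but the borrowed "Le" settles the answer as French before the three English giveaways are ever considered. The counts are printed only to make visible how much evidence the first-match rule throws away.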

Chris Brew
8/7/1998