
Common words

You could build a list of common words for each language. This works, but not for short spans of text, since there are frequently long stretches that contain no common words at all. And often what you want to identify is exactly those intrusive bits of text which are obviously foreign but not common in any language (e.g. company names or technical terms).
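A minimal sketch of the common-words approach follows. It assumes tiny, hand-picked word lists (the `COMMON_WORDS` table and the `identify` function are hypothetical names, chosen for illustration; a real system would use much longer lists drawn from a corpus). Each language scores one point per token found in its list, and the highest score wins. When no token matches any list, the sketch returns `None`, which is exactly the failure on short spans and on foreign names described above.

```python
import re

# Hypothetical, deliberately tiny common-word lists, for illustration only.
COMMON_WORDS = {
    "english": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "french": {"le", "la", "les", "et", "de", "est", "un", "une"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"},
}


def identify(text, common_words=COMMON_WORDS):
    """Guess the language of text by counting common-word hits.

    Returns the language with the most hits, or None when no token
    appears in any list (so no decision can be made).
    """
    # Lower-case and keep runs of letters only (Unicode-aware).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # One point per token that occurs in each language's list.
    scores = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in common_words.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify("The cat is on the mat"))       # english
print(identify("Le chat est sur la table"))    # french
print(identify("Siemens Aktiengesellschaft"))  # None
```

The last call shows the weakness: a short, obviously German company name contains no common words, so the method has nothing to go on.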



Chris Brew
8/7/1998