The following exercise is hard, and is provided, without explicit
solution, as a challenge to your ingenuity.
Write a program (in awk, Perl, or any other computer language)
which reads two sorted text files and generates a sorted list of
the words which are common to both files.
Write a second program which takes the same input but produces
the list of words found only in the first file.
What happens if the second file is an authoritative dictionary and
the first is made from a document full of spelling errors? How is this
useful? Describe the sorts of spelling error which this
approach won't find. Does this matter?
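As a point of reference (not the solution, which you should still write
yourself), the standard UNIX tool comm already performs this kind of merge
on sorted input; the file names below are purely illustrative:

```shell
# comm walks two sorted files in a single linear merge.
#   -12 suppresses columns 1 and 2, printing only lines common to both files
#   -23 suppresses columns 2 and 3, printing lines found only in the first file
printf 'apple\nbanana\ncherry\n' > words1.txt
printf 'banana\ncherry\ndate\n'  > words2.txt

comm -12 words1.txt words2.txt   # prints: banana, cherry (one per line)
comm -23 words1.txt words2.txt   # prints: apple
```

Implementing that merge by hand, without calling comm, is the point of
the exercise.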
An industrial-strength solution to this problem is described in
Column 13 of Jon Bentley's Programming Pearls. It describes the
UNIX tool spell, which is a spelling checker: it merely points out
words which might be wrong. Spelling suggesters, which detect
spelling errors and then offer possible replacement strings to the
human user, are at the edges of research. Spelling correctors,
which make replacements without human intervention, are probably
a bad idea. Automatic detection of hidden spelling errors (the
ones where the output is a real word, but not the word which the
writer intended) is an active research issue.
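The word-list-difference idea behind such a checker can be sketched as a
pipeline of standard UNIX tools. The tiny document and dictionary here are
made-up illustrations, and this is not the real spell implementation:

```shell
# A toy document containing one misspelling, and a sorted dictionary.
printf 'The quick brwon fox\n' > doc.txt
printf 'brown\nfox\nquick\nthe\n' > dict.txt

# 1. fold the document to lower case,
# 2. turn every run of non-letters into a newline (one word per line),
# 3. sort the words uniquely,
# 4. report the words absent from the dictionary.
tr 'A-Z' 'a-z' < doc.txt | tr -cs 'a-z' '\n' |
    LC_ALL=C sort -u | LC_ALL=C comm -23 - dict.txt   # prints: brwon
```

Note that this flags only words missing from the dictionary; a hidden
error such as "form" typed for "from" sails straight through, which is
exactly the limitation discussed above.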