Statistics for Data-Intensive Linguistics
Probability and Language Models
Events and probabilities
Events:
Random variables:
Probabilities:
Conditional probabilities and independence:
Bayes' rule
Medical diagnosis:
Statistical models of language
Case study: Language Identification
Unique strings
Common words
Markov models
Bayesian Decision Rules
Choice of priors may not matter:
Estimating Model Parameters
Results
Summary
Applying probabilities to Data-Intensive Linguistics
Contingency Tables
Text preparation
Contingency tables
Counting words in documents
Introduction
Bigram probabilities
Words and documents
Probability and information
Introduction
Data-intensive grocery selection
Entropy
Cross-entropy
Why is the cross-entropy always more than the entropy?
Summary and self-check
Questions:
Hidden Markov Models and Part-of-Speech Tagging
Graphical presentations of HMMs
Converting between the two presentations
Rabiner to Charniak
Charniak to Rabiner
Example
Transcript
Chris Brew
8/7/1998