1990 · Lexical Statistics

Pointwise Mutual Information

Which words keep each other company — more than chance would predict?

How it works

Every technique so far has counted words in isolation. PMI is the first to ask about a relationship between two words: do they appear together more often than you would expect if they were thrown into the text independently? Slide a small window across the corpus — here, three words on either side — and count how often each pair of words lands inside it together.
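The windowed count described above can be sketched in a few lines of Python. This is a minimal sketch, not necessarily how the figures on this page were produced: the function name `count_pairs`, the decision to treat pairs as unordered, and the decision to skip a word paired with itself are all assumptions made for illustration. The token list is a toy sentence, not text from the sonnets.

```python
from collections import Counter

def count_pairs(tokens, window=3):
    """Count unordered word pairs that fall within `window` positions of each other."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Looking only forward counts each co-occurrence exactly once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:  # assumption: a word is not paired with itself
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

# Toy input with a window of two words on either side.
tokens = "the cat sat on the mat".split()
pairs = count_pairs(tokens, window=2)
print(pairs.most_common(3))
```

Because the key is a sorted tuple, "sat + the" and "the + sat" land in the same bucket, which matches the direction-free nature of the score.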

Then compare the pair's joint probability against the product of the two words' individual probabilities. If the words were independent, those two quantities would be equal and their ratio would be 1. Pointwise Mutual Information is the base-2 logarithm of that ratio, so independence scores 0, genuine association scores well above 0, and mutual avoidance scores below 0.

PMI(x, y)  =  log2 ( P(x, y) / ( P(x) · P(y) ) )
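As a quick check on the formula, the sketch below evaluates it for three hand-picked sets of probabilities. The function name `pmi` and the probabilities themselves are illustrative assumptions, chosen as powers of two so the results come out as round numbers.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Base-2 PMI from a joint probability and the two marginal probabilities."""
    return math.log2(p_xy / (p_x * p_y))

independent = pmi(0.25, 0.5, 0.5)    # joint equals the product of the marginals
attracted = pmi(0.125, 0.25, 0.25)   # joint is twice the product
repelled = pmi(0.0625, 0.5, 0.5)     # joint is a quarter of the product

print(independent, attracted, repelled)
```

The three cases land at zero, above zero, and below zero respectively, matching the reading of the score given above.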

The crucial point is that PMI rewards surprise, not frequency. In Shakespeare's sonnets the pair "but + the" co-occurs 16 times yet scores only 0.39 — two common words are expected to bump into each other. The pair "ten + times" co-occurs just 5 times but scores 7.38, because neither word is common and yet they cling together. That is a real collocation, surfaced from counting alone.

Pair             PMI
tied + tongue    8.28
again + back     8.02
fulfil + will    7.51
ten + times      7.38
much + too       6.58
knows + well     6.35
as + fast        6.08
mine + own       5.43

Top collocations by PMI in Shakespeare's 154 sonnets (17,608 tokens, window ±3, pairs seen at least 4 times). These are recognisable English phrases — "ten times", "much too", "as fast", "mine own" — recovered with no grammar and no dictionary, purely from co-occurrence statistics. "mine + own" co-occurs 14 times yet still scores 5.43; the others are rarer but even more tightly bound.

Try it

Find collocations by PMI

Lower the min-count and rare one-off pairs flood the top of the list with meaninglessly high PMI scores, which is exactly why a threshold is needed.
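The effect of the threshold can be reproduced offline with a small ranking function. This is a hedged sketch rather than the code behind the widget: `top_collocations` is a hypothetical name, and the way probabilities are estimated (pair count over total pair count, word count over total tokens) is an assumption, since the page does not spell out its estimator. The corpus is a toy sentence.

```python
import math
from collections import Counter

def top_collocations(tokens, window=3, min_count=4):
    """Rank unordered word pairs by PMI, ignoring pairs seen fewer than min_count times."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pair_counts[tuple(sorted((left, right)))] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_counts.values())
    scored = []
    for (x, y), count in pair_counts.items():
        if count < min_count:
            continue  # the blunt threshold: too little evidence to trust
        p_xy = count / n_pairs            # assumed estimate of P(x, y)
        p_x = word_counts[x] / n_tokens   # assumed estimate of P(x)
        p_y = word_counts[y] / n_tokens   # assumed estimate of P(y)
        scored.append((math.log2(p_xy / (p_x * p_y)), x, y, count))
    return sorted(scored, reverse=True)

tokens = "the cat sat on the mat the cat ran".split()
strict = top_collocations(tokens, window=1, min_count=2)
loose = top_collocations(tokens, window=1, min_count=1)
print(strict[0], loose[0])
```

With the threshold at 2, only pairs seen twice survive. Dropping it to 1 lets a pair of two words that each occur once take first place, which is the flooding effect in miniature.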

Where it falls short

It is biased toward rare events. A pair seen exactly once, made of two otherwise-rare words, gets a huge PMI for no meaningful reason, because the denominator P(x)·P(y) is tiny. The simplest practical fix is a blunt count threshold (here: ignore pairs seen fewer than 4 times), which is a patch, not a cure. Variants exist to tame the problems of sparse counts: Positive PMI clips the unreliable negative scores to zero, and smoothed PMI dampens the inflated scores of rare pairs.
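The arithmetic behind the bias is easy to see with made-up counts. The sketch below assumes every probability is estimated as a count divided by one shared total, which is a simplification; the corpus size and all counts are hypothetical, chosen as powers of two to keep the numbers round. `ppmi` follows the usual definition of Positive PMI as PMI clipped at zero.

```python
import math

def pmi_from_counts(pair_count, count_x, count_y, total):
    """PMI when every probability is estimated as count / total (an assumption)."""
    return math.log2(pair_count * total / (count_x * count_y))

def ppmi(score):
    """Positive PMI: clip negative scores to zero."""
    return max(0.0, score)

TOTAL = 16_384  # hypothetical corpus size

one_off = pmi_from_counts(1, 1, 1, TOTAL)        # two one-off words that met once
solid = pmi_from_counts(64, 128, 128, TOTAL)     # a pair with real evidence behind it
avoiders = pmi_from_counts(1, 256, 256, TOTAL)   # co-occur less often than chance predicts

print(one_off, solid, avoiders, ppmi(avoiders))
```

The single accidental meeting outscores the well-attested pair by a wide margin, even though it rests on one observation. Clipping alone does nothing about that; it only removes the negative scores, which is why a threshold or smoothing is still needed.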

It is symmetric and direction-free. PMI(x,y) equals PMI(y,x). It says "tied" and "tongue" are associated but not that "tied" modifies "tongue", nor which word comes first. There is no syntax and no directionality.

Window co-occurrence ignores sentence structure. A flat ±k window treats a word three positions away identically whether or not a clause or sentence boundary sits between them. Grammar is invisible to the count.
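To make the boundary problem concrete, the sketch below counts pairs twice over two toy sentences: once over the flattened token stream, and once sentence by sentence. The sentence-bounded count is shown only for contrast; it is an illustrative assumption, not something this page's method does, and the helper name `count_pairs` and the sentences are invented.

```python
from collections import Counter

def count_pairs(tokens, window=3):
    """Count unordered word pairs that fall within `window` positions of each other."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

# Two invented sentences; the boundary sits between "tied" and "again".
sentences = [["the", "tongue", "was", "tied"], ["again", "we", "turned", "back"]]

# Flat window: the boundary is invisible.
flat = count_pairs([word for sentence in sentences for word in sentence])

# Bounded variant: windows never cross a sentence boundary.
bounded = Counter()
for sentence in sentences:
    bounded.update(count_pairs(sentence))

print(flat[("again", "tied")], bounded[("again", "tied")])
```

The flat count pairs "tied" with "again" even though they belong to different sentences, while the bounded count does not. Pairs inside a sentence are unaffected.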

It scores pairs, not meaning. PMI characterises a relationship between two specific words. It still gives you no representation of a single word's overall meaning — you can ask "do these two go together?" but never "what is this word like?".