My Wordle Strategy
I can’t say that I’m exactly addicted to Wordle, but I really like the game and try to play it most days. Since I regularly clean out my browser cache, I don’t have any statistical history to show for it, but I seem to guess most words on my fourth try, with fifth, third, and sixth tries next most common. Like any self-respecting geek, I have a strategy.
My prime goal is to guess the word by the sixth (final) try. That may be obvious, but as much as I’d like to guess correctly on the second try, that’s not my goal. That led me to a strategy that’s willing to burn a couple early guesses aimed at discovering as many vowels and common letters in the day’s answer as possible. I sought to devise my first two guesses to carry out that strategy.
So the obvious tactic was to decide which two words would be best at that task.
My first couple games used orate and music. These guesses include all the vowels and several consonants that I considered widely used. These words worked fine, but I thought perhaps I could do better.
Cryptologists have long studied the frequency with which letters appear in English words. I suspect it’s obvious that the body of literature studied will affect the frequency analysis. The letter frequencies in contemporary children’s books will likely differ from, say, those of the King James Bible.
Since Wordle is essentially a game for guessing dictionary words, I sought out a frequency chart for English dictionaries. This chart, from a set of class handouts at the University of Notre Dame and based on the Concise Oxford Dictionary, seemed useful to me. Caveat emptor: you can find charts with letters in different orders; you’ll need to decide on your priorities when picking a chart.
The chart I consulted listed the 13 most used letters, in order, as:
E A R I O T N S L C U D P
To show that the literature studied impacts the letter order, here’s a similar analysis (see note below) based only on my blog posts:
E T O A S I N R L C H D U
Compared to the list from the dictionary, my list adds ‘H’ and drops ‘P,’ although the latter falls right after ‘U’ in my full list. The lists are substantially the same, however, and probably either would work well enough as a starting point.
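The difference between the two lists is easy to check mechanically. Here is a minimal sketch in plain shell; the function name only_in and the two variable names are made up for illustration, and the letter strings are simply the two 13-letter lists above.

```shell
# only_in A B: print the letters of string A that do not occur in string B.
# (Hypothetical helper; runs in a subshell so it leaves no variables behind.)
only_in() (
  printf '%s\n' "$1" | fold -w 1 | while read -r letter; do
    case "$2" in
      *"$letter"*) ;;                  # letter is in both lists: skip it
      *) printf '%s' "$letter" ;;      # letter is unique to the first list
    esac
  done
  printf '\n'
)

dictionary="EARIOTNSLCUDP"   # Concise Oxford Dictionary chart, top 13
blog="ETOASINRLCHDU"         # blog-post analysis, top 13

only_in "$blog" "$dictionary"   # prints H
only_in "$dictionary" "$blog"   # prints P
```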
That understood, I then had to choose two words that would use as many of the first ten of those letters as possible. Actually, that’s not 100% true. I wanted to ensure that my first two guesses included all the standard vowels. Since ‘U’ is the eleventh-most used letter, I needed two words that included all the vowels and as many of the most used consonants as possible.
I evaluated my original words in this light. Letters that appear in the two words are marked (used):
- E (used)
- A (used)
- R (used)
- I (used)
- O (used)
- T (used)
- N
- S (used)
- L
- C (used)
- U (used)
- D
- P
- (orate, music)
That’s not too bad. My words covered nine of the top 11 letters and my sole errant guess (‘M’) actually falls 14th in the dictionary frequency chart, immediately after ‘P.’
Still, I thought I could do better. After a bit of work, I decided to use orate and incus, which use ten of the top eleven letters:
- E (used)
- A (used)
- R (used)
- I (used)
- O (used)
- T (used)
- N (used)
- S (used)
- L
- C (used)
- U (used)
- D
- P
- (orate, incus)
Two other word pairs work decently as well:
- E (used)
- A (used)
- R (used)
- I (used)
- O (used)
- T (used)
- N (used)
- S (used)
- L
- C
- U (used)
- D (used)
- P
- (audio, terns)
- E (used)
- A (used)
- R (used)
- I (used)
- O (used)
- T (used)
- N
- S
- L (used)
- C (used)
- U (used)
- D (used)
- P
- (orate, lucid)
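The coverage of any word pair can be double-checked with a few lines of shell. This is only a sketch: the names covered and TOP_LETTERS are made up for illustration, and the letter string is the first eleven letters of the dictionary chart quoted earlier.

```shell
# Top eleven letters from the dictionary frequency chart, in order.
TOP_LETTERS="EARIOTNSLCU"

# covered WORD...: print the letters of TOP_LETTERS that occur in the given words.
# (Hypothetical helper; runs in a subshell so it leaves no variables behind.)
covered() (
  guesses=$(printf '%s' "$*" | tr 'a-z' 'A-Z')   # fold the guesses to upper case
  printf '%s\n' "$TOP_LETTERS" | fold -w 1 | while read -r letter; do
    case "$guesses" in
      *"$letter"*) printf '%s' "$letter" ;;      # this top letter is covered
    esac
  done
  printf '\n'
)

covered orate music   # prints EARIOTSCU  (9 letters)
covered orate incus   # prints EARIOTNSCU (10 letters)
covered audio terns   # prints EARIOTNSU  (9 letters)
covered orate lucid   # prints EARIOTLCU  (9 letters)
```

Piping the result through wc -c and subtracting one for the newline gives the count directly, if you would rather not count letters by eye.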
As I said, this strategy purposely burns the first two guesses. Not everyone will be happy with that approach. C’est la vie. How I proceed from the third guess onward depends completely on the results of the first two guesses. Typically I have enough letter coverage that I launch right into guessing for real, but sometimes I burn another guess to gauge more of the lesser-used consonants.
Blog Letter Analysis
To count letters in all the markdown files that hold my blog posts, I used a long shell one-liner:
cat *.md | tr a-z A-Z | grep -o '[A-Z]' | sort | uniq -c | sort -n -r
One Last Thought
It might be interesting to figure out a way to analyze only five-letter words, perhaps by downloading a bunch of sample texts from Project Gutenberg and isolating those words.
To be honest, however, I doubt the result would differ from the dictionary analysis all that much.
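For anyone who wants to try it, here is a minimal sketch of that five-letter analysis, built from the same kind of pipeline as the blog count. It assumes the sample texts are already saved locally as plain-text files; the function name five_letter_freq is made up for illustration, and counting each distinct word only once is an assumption meant to mimic a dictionary rather than running prose.

```shell
# five_letter_freq [FILE...]: letter counts taken only from five-letter words.
# Reads the named files, or standard input when no files are given.
five_letter_freq() {
  cat "$@" |
    tr 'a-z' 'A-Z' |          # fold everything to upper case
    tr -cs 'A-Z' '\n' |       # one word per line; any non-letter is a separator
    grep -x '[A-Z]\{5\}' |    # keep only words of exactly five letters
    sort -u |                 # count each distinct word once (an assumption)
    fold -w 1 |               # one letter per line
    sort | uniq -c | sort -n -r
}

# Quick check on a single sentence piped in through standard input.
printf 'Seven foxes, seven quick jumps over the lazy brown dogs.\n' | five_letter_freq
```

Note that splitting on every non-letter breaks contractions apart, so a word like "don’t" never counts as a five-letter word here. Dropping the sort -u line would weight letters by how often each word occurs in the texts instead.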