Stylometry for Investigators: Part 2
Last week, I discussed how each person's writing carries unique fingerprints, patterns that can be tracked and analyzed. This week, I want to cover how stylometry works in an investigation: the metrics we use, how we compare samples, and the pitfalls that appear when we overlook certain details.
So what do you actually look at?
Vocabulary: Every writer has their own rhythm with words. Some people use 600 different words in a 1,000-word sample; others stick to 300. That ratio of distinct words to total words, often called the type-token ratio, tells you something about the writer. Word choice matters too. The person who says “utilize” instead of “use” probably says “commence” instead of “start.” Those habits show up again and again, even when the writer doesn’t realize it.
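If you want to put a number on that vocabulary spread, the simplest measure is the type-token ratio: distinct words divided by total words, so 600 distinct words in a 1,000-word sample gives 0.6 and 300 gives 0.3. Here's a minimal Python sketch. The tokenizer (lowercased runs of letters, with apostrophes kept inside words) and the function names are my own assumptions, not a standard.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters; a straight or curly apostrophe
    # inside a word keeps it whole ("don't")
    return re.findall(r"[a-z]+(?:['\u2019][a-z]+)*", text.lower())

def type_token_ratio(text: str) -> float:
    """Distinct words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

sample = "We will utilize the data. We will commence the review and utilize the data again."
print(f"TTR: {type_token_ratio(sample):.2f}")  # 9 distinct / 15 total, prints "TTR: 0.60"
```

One caution: the ratio drops as a sample gets longer, because common words keep repeating, so only compare samples of roughly equal length.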
Sentence structure: Sentence length and flow are dead giveaways. Some writers fire off short sentences. Others love long, winding ones. I look at average sentence length, how much it varies, and features such as how often the writer uses active versus passive voice. “The investigator reviewed the evidence” sounds different, and tests differently, than “The evidence was reviewed by the investigator.”
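Sentence-level measures are just as easy to sketch. The Python below computes average sentence length, its spread (population standard deviation), and a count of likely passive constructions. Both the sentence splitter and the passive-voice pattern are deliberately crude assumptions for illustration; a real workflow would use a proper NLP parser.

```python
import re
import statistics

# Crude heuristic: a form of "to be" followed by a word ending in -ed.
# It misses irregular participles ("was written") and flags some adjectives ("is tired").
PASSIVE = re.compile(r"\b(?:am|is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)

def sentence_stats(text: str) -> dict:
    # Naive split on ., !, or ? followed by whitespace; abbreviations such as "Dr." will break it
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    if not lengths:
        return {"mean": 0.0, "stdev": 0.0, "passive": 0}
    return {
        "mean": statistics.mean(lengths),
        # Population standard deviation: how much sentence length varies
        "stdev": statistics.pstdev(lengths),
        "passive": len(PASSIVE.findall(text)),
    }

print(sentence_stats(
    "The investigator reviewed the evidence. The evidence was reviewed by the investigator."
))
```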
Function words: Here’s the big tell. Words such as “the,” “and,” “of,” “to,” “but,” “for,” and “with” make up roughly half of the words in a typical English text. We use them without thinking, and that’s what makes them so useful. You can consciously change your vocabulary, but it’s much harder to control how your brain naturally strings those little words together.
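A function-word profile is just the rate of each of those small words in a sample. This sketch uses only the seven words listed above and reports occurrences per 1,000 words; the list and the function name are illustrative, and published work usually relies on a longer, fixed list of common words.

```python
import re
from collections import Counter

# The seven function words named above; real analyses typically use a much longer fixed list
FUNCTION_WORDS = ["the", "and", "of", "to", "but", "for", "with"]

def function_word_profile(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-z]+(?:['\u2019][a-z]+)*", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in FUNCTION_WORDS}
    # Occurrences per 1,000 words, so samples of different lengths are comparable
    return {w: 1000 * counts[w] / total for w in FUNCTION_WORDS}

profile = function_word_profile("The evidence was reviewed by the investigator.")
print(profile["the"])  # 2 of 7 words
```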
Punctuation and habits: Writers leave fingerprints in their punctuation too. Some people love commas. Others avoid them. Regular readers of Tw/oB know a writer who loves commas. Some use semicolons or contractions; others don’t. Even things like how they use capitalization in informal writing can set them apart. I’ve matched suspects before just by their comma and dash habits.
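Punctuation habits can be counted the same way. In this sketch, the set of habits (commas, semicolons, dashes, contractions) and the regular expressions behind them are assumptions chosen for illustration; every rate is expressed per 1,000 words so short and long samples line up.

```python
import re

# Each habit is a regex; the dash pattern covers an em dash, an en dash, or a typed double hyphen
HABITS = {
    "comma": r",",
    "semicolon": r";",
    "dash": r"\u2014|\u2013|--",
    "contraction": r"\b[A-Za-z]+['\u2019][A-Za-z]+\b",
}
WORD = re.compile(r"[A-Za-z]+(?:['\u2019][A-Za-z]+)*")

def punctuation_profile(text: str) -> dict[str, float]:
    words = len(WORD.findall(text))
    if words == 0:
        return {name: 0.0 for name in HABITS}
    # Rate per 1,000 words for each habit
    return {name: 1000 * len(re.findall(pattern, text)) / words
            for name, pattern in HABITS.items()}

print(punctuation_profile("I don't know, honestly; it's fine, I think."))
```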
While the Unabomber (Ted Kaczynski) case is probably the most famous one partially solved with stylometry, the unmasking of J.K. Rowling as “Robert Galbraith” is likely the best example of the technique’s power.
In 2013, a crime novel called The Cuckoo’s Calling was published under the pseudonym Robert Galbraith. A journalist at The Sunday Times received an anonymous tip on Twitter claiming that Galbraith was actually J.K. Rowling. To check the claim, the paper consulted two independent experts, Professor Patrick Juola of Duquesne University and Peter Millican of Oxford University, both specialists in computational stylometry.
Using stylometric analysis, they compared The Cuckoo’s Calling with works by several other authors, including Rowling’s own The Casual Vacancy. Rowling came out as the closest match on most of the features they measured: word length, sentence rhythm, and particularly her use of function words such as “to,” “of,” and “the.” The results favored Rowling over the other candidates in the comparison. Within days, Rowling confirmed she was the author.
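The experts' exact tests aren't described here, so the sketch below is not their method. It shows one simple, hypothetical way to compare samples: build function-word frequency vectors for a questioned text and for each candidate author, then rank the candidates by cosine similarity. The word list, the function names, and the choice of cosine similarity are all illustrative assumptions.

```python
import math
import re
from collections import Counter

FUNCTION_WORDS = ["the", "and", "of", "to", "but", "for", "with"]

def profile(text: str) -> list[float]:
    # Relative frequency of each function word in the sample
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u: list[float], v: list[float]) -> float:
    # 1.0 means identical proportions; 0.0 means no overlap at all
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(questioned: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Score each candidate's known writing against the questioned text, best match first."""
    q = profile(questioned)
    scores = {name: cosine(q, profile(text)) for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On a real case you'd want far longer samples, many more features, and a broader pool of candidates. Even then, the top score is a lead to investigate, not an identification.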
Stylometry isn’t magic, though, and like anything in this line of work, there are ways it can go wrong.
- Too little text: You can’t analyze just a few sentences and expect reliable results. You need at least a few hundred words to start seeing meaningful patterns. 
- Different topics: Writing about a technical process isn’t the same as writing about your weekend. Topic drives vocabulary, so comparing samples on different subjects can hide or distort a writer’s style. 
- Time gaps: People’s writing evolves over the years. A sample from 2015 might not match one from 2025, even if it’s the same person. 
- Different platforms: Social media changes how people write. A tweet doesn’t look like a LinkedIn post. Mixing those can make the data noisy quickly. 
- Intentional disguise: If someone knows their writing is being analyzed, they might try to throw you off. They can change sentence lengths, swap favorite words, or use a paraphrasing tool. It works — for a while. 
- Group or edited writing: When a document is written by multiple people or heavily edited, it’s no longer just one person’s work — it’s a mix. 
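The first pitfall, too little text, is the easiest to guard against in code. This sketch rejects samples below a minimum word count and compares vocabulary over fixed-size chunks, so a short message and a long report aren't judged on different footing. The 300-word floor and the 100-word chunk size are hypothetical values picked to match the “few hundred words” rule of thumb, not established cutoffs.

```python
import re

MIN_WORDS = 300  # hypothetical floor, based on the "few hundred words" rule of thumb

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def check_sample(text: str, label: str) -> int:
    """Return the word count, or raise if the sample is too short to analyze."""
    n = len(tokenize(text))
    if n < MIN_WORDS:
        raise ValueError(f"{label}: only {n} words; need at least {MIN_WORDS}")
    return n

def chunked_ttr(text: str, size: int = 100) -> float:
    """Mean type-token ratio over consecutive fixed-size chunks.
    Leftover words that don't fill a whole chunk are dropped."""
    tokens = tokenize(text)
    chunks = [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]
    if not chunks:
        raise ValueError("sample shorter than one chunk")
    return sum(len(set(c)) / size for c in chunks) / len(chunks)
```

Checks like these only cover sample size. Topic, time gaps, platform, disguise, and co-authorship still take an investigator's judgment.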
Stylometry is a tool, not a verdict. It provides measurable evidence, not definitive proof. It can reveal patterns you’d never notice by eye, but it can also mislead if you treat it as absolute. I use it as part of a bigger picture, alongside timeline analysis, metadata, digital forensics, and OSINT. When multiple methods point to the same person, that’s when I start to feel confident. Used correctly, stylometry can connect dots you didn’t even know existed. Used improperly, it can waste your time and damage your credibility.
See you next week.
