I love to listen to podcasts, and by chance, over the past year, I listened to two podcast episodes that talked about the same trove of data: the Enron Corpus.
Enron was an enormous, famous, and successful Houston-based energy-trading company that perpetrated one of the biggest accounting frauds in history. It collapsed in 2001, and top executives were later convicted of fraud; estimated losses totaled $74 billion.
In 2001, as part of its investigation of Enron, the Federal Energy Regulatory Commission seized the email folders of about 150 Enron employees, most of them high-ranking, and decided to release the emails online. The archive includes hundreds of thousands of emails and is one of the biggest collections of private emails ever made public.
In 2003, a researcher at MIT paid $10,000 for the material, which became known as the “Enron Corpus,” and people began to organize and clean it up so that researchers could use it.
It has been studied extensively, for all kinds of purposes. Researchers have used it to study how people sort their emails into folders; to map who connects with whom at work; to develop a “compliance bot” that alerts writers when they’re writing something that might get them in trouble; to examine what level of formality people use in emails, and when and why; to look at patterns in whether people use greetings; to train spam filters; and so on.
And in this data, anyone can read people’s emails. All their ordinary exchanges, about work and about their private lives, and sometimes about very private things: it’s all there. You can read the actual emails, with people’s names and all their information. I’ve looked at them. It’s extraordinary.
In the episode “Wrong Tools Cost Lives,” the terrific podcast Cautionary Tales discusses research by people who have studied spreadsheets using the 15,000 spreadsheets that were attached to the Enron emails.
In episode 35, “You’ve Got Enron Mail!,” the podcast Brought to You By… explores the history of the corpus, the many uses to which it has been put, and the important ethical questions it raises.
I’m fascinated by the Enron Corpus, and speaking of cautionary tales, I use it as one for myself and my daughters. “Remember, you’re leaving a footprint, you’re leaving a record,” I remind us all. “Never write anything that doesn’t live up to your own values. Don’t be impulsive. Don’t write in anger.”
There’s an old proverb, “Don’t do anything you wouldn’t want to see reported on the front page of the New York Times.” And don’t write anything in an email that you wouldn’t want to see included in the Enron Corpus.
If you want to learn more about the Enron Corpus, check out Nathan Heller’s piece in The New Yorker, “What the Enron E-Mails Say About Us.”