x-posted from the MIT Admissions blog
Some of you may remember @horse_ebooks, a Twitter account that, before it was bought and subverted by BuzzFeed, was a truly delightful gibberish machine, spouting pseudorandomly generated spam tweets from a collection of source texts. Some of my favorites:
Crying is great exercise
— Horse ebooks (@Horse_ebooks) September 14, 2012
Unfortunately, as you probably already know, people
— Horse ebooks (@Horse_ebooks) July 25, 2012
As you might know, I am a full time Internet
— Horse ebooks (@Horse_ebooks) February 24, 2012
Fraudulent or not, @horse_ebooks helped inspire an entire genre of surrealist _ebooks-style Twitter bots, which actually do take source texts and produce randomly generated tweets inflected by the voice of various academics, journalists, and programmers. Because they are randomly generated, many, perhaps most, of these tweets aren't very funny. But some of them are really funny, if in an admittedly odd way, because while they are consonant in subject and voice with the source texts, they are probabilistically written in ways that the 'actual' authors never would. The practical result is that you get tweets which sound strangely familiar but are off just enough to be startling and (sometimes) funny.
A few months ago I decided I wanted to make one for the blogs. Over the last few weeks, after reading and committee ended, I actually did. Here's how:
First, I wrote a crude but effective scraper in Python. This script crawls the blogs, downloads every entry ever written, uses the BeautifulSoup library to parse the HTML, and writes each parsed line to a text file.
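For a rough idea of what the parse-and-write step can look like, here is a minimal sketch. It is not my actual scraper: it swaps in Python's built-in `html.parser` for BeautifulSoup so it runs with no extra installs, it assumes the text you want lives in `<p>` tags, and the names `ParagraphExtractor`, `extract_lines`, and `save_lines` are made up for illustration. The crawling and downloading part is left out.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside each <p> element as one line."""

    def __init__(self):
        super().__init__()
        self.lines = []      # finished paragraphs, one string each
        self._buffer = []    # text fragments of the paragraph being read
        self._in_p = False   # True while we are inside a <p> element

    def _flush(self):
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()    # tolerate a previous <p> that was never closed
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_lines(html):
    """Return the paragraph text found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines


def save_lines(lines, path):
    """Append each line to the source text file (path is up to you)."""
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

Feed `extract_lines` the HTML of each downloaded entry, pass the result to `save_lines`, and you end up with one big text file to use as a source.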
Then, I cobbled together a tweet generator in Ruby. This script takes the text file as a source, uses the MarkyMarkov gem to map probabilistic word relationships, randomly generates sentences, rolls a D20 to decide if they should be SHOUTED IN ALL CAPS, and posts the final result to Twitter.
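If you are curious what "mapping probabilistic word relationships" means in practice, here is a minimal Markov chain sketch. It is in Python rather than Ruby, and it is not the MarkyMarkov gem's API; it is a hand-rolled stand-in. The two-word window, the 20-word cap, the 140-character limit, and shouting only on a natural 20 are all assumptions of mine for the example, as are the function names. Posting to Twitter is left out.

```python
import random
from collections import defaultdict


def build_chain(lines, order=2):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            key = tuple(words[i:i + order])
            chain[key].append(words[i + order])
    return chain


def generate(chain, rng, max_words=20):
    """Random-walk the chain, starting from a randomly chosen key."""
    if not chain:
        return ""
    key = rng.choice(sorted(chain))
    words = list(key)
    while len(words) < max_words:
        followers = chain.get(key)
        if not followers:
            break            # dead end: nothing ever followed this pair
        words.append(rng.choice(followers))
        key = tuple(words[-len(key):])
    return " ".join(words)


def maybe_shout(text, roll):
    """SHOUT only on a natural 20 (assumed threshold)."""
    return text.upper() if roll == 20 else text


def make_tweet(lines, rng=None, limit=140):
    """Build a chain from the source lines and produce one tweet."""
    rng = rng or random.Random()
    sentence = generate(build_chain(lines), rng)
    roll = rng.randint(1, 20)    # roll the D20
    return maybe_shout(sentence, roll)[:limit]
```

Because a word that appears more often after a given pair shows up more often in that pair's list, `rng.choice` picks it more often, which is all the "probabilistic" part really amounts to here.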
I uploaded the source text and the Ruby script to scripts, a free hosting service operated by MIT students for the MIT community, and set up a cron job to run the script every three hours.
TL;DR: @mitblogs_ebooks is now a thing. Everything it says is randomly generated from a source text of every blog entry ever written. I like to think of it as admissions advice from an alternate universe, spoken not by any single blogger but by the rumbling chorus of a collective, semi-sentient blogger organism:
Professor Murder and Oxford Collapse were probably my favorite dorm
— мιтα∂мιѕѕισηѕ (@mitblogs_ebooks) May 17, 2014
Suddenly remember I'm supposed to get into Hogwarts
— мιтα∂мιѕѕισηѕ (@mitblogs_ebooks) May 18, 2014
Covering a wide representative group of friends were executed for treason
— мιтα∂мιѕѕισηѕ (@mitblogs_ebooks) May 28, 2014
So there you go. I had never used Ruby, or MarkyMarkov, or a lot of other things before I began this piece of carpentry, but I personally find that trying (and failing, and trying again) is the best way to learn. In making @mitblogs_ebooks, I learned a lot, and sometimes the thing I made even makes me laugh because of how weird it is, which is a bonus. If you want to try your own exercise in computationally generated weirdness, you can download my code here. Happy making!