14 June 32011 / Robin Wellner

The Path to Philosophy

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”.
— Randall Munroe, title text of xkcd/903 .

The method described in the quote at the top of this page is actually away to make a graph out of Wikipedia, I realized. Some sort of reverse tree. Not exactly the right description, since cycles are possible (and do exist on Wikipedia).

I wondered whether it would be possible to write a script that starts from some article on Wikipedia and goes on until it finds Philosophy or ends up in a cycle.

As it turns out, it is. Once you get past the parsing of HTML, it is not that hard, even. For that, I avoided regular expressions, and went straight to html.parser.HTMLParser. Can’t say I used it before, but it was a breeze. 90% of the code in the parser is concerned with keeping track of all the conditions whether it is the right time to follow a link. Surprisingly, handling the “not in parentheses” condition was the simplest. I want to keep the code out of this post and in a Gist, but I can’t resist sharing this snippet beforehand:

	def handle_data(self, data):
		self.open_parens += data.count('(') - data.count(')')

For my walker, I ran into the problem of redirects: this could cause problems — or at least redundancy — for cycles. I solved the problem by making what can be described as a dictionary of sets. I sought help from Stack Overflow, and got a neat new toy out of it, but in the end I just went with what I had already.

Long story short: most articles seem to end up in Philosophy eventually (often in ~20 steps), but some end up in a cycle. It is possible that there is some article in there without a valid link, which means there could be dead ends (I don’t check for that, only for Philosophy and cycles), but I haven’t found one yet.

I thought of building an extensive network of Wikipedia articles, but decided against it because of the high volume of requests I would need to send to Wikipedia for it.

And finally, for those of you craving code, here it is. Requires Python 3.

← Mastermind Solver

The Déjà Vu virtual machine →

	Mr. Noßody on A random alphabet
	davisdude on Déjà Vu Grew a Pair
	Robin Wellner on Déjà Vu Grew a Pair
	davisdude on Déjà Vu Grew a Pair
	The Déjà Vu virtual… on How I implemented Déjà Vu in t…

The Path to Philosophy

2 Comments

Leave a comment Cancel reply

Search

Recent Posts

Top Posts

Archives

Stack Overflow

Lousy tweets

Email Subscription

Recent Comments

The Path to Philosophy

Share this:

Related

2 Comments

Leave a comment Cancel reply

Search

Recent Posts

Top Posts

Archives

Stack Overflow

Lousy tweets

Email Subscription

Recent Comments