Dr. Peet Morris is an expert in software engineering, computational linguistics, and statistics. Here he explains how Wolfram Alpha works.
As there's been a lot of talk linking Wolfram Alpha with Google lately, let's start this 'how it works' hypothesis with a simple comparison.
In a sense, you can think of Wolfram Alpha as a kind of 'super' version of Google's Calculator.
If you don't know about Google's Calculator, go to Google's homepage, and enter '8 nautical miles in chains' - you'll see that the answer is 736.498847 (and if a 'chain' is an unknown to you - well, you've at least two places to look for it now).
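For the curious, the arithmetic behind that answer can be sketched in a few lines of Python. This is only an illustration of the conversion, not how Google actually computes it; the constant and function names are my own, and the two constants are the standard metric definitions of each unit.

```python
# Standard metric definitions of both units.
METRES_PER_NAUTICAL_MILE = 1852.0   # international nautical mile
METRES_PER_CHAIN = 20.1168          # 66 feet, with 1 foot = 0.3048 m


def nautical_miles_to_chains(nautical_miles: float) -> float:
    """Convert via metres, the unit both definitions share."""
    return nautical_miles * METRES_PER_NAUTICAL_MILE / METRES_PER_CHAIN


print(f"{nautical_miles_to_chains(8):.6f}")  # prints 736.498847
```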
So now, let's enter the same query in Wolfram Alpha.
You'll also notice that Wolfram Alpha has 'rounded up', and has assumed that we're not too concerned with the missing inch!
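As a rough check on that 'missing inch', here is a small sketch. It assumes the rounded figure on display was 736.5 chains (that value is my assumption, it isn't quoted above), and the function name is made up for the illustration.

```python
INCHES_PER_CHAIN = 792  # 66 feet x 12 inches

exact_chains = 736.498847   # the figure quoted from Google's Calculator
rounded_chains = 736.5      # ASSUMPTION: the rounded figure on display


def rounding_gap_inches(exact: float, rounded: float) -> float:
    """Length lost (or gained) by rounding, expressed in inches."""
    return (rounded - exact) * INCHES_PER_CHAIN


print(f"{rounding_gap_inches(exact_chains, rounded_chains):.2f} inches")  # prints 0.91 inches
```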
Like Google, Wolfram Alpha has carried out some sort of look-up to find out what a 'nautical mile' and a 'chain' have in common; and it's discovered that they're units of measurement (the probable key word is 'mile' here).
Wolfram Alpha then carried out some syntactic and semantic analysis on the sentence as a whole, and came to the conclusion that we'd like to know one (times eight), converted to the other. Apart from Wolfram Alpha giving us some extra handy data, Google has done more or less the same thing.
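To make the 'look-up, then analyse' idea concrete, here is a toy sketch. It is emphatically not how either engine works internally (we don't know that); the look-up table, the pattern and the function name are all hypothetical, and a real system would need far more than one regular expression.

```python
import re

# Hypothetical look-up table: unit name -> length in metres.
UNIT_IN_METRES = {
    "nautical miles": 1852.0,
    "chains": 20.1168,
    "miles": 1609.344,
}

# A crude stand-in for syntactic analysis: "<amount> <unit> in <unit>".
QUERY_PATTERN = re.compile(
    r"^\s*(?P<amount>\d+(?:\.\d+)?)\s+(?P<source>.+?)\s+in\s+(?P<target>.+?)\s*$"
)


def answer(query: str):
    """Return the converted amount, or None if the query isn't understood."""
    match = QUERY_PATTERN.match(query.lower())
    if match is None:
        return None
    source, target = match["source"], match["target"]
    # The 'semantic' step: both terms must be known units of length.
    if source not in UNIT_IN_METRES or target not in UNIT_IN_METRES:
        return None
    return float(match["amount"]) * UNIT_IN_METRES[source] / UNIT_IN_METRES[target]


print(answer("8 nautical miles in chains"))
print(answer("distance between urbana and champaign"))  # prints None
```

Notice that the second query falls straight through, which is roughly where a plain calculator-style approach gives up.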
Now, let's take another example.
If we enter 'distance between urbana and champaign' (proper-case omitted on purpose here - no hints given and none appreciated) into Google, we find that Google's Calculator can't help.
Google, as you might expect, gives us a bunch of results drawn from web-text. The first result I got when I actually did this was taken from WikiAnswers where someone asked 'What is the driving distance between Chicago Illinois and Champaign Urbana Illinois?' - which is not what we're interested in. Result: there's no immediately obvious and satisfactory answer from a Google search for this query.
OK, now let's try Wolfram Alpha.
You'll see from the output that Wolfram Alpha has assumed that, maybe because they're rather close together, we're asking about the two neighbouring cities in Illinois (which I was). And, having given that a high probability, with a little semantic analysis, it has assumed that we'd like to know the distance between those two places (which I did); and you can see the result.
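Once both terms have been pinned down as places, the distance itself is straightforward geometry. The sketch below uses the haversine formula for great-circle distance; I'm not claiming this is the method Wolfram Alpha uses, and the coordinates are approximate city-centre values that I've supplied myself for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# ASSUMPTION: approximate city-centre coordinates (degrees north, degrees east).
URBANA = (40.1106, -88.2073)
CHAMPAIGN = (40.1164, -88.2434)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


print(f"{great_circle_km(*URBANA, *CHAMPAIGN):.1f} km")
```

With these coordinates the two centres come out only a few kilometres apart, which fits the 'rather close together' observation.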
So, how did Wolfram Alpha do it?
This is where we need to come back to the vague term I used previously - look-up.
Google, as far as we're concerned, gets its data from the public domain, that is, from what's on the public web.
On the other hand, Wolfram Alpha gets its data from the 'Deep Web'. That's data sources that either require a subscription, or at least some sort of entry point (an account maybe). In simple terms, some are free, and some are 'fee' (guess which is more robust, structured and reliable).
Considering our latest query, here is a partial list of data sources that might have been employed/analysed to answer our 'distance between urbana and champaign' query.
Wolfram Alpha exploits data sources to determine links (relationships is probably a more accurate word) between search-terms, and, more importantly, how the likely important terms (might) relate to one another.
The words used in a query could supply Wolfram Alpha (more so in Wolfram Alpha than in Google, say) with some useful hints - for example, the preposition 'between' is certainly more interesting than the verb 'driving'; and this is the part of the Wolfram Alpha engine where I suspect Mathematica (the language the whole platform is written in) really earns its corn (the whys and wherefores are outside of the space provided here).
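As a deliberately naive illustration of how a cue word such as 'between' might steer the interpretation, consider the sketch below. The place list, the cue-word table and the function are all hypothetical; this is a guess at the flavour of the idea, not a description of the real engine (which, as noted, is written in Mathematica rather than Python).

```python
# Hypothetical, hand-written knowledge: which terms are known places.
KNOWN_PLACES = {"urbana", "champaign", "chicago"}

# Hypothetical cue words and the kind of question each one suggests.
CUE_WORDS = {"between": "distance"}


def interpret(query: str):
    """Return (intent, place_a, place_b), or None if nothing sensible is found."""
    words = query.lower().split()
    places = [word for word in words if word in KNOWN_PLACES]
    intents = [CUE_WORDS[word] for word in words if word in CUE_WORDS]
    # 'between' plus exactly two recognised places suggests a distance question.
    if "distance" in intents and len(places) == 2:
        return ("distance", places[0], places[1])
    return None


print(interpret("distance between urbana and champaign"))
# prints ('distance', 'urbana', 'champaign')
```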
Now to the bottom line: As to how all this works (Google/Wolfram Alpha) in detail - well, we don't really know (secrets and all). However, I for one suspect that there's a rather clever ontological database at work at Wolfram Alpha. Indeed, it is probably one that will learn and evolve as more data is fed into it, provided that it has the right rules to link it all up.
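To give a feel for what I mean by an ontological database, here is a tiny sketch: facts are stored as subject-relation-object triples, and one simple rule asks what two terms have in common. The class, its methods and the facts fed into it are all invented for this example; whatever Wolfram Alpha really uses is certainly far richer.

```python
from collections import defaultdict


class TinyOntology:
    """A toy store of (subject, relation, object) facts."""

    def __init__(self):
        self._facts = defaultdict(set)  # (subject, relation) -> set of objects

    def add(self, subject: str, relation: str, obj: str) -> None:
        """Feed in one more fact; the store grows as data arrives."""
        self._facts[(subject, relation)].add(obj)

    def objects(self, subject: str, relation: str) -> set:
        return set(self._facts.get((subject, relation), set()))

    def shared(self, a: str, b: str, relation: str) -> set:
        """A linking rule: what do two terms have in common?"""
        return self.objects(a, relation) & self.objects(b, relation)


onto = TinyOntology()
onto.add("nautical mile", "is_a", "unit of length")
onto.add("chain", "is_a", "unit of length")
onto.add("urbana", "is_a", "city")
onto.add("champaign", "is_a", "city")

print(onto.shared("nautical mile", "chain", "is_a"))  # prints {'unit of length'}
print(onto.shared("urbana", "chain", "is_a"))         # prints set()
```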
So, there you go - not really an in-depth look at Wolfram Alpha and what you can do with it, but hopefully something that will at least go some way to showing how Wolfram Alpha isn't what we commonly call a 'search engine'. Next time I'll give you a detailed example of how Wolfram Alpha can pull together some really useful statistics.
By the way, I'll tell you why I used those two cities (Urbana and Champaign). One is where HAL (you know, 2001 and all that) was activated, whilst the other is where Wolfram Alpha did the very same thing.
Dr. Peet Morris studied Software Engineering, Computational Linguistics, and Statistics at the University of Oxford. He is currently a researcher in the Department of Experimental Psychology, and a College Lecturer in Statistics at St. Hilda's College.
This was first published in May 2009