
Dr. Peet Morris is an expert in software engineering,
computational linguistics, and statistics. Here he explains how
Wolfram Alpha works.
As there's been a lot of talk linking Wolfram Alpha with Google
lately, let's start this 'how it works' hypothesis with a
simple comparison.
In a sense, you can think of Wolfram Alpha as a kind of 'super'
version of Google's Calculator.
If you don't know about Google's Calculator, go to Google's
homepage, and enter '8 nautical miles in chains' - you'll see that
the answer is 736.498847 (and if a 'chain' is an unknown to you -
well, you've at least two places to look for it now).
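To make the arithmetic concrete, here is a minimal Python sketch of
the conversion. It assumes the standard definitions of 1,852 metres
to the nautical mile and 20.1168 metres (66 feet) to the chain; the
function name is mine, for illustration only, and says nothing about
how Google actually computes its answer.

```python
# Standard definitions (assumed here, not taken from Google):
# 1 nautical mile = 1852 m; 1 chain = 66 ft = 20.1168 m.
METRES_PER_NAUTICAL_MILE = 1852.0
METRES_PER_CHAIN = 20.1168


def nautical_miles_to_chains(nautical_miles: float) -> float:
    """Convert by going through metres, the common base unit."""
    return nautical_miles * METRES_PER_NAUTICAL_MILE / METRES_PER_CHAIN


chains = nautical_miles_to_chains(8)
print(f"{chains:.6f}")
```

Printed to six decimal places, the result agrees with the figure
quoted above.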
So now, let's enter the same query in Wolfram Alpha.

You'll also notice that Wolfram Alpha has 'rounded up', and has
assumed that we're not too concerned with the missing inch!
Like Google, Wolfram Alpha has carried out some sort of look-up
to find out what a 'nautical mile' and a 'chain' have in common;
and it's discovered that they're units of measurement (the probable
key word is 'mile' here).
Wolfram Alpha then carried out some syntactic and semantic
analysis on the sentence as a whole, and came to the conclusion
that we'd like to know one (times eight), converted to the other.
Apart from Wolfram Alpha giving us some extra handy data, Google has
done more or less the same thing.
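The look-up-then-analyse idea can be sketched in a few lines of
Python. Everything here is hypothetical: the tiny UNITS table, the
QUERY pattern and the answer function are my own stand-ins, and
neither Google nor Wolfram Alpha has said that it works this way.

```python
import re

# Hypothetical look-up table: unit name -> (kind of quantity, size in metres).
UNITS = {
    "nautical miles": ("length", 1852.0),
    "chains": ("length", 20.1168),
    "miles": ("length", 1609.344),
}

# A crude stand-in for syntactic analysis: "<number> <unit> in <unit>".
QUERY = re.compile(
    r"^\s*(?P<amount>\d+(?:\.\d+)?)\s+(?P<source>.+?)\s+in\s+(?P<target>.+?)\s*$"
)


def answer(query: str):
    """Return the converted amount, or None if the query isn't understood."""
    match = QUERY.match(query.lower())
    if match is None:
        return None
    source = UNITS.get(match["source"])
    target = UNITS.get(match["target"])
    # The look-up step: both terms must be known units of the same kind.
    if source is None or target is None or source[0] != target[0]:
        return None
    return float(match["amount"]) * source[1] / target[1]


print(answer("8 nautical miles in chains"))
```

A query that doesn't fit the pattern, or that names something missing
from the table, simply returns None - which is roughly the position a
calculator is in when it can't help.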
Now, let's take another example.
If we enter 'distance between urbana and champaign'
(proper-case omitted on purpose here - no hints given and none
appreciated) into Google, we find that Google's Calculator can't
help.
Google, as you might expect, gives us a bunch of results
drawn from web-text. The first result I got when I actually did
this was taken from WikiAnswers where someone asked 'What is the
driving distance between Chicago Illinois and Champaign Urbana
Illinois?' - which is not what we're interested in. Result: there's
no immediately obvious and satisfactory answer from a Google search
for this query.
OK, now let's try Wolfram Alpha.

You'll see from the output that Wolfram Alpha has assumed that,
maybe because they're rather close together, we're asking about
the two neighbouring cities in Illinois (which I was). And, having given that a high
probability, with a little semantic analysis, it has assumed that
we'd like to know the distance between those two places (which I
did); and you can see the result.
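Once both places have been identified, the distance itself is plain
geometry. The sketch below uses the haversine formula for the
great-circle distance. The coordinates are approximate city-centre
values that I've supplied for illustration, and the choice of formula
is my assumption - Wolfram Alpha doesn't say which method it uses.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere

# Approximate (latitude, longitude) in degrees - illustrative values only.
URBANA = (40.1106, -88.2073)
CHAMPAIGN = (40.1164, -88.2434)


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


distance_km = great_circle_km(URBANA, CHAMPAIGN)
print(f"{distance_km:.1f} km")
```

With these coordinates the answer comes out at a few kilometres, as
you'd expect for two places that sit side by side.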
So, how did Wolfram Alpha do it?
This is where we need to come back to the vague term I used
previously - look-up.
Google, as far as we're concerned, gets its data from the public
domain, that is, from what's on the public web.
On the other hand, Wolfram Alpha gets its data from the 'Deep Web':
that is, data sources that either require a subscription, or at
least some sort of entry point (an account maybe). In simple terms,
some are free, and some are 'fee' (guess which is more robust,
structured and reliable).
Considering our latest query, here is a partial list of data
sources that might have been employed/analysed to answer our
'distance between urbana and champaign' query.

Wolfram Alpha exploits data sources to determine links
(relationships is probably a more accurate word) between
search-terms, and, more importantly, how the likely important terms
(might) relate to one another.
The words used in a query could supply Wolfram Alpha (more so than
Google, say) with some useful hints - for example, the preposition
'between' is certainly more interesting than the verb 'driving'; and
this is the part of the Wolfram Alpha engine where I suspect
Mathematica (the language the whole platform is written in) really
earns its corn (the whys and wherefores are outside of the space
provided here).
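As a toy illustration of why 'between' is such a useful hint, here is
a short Python sketch that uses the preposition to pick out the two
things being related. The BETWEEN pattern and the operands function
are hypothetical names of my own; the real engine is written in
Mathematica and is certainly far more sophisticated.

```python
import re

# 'between X and Y' marks X and Y as the two things to be related.
BETWEEN = re.compile(r"\bbetween\s+(?P<first>.+?)\s+and\s+(?P<second>.+?)\s*$")


def operands(query: str):
    """Return the pair of terms either side of 'and', or None if absent."""
    match = BETWEEN.search(query.lower())
    if match is None:
        return None
    return match["first"], match["second"]


print(operands("distance between urbana and champaign"))
```

The word in front of 'between' ('distance') then says what to work
out for the pair, whereas a word like 'driving' leaves the structure
of the question unchanged.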
Now to the bottom line: As to how all this works (Google/Wolfram
Alpha) in detail - well, we don't really know (secrets and all).
However, I for one suspect that there's a rather clever ontological
database at work at Wolfram Alpha: one that will learn and evolve as
more data is fed into it, provided it has the right rules to link it
all up.
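To give a flavour of what I mean by an ontological database, here is
a deliberately tiny Python sketch: a handful of 'is_a' facts and one
rule for following them. The facts, the kinds function and the
related function are all invented for illustration; this is my guess
at the general idea, not a description of Wolfram Alpha's internals.

```python
# Hypothetical facts, stored as (subject, relation, object) triples.
FACTS = {
    ("urbana", "is_a", "city"),
    ("champaign", "is_a", "city"),
    ("city", "is_a", "place"),
    ("nautical mile", "is_a", "unit of length"),
    ("chain", "is_a", "unit of length"),
}


def kinds(entity, facts=FACTS):
    """Every category reachable from entity by following 'is_a' links."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in facts:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def related(a, b, facts=FACTS):
    """What two terms have in common, according to the facts."""
    return kinds(a, facts) & kinds(b, facts)


print(related("nautical mile", "chain"))
print(related("urbana", "champaign"))
```

Feed in more facts and the same rule links more terms together, which
is the sense in which such a database could 'learn and evolve'.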
So, there you go - not really an in-depth look at Wolfram Alpha
and what you can do with it, but hopefully something that will at
least go some way to showing how Wolfram Alpha isn't what we
commonly call a 'search engine'. Next time I'll give you a detailed
example of how Wolfram Alpha can pull together some really useful
statistics.
By the way, I'll tell you why I used those two cities (Urbana
and Champaign). One is where HAL (you know, 2001 and all that) was
activated, whilst the other is where Wolfram Alpha did the very
same thing.
Dr. Peet Morris studied Software Engineering, Computational
Linguistics, and Statistics at the University of Oxford. He is
currently a researcher in the Department of Experimental
Psychology, and a College Lecturer in Statistics at St. Hilda's
College.