Wolfram Alpha – how it works (part 2)

Not every question you ask Wolfram Alpha brings back meaningful results. Peet Morris finds out why.

Launched on 18 May, the long-term goal of Wolfram Alpha - a natural language-based search tool which claims to offer an alternative approach to Google - "is to make all systematic knowledge immediately computable and accessible to everyone".

Its creators say, "When computers were young, people assumed that they'd be able to ask a computer any factual question and have it compute the answer." I'm sure they would have liked to add "and I'm happy to say that"

If Wolfram Alpha's goal is to deliver such a system, it would seem that it's the tool many of us have been dreaming of.

In my previous article I tried - as have many - to say something about how the system appears to function, and in this - the round up - I'll take that a little further, and tell you why I'm a little disappointed at the moment - yet hopeful too.

Last time I suggested that Wolfram Alpha works by a) feeding your query through a linguistic-parser, b) taking that output, and applying further rules and manipulation, and, c) assessing deep-web data to provide - hopefully - an answer.

One of the hardest things about Wolfram Alpha is using it. That is to say, getting your query "correct"; or at least in a form that Wolfram Alpha can understand and use.

Understand and use

To "understand" means that the linguistic-parser (all the parts: syntactic, semantic, etc) has made some initial "sense" of your words; whereas "use" implies that this resultant sense is able to be further understood and manipulated in order to be mapped to Wolfram Alpha's deep-web data sources.

Important point: to get a successful result, you must realise that although Wolfram Alpha might like your query - in other words, all but one of the criteria above were satisfied - it might still be short on relevant data. However, that just requires adding more structured data sources. Worth bearing in mind though.

Getting a successful result also means you have not misunderstood Wolfram Alpha's function altogether: Wolfram Alpha is not Google.

Wolfram Alpha has some useful (yet minimal) guidance here which highlights the differences:

Wolfram Alpha part 2 figure 1

If you enter the first parts of these examples into Wolfram Alpha, you will mostly get "proper answers", whereas you won't with the second parts. For instance, "highest mountain" doesn't produce any output. With Google, either part results in hits, as you'd expect. However, don't expect Google to give you "proper answers" - in fact, be surprised if it does.

So one hurdle in using Wolfram Alpha effectively is getting to grips with how to create and ask a good question.

I can think of one easy way to improve learning this skill. Give users the chance to rate their Wolfram Alpha answers: "Yes, that's exactly what I wanted"; "WTF!"; "Oh no - not that 'Wolfram Alpha isn't sure what to do with your input' message again!" Or perhaps just a simple 1 to 10 score.

Then, as a learning aid, Wolfram Alpha could display lists of previous queries that scored highly - what better way to learn than to see what has previously worked.

Anyway, let's have a look at an actual attempt to use Wolfram Alpha; one where Wolfram Alpha produces data that is at least semi-relevant.

The problem I wanted answering was to do with tossing a coin multiple times - yet with a twist in the tail.

So to start with I entered "coin toss". Wolfram Alpha came back with some basic probability stuff on tossing a coin - like the probability of seeing 12 heads and eight tails given 20 tosses.

However, the question I wanted answering was quite a bit more complex and concerns tossing a coin and observing a sequence in the results. Here's the question:

Q. Given a series of consecutive coin tosses, is it more/less likely that I would see the sequence HTT before I saw the sequence HTH - or is it the same?

Just to cut to the chase: the answer is that the average number of tosses needed to see HTH is 10, whereas it's eight for HTT (if you don't believe me, try it. Or just e-mail me for an explanation).
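For anyone who does want to try it, here is a minimal Python sketch that checks the two averages. It is my own illustration, not anything Wolfram Alpha produces. The function names expected_wait and simulate_wait are made up for this example. The first uses the standard overlap rule for a fair coin (add 2 to the power k for every k where the first k letters of the pattern match its last k letters); the second simply tosses a simulated coin until the pattern turns up and averages the results.

```python
import random


def expected_wait(pattern: str) -> int:
    """Expected number of fair-coin tosses until `pattern` first appears.

    Overlap rule: add 2**k for every k where the first k characters
    of the pattern equal its last k characters.
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[n - k:])


def simulate_wait(pattern: str, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate the same average by repeatedly tossing a simulated coin."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent = ""  # the most recent len(pattern) tosses
        tosses = 0
        while recent != pattern:
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / trials


if __name__ == "__main__":
    for p in ("HTH", "HTT"):
        print(p, expected_wait(p), round(simulate_wait(p), 2))
```

The exact rule gives 10 for HTH (overlaps of length one and three) and eight for HTT (only the full-length overlap), and the simulated averages should land close to those figures. Note that this measures how long each sequence takes to appear on its own. If the two are raced against each other in the same run of tosses, both begin with HT, so the toss that follows the first HT settles it either way.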

OK, so on entering that straight into Wolfram Alpha I got what you might expect:

"Wolfram Alpha isn't sure what to do with your input."

And that's pretty reasonable, isn't it? For one thing, I actually re-typed that question about 10 times before I was fairly sure that I had made the problem clear, and all linguists know that I could re-word that question an infinite number of times, yet still have it make just about as much (or more) sense. So, the "Wolfram Alpha isn't sure what to do with your input" message might be a linguistic failure to "understand".

However, maybe it did well with that - after all, I gave it a fair few clues: "coins", "tosses", "likely", "sequence", "T", "H" - and Wolfram Alpha does know some stuff about tossing coins.

Try as I might, I couldn't get any further on this problem (so you are going to have to try it for yourselves), and this might be simply because Wolfram Alpha does not have enough relevant data to draw upon yet. Alternatively, as I hope I have made clear already, it may not be able to get enough clues about what I'm asking from its linguistic analysis of inputs, or it may get adequate clues but be unable to manipulate them in a way that makes querying its data sources viable. Who knows?

Perhaps another improvement would be to have Wolfram Alpha tell you more about its "I'm not sure" message? What exactly was the problem?

In the end I found myself muttering, "Come on, it's only a particular case of conditional probability," and so I found myself entering "conditional probability".

Wolfram Alpha part 2 figure 2

It seems as though I'll have to wait, and, in my gut, I think I'll probably have to wait a fairly long time before I can enter something along the lines of my question and see something pertinent come back.

OK, so it was a tough ask. I'll also admit that I don't profess to know all the ins and outs of how to interact with Wolfram Alpha. However, if Wolfram Alpha is hoping to be a natural language computational knowledge engine, which I assume is the ultimate goal, it has quite some way to go - that is, unless you just want to know how high Everest is in terms of Golden Gate bridges!

Peet Morris studied software engineering, computational linguistics and statistics at the University of Oxford. He is currently a researcher in the Department of Experimental Psychology, and a college lecturer in statistics at St Hilda's College.
