The Strengths and Weaknesses of Data Mining

Bruce Schneier’s blog drew my attention to a recent report on the limits of predictive data mining for counterterrorism, published by the Cato Institute, a libertarian public policy research organization. We’ve already seen a fair amount of debate about the dangers of large-scale data mining for the identification of potential terrorists. And it’s been pretty damning. But this report provides a good, professional summary of some of the major issues.

Now I’m a great supporter of data mining, data fusion and information visualization to help solve business and security problems. In fact, I believe they’re the most under-utilised management tools in the security armoury. But there are dangers in applying such techniques across large databases of information without strong human guidance and a very clear set of rules, patterns and filters to separate the wheat from the chaff. And that’s the problem with hunting for terrorists: we simply don’t have enough of a basis to filter out the mass of false positives that will emerge.
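The false-positive problem is really a base-rate problem, and some simple arithmetic shows why. The sketch below applies Bayes’ theorem to a screening system. The population size, number of genuine targets, detection rate and false-positive rate are all hypothetical, illustrative assumptions and are not figures from the Cato report. Even with a system that catches 99% of targets and wrongly flags only 1% of innocent people, fewer than 0.1% of the people it flags turn out to be real targets.

```python
# Hypothetical, illustrative figures only (not taken from the Cato report).
POPULATION = 300_000_000
TARGETS = 3_000
SENSITIVITY = 0.99          # assumed fraction of real targets the system flags
FALSE_POSITIVE_RATE = 0.01  # assumed fraction of innocent people wrongly flagged


def screening_outcome(population, targets, sensitivity, fpr):
    """Return expected true positives, false positives and precision.

    Precision is P(target | flagged), computed via Bayes' theorem.
    """
    innocents = population - targets
    true_pos = sensitivity * targets
    false_pos = fpr * innocents
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


tp, fp, precision = screening_outcome(POPULATION, TARGETS, SENSITIVITY, FALSE_POSITIVE_RATE)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"chance a flagged person is a real target: {precision:.4%}")
```

Under these assumptions, investigators would face roughly a thousand innocent suspects for every genuine one, which is exactly the chaff that strong human guidance and well-founded filters are needed to remove.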

Smart use of neural networks, especially Kohonen mapping, can be tremendously useful when applied on a smaller scale to identify anomalous behaviour. And the right combination of imaginative human and computer skills can work small wonders on large sets of data. We even built a partially successful model of the human immune system to detect fraud in Post Office transactions. But you simply can’t expect computers to find needles in haystacks without an awful lot of reliable clues.
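For readers unfamiliar with Kohonen mapping, here is a minimal sketch of how a self-organising map can flag anomalous records on a small scale. It is not the model described above. It assumes NumPy, a 6×6 grid, a linearly decaying learning rate and neighbourhood, and it uses quantisation error (the distance from a record to its nearest map node) as the anomaly score, with a threshold at the 99th percentile of the training scores. The two-feature "transactions" are synthetic, hypothetical data.

```python
import numpy as np


def train_som(data, grid_h=6, grid_w=6, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small Kohonen self-organising map on the rows of `data`."""
    rng = np.random.default_rng(seed)
    if sigma0 is None:
        sigma0 = max(grid_h, grid_w) / 2.0
    # Initialise node weights by sampling rows of the training data.
    weights = data[rng.integers(0, len(data), size=grid_h * grid_w)].astype(float)
    # Grid position of each node, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5    # neighbourhood shrinks towards 0.5
            # Best-matching unit: the node whose weights are closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            # Pull the BMU and its grid neighbours towards the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def anomaly_scores(weights, data):
    """Quantisation error: distance from each row to its nearest map node."""
    d2 = np.sum((data[:, None, :] - weights[None, :, :]) ** 2, axis=2)
    return np.sqrt(d2.min(axis=1))


# Synthetic "normal" behaviour: two tight clusters of two-feature records.
rng = np.random.default_rng(42)
normal = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(200, 2)),
])

som = train_som(normal)
threshold = np.percentile(anomaly_scores(som, normal), 99)

# Score new records; anything far from every learned node is flagged.
new_records = np.array([[0.1, -0.2], [2.9, 3.1], [8.0, -5.0]])
scores = anomaly_scores(som, new_records)
flags = scores > threshold
print(flags)
```

The map only learns what "normal" looks like, so it flags records that do not resemble anything it has seen. That works well on a small, well-understood dataset, but it says nothing about *why* a record is unusual, which is why human judgement and reliable clues remain essential.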