
The Strengths and Weaknesses of Data Mining

Bruce Schneier’s blog drew my attention to a recent report on the limits of predictive data mining for counterterrorism, published by the Cato Institute, a libertarian public policy research organization. We’ve already seen a fair amount of debate about the dangers of large-scale data mining for the identification of potential terrorists. And it’s been pretty damning. But this report provides a good, professional summary of some of the major issues.

Now I’m a great supporter of data mining, data fusion and information visualization to help solve business and security problems. In fact I believe they’re the most under-utilised management tools in the security armoury. But there are dangers in applying such techniques across large databases of information without strong human guidance and a very clear set of rules, patterns and filters to separate the wheat from the chaff. And that’s the problem: we simply don’t have a reliable enough basis to filter out the mass of false positives that will emerge.
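The false-positive problem is essentially base-rate arithmetic. The sketch below uses made-up numbers (a population of 300 million, 3,000 genuine targets, and a screen that is 99% accurate in both directions); these are illustrative assumptions, not figures from the Cato report, and `screening_outcome` is a hypothetical helper.

```python
# Hypothetical figures for illustration only; none come from the Cato report.
def screening_outcome(population, true_targets, sensitivity, false_positive_rate):
    """Return (true_hits, false_alarms, precision) for a screen applied to everyone."""
    innocents = population - true_targets
    true_hits = true_targets * sensitivity            # real targets correctly flagged
    false_alarms = innocents * false_positive_rate    # innocent people wrongly flagged
    precision = true_hits / (true_hits + false_alarms)  # share of flags that are real
    return true_hits, false_alarms, precision


hits, alarms, precision = screening_outcome(
    population=300_000_000,
    true_targets=3_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(f"true hits: {hits:,.0f}")
print(f"false alarms: {alarms:,.0f}")
print(f"chance a flagged person is a real target: {precision:.4%}")
```

With these assumed numbers the screen produces roughly 3 million false alarms against fewer than 3,000 genuine hits, so fewer than 1 in 1,000 flagged people is a real target. Because innocent people outnumber targets by about 100,000 to one, even a tenfold lower false-positive rate would still leave around 100 false alarms for every genuine hit.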

Smart use of neural networks, especially Kohonen mapping (self-organising maps), can be tremendously useful when applied on a smaller scale to identify anomalous behaviour. And the right combination of imaginative human and computer skills can work small wonders on large sets of data. We even built a partially successful model of the human immune system to detect fraud in Post Office transactions. But you simply can’t expect computers to find needles in haystacks without an awful lot of reliable clues.
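To make the Kohonen idea concrete, here is a minimal pure-Python sketch (Python 3.8+ for `math.dist`) of a one-dimensional self-organising map used as an anomaly detector. It is not the author's model: the training data are synthetic two-dimensional points, and the function names (`train_som`, `best_matching_unit`, `quantisation_error`), the decay schedules and the rule of flagging anything further from the map than the worst training point are all illustrative assumptions.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))


def train_som(data, n_nodes=10, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D Kohonen self-organising map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each node's weights by copying a randomly chosen training vector.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = n_nodes / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            # Learning rate and neighbourhood radius decay linearly (radius floored at 0.5).
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            radius = max(radius0 * (1 - frac), 0.5)
            bmu = best_matching_unit(nodes, x)
            for i, w in enumerate(nodes):
                # Gaussian neighbourhood over distance on the 1-D grid of node indices.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes


def quantisation_error(nodes, x):
    """Distance from x to its best-matching node; used here as the anomaly score."""
    return min(math.dist(w, x) for w in nodes)


# Hypothetical "normal" behaviour: two clusters of 2-D feature vectors.
rng = random.Random(42)
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
normal += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(200)]

som = train_som(normal)
# Assumed rule: anything further from the map than the worst training point is anomalous.
threshold = max(quantisation_error(som, x) for x in normal)

suspect = [20.0, -10.0]
score = quantisation_error(som, suspect)
print(f"threshold={threshold:.2f} score={score:.2f} anomalous={score > threshold}")
```

Scoring by distance to the nearest map node is one common way to use a self-organising map for anomaly detection. The threshold is only as meaningful as the "normal" data the map was trained on, which is why this kind of technique works best on smaller, well-understood datasets.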


About

This page contains a single entry from the blog posted on December 17, 2006 11:31 AM.
