York University builds a Linux cluster to speed up searching unstructured data

Neural networks: accelerated pattern-matching boosts recognition systems.
Could a search engine ever "guess" what you are searching for? For the past 17 years, Jim Austin has been investigating neural computers. He is professor of neural computation in the advanced computer architectures group at York University, and chief executive of a technology start-up that builds search engines based on technology developed from his research at the university.

Austin's work has involved a project called Aura (Advanced Uncertain Reasoning Architecture), a set of high-speed search techniques for matching patterns in unstructured data.

"We want to solve the problem of how to get machines to recognise a large number of objects," said Austin. This involves recalling and matching sounds, smells, tastes and images quickly. The Aura engine can be used in areas such as postal address matching, high-speed rule-matching systems, matching 3D molecular structures and trademark-database searching.

Austin demonstrated how Aura could be used when he joined Business Trade International's UK contingent exhibiting at this year's CeBit show.

Aura is at the heart of Fedaura, a £1.4m project for the Department for Work and Pensions looking at ways to reduce benefit fraud using its text-searching engine. It is also being used in an e-science pilot project called Dame (Distributed Aircraft Maintenance Environment), where it listens to vibration signatures in aircraft engines to detect when maintenance is required.

Another project involves graph matching, developed by GlaxoSmithKline, which could be used for pattern matching in drug discovery projects.

In normal databases, data can be indexed to allow a user to search for a given piece of information quickly without having to compare every item in the database. Clean data is required to construct such an index, as well as knowledge of how the data is to be accessed.

For example, unless the index takes into account every spelling variation of each name, a lookup on the database will fail if the text in the query is misspelt. If the user has an address and wants to find the corresponding name, the system would need an additional index for the address field (which again requires clean data). Without this, it would be inflexible and have limited applications.
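The contrast between a brittle exact-match index and a tolerant approximate search can be sketched in a few lines. This is only an illustration of the general problem, not Aura's technique; the names and the use of Python's standard `difflib` matcher are assumptions for the example.

```python
import difflib

# A toy name index: exact-match lookups fail on misspellings.
names = {"Smith": 1, "Schmidt": 2, "Smythe": 3}

# An exact lookup on the misspelt query finds nothing.
print(names.get("Smiht"))  # → None

# An approximate match tolerates the typo and recovers the intended name.
print(difflib.get_close_matches("Smiht", names, n=1))  # → ['Smith']
```

The approximate search succeeds only because it compares the query against every stored name, which is exactly the cost that a conventional index is designed to avoid; that trade-off is what Aura's accelerated matching addresses.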

Computers are good at handling tasks that can be modelled as a series of processes or calculations, but they are less efficient at dealing with other types of information processing.

In particular, it is extremely difficult to program computers to recognise patterns like speech, images or mis-typed text. Neural computing is a research area that investigates how computers can solve problems by mimicking the way the brain works through building neural networks.

"Conventional neural networks are based on mathematics, where the network stores information as continuous numbers," Austin said. In his approach, data is held as binary ones and zeros. To achieve the speed needed to run the searches, Austin said, the code needs to use low-level computer operations.
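The appeal of holding data as ones and zeros is that whole patterns can be compared with single machine-level instructions rather than floating-point arithmetic. The snippet below is a generic sketch of that idea, not Aura's actual architecture; the sample patterns are invented for illustration.

```python
# Binary patterns stored as plain integers: each bit is one feature.
stored = [0b10110010, 0b01101101, 0b10110110]

def overlap(a: int, b: int) -> int:
    # Count the bits set in both patterns: one AND plus a popcount,
    # both cheap low-level operations.
    return bin(a & b).count("1")

query = 0b10110011
scores = [overlap(query, p) for p in stored]
print(scores)  # → [4, 2, 4]
```

In hardware such as the Presence cards, operations like these can be applied to many stored patterns in parallel, which is where the speed-up over conventional arithmetic-based networks comes from.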

Most neural networks rely on learning. Users have to "train" speech recognition or writing recognition software to understand their voices or read their handwriting before the software can be used.

Training is good for personal use, but does not help when speed is important, as with face recognition or reading a postal address label, where the machine has just one shot at recognising the image.

Aura allows a user to compare an unknown item against every item stored and quickly retrieve the most similar examples, said Austin.
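An exhaustive best-match search of this kind can be sketched with a simple bit-difference (Hamming) measure. This is a minimal illustration under assumed data, with hypothetical item names; Aura's engine uses specialised binary neural structures and dedicated hardware to make the exhaustive comparison fast.

```python
def hamming(a: int, b: int) -> int:
    # Number of bit positions in which the two patterns differ.
    return bin(a ^ b).count("1")

# A toy store of binary patterns keyed by hypothetical item names.
database = {"item_a": 0b11001010, "item_b": 0b11001000, "item_c": 0b00110101}
query = 0b11001011

# Score every stored item against the unknown query and keep the closest.
best = min(database, key=lambda k: hamming(database[k], query))
print(best)  # → item_a
```

Because every stored item is scored, the search never fails outright on a noisy or misspelt query; it simply returns the nearest candidates in ranked order.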

The benefit of this approach is that Austin has been able to build hardware to speed up searches. Based on field programmable gate arrays and digital signal processors, the design, called Presence 1, is configured with 4Gbytes of memory and plugs into a Sun or PC server. The current installation, CortexOne, uses 24 Presence 1 cards to accelerate searches.

CortexOne itself is a network of 24 PC servers configured as a Linux Beowulf cluster, with one Presence 1 card in each machine.

CortexOne was able to run 11 matches per second when used for postal address matching, he said. This month a new version of the card, Presence 2, will be installed, and Austin is expecting a tenfold increase in speed.

CV: Jim Austin

Jim Austin is professor of neural computation at York University. He has directed the advanced computer architectures group for more than 15 years. The group is one of the largest units in the computer science department at York, with more than 35 researchers, technicians and academics.

Austin has more than 170 published works and has participated in many research projects with partners including GlaxoSmithKline, Royal Mail, BAE Systems, Rolls Royce and EDS.

He is best known for his work on binary neural networks and the Aura technology. Austin's research is primarily motivated by neurobiology. His main interest is neural networks, but he also works on applications in grid computing, computer vision and advanced computer architectures.

He is chief executive of Cybula, a recently established, privately funded university spin-off focused on transferring the Aura technology from the university into industrial applications.

The company operates in conjunction with Austin's group from the Science Park at the University of York.  