Beyond keywords: bringing initiative to enterprise search

In this guest blog post, Jarred McGinnis, UK managing consultant at Ontotext, explores the interdependency of business intelligence and enterprise-strength semantic search.

Business intelligence and enterprise search have always been closely related: enterprise business intelligence is useless without the ability to index and search it. Despite the rapid growth in the sheer quantity of enterprise data that we’ve all been hearing about, enterprise search remains in many respects stuck in its old ways. While we’ve progressed from a team of librarians scouring a basement full of card catalogues to computers on every desk, the underlying principles of information management remain the same. Businesses today have a vast bank of data available and limited means to sift through it.

Structured data is easy to process in bulk with existing technologies. However, ‘unstructured data’, which makes up the majority of business data, is almost always more valuable. Until recently, extracting that value was very much the domain of human expertise: qualitative data like court depositions, customer complaint emails or academic papers requires critical thinking from humans.

Innovations in the field of natural language processing have made it far easier for enterprises to automate analysis and bring structure to their unstructured data. This involves the computer system being able to recognise the things that matter most (e.g. proper nouns) and the relationships between them. Some solutions, such as Ontotext’s, pair this with a graph database, and this is where things get interesting. When you marry these two technologies, you get content indexing that goes beyond rigid rows and columns and begins to capture information in a much more nuanced way: the way we as human users understand and use information to make informed business decisions.

Search for most organisations is limited: users are forced to play ‘keyword bingo’, rephrasing their question multiple times until they land on what gets them to their answer. The technologies we’ve been exploring can alleviate this problem by going beyond capturing the keywords: they capture the meaning behind the keywords, label them as different categories, entities or types, link them together and infer new relationships.
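
To make the idea of labelling, linking and inferring a little more concrete, here is a minimal sketch in Python. It is not how Ontotext’s products work internally; it simply assumes facts are stored as (subject, predicate, object) triples, and the entity names, the predicate names (`type`, `subClassOf`, `madeBy`) and the single inference rule are all illustrative choices made for this example.

```python
# Toy facts as (subject, predicate, object) triples.
# All names and predicates here are illustrative assumptions.
TRIPLES = {
    ("Galaxy S", "type", "Smartphone"),
    ("Galaxy S", "madeBy", "Samsung"),
    ("Samsung", "type", "Company"),
    ("Smartphone", "subClassOf", "ConsumerElectronics"),
}


def infer_types(facts):
    """Return facts plus inferred types.

    Rule: if (x, type, Sub) and (Sub, subClassOf, Super),
    then (x, type, Super). Repeats until nothing new is added.
    """
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for s2, p2, o2 in list(inferred):
                if p2 == "subClassOf" and s2 == o:
                    new_fact = (s, "type", o2)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


def find_by_type(facts, wanted_type):
    """List every subject labelled with the given type."""
    return sorted(s for s, p, o in facts if p == "type" and o == wanted_type)


# Before inference, nothing is labelled as consumer electronics.
before = find_by_type(TRIPLES, "ConsumerElectronics")
# After inference, the smartphone is found under the broader category too.
after = find_by_type(infer_types(TRIPLES), "ConsumerElectronics")
```

The point of the sketch is that a search for the broader category finds the document about the smartphone even though nobody tagged it with that phrase; the relationship was inferred rather than typed in by hand.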

By capturing the relationships between keywords, semantic graph databases help situate keywords as real-world objects and entities rather than just a string of characters. Semantics introduces context to the words we are using to search: for example, a search for ‘Samsung Apple’ is probably referring to Apple the company rather than the fruit.
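
The ‘Samsung Apple’ case can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a description of any real product: the `SENSES` table, its related terms and the overlap-counting rule are all assumptions invented for the example, whereas a real system would draw this context from a graph of linked entities.

```python
# Hypothetical candidate meanings for an ambiguous keyword, each with
# a handful of related terms. The table is an assumption for this example.
SENSES = {
    "apple": {
        "Apple (company)": {"samsung", "iphone", "smartphone", "technology"},
        "apple (fruit)": {"orchard", "pie", "cider", "fruit"},
    },
}


def disambiguate(query):
    """Pick a meaning for each ambiguous word using the other query words."""
    words = query.lower().split()
    resolved = {}
    for word in words:
        candidates = SENSES.get(word)
        if not candidates:
            continue  # word is not ambiguous in our toy table
        context = set(words) - {word}
        # Choose the sense sharing the most terms with the rest of the query.
        # On a tie, the first sense listed in the table wins.
        best = max(candidates, key=lambda sense: len(candidates[sense] & context))
        resolved[word] = best
    return resolved


tech_query = disambiguate("Samsung Apple")
food_query = disambiguate("apple pie")
```

Here the same keyword resolves to the company when it appears alongside ‘Samsung’ and to the fruit when it appears alongside ‘pie’, which is the kind of context a plain string match cannot see.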

This means that enterprise search will no longer hinge on a crude system of recognising keywords and returning exact matches. Such a system costs an organisation not only time but also the hidden cost of missing valuable information that is relevant but not tagged with the phrases used in the search. This is lost content that an organisation might otherwise be able to monetise.

Think of it as search with initiative: the results will be more targeted towards the goal of the search rather than being constrained by the technical skill of the person doing the searching.

We’re already seeing some progress in this area: the Houses of Parliament in the UK use our technology to index and categorise legislation. The underlying technology is flexible and can be applied to any sector that deals with large quantities of text-based data.

The key here is to move enterprise search from keyword-driven search to context-driven search. Search should be a high-level interface, enabling experts to stop wasting their time figuring out how the computer sees the world and spend more time thinking about the actual problem at hand.