Technical evangelist Courtney Claussen at Sybase IQ has this week posted a blog entry titled "Text Analytics - Slaying the Unstructured Data Dragon".
Claussen traces the etymology of the word "dragon" back to a Greek word meaning "sharp-sighted one", noting that the dragon is purported to have unusually acute vision.
As we all know, dragons are scary beasts, but behind them we usually find temples and hidden treasures.
"Today's dragon in the world of data is the massive amount of unstructured text that originates from an extensive array of sources: web pages, email, news, blogs, social media sites, surveys and every kind of document imaginable. Unstructured data, like a dragon, is a big scary, fire-breathing beast -- overwhelming to face and seemingly impossible to vanquish. Yet like a dragon, it is the guardian of an enticing treasure trove of information," writes Claussen.
As we now aim to tame the beast of unstructured data using text analytics, Claussen suggests that several steps or phases are needed to make sense of the chatter and gain business insight.
These phases include:
- Collecting and preparing the data
- Cleansing and 'tokenising' the data
- Categorising the data
- Running analytics on the repository of now 'enriched' data
- Reporting and delivery on the data
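To make those phases concrete, the sketch below walks a handful of documents through all five steps in plain Python. It is an illustration only and not how Sybase IQ or Claussen's post implements them: the sample documents, the CATEGORY_KEYWORDS table and every function name are hypothetical, and the keyword-count categoriser stands in for the far richer language processing a real system would use.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would draw on a richer knowledge base.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "support": {"crash", "error", "bug"},
}


def collect():
    """Phase 1: collect and prepare raw documents (hard-coded samples here)."""
    return [
        "The app CRASHED with an error after the update!",
        "Please send a refund for the duplicate charge on my invoice.",
        "Great service, thanks.",
    ]


def tokenise(text):
    """Phase 2: cleanse (lowercase, drop punctuation) and split into tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def categorise(tokens):
    """Phase 3: pick the category whose keywords appear most often."""
    scores = {
        name: sum(token in words for token in tokens)
        for name, words in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorised"


def analyse(enriched):
    """Phase 4: run a simple analytic over the enriched records."""
    return Counter(doc["category"] for doc in enriched)


def report(counts):
    """Phase 5: format the results for delivery as plain text."""
    return "\n".join(f"{cat}: {n}" for cat, n in sorted(counts.items()))


def run_pipeline():
    enriched = []
    for text in collect():
        tokens = tokenise(text)
        enriched.append(
            {"text": text, "tokens": tokens, "category": categorise(tokens)}
        )
    return report(analyse(enriched))


if __name__ == "__main__":
    print(run_pipeline())
```

Each phase is a separate function so that any one of them, most obviously the categoriser, could be swapped for something more capable without touching the rest.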
Claussen concludes by noting that text analytics as performed by machines is not nearly as sophisticated as what the human brain can do.
"But computers are superior at processing large volumes of data quickly. With strong algorithms, an extensive knowledge base and some human involvement to drive and refine the search, they can be very effective at locating and analyzing the unstructured data that matters to you," says Claussen.
"Sybase IQ 15 incorporates text analytics capabilities with its handling of large objects, specialized indexing for locating and scoring terms and phrases, and an integration layer for plugging in language processing libraries. Sybase IQ is an analytics platform that offers you serious artillery in your battle against the unstructured data dragon," she adds.