Structured data is a small and declining percentage of business information. But it is critically important for almost all organisations.
Structured data volumes are growing more slowly than stores of unstructured and semi-structured information.
Research firm IDC says unstructured data is growing at 29.8% year-on-year, against 19.6% for structured data. That growth is driven by the substantial expansion in new datasets, generated by sources ranging from social media to the internet of things (IoT), and by businesses’ willingness to store unstructured data for textual and other advanced analysis.
Structured data remains important, though, as its continued growth shows. In fact, as Sharad Patel, a technology expert at PA Consulting Group, points out, more mature firms are moving data into semi-structured and even structured formats.
Structured data is easier to manage, analyse and secure, and applications – from specialist data mining tools to Salesforce.com – help businesses put unstructured data into structured formats, or at least attach it to structured records.
Defining structured data
Structured data is data organised into a fixed format, data model or “schema”. These elements are then addressable by an application – such as a database – for retrieval or reporting.
Structured data is designed so it can be passed into another application, such as a business intelligence package, for further analysis.
Business systems, such as enterprise resource planning (ERP), human resources management and sales automation, are built on top of structured data, either in an integrated database, or by linking to an external relational database application, including Oracle, IBM’s DB2 and the various flavours of SQL.
Structured data is organised into records, each made up of defined fields. Metadata helps applications, and humans, index and organise the information in those records or files.
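To make the idea of a schema concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The customers table, its column names and the sample rows are hypothetical, chosen only for illustration, and are not taken from any system mentioned here.

```python
import sqlite3

# Hypothetical schema for illustration: every record has the same typed fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        country     TEXT NOT NULL,
        balance     REAL NOT NULL
    )
    """
)

# Made-up sample records that conform to the schema.
rows = [
    (1, "Acme Ltd", "UK", 1200.50),
    (2, "Globex", "US", 310.00),
    (3, "Initech", "UK", 87.25),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# Because the format is fixed, each field is addressable by name in a query.
uk_customers = conn.execute(
    "SELECT name, balance FROM customers WHERE country = ? ORDER BY balance DESC",
    ("UK",),
).fetchall()
print(uk_customers)
```

The same query would be far harder to run against free text, which is the practical difference a fixed schema makes for retrieval and reporting.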
The growth of this metadata, and of metadata analysis tools, is blurring the lines between structured and unstructured data. A digital image, for example, can be saved with powerful, searchable metadata, from the GPS coordinates of where a picture was taken to the technical settings on the camera.
Organisations can use these structured records to extract information from surveillance or delivery systems, for example, and carry out powerful analysis on the metadata alone, without needing to view the file’s actual image data. The growth of object storage, which is especially suited to dealing with metadata, has also narrowed the gap between structured and unstructured data.
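As a rough illustration of analysing metadata alone, the sketch below filters a small image catalogue by location without ever opening the image files. The ImageMetadata class, the file names, the coordinates and the bounding box are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ImageMetadata:
    """Hypothetical structured record describing an unstructured image file."""
    filename: str
    latitude: float
    longitude: float
    taken_at: datetime
    iso: int


# Made-up catalogue entries; only the metadata is held, not the pixels.
catalogue = [
    ImageMetadata("cam01_0001.jpg", 51.5074, -0.1278, datetime(2019, 3, 1, 8, 15), 100),
    ImageMetadata("cam02_0001.jpg", 53.4808, -2.2426, datetime(2019, 3, 1, 9, 40), 400),
    ImageMetadata("cam01_0002.jpg", 51.5079, -0.1281, datetime(2019, 3, 2, 22, 5), 1600),
]


def taken_within(records, min_lat, max_lat, min_lon, max_lon):
    """Return the records whose GPS coordinates fall inside a bounding box."""
    return [
        r for r in records
        if min_lat <= r.latitude <= max_lat and min_lon <= r.longitude <= max_lon
    ]


# Illustrative bounding box roughly covering central London.
central_london = taken_within(catalogue, 51.4, 51.6, -0.3, 0.0)
print([r.filename for r in central_london])
```

In practice the metadata would more likely be read from the files themselves or from an object store, but the principle is the same: the query touches only structured fields.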
Some experts describe spreadsheets as structured data, although others argue that because there is no fixed data schema for the value of a cell, spreadsheets are more accurately semi-structured data.
XML files are structured, and are often used to transport metadata. Developers can also add structured data to web pages, to help search engines. Google gives the example of a JSON script to tell its search engine that a page contains a recipe.
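The sketch below shows roughly what such markup can look like, generated here with Python's json module. It assumes the schema.org Recipe vocabulary and the JSON-LD script tag convention; the recipe itself, the author name and the to_script_tag helper are invented for illustration and are not Google's own example.

```python
import json

# Hypothetical recipe described with schema.org property names.
recipe = {
    "@context": "https://schema.org/",
    "@type": "Recipe",
    "name": "Simple Pancakes",
    "author": {"@type": "Person", "name": "Example Author"},
    "recipeIngredient": ["200g flour", "2 eggs", "300ml milk"],
    "prepTime": "PT10M",   # ISO 8601 duration: 10 minutes
    "cookTime": "PT15M",   # ISO 8601 duration: 15 minutes
}


def to_script_tag(data):
    """Wrap a dictionary as a JSON-LD script element for embedding in a web page."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


snippet = to_script_tag(recipe)
print(snippet)
```

The page content stays free-form for human readers, while the embedded block gives a search engine named fields it can rely on.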
Changing data models
The vast – and often untapped – business value contained in corporate information is prompting organisations to change the way they store and manage data.
While one trend is to move unstructured data to structured environments, or to make more use of metadata, the other is to focus analytics efforts on unstructured data. The two trends have different implications for IT infrastructure.
Managing structured data requires a degree of expertise.
“One of the first things you realise about structured data is that it has usually been modelled or organised by an expert for a purpose. This could mean the data has been structured to represent a specific style of data – for example, a customer’s account details or an electronic bank transfer format,” says Nick Jewell, director at data science firm Alteryx.
“It can also be structured to reflect the usage of that data, such as processing a customer’s transaction,” he adds.
The efficiency of databases and structured data processing tools, coupled with the wide variety of applications that run on top of them, means businesses will continue to move data across to structured formats.
In-memory database technology, such as SAP’s Hana, relies on structured data. Businesses are using in-memory systems for real-time or near-real-time information processing. At present, unstructured data systems cannot match in-memory database performance.
The disadvantage of structured data models is the need for experts to set them up. Analytics and storage professionals want greater automation to help them format and manage data. Richer metadata and intelligent systems that can mine unstructured data, possibly using AI, are alternatives to investing up front in putting data in a structured schema.
Automation is also increasingly important when it comes to storage management.
According to Julia Palmer, a research director at Gartner, enterprises want to simplify data management and management of the underlying storage hardware.
“Even though [the] amount of data is not giant [compared with unstructured data], they want an architecture that is easier to use and [does not] need experts,” she says. Enterprise systems should be able to handle tiering, compression and deduplication at the storage array level.
In turn, the trend for automation is driven by the move to flash and solid-state storage for structured data. No other technology can compete with solid state for performance, and the core enterprise and analytics applications that use structured data are best placed to turn that performance into business value.
Gartner expects the structured data market to become all-flash. But the higher cost of solid-state systems is forcing organisations to look at automation to ensure storage is used efficiently. An ability to automatically tier off data to disk-based arrays, cloud-based backup or even tape is expected to be a required feature for enterprise storage systems.
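A very simplified view of how an automated tiering policy might decide where data lives is sketched below. The 30-day and 365-day thresholds, the tier names and the dataset names are assumptions for illustration only; real storage arrays use their own, more sophisticated policies.

```python
from datetime import datetime, timedelta

# Hypothetical policy thresholds, not taken from any supplier's product.
HOT_WINDOW = timedelta(days=30)     # recently used data stays on flash
WARM_WINDOW = timedelta(days=365)   # older data moves to disk-based arrays


def choose_tier(last_accessed, now):
    """Pick a storage tier from how long ago the data was last accessed."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "flash"
    if age <= WARM_WINDOW:
        return "disk"
    return "cloud-archive"


# Made-up datasets with their last access times.
now = datetime(2019, 6, 1)
datasets = {
    "orders_2019_q2": datetime(2019, 5, 20),
    "orders_2018": datetime(2018, 12, 31),
    "orders_2015": datetime(2015, 12, 31),
}

placement = {name: choose_tier(ts, now) for name, ts in datasets.items()}
print(placement)
```

Age since last access is only one possible signal. It is used here because it keeps the most expensive tier reserved for the data most likely to be read again.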
Gartner also points out that organisations want to run tiering, archiving and other services without additional hardware or gateways. The trend is towards fewer suppliers, even as IT departments look at hybrid storage and integration with the cloud.
At present, cloud systems cannot match storage area network (SAN) or direct-attached storage (DAS) systems for performance, so the ability to move data to and from low-cost cloud storage on demand appeals to organisations, provided they can automate it. The alternative, of course, is to process and store data in the cloud.
The supplier landscape for structured data storage is largely made up of the conventional, direct-attached and SAN system suppliers.
Companies such as Dell EMC, HPE, Hitachi Vantara, NetApp and IBM specialise in enterprise-grade storage systems for high-performance applications. The mainstream storage suppliers now offer all-flash systems, disk-based systems, or combinations of both, as well as cloud connectivity.
Flash-only suppliers include Pure Storage and Violin Systems. These newer suppliers have gained ground in systems where performance is critical. Enterprise storage managers looking to future-proof their systems should also look at HPE’s InfoSight platform – tailored for data storage and analytics – and Nutanix’s software-defined approach to storage.