In 2022, IDC predicted that the amount of data created annually would grow at a compound annual growth rate of more than 21% to reach more than 221,000 exabytes by 2026. Since one exabyte equals one billion gigabytes, that is a vast amount of data. IDC also says that more than 90% of the data created each year is unstructured.
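To put the IDC figure in perspective, the unit conversion works out as follows in decimal (SI) units, where one exabyte is 10^18 bytes, one gigabyte is 10^9 bytes and one zettabyte is 1,000 exabytes:

```latex
221{,}000\ \text{EB} \times 10^{9}\ \frac{\text{GB}}{\text{EB}} = 2.21 \times 10^{14}\ \text{GB} = 221\ \text{ZB}
```

In other words, roughly 221 trillion gigabytes a year.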
Unstructured data is any data that does not have a specific, predefined format or structure. Unlike structured data, which is organised into tables, fields and columns, it is typically stored in free-form or semi-structured formats such as text files, images, videos, emails, presentations, social media posts and other multimedia content. The share of unstructured data is expected to grow further with emerging technologies such as the metaverse and machine learning.
But what’s being done with all the unstructured data created each year? Not enough, says Carl D’Halluin, chief technology officer of Datadobi. Founded in 2010, the Leuven-based firm is now a global leader in unstructured data management. Its founders previously worked at EMC, for many years the leading storage provider and now part of Dell Technologies, and were instrumental in building the world’s first commercial object storage platform, which became EMC Centera.
“Unstructured data is often more difficult to analyse and process than structured data, because it lacks the consistency and predictability of structured data,” he says. “However, it can also contain valuable insights and information that is not available in structured data, making it important for data analysis, machine learning and other applications. As a result, many organisations are investing in technologies that can help them better manage, analyse and make use of unstructured data.”
As D’Halluin sees it, unstructured data poses four primary types of challenge: cost, risk, carbon footprint and value. On cost, organisations need to understand what their existing data costs to store and ensure it sits in the optimal location or storage tier, whether in the cloud or on-premises, as the needs of the business dictate.
Risk comes in many forms, including hardware failure, ransomware infections, malicious or accidental deletion of data, personal data being stored along with non-business-related data, data stored beyond retention requirements, and orphaned data (data with no active owner within an organisation). The first step is to gain visibility into the data, so that organisations can head off problems early.
“To minimise risk, they can replicate data and/or create golden copies that augment other backup methods,” he says. “They can relocate ageing data to an archive tier. When it comes to dealing with challenges such as orphaned and/or non-business-related data, they can either relocate data to a quarantine area for further review or possibly even delete the invalid data to reduce the risk introduced by these datasets.”
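Datadobi’s own tooling is not described here, but as a hedged illustration of what “gaining visibility” can mean in practice, the Python sketch below walks a file share and flags two of the risk categories above: ageing data (not modified within an assumed retention window, so a candidate for an archive tier) and orphaned data (owned by a user ID missing from an assumed list of active owners, so a candidate for quarantine and review). The function `scan`, its parameters, and the use of file modification time and owner UID as proxies are all hypothetical choices for illustration; real retention policies and ownership records are usually richer than file metadata.

```python
from __future__ import annotations

import os
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ScanReport:
    """Files grouped by the remediation they may need (hypothetical categories)."""
    ageing: list[Path] = field(default_factory=list)    # candidates for an archive tier
    orphaned: list[Path] = field(default_factory=list)  # candidates for quarantine and review


def scan(root: str | os.PathLike, max_age_days: float,
         active_owner_uids: set[int], now: float | None = None) -> ScanReport:
    """Walk `root` and flag files by modification age and owning user ID.

    A file is 'ageing' if it has not been modified for more than `max_age_days`
    and 'orphaned' if its owner UID is not in `active_owner_uids`.
    The same file can appear in both lists.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # 86,400 seconds per day
    report = ScanReport()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()  # lstat: inspect symlinks themselves, don't follow them
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                report.ageing.append(path)
            if st.st_uid not in active_owner_uids:
                report.orphaned.append(path)
    return report
```

For example, `scan("/mnt/share", max_age_days=7 * 365, active_owner_uids={1001, 1002})` (a hypothetical path, retention period and set of UIDs) would list files untouched for seven years and files owned by anyone outside those two accounts. The sketch only reports; moving data to an archive tier or a quarantine area is left as a separate, reviewable step.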
Decision-makers seeking to reduce the carbon footprint of stored unstructured data also need visibility: they must know the carbon footprint of their storage systems and of the data held on them. With that information, they can relocate data from high-emission environments to lower-emission ones.
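Once an emissions factor is known for each storage system, the relocation logic described above reduces to simple arithmetic: estimated annual footprint = data size × emissions per unit of capacity per year. The Python sketch below assumes such a per-system factor (expressed in kg CO2e per TB-year, a hypothetical unit choice) is supplied by the operator or provider. The names `StorageSystem`, `Dataset` and `plan_relocations` are illustrative, and the sketch deliberately ignores the capacity, performance, cost and compliance constraints a real placement decision would weigh.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSystem:
    name: str
    kg_co2e_per_tb_year: float  # assumed emissions factor, supplied by the operator


@dataclass(frozen=True)
class Dataset:
    name: str
    size_tb: float
    location: StorageSystem  # where the data currently lives


def annual_footprint_kg(ds: Dataset, system: StorageSystem | None = None) -> float:
    """Estimated kg CO2e per year for holding `ds` on `system` (default: its current system)."""
    system = ds.location if system is None else system
    return ds.size_tb * system.kg_co2e_per_tb_year


def plan_relocations(datasets: list[Dataset],
                     systems: list[StorageSystem]) -> list[tuple[str, str, str, float]]:
    """Propose moving each dataset to the lowest-emission system if that lowers its footprint.

    Returns (dataset, from_system, to_system, estimated_kg_saved_per_year) tuples.
    """
    greenest = min(systems, key=lambda s: s.kg_co2e_per_tb_year)
    plan = []
    for ds in datasets:
        saving = annual_footprint_kg(ds) - annual_footprint_kg(ds, greenest)
        if saving > 0:  # data already on the greenest system yields zero saving
            plan.append((ds.name, ds.location.name, greenest.name, saving))
    return plan
```

A dataset is proposed for a move only when the lowest-emission system would actually reduce its estimated footprint; data already on that system is left in place.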
“There is a tremendous amount of valuable information in all the data companies have been collecting,” says D’Halluin. “It’s important to have the data in the right place at the right time in order to extract maximum value. For example, relocating data from edge locations to a central location for protection – but also for distribution to other locations for consumption by analytics applications – is becoming increasingly important.”
Data can no longer sit statically on the storage platform where it was originally written; it needs to be moved in advance to wherever it will be processed.
Digital Cleanup Day and the future of data management
Data that is no longer needed also has to be removed. Held on 13 March, Digital Cleanup Day promoted better digital hygiene habits to help individuals and organisations become more efficient, productive and secure in their digital lives.
According to the Digital Cleanup Day website, the internet and its supporting systems produce more than 900 million tonnes of CO2 each year. Many experts estimate that internet use accounts for 3.7% of global emissions, roughly equivalent to the emissions generated by all the world’s air traffic.
“Organisations that wish to declutter on Digital Cleanup Day (or any other time) and maintain a clean and well-organised digital footprint moving forward should start with the biggest nuts to crack,” says D’Halluin. “This includes removing unnecessary data copies, outdated data, data belonging to employees no longer with the organisation, and expired data backups and archives.”
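Removing unnecessary copies starts with finding them. One common technique, shown here as a minimal Python sketch rather than anything Datadobi is said to use, groups files by size and then by SHA-256 content hash; files that end up in the same group are byte-identical copies (barring a SHA-256 collision, which is astronomically unlikely). The function `find_duplicates` is hypothetical, and it only reports duplicates: deciding which copy to keep is left to a person or a policy.

```python
from __future__ import annotations

import hashlib
import os
from collections import defaultdict
from pathlib import Path


def _sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str | os.PathLike) -> list[list[Path]]:
    """Return groups of regular files under `root` that have identical content.

    Files are first grouped by size (cheap); only files sharing a size are hashed.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():  # skip links and special files
                by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in paths:
            by_hash[_sha256(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files are never read at all, which matters when scanning shares that hold millions of files.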
D’Halluin adds that the biggest challenge in data management is handling unstructured data. Organisations need to ensure they use the data they collect and generate to its full potential, while managing it in a sustainable way.
There are many other challenges in data management, now and in the near future. These include data privacy and security, data integration and interoperability, data governance, and data quality.
Finally, organisations need to be able to process, store and analyse huge volumes of data at a high rate.
Either way, Belgium, home to Datadobi, is well placed to help overcome the growing challenges in managing unstructured data.