There are lots of questions about unstructured data and its impact on the data enterprise. Can we start with a definition?
What we're really doing is designating our data as structured or unstructured. Let's start with structured data, which is data organized in a defined structure so that each piece is identifiable. The most common form of structured data is a database, such as a SQL database or Access. For example, SQL (Structured Query Language) lets you select specific pieces of information based on the columns and rows in a table. You might look for all the rows containing a particular date, ZIP code or name. This is structured data: it is organized and searchable by data type within the actual content.
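To make the ZIP code lookup concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table, its column names and the three rows are hypothetical, invented only for illustration; any SQL database would behave similarly.

```python
import sqlite3

# In-memory database; the table and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, zip_code TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Alice", "02139", "2008-03-14"),
        ("Bob", "94105", "2008-03-15"),
        ("Carol", "02139", "2008-04-01"),
    ],
)

# Every row has a zip_code column, so the query can target it directly.
rows = conn.execute(
    "SELECT name FROM customers WHERE zip_code = ? ORDER BY name",
    ("02139",),
).fetchall()
names = [row[0] for row in rows]
print(names)  # ['Alice', 'Carol']
conn.close()
```

The query works because the database knows which value in each row is the ZIP code; nothing has to be inferred from the content itself.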
By comparison, unstructured data has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. An email is considered unstructured data. Even though the email messages themselves are organized in a database, such as Microsoft Exchange or Lotus Notes, the body of the message is really freeform text without any structure at all, so the data is considered raw. Documents are another example of unstructured data. Although a Word document has some formatting attached to it, the content of the document is completely freeform.
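The email case can be sketched with Python's standard email module. The message below is made up for illustration. The headers can be read by name, much like database columns, but the body comes back as a single string, so finding a ZIP code in it means searching the text with a pattern, here a simple five-digit regular expression that is only an assumption about what a ZIP code looks like.

```python
import re
from email import message_from_string

# A made-up message, used only to illustrate the point.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Hi Bob, are we still on for Tuesday? My ZIP is 02139 now, by the way.\n"
)

msg = message_from_string(raw)

# Headers are addressable by name: this part of the message is structured.
sender = msg["From"]
subject = msg["Subject"]

# The body is one block of freeform text with no fields to query.
body = msg.get_payload()

# Pulling a ZIP code out of the body requires guessing at a pattern.
zips = re.findall(r"\b\d{5}\b", body)
print(sender, subject, zips)
```

A pattern like this will also match any other five-digit number in the text, which is exactly the difficulty with unstructured content: nothing in the data says what a value means.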
The nature of some data types, such as spreadsheets, is still a matter of debate. The spreadsheet itself has some structure, but in an application like Excel, the data you put into each cell is not regulated by the application.
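As a rough illustration of the spreadsheet case, the sketch below reads a small, hypothetical sheet exported as CSV using Python's csv module. The rows and columns are there, but nothing stops a cell in the `zip_code` column from holding arbitrary text, so any validation has to be done by whoever reads the data.

```python
import csv
import io

# A hypothetical sheet exported as CSV; the contents are invented.
sheet = io.StringIO(
    "name,zip_code\n"
    "Alice,02139\n"
    "Bob,unknown\n"
    "Carol,see notes\n"
)

records = list(csv.DictReader(sheet))

# The column exists for every row, but its contents are not enforced,
# so we check each cell ourselves (five digits is an assumed rule).
valid = [
    r["name"]
    for r in records
    if r["zip_code"].isdigit() and len(r["zip_code"]) == 5
]
print(valid)  # ['Alice']
```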