What we're really doing is designating our data as structured or
unstructured. Let's start with structured data, which is data
organized in a defined structure so that each piece is identifiable.
The most common form of structured data is a database, such as a
SQL database or Access. For example, SQL (Structured
Query Language) allows you to select specific pieces of
information based on the columns and rows of a table. You might look
for all the rows containing a particular date or ZIP code or
name. This is structured data: it is organized and
searchable by data type within the actual content.
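
To make the idea of selecting rows by column concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical, invented purely for illustration; any SQL database would behave in a similar way.

```python
import sqlite3

# Hypothetical table and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, zip_code TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ada", "02139", "2006-03-01"),
        ("Ben", "94105", "2006-03-01"),
        ("Cy", "02139", "2006-04-15"),
    ],
)


def names_in_zip(zip_code):
    """Return the names of all rows whose zip_code column matches."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE zip_code = ? ORDER BY name",
        (zip_code,),
    )
    return [name for (name,) in rows]


print(names_in_zip("02139"))
```

Because every row has the same named columns, the query can ask for a ZIP code directly instead of scanning free text for something that looks like one.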
By comparison, unstructured data has no identifiable
structure. Unstructured data typically includes bitmap
images/objects, text and other data types that are not part of a
database. Most enterprise data today can actually be considered
unstructured. An email is considered unstructured data. Even though
the email messages themselves are organized in a database, such as
Microsoft Exchange or Lotus Notes, the body of the message is
freeform text without any structure at all; the data is
considered raw. Documents are another example of unstructured data.
Although a Word document has some formatting attached to it, the
content of the document is completely freeform.
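
The email case can be sketched with Python's standard email package. The message below is made up for illustration. The headers can be looked up by name, much like columns, while the body comes back as a single block of raw text with no fields to query.

```python
from email import message_from_string

# A made-up message, for illustration only.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Hi Bob, are we still on for Thursday? Call me if not.\n"
)

msg = message_from_string(raw)

# Structured part: headers are addressable by name.
sender = msg["From"]
subject = msg["Subject"]

# Unstructured part: the body is one freeform string.
body = msg.get_payload()

print(sender, subject)
print(body)
```

Finding out which day the lunch is on means reading or text-searching the body; there is no "day" field to select.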
The nature of some data types, such as spreadsheets, is still a
matter of debate. The spreadsheet itself has some structure, but
the data you put into each cell of a spreadsheet, such as one in Excel, is
not regulated by the application.
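
The spreadsheet ambiguity can be illustrated with a small sketch. It uses a CSV export as a stand-in for a spreadsheet, which is an assumption made for simplicity, and the column names and cell values are invented. The grid of rows and columns is there, but nothing stops a cell in a date column from holding arbitrary text.

```python
import csv
import io
from datetime import datetime

# Invented sample data standing in for a spreadsheet export.
sheet = io.StringIO(
    "order_date,amount\n"
    "2006-03-01,19.99\n"
    "next Tuesday,about 20\n"
    "2006-04-15,n/a\n"
)


def is_iso_date(text):
    """Return True if text parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


rows = list(csv.DictReader(sheet))

# Cells in the date column that do not actually hold a date.
bad_dates = [r["order_date"] for r in rows if not is_iso_date(r["order_date"])]

print(bad_dates)
```

A database column declared as a date would typically reject the second row; the spreadsheet accepts it, which is why its status as structured data is debated.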