5 data mining techniques for optimal results

Faulty data mining makes finding decisive information like searching for a needle in a haystack. Here are some tips to fine-tune your data mining exercises.

Sunil Dutt Jha

Published: 01 Apr 2011

With enterprises operating across multiple geographic locations, multi-database mining is becoming important for effective, informed decision making. The following techniques can help you optimize your data mining efforts.

Step 1: Handling of incomplete data

Incomplete data affects classification accuracy and hinders effective data mining. The following techniques are effective for working with incomplete data.

The ISOM-DH model handles incomplete data using independent component analysis (ICA) and self-organizing maps (SOM). It estimates the missing values from the existing data and visualizes the resulting high-dimensional data.
Another technique evolves imputation strategies built from parametric and non-parametric imputation methods. Genetic algorithms and multilayer perceptrons are applied to build a framework that constructs imputation strategies addressing multiple incomplete attributes.
Network approaches based on multi-task learning (MTL), in which a problem or instance is learned in relation to other related ones, can be used for pattern classification with missing inputs. On two well-known data sets, they can be compared with representative procedures for handling incomplete data.
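All three approaches estimate missing values from the data that is present. ISOM-DH and GA/MLP-evolved strategies are too involved for a short example, so the sketch below swaps in a much simpler technique that rests on the same idea: k-nearest-neighbor imputation. It assumes missing values are encoded as None and that rows are compared only on the attributes both have observed; the function names are hypothetical, and this is not the ISOM-DH method.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over attributes observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing in common: treat as maximally distant
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Replace each None with the mean of that attribute over the k nearest
    rows that observed it (donor values come from the original, unfilled data)."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Candidate donors: every other row that has a value for attribute j.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            if not donors:
                raise ValueError(f"attribute {j} is missing in every row")
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            new_row.append(sum(r[j] for r in nearest) / len(nearest))
        filled.append(new_row)
    return filled


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # third attribute missing
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]
print(knn_impute(data, k=2)[1])  # missing value estimated from rows 0 and 3
```

Unlike filling every gap with the column mean, this uses the similarity between records, which is the property the more sophisticated models above exploit at much larger scale.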

Step 2: Ensure efficiency and scalability of data mining algorithms

Implementing, maintaining, and performance-tuning a parallel data mining application currently requires a great deal of expertise and effort. These techniques can help:

Ensure parallel and scalable execution of data mining algorithms.
Grid-enable data mining applications without any intervention on the application side.
When mining market basket data, use scalable techniques that go beyond mere associations.
Remove barriers to the widespread adoption of support vector machines.
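One common way to make pattern counting parallel and scalable is to split the transactions into partitions, count each partition independently, and merge the partial counts. The sketch below illustrates this with a thread pool from Python's standard concurrent.futures module; because CPython threads do not run CPU-bound code in parallel, a real deployment would more likely use ProcessPoolExecutor (the worker function is module-level so it can be pickled). It then computes lift for each item pair as one simple way to look beyond plain association support toward correlation. The function names, partitioning scheme, and choice of lift are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Set, Tuple

Transaction = Set[str]


def count_partition(transactions: List[Transaction]) -> Tuple[Counter, Counter, int]:
    """Count single items and item pairs within one partition."""
    items: Counter = Counter()
    pairs: Counter = Counter()
    for t in transactions:
        items.update(t)
        # Sort so each pair is always counted under the same key.
        pairs.update(combinations(sorted(t), 2))
    return items, pairs, len(transactions)


def parallel_lift(transactions: List[Transaction], workers: int = 4) -> Dict[Tuple[str, str], float]:
    # Split the data into roughly equal partitions, one per worker.
    size = max(1, -(-len(transactions) // workers))  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Count partitions independently, then merge the partial results.
    items: Counter = Counter()
    pairs: Counter = Counter()
    n = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part_items, part_pairs, part_n in pool.map(count_partition, parts):
            items.update(part_items)
            pairs.update(part_pairs)
            n += part_n

    # lift(a, b) = P(a and b) / (P(a) * P(b)): above 1 suggests positive
    # correlation, below 1 negative; support alone cannot tell these apart.
    return {
        (a, b): (count / n) / ((items[a] / n) * (items[b] / n))
        for (a, b), count in pairs.items()
    }


baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "tea"},
]
for pair, lift in sorted(parallel_lift(baskets, workers=2).items()):
    print(pair, round(lift, 3))
```

Merging counts is cheap because only the counters, not the transactions, travel between workers, which is what lets this pattern scale out.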

Step 3: Mining of large databases

When mining large databases, it pays to consider a set of architectural alternatives for coupling mining with database systems. These alternatives include:

Encapsulation of the data mining algorithm in a stored procedure.
Caching the data to a file system on the fly and then mining it.
Tight coupling, primarily through user-defined functions.
SQL implementations for processing in the DBMS.
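To illustrate the last alternative, the sketch below keeps market basket data inside the database as (transaction id, item) rows and counts frequent item pairs with a SQL self-join, so only the aggregated results leave the DBMS. It uses Python's built-in sqlite3 module purely for portability; the table and column names are hypothetical, it assumes each item appears at most once per transaction, and a production system would index (tid, item).

```python
import sqlite3
from typing import List, Tuple


def frequent_pairs(rows: List[Tuple[int, str]], min_support: int) -> List[Tuple[str, str, int]]:
    conn = sqlite3.connect(":memory:")
    try:
        # One row per (transaction id, item), the usual layout for SQL-based mining.
        conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
        conn.executemany("INSERT INTO baskets VALUES (?, ?)", rows)
        # Self-join on tid to form item pairs; a.item < b.item keeps each pair once.
        query = """
            SELECT a.item, b.item, COUNT(*) AS support
            FROM baskets AS a
            JOIN baskets AS b ON a.tid = b.tid AND a.item < b.item
            GROUP BY a.item, b.item
            HAVING COUNT(*) >= ?
            ORDER BY support DESC, a.item, b.item
        """
        return conn.execute(query, (min_support,)).fetchall()
    finally:
        conn.close()


rows = [
    (1, "bread"), (1, "butter"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "bread"), (3, "milk"),
]
print(frequent_pairs(rows, min_support=2))
```

The same query runs unchanged against a much larger table, letting the DBMS's join and aggregation machinery do the heavy lifting instead of exporting the data first.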

Step 4: Handling of relational and complex data types

It’s critical to develop a system to support the interactive mining of multiple-level knowledge in large relational databases and data warehouses. This requires tight integration of online analytical processing (OLAP) with a wide spectrum of data mining functions including characterization, association, classification, prediction, and clustering. The system should facilitate query-based, interactive mining of multidimensional databases by implementing a set of advanced data mining techniques including:

OLAP-based induction
Multidimensional statistical analysis
Progressive deepening for mining refined knowledge
Meta-rule-guided mining
Data and knowledge visualization
Assessing data mining results via swap randomization
Analyzing graph databases by aggregate queries
Image classification using sub-graph histogram representation
A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems
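Of the techniques listed, swap randomization is compact enough to sketch. The idea is to check whether a mining result is significant by generating random 0/1 datasets that keep every row sum and column sum of the real data, then counting how often they produce a result at least as strong. The minimal sketch below assumes a transactions-by-items 0/1 matrix and uses the support of one itemset as the statistic; the function names, number of swaps, and number of samples are illustrative assumptions.

```python
import random
from typing import List, Sequence

Matrix = List[List[int]]


def swap_randomize(matrix: Matrix, num_swaps: int, rng: random.Random) -> Matrix:
    """Return a copy of a 0/1 matrix with identical row and column sums,
    produced by repeated 2x2 swaps."""
    m = [row[:] for row in matrix]
    ones = [(r, c) for r, row in enumerate(m) for c, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return m
    for _ in range(num_swaps):
        i, j = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[i], ones[j]
        # Legal only if the 1s at (r1,c1),(r2,c2) can move to empty cells
        # (r1,c2),(r2,c1); this leaves every row and column sum unchanged.
        if r1 != r2 and c1 != c2 and m[r1][c2] == 0 and m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
            ones[i], ones[j] = (r1, c2), (r2, c1)
    return m


def itemset_support(matrix: Matrix, columns: Sequence[int]) -> int:
    """Number of rows (transactions) containing every item in `columns`."""
    return sum(all(row[c] == 1 for c in columns) for row in matrix)


def empirical_p_value(matrix: Matrix, columns: Sequence[int],
                      samples: int = 200, num_swaps: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)
    observed = itemset_support(matrix, columns)
    hits = sum(
        itemset_support(swap_randomize(matrix, num_swaps, rng), columns) >= observed
        for _ in range(samples)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (samples + 1)


data = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
print(empirical_p_value(data, columns=[0, 1], samples=50, num_swaps=200))
```

A small p-value suggests the itemset's support is not merely a consequence of how frequent each item is and how long each transaction is. In practice the number of swaps is usually chosen to be large relative to the number of 1s so the randomized matrices are well mixed.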

Step 5: Data mining techniques for heterogeneous databases

Heterogeneous database systems play a vital role in the information industry in 2011. Data warehouses must support data extraction from multiple databases to keep up with the trend.

For example, modeling client behavior at telecom organizations can require three heterogeneous data mining programs. First, each client attribute's weight is calculated from the original data using a neural network. Then, based on these attribute weights, a decision tree identifies the characteristics of exceptional clients. Finally, a distinguishing model is generated adaptively through clustering. Combining the three algorithms makes it possible to distinguish exceptional clients effectively.
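The three-stage telecom example can be sketched in miniature, under heavy simplifying assumptions, since the source does not specify the network architecture, tree algorithm, or clustering method. Here a single logistic neuron trained by gradient descent stands in for the neural network (its normalized absolute weights serve as attribute weights), a one-level decision tree (a stump) on the highest-weighted attribute stands in for the decision tree, and k-means on weight-scaled attributes stands in for the clustering step. The attribute names and client data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attribute_weights(X: List[Vector], y: Sequence[int], epochs: int = 500, lr: float = 0.5) -> Vector:
    """Stage 1 (stand-in for the neural network): train one logistic neuron on
    standardized attributes; normalized |w_j| is attribute j's weight."""
    n, d = len(X), len(X[0])
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, means)]
    Z = [[(v - m) / s for v, m, s in zip(x, means, stds)] for x in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for zi, yi in zip(Z, y):
            err = sigmoid(sum(wj * v for wj, v in zip(w, zi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * zi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    total = sum(abs(wj) for wj in w) or 1.0
    return [abs(wj) / total for wj in w]


def best_stump(X: List[Vector], y: Sequence[int], feature: int) -> Tuple[float, int]:
    """Stage 2 (one-level decision tree): find the threshold t on `feature` such
    that flagging rows with value > t as exceptional makes the fewest errors."""
    best_t, best_err = 0.0, len(y) + 1
    for t in sorted({x[feature] for x in X}):
        err = sum((x[feature] > t) != bool(label) for x, label in zip(X, y))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


def dist2(p: Vector, q: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_kmeans(X: List[Vector], weights: Vector, k: int = 2, iters: int = 20) -> List[int]:
    """Stage 3 (clustering): k-means on attributes scaled by their stage-1 weights."""
    pts = [[w * v for w, v in zip(weights, x)] for x in X]
    centers = [pts[0][:]]  # deterministic farthest-first initialization
    while len(centers) < k:
        far = max(pts, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(far[:])
    labels = [0] * len(pts)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in pts]
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical pre-scaled attributes: [dropped_call_rate, monthly_spend, tenure]
clients = [
    [0.10, 0.5, 0.9], [0.20, 0.6, 0.1], [0.10, 0.4, 0.5], [0.15, 0.5, 0.3],
    [0.90, 0.5, 0.9], [0.80, 0.6, 0.1], [0.95, 0.4, 0.5], [0.85, 0.5, 0.3],
]
exceptional = [0, 0, 0, 0, 1, 1, 1, 1]

weights = attribute_weights(clients, exceptional)
top = max(range(len(weights)), key=weights.__getitem__)
threshold, errors = best_stump(clients, exceptional, top)
groups = weighted_kmeans(clients, weights)
print(weights, top, threshold, errors, groups)
```

Each stage feeds the next: the weights decide which attribute the tree splits on and how strongly each attribute counts in the clustering distance.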

About the author: Sunil Dutt Jha is a practicing architect and the CEO of iCMG. He is an architecture strategist who influences enterprise decisions for realizing systems worth $10 mn to $50 mn. As an architecture coach and mentor, Sunil has taught enterprise and software architecture classes to more than 5,000 professionals from South America to South Asia.

(As told to Sharon D'Souza)
