Mining Through The Data: Numerical Algorithms Group Releases DMC 2.0

Data mining has become more than the stuff of a "CSI" episode. It is now integral to applications ranging from handwriting recognition to traffic management and even medical diagnostics. As with any toolkit, the need to upgrade never goes away.

The Numerical Algorithms Group (NAG; Downers Grove, IL, and Oxford, UK) has released version 2.0 of its Data Mining and Cleaning Components (DMC 2.0). DMC 2.0 is the first commercially available data mining application development toolkit to use results from EUREDIT, a three-year project funded by the European Union. It also draws on other advances in data mining techniques developed by NAG, a global collaborative network of more than 300 computer scientists and mathematical experts.

According to NAG's Dr. Stephen Langdell, $4.5 million was invested in the EUREDIT project. "Creating those algorithms was kind of a tricky task," he said.

New And Improved
DMC 2.0 adds several improvements over previously available commercial data mining applications. In data imputation, missing values are replaced with suitable estimates using one of three fundamental approaches: summary statistics, distance-based measures, or the EM (expectation-maximization) algorithm for multivariate normal data. Outlier detection involves finding suspect records in a data set: a record is flagged as suspect if it appears not to have been drawn from an assumed data distribution.
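The simplest of these ideas, summary-statistics imputation and a distribution-based outlier check, can be sketched in a few lines of Python. This is an illustration only, not DMC's API: the function names `impute_with_mean` and `flag_outliers` are hypothetical, imputation uses plain column means, and the outlier check assumes roughly normal data, flagging values more than `threshold` sample standard deviations from the mean.

```python
from statistics import mean, stdev

def impute_with_mean(rows):
    """Summary-statistics imputation (illustrative): replace each None
    with the mean of the observed values in the same column.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def flag_outliers(values, threshold=3.0):
    """Flag values lying more than `threshold` sample standard deviations
    from the mean, i.e. values unlikely under an assumed normal distribution."""
    mu = mean(values)
    sigma = stdev(values)  # requires at least two values
    if sigma == 0:
        return [False] * len(values)  # all values identical: nothing stands out
    return [abs(v - mu) / sigma > threshold for v in values]

if __name__ == "__main__":
    data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
    print(impute_with_mean(data))  # [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]

    readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    print(flag_outliers(readings, threshold=2.5))  # only the last entry is True
```

With only ten readings, no sample z-score can exceed (n − 1)/√n ≈ 2.85, which is why the example uses a threshold of 2.5 rather than the more common 3. Distance-based imputation and the EM algorithm replace the column mean with estimates that take the other variables in each record into account.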

Other features include a wide range of new functionality for machine learning and pattern recognition. "Any pattern recognition, that's the key phrase," Langdell said. "There are two main functions of pattern recognition: one is classification: is it there or isn't it there? The other function answers the question: how many are there?"

