




Comments from Participants of MLDM 2015, Hamburg, Germany


by Diana Benavides Prado, Universidad de Los Andes

On July 20 and 21, in the beautiful city of Hamburg, data mining and machine learning practitioners met for two exciting days of sharing research on extracting patterns from data. The topics covered a wide range, from applied experiences in health, marketing and the environment to theoretical approaches in graph mining, clustering, frequent pattern mining, support vector machines and text mining, among others.

Although all of the talks were of high quality, this year I would like to place special emphasis on three contributions which, in my opinion, are remarkable examples of addressing current challenges in data mining and machine learning.

The first one, "Seizure prediction by graph mining, transfer learning and transformation learning", offers important insights into applying graph mining in health care, specifically to predict a particular event (seizure occurrence) in epileptic patients. The work involves a wide range of challenges: handling complex electroencephalographic (EEG) recordings, distinguishing normal signals from seizure-related ones, selecting features, and choosing the best forecasting method. The research addressed them using independent component analysis, quadratic programming for feature selection, autoregressive modeling, and transfer learning to improve the learning process. It is an important step toward applying data mining in health care, because it could be extended to improve the treatment of other brain diseases for which EEG data is available.

A second remarkable contribution, entitled "A novel algorithm for the integration of the imputation of missing values and clustering", deals with one of the main challenges in the pre-processing phase of almost every data mining task: missing values. Rather than treating the imputation of missing features as a pre-processing step before clustering, this contribution integrates it into the clustering task itself. When one or more feature values of a case are unknown, they are imputed using (summary) information from the cluster containing that case. This seems a very innovative and practical approach, and it could serve as a model for similar solutions when other classification techniques are applied to data with missing values.

A final notable contribution, entitled "Author attribution of email messages using parse-tree features", concerns one of the main topics in today's world of unstructured data: text. In this approach, parse trees are constructed to describe the typical structure of the e-mail messages of a specific author; the model then establishes authorship from the parse trees' features, using a set of divergence measures to compare the e-mail to be classified against those features. This research offers useful clues for similar approaches in text and e-mail mining, since it shows how to infer structure from text, associate that structure with classes (in this case, authors), deal with unknown classes (in this case, unknown authors), and use it to predict the class of a new text (accurately, according to the work).

These exciting contributions are just a few examples of the high quality of the conference. As usual, researchers from many different countries gave the conference the feel of a multicultural event, where knowledge and experience meet and participants exchange experiences, new approaches and best practices to improve their work and research as data miners and machine learners.
