Selected aspects of interactive feature extraction
- Speaker(s)
- Marek Grzegorowski
- Affiliation
- MIMUW
- Date
- March 12, 2021, 2:15 p.m.
- Information about the event
- meet.google.com/jbj-tdsr-aop
- Seminar
- Research Seminar of the Logic Group: Approximate reasoning in data mining
The presentation will cover the most important scientific results of my PhD dissertation. The dissertation discusses the problem of interactive feature extraction and proposes several novel approaches to automating feature creation and selection. The current state of knowledge on feature extraction processes used in commercial applications is reviewed, and the problems of processing big data sets, as well as approaches to processing evolving data sets, are discussed. The introduced feature extraction methods were verified experimentally on real data. Beyond these experiments, practical case studies and applications of the developed techniques in selected scientific projects are presented.
Feature extraction addresses the problem of finding the most compact and informative data representation, in order to improve the efficiency of data storage and processing and to facilitate the subsequent learning and generalization steps. It not only simplifies the resulting data representation but also yields features that can be readily used by both analysts and learning algorithms. In its most common flow, the process starts from an initial set of measured data and builds derived features intended to be informative and non-redundant. Logically, the process has two phases: the first is the construction of new attributes from the original data (sometimes referred to as feature engineering), and the second is the selection of the most important of the obtained attributes (sometimes referred to as feature selection). Many approaches to automatic feature creation and selection are well described in the literature. Still, it is hard to find methods that support interaction with the user and take into account the user's domain knowledge, experience, and preferences.
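The two phases described above can be illustrated with a minimal, non-interactive sketch. Everything in it is an assumption made for illustration and not the method from the dissertation: the toy sensor windows and labels are made up, the derived attributes (mean, standard deviation, min, max, range) are generic window statistics, and the selection step is a plain greedy filter that ranks features by absolute Pearson correlation with the target and skips near-duplicates of features already kept.

```python
import math
import statistics

# Hypothetical toy data: four short windows of raw readings from one sensor,
# each with a binary target label. The values are made up for illustration.
WINDOWS = [
    [1.0, 1.2, 0.9, 1.1],
    [1.1, 1.0, 1.2, 0.9],
    [3.0, 0.5, 2.8, 0.4],
    [2.9, 0.2, 3.1, 0.6],
]
TARGET = [0, 0, 1, 1]


def construct_features(window):
    """Phase 1 (feature engineering): derive candidate attributes from one raw window."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
        "range": max(window) - min(window),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(rows, target, k, redundancy=0.95):
    """Phase 2 (feature selection): rank features by |correlation| with the
    target, then greedily keep up to k that are not near-duplicates
    (|correlation| >= redundancy) of a feature already kept."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    ranked = sorted(columns, key=lambda n: abs(pearson(columns[n], target)), reverse=True)
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        if all(abs(pearson(columns[name], columns[s])) < redundancy for s in selected):
            selected.append(name)
    return selected


if __name__ == "__main__":
    rows = [construct_features(w) for w in WINDOWS]
    print(select_features(rows, TARGET, k=2))
    print(select_features(rows, TARGET, k=2, redundancy=0.99))
```

On this toy data the derived features are nearly collinear, so with the default threshold only `mean` survives even though two features were requested; relaxing `redundancy` to 0.99 also admits `min`. The redundancy check is what keeps the result non-redundant rather than merely "top-k by relevance".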
In studying the interactivity of feature extraction methodologies, the dissertation addresses two problems: deriving useful and understandable parameters from raw sensor readings, and reducing the number of those parameters in order to obtain models that are as simple as possible yet accurate. The novel methods proposed in the dissertation go beyond current standards by offering a more efficient way to express domain knowledge about the most important subsets of attributes. The proposed algorithms for feature construction and selection can make use of various forms of granulation, problem decomposition, and parallelization. They are also capable of handling large spaces of derivable features, and they ensure a satisfactory level of information about the target variable, according to a given criterion, even after an arbitrary number of elements is removed.
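The last property above, preserving enough information about the target even after elements are removed, can be illustrated with a small sketch. This is a hypothetical reading, not the dissertation's algorithm: the "given criterion" is assumed to be the rough-set dependency degree (the fraction of objects whose indiscernibility class has a single decision value), attributes are assumed to be already granulated into discrete values, and "robust" is taken to mean the criterion stays above a threshold after removing any subset of at most `max_removed` attributes. The names `is_robust` and `shrink_robustly` and the toy decision table are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table with already granulated (discretized)
# attributes a, b, c and decision d. Attribute c duplicates a on purpose.
ROWS = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
DECISIONS = [0, 0, 1, 1]


def dependency_degree(rows, decisions, attrs):
    """Rough-set dependency degree: the fraction of objects whose
    indiscernibility class w.r.t. attrs contains a single decision value."""
    decisions_in_class = defaultdict(set)
    class_size = defaultdict(int)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        decisions_in_class[key].add(d)
        class_size[key] += 1
    consistent = sum(class_size[key] for key, ds in decisions_in_class.items() if len(ds) == 1)
    return consistent / len(rows)


def is_robust(rows, decisions, attrs, max_removed, threshold):
    """True if the dependency degree stays >= threshold after removing
    ANY subset of at most max_removed attributes from attrs."""
    for r in range(min(max_removed, len(attrs)) + 1):
        for removed in combinations(attrs, r):
            kept = [x for x in attrs if x not in removed]
            if dependency_degree(rows, decisions, kept) < threshold:
                return False
    return True


def shrink_robustly(rows, decisions, attrs, max_removed, threshold):
    """Greedy backward elimination: drop an attribute whenever the
    remaining set is still robust in the sense of is_robust."""
    current = list(attrs)
    for a in attrs:
        candidate = [x for x in current if x != a]
        if is_robust(rows, decisions, candidate, max_removed, threshold):
            current = candidate
    return current


if __name__ == "__main__":
    print(shrink_robustly(ROWS, DECISIONS, ["a", "b", "c"], max_removed=0, threshold=1.0))
    print(shrink_robustly(ROWS, DECISIONS, ["a", "b", "c"], max_removed=1, threshold=1.0))
```

With `max_removed=0` the search keeps a single attribute (`c`), whereas with `max_removed=1` it keeps both `a` and `c`, so the selected set survives the loss of any one attribute. The exhaustive check over all combinations grows combinatorially with `max_removed`, so a version for large feature spaces would need something smarter than this brute force.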
The proposed approaches were developed on the basis of experience gained in several research projects in data analysis and multi-sensor stream processing. The methods have been validated in terms of the quality of the obtained features, as well as the throughput, scalability, and robustness of their operation. The usefulness of the methodology has also been confirmed in open data mining competitions.