
Text and process mining with Formal Concept Analysis

Speaker(s)
Jonas Poelmans
Affiliation
Katholieke Univ. Leuven, Belgium
Date
March 9, 2012, 2:15 p.m.
Room
room 5820
Seminar
Research Seminar of the Logic Group: Approximate Reasoning in Data Mining (Seminarium badawcze Zakładu Logiki: Wnioskowania aproksymacyjne w eksploracji danych)

Text mining scientific papers, process mining healthcare data, and the CORDIET software system.
 
In the first part of this talk we show how we used Formal Concept Analysis (FCA) to analyze recent literature on FCA and some closely related disciplines. We collected 1072 papers published between 2003 and 2011 that mention FCA in the abstract, and developed a knowledge browsing environment to support our literature analysis process. The PDF files containing the papers were converted to plain text and indexed by Lucene using a thesaurus containing terms related to FCA research. We use the visualization capabilities of FCA to explore the literature and to discover and conceptually represent the main research topics in the FCA community. We zoom in on, and give an overview of, the papers published between 2003 and 2011 on using FCA for knowledge discovery and ontology engineering in various application domains. We also give an overview of the literature on FCA extensions such as pattern structures, logical concept analysis, relational concept analysis, power context families, fuzzy FCA, rough FCA, and temporal and triadic concept analysis, and we discuss scalability issues.
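As a rough illustration of the FCA machinery referred to above, the following minimal sketch enumerates the formal concepts of a tiny paper-by-term context. The paper identifiers, the terms assigned to them, and the function names are hypothetical and chosen only for illustration; they are not taken from the study or from its browsing environment. The enumeration is the naive one (closing every subset of objects), which is exponential and only suitable for toy data.

```python
from itertools import combinations

# Hypothetical toy context: paper id -> thesaurus terms found in its abstract.
context = {
    "paper1": {"knowledge discovery", "text mining"},
    "paper2": {"knowledge discovery", "ontology engineering"},
    "paper3": {"ontology engineering"},
    "paper4": {"knowledge discovery", "text mining", "ontology engineering"},
}


def common_attributes(objects, context, all_attributes):
    """Derivation operator: attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_having(attributes, context):
    """Derivation operator: objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Naively enumerate all (extent, intent) pairs of the context."""
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)  # closure of `subset`
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts(context)
# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the concept lattice: a maximal group of papers together with the maximal set of terms they all share. Ordering the pairs by inclusion of their extents gives the lattice that a browsing tool would visualize.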
 
In the second part of this talk we analyze a dataset consisting of the activities performed on 148 patients during hospitalization for breast cancer treatment in a hospital in Belgium. Hospitals increasingly use process models to structure their care processes. Activities performed on patients are logged to a database, but these data are rarely used for managing and improving the efficiency of care processes and the quality of care. In this talk, we propose a synergy of process mining with data discovery techniques. We expose multiple quality-of-care issues that will be resolved in the near future, discover process variations and best practices, and uncover issues with the data registration system. For example, 25% of patients receiving breast-conserving therapy did not receive the key intervention "revalidation". We found this was caused by shortening the length of stay in the hospital over the years without modifying the care process. Although the process representations offered by Hidden Markov Models are easier to use than those offered by FCA, FCA has proven to be very useful for analyzing process anomalies and exceptions in detail.
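A check of the kind behind the "revalidation" finding can be sketched by treating patients as objects and logged activities as attributes, then comparing two extents. The event log below is invented toy data (including the "mastectomy" activity and all patient identifiers), the function names are hypothetical, and the resulting share is not the figure reported in the talk.

```python
# Hypothetical toy event log: patient id -> set of activities logged for them.
event_log = {
    "patient_a": {"breast-conserving therapy", "revalidation"},
    "patient_b": {"breast-conserving therapy"},
    "patient_c": {"mastectomy", "revalidation"},
    "patient_d": {"breast-conserving therapy", "revalidation"},
    "patient_e": {"breast-conserving therapy"},
}


def extent(activities, log):
    """Patients whose logged activities include every activity in `activities`."""
    return {patient for patient, acts in log.items() if activities <= acts}


def missing_share(treatment, intervention, log):
    """Share of patients with `treatment` for whom `intervention` was never logged."""
    treated = extent({treatment}, log)
    if not treated:
        return 0.0
    missing = treated - extent({treatment, intervention}, log)
    return len(missing) / len(treated)


share = missing_share("breast-conserving therapy", "revalidation", event_log)
print(f"{share:.0%} of treated patients have no logged revalidation")
```

In lattice terms, the patients in question are those in the extent of the treatment concept that do not appear in the more specific concept combining treatment and intervention, which is why such gaps tend to stand out when the lattice is inspected.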
 
In the third part of this talk we introduce CORDIET, a novel human-centered data mining software system designed to gain intelligence from unstructured textual data. The architecture is rooted in several case studies carried out in collaboration between the Amsterdam-Amstelland Police, GZA hospitals and KU Leuven. It is currently being implemented by bachelor's and master's students of Moscow Higher School of Economics. At the core of the system are concept lattices, which can be used to explore the data interactively. They are combined with several complementary statistical data analysis techniques, such as Emergent Self-Organizing Maps and Hidden Markov Models.
 
We round off this presentation with a discussion of the potential of human-centered knowledge discovery and of scalability issues. We outline some avenues for future research and possibilities for collaboration.