A large part of my research concerns Silent Speech Recognition, which is speech recognition that does not rely on the acoustic signal, either because it is unavailable or because its use is undesired. The former is the case for speech-impaired patients, in particular for laryngectomees, whose voice box has been removed. Even for healthy users, acoustic speech may be undesirable when confidential, non-disruptive communication in public places is required.
A large part of my early research on the topic is collected in my Dissertation, which was supervised by Prof. Tanja Schultz and which I defended at Karlsruhe Institute of Technology in 2014. During my dissertation work I substantially improved a Silent Speech recognizer based on Surface Electromyography (EMG) (for a very accessible overview of that technology, have a look at Wikipedia). The technology is based on capturing the electrical activity generated by human muscle contractions: for the Silent Speech interface, small electrodes are placed on the user's face, allowing the capture of speech-related muscle activity. The resulting signal is processed by a suitable recognizer, thus enabling speech processing by machines, as well as speech-based communication, without making use of an acoustic signal at all. The challenges in this scientifically exciting and medically promising field are manifold and cover the entire processing chain of the underlying system, ranging from biophysics and signal capturing to machine learning and modelling of the underlying speech process.
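To make the processing chain concrete, the sketch below shows a minimal frame-based feature extraction step of the kind commonly applied to EMG signals before recognition. The specific features (frame mean, power, zero-crossing rate) and frame sizes are illustrative assumptions, not the exact configuration used in the dissertation work.

```python
import numpy as np

def emg_frame_features(signal, frame_len=160, frame_shift=80):
    """Compute simple time-domain features (mean, power, zero-crossing
    rate) over overlapping frames of a single-channel EMG signal.

    Frame length and shift are illustrative values, not the ones used
    in any particular published system.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        mean = frame.mean()
        power = np.mean(frame ** 2)
        # Zero-crossing rate of the mean-removed frame.
        zcr = np.mean(np.abs(np.diff(np.sign(frame - mean))) > 0)
        feats.append([mean, power, zcr])
    return np.array(feats)
```

The resulting feature matrix (one row per frame) would then be fed to the recognizer; real systems typically stack several such frames for temporal context.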
My recent work on electromyographic speech recognition focuses on porting the system to neural networks, which have shown great potential in a variety of applications; a major challenge is adaptation between different recording sessions. I am also interested in combining recent advances in image processing with the Silent Speech approach, creating a lipreading system, i.e. a vision-based silent speech recognizer trained end-to-end with neural networks.
Selected publications for EMG-based Speech Recognition:
Application of Artificial Intelligence in the chemical and pharmaceutical industry is a highly topical subject. In Organic Chemistry, more than a hundred million distinct compounds are known, and new ones are discovered and synthesized daily. These molecules are the foundation of life on earth and the basis of a multitude of applications, in particular in medicine, where drugs must be designed to fulfill a very specific role while minimizing undesired side effects.
Tasks tackled with Artificial Intelligence include the prediction of reaction products, retrosynthesis (determining the reaction steps needed to obtain a desired compound), optimization of reaction conditions, prediction of molecular properties, and more.
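All of these tasks begin by turning molecules into machine-readable features. The toy sketch below counts heavy-atom symbols in a SMILES string as a deliberately minimal featurization; real pipelines use learned representations or chemical fingerprints (e.g. via RDKit), and this function is only an illustration of the molecules-to-vectors step, not any published method.

```python
from collections import Counter

def atom_counts(smiles):
    """Toy featurization: count uppercase heavy-atom symbols in a
    SMILES string. Handles two-letter halogens (Cl, Br); skips bonds,
    branches, and lowercase aromatic atoms for simplicity.
    """
    counts = Counter()
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            counts[smiles[i:i + 2]] += 1
            i += 2
        elif smiles[i].isalpha() and smiles[i].isupper():
            counts[smiles[i]] += 1
            i += 1
        else:
            i += 1
    return counts
```

For example, ethanol ("CCO") yields two carbons and one oxygen; such count vectors could serve as a baseline input to a property-prediction model.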
Selected publications:
Within the wide field of applied machine learning, there are several well-researched areas, including image classification and speech recognition. Since I started my research career in Silent Speech interfaces, I have been particularly interested in bringing advanced machine learning methodology to the area of automated biosignal processing.
Biosignal processing comprises such diverse areas as detecting falls in elderly or disabled patients, alerting to heartbeat irregularities, controlling prostheses or other assistive devices, EMG-based speech recognition (as described above), and interpreting brain signals. The large variety of tasks and the diversity of methods make it a highly interesting field, while small datasets and low-quality recordings are challenges to be overcome.
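As a flavor of one such task, the sketch below flags irregular heartbeats by comparing inter-beat (R-R) intervals against their median. This is a minimal rule-based stand-in, assuming beat times have already been extracted; deployed systems use learned detectors and far more robust criteria.

```python
import numpy as np

def irregular_beats(beat_times, tolerance=0.2):
    """Return indices of inter-beat (R-R) intervals that deviate more
    than `tolerance` (as a fraction) from the median interval.

    `beat_times`: sequence of detected beat timestamps in seconds.
    A deliberately simple illustration, not a clinical algorithm.
    """
    intervals = np.diff(np.asarray(beat_times, dtype=float))
    med = np.median(intervals)
    return np.flatnonzero(np.abs(intervals - med) > tolerance * med)
```

With beats at 0, 1, 2, 3, 4.7, 5.7 seconds, the fourth interval (1.7 s against a median of 1 s) would be flagged.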
Selected publications:
When applying machine learning to realistic signals, one observes several challenges which may not be present in a well-curated, laboratory-style dataset. Based on my work on biophysiological signals, I have a particular interest in dealing with noisy, nonstationary signals. Nonstationarity frequently occurs whenever measurements are repeated across different subjects, or even for the same subject under different conditions (environmental conditions, sensor positioning, active vs resting, etc.). A practically applicable system must be able to deal with such situations; where possible, a powerful approach is to use multiple modalities to capture the underlying activity. Fusion of these modalities may be a complex task, particularly when they have different properties (e.g. lack of temporal alignment, varying quality parameters [which again need to be estimated], etc.).
Selected publications: