Research

Silent Speech

Silent Speech presentation at the CeBIT fair

A large part of my research is about Silent Speech Recognition, which is speech recognition without making use of the acoustic signal, either because it is unavailable or because it is undesired. The former is the case for speech-impaired patients, in particular for laryngectomees, whose voice box has been removed. Even for healthy users, acoustic speech may be undesirable, for example when confidential, unobtrusive communication in public places is required.

A large part of my early research on the topic is collected in my Dissertation, which was supervised by Prof. Tanja Schultz and which I defended at Karlsruhe Institute of Technology in 2014. During my dissertation work I substantially improved a Silent Speech recognizer based on Surface Electromyography (EMG) (for a very accessible overview of that technology, have a look at Wikipedia). The technology is based on capturing the electrical activity generated by human muscle contractions: for the Silent Speech interface, small electrodes are placed on the user's face, allowing speech-related muscle activity to be captured. The resulting signal is processed by a suitable recognizer, thus enabling speech processing by machines, as well as speech-based communication, without making use of an acoustic signal at all. The challenges in this scientifically exciting and medically promising field are manifold and cover the entire processing chain of the underlying system, ranging from biophysics and signal capture to machine learning and modelling of the underlying speech process.
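As a flavour of the first stage of such a processing chain, the raw EMG signal is typically split into short overlapping frames from which simple time-domain features are computed. The following is a minimal sketch; the function names, frame sizes, and feature set are illustrative, not those of the actual recognizer.

```python
import math

def frame_signal(signal, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frames.append(signal[start:start + frame_len])
    return frames

def td_features(frame):
    """Toy time-domain features per frame: mean, mean absolute value,
    power, and zero-crossing count."""
    n = len(frame)
    mean = sum(frame) / n
    mav = sum(abs(x) for x in frame) / n
    power = sum(x * x for x in frame) / n
    zc = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return [mean, mav, power, zc]

# Illustrative usage on a synthetic trace standing in for an EMG channel
emg = [math.sin(0.3 * t) * (1 + 0.5 * math.sin(0.01 * t)) for t in range(1000)]
feats = [td_features(f) for f in frame_signal(emg, frame_len=160, frame_shift=80)]
```

The resulting per-frame feature vectors would then be fed into the recognizer; real feature sets and frame parameters differ.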

My recent work on electromyographic speech recognition centers on porting the system to neural networks, which have shown great potential in a variety of applications; a major issue is adaptation between different recording sessions. I am also interested in combining recent advances in image processing with the Silent Speech approach, creating a lipreading system, i.e. a vision-based silent speech recognizer, which is trained end-to-end using neural networks.
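A very simple baseline for coping with inter-session variation (far simpler than adaptation via meta-learning, and purely illustrative) is per-session mean/variance normalization of the features:

```python
def session_normalize(sessions):
    """Per-session mean/variance normalization: a simple baseline for
    reducing inter-session nonstationarity before recognition.
    sessions: dict mapping session id -> list of feature vectors."""
    normalized = {}
    for sid, feats in sessions.items():
        dim = len(feats[0])
        means = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        stds = []
        for d in range(dim):
            var = sum((f[d] - means[d]) ** 2 for f in feats) / len(feats)
            stds.append(var ** 0.5 or 1.0)  # guard against constant features
        normalized[sid] = [[(f[d] - means[d]) / stds[d] for d in range(dim)]
                           for f in feats]
    return normalized
```

After normalization, each session's features have zero mean and unit variance per dimension, which removes gross offsets between sessions but of course not the deeper distribution shifts that motivate learned adaptation.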

Selected publications for EMG-based Speech Recognition:

  • Schultz, Wand: Modeling Coarticulation in EMG-based Continuous Speech Recognition. Speech Communication 52(4), pp. 341 - 353, 2010.
  • Wand, Janke, Schultz: Tackling Speaking Mode Varieties in EMG-Based Speech Recognition. IEEE Transactions on Biomedical Engineering 61(10), pp. 2515 - 2526, 2014.
  • Schultz, Wand, Hueber, Krusienski, Herff, Brumberg: Biosignal-Based Spoken Communication: A Survey. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(12), pp. 2257 - 2271, 2017.
  • Proroković, Wand, Schultz, Schmidhuber: Adaptation of an EMG-Based Speech Recognizer via Meta-Learning. Proc. of the 7th IEEE Global Conference on Signal and Information Processing, 2019.
Selected lipreading publications:
  • Wand, Koutník, Schmidhuber: Lipreading with Long Short-Term Memory. Proc. of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 6115 - 6119.
  • Wand, Vu, Schmidhuber: Investigations on End-to-End Audiovisual Fusion. Proc. of the 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018, pp. 3041 - 3045.
Also have a look at the Press page.

Artificial Intelligence in Chemistry and Drug Design

Application of Artificial Intelligence in the chemical and pharmaceutical industry is a highly topical subject. In Organic Chemistry, more than a hundred million different compounds are known, and new ones are discovered and synthesized on a daily basis. These molecules are the foundation of life on Earth, and the basis of a multitude of applications, in particular in medicine, where drugs have to be designed to fulfill a very specific role while minimizing undesired side effects.

Tasks which are tackled with Artificial Intelligence include the prediction of reaction products, retrosynthesis (determining the reaction steps needed to obtain a desired compound), optimization of reaction conditions, and prediction of molecule properties.
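Most of these tasks start from a machine-readable molecule representation such as a SMILES string. As a toy illustration only (a deliberately naive parser, not a full SMILES grammar, and far removed from the learned representations used in practice), one can extract a trivial descriptor such as the heavy-atom count:

```python
import re

def heavy_atom_count(smiles):
    """Count heavy (non-hydrogen) atoms in a SMILES string.
    Toy parser: handles organic-subset atoms and bracket atoms only."""
    # Bracket atoms like [O-] or [nH] count as one heavy atom each,
    # unless the bracket contains only hydrogen (e.g. [H+]).
    brackets = re.findall(r'\[([^\]]+)\]', smiles)
    rest = re.sub(r'\[[^\]]+\]', '', smiles)
    count = sum(1 for b in brackets if not re.match(r'H\d*[+-]?\d*$', b))
    # Two-letter organic-subset elements first, then single letters.
    count += len(re.findall(r'Cl|Br', rest))
    rest = re.sub(r'Cl|Br', '', rest)
    count += len(re.findall(r'[BCNOPSFIbcnops]', rest))
    return count

heavy_atom_count("CCO")  # ethanol: 3 heavy atoms
```

Real systems build on cheminformatics toolkits and graph-based neural networks rather than hand-written parsing; the sketch only shows the kind of structured input these models consume.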


Applied Biosignal Processing

Within the wide field of applied machine learning, there are several areas which are very well-researched, including image classification and speech recognition. Since I started my research career in Silent Speech interfaces, I have been particularly interested in bringing advanced machine learning methodology to the area of automated biosignal processing.

Biosignal processing comprises such diverse areas as detecting falls of elderly or disabled patients, alerting to heartbeat irregularities, controlling prostheses and other assistive devices, EMG-based speech recognition (as described above), and interpreting brain signals. The large variety of tasks and the diversity of methods make it a highly interesting field, while small datasets and low-quality recordings are challenges to be overcome.
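To give a flavour of how simply some of these pipelines can start out, here is a sketch of threshold-based fall detection on accelerometer data; the threshold and refractory window are made-up values, and deployed systems use far more robust classifiers.

```python
import math

def detect_falls(accel, threshold=2.5, refractory=50):
    """Flag samples where the acceleration magnitude (in units of g)
    exceeds a threshold, suppressing repeated alarms within a
    refractory window of samples. accel: list of (x, y, z) tuples."""
    alarms = []
    last = -refractory
    for i, (x, y, z) in enumerate(accel):
        mag = math.sqrt(x * x + y * y + z * z)
        if mag > threshold and i - last >= refractory:
            alarms.append(i)
            last = i
    return alarms
```

A threshold detector of this kind serves only as a baseline; the challenges mentioned above (small datasets, noisy recordings) are precisely what make the learned alternatives hard.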


Versatile Treatment of Real-life Signals

When applying machine learning to realistic signals, one encounters several challenges which may not be present in a well-curated, laboratory-style dataset. Based on my work on biophysiological signals, I have a particular interest in dealing with noisy, nonstationary signals. Nonstationarity frequently occurs whenever measurements are repeated across different subjects, or even for the same subject under different conditions (environment, sensor positioning, active vs. resting state, etc.). A practically applicable system must be able to deal with such situations; where possible, a powerful approach is to use different modalities to register the underlying activity. Fusing these modalities may be a complex task, particularly when they have different properties (e.g. lack of temporal alignment, or varying quality parameters which themselves need to be estimated).
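One simple fusion scheme, under the assumption that each modality already yields comparable per-class scores (e.g. posteriors) at a common rate, is a quality-weighted average. This sketch is illustrative and deliberately ignores the temporal-alignment problem mentioned above:

```python
def fuse_scores(modality_scores, qualities):
    """Late fusion: combine per-class score vectors from several
    modalities by a quality-weighted average.
    modality_scores: list of score vectors, one per modality.
    qualities: nonnegative reliability weights, one per modality."""
    total = sum(qualities)
    n_classes = len(modality_scores[0])
    fused = [0.0] * n_classes
    for scores, q in zip(modality_scores, qualities):
        for k, s in enumerate(scores):
            fused[k] += (q / total) * s
    return fused
```

In practice the quality weights must themselves be estimated from the data, and misaligned or differently sampled streams require alignment or sequence-level fusion instead of this per-frame average.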
