Voice Biomarkers

Many diseases and medical states are known to leave traces in the way humans speak. Vocal biomarkers, the results of sophisticated algorithms, quantify such states. PeakProfiling is a leading player in the field. We'll explain the background and scope of this technique as well as some examples.

Background and scope of vocal biomarkers

To get into the topic, think about the following situation: you call your spouse and he or she says “hello”. Many people claim that after hearing just this single word, they can immediately sense how their spouse is doing. This is, in a very rudimentary form, what voice biomarkers can offer: assessing medical or emotional conditions from the way someone speaks, regardless of content.

Medical voice analytics is now an established field of research. A search for “voice biomarker” on PubMed returns almost 300 results (as of May 2021), with a steep upward trend in recent years (see graph). For speech analysis, there are almost 28,000 studies, although only a fraction of them actually cover automatic voice analytics with advanced algorithms.

[Graph: PubMed search results for “voice biomarker” over recent years]

The range of diseases covered by the field is very broad. Much research goes into diseases of the central nervous system (CNS); for example, neurodegenerative diseases like Parkinson’s or Alzheimer’s and mental illnesses like depression are frequent research topics. Moreover, respiratory diseases and cardiology have been identified as especially relevant topics for voice analytics (Murton et al., 2017; Mayorga, Druzgalski, Morelos, Gonzalez, & Vidales, 2010). We believe that in the future, hundreds of diseases will be detectable from the sound of the voice. In strongly simplified terms, building the biomarkers that enable us to detect emotions and mental or physical diseases requires the following steps:

  • Voice recordings of patients and healthy controls provide the basis - the training data. 
  • The recordings are “labelled” with information about the type (or severity etc.) of the disease. A simple example: “this voice recording belongs to a patient who has been diagnosed with disease X”.
  • Essentially, when building an algorithmic voice biomarker, the task is to computationally find patterns in the voice signal that differentiate between, or correlate with, these labelled groups. For example, an automatic classifier could be trained to differentiate “depression yes” from “depression no” based on the voice data.
  • This differentiation based on patterns in the voice signal has to be robust, in the sense that it performs similarly well when applied to a data set that was not part of the training process.
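
The steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not PeakProfiling's method: the "recordings" are synthetic two-number feature vectors, the function names (make_synthetic_dataset, train_nearest_centroid, predict, accuracy) are hypothetical, and a simple nearest-centroid classifier stands in for whatever pattern detection a real system would use.

```python
import random


def make_synthetic_dataset(n_per_group, seed=0):
    """Return (features, label) pairs; label 1 = patient, 0 = healthy control.

    The two numbers per recording are made-up stand-ins for acoustic features.
    """
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_group):
            features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def train_nearest_centroid(train):
    """Learn one mean feature vector (centroid) per label."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [feat for feat, lab in train if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, features):
    """Assign the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, data):
    """Fraction of recordings whose predicted label matches the true label."""
    hits = sum(predict(centroids, feat) == lab for feat, lab in data)
    return hits / len(data)


# Steps 1 and 2: labelled recordings of patients and healthy controls.
data = make_synthetic_dataset(n_per_group=200)

# Step 4 needs data the model has never seen, so hold out a quarter of it.
split = int(0.75 * len(data))
train, held_out = data[:split], data[split:]

# Step 3: find a pattern that separates the labelled groups.
model = train_nearest_centroid(train)

train_acc = accuracy(model, train)
held_out_acc = accuracy(model, held_out)
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {held_out_acc:.2f}")
```

If the two printed accuracies are close to each other, the classifier is robust in the sense of the last step; a large gap between them would indicate that it has not generalised beyond its training data.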

The most crucial part is therefore the technology for detecting patterns in the signal. This is elaborated further in the section about PeakProfiling technology. However, several additional factors are important when building voice biomarkers:

Difficulty of Task × Amount of Voice Data × Quality of Voice Data × Development Time × Sophistication of Technology = Expected Performance

Difficulty of task in medical voice analytics

As discussed above, the scope of diseases that can potentially be detected with voice analytics is very broad. Which medical indications are challenging to measure with voice biomarkers and which are comparatively "easy"?

In essence, human perception can serve as a guideline: whenever a medical state leads to a speech impairment that is audible to humans (in other words, one that is a known symptom of a disease), detection with algorithms should be feasible. For example, it is widely known in the medical literature that one of the symptoms of Parkinson’s is a change in speech; it is therefore not surprising that voice analytics has detected it with high success rates (Tsanas et al., 2012).

Conversely, indications for which voice impairment is not a classic symptom are expected to be more challenging to detect from the voice (Fusaroli, Lambrechts, Bang, Bowler, & Gaigg, 2017). For example, in our recent clinical trial for attention deficit hyperactivity disorder (ADHD) with Charité, we successfully detected the disorder from the sound of the voice, even though doctors typically do not consider voice changes a classic symptom of ADHD. Clearly, this endeavor belonged to the 'challenging' category.

Amount of voice data needed to build stable algorithms

Building stable algorithms requires sufficient quantities of voice data. The exact amount needed depends on many variables, including the technical approach that is used: heavily feature-tuned algorithms (a specialty of PeakProfiling, see the Technology section) can provide robust results even on smaller data sets with a few hundred patients. In contrast, brute-force, end-to-end AI/Deep Learning systems without any domain knowledge are likely to run into stability issues ("overfitting") on such a data basis once they are applied to new data that was not part of the training set.
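
The overfitting effect described here can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy, not a model of any real system: the data is randomly generated, with one weakly informative feature and twenty pure-noise features, and all function names are hypothetical. A 1-nearest-neighbour rule plays the role of a flexible model without domain knowledge, while a simple threshold on the one feature assumed to matter plays the role of a constrained, feature-tuned model.

```python
import random


def make_data(n, n_noise, rng):
    """Synthetic recordings: one weakly informative feature plus noise features."""
    data = []
    for i in range(n):
        label = i % 2  # alternate controls (0) and patients (1)
        informative = rng.gauss(1.0 if label else 0.0, 1.0)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        data.append(([informative] + noise, label))
    return data


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train, features):
    """Flexible model: copy the label of the closest training recording."""
    return min(train, key=lambda row: sq_dist(row[0], features))[1]


def predict_threshold(train, features):
    """Constrained model: compare only feature 0 with the two class means."""
    means = {}
    for label in (0, 1):
        values = [feat[0] for feat, lab in train if lab == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(features[0] - means[label]))


def accuracy(predict_fn, train, data):
    hits = sum(predict_fn(train, feat) == lab for feat, lab in data)
    return hits / len(data)


rng = random.Random(1)
train = make_data(40, n_noise=20, rng=rng)      # small training set
held_out = make_data(400, n_noise=20, rng=rng)  # new, unseen recordings

nn_train = accuracy(predict_1nn, train, train)
nn_held_out = accuracy(predict_1nn, train, held_out)
thr_train = accuracy(predict_threshold, train, train)
thr_held_out = accuracy(predict_threshold, train, held_out)

print(f"1-NN      train={nn_train:.2f}  held-out={nn_held_out:.2f}")
print(f"threshold train={thr_train:.2f}  held-out={thr_held_out:.2f}")
```

The 1-nearest-neighbour rule reproduces its training labels perfectly because every training recording is its own nearest neighbour, yet its held-out accuracy is lower. That gap between training and held-out performance is what "overfitting" refers to; the constrained model typically shows a much smaller gap on data like this.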

This interdependence between data quantity and a suitable machine learning approach is the backdrop against which much of the research in the field has to be evaluated. Clinical data is almost by definition small data: much of the current academic research builds models on fewer than 100 patients. While this is sufficient to find and publish interesting patterns, it will rarely be robust enough for practical use.

PeakProfiling therefore takes the large-scale route: for example, our clinical trial with Charité Berlin and Forschungszentrum Jülich, with almost 700 patients, was, to our knowledge, one of the biggest purely clinical trials in the field worldwide.

Quality of voice data needed to build vocal biomarkers

The question of data quality is two-sided. On the one hand, higher quality from high-end microphones leads to better algorithms, since the data contains more information that can be used to solve the given task.

On the other hand, a demanding recording setup is less usable in practice later on. For example, it may be easier for the patient to record their voice with a simple smartphone than with a high-end headset microphone.

An optimal solution therefore has to balance this trade-off and should be defined specifically for every indication and usage scenario. That being said, PeakProfiling can typically work with any audio format and quality. The higher the quality, the easier the task, yet there are definitely ways to counter low sound quality.
