Float like a butterfly sting like a bee

5/31/2023

Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing.

The paper presents the distribution of pragmatic markers (PMs) of Russian everyday speech in two types of discourse: dialogue and monologue. PMs are an essential part of any oral discourse; quantitative data on their distribution are therefore necessary for solving both theoretical and practical tasks related to the study of speech communication, as well as for translation and for teaching Russian as a foreign language. The article describes samples from two speech corpora: “One Speaker’s Day” (ORD corpus, consisting mostly of dialogue speech; the annotated subcorpus contains 321,504 tokens) and “Balanced Annotated Text Library” (SAT corpus, consisting only of monologues; the annotated subcorpus contains 50,128 tokens). It also presents statistical data on PM distributions obtained for 60 basic (invariant) markers. It identifies PMs common to both dialogue and monologue (for example, hesitative markers such as vot, tam, tak), as well as those more typical of monologues (boundary markers like znachit, nu vot, vs’o) or of dialogues (‘xeno’-markers like takoj, grit and meta-communicative markers such as vidish’, (ja) ne znaju). Special attention is paid to PM usage both in different communication situations and in the speech of different sociolects.

Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications, where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that using the simulated speech samples to balance an existing dataset improves classification accuracy by about 10%.

The understanding of human communication development throughout the lifetime involves the characterization of both segmental and suprasegmental parameters. This pilot study intends to analyse suprasegmental (i.e., prosodic) features in conversational longitudinal speech samples in uncontrolled environments. The ProsodyDescriptor Extractor was used to extract 17 prosodic features (intonation, intensity and rhythm measures) from a set of 90 speech intervals of 3 s to 6 s, selected from three interviews recorded at different ages of the same male public figure. Group mean comparison tests revealed that 14 prosodic features presented statistically significant differences between the three ages. In general, compared with his younger age, the speaker showed a higher F0 mean level, more F0 variability, higher F0 peaks, more variable F0 peak values, less variable F0 falls, higher F0 min, less steep F0 rises, less steep F0 falls, less variable F0 rises, more energy in high frequencies, slower speech and articulation rate, less vocal effort and less variable global intensity. The longitudinal study of age-related changes in speech rhythm and intonation could contribute to the characterization of the normal ageing process, serving as a reference for clinical assessment and intervention.
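To make the intonation measures mentioned above a little more concrete, here is a minimal Python sketch of how a few F0 descriptors (mean, variability, minimum, peak) might be computed for one speech interval. It is not the ProsodyDescriptor Extractor's implementation: the function name `f0_summary`, the convention that unvoiced frames are stored as 0 or None, and the contour values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def f0_summary(f0_hz):
    """Summarise an F0 contour given in Hz.

    Assumption: frames with value 0 or None are unvoiced and are ignored.
    """
    voiced = [f for f in f0_hz if f]  # keep only voiced frames
    if not voiced:
        raise ValueError("no voiced frames in interval")
    return {
        "f0_mean": mean(voiced),   # mean F0 level
        "f0_sd": pstdev(voiced),   # F0 variability (population SD)
        "f0_min": min(voiced),     # lowest F0 value
        "f0_max": max(voiced),     # highest F0 value (peak)
    }


# Hypothetical contour for one short interval (values invented for illustration).
contour = [0, 110.0, 120.0, 130.0, 0, 120.0]
summary = f0_summary(contour)
```

Per-interval summaries of this kind, computed for each recording age, are the sort of values a group mean comparison would then be run on.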
