Hearing Hypernasality: Online Crowdsourcing of Cleft Speech
Saint Louis University School of Medicine
Background: Speech assessments are critical in the care of patients with cleft palate because speech intelligibility is fundamental to social interaction. Online crowdsourcing of perceptual speech outcomes is an emerging technology; this project represents the first time cleft speech has been evaluated by lay listeners in an online setting. We hypothesized that lay ratings of cleft speech would be highly concordant with the ratings of speech experts.
Methods: IRB approval was obtained to extract audio from videonasoendoscopy (VNE) recordings of patients with cleft palate. Audio of hypernasal speech was extracted from the VNE recordings using QuickTime Player. Recordings were assessed by the cleft team's speech-language pathologists (SLPs) and given a speech score based on the Pittsburgh Weighted Speech Score (PWSS). Specific phrases from each VNE recording were presented in survey format to internet raters recruited from the online crowdsourcing platform Amazon Mechanical Turk (MTurk). Six main phrases were used: "Katie likes cookies" (KC), "Tell Ted to try" (TT), "Should I wash the dishes" (WD), "Peter has a puppy" (PP), "Sissy sissy sissy" (SS), and "Zippers are easy to close" (ZC). Sound clips were rated on a Likert scale from 1 to 5 corresponding to the PWSS. To determine the accuracy of each phrase, residuals (layperson rating minus SLP rating) were calculated. Statistical significance was assessed with ANOVA followed by Tukey post hoc pairwise comparisons.
Results: Speech samples were provided by 3 children with a history of cleft palate, aged 4-9 years, with recordings made from before surgical repair to 6.5 years of follow-up. Sixty-eight survey responses yielded 1,088 unique layperson ratings. Crowd means by patient (P) were: P1, 2.62 (SLP rating 2-3); P2, 2.66 (SLP rating 3); P3, 1.76 (SLP rating 2). When rounded to the nearest whole number for consistency with the PWSS scale, the crowd mean for every patient matched the SLP rating. Mean ratings by phrase were, for P1 (SLP rated her 2-3): KC 2.25, TT 2.97, WD 2.93, PP 2.19, SS 2.75; for P2 (SLP rated him 3): KC 2.15, TT 3.32, WD 3.24, PP 3.49, SS 1.96, ZC 1.78; for P3 (SLP rated him 2): KC 1.56, WD 1.49, PP 1.38, SS 1.40, ZC 2.97. ANOVA with Tukey post hoc comparisons ordered the phrases by accuracy as WD > PP > ZC > TT > SS > KC.
Conclusion: Online crowdsourcing of cleft speech produces ratings consistent with those of speech experts. In all 3 cases, the averaged ratings of MTurk workers predicted the gold-standard rating provided by a trained SLP, with phrase accuracy ordered WD > PP > ZC > TT > SS > KC. This novel technology is immediately translatable to clinical speech assessment.
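The per-patient comparison reported above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not the authors' analysis code. It assumes that every phrase clip received the same number of ratings, so that the unweighted mean of the phrase means equals the crowd mean (16 clips at 68 responses each is consistent with the 1,088 ratings reported). It also assumes, purely for illustration, that the midpoint of P1's 2-3 SLP range serves as the reference for residuals, and it computes residuals on phrase means rather than on individual ratings. All function and variable names are hypothetical.

```python
# Per-phrase crowd means as reported in the abstract (Likert 1-5, PWSS-aligned).
phrase_means = {
    "P1": {"KC": 2.25, "TT": 2.97, "WD": 2.93, "PP": 2.19, "SS": 2.75},
    "P2": {"KC": 2.15, "TT": 3.32, "WD": 3.24, "PP": 3.49, "SS": 1.96, "ZC": 1.78},
    "P3": {"KC": 1.56, "WD": 1.49, "PP": 1.38, "SS": 1.40, "ZC": 2.97},
}

# SLP ratings stored as (low, high); P1 was reported as a range of 2-3.
slp_rating = {"P1": (2, 3), "P2": (3, 3), "P3": (2, 2)}


def round_half_up(x):
    """Round a positive number to the nearest integer, with .5 rounding up."""
    return int(x + 0.5)


def crowd_mean(patient):
    """Unweighted mean of the phrase means (assumes equal ratings per clip)."""
    values = list(phrase_means[patient].values())
    return sum(values) / len(values)


def matches_slp(patient):
    """True if the rounded crowd mean falls within the SLP rating range."""
    low, high = slp_rating[patient]
    return low <= round_half_up(crowd_mean(patient)) <= high


def phrase_residuals(patient):
    """Phrase mean minus SLP rating (assumption: midpoint if SLP gave a range)."""
    low, high = slp_rating[patient]
    reference = (low + high) / 2
    return {
        phrase: round(mean - reference, 2)
        for phrase, mean in phrase_means[patient].items()
    }


if __name__ == "__main__":
    for patient in phrase_means:
        mean = crowd_mean(patient)
        print(patient, round(mean, 2), round_half_up(mean), matches_slp(patient))
```

Under these assumptions the sketch reproduces the reported crowd means (2.62, 2.66, and 1.76) and their agreement with the SLP ratings after rounding. A residual near zero indicates a phrase on which lay listeners agreed closely with the SLP; the sign shows whether listeners rated the clip higher or lower than the expert.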