| 1 | +.. _case_study_praat: |
| 2 | + |
| 3 | +********************************************************* |
| 4 | +Case study 4: Sibilant analysis using custom Praat script |
| 5 | +********************************************************* |
| 6 | + |
| 7 | +Motivation |
| 8 | +---------- |
| 9 | + |
| 10 | +Sibilants, and /s/ in particular, have been observed to show sociolinguistic variation according to a range of intersecting factors, including gendered, class, and ethnic identities (Stuart-Smith, 2007; Levon, Maegaard and Pharao, 2017). Sibilants -- /s ʃ z ʒ/ -- also show systematic variation according to place of articulation (Johnson, 2003). The alveolar fricatives /s z/, as in ‘send’, ‘zen’, are formed when a jet of air is forced through a narrow constriction between the tongue tip/blade and the alveolar ridge; the air strikes the upper teeth as it escapes, resulting in high-pitched friction. The post-alveolar fricatives /ʃ ʒ/, as in ‘sheet’, ‘Asia’, have a more retracted constriction, so the cavity in front of the constriction is somewhat longer and larger, and the pitch is correspondingly lower. In many varieties of English, the post-alveolar fricatives also have some lip-rounding, which lowers the pitch further. |
| 11 | + |
| 12 | +Acoustically, sibilants show a spectral ‘mountain’ profile, with peaks and troughs reflecting the resonances of the cavities formed by the articulators (Jesus and Shadle, 2002). The frequency of the main spectral peak, and/or main area of acoustic energy (Centre of Gravity), corresponds quite well to shifts in place of articulation, including quite fine-grained differences, such as those which are interesting for sociolinguistic analysis: alveolars show higher frequencies, more retracted post-alveolars show lower frequencies. |
| 13 | + |
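The two measures described above can be illustrated with a minimal, dependency-free sketch. It is not the Praat script used later in this case study: the helper names (``tone``, ``power_spectrum``, ``spectral_peak``, ``centre_of_gravity``) are hypothetical, the naive DFT is only suitable for short frames, and weighting the centre of gravity by power is an assumption about how the measure is defined.

```python
import cmath
import math


def tone(freq_hz, sample_rate=16000, n=256):
    """Synthetic sine wave standing in for a frame of frication noise."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def power_spectrum(samples, sample_rate):
    """Return (frequencies, powers) for the non-negative DFT bins.

    Naive O(N^2) DFT: fine for a short illustrative frame only.
    """
    n = len(samples)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def spectral_peak(freqs, powers):
    """Frequency of the bin carrying the most energy."""
    return max(zip(powers, freqs))[1]


def centre_of_gravity(freqs, powers):
    """Power-weighted mean frequency (assumed weighting)."""
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)


sample_rate = 16000
for label, freq in [("higher-frequency tone", 6000.0), ("lower-frequency tone", 3000.0)]:
    f, p = power_spectrum(tone(freq, sample_rate), sample_rate)
    print(label, spectral_peak(f, p), round(centre_of_gravity(f, p), 1))
```

For a pure tone the peak and the centre of gravity coincide. For real frication noise the two can diverge, because the centre of gravity reflects the whole distribution of energy rather than a single maximum.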
| 14 | +* How do the English alveolar (/s z/) and post-alveolar (/ʃ ʒ/) sibilants differ in their spectral peaks and centre of gravity? |
| 15 | + |
| 16 | +Step 1: Import |
| 17 | +-------------- |
| 18 | + |
| 19 | +As with previous case studies, the Python libraries are loaded, and the aligned LibriSpeech corpus is imported. |
| 20 | + |
| 21 | +.. code-block:: python |
| 22 | +
|
| 23 | +    import os # for parsing the paths to the corpus enrichment files |
| 24 | +
|
| 25 | +    # PolyglotDB imports |
| 26 | +    from polyglotdb import CorpusContext |
| 27 | +    import polyglotdb.io as pgio |
| 28 | +
|
| 29 | +    ## name and path to the corpus |
| 30 | +    corpus_root = '.data/LibriSpeech-aligned' |
| 31 | +    corpus_name = 'Librispeech-aligned' |
| 32 | +
|
| 33 | +    ## names of the enrichment files |
| 34 | +    speaker_filename = "SPEAKERS.csv" |
| 35 | +    stress_data_filename = "iscan_lexicon.csv" |
| 36 | +
|
| 37 | +    ## get the paths to the corpus enrichment files |
| 38 | +    speaker_enrichment_path = os.path.join(corpus_root, 'enrichment_data', speaker_filename) |
| 39 | +    lexicon_enrichment_path = os.path.join(corpus_root, 'enrichment_data', stress_data_filename) |
| 40 | +
|
| 41 | +    ## use the MFA parser |
| 42 | +    parser = pgio.inspect_mfa(corpus_root) |
| 43 | +    parser.call_back = print |
| 44 | +
|
| 45 | +    with CorpusContext(corpus_name) as c: |
| 46 | +        print("Loading data...") |
| 47 | +        c.load(parser, corpus_root) |
| 48 | +
|
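Because the enrichment steps that follow read two CSV files from the corpus directory, it can help to confirm that they exist before starting a long import. The following is a minimal sketch using only the standard library; ``missing_files`` is a hypothetical helper and not part of PolyglotDB, and the paths simply repeat the layout used in the import step.

```python
import os


def missing_files(paths):
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]


# Same directory layout as in the import step (adjust to your own setup)
corpus_root = '.data/LibriSpeech-aligned'
enrichment_dir = os.path.join(corpus_root, 'enrichment_data')
expected = [
    os.path.join(enrichment_dir, 'SPEAKERS.csv'),
    os.path.join(enrichment_dir, 'iscan_lexicon.csv'),
]

for path in missing_files(expected):
    print(f"Missing enrichment file: {path}")
```

If nothing is printed, both enrichment files were found at the expected locations.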
| 49 | +Step 2: Basic enrichment |
| 50 | +------------------------ |
| 51 | + |
| 52 | +As in the previous case studies, utterance, syllable, speaker, and lexical stress information is encoded. |
| 53 | + |
| 54 | +.. code-block:: python |
| 55 | +
|
| 56 | +    ## set of syllabic segments |
| 57 | +    syllabics = ["ER0", "IH2", "EH1", "AE0", "UH1", "AY2", "AW2", "UW1", "OY2", |
| 58 | +                 "OY1", "AO0", "AH2", "ER1", "AW1", "OW0", "IY1", "IY2", "UW0", "AA1", "EY0", |
| 59 | +                 "AE1", "AA0", "OW1", "AW0", "AO1", "AO2", "IH0", "ER2", "UW2", "IY0", "AE2", |
| 60 | +                 "AH0", "AH1", "UH2", "EH2", "UH0", "EY1", "AY0", "AY1", "EH0", "EY2", "AA2", |
| 61 | +                 "OW2", "IH1"] |
| 62 | +
|
| 63 | +    ## use syllabic labels to encode syllables |
| 64 | +    with CorpusContext(corpus_name) as c: |
| 65 | +        print("Encoding syllables...") |
| 66 | +        c.encode_type_subset('phone', syllabics, 'syllabic') |
| 67 | +        c.encode_syllables(syllabic_label='syllabic') |
| 68 | +
|
| 69 | +    ## pause label |
| 70 | +    pause_labels = ['<SIL>'] |
| 71 | +
|
| 72 | +    ## encode utterances, treating pauses of |
| 73 | +    ## at least 150 ms as utterance boundaries |
| 74 | +    with CorpusContext(corpus_name) as c: |
| 75 | +        print("Encoding utterances...") |
| 76 | +        c.encode_pauses(pause_labels) |
| 77 | +        c.encode_utterances(min_pause_length=0.15) |
| 78 | +
|
| 79 | +    with CorpusContext(corpus_name) as c: |
| 80 | +        print("Encoding speakers...") |
| 81 | +        c.enrich_speakers_from_csv(speaker_enrichment_path) |
| 82 | +
|
| 83 | +    with CorpusContext(corpus_name) as c: |
| 84 | +        print("Encoding lexicon...") |
| 85 | +        c.enrich_lexicon_from_csv(lexicon_enrichment_path) |
| 86 | +        c.encode_stress_from_word_property('stress_pattern') |
| 87 | +
|
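The hand-written list of syllabic labels can be cross-checked by generating the labels programmatically. This is only a sketch: it assumes the labels are ARPABET vowels with a stress digit (0, 1, or 2), and the 15-vowel inventory below is an assumption rather than something stated in this case study.

```python
# Assumed ARPABET vowel inventory (15 vowels)
vowels = ["AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"]
stress_digits = ["0", "1", "2"]

# Every vowel combined with every stress digit
generated = sorted(v + s for v in vowels for s in stress_digits)

# Hand-written list from the enrichment step, copied for comparison
syllabics = ["ER0", "IH2", "EH1", "AE0", "UH1", "AY2", "AW2", "UW1", "OY2",
             "OY1", "AO0", "AH2", "ER1", "AW1", "OW0", "IY1", "IY2", "UW0", "AA1", "EY0",
             "AE1", "AA0", "OW1", "AW0", "AO1", "AO2", "IH0", "ER2", "UW2", "IY0", "AE2",
             "AH0", "AH1", "UH2", "EH2", "UH0", "EY1", "AY0", "AY1", "EH0", "EY2", "AA2",
             "OW2", "IH1"]

print(len(generated), len(syllabics))
print(sorted(set(generated) - set(syllabics)))  # labels missing from the hand-written list
```

Under these assumptions the generated set has 45 labels against 44 in the hand-written list, and the one label it lacks is ``OY0``. Whether that matters depends on whether ``OY0`` occurs in the corpus, which is not stated here.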
| 88 | +Step 3: Sibilant acoustic enrichment |
| 89 | +------------------------------------ |
| 90 | + |
| 91 | +PolyglotDB supports enriching the database with custom measures computed by Praat scripts. Here, a `custom Praat script <https://github.com/MontrealCorpusTools/PolyglotDB/blob/main/examples/case_studies/praat_sibilants/polyglotdb_sibilant.praat>`_ has been written to extract spectral information -- spectral Centre of Gravity (COG) and spectral peak -- for a given segment. PolyglotDB applies this script to a subset of segments and enriches the database with these measures. |
| 92 | + |
| 93 | +First, a subset of segments -- the sibilants -- is defined; these are the segments that will be analysed for spectral information. |
| 94 | + |
| 95 | +.. code-block:: python |
| 96 | +
|
| 97 | +    sibilant_segments = ["S", "Z", "SH", "ZH"] |
| 98 | +
|
| 99 | +PolyglotDB needs both the path to the Praat executable and the path to the sibilant enrichment script. |
| 100 | + |
| 101 | +.. code-block:: python |
| 102 | +
|
| 103 | +    praat_path = "/usr/bin/praat" # typical path on a Unix machine; adjust for your system |
| 104 | +    sibilant_script_path = "./polyglotdb_sibilant.praat" |
| 105 | +
|
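The location of the Praat executable varies between machines, so a hard-coded path may not work everywhere. One option, sketched here with the standard library only, is to look for Praat on the ``PATH`` and fall back to a fixed location. ``find_praat`` is a hypothetical helper, not part of PolyglotDB, and the executable name ``praat`` is an assumption (it differs on some platforms).

```python
import shutil


def find_praat(executable="praat", fallback="/usr/bin/praat"):
    """Return the Praat executable found on PATH, or the fallback path.

    Hypothetical helper: the executable name and fallback are assumptions.
    """
    return shutil.which(executable) or fallback


praat_path = find_praat()
print(praat_path)
```

The fallback is returned unchecked, so it is still worth confirming that the printed path points to a working Praat installation.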
| 106 | +The Praat path is set on the corpus configuration, and the script is then run via the ``analyze_script`` method. |
| 107 | + |
| 108 | +.. code-block:: python |
| 109 | +
|
| 110 | +    with CorpusContext(corpus_name) as c: |
| | +        c.config.praat_path = praat_path ## tell PolyglotDB where Praat is |
| 111 | +        c.encode_class(sibilant_segments, 'sibilant') |
| 112 | +        c.analyze_script(annotation_type='phone', subset='sibilant', script_path=sibilant_script_path, duration_threshold=0.01) |
| 113 | +
|
| 114 | +Step 4: Query |
| 115 | +------------- |
| 116 | + |
| 117 | +With the sibilant spectral information now enriched in the database, a query can be generated to extract the sibilant tokens of interest. Here, the focus is on word-initial sibilants, i.e. those in the onset of a word's first syllable. Columns for the segmental, syllabic, and word-level information are extracted, as well as the spectral measurements made by the Praat script (``cog``, ``peak``). |
| 118 | + |
| 119 | +.. code-block:: python |
| 120 | +
|
| 121 | +    output_path = "./sibilant_spectral_output.csv" |
| 122 | +
|
| 123 | +    with CorpusContext(corpus_name) as c: |
| 124 | +        print("Generating query...") |
| 125 | +        ## use the sibilant subset to filter segments |
| 126 | +        q = c.query_graph(c.phone).filter(c.phone.subset == "sibilant") |
| 127 | +        ## word-initial only: the phone begins where its word begins |
| 128 | +        q = q.filter(c.phone.begin == c.phone.syllable.word.begin) |
| 129 | +
|
| 130 | +        q = q.columns( |
| 131 | +            ## segmental information |
| 132 | +            c.phone.id.column_name("phone_id"), |
| 133 | +            c.phone.label.column_name('phone_label'), |
| 134 | +            c.phone.duration.column_name('phone_duration'), |
| 135 | +            c.phone.begin.column_name("phone_begin"), |
| 136 | +            c.phone.end.column_name("phone_end"), |
| 137 | +
|
| 138 | +            ## surrounding segmental labels |
| 139 | +            c.phone.following.label.column_name("following_phone_label"), |
| 140 | +            c.phone.previous.label.column_name("previous_phone_label"), |
| 141 | +
|
| 142 | +            ## syllabic information |
| 143 | +            c.phone.syllable.label.column_name("syllable_label"), |
| 144 | +            c.phone.syllable.stress.column_name("syllable_stress"), |
| 145 | +            c.phone.syllable.duration.column_name("syllable_duration"), |
| 146 | +
|
| 147 | +            ## labels for each part of the syllable |
| 148 | +            c.phone.syllable.phone.filter_by_subset('onset').label.column_name('onset'), |
| 149 | +            c.phone.syllable.phone.filter_by_subset('nucleus').label.column_name('nucleus'), |
| 150 | +            c.phone.syllable.phone.filter_by_subset('coda').label.column_name('coda'), |
| 151 | +
|
| 152 | +            ## word, speaker, and utterance-level information |
| 153 | +            c.phone.syllable.word.label.column_name('word_label'), |
| 154 | +            c.phone.syllable.word.begin.column_name('word_begin'), |
| 155 | +            c.phone.syllable.word.end.column_name('word_end'), |
| 156 | +            c.phone.syllable.word.utterance.speech_rate.column_name('utterance_speech_rate'), |
| 157 | +            c.phone.syllable.speaker.name.column_name('speaker'), |
| 158 | +            c.phone.syllable.discourse.name.column_name('file'), |
| 159 | +
|
| 160 | +            ## spectral measures enriched from Praat script |
| 161 | +            c.phone.cog.column_name('cog'), |
| 162 | +            c.phone.peak.column_name('peak') |
| 163 | +        ) |
| 164 | +
|
| 165 | +        print("Writing query to file...") |
| 166 | +        q.to_csv(output_path) |
| 167 | +
|
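Before moving to R, the exported file can be given a quick sanity check in Python. The sketch below counts tokens and takes the median centre of gravity per phone label, using only the standard library. ``summarise`` is a hypothetical helper, the toy rows are made up purely for illustration, and treating non-numeric ``cog`` values as missing is an assumption about how the export represents tokens without a measurement.

```python
import csv
import io
from statistics import median


def summarise(csv_file):
    """Return {phone_label: (token_count, median_cog)} for measured tokens."""
    cogs = {}
    for row in csv.DictReader(csv_file):
        try:
            value = float(row["cog"])
        except ValueError:
            continue  # assumed: non-numeric means no measurement was made
        cogs.setdefault(row["phone_label"], []).append(value)
    return {label: (len(values), median(values)) for label, values in cogs.items()}


# Toy stand-in for the exported CSV (values are invented, not corpus results)
toy = io.StringIO(
    "phone_label,cog,peak\n"
    "S,6200,6500\n"
    "S,5800,6100\n"
    "SH,3900,3600\n"
    "SH,,\n"
)
print(summarise(toy))
```

To run this on the real export, the ``io.StringIO`` object can be replaced with an open file handle, for example ``open("sibilant_spectral_output.csv", newline="")``.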
| 168 | +Step 5: Analysis |
| 169 | +---------------- |
| 170 | + |
| 171 | +As before, the exported CSV file can then be loaded into R. |
| 172 | + |
| 173 | +.. code-block:: r |
| 174 | +
|
| 175 | +    library(tidyverse) |
| 176 | +
|
| 177 | +    df <- read.csv("sibilant_spectral_output.csv") |
| 178 | +
|
| 179 | +    ## check the number of tokens for each segment |
| 180 | +    df %>% |
| 181 | +      group_by(phone_label) %>% |
| 182 | +      tally() |
| 183 | +    # A tibble: 3 × 2 |
| 184 | +    #   phone_label     n |
| 185 | +    #   <chr>       <int> |
| 186 | +    # 1 S            3298 |
| 187 | +    # 2 SH            641 |
| 188 | +    # 3 Z              12 |
| 189 | +
|
| 190 | +Both spectral centre of gravity and spectral peak are plotted below, showing that /ʃ/ generally exhibits a lower spectral peak and a lower centre of gravity than both /s/ and /z/. |
| 191 | + |
| 192 | +.. code-block:: r |
| 193 | +
|
| 194 | +    df %>% |
| 195 | +      ## make a single column for spectral measures |
| 196 | +      ## so both measures can be plotted side-by-side |
| 197 | +      pivot_longer(c(peak, cog), names_to = "measure", values_to = "value") %>% |
| 198 | +      ggplot(aes(x = phone_label, y = value)) + geom_boxplot() + |
| 199 | +      facet_wrap(~measure) + |
| 200 | +      scale_y_sqrt() + |
| 201 | +      ylab("Frequency (Hz)") + |
| 202 | +      xlab("Sibilant") |
| 203 | +
|
| 204 | +.. image:: ../images/sibilant_plot.png |
| 205 | +   :width: 400 |