Commit f886c82

Merge pull request #202 from MontrealCorpusTools/csv_updates
Add csv related functionalities and tutorial 6
2 parents f1b1511 + 44797cf commit f886c82

17 files changed: +30044 additions, -45 deletions

docs/images/fasttrack_output.png

508 KB
docs/source/acoustics_encoding.rst

Lines changed: 103 additions & 20 deletions
@@ -1,3 +1,6 @@
.. _FastTrack: https://github.com/santiagobarreda/FastTrack

.. _AutoVOT: https://github.com/mlml/autovot

**************************
Encoding acoustic measures
@@ -194,9 +197,9 @@ Encoding Voice Onset Time(VOT)
==============================

Currently there is only one method to encode Voice Onset Times (VOTs) into PolyglotDB.
This makes use of the `AutoVOT`_ program, which automatically calculates VOTs based on various acoustic properties.

VOTs are encoded over a specific subset of phones using :code:`analyze_vot` as follows:

.. code-block:: python

@@ -215,7 +218,7 @@ VOTs are encoded over a specific subset of phones using :code:`analyze_vot` as

Parameters
----------
The :code:`analyze_vot` function has a variety of parameters that are important for running the function properly.
`classifier` is a string containing the path to an AutoVOT classifier directory.
A default classifier is available in `/tests/data/classifier/sotc_classifiers`.

@@ -228,6 +231,8 @@ The `AutoVOT repo <https://github.com/mlml/autovot>`_ has some sane defaults for
So, a `window_min` of -30 means that AutoVOT will look up to 30 milliseconds before the start of a phone for the burst, and
a `window_max` of 30 means that it will look up to 30 milliseconds after the end of a phone.
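
To make the window semantics concrete, here is a small self-contained sketch (this is not part of PolyglotDB or AutoVOT; the phone times are hypothetical):

```python
def vot_search_window(begin_s, end_s, window_min_ms, window_max_ms):
    # window_min/window_max are in milliseconds, relative to the phone's
    # start and end respectively (a negative value means earlier in time)
    return (begin_s + window_min_ms / 1000.0, end_s + window_max_ms / 1000.0)

# A phone spanning 1.200-1.250 s, with window_min=-30 and window_max=30:
# AutoVOT searches from 30 ms before the phone start to 30 ms after its end,
# i.e. roughly 1.170 s to 1.280 s.
print(vot_search_window(1.200, 1.250, -30, 30))
```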


.. _custom_script_encoding:

Encoding other measures using a Praat script
============================================

@@ -257,15 +262,29 @@ In this format, the system generates temporary sound files, each containing one

- One required input: the full path to the sound file. This input will be automatically filled by the system. You can define additional attributes as needed.

Example Praat script using Format 1::

    form Variables
        sentence filename
    endform

    # Read the sound file
    Read from file... 'filename$'

    # Extract the pitch
    To Pitch... 0 75 600

    # Compute the mean F0
    averageF0 = Get mean... 0 0 Hertz

    # Print the result
    output$ = "mean_pitch" + newline$ + string$(averageF0)
    echo 'output$'

    # Clean up
    select all
    Remove
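
The script above assembles its printed output as the measure name, a newline, then the value. The same string-building step expressed in Python, for clarity (the F0 value is hypothetical):

```python
# Equivalent of: output$ = "mean_pitch" + newline$ + string$(averageF0)
average_f0 = 215.7  # hypothetical mean F0 in Hz
output = "mean_pitch" + "\n" + str(average_f0)
print(output)
```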
Format 2 (for optimized analysis):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This format is more efficient as it reuses the same discourse sound file for all annotations in the same discourse, avoiding the creation of extra files.

@@ -282,37 +301,58 @@ This format is more efficient as it reuses the same discourse sound file for all

Do not assign values to these five fields; the system will populate them during processing. You may include additional
attributes beyond these five, but ensure that values are passed as an array via the API.

Example Praat script using Format 2::

    form Variables
        sentence filename    # path to the sound file
        real begin           # actual begin time (not including the padding)
        real end             # actual end time (not including the padding)
        integer channel      # channel number of the speaker (for discourses with multiple speakers)
        real padding         # padding time around the segment (s)
    endform

    # Load the long sound file
    Open long sound file... 'filename$'

    # Adjust segment boundaries with padding
    seg_begin = begin - padding
    if seg_begin < 0
        seg_begin = 0
    endif

    seg_end = end + padding
    duration = Get total duration
    if seg_end > duration
        seg_end = duration
    endif

    # Extract padded segment
    Extract part... seg_begin seg_end 1
    channel = channel + 1
    Extract one channel... channel

    # Extract pitch from the full padded segment
    # Padding is added specifically for this step because pitch extraction
    # requires a minimum window length, which could be too short for certain
    # segments (e.g. a phone/word segment)
    To Pitch... 0 75 600

    # Compute the mean F0 only over the **unpadded** segment
    averageF0 = Get mean... begin end Hertz

    # Print the result in the required format
    output$ = "mean_pitch" + newline$ + string$(averageF0)
    echo 'output$'

    # Clean up
    select all
    Remove

**Key Notes:**

- Always use :code:`Open long sound file` to ensure compatibility with the system.
- The `padding` field allows flexibility by extending the actual start and end times of the segment (default is 0).
- Channel indexing starts at 0 in the system, so increment by 1 for use in Praat (Praat uses 1-based indexing).

**Output Requirements:**

@@ -339,7 +379,11 @@ To run :code:`analyze_script`, follow these steps:

.. code-block:: python

    with CorpusContext(config) as c:
        # Define a subset of phones called "sibilant"
        c.encode_type_subset('phone', ['S', 'Z', 'SH', 'ZH'], 'sibilant')

        # Use a Praat script that takes a filename and begin/end times as input, and outputs the measures we want for sibilants.
        # The analyze_script call then applies this script to every phone in the "sibilant" subset.
        c.analyze_script(subset='sibilant', annotation_type="phone", script_path='path/to/script/sibilant.praat')

@@ -360,10 +404,11 @@ The script will then run separately for each instance of the selected annotation

Example output::

    time    H1_A1    H1_A2     H1_A3    H1_H2
    0.242   1.378    -4.326    14.369   8.522
    0.277   -3.169   -10.276   9.383    3.002
    0.312   -0.217   -4.195    3.497    7.215

.. code-block:: python

@@ -373,25 +418,63 @@ Example output::

    props = [('H1_H2', float), ('H1_A1', float), ('H1_A2', float), ('H1_A3', float)]
    c.analyze_track_script('voice_quality', props, script_path, annotation_type='phone')

A detailed example of using this functionality for voice quality analysis, along with a sample Praat script, is provided in the tutorial. See :ref:`tutorial_vq` for more details.

Encoding acoustic tracks from CSV
=================================

Sometimes you may want to use external software to extract specific measurement tracks. For example, `FastTrack`_ is a Praat plugin that can generate formant tracks.
If you have generated tracks using other software, you can import them into PolyglotDB using the functions :code:`save_track_from_csvs` and :code:`save_track_from_csv`, as long as the files
follow the expected structure.

CSV format::

    time, measurement1, measurement2, measurement3, ...

Additionally, the file name should match the name of the discourse for which the track should be saved.

Calling the function :code:`save_track_from_csv` with the file path will save the track. You must also provide a list of the columns that the system should read. It is assumed that all columns are of type float.

To load multiple CSV files at once, pass a directory path to :code:`save_track_from_csvs`.

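
As an illustration, a track CSV for a hypothetical discourse named ``speaker1-session1`` could be generated as follows (the column names and values here are made up; any float-valued measurement columns can be used):

```python
import csv

# The file name (minus .csv) must match the discourse name
with open('speaker1-session1.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['time', 'f1', 'f2', 'f3'])      # header: time, then measurement columns
    writer.writerow([0.025, 512.3, 1501.8, 2511.0])  # one row per time point
    writer.writerow([0.035, 514.9, 1497.2, 2508.4])

# Such a file could then be loaded with:
#   c.save_track_from_csv('formants', 'speaker1-session1.csv', ['f1', 'f2', 'f3'])
```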
**Example** (FastTrack output):

.. image:: images/fasttrack_csvoutput.png
    :width: 600

To load all the measures from the generated tracks:

.. code-block:: python

    with CorpusContext(config) as c:
        # Loading one file
        c.save_track_from_csv('formants', '/path/to/csv', ['f1', 'b1', 'f2', 'b2', 'f3', 'b3', 'f1p', 'f2p', 'f3p', 'f0', 'intensity', 'harmonicity'])
        # Loading multiple CSV files
        c.save_track_from_csvs('formants', '/path/to/directory', ['f1', 'b1', 'f2', 'b2', 'f3', 'b3', 'f1p', 'f2p', 'f3p', 'f0', 'intensity', 'harmonicity'])

Encoding acoustic track statistics
==================================

After encoding an acoustic track measurement, either through the built-in algorithms or custom Praat scripts,
you can perform statistical aggregation on these data tracks. The supported statistical measures are: mean, median,
standard deviation (stddev), sum, mode, and count.

Aggregation can be performed on a specified annotation type, such as phones, words, or syllables
(if syllable encoding is available). The aggregation is conducted for all annotations with the same label.

Aggregation can be performed by speaker, in which case the results will be grouped by speaker,
and each (annotation_label, speaker) pair will have its corresponding statistical measure computed.

Once encoded, the computed statistics are stored and can be queried later.

.. code-block:: python

    with CorpusContext(config) as c:
        # Encode a statistic for an acoustic measure
        c.encode_acoustic_statistic('voice_quality', 'mean', by_annotation='phone', by_speaker=True)

        # Alternatively, call the get function directly; it will encode the statistic if not already available
        results = c.get_acoustic_statistic('voice_quality', 'mean', by_annotation='phone', by_speaker=True)
        # This computes, saves, and returns the mean values for all voice quality measurements on a by-speaker and by-phone basis,
        # for example ('speaker1', 'AO1'): [1.4283178345991416, 5.21375241700153, 28.8672225446156, 18.57861883658481]

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
.. _voice quality: https://linguistics.ucla.edu/people/keating/Keating_SST2006_talk.pdf

.. _UCLA Phonetics Lab: https://phonetics.linguistics.ucla.edu/

.. _script: https://github.com/MontrealCorpusTools/PolyglotDB/tree/main/examples/tutorial/tutorial_6_vq_script.praat

.. _full version of the script: https://github.com/MontrealCorpusTools/PolyglotDB/tree/master/examples/tutorial/tutorial_6.py

.. _expected output: https://github.com/MontrealCorpusTools/PolyglotDB/tree/master/examples/tutorial/results/tutorial_6_subset_voice_quality_mean.csv

.. _tutorial_vq:

*************************
Tutorial 6: Custom Script
*************************

The main objective of this tutorial is to perform `voice quality`_ analysis on the corpus using a Praat script and extract
spectral measures like H1-H2, H1-A1, H1-A2, and H1-A3.

As in the other tutorials, import statements and the corpus name (as it is stored in pgdb) must be set for the code in this tutorial
to be runnable. The example given below continues to make use of the "tutorial-subset" corpus we have been using in tutorials 1-5.

.. code-block:: python

    from polyglotdb import CorpusContext

    corpus_name = 'tutorial-subset'
    script_path = '/path/to/your/praat/script'
    export_path_1 = './results/tutorial_6_subset_voice_quality.csv'
    export_path_2 = './results/tutorial_6_subset_voice_quality_mean.csv'
    praat_path = "/usr/bin/praat"  # Make sure to check where the Praat executable is stored on your device and change accordingly

.. _tutorial_vq_script:

The Example Praat script
========================

The Praat `script`_ used for this analysis extracts H1-H2 and amplitude differences for higher harmonics (A1, A2, A3).
It is adapted from an online script from the `UCLA Phonetics Lab`_.
For more information on how to format your Praat script, see :ref:`custom_script_encoding`.

.. _tutorial_vq_analysis:

Performing VQ Analysis
======================

.. code-block:: python

    with CorpusContext(corpus_name) as c:
        c.reset_acoustic_measure('voice_quality')  # Reset the existing acoustic measure
        c.config.praat_path = praat_path
        # Properties extracted by the Praat script
        props = [('H1_H2', float), ('H1_A1', float), ('H1_A2', float), ('H1_A3', float)]

        # Custom arguments (must be universal across all sound files)
        arguments = [10]  # Number of measurements per vowel
        c.analyze_track_script('voice_quality', props, script_path, subset='vowel', annotation_type='phone', file_type='vowel', padding=0.1, arguments=arguments, call_back=print)

.. note::

    When annotation_type is set to phone or word, some sound file segments may be too short for certain analyses.
    (For example, the Sound: To Pitch... command in Praat requires each segment to be longer than a minimum duration.)

    If you encounter such an error, you can try adding padding to the segments. The modified segments will have a new duration calculated as
    :code:`duration = duration + 2 * padding`.

    However, ensure that your Praat script defines the analysis range correctly so that the measurements are performed within the original sound range.
    Any measurements obtained outside the segment's original time range will not be stored after the analysis is complete.

    The file_type parameter has three options based on resampled frequency upper bounds:
    16000 Hz for ``consonant``, 11000 Hz for ``vowel``, and 2000 Hz for ``low_freq``.
    Choose the one that best fits your analysis range. By default, you can use ``consonant``.

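
The boundary adjustment described in the note, which the example Format 2 Praat script performs, can be sketched as a small Python function (the times below are hypothetical):

```python
def padded_segment(begin, end, padding, file_duration):
    # Extend the segment by `padding` seconds on each side, clamped
    # to the sound file's valid time range [0, file_duration]
    seg_begin = max(0.0, begin - padding)
    seg_end = min(file_duration, end + padding)
    return seg_begin, seg_end

# A 50 ms phone near the start of a 10 s file, padded by 0.1 s;
# the padded begin time clamps to 0.0
print(padded_segment(0.06, 0.11, 0.1, 10.0))
```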
.. _tutorial_vq_query:

Querying results
================

After running the analysis, we can query and export the results to verify the extracted data.

The CSV file will contain the following columns:

- Phone label: the label of the phone.
- Begin/End time: the time range for the phone.
- Voice quality measures: the H1-H2, H1-A1, H1-A2, and H1-A3 values.

.. code-block:: python

    # 2. Query and output analysis results
    print("Querying results...")
    with CorpusContext(corpus_name) as c:
        q = c.query_graph(c.phone).filter(c.phone.subset == 'vowel')
        q = q.columns(c.phone.label.column_name('label'), c.phone.begin.column_name('begin'), c.phone.end.column_name('end'), c.phone.voice_quality.track)
        q = q.order_by(c.phone.begin)
        results = q.all()

        # Display a sample result
        print(results[0].track)

        # Export to CSV
        q.to_csv(export_path_1)

.. _tutorial_vq_statistics:

Calculating Mean Values
=======================

To understand the general trend, we can encode acoustic statistics (mean).

.. code-block:: python

    import csv

    with CorpusContext(corpus_name) as c:
        acoustic_statistics = c.get_acoustic_statistic('voice_quality', 'mean', by_annotation='phone', by_speaker=True)

        # Display an example result
        key = ('61', 'AO1')
        value = acoustic_statistics[key]
        print("speaker_phone_pair: {}".format(key))
        print("mean measures: {}".format(value))

        # Export to CSV
        with open(export_path_2, 'w') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(['Speaker', 'Phone', 'mean_H1_H2', 'mean_H1_A1', 'mean_H1_A2', 'mean_H1_A3'])
            for key, value in acoustic_statistics.items():
                writer.writerow([key[0], key[1], *value])

The generated CSV file is then ready to open in other programs or in R for data analysis. You can see a `full version of the script`_ and its `expected output`_ when run on the 'LibriSpeech-subset' corpora.

docs/source/tutorial_first_steps.rst

Lines changed: 6 additions & 3 deletions
@@ -9,7 +9,7 @@

.. _full version of the script: https://github.com/MontrealCorpusTools/PolyglotDB/tree/master/examples/tutorial/tutorial_1.py

.. _expected results: https://github.com/MontrealCorpusTools/PolyglotDB/tree/master/examples/tutorial/results/tutorial_1_subset_output.txt

.. _formant: https://github.com/MontrealCorpusTools/PolyglotDB/tree/master/examples/tutorial/results/tutorial_4_formants.Rmd

@@ -34,10 +34,13 @@ The main objective of this tutorial is to import a downloaded corpus consisting
database so that they can be queried.

.. note::

    The following Python scripts are presented in step-by-step blocks to guide you through the process.
    However, it is expected that you run the entire Python script as a single unit when using PolyglotDB.

    The complete Python script is available here: `tutorial scripts`_.

    If you prefer running the steps in blocks, this tutorial is also available as a `Jupyter notebook`_.

.. _tutorial_download:
0 commit comments

Comments
 (0)