Currently there is only one method to encode Voice Onset Times (VOTs) into PolyglotDB.
197
-
This makes use of the `AutoVOT<https://github.com/mlml/autovot>`_ program which automatically calculates VOTs based on various acoustic properties.
200
+
This makes use of the `AutoVOT`_ program which automatically calculates VOTs based on various acoustic properties.
198
201
199
-
VOTs are encoded over a specific subset of phones using :code:`analyze_vot` as follows:
202
+
VOTs are encoded over a specific subset of phones using :code:`analyze_vot` as follows:
200
203
201
204
.. code-block:: python
202
205
@@ -215,7 +218,7 @@ VOTs are encoded over a specific subset of phones using :code: `analyze_vot` as
215
218
216
219
Parameters
217
220
----------
218
-
The :code:`analyze_vot` function has a variety of parameters that are important for running the function properly.
221
+
The :code:`analyze_vot` function has a variety of parameters that are important for running the function properly.
219
222
`classifier` is a string containing the path to an AutoVOT classifier directory.
220
223
A default classifier is available in `/tests/data/classifier/sotc_classifiers`.
221
224
@@ -228,6 +231,8 @@ The `AutoVOT repo <https://github.com/mlml/autovot>` has some sane defaults for
228
231
So, a `window_min` of -30 means that AutoVOT will look up to 30 milliseconds before the start of a phone for the burst, and
229
232
a `window_max` of 30 means that it will look up to 30 milliseconds after the end of a phone.
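The window arithmetic described above can be sketched in plain Python. This is only an illustration of how `window_min` and `window_max` (in milliseconds) widen the region around a phone; the helper name `burst_search_window` is hypothetical and is not part of PolyglotDB or AutoVOT, and clamping at the start of the recording is an assumption of this sketch.

```python
def burst_search_window(phone_begin, phone_end, window_min=-30, window_max=30):
    """Return (start, end) in seconds of the region searched for the burst.

    phone_begin and phone_end are in seconds; window_min and window_max are
    in milliseconds, as in the description above.
    """
    # A negative window_min moves the start earlier than the phone onset.
    start = phone_begin + window_min / 1000.0
    # A positive window_max moves the end later than the phone offset.
    end = phone_end + window_max / 1000.0
    # Assumption of this sketch: never search before the start of the file.
    return max(start, 0.0), end


# A phone spanning 1.250 s to 1.310 s with the values used in the text.
start, end = burst_search_window(1.250, 1.310, window_min=-30, window_max=30)
print(f"search from {start:.3f} s to {end:.3f} s")
```

With these values the search region extends 30 milliseconds on either side of the phone.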
230
233
234
+
.. _custom_script_encoding:
235
+
231
236
Encoding other measures using a Praat script
232
237
============================================
233
238
@@ -257,15 +262,29 @@ In this format, the system generates temporary sound files, each containing one
257
262
258
263
- One required input: the full path to the sound file. This input will be automatically filled by the system. You can define additional attributes as needed.
259
264
260
-
Example input section for a Praat script using Format 1::
This format is more efficient as it reuses the same discourse sound file for all annotations in the same discourse, avoiding the creation of extra files.
@@ -282,37 +301,58 @@ This format is more efficient as it reuses the same discourse sound file for all
282
301
Do not assign values to these five fields; the system will populate them during processing. You may include additional
283
302
attributes beyond these five, but ensure that values are passed as an array via the API.
284
303
285
-
Example Praat script for Format 2::
304
+
Example Praat script using Format 2::
286
305
287
306
form Variables
288
-
sentence filename
289
-
real begin
290
-
real end
291
-
integer channel
292
-
real padding
293
-
# add more arguments here
307
+
sentence filename # path to the sound file
308
+
real begin # actual begin time (not including the padding)
309
+
real end # actual end time (not including the padding)
310
+
integer channel # Channel number of the speaker (for discourse with multiple speakers)
311
+
real padding # Padding time around the segment (s)
294
312
endform
295
313
314
+
# Load the long sound file
296
315
Open long sound file... 'filename$'
297
316
317
+
# Adjust segment boundaries with padding
298
318
seg_begin = begin - padding
299
319
if seg_begin < 0
300
320
seg_begin = 0
301
321
endif
302
322
303
323
seg_end = end + padding
324
+
duration = Get total duration
304
325
if seg_end > duration
305
326
seg_end = duration
306
327
endif
307
328
329
+
# Extract padded segment
308
330
Extract part... seg_begin seg_end 1
309
331
channel = channel + 1
310
332
Extract one channel... channel
311
333
334
+
# Extract pitch from full padded segment
335
+
# Padding is added specifically for this step because pitch extraction
336
+
# requires a minimum window length, which could be too short for certain
337
+
# segments (e.g. a phone/word segment)
338
+
To Pitch... 0 75 600
339
+
340
+
# Compute the mean F0 only over the **unpadded** segment
A detailed example of using this functionality for voice quality analysis, along with a sample Praat script, is provided in the tutorial. See (:ref:`tutorial_vq`) for more details.
422
+
423
+
Encoding acoustic tracks from CSV
424
+
=================================
425
+
426
+
Sometimes, you may want to use external software to extract specific measurement tracks. For example, `FastTrack`_ is a Praat plugin that can generate formant tracks.
427
+
If you have generated tracks using other software, you can import them into PolyglotDB using the functions :code:`save_track_from_csvs` and :code:`save_track_from_csv` as long as the files
428
+
follow the expected structure.
429
+
430
+
CSV Format::
431
+
432
+
time, measurement1, measurement2, measurement3, ...
433
+
434
+
Additionally, the file name should match the name of the discourse for which the track should be saved.
435
+
436
+
Calling the function :code:`save_track_from_csv` with the file path will save the track. You must also provide a list of the columns that the system should read. It is assumed that all columns are of type float.
437
+
438
+
To load multiple CSV files at once, pass a directory path to :code:`save_track_from_csvs`.
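Before importing, it can help to check that a file follows the expected structure. The following is a minimal standalone sketch using only the standard library; it does not call PolyglotDB. The helper `read_track_csv`, the file name `demo_discourse.csv`, and the column names `f1` and `f2` are hypothetical and chosen for illustration.

```python
import csv
import os
import tempfile


def read_track_csv(path, columns):
    """Read 'time' plus the requested columns from a track CSV as floats.

    Returns (discourse_name, rows). The discourse name is taken from the
    file name, since the file name should match the discourse.
    """
    discourse = os.path.splitext(os.path.basename(path))[0]
    wanted = ["time", *columns]
    rows = []
    with open(path, newline="") as f:
        # skipinitialspace handles headers written as "time, f1, f2".
        reader = csv.DictReader(f, skipinitialspace=True)
        header = reader.fieldnames or []
        missing = [name for name in wanted if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        for record in reader:
            # All columns are assumed to be floats, as stated above.
            rows.append({name: float(record[name]) for name in wanted})
    return discourse, rows


# Write a small demo file (hypothetical values) and read it back.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "demo_discourse.csv")
with open(csv_path, "w", newline="") as f:
    f.write("time, f1, f2\n0.010, 512.3, 1498.7\n0.020, 520.1, 1502.4\n")

discourse, rows = read_track_csv(csv_path, ["f1", "f2"])
print(discourse, len(rows))
```

A file that passes this check has a `time` column and numeric values in every requested column, which is the structure described above.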
439
+
440
+
**Example** (FastTrack output):
441
+
442
+
.. image:: images/fasttrack_csvoutput.png
443
+
:width: 600
444
+
445
+
To load all the measures from the generated tracks:
The generated CSV file will then be ready to open in other programs or in R for data analysis. You can see a `full version of the script`_ and its `expected output`_ when run on the 'LibriSpeech-subset' corpus.