Commit a97551b

Modified track related functionalities
1 parent 24ec5fc commit a97551b

8 files changed (+523, -210 lines)

docs/source/acoustics_encoding.rst

Lines changed: 146 additions & 17 deletions
@@ -231,38 +231,167 @@ a `window_max` of 30 means that it will look up to 30 milliseconds after the end
 Encoding other measures using a Praat script
 ============================================
 
-Other acoustic measures can be encoded by passing a Praat script to :code:`analyze_script`.
+You can encode additional acoustic measures by passing a Praat script to either
+:code:`analyze_script` or :code:`analyze_track_script`. It is essential to follow the exact input and output formats
+described below so that your Praat script is compatible with the system.
 
-The requirements for the Praat script are:
+- :code:`analyze_script`: Designed for single-point measurements. Use it for measurements
+  that occur at exactly one point in time for each annotation of a target annotation type
+  (or of a defined subset of that type) in the hierarchy, such as a predefined set of vowels within all phones.
 
-* exactly one input: the full path to the sound file containing (only) the phone. (Any other parameters can be set manually
-  within your script, and an existing script may need some other modifications in order to work on this type of input)
-* print the resulting acoustic measurements (or other properties) to the Praat Info window in the following format:
+- :code:`analyze_track_script`: Use this for continuous measurements, or whenever measurements are required
+  at multiple time points per annotation. This function allows you to configure your Praat script to
+  output results for multiple time points.
 
-  * The first line should be a space-separated list of column names. These are the names of the properties that will be
-    saved into the database.
-  * The second line should be a space-separated list containing one measurement for each property.
-  * (It is okay if there is some blank space before/after these two lines.)
+analyze_script
+--------------
 
-An example of the Praat output::
+There are two input formats available for designing your Praat script:
+
+Format 1:
+~~~~~~~~~
+This format is sufficient for most use cases and should be your default choice unless runtime efficiency is critical.
+With this format, the system generates temporary sound files, each containing one instance of your chosen annotation type.
+
+**Input Requirements:**
+
+- One required input: the full path to the sound file. This input is filled in automatically by the system. You can define additional attributes as needed.
+
+Example input section for a Praat script using Format 1::
+
+    form Variables
+        sentence filename
+        # add more arguments here
+    endform
+
+    Read from file... 'filename$'
+
+Format 2 (for optimized analysis):
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This format is more efficient, as it reuses the same discourse sound file for all annotations in the same discourse, avoiding the creation of extra files.
+
+**Input Requirements:**
+
+- Five required inputs:
+
+  - the full path to the **long** sound file
+  - `begin` time
+  - `end` time
+  - `channel`
+  - `padding`
+
+Do not assign values to these five fields; the system populates them during processing. You may include additional
+attributes beyond these five, but make sure their values are passed as an array via the API.
+
+Example Praat script for Format 2::
+
+    form Variables
+        sentence filename
+        real begin
+        real end
+        integer channel
+        real padding
+        # add more arguments here
+    endform
+
+    Open long sound file... 'filename$'
+    # total duration of the long sound file, used to clamp seg_end below
+    duration = Get total duration
+
+    seg_begin = begin - padding
+    if seg_begin < 0
+        seg_begin = 0
+    endif
+
+    seg_end = end + padding
+    if seg_end > duration
+        seg_end = duration
+    endif
+
+    Extract part... seg_begin seg_end 1
+    channel = channel + 1
+    Extract one channel... channel
+
+**Key Notes:**
+
+- Always use :code:`Open long sound file` to ensure compatibility with the system.
+- The `padding` field adds flexibility by extending the actual start and end times of the segment (the default is 0.1 s).
+- Channel indexing starts at 0 in the system, so increment it by 1 for use in Praat (Praat uses 1-based indexing).
+
+**Output Requirements:**
+
+- Print results to the Praat Info window in this format:
+
+  - The first line contains space-separated column names (the property names to be saved in the database).
+  - The second line contains space-separated measurements, one for each property.
+
+An example of the Praat output::
 
     peak slope cog spread
     5540.7376 24.3507 6744.0670 1562.1936
 
-Output format if you are only taking one measure::
+Output format if you are only taking one measure::
 
     cog
     6013.9
 
-To run :code:`analyze_script`, do the following:
+To run :code:`analyze_script`, follow these steps:
+
+1. (Optional) Encode a subset of the annotation type you want to analyze.
+2. Call :code:`analyze_script` with the annotation type, the subset name, and the path to your script.
+
+.. code-block:: python
+
+    with CorpusContext(config) as c:
+        c.encode_type_subset('phone', ['S', 'Z', 'SH', 'ZH'], 'sibilant')
+        c.analyze_script(subset='sibilant', annotation_type='phone', script_path='path/to/script/sibilant.praat')
+
+
+analyze_track_script
+--------------------
+
+This function shares the same input formats and overall workflow as :code:`analyze_script`; however,
+:code:`analyze_track_script` is specifically designed for continuous measurements.
+Before using this functionality, you must first encode utterances. When calling the API, you
+specify the annotation type (e.g., phone, syllable, or word) on which to perform the analysis.
+The script is then run separately, using multiprocessing, for each instance of the selected annotation type.
 
-1. encode a phone class for the subset of phones you would like to analyze
-2. call :code:`analyze_script` on that phone class, with the path to your script
+**Output Requirements:**
 
-For example, to run a script which takes measures for sibilants:
+- Print results to the Praat Info window in the following format:
+
+  - The first line begins with ``time``, followed by space-separated column names.
+  - Each subsequent line contains a timestamp followed by one measurement per property.
+
+Example output::
+
+    time f1 f2 f3 f4
+    0.000 502 1497 2502 3498
+    0.050 518 1483 2475 3452
+    0.100 537 1471 2462 3441
 
 .. code-block:: python
 
     with CorpusContext(config) as c:
-        c.encode_class(['S', 'Z', 'SH', 'ZH'], 'sibilant')
-        c.analyze_script('sibilant', 'path/to/script/sibilant.praat')
+        script_path = 'voice_quality.praat'
+        c.config.praat_path = '/path/to/your/praat/executable'
+        props = [('H1_H2', float), ('H1_A1', float), ('H1_A2', float), ('H1_A3', float)]
+        c.analyze_track_script('voice_quality', props, script_path, annotation_type='phone')
+
+
+Encoding acoustic track statistics
+==================================
+
+After encoding an acoustic track measurement, either through the built-in algorithms or a custom Praat script,
+you can perform statistical aggregation over these data tracks. The supported statistical measures are mean, median,
+standard deviation (stddev), sum, mode, and count.
+Aggregation is performed over a specified annotation type, such as phones, words, or syllables
+(if syllable encoding is available), and is conducted over all annotations sharing the same label.
+Aggregation can also be performed by speaker, in which case the results are grouped by speaker
+and the statistic is computed for each (annotation_label, speaker) pair.
+Once encoded, the computed statistics are stored and can be queried later.
+
+.. code-block:: python
+
+    with CorpusContext(config) as c:
+        # Encode a statistic for an acoustic measure
+        c.encode_acoustic_statistic('formants', 'mean', by_annotation='phone', by_speaker=True)
+
+        # Alternatively, call the get function directly; it encodes the statistic if it is not already available
+        results = c.get_acoustic_statistic('formants', 'mean', by_annotation='phone', by_speaker=True)
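
For orientation, the pieces documented above combine as follows. This is a minimal sketch, not part of the commit: the measure name ``cog``, the extra script argument, and the file paths are hypothetical, and the query follows the patterns used in ``queries_annotations.rst``.

.. code-block:: python

    from polyglotdb import CorpusContext

    # config is an existing CorpusConfig pointing at the corpus
    with CorpusContext(config) as c:
        # Encode the subset and run a Format 1 script on it; any extra
        # form arguments beyond the required ones are passed as an array.
        c.encode_type_subset('phone', ['S', 'Z', 'SH', 'ZH'], 'sibilant')
        c.analyze_script(subset='sibilant', annotation_type='phone',
                         script_path='path/to/script/sibilant.praat',
                         arguments=[21])  # hypothetical extra argument

        # The printed column names (e.g. 'cog') become queryable properties.
        q = c.query_graph(c.phone)
        q = q.filter(c.phone.label.in_(['S', 'Z', 'SH', 'ZH']))
        q = q.columns(c.phone.label.column_name('phone'),
                      c.phone.cog.column_name('cog'))
        print(q.all())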

docs/source/developer_influxdb_implementation.rst

Lines changed: 3 additions & 3 deletions
@@ -42,16 +42,16 @@ along with the ``time`` in seconds will always give a unique acoustic time point
 
 
 In addition to these tags, there are several queryable fields which are always present in addition to the measurement fields.
-First, the ``phone`` for the time point is saved to allow for efficient aggregation across phones. Second, the ``utterance_id``
-for the time point is also saved. The ``utterance_id`` is used for general querying, where each utterance's track for the
+First, the ``phone``, ``word``, and ``syllable`` (if syllable encoding has been performed for the corpus) for the time point are saved to allow for efficient aggregation across annotations.
+Second, the ``utterance_id`` for the time point is also saved. The ``utterance_id`` is used for general querying, where each utterance's track for the
 requested acoustic property is queried once and then cached for any further results to use without needing to query the
 InfluxDB again. For instance, a query on phone formant tracks might return 2000 phones. Without the ``utterance_id``, there
 would be 2000 look-ups for formant tracks (each InfluxDB query would take about 0.15 seconds), but using the utterance-based caching,
 the number of hits to the InfluxDB database would be a fraction of that (though the queries themselves would take a little longer).
 
 .. note::
 
-    For performance reasons internal to InfluxDB, ``phone`` and ``utterance_id`` are ``fields`` rather than ``tags``, because
+    For performance reasons internal to InfluxDB, ``phone``, ``syllable``, ``word``, and ``utterance_id`` are ``fields`` rather than ``tags``, because
     the cross of them with ``speaker``, ``discourse``, and ``channel`` would lead to an extremely large cross of possible tag
     combinations. This mix of tags and fields has been found to be the most performant.
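
To make the caching behaviour concrete, below is a minimal sketch, not part of the commit, of the kind of track query this supports. It assumes formants have already been encoded for the corpus; the phone label ``AA`` is hypothetical.

.. code-block:: python

    with CorpusContext(config) as c:
        # One InfluxDB hit per utterance rather than per phone, thanks to
        # the utterance_id-based caching described above.
        q = c.query_graph(c.phone)
        q = q.filter(c.phone.label == 'AA')
        q = q.columns(c.phone.label.column_name('phone'),
                      c.phone.formants.track)
        results = q.all()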

docs/source/queries_annotations.rst

Lines changed: 6 additions & 6 deletions
@@ -179,7 +179,7 @@ contains.
 
     with CorpusContext(config) as c:
         q = c.query_graph(c.word)
-        q = q.columns(c.word.phone.label.column('phones'))
+        q = q.columns(c.word.phone.label.column_name('phones'))
         results = q.all()
         print(results)
 
@@ -210,8 +210,8 @@ The keyword ``count`` will return the number of elements.
 
     with CorpusContext(config) as c:
         q = c.query_graph(c.word)
-        q = q.columns(c.word.phone.rate.column('phones_per_second'))
-        q = q.columns(c.word.phone.count.column('num_phones'))
+        q = q.columns(c.word.phone.rate.column_name('phones_per_second'))
+        q = q.columns(c.word.phone.count.column_name('num_phones'))
         results = q.all()
         print(results)
 
@@ -221,9 +221,9 @@ These keywords can also leverage subsets, as above:
 
     with CorpusContext(config) as c:
         q = c.query_graph(c.word)
-        q = q.columns(c.word.phone.rate.column('phones_per_second'))
-        q = q.columns(c.word.phone.filter_by_subset('+syllabic').count.column('num_syllabic_phones'))
-        q = q.columns(c.word.phone.count.column('num_phones'))
+        q = q.columns(c.word.phone.rate.column_name('phones_per_second'))
+        q = q.columns(c.word.phone.filter_by_subset('+syllabic').count.column_name('num_syllabic_phones'))
+        q = q.columns(c.word.phone.count.column_name('num_phones'))
         results = q.all()
         print(results)
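
As a usage note, below is a minimal sketch, not part of the commit, of how the aliases passed to ``column_name`` surface in query results. It assumes result rows can be indexed by their alias, as suggested by the ``print(results)`` examples above.

.. code-block:: python

    with CorpusContext(config) as c:
        q = c.query_graph(c.word)
        q = q.columns(c.word.label.column_name('word'),
                      c.word.phone.rate.column_name('phones_per_second'))
        for row in q.all():
            # each row exposes the aliased columns by name
            print(row['word'], row['phones_per_second'])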

polyglotdb/acoustics/other.py

Lines changed: 26 additions & 18 deletions
@@ -34,9 +34,8 @@ def generate_praat_script_function(praat_path, script_path, arguments=None):
 
 
 def analyze_script(corpus_context,
-                   phone_class=None,
+                   annotation_type='phone',
                    subset=None,
-                   annotation_type=None,
                    script_path=None,
                    duration_threshold=0.01,
                    arguments=None,
@@ -58,8 +57,6 @@ def analyze_script(corpus_context,
     ----------
     corpus_context : :class:`~polyglot.corpus.context.CorpusContext`
         corpus context to use
-    phone_class : str
-        DEPRECATED, the name of an already encoded subset of phones on which the analysis will be run
     subset : str, optional
         the name of an already encoded subset of an annotation type, on which the analysis will be run
     annotation_type : str
@@ -81,12 +78,13 @@ def analyze_script(corpus_context,
     """
     if file_type not in ['consonant', 'vowel', 'low_freq']:
         raise ValueError('File type must be one of: consonant, vowel, or low_freq')
-
-    if phone_class is not None:
-        raise DeprecationWarning("The phone_class parameter has now been deprecated, please use annotation_type='phone' and subset='{}'".format(phone_class))
-        annotation_type = corpus_context.phone_name
-        subset = phone_class
-
+
+    if annotation_type not in corpus_context.hierarchy.annotation_types:
+        raise ValueError('Annotation type does not exist')
+
+    if script_path is None:
+        raise ValueError('Please specify a script path')
+
     if call_back is not None:
         call_back('Analyzing {}...'.format(annotation_type))
     time_section = time.time()
@@ -111,25 +109,35 @@ def analyze_script(corpus_context,
 def analyze_track_script(corpus_context,
                          acoustic_name,
                          properties,
-                         script_path,
+                         script_path=None,
+                         subset=None,
+                         annotation_type='phone',
                          duration_threshold=0.01,
-                         phone_class=None,
                          arguments=None,
                          call_back=None,
                          file_type='consonant',
                          stop_check=None, multiprocessing=True):
+
     if file_type not in ['consonant', 'vowel', 'low_freq']:
         raise ValueError('File type must be one of: consonant, vowel, or low_freq')
+
+    if annotation_type not in corpus_context.hierarchy.annotation_types:
+        raise ValueError('Annotation type does not exist')
+
+    if script_path is None:
+        raise ValueError('Please specify a script path')
+
     if acoustic_name not in corpus_context.hierarchy.acoustics:
         corpus_context.hierarchy.add_acoustic_properties(corpus_context, acoustic_name, properties)
         corpus_context.encode_hierarchy()
+    else:
+        raise ValueError('Acoustic measure already exists')
+
     if call_back is not None:
-        call_back('Analyzing phones...')
-    if phone_class is None:
-        segment_mapping = generate_utterance_segments(corpus_context, padding=PADDING)
-    else:
-        segment_mapping = generate_segments(corpus_context, corpus_context.phone_name, phone_class, file_type=file_type,
-                                            padding=PADDING, duration_threshold=duration_threshold)
+        call_back('Analyzing track...')
+
+    segment_mapping = generate_segments(corpus_context, annotation_type, subset, file_type=file_type,
+                                        padding=PADDING, duration_threshold=duration_threshold)
 
     segment_mapping = segment_mapping.grouped_mapping('speaker')
     praat_path = corpus_context.config.praat_path
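
For callers, the net effect of this change is a small migration, sketched below; this is not part of the diff, and the paths, subset names, and property names are placeholders.

.. code-block:: python

    with CorpusContext(config) as c:
        # Before this commit (deprecated phone_class parameter):
        # c.analyze_script('sibilant', script_path='path/to/script/sibilant.praat')

        # After this commit: name the annotation type and subset explicitly.
        c.analyze_script(annotation_type='phone', subset='sibilant',
                         script_path='path/to/script/sibilant.praat')

        # analyze_track_script now segments by any annotation type rather
        # than by utterance or phone class, and requires a script path.
        props = [('H1_H2', float)]
        c.analyze_track_script('voice_quality', props,
                               script_path='voice_quality.praat',
                               annotation_type='phone', subset=None)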
