
Conversation

@lxy2304 (Collaborator) commented Feb 28, 2025

Code changes:

  • Added two functions, save_track_from_csv and save_track_from_csvs, and updated enrich_tokens_with_csv with a timestamp option.
  • Added unit tests for the new functions
  • Minor fixes

Docs changes:

  • Added FastTrack example for save_track_from_csvs
  • Addressed comments from the last PR, adding example Praat scripts showing analyze_script's two input formats
  • Added Tutorial 6, the voice quality Praat script, and expected outputs.

@lxy2304 requested a review from msonderegger on February 28, 2025, 11:47
----------
The :code: `analyze_vot` function has a variety of parameters that are important for running the function properly.
The :code:`analyze_vot` function has a variety of parameters that are important for running the function properly.
`classifier` is a string which has a paht to an AutoVOT classifier directory.

Review comment (Member):
typo "paht"
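
For context, a minimal usage sketch of the call this documentation describes. Only `classifier` is confirmed by the quoted text; the corpus name and any other analyze_vot parameters here are illustrative assumptions.

from polyglotdb import CorpusContext

# Sketch only: per the quoted docs, `classifier` is a path to an AutoVOT
# classifier directory. Any other analyze_vot parameters (e.g. which stop
# subset to analyze) are not shown above and are omitted here.
with CorpusContext('my_corpus') as c:
    c.analyze_vot(classifier='/path/to/autovot_classifier_dir')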

# Clean up
select all
Remove

Review comment (Member):
This looks good. Can you also:

  • Say in prose somewhere (or a comment in the Praat script) what the script does ("Computes average F0 over a sound file")
  • Make this an actual Praat script (.praat), in addition to putting it in the tutorial. Put the script somewhere in the polyglotdb repo. Put a link here to the script.
    (this is because people are very used to just running Praat scripts, as opposed to copy-pasting into a new Praat script)

Review from @msonderegger (Member):

Amazing!

select all
Remove


Review comment (Member):

This is great. Same comments as above:

  • add a high-level description
  • add the actual Praat script file somewhere


- Always use :code:`Open long sound file` to ensure compatibility with the system.
- The `padding` field allows flexibility by extending the actual start and end times of the segment (default is 0.1s).
- The `padding` field allows flexibility by extending the actual start and end times of the segment (default is 0).

Review comment (Member):

Let's add an issue (for future devs + Michael/me) to make this more detailed. Can we give any guidance on when padding is and isn't needed (e.g., yes for pitch, no for power spectrum analysis)?

c.encode_type_subset('phone', ['S', 'Z', 'SH', 'ZH'], 'sibilant')

# Uses a praat script that takes as input a filename and begin/end time, and outputs measures we'd like to take for sibilants
# The analyze_script call then applies this script to every phone of type "sibilant" in the corpus.

Review comment (Member):

good comment
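
To make the quoted code comment concrete, a hedged sketch of how the analyze_script step it describes might be called. The parameter names (subset, annotation_type, script_path) are assumptions based on the comment, not a confirmed signature; check the analyze_script documentation for the exact form in your polyglotdb version.

from polyglotdb import CorpusContext

with CorpusContext('my_corpus') as c:
    # Mark S/Z/SH/ZH phones as the "sibilant" subset (as in the quoted diff)
    c.encode_type_subset('phone', ['S', 'Z', 'SH', 'ZH'], 'sibilant')

    # Hypothetical call: run a Praat script that takes a filename plus
    # begin/end times and outputs sibilant measures, over every phone in
    # the "sibilant" subset. Argument names are illustrative.
    c.analyze_script(subset='sibilant',
                     annotation_type='phone',
                     script_path='/path/to/sibilant.praat')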

Encoding acoustic tracks from CSV
=================================

Sometimes, you may want to use external software to extract specific measurement tracks. For example, `FastTrack`_ is a Praat plugin that can generate formant tracks.

Review comment (Member):

I'd add a bit more detail here on what "specific measurement tracks" means.

Like: sometimes you may want to use external software to generate measurement tracks. Examples include:

  • F0 (pitch) tracks computed by an external library, across entire files
  • Voice quality tracks for each vowel, computed using VoiceSauce (give link)
  • Vowel formant tracks, e.g. using FastTrack...

c.save_track_from_csv('formants', '/path/to/csv', ['f1','b1','f2','b2','f3','b3','f1p','f2p','f3p','f0','intensity','harmonicity'])
# loading multiple csv files
c.save_track_from_csvs('formants', '/path/to/directory', ['f1','b1','f2','b2','f3','b3','f1p','f2p','f3p','f0','intensity','harmonicity'])

Review comment (Member):

could you make an issue to write (full) example scripts that:

  • import dynamic tracks from FastTrack
  • aggregate (what you show below)
  • then query and do some output

(In the future I think we'll want an "examples" part of the Polyglot documentation, but not now)
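
As a starting point for that issue, a rough end-to-end sketch under stated assumptions: the 'vowel' subset is already encoded, the measure saved as 'formants' is reachable in queries as c.phone.formants with a .track attribute (mirroring built-in tracks such as pitch, but not confirmed here), and all paths and column names are placeholders.

from polyglotdb import CorpusContext

with CorpusContext('my_corpus') as c:
    # 1. Import dynamic formant tracks produced by FastTrack (one CSV per file)
    c.save_track_from_csvs('formants', '/path/to/fasttrack/output',
                           ['f1', 'b1', 'f2', 'b2', 'f3', 'b3'])

    # 2. Query vowel tokens and pull the imported track alongside context.
    #    The .formants.track attribute is an assumption; verify against the
    #    acoustic-track query docs.
    q = c.query_graph(c.phone).filter(c.phone.subset == 'vowel')
    q = q.columns(c.phone.speaker.name.column_name('speaker'),
                  c.phone.label.column_name('vowel'),
                  c.phone.begin, c.phone.end,
                  c.phone.formants.track)

    # 3. Export for aggregation and analysis elsewhere
    q.to_csv('/path/to/formant_tracks.csv')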

*************************

The main objective of this tutorial is to perform `voice quality`_ analysis on the corpus using a Praat script and extract
spectral measures like H1-H2, H1-A1, H1-A2, and H1-A3.

Review comment (Member):

This tutorial is good! Let's add an issue to have some users (in the lab + James T) work through it and give comments. Some comments:

  • Make clearer in initial setup that this is an example of a custom script where the result is tracks -- as opposed to the sibilants tutorial example, which uses a custom script, but the result is static measures.
  • I think it's fine to put a link here to where to read more about voice quality (the slides you found)

@@ -0,0 +1,67 @@
"('61', 'AO2')","[('mean_H1_A1', 5.21375241700153), ('mean_H1_H2', 1.4283178345991416), ('mean_H1_A2', 18.57861883658481), ('mean_H1_A3', 28.8672225446156)]"

Review comment (Member):

I'm a bit surprised at the output here.

Could we get this to look more like other output CSVs, which are rows and columns without extra formatting (no nesting)?

An example would be the output of Tutorial 5: the f0 column is actually the mean F0 over the vowel.

So in the current case, I think we'd want

speaker, vowel, mean_H1_A1, mean_H1_H2, ...
61, AO2, 5.21375241700153, 1.4283178345991416
...

?
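
If reshaping the output on the Praat side is awkward, a post-processing sketch in plain Python could flatten the current nested format into the tidy layout suggested above. This assumes every row of the current CSV has exactly two fields (a key tuple string and a list-of-pairs string, as in the quoted example); file names are placeholders.

import ast
import csv

# Flatten rows like:
#   "('61', 'AO2')","[('mean_H1_A1', 5.21...), ('mean_H1_H2', 1.42...), ...]"
# into: speaker, vowel, mean_H1_A1, mean_H1_H2, ...
rows = []
with open('voice_quality_output.csv', newline='') as f:
    for key_str, measures_str in csv.reader(f):
        speaker, vowel = ast.literal_eval(key_str)
        measures = dict(ast.literal_eval(measures_str))
        rows.append({'speaker': speaker, 'vowel': vowel, **measures})

fieldnames = ['speaker', 'vowel'] + sorted(rows[0].keys() - {'speaker', 'vowel'})
with open('voice_quality_tidy.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)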


return type_map

def import_token_csv_with_timestamp(corpus_context, path, annotated_type, timestamp_column, discourse_column, properties=None):

Review comment (Member):

could you add documentation of the import CSV with vs. without timestamp cases to readthedocs? I think currently the docs for CSV import assume IDs and don't support timestamps.
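
For those docs, a hedged example call based on the signature visible in the quoted diff. The CSV layout, column names, and property list below are hypothetical, the importing module is a guess, and whether users should instead go through enrich_tokens_with_csv's new timestamp option should be confirmed against the PR.

from polyglotdb import CorpusContext
# NOTE: the defining module is not shown in the quoted diff; adjust this
# import to wherever the PR actually places the function.
from polyglotdb.io.enrichment import import_token_csv_with_timestamp

with CorpusContext('my_corpus') as c:
    # Hypothetical CSV (column names are placeholders):
    #   time,discourse,tone
    #   1.234,speaker1_recording,H
    # Tokens are matched by a timestamp that falls within the token in the
    # named discourse, rather than by token ID.
    import_token_csv_with_timestamp(
        c,
        path='/path/to/annotations.csv',
        annotated_type='phone',
        timestamp_column='time',
        discourse_column='discourse',
        properties=['tone'],  # assumption: extra columns to attach as token properties
    )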

@msonderegger merged commit f886c82 into main on Mar 18, 2025 (5 checks passed).