Add csv related functionalities and tutorial 6 #202
| ---------- | ||
| The :code: `analyze_vot` function has a variety of parameters that are important for running the function properly. | ||
| The :code:`analyze_vot` function has a variety of parameters that are important for running the function properly. | ||
| `classifier` is a string which has a paht to an AutoVOT classifier directory. |
typo "paht"
| # Clean up | ||
| select all | ||
| Remove | ||
This looks good. Can you also:
- say in prose somewhere (or in a comment in the Praat script) what the script does ("Computes average F0 over a sound file")
- make this an actual Praat script (.praat), in addition to putting it in the tutorial. Put the script somewhere in the polyglotdb repo and link to it here.
(This is because people are very used to just running Praat scripts, as opposed to copy-pasting into a new Praat script.)
msonderegger
left a comment
Amazing!
| select all | ||
| Remove | ||
This is great. Same comments as above:
- add a high-level description
- add the actual Praat script file somewhere
| - Always use :code:`Open long sound file` to ensure compatibility with the system. | ||
| - The `padding` field allows flexibility by extending the actual start and end times of the segment (default is 0.1s). | ||
| - The `padding` field allows flexibility by extending the actual start and end times of the segment (default is 0). |
Let's add an issue (for future devs + Michael/me) to make this more detailed. Can we give any guidance on when padding is and isn't needed (e.g., yes for pitch, no for power spectrum analysis)?
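As a starting point for that issue, here is a minimal sketch of what padding does to the analysis window. The function `padded_window` and its arguments are hypothetical and are not part of PolyglotDB. It only illustrates extending a segment's start and end by `padding` seconds, and the clamping to the file bounds is my assumption about sensible behaviour.

```python
def padded_window(begin, end, padding=0.0, file_duration=None):
    """Return (start, stop) for a segment extended by `padding` seconds on each side.

    Hypothetical helper for illustration only; not PolyglotDB's implementation.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Extend backwards, but never before the start of the file.
    start = max(0.0, begin - padding)
    stop = end + padding
    # Extend forwards, but never past the end of the file (when it is known).
    if file_duration is not None:
        stop = min(file_duration, stop)
    return start, stop


# With the default padding of 0 the window is the segment itself.
print(padded_window(1.0, 2.0))
# With padding, the script sees extra context on both sides.
print(padded_window(1.0, 2.0, padding=0.5))
# Near a file edge the window is clamped rather than running past it.
print(padded_window(0.25, 1.0, padding=0.5, file_duration=1.25))
```

Guidance in the docs could then be phrased in terms of whether a given measure needs that extra context around the segment edges.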
| c.encode_type_subset('phone', ['S', 'Z', 'SH', 'ZH'], 'sibilant') | ||
| # Uses a praat script that takes as input a filename and begin/end time, and outputs measures we'd like to take for sibilants | ||
| # The analyze_script call then applies this script to every phone of type "sibilant" in the corpus. |
good comment
| Encoding acoustic tracks from CSV | ||
| ================================= | ||
| Sometimes, you may want to use external software to extract specific measurement tracks. For example, `FastTrack`_ is a Praat plugin that can generate formant tracks. |
I'd add a bit more detail here on what "specific measurement tracks" means.
Like: sometimes you may want to use external software to generate measurement tracks. Examples include:
- F0 (pitch) tracks computed by an external library, across entire files
- Voice quality tracks for each vowel, computed using VoiceSauce (give link)
- Vowel formant tracks, e.g. using FastTrack...
| c.save_track_from_csv('formants', '/path/to/csv', ['f1','b1','f2','b2','f3','b3','f1p','f2p','f3p','f0','intensity','harmonicity']) | ||
| # loading multiple csv files | ||
| c.save_track_from_csvs('formants', '/path/to/directory', ['f1','b1','f2','b2','f3','b3','f1p','f2p','f3p','f0','intensity','harmonicity']) | ||
could you make an issue to write (full) example scripts that:
- import dynamic tracks from FastTrack
- aggregate (what you show below)
- then query and do some output
(In the future I think we'll want an "examples" part of the Polyglot documentation, but not now)
| ************************* | ||
| The main objective of this tutorial is to perform `voice quality`_ analysis on the corpus using a Praat script and extract | ||
| spectral measures like H1-H2, H1-A1, H1-A2, and H1-A3. |
This tutorial is good! Let's add an issue to have some users (in the lab + James T) work through it and give comments. Some comments:
- Make clearer in initial setup that this is an example of a custom script where the result is tracks -- as opposed to the sibilants tutorial example, which uses a custom script, but the result is static measures.
- I think it's fine to put a link here to where to read more about voice quality (the slides you found)
| @@ -0,0 +1,67 @@ | |||
| "('61', 'AO2')","[('mean_H1_A1', 5.21375241700153), ('mean_H1_H2', 1.4283178345991416), ('mean_H1_A2', 18.57861883658481), ('mean_H1_A3', 28.8672225446156)]" | |||
I'm a bit surprised at the output here.
Could we get this to look more like other output CSVs, which are rows and columns without extra formatting (no nesting)?
an example would be the output of Tutorial 5: the f0 column is actually the mean F0 over the vowel.
so in the current case, I think we'd want
speaker, vowel, mean_H1_A1, mean_H1_H2, ...
61, AO2, 5.21375241700153, 1.4283178345991416
...
?
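To make the request concrete, here is a rough sketch of one way to flatten the nested row above into plain columns. It assumes each cell is a Python literal, as in the current output, and that the first tuple is (speaker, vowel). The helper `flatten_rows` is hypothetical and is not existing PolyglotDB code.

```python
import ast
import csv
import io

# One row copied from the current output file.
nested = '''"('61', 'AO2')","[('mean_H1_A1', 5.21375241700153), ('mean_H1_H2', 1.4283178345991416), ('mean_H1_A2', 18.57861883658481), ('mean_H1_A3', 28.8672225446156)]"
'''


def flatten_rows(nested_csv_text):
    """Turn rows of (key tuple, list of (name, value) pairs) into flat dicts."""
    rows = []
    for record in csv.reader(io.StringIO(nested_csv_text)):
        if not record:  # skip blank lines
            continue
        key_cell, measures_cell = record
        # Assumption: the key tuple is (speaker, vowel).
        speaker, vowel = ast.literal_eval(key_cell)
        row = {"speaker": speaker, "vowel": vowel}
        # Each (name, value) pair becomes its own column.
        row.update(dict(ast.literal_eval(measures_cell)))
        rows.append(row)
    return rows


rows = flatten_rows(nested)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Ideally the export would write flat columns directly, so a post-processing step like this would not be needed.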
| return type_map | ||
| def import_token_csv_with_timestamp(corpus_context, path, annotated_type, timestamp_column, discourse_column, properties=None): |
could you add documentation of the import CSV with vs. without timestamp cases to readthedocs? I think currently the docs for CSV import assume IDs and don't support timestamps.
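For the docs, a small sketch of the difference between the two cases might help. Everything below is hypothetical illustration in plain Python, not the code in this PR: the token dictionaries, the `id` and `time` column names, and the rule "the timestamp falls inside the token's begin/end in the same discourse" are my assumptions about how the timestamp case could be described.

```python
def match_by_id(tokens, row, id_column="id"):
    """ID case: the CSV row names the token it belongs to directly."""
    for token in tokens:
        if token["id"] == row[id_column]:
            return token
    return None


def match_by_timestamp(tokens, row, timestamp_column="time", discourse_column="discourse"):
    """Timestamp case: find the token in the same discourse whose span contains the timestamp."""
    timestamp = float(row[timestamp_column])
    for token in tokens:
        same_discourse = token["discourse"] == row[discourse_column]
        if same_discourse and token["begin"] <= timestamp <= token["end"]:
            return token
    return None


# Toy tokens standing in for annotations already in the corpus.
tokens = [
    {"id": "t1", "discourse": "file_a", "begin": 0.0, "end": 0.5},
    {"id": "t2", "discourse": "file_a", "begin": 0.5, "end": 1.0},
    {"id": "t3", "discourse": "file_b", "begin": 0.0, "end": 0.5},
]

print(match_by_id(tokens, {"id": "t2", "stress": "1"})["id"])
print(match_by_timestamp(tokens, {"time": "0.25", "discourse": "file_b", "stress": "1"})["id"])
```

The docs should state whatever rule the import function actually applies, including what happens when no token matches.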
Code changes:
Docs changes: