
Commit 9a2da8c

Merge pull request #232 from MontrealCorpusTools/docs-updates
Installation and tutorial updates
2 parents 7d9ede5 + 664a2bc commit 9a2da8c

11 files changed

+124
-100
lines changed

docs/source/getting_started.rst

Lines changed: 85 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
.. _ISCAN server: https://github.com/MontrealCorpusTools/ISCAN
1+
.. _ISCAN documentation: https://iscan.readthedocs.io/en/latest/
22

3-
.. _installation:
3+
.. _ISCAN: https://github.com/MontrealCorpusTools/ISCAN
44

55
.. _Conda Installation: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
66

@@ -10,20 +10,21 @@
1010

1111
.. _Docker: https://docs.docker.com/get-started/get-docker/
1212

13+
.. _installation:
14+
1315
***************
1416
Getting started
1517
***************
1618

17-
PolyglotDB is the Python API for interacting with Polyglot databases and is installed through ``pip``. There are other
18-
dependencies that must be installed prior to using a Polyglot database, depending on the user's platform.
19+
PolyglotDB is the Python API for interacting with Polyglot databases and is installed through ``conda-forge`` or ``pip``.
1920

2021
.. note::
2122

22-
Another way to use Polyglot functionality is through setting up an `ISCAN server`_.
23+
`ISCAN`_ is a separate project built on top of PolyglotDB that provides a web-based interface for corpus management and analysis.
2324
An Integrated Speech Corpus Analysis (ISCAN) server can be set up on a lab's central server, or you can run it on your
2425
local computer as well (though many
25-
of PolyglotDB's algorithms benefit from having more processors and memory available). Please see the ISCAN
26-
documentation for more information on setting it up (http://iscan.readthedocs.io/en/latest/getting_started.html).
26+
of PolyglotDB's algorithms benefit from having more processors and memory available). Please see the `ISCAN
27+
documentation`_ for more information on setting it up.
2728
The main benefits of ISCAN are support for multiple Polyglot databases (separating out different corpora and allowing any
2829
of them to be started or shut down), graphical interfaces for inspecting data, and a user authentication system with different levels
2930
of permission for remote access through a web application.
@@ -40,66 +41,81 @@ If you don't have conda installed on your device:
4041
#. Install either Anaconda, Miniconda, or Miniforge (`Conda Installation`_)
4142
#. Make sure your conda is up to date :code:`conda update conda`
4243

43-
.. warning::
44+
.. _add_conda_to_path:
45+
46+
.. Note::
4447

45-
On Windows, you must use the Anaconda Prompt or Miniforge Prompt to effectively manage and execute conda commands.
46-
This is crucial to avoid potential issues specific to the Windows environment and to ensure that all functionalities work as intended.
48+
On Windows, it is recommended to use the Anaconda Prompt or Miniforge Prompt to manage and execute conda commands effectively.
49+
This is because, by default, installing Anaconda or Miniforge does not add the conda command to your system's PATH environment variable.
50+
However, if you prefer to use the regular Windows Command Prompt or run Python scripts directly from your IDE, you will need to manually add the necessary directories to your PATH.
51+
To do so, follow these steps:
4752

53+
#. Open the Start Menu and search for ``Environment Variables``.
54+
#. Click on ``Edit the system environment variables``.
55+
#. In the System Properties window, click on the ``Environment Variables`` button.
56+
#. In the Environment Variables window, find the ``Path`` variable in the ``User variables`` or ``System variables`` section and select it.
57+
#. Click ``Edit``, then ``New``, and add the following two paths (adjust to your installation):
4858

49-
Quick Installation via conda-forge (Recommended):
59+
#. ``C:\Users\YourUsername\Anaconda3``
60+
#. ``C:\Users\YourUsername\Anaconda3\Scripts``
61+
62+
After completing these steps, you should be able to use conda in the Windows Command Prompt and configure your IDE accordingly.
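To confirm that the PATH change took effect, a small check like the following can be run from the regular Command Prompt or from your IDE. This is only a minimal sketch: the helper name ``find_executable`` is hypothetical, and it relies solely on Python's standard ``shutil.which``.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    conda_path = find_executable("conda")
    if conda_path is None:
        # conda is not visible to this shell; re-check the PATH entries added above
        print("conda was not found on PATH")
    else:
        print(f"conda found at: {conda_path}")
```

If the script reports that conda was not found, open a new terminal first, since PATH changes only apply to newly started shells.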
63+
64+
**Quick Installation via conda-forge (Recommended)**:
5065

5166
#. You can install PolyglotDB using a single Conda command :code:`conda create -n polyglotdb -c conda-forge polyglotdb python=3.12`
5267
#. Activate conda environment :code:`conda activate polyglotdb`
5368
#. You then have the ``pgdb`` utility that can be run inside your conda environment and manages a local database.
5469

55-
To install from source (primarily for development):
70+
**To install from source (primarily for development)**:
5671

5772
#. Clone or download the Git repository (https://github.com/MontrealCorpusTools/PolyglotDB).
5873
#. Navigate to the directory via command line and create the conda environment via :code:`conda env create -f environment.yml`
5974
#. Activate conda environment :code:`conda activate polyglotdb-dev`
6075
#. Install PolyglotDB via :code:`pip install -e .`, which will install the ``pgdb`` utility that can be run inside your conda environment
6176
and manages a local database.
6277

78+
**Using the Conda Environment in your IDE's integrated terminal**: (VSCode example)
79+
80+
If you are using an IDE, you may encounter issues where the IDE's default Python interpreter is different from the one set up in your Conda environment.
81+
This can lead to errors such as missing packages, even if you've installed everything correctly in Conda.
82+
In such cases, you need to manually set the Python interpreter in your IDE to point to the one used by your Conda environment.
83+
If you are on Windows, make sure you have completed :ref:`this step<add_conda_to_path>` so that the Conda environment is accessible from your IDE's terminal.
84+
For Visual Studio Code, follow these steps (a similar process applies to most other IDEs):
85+
86+
#. Make sure you have the Python extension installed in VSCode.
87+
#. Open VSCode and open the Command Palette (``Ctrl+Shift+P`` on Windows or ``Cmd+Shift+P`` on Mac), then choose ``Python: Select Interpreter``.
88+
#. Select the interpreter corresponding to your Conda environment (e.g., ``conda-env:polyglotdb``).
89+
#. Open a new terminal in VSCode. If the environment is not activated automatically, run :code:`conda activate polyglotdb`
90+
91+
Now, you can run PolyglotDB commands and scripts directly within VSCode's integrated terminal.
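If you are unsure which interpreter the IDE is actually using, a short script such as the one below may help. It is a sketch rather than part of PolyglotDB: ``describe_interpreter`` is a hypothetical helper, and it assumes that ``conda activate`` has set the ``CONDA_DEFAULT_ENV`` environment variable, which is the usual conda behaviour.

```python
import os
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this script."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "prefix": sys.prefix,  # root directory of the active environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if no conda env is active
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

When the IDE is configured correctly, the reported environment name should match the one you created (``polyglotdb`` in the quick installation above).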
92+
6393
.. _local_setup:
6494

6595
Set up local database
6696
---------------------
6797

6898
Installing the PolyglotDB package also installs a utility script (``pgdb``) that is then callable from the command line inside your conda environment.
6999
The ``pgdb`` command allows for the administration of a single Polyglot database (install/start/stop/uninstall).
70-
Using ``pgdb`` requires that several prerequisites be installed first, and the remainder of this section will detail how
71-
to install these on various platforms.
72-
Please be aware that using the ``pgdb`` utility to set up a database is not recommended for larger groups or those needing
73-
remote access.
74-
See the `ISCAN server`_ for a more fully featured solution.
75-
76-
Mac & Linux
77-
```````````
78-
#. Make sure you are inside the dedicated conda environment just created. If not, activate it via :code:`conda activate polyglotdb`
79-
#. Inside your conda environment, run :code:`pgdb install /path/to/where/you/want/data/to/be/stored`, or
80-
:code:`pgdb install` to save data in the default directory.
81-
82-
.. warning::
83-
84-
Do not use ``sudo`` with this command on Macs, as it will lead to permissions issues later on.
100+
``pgdb install`` is a separate step that installs the actual local database backend, including Neo4j and InfluxDB. This is necessary to run PolyglotDB locally.
85101

86-
Once you have installed PolyglotDB, to start it run :code:`pgdb start`.
87-
Likewise, you can close PolyglotDB by running :code:`pgdb stop`.
102+
Installing the local database
103+
`````````````````````````````
88104

89-
To uninstall, run :code:`pgdb uninstall`
90-
91-
Windows
92-
```````
93-
94-
#. Make sure you are running as an Administrator (right-click on Anaconda Prompt/Miniforge Prompt and select "Run as administrator"), as Neo4j will be installed as a Windows service.
95-
#. If you had to reopen a command prompt in Step 1, reactivate your conda environment via: :code:`conda activate polyglotdb`.
105+
#. Make sure you are inside the dedicated conda environment just created. If not, activate it via :code:`conda activate polyglotdb`
96106
#. Inside your conda environment, run :code:`pgdb install /path/to/where/you/want/data/to/be/stored`, or
97107
:code:`pgdb install` to save data in the default directory.
98108

99-
To start/stop the database, you likewise have to use an administrator command prompt before entering the commands :code:`pgdb start`
100-
or :code:`pgdb stop`.
109+
.. Warning::
110+
#. On Windows, make sure you are running as an Administrator (right-click on Anaconda Prompt/Miniforge Prompt/Command Prompt/your IDE and select "Run as administrator"), as Neo4j will be installed as a Windows service.
111+
#. Do not use ``sudo`` with ``pgdb install`` on Macs, as it will lead to permissions issues later on.
112+
113+
Managing the local database
114+
```````````````````````````
101115

102-
To uninstall, run :code:`pgdb uninstall` (also requires an administrator command prompt).
116+
To start the database, run :code:`pgdb start`.
117+
To stop the database, run :code:`pgdb stop`.
118+
To uninstall the database, run :code:`pgdb uninstall`.
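If you would rather drive these commands from a Python script than type them by hand, a thin wrapper along the following lines is one option. This is a sketch under assumptions: ``pgdb_command`` and ``run_pgdb`` are hypothetical helpers, not part of PolyglotDB, and they assume that ``pgdb`` is on PATH in the active conda environment.

```python
import subprocess

# The sub-commands named in this section
PGDB_ACTIONS = ("install", "start", "stop", "uninstall")


def pgdb_command(action, data_dir=None):
    """Build the argument list for a pgdb call without running it."""
    if action not in PGDB_ACTIONS:
        raise ValueError(f"unknown pgdb action: {action!r}")
    command = ["pgdb", action]
    if data_dir is not None:
        # Only `pgdb install` takes an optional data directory
        if action != "install":
            raise ValueError("data_dir is only valid with the 'install' action")
        command.append(data_dir)
    return command


def run_pgdb(action, data_dir=None):
    """Run pgdb and raise CalledProcessError if it exits with an error."""
    return subprocess.run(pgdb_command(action, data_dir), check=True)
```

For example, ``run_pgdb("start")`` before an analysis and ``run_pgdb("stop")`` afterwards mirror the manual commands; on Windows the script still needs to be launched from an administrator prompt.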
103119

104120

105121
To view your conda environments:
@@ -122,7 +138,7 @@ Steps to use PolyglotDB
122138
Now that you have set up the PolyglotDB environment and installed local databases,
123139
follow these steps each time you use PolyglotDB:
124140

125-
#. Navigate to your working directory, either in your IDE or via the command line. (On Windows, use Anaconda Prompt/Miniforge Prompt.)
141+
#. Navigate to your working directory, either in your IDE or via the command line.
126142
#. Activate the conda environment: :code:`conda activate polyglotdb`.
127143
#. Start the local databases: :code:`pgdb start`.
128144
#. Write your Python scripts inside this working directory.
@@ -132,22 +148,21 @@ follow these steps each time you use PolyglotDB:
132148

133149
.. _docker_install:
134150

135-
Docker Environment
136-
===================
151+
Alternative Installation (Using Docker Environment)
152+
===================================================
137153

138-
Running PolyglotDB in a `Docker`_ container is a great way to maintain a consistent environment, isolate dependencies, and streamline your setup process. This section will guide you through setting up and using PolyglotDB within Docker.
154+
Running PolyglotDB in a `Docker`_ container is a great way to maintain a consistent environment, isolate dependencies, and streamline your setup process.
155+
This section will guide you through setting up and using PolyglotDB within Docker. Note that this method is an alternative to the default installation with a conda environment.
139156

140157
Prerequisites
141158
-------------
142159

143-
Before starting, ensure that Docker is installed on your system. You can check if Docker is installed and verify its version by running the following command in your terminal:
160+
Before starting, ensure that Docker is installed on your system. You can check if Docker is installed by running the following command in your terminal:
144161

145162
.. code:: bash
146163
147164
docker version
148165
149-
Make sure your Docker Engine version is **19.03.0** or higher.
150-
151166
Setting Up the Docker Container
152167
-------------------------------
153168

@@ -166,6 +181,7 @@ Follow these steps to get your Docker container up and running:
166181
**Note for Mac Users:**
167182
If you're using a Mac, you might need to run :code:`docker compose run polyglotdb`
168183

184+
Running ``docker compose run`` automatically starts the database server, so no extra steps are needed to set up the databases.
169185
This command launches an interactive shell inside the `polyglotdb` container, allowing you to execute PolyglotDB scripts directly.
170186

171187
3. **Working with the Default Folder Structure:**
@@ -211,6 +227,28 @@ Follow these steps to get your Docker container up and running:
211227
However, if you want to preserve your scripts after shutting down the container,
212228
ensure you save them in the directory mounted to your device (default: ``/polyglotdb``).
213229

230+
- **Note when writing your scripts**:
231+
232+
#. It is important to **avoid** using absolute paths in your scripts when working with Docker.
233+
This is because the Docker container has its own internal filesystem, so absolute paths from your host machine
234+
(e.g., ``/home/user/documents/my_corpus``) will not be valid inside the container.
235+
Instead, always use relative paths based on the current working directory inside the container.
236+
Additionally, you must place all files you want to reference (such as corpus folders, Praat scripts, etc.)
237+
inside the directory that is mounted to the Docker container, which is the ``polyglotdb-docker`` directory by default.
238+
239+
.. code:: python
240+
241+
import os
242+
corpus_root = './data/my_corpus'
243+
# Now you can use corpus_root to access files in the my_corpus folder
244+
245+
#. The Docker setup comes with several pre-installed tools inside the `polyglotdb` container located at `/pgdb/tools`:
246+
247+
1. `Praat`_: Installed at `/pgdb/tools/praat`, environment variable `praat`. In your script, you can reference it by :code:`os.environ.get('praat')`.
248+
2. `Reaper`_: Installed at `/pgdb/tools/reaper`, environment variable `reaper`. In your script, you can reference it by :code:`os.environ.get('reaper')`.
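As an illustration of the environment-variable lookup described above, a script might resolve the tool paths with a small helper like this one. It is a minimal sketch: ``tool_path`` is a hypothetical name, and the only assumption taken from this page is that the container defines the ``praat`` and ``reaper`` variables.

```python
import os


def tool_path(name):
    """Return the path stored in environment variable `name`, or raise if it is unset."""
    path = os.environ.get(name)
    if path is None:
        # Outside the Docker container these variables are normally not defined
        raise RuntimeError(f"environment variable {name!r} is not set")
    return path


if __name__ == "__main__":
    for tool in ("praat", "reaper"):
        try:
            print(f"{tool}: {tool_path(tool)}")
        except RuntimeError as error:
            print(error)
```

Failing early with a clear message makes it easier to tell a script that was run outside the container apart from one with a genuine configuration problem.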
249+
250+
251+
214252
5. **Stopping the Docker Containers:**
215253

216254
To stop the Docker containers, first exit the `polyglotdb` shell by running:
@@ -226,6 +264,7 @@ Follow these steps to get your Docker container up and running:
226264
docker compose down
227265
228266
.. _Changing the Default Storage Location:
267+
229268
Changing the Default Storage Location
230269
-------------------------------------
231270

@@ -265,11 +304,3 @@ You can also change the working directory by modifying the `docker-compose.yml`
265304
- /path/to/your/working/directory:/polyglotdb
266305
267306
By doing this, the specified directory on your device will be mounted to the Docker container under `/polyglotdb`. To access PolyglotDB scripts and data within the container, ensure they are placed inside your chosen directory.
268-
269-
Pre-installed Tools
270-
-------------------
271-
272-
The Docker setup comes with several pre-installed tools inside the `polyglotdb` container located at `/pgdb/tools`:
273-
274-
1. `Praat`_: Installed at `/pgdb/tools/praat`, environment variable `praat`. In your script, you can reference it by :code:`os.environ.get('praat')`.
275-
2. `Reaper`_: Installed at `/pgdb/tools/reaper`, environment variable `reaper`. In your script, you can reference it by :code:`os.environ.get('reaper')`.

docs/source/introduction.rst

Lines changed: 29 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,16 @@ Introduction
2727

2828
.. _@esteng: https://github.com/esteng
2929

30+
.. _@lxy2304: https://github.com/lxy2304
31+
32+
.. _@massimolipari: https://github.com/massimolipari
33+
34+
.. _@michaelhaaf: https://github.com/michaelhaaf
35+
36+
.. _@james-tanner: https://github.com/james-tanner
37+
38+
.. _@msonderegger: https://github.com/msonderegger
39+
3040
.. _@samihuc: https://github.com/samihuc
3141

3242
.. _@MichaelGoodale: https://github.com/MichaelGoodale
@@ -60,36 +70,19 @@ General Background
6070

6171
**PolyglotDB** is a Python package that focuses on representing linguistic
6272
data in scalable, high-performance databases (called "Polyglot"
63-
databases here) to apply acoustic
64-
analysis and other algorithms to large speech corpora.
65-
66-
In general there are two ways to leverage PolyglotDB for analyzing a
67-
dataset:
68-
69-
1. The first way, more appropriate for technically skilled users, is
70-
through a Python API: writing Python scripts that import functions
71-
and classes from PolyglotDB. (For this route, see
72-
:ref:`installation` for setting up PolyglotDB, followed by
73-
:ref:`tutorial` for walk-through examples.) This way also makes
74-
more sense for users in an individual lab, where it can be assumed
75-
that all users have the same level of access to datasets (without
76-
any ethical issues).
77-
78-
2. The second way, more appropriate for a user group dispersed across
79-
multiple sites and where some users are less comfortable with
80-
Python scripting, is by setting up an ISCAN (Integrated Speech
81-
Corpus ANalysis) server---see the `ISCAN documentation`_ for more
82-
details. ISCAN servers allow users to view information and
83-
perform most functions of PolyglotDB through a web browser. In
84-
addition, ISCAN servers include features for the use case of
85-
multiple datasets with differential access: by user/corpus
86-
permissions level, and functionality for managing multiple
87-
Polyglot databases.
88-
89-
This documentation site is relevant for ways PolyglotDB can be used, but
90-
is geared towards a technically-skilled user and thus focuses more on
91-
the use case of using PolyglotDB "by script" (#1).
92-
73+
databases here) to apply acoustic analysis and other algorithms to large speech corpora.
74+
75+
Users interact with PolyglotDB primarily through its Python API: writing Python scripts
76+
that import functions and classes from PolyglotDB. See :ref:`installation` for setting up PolyglotDB,
77+
followed by :ref:`tutorial` for walk-through examples.
78+
79+
.. note::
80+
81+
For those interested in a web-based interface, ISCAN (Integrated Speech Corpus ANalysis) is a separate
82+
project built on top of PolyglotDB. ISCAN servers allow users to view information and perform
83+
most functions of PolyglotDB through a web browser.
84+
See the `ISCAN documentation`_ for more details on setting it up.
85+
9386
The general workflow for working with PolyglotDB is:
9487

9588
* **Import**
@@ -207,11 +200,16 @@ Contributors
207200
------------
208201

209202
* Michael McAuliffe (`@mmcauliffe`_)
203+
* Xiaoyi Li (`@lxy2304`_)
204+
* Michael Haaf (`@michaelhaaf`_)
210205
* Elias Stengel-Eskin (`@esteng`_)
206+
* Arlie Coles (`@a-coles`_)
211207
* Sarah Mihuc (`@samihuc`_)
212208
* Michael Goodale (`@MichaelGoodale`_)
209+
* Massimo Lipari (`@massimolipari`_)
213210
* Jeff Mielke (`@jeffmielke`_)
214-
* Arlie Coles (`@a-coles`_)
211+
* James Tanner (`@james-tanner`_)
212+
* Morgan Sonderegger (`@msonderegger`_)
215213

216214

217215
Citation

docs/source/tutorial_enrichment.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ Tutorial 2: Adding extra information
1212
The main objective of this tutorial is to enrich an already imported corpus (see :ref:`tutorial_first_steps`) with additional
1313
information not present in the original audio and transcripts. This additional information will then be used for creating
1414
linguistically interesting queries in the next tutorial (:ref:`tutorial_query`).
15+
All the enrichment files used in this tutorial are already bundled with the tutorial corpus.
1516

1617
.. note::
1718
