Welcome to the repository of SaySuomi (formerly CaptainA), a mobile application designed to help users practice their Finnish pronunciation. This repository contains the code for both the mobile application and the backend server.
Our main goals for the front-end project are to develop an automatic speech assessment (ASA) feature and to implement a new UI/UX design for the mobile app. Specifically:
- A new ASA feature in the mobile app: an interface that displays a speaking task (usually a picture and a text description, with an optional translation). Users then speak and record their answer (about 30 to 60 seconds), and the app sends the speech data to the server. Once the server finishes processing, the app receives five speech rating scores (fluency, pronunciation, range, accuracy, and holistic) and displays them to the user. These scores will be stored on the user's device, and users can later review their progress in a separate interface.
- We will also need an interface or function to collect the user's consent and some background information. This information will also be sent to the server (with consent).
- Some functions (audio recording, server connection) are already available and can be reused.
- As this is a real mobile app, we are also targeting user experience (UX) and user interface (UI), so we also need a polished front end.
- A spider chart (as we discussed) could be a nice way to display the scores.
- Unity's Animation System is also nice and surprisingly easy to implement, but not required.
- You will need to work closely with the back-end and design teams. Some resources (for example, new icon designs) may not be available to you until the end of the project.
- Processing may take 10 to 30 seconds, so we need a way to let users know the server is still processing. An extra feature to collect feedback while users wait, or after they receive their scores, would be extremely useful.
- Remember that the app targets both Android and iOS, so the UI must work well on most smartphones with different screen sizes.
- Other features not directly related to ASA are also needed (for example, an interface for the text-to-speech system; we will handle the server side). These extras are not a priority and depend on the team and on the progress of the main work.
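A spider chart needs one axis per rating. The sketch below is in Python for illustration only (the app itself is built in Unity) and computes the polygon vertices for the five scores. It assumes each score can be normalized against a known maximum; max_score is a placeholder because the score range is not specified here, and the names AXES and spider_chart_vertices are hypothetical.

```python
import math

# The five ASA ratings, in the order they are drawn around the chart.
AXES = ["fluency", "pronunciation", "range", "accuracy", "holistic"]


def spider_chart_vertices(scores, max_score=1.0, radius=1.0):
    """Return one (x, y) vertex per axis for a spider (radar) chart.

    scores maps each name in AXES to a number. Values are clamped to
    [0, max_score]; max_score is a placeholder because the real score
    range is not specified. Axis 0 points straight up and the remaining
    axes follow clockwise.
    """
    n = len(AXES)
    vertices = []
    for i, axis in enumerate(AXES):
        value = min(max(scores[axis], 0.0), max_score)
        r = radius * value / max_score
        # Start at 12 o'clock (pi/2) and step clockwise by 2*pi/n per axis.
        angle = math.pi / 2 - 2 * math.pi * i / n
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices


if __name__ == "__main__":
    # Made-up example scores, for illustration only.
    example = {"fluency": 0.8, "pronunciation": 0.6, "range": 0.7,
               "accuracy": 0.9, "holistic": 0.75}
    for axis, (x, y) in zip(AXES, spider_chart_vertices(example)):
        print(f"{axis:>13}: ({x:+.3f}, {y:+.3f})")
```

The vertices could then be fed to, for example, a Unity LineRenderer or a custom mesh; starting at 12 o'clock and going clockwise is just one common convention.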
For an example user interface (the production version needs to be much better), see: https://www.youtube.com/watch?v=cRskPKsSM3g
See the functions ServerPost and NumberGamePost (at the bottom of https://github.com/Usin2705/CaptainA_unity/blob/main/Assets/Scripts/Managers/NetworkManager.cs) for how to send data to and receive data from the server.
For more information about what the backend will look like, see the SaySvenska server: https://github.com/Usin2705/SaySvenska/tree/main/Server
For an example of the API (somewhat outdated), see the SaySuomi README: https://github.com/Usin2705/CaptainA_unity/tree/main
SaySuomi (formerly CaptainA) is a mobile application that uses the wav2vec 2.0 model for Finnish pronunciation practice. The app is available on both Google Play and the Apple App Store. The current version of the wav2vec 2.0 model used in the app can be downloaded from HuggingFace. The demo paper offers a short introduction, and for a more detailed analysis and documentation of the original app's development, you can refer to the Master's thesis.
The server for CaptainA runs on Nginx and comes with a Dockerfile, allowing it to run without any extra installations (aside from Nginx and Docker/Podman). Here are the steps to set up and run the server:
- Model Download: Download the model from HuggingFace and copy it to the folder "PATH_TO_SERVER_FOLDER/models/nhan_wav2vec2-xls-r-300m-finnish-ent-10".
- Port Setup: Open a port for CaptainA (let's call it PORT) and update the port number in the Dockerfile.
- Docker/Podman Setup: You can use either Docker or Podman with the Dockerfile. The default commands are for Podman, but you can use Docker by simply changing the command from "podman" to "docker".
- Build the Image: First, build the image from the Dockerfile using the command:
  podman build --pull --rm -f "Dockerfile" -t captaina:latest "."
- Run the Server: To ensure the server restarts automatically even if the backend reboots, use one of the following commands (choose between Docker and Podman):
  For Docker:
  docker run --restart=unless-stopped -d -p PORT:PORT --name captaina_server captaina
  For Podman:
  podman run --restart=always -d -p PORT:PORT --name captaina_server captaina
- Reboot the Server: Reboot the server to check that the container restarts automatically.
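After the reboot, one quick check is whether anything accepts connections on PORT again. This is a minimal standard-library Python sketch; the helper name wait_for_port is hypothetical, and a successful connection only shows that some process is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True as soon as something accepts a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A refused or timed-out connection raises an OSError subclass.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, run wait_for_port("localhost", PORT, timeout=120) on the server once it is back up. Listing the running containers with "docker ps" or "podman ps" should also show captaina_server.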
You can configure more workers or threads in the Dockerfile. The default is 2 workers with 1 thread each, only because the server we use has just 4 Intel X5670 @ 2.93GHz; a more powerful machine can run more.
The CaptainA server expects a REST API POST request with the following keys:
- file: wav file
- transcript: the target text (that users are expected to read)
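As a rough illustration of this request outside Unity, the sketch below uses only the Python standard library. It assumes the server accepts the two keys as a multipart/form-data upload (the usual encoding for file uploads); the endpoint URL and the default filename are placeholders, and build_multipart and post_recording are hypothetical helper names.

```python
import json
import urllib.request
import uuid


def build_multipart(wav_bytes, transcript, filename="recording.wav"):
    """Encode the "file" and "transcript" keys as multipart/form-data.

    Returns (body, content_type). The filename is a placeholder.
    """
    # Random boundary, so it is very unlikely to occur inside the payload.
    boundary = uuid.uuid4().hex
    transcript_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="transcript"\r\n\r\n'
        f"{transcript}\r\n"
    ).encode("utf-8")  # UTF-8 keeps Finnish letters such as ä and ö intact
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = transcript_part + file_header + wav_bytes + closing
    return body, f"multipart/form-data; boundary={boundary}"


def post_recording(url, wav_path, transcript, timeout=60):
    """POST a WAV file and its target transcript; return the decoded JSON reply."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart(f.read(), transcript)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, post_recording(url, "mustikka.wav", "mustikka"), with url set to your server's address and route (not documented here).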
The server responds with a JSON object in the following format:
{
"levenshtein": [OPS List],
"prediction": "mustikka",
"score": [0.10, 0.75, 0.88, 0.90, 0.99, 0.95, 0.66, 0.01],
"warning": [0, 1, 2, 3]
}
Where:
- levenshtein: OPS List (see example below)
- prediction: the ASR model's predicted transcription (string)
- score: list of pronunciation scores, one per phoneme (for "mustikka" it would be [0.10, 0.75, 0.88, 0.90, 0.99, 0.95, 0.66, 0.01], indicating that the first and last letters/phones (m and a) were mispronounced)
- warning: list of warning codes (integers), e.g. [0, 1, 2, 3]. There are currently 4 warnings: word too short; NP should be pronounced as MP; NK and NG sound; and boundary gemination (e.g. "Mene pois!").
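A client then needs to turn this response into feedback for the user. The sketch below flags low-scoring phones and translates warning codes into text. Two assumptions are labeled in the code: the 0.5 threshold is hypothetical (the example only shows that 0.10 and 0.01 count as mispronounced), and the mapping of codes 0 to 3 assumes they follow the order in which the warnings are listed. Pairing scores with transcript letters also assumes one score per letter, as in the "mustikka" example.

```python
import json

# Hypothetical cut-off: the README example only shows that 0.10 and 0.01
# mark mispronounced phones, not where the threshold actually lies.
MISPRONOUNCED_BELOW = 0.5

# Assumption: codes 0-3 follow the order in which the warnings are listed.
WARNING_MESSAGES = {
    0: "Word too short",
    1: "NP should be pronounced as MP",
    2: "NK and NG sound",
    3: "Boundary gemination",
}


def summarize_response(raw_json, transcript):
    """Return (mispronounced, warnings) for one server response.

    mispronounced is a list of (index, letter) pairs. Scores are paired with
    transcript letters one-to-one (as in the "mustikka" example); zip() stops
    at the shorter sequence if the lengths ever differ.
    """
    data = json.loads(raw_json)
    mispronounced = [
        (i, letter)
        for i, (letter, score) in enumerate(zip(transcript, data["score"]))
        if score < MISPRONOUNCED_BELOW
    ]
    warnings = [
        WARNING_MESSAGES.get(code, f"Unknown warning {code}")
        for code in data["warning"]
    ]
    return mispronounced, warnings
```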
A random example of an OPS list is shown below. The list is simply the result of Levenshtein.editops(transcript, prediction), converted into a list of dictionaries for straightforward use in Unity:
[
{"ops": "insert", "tran_index": 0, "pred_index": 2},
{"ops": "delete", "tran_index": 3, "pred_index": 4},
{"ops": "replace", "tran_index": 5, "pred_index": 6}
]
Text-to-speech system: Develop a text-to-speech system so users can listen to examples in Finnish.
Grammar Function for SaySuomi (low priority): Develop a feature allowing users to review grammar rules and practice them:
- Extract practice examples directly from flashcards.
- Users will be presented with English text to translate into Finnish.
- Users can voice their answers.
- Preferably, users can also type their answers. (Note: The learning benefit of typing might differ from traditional writing.)
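The typed-answer path could be sketched as follows. The Flashcard structure is hypothetical (the real flashcard format is not described here), and the comparison is deliberately lenient about case, punctuation and extra spaces while still treating "a" and "ä" as different letters.

```python
import random
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Flashcard:
    """Hypothetical flashcard shape; the real SaySuomi format may differ."""
    english: str
    finnish: str


def normalize(text):
    """Lower-case, drop punctuation and collapse whitespace."""
    # NFC first, so a decomposed "a" + combining diaeresis becomes "ä"
    # instead of losing its diaeresis in the punctuation filter below.
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # \w keeps letters such as ä and ö
    return " ".join(text.split())


def pick_exercise(cards, rng=random):
    """Choose a flashcard and return (english_prompt, expected_finnish)."""
    card = rng.choice(cards)
    return card.english, card.finnish


def check_typed_answer(answer, expected):
    """True if the typed answer matches the expected Finnish text."""
    return normalize(answer) == normalize(expected)
```

A spoken answer could instead be sent to the server with the expected Finnish sentence as the transcript, reusing the existing pronunciation feedback.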
SaySuomi is licensed under the GNU Affero General Public License, version 3 or later. Other work related to SaySuomi made by the authors (thesis work, journal articles, audio samples, pictures, videos, ...) is licensed under a Creative Commons "Attribution-NonCommercial-ShareAlike 4.0 International" (BY-NC-SA 4.0) license.
Works not made by the authors are licensed according to the terms of their respective owners:
- Kristiina Kuparinen and Terhi Tapaninen (the authors of Oma Suomi 1) and Finn Lectura have given us permission to use the text of Oma Suomi 1 to create the flashcards for the SaySuomi app.
- Anki is licensed under AGPL3.
- SuperMemo2 is open to the public: Algorithm SM-2, (C) Copyright SuperMemo World, 1991. https://www.supermemo.com.
- The side picture illustrations are created by Aino Huhtaniemi (https://ainohuhtaniemi.com/), and she gave her permission to use and modify her original illustrations for the SaySuomi app.
- Some icons used in the application are from Google under Apache License 2.0.
- Photo illustrations and some of the videos were made with the contribution of Aija Elg and Noora Heikiö from Aalto University Language Centre.
- Some audio samples are from Aalto University Language Centre.
- Some audio samples are from Common Voice 11.0, licensed under Creative Commons Zero 1.0.
- Some audio samples and text examples are from LibriVox under Public Domain.
- We are grateful to Apollo Ailus and Kia Raitanen for their contributions to user research and engagement, and to Aalo Kailu, who designed the original user interface of the app.