A framework for analysing echo chambers on YouTube (for now...)
To train a transformer model on GPU with spaCy, you need to install additional libraries. See here for more information. You also need to choose and download a spaCy model, which will be used to preprocess the corpus for topic modeling and for training a classification model. As a last step, create your Python environment.
# venv
python -m venv my_env
source my_env/bin/activate
pip install -r requirements.txt

Before running the scripts, it is recommended to have an idea of your corpus structure. At the end, you will have 3 files:
- one with the videos' metadata, captions and Gensim annotation;
- one with the comments' metadata, Perspective API annotation and agree-disagree annotation;
- one with the commentators' metadata.

You do not need to worry about directories: they are created when files or models are saved.
You need one key to access the YouTube Data API v3 and another to access the Perspective API. You can also request a quota increase for YouTube or Perspective if you are particularly impatient or are scraping a large YouTube channel. I cannot guarantee that your requests will be granted.
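
As a minimal sketch of how the two keys might be supplied, the snippet below reads a key from an environment variable and builds (without sending) a Perspective API request for a toxicity score. The environment variable names and the helpers `get_key` and `build_perspective_request` are assumptions made for illustration; they are not names used by this framework.

```python
import json
import os
import urllib.request

# Public endpoint of the Perspective API "analyze" method.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def get_key(name):
    """Read an API key from an environment variable (hypothetical variable names)."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


def build_perspective_request(text, api_key, language="en"):
    """Build, without sending, a POST request asking for a TOXICITY score."""
    payload = {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }
    return urllib.request.Request(
        url=f"{PERSPECTIVE_URL}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_perspective_request("some comment", get_key("PERSPECTIVE_API_KEY"))` returns a request that can be passed to `urllib.request.urlopen`. In the JSON response, the score is found under `attributeScores` → `TOXICITY` → `summaryScore` → `value`. Keeping keys in environment variables rather than in the code avoids committing them to the repository by accident.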
You can use our scripts to collect a corpus from YouTube through the API. For more information, see the misc directory.
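
To give an idea of what collecting comments through the API involves, here is a small sketch that builds the URL for one page of the `commentThreads` endpoint of the YouTube Data API v3. It is not the code from the misc directory, which may work differently (the references list Minet); the helper name `build_comment_threads_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint listing the top-level comment threads of a video.
YOUTUBE_COMMENTS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_threads_url(video_id, api_key, page_token=None, max_results=100):
    """Return the request URL for one page of comment threads.

    The API accepts at most 100 results per page; further pages are
    requested by passing the `nextPageToken` value of the previous response.
    """
    params = {
        "part": "snippet,replies",
        "videoId": video_id,
        "maxResults": max_results,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{YOUTUBE_COMMENTS_URL}?{urlencode(params)}"
```

A collection loop would call this function repeatedly, passing the `nextPageToken` of each response as `page_token` until the response no longer contains one. Each call counts against your quota, which is why a large channel may require a quota increase.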
All functions are commented, and Python files are in the docs directory to show you how to import and use every part of the processing chain. Soon, you will be able to use the framework through a command-line interface.
Darenne, L. (2024). Propositions pour l'identification, la modélisation et la quantification des chambres d'écho : Expérimentation sur un corpus de commentaires YouTube. Master's thesis, Institut National des Langues et Civilisations Orientales.
@mastersthesis{darenne_2024,
author = "Laura Darenne",
title = "Propositions pour l'identification, la modélisation et la quantification des chambres d’écho : Expérimentation sur un corpus de commentaires YouTube",
school = "Institut National des Langues et Civilisations Orientales",
year = "2024"
}
Guillaume Plique, Pauline Breteau, Jules Farjas, Héloïse Théro, Jean Descamps, Amélie Pellé, Laura Miguel, César Pichon, & Kelly Christensen. (2019). Minet, a webmining CLI tool & library for python. Zenodo. http://doi.org/10.5281/zenodo.4564399
Gensim. https://radimrehurek.com/gensim/models/ldamodel.html
Perspective API. https://current.withgoogle.com/the-current/toxicity/
SpaCy. https://spacy.io/