The algorithm matches Reddit users in five steps:
1. Scrape the subreddits of interest
2. Scrape the activities of the users whose posts are included in step 1 (treatment group) across all subreddits
3. Analyze the behavior of all users to find the most unique subreddit for a given user
4. Scrape posts in the relevant subreddit to find a match (control) for the treatment user
5. Scrape the activities of the control group
Steps 1 and 2 are handled by scraper.py, Step 3 by unique_sr_finder.py, Step 4 by match_finder.py, and Step 5 by control_scraper.py.
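To make Step 3 more concrete, here is a minimal sketch of one way a "most unique" subreddit could be picked per user. It is an illustration only: it assumes "most unique" means the subreddit shared with the fewest other users in the scraped data, which may not be the rule unique_sr_finder.py actually applies. The function name `most_unique_subreddit` and the `(user, subreddit)` input format are hypothetical.

```python
from collections import defaultdict


def most_unique_subreddit(activity):
    """Map each user to the subreddit they share with the fewest users.

    ASSUMPTION: "most unique" is read as "rarest among the scraped users";
    the real rule lives in unique_sr_finder.py and may differ.
    `activity` is a hypothetical iterable of (user, subreddit) pairs.
    """
    users_in = defaultdict(set)   # subreddit -> users seen posting there
    subs_of = defaultdict(set)    # user -> subreddits they posted in
    for user, subreddit in activity:
        users_in[subreddit].add(user)
        subs_of[user].add(subreddit)

    # For each user, pick the subreddit with the fewest distinct users.
    # Ties are broken alphabetically so the result is deterministic.
    return {
        user: min(subs, key=lambda s: (len(users_in[s]), s))
        for user, subs in subs_of.items()
    }


activity = [
    ("alice", "askreddit"),
    ("alice", "python"),
    ("bob", "askreddit"),
    ("bob", "rust"),
    ("carol", "askreddit"),
]
unique = most_unique_subreddit(activity)
```

Under this reading, a control for a treatment user would then be sought among other posters in that user's selected subreddit (Step 4).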
Make sure that you have access to the Reddit API. You will need the 'personal use script' and 'secret'.
Additionally, the script uses the praw, psaw, configparser, numpy and pandas modules, so please be sure to install those.
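A quick way to confirm that these modules are available is to check whether Python can locate them. The snippet below is a small sketch, not part of the repository; the helper name `missing_modules` is made up for illustration. Note that configparser ships with the Python 3 standard library, so it normally needs no separate install.

```python
from importlib.util import find_spec

# Module names taken from the list above.
REQUIRED = ["praw", "psaw", "configparser", "numpy", "pandas"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```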
To run the script:
- Input your Reddit API credentials in the `scraperSettings` section of `config_public.ini`
  - `personal use script` as `clientID`
  - `secret` as `clientSecret`
  - `userAgent` should be something unique and descriptive
- Fill in the `treatmentSubreddit` and change other configurations as you see fit
- Rename `config_public.ini` to `config.ini`
- Navigate to the root directory in command line
- `chmod +x user_matching.sh` to make the script executable
- `./user_matching.sh` to run the script

`match_finder2.py` is a multiprocess version of `match_finder.py`. Use with caution.
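Before running the script, it can help to check that the credentials in the config file are filled in. The sketch below assumes only what is stated above: a `scraperSettings` section with `clientID`, `clientSecret` and `userAgent`. The helper `missing_credentials` is hypothetical and not part of the repository, and the sample values are placeholders.

```python
import configparser

REQUIRED_KEYS = ("clientID", "clientSecret", "userAgent")


def missing_credentials(config_text):
    """Return the required keys that are absent or empty in scraperSettings."""
    # Interpolation is disabled so a literal '%' in a value is not an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("scraperSettings"):
        return list(REQUIRED_KEYS)
    section = parser["scraperSettings"]
    # Option names are matched case-insensitively by configparser.
    return [
        key for key in REQUIRED_KEYS
        if not section.get(key, fallback="").strip()
    ]


# Placeholder contents; a real file would hold your own credentials.
sample = """
[scraperSettings]
clientID = abc123
clientSecret =
userAgent = user-matching-study by u/example
"""
problems = missing_credentials(sample)
```

To check the real file, read it first, for example `missing_credentials(open("config.ini").read())`, and fix any keys it reports.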