Node.js version: https://github.com/KORINZ/nhk-news-scraper-js
This project is a Python script for scraping news articles from NHK News Web Easy, a website that publishes news articles written in simpler Japanese for language learners. The script extracts an article's URL, title, content, and essential vocabulary with furigana (hiragana readings), and then generates a quiz for students based on the scraped article.
See the .txt file in the repository for example output.
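Extracting vocabulary "along with furigana" comes down to reading the ruby annotations out of the article markup. The sketch below assumes the readings are marked up as standard HTML `<ruby>base<rt>reading</rt></ruby>` elements, which is an assumption about the page structure and not taken from this project's code. It uses the standard-library `html.parser` instead of the BeautifulSoup4 dependency so it runs without extra installs; `RubyExtractor` and `extract_furigana` are hypothetical names.

```python
from html.parser import HTMLParser


class RubyExtractor(HTMLParser):
    """Collect plain text and (base, reading) pairs from <ruby> markup."""

    def __init__(self) -> None:
        super().__init__()
        self.plain: list[str] = []               # body text with furigana removed
        self.pairs: list[tuple[str, str]] = []   # (kanji, hiragana reading)
        self._in_ruby = False
        self._in_rt = False
        self._base: list[str] = []
        self._reading: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby = True
            self._base, self._reading = [], []
        elif tag == "rt":
            self._in_rt = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False
        elif tag == "ruby":
            self._in_ruby = False
            self.pairs.append(("".join(self._base), "".join(self._reading)))

    def handle_data(self, data):
        if self._in_rt:
            # Furigana goes into the reading, not into the body text.
            self._reading.append(data)
        else:
            self.plain.append(data)
            if self._in_ruby:
                self._base.append(data)


def extract_furigana(html: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (plain text, list of (base, reading)) for an HTML fragment."""
    parser = RubyExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.plain), parser.pairs


sample = "<p><ruby>日本<rt>にほん</rt></ruby>の<ruby>天気<rt>てんき</rt></ruby></p>"
text, pairs = extract_furigana(sample)
print(text)   # 日本の天気
print(pairs)  # [('日本', 'にほん'), ('天気', 'てんき')]
```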
- Extract a random news article from NHK News Web Easy
- Save article details (URL, date, title, content) and featured vocabulary (with furigana) to a text file
- Generate a daily quiz for students based on the scraped article
- Send customized quizzes, messages, or stickers to LINE from a Python GUI
- Automatically receive answers (via Google Apps Script), evaluate them, and upload the results to Google Sheets (via Python)
- Check sentiment scores for the news article
- Translate news articles and vocabulary into other languages via the DeepL API from a command-line interface
Tested on Python 3.11 with Windows 11, WSL (Ubuntu 20.04), and macOS Ventura.
Required:
chardet, BeautifulSoup4, Selenium, webdriver_manager, requests, line-bot-sdk, customtkinter
Optional (check_grade_book.py):
pandas, gspread, tabulate
Optional (check_sentiment.py):
transformers, scipy, torch, torchvision, torchaudio, fugashi[unidic], ipadic
Optional (translate.py):
deepl
Note: fugashi currently does not work with Python installed from the Microsoft Store. If you want to use sentiment analysis, install Python from the official website instead.
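Because several features depend on optional packages, it can help to check which feature groups are usable before running a script. A minimal sketch that probes each module with `importlib.util.find_spec`. The grouping mirrors the lists above, but the import names are my mapping of the pip names (for example, BeautifulSoup4 is imported as `bs4` and line-bot-sdk as `linebot`). The `unidic` dictionary pulled in by `fugashi[unidic]` is left out.

```python
from importlib.util import find_spec

# Import names per feature group (some differ from the pip package names).
FEATURE_MODULES = {
    "required": ["chardet", "bs4", "selenium", "webdriver_manager",
                 "requests", "linebot", "customtkinter"],
    "check_grade_book.py": ["pandas", "gspread", "tabulate"],
    "check_sentiment.py": ["transformers", "scipy", "torch", "torchvision",
                           "torchaudio", "fugashi", "ipadic"],
    "translate.py": ["deepl"],
}


def missing_modules(groups: dict[str, list[str]] = FEATURE_MODULES) -> dict[str, list[str]]:
    """Map each group to the modules that cannot be found (empty list = usable)."""
    return {name: [m for m in mods if find_spec(m) is None]
            for name, mods in groups.items()}


if __name__ == "__main__":
    for group, missing in missing_modules().items():
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as torch.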
- Sign up for a LINE Official Account.
- Get your own CHANNEL_ACCESS_TOKEN (チャネルアクセストークン) and USER_ID (あなたのユーザーID) from the Messaging API settings on LINE Developers.
- For macOS users: MeCab must be installed if you want to use sentiment analysis:
  brew install mecab
- Clone this repository:
  git clone https://github.com/KORINZ/nhk_news_web_easy_scraper.git
- Install the required packages listed in the dependencies (make sure you are inside the cloned repository folder):
  pip install -r requirements.txt
- To run the GUI:
  python customtkinter_GUI.py
- To run in the terminal:
  python main.py
- The script will generate a text file, news_article.txt, containing the URL, date, title, content, and essential vocabulary (with furigana and definitions) of a random news article.
- Text files for quizzes and logging will also be generated.
For WSL (Ubuntu) users:
- Install Japanese fonts:
  sudo apt update
  sudo apt install -y fonts-ipafont
- Install tkinter, replacing xx with your Python version (for example, python3.11-tk):
  sudo apt-get install python3.xx-tk
- Install support for Linux GUI apps; see:
  https://learn.microsoft.com/en-us/windows/wsl/tutorials/gui-apps
- Click クイズ作成 (Create Quiz) to scrape a random news article and generate quizzes.
- In the 設定 (Settings) tab, click LINE機密情報入力 (Enter LINE credentials) to fill in your CHANNEL_ACCESS_TOKEN (チャネルアクセストークン) and USER_ID (あなたのユーザーID).
- Click LINEに発信 (Send to LINE) to send the quiz.
- pending
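Sending a quiz to LINE is, under the hood, a Messaging API push message authorized with the channel access token and addressed to the user ID. The project uses line-bot-sdk for this. The sketch below instead builds the raw HTTP request with the standard library, assuming the documented push endpoint. The environment variable names `LINE_CHANNEL_ACCESS_TOKEN` and `LINE_USER_ID` are hypothetical and are not how the GUI stores credentials.

```python
import json
import os
import urllib.request

PUSH_URL = "https://api.line.me/v2/bot/message/push"


def build_push_request(token: str, user_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a push request carrying one text message."""
    payload = {"to": user_id, "messages": [{"type": "text", "text": text}]}
    return urllib.request.Request(
        PUSH_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def send_quiz(text: str) -> int:
    """Send the quiz text and return the HTTP status code.

    Credentials come from hypothetical environment variables.
    """
    req = build_push_request(os.environ["LINE_CHANNEL_ACCESS_TOKEN"],
                             os.environ["LINE_USER_ID"], text)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to inspect without calling the API.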
- A Google Cloud Platform account is required (https://console.cloud.google.com/).
- pending
- pending
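The grading workflow is not documented yet. The following is a minimal sketch of the "evaluate and upload" step. It assumes answers arrive as space-separated choice numbers, which is a hypothetical format. Full-width digits, which Japanese keyboards often produce, are normalized with NFKC before comparison. The optional upload uses gspread's service-account flow, and the sheet title and credentials filename are placeholders.

```python
import unicodedata


def grade(answers: str, key: str) -> tuple[int, int]:
    """Return (number correct, number of questions) for a space-separated answer string."""
    # NFKC turns full-width digits such as "１" into ASCII "1"
    # and the ideographic space into an ordinary space.
    given = unicodedata.normalize("NFKC", answers).split()
    expected = unicodedata.normalize("NFKC", key).split()
    correct = sum(g == e for g, e in zip(given, expected))
    return correct, len(expected)


def upload_result(row: list, sheet_title: str = "Quiz Results",
                  credentials_file: str = "service_account.json") -> None:
    """Append one result row to the first worksheet of a Google Sheet."""
    import gspread  # optional dependency, imported only when uploading

    client = gspread.service_account(filename=credentials_file)
    client.open(sheet_title).sheet1.append_row(row)


correct, total = grade("１ 3 2", "1 3 4")
print(f"{correct}/{total}")  # 2/3
```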
- Create a database to store all past news articles, vocabulary, and quizzes
- Improve the formatting of the output text file
- Add translation to quiz vocabulary
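For the planned database, a single SQLite file with one table per entity would likely be enough. The schema and helpers below are a hypothetical sketch, not a committed design. They use only the standard-library `sqlite3` module.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id        INTEGER PRIMARY KEY,
    url       TEXT UNIQUE NOT NULL,
    published TEXT,
    title     TEXT NOT NULL,
    content   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS vocabulary (
    id         INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES articles(id),
    word       TEXT NOT NULL,
    reading    TEXT,
    definition TEXT
);
CREATE TABLE IF NOT EXISTS quizzes (
    id         INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES articles(id),
    question   TEXT NOT NULL,
    answer     TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_article(conn: sqlite3.Connection, url: str, published: str, title: str,
                content: str, vocab: dict[str, tuple[str, str]]) -> int:
    """Insert an article and its vocabulary in one transaction; return the article id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO articles (url, published, title, content) VALUES (?, ?, ?, ?)",
            (url, published, title, content),
        )
        article_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO vocabulary (article_id, word, reading, definition) VALUES (?, ?, ?, ?)",
            [(article_id, w, r, d) for w, (r, d) in vocab.items()],
        )
    return article_id


conn = open_db()
aid = add_article(conn, "https://example.com/article", "2023-05-01", "タイトル", "本文",
                  {"天気": ("てんき", "weather")})
rows = conn.execute("SELECT word, reading FROM vocabulary WHERE article_id = ?",
                    (aid,)).fetchall()
print(rows)  # [('天気', 'てんき')]
```

The UNIQUE constraint on url means re-scraping the same article raises `sqlite3.IntegrityError` instead of storing a duplicate.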
Note that this script is for educational purposes only. When using the scraped content, follow the copyright laws and regulations applicable in your country. Make sure to properly cite the content's source and respect the content owners' intellectual property rights.

