
Commit c9cfce5

Merge pull request #110 from C0rn3j/master
Typing, CI and cosmetic fixups
2 parents 71c1529 + 7edaa28 commit c9cfce5

7 files changed: +72 -71 lines changed


.github/workflows/python-package.yml

Lines changed: 23 additions & 23 deletions
@@ -5,35 +5,35 @@ name: Python package
 
 on:
   push:
-    branches: [ master ]
+    branches: [master]
   pull_request:
-    branches: [ master ]
+    branches: [master]
 
 jobs:
   build:
 
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     strategy:
       matrix:
-        python-version: [3.6, 3.7, 3.8]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
 
     steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install flake8 pytest
-        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
-    - name: Lint with flake8
-      run: |
-        # stop the build if there are Python syntax errors or undefined names
-        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
-        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
-    - name: Test with pytest
-      run: |
-        pytest
+    - uses: actions/checkout@v4
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install flake8 pytest
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+    - name: Lint with flake8
+      run: |
+        # stop the build if there are Python syntax errors or undefined names
+        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Test with pytest
+      run: |
+        pytest

.github/workflows/python-publish.yml

Lines changed: 3 additions & 3 deletions
@@ -13,11 +13,11 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
     - name: Set up Python
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
      with:
-        python-version: '3.x'
+        python-version: '3.13'
     - name: Install dependencies
      run: |
        python -m pip install --upgrade pip

readme.md

Lines changed: 3 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 A python project which downloads words from English Wiktionary ([en.wiktionary.org](https://en.wiktionary.org)) and parses articles' content in an easy to use JSON format. Right now, it parses etymologies, definitions, pronunciations, examples, audio links and related words.
 
-Note: This project will not be maintained since there are many free dictionary APIs now, please see - https://dictionaryapi.dev/ for example
+There are many free dictionary APIs nowadays which may or may not make this project redundant for you, do check out https://dictionaryapi.dev, for example.
 
 [![Downloads](http://pepy.tech/badge/wiktionaryparser)](http://pepy.tech/project/wiktionaryparser)
 
@@ -29,7 +29,7 @@ Note: This project will not be maintained since there are many free dictionary A
 
 #### Installation
 
-##### Using pip
+##### Using pip
 * run `pip install wiktionaryparser`
 
 ##### From Source
@@ -59,8 +59,7 @@ Note: This project will not be maintained since there are many free dictionary A
 
 #### Requirements
 
-- requests==2.20.0
-- beautifulsoup4==4.4.0
+Python 3.10+
 
 #### Contributions

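The readme passage above describes the parser's output (etymologies, definitions, pronunciations, examples, audio links and related words as JSON). As a rough illustration only, not part of this commit, a minimal usage sketch might look like the following; the exact output keys are assumptions based on that description.

```python
from wiktionaryparser import WiktionaryParser

# Fetch and parse the English Wiktionary entry for a word.
parser = WiktionaryParser()
entries = parser.fetch("test")

# Each entry is a plain dict; the key names used here ("etymology",
# "definitions", "partOfSpeech", "text") are assumed from the readme's
# description of the JSON output, hence the defensive .get() calls.
for entry in entries:
    print(entry.get("etymology", ""))
    for definition in entry.get("definitions", []):
        print(definition.get("partOfSpeech"), definition.get("text"))
```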
requirements.txt

Lines changed: 7 additions & 7 deletions
@@ -1,7 +1,7 @@
-requests==2.20.0
-beautifulsoup4==4.9.1
-deepdiff==5.0.2
-parameterized==0.7.4
-requests-futures==1.0.0
-mock==4.0.2
-pylint==2.6.0
+requests
+beautifulsoup4
+deepdiff
+parameterized
+requests-futures
+mock
+pylint

setup.py

Lines changed: 7 additions & 7 deletions
@@ -1,4 +1,4 @@
-from setuptools import setup,find_packages
+from setuptools import setup
 
 with open('readme.md', 'r') as readme:
     long_desc = readme.read()
@@ -13,12 +13,12 @@
     data_files=[('testOutput', ['tests/testOutput.json']), ('readme', ['readme.md']), ('requirements', ['requirements.txt'])],
     author = 'Suyash Behera',
     author_email = '[email protected]',
-    url = 'https://github.com/Suyash458/WiktionaryParser',
-    download_url = 'https://github.com/Suyash458/WiktionaryParser/archive/master.zip',
+    url = 'https://github.com/Suyash458/WiktionaryParser',
+    download_url = 'https://github.com/Suyash458/WiktionaryParser/archive/master.zip',
     keywords = ['Parser', 'Wiktionary'],
-    install_requires = ['beautifulsoup4','requests'],
+    install_requires = ['beautifulsoup4', 'requests'],
     classifiers=[
-        'Development Status :: 5 - Production/Stable',
-        'License :: OSI Approved :: MIT License',
+        'Development Status :: 5 - Production/Stable',
+        'License :: OSI Approved :: MIT License',
     ],
-)
+)

wiktionaryparser/core.py

Lines changed: 29 additions & 27 deletions
@@ -1,4 +1,6 @@
-import re, requests
+import logging
+import re
+import requests
 from wiktionaryparser.utils import WordData, Definition, RelatedWord
 from bs4 import BeautifulSoup
 from itertools import zip_longest
@@ -20,7 +22,7 @@
     "coordinate terms",
 ]
 
-def is_subheading(child, parent):
+def is_subheading(child: str, parent: str) -> bool:
     child_headings = child.split(".")
     parent_headings = parent.split(".")
     if len(child_headings) <= len(parent_headings):
@@ -30,60 +32,60 @@ def is_subheading(child, parent):
             return False
     return True
 
-class WiktionaryParser(object):
-    def __init__(self):
+class WiktionaryParser:
+    def __init__(self) -> None:
         self.url = "https://en.wiktionary.org/wiki/{}?printable=yes"
         self.soup = None
         self.session = requests.Session()
         self.session.mount("http://", requests.adapters.HTTPAdapter(max_retries = 2))
         self.session.mount("https://", requests.adapters.HTTPAdapter(max_retries = 2))
-        self.language = 'english'
+        self.language: str = 'english'
         self.current_word = None
-        self.PARTS_OF_SPEECH = copy(PARTS_OF_SPEECH)
-        self.RELATIONS = copy(RELATIONS)
-        self.INCLUDED_ITEMS = self.RELATIONS + self.PARTS_OF_SPEECH + ['etymology', 'pronunciation']
+        self.PARTS_OF_SPEECH: list[str] = copy(PARTS_OF_SPEECH)
+        self.RELATIONS: list[str] = copy(RELATIONS)
+        self.INCLUDED_ITEMS: list[str] = self.RELATIONS + self.PARTS_OF_SPEECH + ['etymology', 'pronunciation']
 
-    def include_part_of_speech(self, part_of_speech):
+    def include_part_of_speech(self, part_of_speech) -> None:
         part_of_speech = part_of_speech.lower()
         if part_of_speech not in self.PARTS_OF_SPEECH:
             self.PARTS_OF_SPEECH.append(part_of_speech)
             self.INCLUDED_ITEMS.append(part_of_speech)
 
-    def exclude_part_of_speech(self, part_of_speech):
+    def exclude_part_of_speech(self, part_of_speech) -> None:
         part_of_speech = part_of_speech.lower()
         self.PARTS_OF_SPEECH.remove(part_of_speech)
         self.INCLUDED_ITEMS.remove(part_of_speech)
 
-    def include_relation(self, relation):
+    def include_relation(self, relation: str) -> None:
         relation = relation.lower()
         if relation not in self.RELATIONS:
             self.RELATIONS.append(relation)
             self.INCLUDED_ITEMS.append(relation)
 
-    def exclude_relation(self, relation):
+    def exclude_relation(self, relation) -> None:
         relation = relation.lower()
         self.RELATIONS.remove(relation)
         self.INCLUDED_ITEMS.remove(relation)
 
-    def set_default_language(self, language=None):
+    def set_default_language(self, language=None) -> None:
         if language is not None:
             self.language = language.lower()
 
-    def get_default_language(self):
+    def get_default_language(self) -> str:
         return self.language
 
-    def clean_html(self):
+    def clean_html(self) -> None:
         unwanted_classes = ['sister-wikipedia', 'thumb', 'reference', 'cited-source']
         for tag in self.soup.find_all(True, {'class': unwanted_classes}):
             tag.extract()
 
-    def remove_digits(self, string):
+    def remove_digits(self, string: str) -> str:
         return string.translate(str.maketrans('', '', digits)).strip()
 
-    def count_digits(self, string):
+    def count_digits(self, string: str) -> int:
         return len(list(filter(str.isdigit, string)))
 
-    def get_id_list(self, contents, content_type):
+    def get_id_list(self, contents: list, content_type: str) -> list[tuple[str, str, str]]:
         if content_type == 'etymologies':
             checklist = ['etymology']
         elif content_type == 'pronunciation':
@@ -96,7 +98,7 @@ def get_id_list(self, contents, content_type):
             checklist = self.RELATIONS
         else:
             return None
-        id_list = []
+        id_list: list[tuple[str, str, str]] = []
         if len(contents) == 0:
             return [('1', x.title(), x) for x in checklist if self.soup.find('span', {'id': x.title()})]
         for content_tag in contents:
@@ -107,7 +109,7 @@ def get_id_list(self, contents, content_type):
                 id_list.append((content_index, content_id, text_to_check))
         return id_list
 
-    def get_word_data(self, language):
+    def get_word_data(self, language: str) -> list:
         contents = self.soup.find_all('span', {'class': 'toctext'})
         word_contents = []
         start_index = None
@@ -139,7 +141,7 @@ def get_word_data(self, language):
         json_obj_list = self.map_to_object(word_data)
         return json_obj_list
 
-    def parse_pronunciations(self, word_contents):
+    def parse_pronunciations(self, word_contents) -> list:
         pronunciation_id_list = self.get_id_list(word_contents, 'pronunciation')
         pronunciation_list = []
         audio_links = []
@@ -168,7 +170,7 @@ def parse_pronunciations(self, word_contents):
             pronunciation_list.append((pronunciation_index, pronunciation_text, audio_links))
         return pronunciation_list
 
-    def parse_definitions(self, word_contents):
+    def parse_definitions(self, word_contents) -> list:
         definition_id_list = self.get_id_list(word_contents, 'definitions')
         definition_list = []
         definition_tag = None
@@ -191,7 +193,7 @@ def parse_definitions(self, word_contents):
             definition_list.append((def_index, definition_text, def_type))
         return definition_list
 
-    def parse_examples(self, word_contents):
+    def parse_examples(self, word_contents) -> list:
         definition_id_list = self.get_id_list(word_contents, 'definitions')
         example_list = []
         for def_index, def_id, def_type in definition_id_list:
@@ -212,7 +214,7 @@ def parse_examples(self, word_contents):
                 table = table.find_next_sibling()
         return example_list
 
-    def parse_etymologies(self, word_contents):
+    def parse_etymologies(self, word_contents) -> list:
         etymology_id_list = self.get_id_list(word_contents, 'etymologies')
         etymology_list = []
         etymology_tag = None
@@ -231,7 +233,7 @@ def parse_etymologies(self, word_contents):
             etymology_list.append((etymology_index, etymology_text))
         return etymology_list
 
-    def parse_related_words(self, word_contents):
+    def parse_related_words(self, word_contents) -> list:
         relation_id_list = self.get_id_list(word_contents, 'related')
         related_words_list = []
         for related_index, related_id, relation_type in relation_id_list:
@@ -246,7 +248,7 @@ def parse_related_words(self, word_contents):
             related_words_list.append((related_index, words, relation_type))
         return related_words_list
 
-    def map_to_object(self, word_data):
+    def map_to_object(self, word_data: dict) -> list:
         json_obj_list = []
         if not word_data['etymologies']:
             word_data['etymologies'] = [('', '')]
@@ -276,7 +278,7 @@ def map_to_object(self, word_data):
             json_obj_list.append(data_obj.to_json())
         return json_obj_list
 
-    def fetch(self, word, language=None, old_id=None):
+    def fetch(self, word: str, language: str | None = None, old_id: int | None = None) -> list:
         language = self.language if not language else language
         response = self.session.get(self.url.format(word), params={'oldid': old_id})
         self.soup = BeautifulSoup(response.text.replace('>\n<', '><'), 'html.parser')

wiktionaryparser/py.typed

Whitespace-only changes.
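Together with the inline annotations added in core.py above, the py.typed file is the PEP 561 marker that tells type checkers the package ships its own type information. A small sketch of what downstream, type-checked usage could look like, assuming the public API shown in the diff:

```python
from wiktionaryparser import WiktionaryParser

parser = WiktionaryParser()

# These calls mirror the signatures annotated in core.py above.
parser.set_default_language("latin")           # -> None
parser.include_relation("alternative forms")   # relation: str -> None
parser.exclude_part_of_speech("noun")          # -> None

# fetch() is now annotated as (word: str, language: str | None, old_id: int | None) -> list.
entries = parser.fetch("amor")
print(len(entries))
```

With py.typed in place, running a checker such as mypy over code like this validates arguments and return values against the package's own annotations rather than treating it as untyped.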
