
Stream extra_metrics fails on repos with a large number of issues/PRs #205

@laurentS

Description

When running the tap on https://github.com/microsoft/TypeScript with the extra_metrics stream, the tap crashes. The project page shows the number of open issues as 5k+, a capped value that the scraper cannot convert to an integer ('5000+' in the stack trace).

Navigating to https://github.com/microsoft/TypeScript/issues shows the actual number: 5988 at the time of writing, so really closer to 6k.

Stack trace:

File "tap_github/repository_streams.py", line 2071, in parse_response
     yield from scrape_metrics(response, self.logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tap_github/scraping.py", line 126, in scrape_metrics
     issues = parse_counter(soup.find("span", id="issues-repo-tab-count"), logger)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tap_github/scraping.py", line 109, in parse_counter
     return int(title_string.strip().replace(",", ""))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '5000+'
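The failing line, `int(title_string.strip().replace(",", ""))`, assumes the counter always holds an exact number. A minimal sketch of a more tolerant parser is shown here, assuming the counter text is either a plain number such as "5,988" or a capped value with a trailing "+" such as "5,000+". `parse_counter_text` is a hypothetical name, not a function in tap_github. Note that for capped values it can only report a lower bound, which is why an exact data source is still preferable.

```python
from typing import Tuple


def parse_counter_text(text: str) -> Tuple[int, bool]:
    """Parse a GitHub counter string (hypothetical helper, not tap_github code).

    Returns (value, is_lower_bound). is_lower_bound is True when the page
    shows a capped value with a trailing "+", meaning the real count is
    at least `value`.
    """
    # Same normalisation as the original code: trim whitespace, drop commas.
    cleaned = text.strip().replace(",", "")
    is_lower_bound = cleaned.endswith("+")
    if is_lower_bound:
        cleaned = cleaned[:-1]
    # Anything else unexpected (e.g. "5k") still raises ValueError.
    return int(cleaned), is_lower_bound
```

For example, `parse_counter_text("5,000+")` returns `(5000, True)` instead of raising, while `parse_counter_text("5,988")` returns `(5988, False)`.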

It would make sense to source the number of open issues and PRs from the GraphQL API endpoint instead, since it returns exact counts rather than the rounded values shown on the page.
I will open a PR to fix this.
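A minimal sketch of the GraphQL approach, assuming a GitHub token is available and using only Python's standard library rather than whatever HTTP client the tap itself uses. The `totalCount` fields on the `issues` and `pullRequests` connections of `repository` are part of GitHub's GraphQL schema; the function names and the shape of the returned dict are hypothetical, and this is not the code from the eventual PR.

```python
import json
import urllib.request
from typing import Any, Dict

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# totalCount gives exact counts; in GraphQL, issues and pull requests
# are separate connections, so PRs are not included in the issue count.
OPEN_COUNTS_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(states: OPEN) { totalCount }
    pullRequests(states: OPEN) { totalCount }
  }
}
"""


def build_payload(owner: str, name: str) -> Dict[str, Any]:
    """Build the JSON body for the GraphQL request (hypothetical helper)."""
    return {"query": OPEN_COUNTS_QUERY, "variables": {"owner": owner, "name": name}}


def extract_open_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Pull the exact open issue/PR counts out of a decoded GraphQL response."""
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    repo = response["data"]["repository"]
    return {
        "open_issues": repo["issues"]["totalCount"],
        "open_prs": repo["pullRequests"]["totalCount"],
    }


def fetch_open_counts(owner: str, name: str, token: str) -> Dict[str, int]:
    """POST the query to GitHub and return the counts (needs network and a token)."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(build_payload(owner, name)).encode("utf-8"),
        headers={"Authorization": f"bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_open_counts(json.load(resp))
```

Usage would look like `fetch_open_counts("microsoft", "TypeScript", token)`. Keeping the response parsing in its own function makes it easy to test without hitting the API.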

Metadata

Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests