@shunfan-shao
Contributor

Adds some checks for forked projects. Please let me know if there are any issues.

@shunfan-shao changed the title from "Add pr-check for forked project" to "Draft: Add pr-check for forked project" on Nov 30, 2021
@shunfan-shao
Contributor Author

shunfan-shao commented Dec 1, 2021

Currently the GitHub API is called without authentication, so the requests are subject to an hourly rate limit per IP address. Integrating an authorization token into the pipeline would raise that limit.

Here is some information on the GitHub API rate limit: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
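A minimal sketch of a fork lookup that sends a token only when one is available. Per the linked documentation, unauthenticated requests are limited to 60 per hour per originating IP address, and authenticated requests get a substantially higher limit. The response of `GET /repos/{owner}/{repo}` carries a boolean `fork` field. The helper names `build_headers` and `is_fork` are hypothetical, and the standard-library `urllib.request` is used instead of `requests` only to keep the sketch dependency-free.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_headers(token=None):
    """Return request headers, adding an Authorization header only if a token is given."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        # Authenticated requests count against the token's (higher) quota
        # instead of the per-IP quota for anonymous requests.
        headers["Authorization"] = f"token {token}"
    return headers


def is_fork(author, repo, token=None, timeout=10):
    """Ask the GitHub REST API whether author/repo is a fork.

    Returns (fork, remaining), where remaining is the value of the
    X-RateLimit-Remaining response header (a string, or None if absent).
    """
    url = f"{API_ROOT}/repos/{author}/{repo}"
    request = urllib.request.Request(url, headers=build_headers(token))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        remaining = response.headers.get("X-RateLimit-Remaining")
        data = json.load(response)
    return bool(data.get("fork", False)), remaining
```

For example, `is_fork("TestingResearchIllinois", "idoft", token=os.environ.get("GITHUB_TOKEN"))`, where the environment variable name `GITHUB_TOKEN` is an assumption about how the pipeline would expose the token; passing `None` falls back to anonymous access.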

@shunfan-shao
Contributor Author

Added some more details as discussed. As a test, I explicitly committed a change whose project is a fork, and the pipeline should fail on it: https://github.com/TestingResearchIllinois/idoft/runs/4378193026?check_suite_focus=true
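One common way to make a pipeline step fail is to collect the errors from all checks and exit with a non-zero status, which CI runners such as GitHub Actions report as a failed step. A minimal sketch under that assumption; `find_forked` and `main` are hypothetical names (not from the PR), and `lookup` stands in for a GitHub API call so the logic can be exercised offline.

```python
import sys


def find_forked(repos, lookup):
    """Return an error message for every (author, repo) pair that lookup reports as a fork."""
    errors = []
    for author, repo in repos:
        if lookup(author, repo):
            errors.append(f"{author}/{repo} is a forked repo")
    return errors


def main(repos, lookup):
    """Print each error to stderr and return the process exit code."""
    errors = find_forked(repos, lookup)
    for message in errors:
        print(f"ERROR: {message}", file=sys.stderr)
    # A non-zero exit code marks the CI step as failed.
    return 1 if errors else 0
```

In a script this would be invoked as `sys.exit(main(repos, lookup))`, so a single forked project is enough to fail the run.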

Contributor

@winglam left a comment


Thanks for the changes! Just a few minor suggestions for improvement.

 # Contains regexes for columns that are common to pr-data and tic-fic-data
 common_data = {
-    "Project URL": re.compile(r"(https:\/\/github.com)(\/(\w|\.|-)+){2}"),
+    "Project URL": re.compile(r"(https:\/\/github.com\/([\w|\.|-]+)\/([\w|\.|-]+))"),
Contributor

Was the unchanged line giving you some issues?
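One visible difference between the original and the replacement `Project URL` patterns in this snippet: in the original, `(\/(\w|\.|-)+){2}` is a repeated group, and Python's `re` keeps only its last repetition, so the author name cannot be read back from the match. The replacement captures author and repository in separate groups, which is what an `{author}/{repo}` API path needs. Separately, `|` inside `[...]` is a literal pipe rather than alternation; the `TIGHTER` pattern below is an illustrative assumption, not part of the PR.

```python
import re

# The two "Project URL" patterns from the review snippet, unchanged.
OLD = re.compile(r"(https:\/\/github.com)(\/(\w|\.|-)+){2}")
NEW = re.compile(r"(https:\/\/github.com\/([\w|\.|-]+)\/([\w|\.|-]+))")

# Hypothetical tighter alternative: inside [...], "|" is literal, so [\w.-] suffices.
TIGHTER = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)")

url = "https://github.com/TestingResearchIllinois/idoft"

old = OLD.match(url)
# The repeated group keeps only its last repetition, so the author is lost.
print(old.group(2), old.group(3))  # /idoft t

new = NEW.match(url)
# Groups 2 and 3 hold the author and repository names separately.
print(new.group(2), new.group(3))  # TestingResearchIllinois idoft

# NEW also accepts a pipe character inside a name; TIGHTER does not.
print(bool(NEW.fullmatch("https://github.com/a|b/c")))      # True
print(bool(TIGHTER.fullmatch("https://github.com/a|b/c")))  # False
```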

    log_esp_error(filename, log, f"{author}/{repo} is a forked repo")
except requests.exceptions.RequestException as e:
    # handle(e)
    pass
Contributor

Instead of simply passing here, it would be good to emit a warning that the fork check failed, along with the exception that caused it.
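A minimal sketch of that suggestion. It swaps in Python's standard `logging` module for the project's own logging helpers, and `check_fork`, the logger name `pr_check`, and the `lookup` callable are hypothetical. Network failures are caught as `OSError` here (which covers `urllib.error.URLError`); with `requests` the tuple would instead be `(requests.exceptions.RequestException,)`.

```python
import logging

logger = logging.getLogger("pr_check")


def check_fork(author, repo, lookup, network_errors=(OSError,)):
    """Return False if author/repo is a fork, True otherwise.

    If the lookup itself fails, log a warning that includes the exception and
    let the row pass, instead of silently ignoring the failure.
    """
    try:
        if lookup(author, repo):
            logger.error("%s/%s is a forked repo", author, repo)
            return False
    except network_errors as exc:
        # Surface the failed check and its cause rather than `pass`.
        logger.warning("Fork check for %s/%s failed: %r", author, repo, exc)
    return True
```

Whether a failed lookup should pass with a warning or fail the row is a policy choice; the sketch follows the review comment and only warns.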

if check_rule.__name__ == check_row_length.__name__:
    check_rule(len(header), *params)
    continue
if check_rule.__name__ == check_repo_sanity.__name__:
Contributor

Instead of running this check_rule for every changed line in the CSV file, we could run it only when a Project URL that does not already exist is added. That is, get (1) the Project URLs already in the unchanged file and (2) the Project URLs in the new changes, and run this check only for the URLs that are in (2) but not in (1).

This improvement need not be done immediately, but it would help avoid hitting the GitHub rate limit.
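The suggestion amounts to a set difference. A minimal sketch, assuming the CSV has a `Project URL` header and that the old and new file contents are already available as strings (how they are obtained, for example from git, is left out); `project_urls` and `new_project_urls` are hypothetical helper names.

```python
import csv
import io


def project_urls(csv_text):
    """Collect the set of non-empty values in the "Project URL" column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Project URL"] for row in reader if row.get("Project URL")}


def new_project_urls(old_csv, new_csv):
    """Project URLs that are in the new version (2) but not in the old one (1)."""
    return project_urls(new_csv) - project_urls(old_csv)
```

Only the URLs returned by `new_project_urls` would then be sent to the fork check, so the number of API calls grows with the number of newly added projects rather than with the number of changed lines; duplicate rows for the same project also collapse into one lookup.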


2 participants