-
Notifications
You must be signed in to change notification settings - Fork 11
More receiving yards cases #117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Removed deprecated labels for elapsed time columns in cfbd_drives and added new columns to cfbd_live_plays documentation and tests. Updated test expectations to use expect_in instead of expect_setequal for column checks.
Added new columns to cfbd_play_stats_player output, improved sack player aggregation, handled NULL values, and updated documentation and examples to reflect changes. Also updated cfbd_live_plays documentation to include new columns for average start yard line and deserve to win metrics.
Changed cfbd_pbp_data to assign 3 timeouts per half for offense and defense when timeout data is missing from the API. Updated documentation and examples to reflect this behavior.
Added .groups = "drop" to the dplyr::summarise call in add_play_counts to control grouping behavior and prevent potential warnings in future dplyr versions.
Removed the specific count of variables from the return value description in both R and Rd files to improve maintainability and accuracy as the data frame structure may change.
… function usage in play data functions Corrected spacing and replaced superseded `dplyr::distinct_all()` with `dplyr::distinct()`, and standardized assignment spacing for improved code readability and consistency.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
WalkthroughAdds 2025-specific parsing branches to add_yardage yds_receiving logic; changes cfbd_pbp_data to subset and validate per-game plays (skip games with <20 plays and emit alert) for EPA/WPA computation; adjusts multiple tests to check expected columns are present. No public API signature changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Row as play_text (row)
participant Add as add_yardage()
participant Parser as yds_receiving logic
Row->>Add: call with play_text + metadata
Add->>Parser: evaluate case_when branches
rect rgb(240,248,255)
note right of Parser: Existing parsing branches
Parser-->Parser: classic forms → parse yards
end
rect rgb(237,250,240)
note right of Parser: New 2025-specific branches
Parser-->Parser: "pass to ... for a loss of N" → -N
Parser-->Parser: "pass to ... for N yd(s)/yards" → N
Parser-->Parser: trailing "pass"/"PASSER pass" variants → parse after "for"
Parser-->Parser: leading "to ..." forms: "for no gain" → 0 / "for a loss" → -N / "for N" → N
end
alt matched
Parser-->>Add: numeric yds_receiving
else no match
Parser-->>Add: NA_real_
end
Add-->>Row: updated row with yds_receiving
sequenceDiagram
autonumber
participant All as all_game_ids
participant Proc as cfbd_pbp_data map loop
participant Game as game_plays (per-game)
All->>Proc: iterate game_id x
Proc->>Game: subset plays for x -> game_plays
alt nrow(game_plays) < 20
Proc->>Proc: cli::cli_alert_danger("skipping game with <20 plays")
Proc-->>All: skip / return NULL for x
else
Proc->>Proc: compute EPA/WPA using game_plays
Proc-->>All: return game_plays (for aggregation)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
R/helper_pbp_add_yardage.R (2)
88-93: Good addition for negative-yard “pass to … for a loss of …”; add a zero-yard case too.Handle the “for no gain” phrasing for plain “pass to …” to avoid leaving NA for these plays.
Apply this diff to add the zero-yard case (order before other “pass to …” yards rules is fine):
+ .data$pass == 1 & + stringr::str_detect(.data$play_text, regex("pass to", ignore_case = TRUE)) & + stringr::str_detect(.data$play_text, regex("for no gain", ignore_case = TRUE)) ~ 0,
100-105: Tighten the “for X yards” regex; currenty\\w*ds?is overbroad.Use explicit units and a word boundary to avoid accidental matches.
- stringr::str_detect(.data$play_text, regex(" for \\d+ y\\w*ds?", ignore_case = TRUE)) ~ + stringr::str_detect(.data$play_text, regex("\\bfor\\s+\\d+\\s*(?:yds?|yards?)\\b", ignore_case = TRUE)) ~ as.numeric(stringr::str_extract( - stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+" + stringi::stri_extract_first_regex(.data$play_text, "(?<=\\bfor\\b)[^,]+"), "\\d+" )),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/helper_pbp_add_yardage.R(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ubuntu-latest (oldrel-1)
- GitHub Check: windows-latest (release)
- GitHub Check: ubuntu-latest (release)
🔇 Additional comments (1)
R/helper_pbp_add_yardage.R (1)
115-126: Fix contradictory regex: require trailing "pass" (optionally with parens) and allow "to" anywhereFile: R/helper_pbp_add_yardage.R (lines ~115-126). The current branches require '^to ' (start-of-string) and also check for 'pass' at the end — these predicates are effectively contradictory, so the branches will rarely fire. (github.com)
- # 2025 has some plays list "PASSER pass" at the very end of the play_text - .data$pass == 1 & - stringr::str_detect(.data$play_text, regex("pass \\(\\w", ignore_case = TRUE)) & - stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract( - stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+" - )), - .data$pass == 1 & - stringr::str_detect(.data$play_text, regex("pass$", ignore_case = TRUE)) & - stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract( - stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+" - )), + # 2025: some plays end with "... pass" (optionally with parens); yards still appear after "for" + .data$pass == 1 & + stringr::str_detect(.data$play_text, regex("\\bpass(?:\\s*\\([^)]*\\))?\\s*$", ignore_case = TRUE)) & + stringr::str_detect(.data$play_text, regex("\\bto\\s", ignore_case = TRUE)) ~ + as.numeric(stringr::str_extract( + stringi::stri_extract_first_regex(.data$play_text, "(?<=\\bfor\\b)[^,]+"), "\\d+" + )),Re-run the pipeline against Game ID 401754622 to confirm these plays are now captured.
Added a check to filter out games with fewer than 20 plays in the play-by-play data processing. This helps avoid issues with EPA/WPA models and improves data validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
R/cfbd_pbp_data.R (1)
597-609: LGTM! Per-game processing correctly implemented.The change from processing and returning the full
play_dfto per-gamegame_playssubsets is correct and aligns with the new validation logic. Themap_dfrwill properly combine the game-specific results.Note:
game_playsis defined on line 591 and reassigned here after piping through the processing functions. While this variable reuse pattern works in R, consider using a different variable name (e.g.,processed_plays) for the pipeline result to improve clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
R/cfbd_pbp_data.R(2 hunks)tests/testthat/test-cfbd_live_scoreboard.R(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ubuntu-latest (release)
- GitHub Check: windows-latest (release)
- GitHub Check: ubuntu-latest (oldrel-1)
🔇 Additional comments (1)
tests/testthat/test-cfbd_live_scoreboard.R (1)
54-55: LGTM! Test logic corrected.The inverted argument order now correctly verifies that all expected columns defined in
colsare present in the returned data frames, rather than checking the reverse.
| # Note: this should be changed to a complete data validation test in the future | ||
| # filter out games with less than 10 plays to avoid issues with EPA/WPA models | ||
| game_plays <- play_df %>% | ||
| dplyr::filter(.data$game_id == x) | ||
| if (nrow(game_plays) < 20) { | ||
| cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays")) | ||
| return(NULL) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix comment inconsistency with validation threshold.
Line 590 states "less than 10 plays" but line 593 checks for < 20. Update the comment to match the code.
Apply this diff:
- # filter out games with less than 10 plays to avoid issues with EPA/WPA models
+ # filter out games with less than 20 plays to avoid issues with EPA/WPA modelsConsider extracting the magic number.
The threshold of 20 plays is hardcoded. While the comment acknowledges this should be improved with a complete validation test, consider extracting it as a named constant for maintainability.
Example:
MIN_PLAYS_FOR_EPA_WPA <- 20Then use:
if (nrow(game_plays) < MIN_PLAYS_FOR_EPA_WPA) {
cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays"))
return(NULL)
}🤖 Prompt for AI Agents
In R/cfbd_pbp_data.R around lines 589 to 596, the inline comment incorrectly
says "less than 10 plays" while the code checks for fewer than 20 plays; update
the comment to reflect the actual threshold (20) and replace the hardcoded magic
number by introducing a named constant (e.g., MIN_PLAYS_FOR_EPA_WPA <- 20) at
top of the function or file, then use that constant in the if check and the
alert message to improve clarity and maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ this is fair -- @saiemgilani was this minimum supposed to be 20 or 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Update package version to 2.1.0. Add release notes for bug fixes in `cfbd_pbp_data()` and improvements to `add_yardage()` handling missing yardage values. Update cran-comments to reflect minor release and changes.
Added normalization for 'seasonType' to 'season_type' in cfbd_stats_game_advanced. Updated tests to check for column inclusion with expect_in instead of expect_setequal, and added team ID columns to betting lines test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
R/cfbd_stats.R (1)
269-270: Column renaming for consistency looks good; remove extra blank line.The addition of the
seasonType→season_typetransformation maintains consistency with the snake_case convention used throughout the codebase. This aligns with the pattern of other column transformations in the function.However, line 270 introduces an unnecessary blank line.
Apply this diff to remove the extra blank line:
colnames(df) <- gsub("seasonType", "season_type", colnames(df)) -
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
R/cfbd_stats.R(1 hunks)tests/testthat/test-cfbd_betting_lines.R(1 hunks)tests/testthat/test-cfbd_stats_game_advanced.R(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T01:24:22.513Z
Learnt from: saiemgilani
PR: sportsdataverse/cfbfastR#113
File: R/cfbd_stats.R:381-381
Timestamp: 2025-08-25T01:24:22.513Z
Learning: In cfbfastR's cfbd_stats_season_advanced() function, the gsub("Opportunies", "_opportunities", colnames(df)) is intentional and correct for handling the actual API response column naming from CollegeFootballData.com, not a typo as it might appear.
Applied to files:
R/cfbd_stats.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ubuntu-latest (oldrel-1)
- GitHub Check: ubuntu-latest (release)
- GitHub Check: windows-latest (release)
🔇 Additional comments (3)
tests/testthat/test-cfbd_stats_game_advanced.R (1)
36-38: Test relaxation is reasonable for external API.Changing from exact set equality (
expect_setequal) to containment checks (expect_in) makes the tests more resilient to API changes. This is appropriate when testing external APIs that may add new fields without breaking existing functionality.Trade-off: The tests will no longer detect if unexpected additional columns appear in the response, but this is acceptable given the benefits of reduced test brittleness.
tests/testthat/test-cfbd_betting_lines.R (2)
22-23: Addition of team ID columns improves data structure.Including
home_team_idandaway_team_idin the expected columns enhances referential integrity and provides stable identifiers for teams, which is preferable to relying solely on team names.
30-31: Test relaxation is consistent with other test updates.The change from
expect_setequaltoexpect_infollows the same pattern applied intest-cfbd_stats_game_advanced.R, providing consistent test behavior across the test suite when validating external API responses.
Test Game ID: 401754622
Examples of text that previously returned NAs for
yds_receivingbut now have valid values:Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests