More receiving yards cases #117

akeaswaran · 2025-09-15T22:25:52Z

Test Game ID: 401754622

Examples of text that previously returned NAs for yds_receiving but now have valid values:

Aaron Philo pass to Dean Patterson for 6 yds to the GWEB 48
Cole Pennington sacked by Christian Garrett for a loss of 2 yards to the GWEB 37
to Anthony Lowe for 1 yd to the GT 0, for a TD, Nate Hampton pass (Charlie Viorel PAT MISSED)
to Anthony Lowe for 6 yds to the GWEB 35, Nate Hampton pass
to Brett Seither for 21 yds to the GWEB 32 for a 1ST down

Summary by CodeRabbit

New Features
- Expanded support for 2025 play-by-play text variants to derive receiving yards from more phrasings (e.g., “pass to … for X yards,” trailing “pass,” and cases without an explicit passer).
Bug Fixes
- Improved extraction for receiving yardage including negative (“for a loss of”) and zero-yard (“for no gain”) cases.
- Per-game validation in EPA/WPA processing: underpopulated games are skipped with a runtime alert.
- Normalized API output column renamed to season_type.
Documentation
- Package bumped to v2.1.0 and release notes updated.
Tests
- Tests adjusted to use subset/membership checks for column validations.

Removed deprecated labels for elapsed time columns in cfbd_drives and added new columns to cfbd_live_plays documentation and tests. Updated test expectations to use expect_in instead of expect_setequal for column checks.

Added new columns to cfbd_play_stats_player output, improved sack player aggregation, handled NULL values, and updated documentation and examples to reflect changes. Also updated cfbd_live_plays documentation to include new columns for average start yard line and deserve to win metrics.

Changed cfbd_pbp_data to assign 3 timeouts per half for offense and defense when timeout data is missing from the API. Updated documentation and examples to reflect this behavior.

Added .groups = "drop" to the dplyr::summarise call in add_play_counts to control grouping behavior and prevent potential warnings in future dplyr versions.

Removed the specific count of variables from the return value description in both R and Rd files to improve maintainability and accuracy as the data frame structure may change.

… function usage in play data functions Corrected spacing and replaced superseded `dplyr::distinct_all()` with `dplyr::distinct()`, and standardized assignment spacing for improved code readability and consistency.

vercel · 2025-09-15T22:25:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
cfbfastr	Ready	Preview	Comment	Oct 13, 2025 3:26pm

coderabbitai · 2025-09-15T22:26:00Z

Walkthrough

Adds 2025-specific parsing branches to add_yardage yds_receiving logic; changes cfbd_pbp_data to subset and validate per-game plays (skip games with <20 plays and emit alert) for EPA/WPA computation; adjusts multiple tests to check expected columns are present. No public API signature changes.

Changes

Cohort / File(s)	Change Summary
Yardage parsing enhancements `R/helper_pbp_add_yardage.R`	Extended `yds_receiving` parsing with multiple 2025-specific text patterns: handle "pass to ... for a loss of ..." → negative yards, "for X yd(s)/yards" → parsed yards, end-of-text "pass"/"PASSER pass" variants, leading "to ..." forms including "for no gain" → 0 and "for a loss" → negative; retains existing patterns and default `NA_real_`.
Per-game EPA/WPA filtering `R/cfbd_pbp_data.R`	Per-game processing now subsets `game_plays` for each `game_id`, skips games with fewer than 20 plays (emits `cli::cli_alert_danger`), and returns/uses `game_plays` within the map for EPA/WPA computation instead of the accumulating global `play_df`.
Column rename in stats output `R/cfbd_stats.R`	Added renaming of output column `seasonType` → `season_type` in `cfbd_stats_game_advanced` post-processing.
Test updates (subset checks & expanded cols) `tests/testthat/test-cfbd_live_scoreboard.R`, `tests/testthat/test-cfbd_betting_lines.R`, `tests/testthat/test-cfbd_stats_game_advanced.R`	Replaced strict set-equality assertions with containment checks (expect_in) asserting expected columns are present within returned data; expanded expected columns for betting lines to include `home_team_id` and `away_team_id`.
Metadata / release notes `DESCRIPTION`, `NEWS.md`, `cran-comments.md`	Bumped package version to 2.1.0 and added release notes describing cfbd_pbp_data fix and add_yardage improvements; updated CRAN comments accordingly.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Row as play_text (row)
  participant Add as add_yardage()
  participant Parser as yds_receiving logic

  Row->>Add: call with play_text + metadata
  Add->>Parser: evaluate case_when branches

  rect rgb(240,248,255)
    note right of Parser: Existing parsing branches
    Parser-->Parser: classic forms → parse yards
  end

  rect rgb(237,250,240)
    note right of Parser: New 2025-specific branches
    Parser-->Parser: "pass to ... for a loss of N" → -N
    Parser-->Parser: "pass to ... for N yd(s)/yards" → N
    Parser-->Parser: trailing "pass"/"PASSER pass" variants → parse after "for"
    Parser-->Parser: leading "to ..." forms: "for no gain" → 0 / "for a loss" → -N / "for N" → N
  end

  alt matched
    Parser-->>Add: numeric yds_receiving
  else no match
    Parser-->>Add: NA_real_
  end

  Add-->>Row: updated row with yds_receiving

sequenceDiagram
  autonumber
  participant All as all_game_ids
  participant Proc as cfbd_pbp_data map loop
  participant Game as game_plays (per-game)

  All->>Proc: iterate game_id x
  Proc->>Game: subset plays for x -> game_plays
  alt nrow(game_plays) < 20
    Proc->>Proc: cli::cli_alert_danger("skipping game with <20 plays")
    Proc-->>All: skip / return NULL for x
  else
    Proc->>Proc: compute EPA/WPA using game_plays
    Proc-->>All: return game_plays (for aggregation)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

add a receiving yardage case #115 — Directly related updates to R/helper_pbp_add_yardage.R adding receiving-yard parsing cases; strong overlap.
Fixing issue with 2024 pass completions not having right yardage #104 — Related adjustments to pass-completion yardage parsing and regex branches in add_yardage.
Add remaining v2 functions #113 — Related per-game play processing changes in cfbd_pbp_data.R.

Poem

I hop through plays where phrases hide,
I sniff "for loss" and numbers wide.
New branches carved, each pattern chased,
Per-game checks keep short games paced.
Thump! The parser finds each yard retraced. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly describes the primary change, namely adding more cases for parsing receiving yards, which aligns with the modifications in the pull request to extend yds_receiving functionality for additional play‐by‐play text patterns.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch AE/2025-receiving-yardage-parse-case

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

R/helper_pbp_add_yardage.R (2)
88-93: Good addition for negative-yard “pass to … for a loss of …”; add a zero-yard case too.

Handle the “for no gain” phrasing for plain “pass to …” to avoid leaving NA for these plays.

Apply this diff to add the zero-yard case (order before other “pass to …” yards rules is fine):
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("pass to", ignore_case = TRUE)) &
+          stringr::str_detect(.data$play_text, regex("for no gain", ignore_case = TRUE)) ~ 0,
100-105: Tighten the “for X yards” regex; current y\\w*ds? is overbroad.

Use explicit units and a word boundary to avoid accidental matches.
-          stringr::str_detect(.data$play_text, regex(" for \\d+ y\\w*ds?", ignore_case = TRUE)) ~
+          stringr::str_detect(.data$play_text, regex("\\bfor\\s+\\d+\\s*(?:yds?|yards?)\\b", ignore_case = TRUE)) ~
           as.numeric(stringr::str_extract(
-            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
+            stringi::stri_extract_first_regex(.data$play_text, "(?<=\\bfor\\b)[^,]+"), "\\d+"
           )),

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3312dc and 221e0f7.

📒 Files selected for processing (1)

R/helper_pbp_add_yardage.R (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ubuntu-latest (oldrel-1)
GitHub Check: windows-latest (release)
GitHub Check: ubuntu-latest (release)

🔇 Additional comments (1)

R/helper_pbp_add_yardage.R (1)

115-126: Fix contradictory regex: require trailing "pass" (optionally with parens) and allow "to" anywhere

File: R/helper_pbp_add_yardage.R (lines ~115-126). The current branches require '^to ' (start-of-string) and also check for 'pass' at the end — these predicates are effectively contradictory, so the branches will rarely fire. (github.com)

-        # 2025 has some plays list "PASSER pass" at the very end of the play_text
-        .data$pass == 1 &
-          stringr::str_detect(.data$play_text, regex("pass \\(\\w", ignore_case = TRUE)) &
-          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
-            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
-          )),
-        .data$pass == 1 &
-          stringr::str_detect(.data$play_text, regex("pass$", ignore_case = TRUE)) &
-          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
-            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
-          )),
+        # 2025: some plays end with "... pass" (optionally with parens); yards still appear after "for"
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("\\bpass(?:\\s*\\([^)]*\\))?\\s*$", ignore_case = TRUE)) &
+          stringr::str_detect(.data$play_text, regex("\\bto\\s", ignore_case = TRUE)) ~
+          as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<=\\bfor\\b)[^,]+"), "\\d+"
+          )),

Re-run the pipeline against Game ID 401754622 to confirm these plays are now captured.

R/helper_pbp_add_yardage.R

Added a check to filter out games with fewer than 20 plays in the play-by-play data processing. This helps avoid issues with EPA/WPA models and improves data validation.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

R/cfbd_pbp_data.R (1)

597-609: LGTM! Per-game processing correctly implemented.

The change from processing and returning the full play_df to per-game game_plays subsets is correct and aligns with the new validation logic. The map_dfr will properly combine the game-specific results.

Note: game_plays is defined on line 591 and reassigned here after piping through the processing functions. While this variable reuse pattern works in R, consider using a different variable name (e.g., processed_plays) for the pipeline result to improve clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69b981c and 142a7a0.

📒 Files selected for processing (2)

R/cfbd_pbp_data.R (2 hunks)
tests/testthat/test-cfbd_live_scoreboard.R (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ubuntu-latest (release)
GitHub Check: windows-latest (release)
GitHub Check: ubuntu-latest (oldrel-1)

🔇 Additional comments (1)

tests/testthat/test-cfbd_live_scoreboard.R (1)

54-55: LGTM! Test logic corrected.

The inverted argument order now correctly verifies that all expected columns defined in cols are present in the returned data frames, rather than checking the reverse.

coderabbitai · 2025-10-13T15:16:17Z

R/cfbd_pbp_data.R

+        # Note: this should be changed to a complete data validation test in the future
+        # filter out games with less than 10 plays to avoid issues with EPA/WPA models
+        game_plays <- play_df %>%
+          dplyr::filter(.data$game_id == x)
+        if (nrow(game_plays) < 20) {
+          cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays"))
+          return(NULL)
+        }


⚠️ Potential issue | 🟡 Minor

Fix comment inconsistency with validation threshold.

Line 590 states "less than 10 plays" but line 593 checks for < 20. Update the comment to match the code.

Apply this diff:

- # filter out games with less than 10 plays to avoid issues with EPA/WPA models + # filter out games with less than 20 plays to avoid issues with EPA/WPA models

Consider extracting the magic number.

The threshold of 20 plays is hardcoded. While the comment acknowledges this should be improved with a complete validation test, consider extracting it as a named constant for maintainability.

Example:

MIN_PLAYS_FOR_EPA_WPA <- 20

Then use:

if (nrow(game_plays) < MIN_PLAYS_FOR_EPA_WPA) { cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays")) return(NULL) }

🤖 Prompt for AI Agents

In R/cfbd_pbp_data.R around lines 589 to 596, the inline comment incorrectly says "less than 10 plays" while the code checks for fewer than 20 plays; update the comment to reflect the actual threshold (20) and replace the hardcoded magic number by introducing a named constant (e.g., MIN_PLAYS_FOR_EPA_WPA <- 20) at top of the function or file, then use that constant in the if check and the alert message to improve clarity and maintainability.

^ this is fair -- @saiemgilani was this minimum supposed to be 20 or 10

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Update package version to 2.1.0. Add release notes for bug fixes in `cfbd_pbp_data()` and improvements to `add_yardage()` handling missing yardage values. Update cran-comments to reflect minor release and changes.

Added normalization for 'seasonType' to 'season_type' in cfbd_stats_game_advanced. Updated tests to check for column inclusion with expect_in instead of expect_setequal, and added team ID columns to betting lines test.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

R/cfbd_stats.R (1)
269-270: Column renaming for consistency looks good; remove extra blank line.

The addition of the seasonType → season_type transformation maintains consistency with the snake_case convention used throughout the codebase. This aligns with the pattern of other column transformations in the function.

However, line 270 introduces an unnecessary blank line.

Apply this diff to remove the extra blank line:
 colnames(df) <- gsub("seasonType", "season_type", colnames(df))
-

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd8435a and fad6fde.

📒 Files selected for processing (3)

R/cfbd_stats.R (1 hunks)
tests/testthat/test-cfbd_betting_lines.R (1 hunks)
tests/testthat/test-cfbd_stats_game_advanced.R (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-25T01:24:22.513Z

Learnt from: saiemgilani
PR: sportsdataverse/cfbfastR#113
File: R/cfbd_stats.R:381-381
Timestamp: 2025-08-25T01:24:22.513Z
Learning: In cfbfastR's cfbd_stats_season_advanced() function, the gsub("Opportunies", "_opportunities", colnames(df)) is intentional and correct for handling the actual API response column naming from CollegeFootballData.com, not a typo as it might appear.

Applied to files:

R/cfbd_stats.R

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ubuntu-latest (oldrel-1)
GitHub Check: ubuntu-latest (release)
GitHub Check: windows-latest (release)

🔇 Additional comments (3)

tests/testthat/test-cfbd_stats_game_advanced.R (1)

36-38: Test relaxation is reasonable for external API.

Changing from exact set equality (expect_setequal) to containment checks (expect_in) makes the tests more resilient to API changes. This is appropriate when testing external APIs that may add new fields without breaking existing functionality.

Trade-off: The tests will no longer detect if unexpected additional columns appear in the response, but this is acceptable given the benefits of reduced test brittleness.

tests/testthat/test-cfbd_betting_lines.R (2)

22-23: Addition of team ID columns improves data structure.

Including home_team_id and away_team_id in the expected columns enhances referential integrity and provides stable identifiers for teams, which is preferable to relying solely on team names.

30-31: Test relaxation is consistent with other test updates.

The change from expect_setequal to expect_in follows the same pattern applied in test-cfbd_stats_game_advanced.R, providing consistent test behavior across the test suite when validating external API responses.

akeaswaran and others added 10 commits September 3, 2025 14:01

add a receiving yardage case

ce4925f

docs: Created return tag for internal functions for CRAN

8aff7ef

chore: Update drive and live play columns, fix tests

6ad3e01

Removed deprecated labels for elapsed time columns in cfbd_drives and added new columns to cfbd_live_plays documentation and tests. Updated test expectations to use expect_in instead of expect_setequal for column checks.

fix: Substitute timeouts in cfbd_pbp_data when missing

8953e57

Changed cfbd_pbp_data to assign 3 timeouts per half for offense and defense when timeout data is missing from the API. Updated documentation and examples to reflect this behavior.

fix: Specify .groups argument in summarise call

61120cf

Added .groups = "drop" to the dplyr::summarise call in add_play_counts to control grouping behavior and prevent potential warnings in future dplyr versions.

docs: Update cfbd_drives return documentation

85f40d5

Removed the specific count of variables from the return value description in both R and Rd files to improve maintainability and accuracy as the data frame structure may change.

fix: use dplyr::distinct() over dplyr::distinct_all() spacing and…

31d0871

… function usage in play data functions Corrected spacing and replaced superseded `dplyr::distinct_all()` with `dplyr::distinct()`, and standardized assignment spacing for improved code readability and consistency.

add more cases

a78c607

Merge branch 'main' into AE/2025-receiving-yardage-parse-case

221e0f7

coderabbitai bot reviewed Sep 15, 2025

View reviewed changes

R/helper_pbp_add_yardage.R Show resolved Hide resolved

not sure these are necessary but sure

9404f5b

vercel bot deployed to Preview September 15, 2025 22:39 View deployment

matching parse logic

69b981c

vercel bot deployed to Preview September 16, 2025 04:12 View deployment

saiemgilani added 2 commits October 13, 2025 11:10

chore: Fix argument order in expect_in for scoreboard tests

33f86b4

fix: Skip games with insufficient plays in play-by-play data

142a7a0

Added a check to filter out games with fewer than 20 plays in the play-by-play data processing. This helps avoid issues with EPA/WPA models and improves data validation.

vercel bot deployed to Preview October 13, 2025 15:13 View deployment

coderabbitai bot reviewed Oct 13, 2025

View reviewed changes

chore: Bump version to 2.1.0 and update release notes

dd8435a

Update package version to 2.1.0. Add release notes for bug fixes in `cfbd_pbp_data()` and improvements to `add_yardage()` handling missing yardage values. Update cran-comments to reflect minor release and changes.

vercel bot deployed to Preview October 13, 2025 15:21 View deployment

chore: Normalize column names and update tests

fad6fde

Added normalization for 'seasonType' to 'season_type' in cfbd_stats_game_advanced. Updated tests to check for column inclusion with expect_in instead of expect_setequal, and added team ID columns to betting lines test.

vercel bot deployed to Preview October 13, 2025 15:26 View deployment

coderabbitai bot reviewed Oct 13, 2025

View reviewed changes

saiemgilani approved these changes Oct 13, 2025

View reviewed changes

saiemgilani merged commit 42152cb into main Oct 13, 2025
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

More receiving yards cases #117

More receiving yards cases #117

Uh oh!

akeaswaran commented Sep 15, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

vercel bot commented Sep 15, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Sep 15, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 13, 2025 •

edited

Loading

Uh oh!

akeaswaran Oct 13, 2025

Uh oh!

coderabbitai bot Oct 13, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

More receiving yards cases #117

More receiving yards cases #117

Uh oh!

Conversation

akeaswaran commented Sep 15, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

vercel bot commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

akeaswaran Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

akeaswaran commented Sep 15, 2025 •

edited by coderabbitai bot

Loading

vercel bot commented Sep 15, 2025 •

edited

Loading

coderabbitai bot commented Sep 15, 2025 •

edited

Loading

coderabbitai bot Oct 13, 2025 •

edited

Loading