Skip to content

Kloppy Integration#161

Draft
UnravelSports wants to merge 1 commit intofloodlight-sports:mainfrom
UnravelSports:main
Draft

Kloppy Integration#161
UnravelSports wants to merge 1 commit intofloodlight-sports:mainfrom
UnravelSports:main

Conversation

@UnravelSports
Copy link

I've been working on #160. Please find the draft in this PR.

It contains 2 files:

  • floodlight.io.from_kloppy
  • from_kloppy.ipynb to simply run everything and check it out.

I also have some questions:

  • What is the purpose of "precedence" in the Teamsheet creation? Does it matter that we simply use the kloppy PositionType enum to order here? This does do GK, DF, MF, FW, but the order within each line is for example LB, RB, LCB, RCB.

  • Does floodlight enforce or rely on some specific orientation? I.e. home left, away right for the whole game?

  • Does floodlight enforce or rely on a specific coordinate system with pitch dimensions? I'm currently transforming to "secondspectrum", because it seems your Pitch object needs a string name and "secondspectrum" is supported.

  • I assume we rely on Period start and end frame ids? Are they really necessary, or only for creating XY objects?

  • Does floodlight record different metadata for event and tracking data? (e.g. frame rate)

  • Does floodlight not support z coordinates?

  • I noticed In some xy_objects we have "HT1" "HT2" (e.g. second spectrum) and in others we get "firstHalf" and "secondHalf" (e.g. in IDSSEDataset). Does this matter?

  • For events:

    • What is the purpose of eID and would it matter if we simply use kloppy EventType for this?
    • Is gameclock seconds since start of period?
    • Is outcome 1.0 and 0.0?
    • How much would it matter if timestamp is not a datetime, but a timedelta instead?
    • Is qualifier simply the raw event we get?
  • Tests are still missing

Copy link
Contributor

@manuba95 manuba95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @UnravelSports!
Thanks for the suggestion and the PR, I think this would add a lot of value to both Kloppy and floodlight! I added some specific comments below (mostly cosmetic) but also have some general ideas we should consider before a merge.

My main concern is that this is currently very taylored towards a two-team (home/away) scenario. Although this may be the use case for many official data providers, future GPS/LPS data support of Kloppy may result in different team (and period) constellations. I think it's worth thinking about how to generalize this PR now, so any number of groups and periods can be supported. I also left some comments about this below, but please let me know what you think about this as this would probably be the largest refactoring of your PR!

To answer your questions:

  • I think "precedence" was used specifically in the secondspectrum parser to have a rule on how the xIDs are sorted. This seems to be a legacy from my point of view since the introduction of Teamsheets. But I have to say, I was not involved in this parser so maybe I am missing something. I generally don't see any value of adding this logic to the kloppy bridge.
  • We don't rely on a specific orientation but that the x-axis is the longer side of the pitch
  • No, the origin of the Pitch coordinate system gets defined by the x/ylim argument. The scale can be set by the length/width and unit arguments. Also see my comment below regarding the get_pitch function.
  • I am not sure I understand the question. For the first creation of the XY array we need the number of players an number of frames. The latter could be extracted from the first and last frame ID or (as you do) from the length of the df. Please elaborate if this doesn't answer your question.
  • Yes, different metadata is stored as attributes of the Core objects (e.g., framerate is an attribute of the XY object). But I don't think they would differ between data from the same provider (e.g. framerate in XY and possession Code from the same provider should be the same)
  • We do not support z coordinates
  • Yes, I commented on that below as well. We are currently not very consistent with the period naming. I think for now it does not matter to much but streamlining this terminology could be a future project, For now, you could go with the terminology used in Kloppy ("Period_1" or similar). Honestly, I currently don't have a really strong opinion on this.
  • Events: I will comment on this later!
  • Tests: We currently don't include unit tests for our parsers mainly for data availablity reasons (which is not optimal!). Some ideas on how to test the kloppy bridge would be appreciated!

General minor comments:

  • Please merge into develop and add conventional PR name (e.g., "feat: Kloppy integration")
  • Please check codestyle with Black and flake8

Thanks again for your work, I hope we can include this soon!

@@ -0,0 +1,577 @@
from floodlight import Teamsheet, Code, XY, Pitch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reorder imports: standard, third party, local. Then alphabetically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also reorder the functions to main/public first, private below

pitch: Pitch
Pitch object with actual pitch length and width.
"""
from kloppy.domain import AttackingDirection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason I'm missing to have the imports within the function calls? I don't think calling anything from this module makes sense without having kloppy installed, right? Would it make sense to put all kloppy imports to the top of the file while also checking if its installed?

periods = {}
directions = {}

metadata_dict["framerate"] = metadata.frame_rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think medatada_dict.update({"length": ... , "width": ...}) would be a bit cleaner here, but thats just my opinion.


metadata_dict["framerate"] = metadata.frame_rate
metadata_dict["length"] = (
metadata.pitch_dimensions.x_dim if metadata.pitch_dimensions else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to put pitch_dimensions.pitch_length/width here? Then the metadata_dictionary would have not include kloppy objects. Right now, the metadata_dict["length/width"] does not get called again while the pitch object creation calls the .pitch_length/width later.

metadata_dict["away_tID"] = away_team.team_id

for period in metadata.periods:
segment = f"HT{period.id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently not very consistent in naming our periods but keep it more or less to the naming convention of the provider. E.g., with secondspectrum we use "HT1/2", with the DFL data, we use "firstHalf"/"secondHalf". This may not be optimal from our side, and I think a good solution would be to use Kloppys naming convention when transforming from Kloppy to floodlight, so e.g., "Period_1"/"P1" or something similar. Any thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two thoughts on this. It should be consistent, and it should probably be called "Period". I'll change it to "Period_{id}"

return ballstatus_code


def _create_ballstatus_code(df: pd.DataFrame) -> np.ndarray:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be duplicate.



def _extract_team_xy_data(
df: pd.DataFrame, team: "Team", teamsheet: "Teamsheet"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing of team and teamsheet is incorrect (string). Also, although private, df could be a bit more explicit.

Array with shape (n_frames, 2*n_players) containing x,y coordinates
"""
# Ensure teamsheet has xIDs
if "xID" not in teamsheet.teamsheet.columns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? It already gets checked in the main function.

teamsheet.add_xIDs()

# Get mapping from jersey number to array index (xID)
links_jID_to_xID = teamsheet.get_links("jID", "xID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expects the jID to exist and be unique. can we expect that? You could use the "player" column, since it has to exist in the Teamsheets and its likely to be unique (name or constructed)

links_jID_to_xID = teamsheet.get_links("jID", "xID")

# Get the maximum xID to determine array size
max_xID = max(links_jID_to_xID.values()) if links_jID_to_xID else 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would n_players = len(links_jID_to_xID) be sufficient?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants