Skip to content

Potential improvements to the database schema #13140

@Logiqx

Description

@Logiqx

Whilst examining the v2 results export, I spotted a few potential improvements to the database schema.

This issue summarises my observations and also describes some potential improvements.

Indices

The v2 results export includes primary keys for the majority of tables.

It would also be worth having a primary key on the persons table (wca_id, sub_id).

I've always used this approach within the databases used by the Senior Rankings.

MySQL and MariaDB can both use the index, even when sub_id is not in a SQL query.

Inconsistent types

A few primary keys have different types in the results export and developer export:

  • championships id bigint(20) vs int(11)
  • results id bigint(20) vs int(11)
  • scrambles id bigint(20) vs int(10) unsigned

The types should be consistent, whatever is deemed most appropriate.

FWIW my dev export does not reflect the following change, but maybe just a matter of time:

  • result_attempts attempt_number tinyint(3)

Changing types in this way can be regarded as a non-breaking change.

Missing fields

My dev export is missing some fields that are present in results export v2.0.x, but maybe just a matter of time:

  • competitions cancelled int(11)
  • competitions latitude_microdegrees int(11)
  • competitions longitude_microdegrees int(11)
  • competitions event_specs longtext
  • competitions delegates mediumtext
  • competitions organizers mediumtext
  • results person_country_id varchar(50)

Perhaps person_country_id should be in both exports or neither?

Dates

The results export has day / month / year fields in the competitions table, unlike the dev export which has actual dates:

  • competitions day
  • competitions year
  • competitions month
  • competitions end_day
  • competitions end_year
  • competitions end_month

Actual start / end dates would be preferable in the results export, allowing identical SQL queries to work against both exports.

FWIW, I add the start_date and end_date columns to the results export, so that I can use identical queries against either database.

Changing the competitions table in the results export would likely be deemed v2.1.x, but could be deployed in two phases.

  1. Date fields could be added to competitions in v2.0.x, enabling third parties to adopt the new fields
  2. Day / month / year columns could be removed from the competitions table in v2.1.x.

I'm not sure of the knock-on effects, but some tables in the dev database might also benefit from actual dates:

  • archive_registrations birthYear
  • archive_registrations birthDay
  • archive_registrations birthMonth
  • concise_average_results day
  • concise_average_results year
  • concise_average_results month
  • concise_single_results day
  • concise_single_results year
  • concise_single_results month

WCA id

Two different field names are used for the same item, and some instances use varchar(191).

  • concise_average_results person_id varchar(10)
  • concise_single_results person_id varchar(10)
  • inbox_results person_id varchar(20)
  • ranks_average person_id varchar(10)
  • ranks_single person_id varchar(10)
  • results person_id varchar(10)
  • inbox_persons wca_id varchar(10)
  • persons wca_id varchar(10)
  • tickets_edit_person wca_id varchar(191)
  • users wca_id varchar(191)

Using wca_id instead of person_id would make some sense, and so would the use of char(10).

The rename of person_id would obviously a breaking-change and may be deemed too painful.

A significant change like this would need to be deemed v2.1.x or even v3.0.0.

ISO 2

A number of tables include an ISO 3166-1 alpha-2 field, but some are too long.

  • competition_venues country_iso2 varchar(191)
  • eligible_country_iso2s_for_championship eligible_country_iso2 varchar(191)
  • countries iso2 varchar(2)
  • country_bands iso2 varchar(2)
  • inbox_persons country_iso2 varchar(2)
  • users country_iso2 varchar(255)

It makes sense to use char(2) for fixed length fields such as ISO codes.

Changing these fields to char types can be regarded as a non-breaking change.

1 character / 2 character fields

The use of char(1) and char(2) has semantic benefits, plus modest storage + performance benefits.

Fields that could potentially use char(1) or char(2) types:

  • archive_registrations gender varchar(1)
  • formats id varchar(1)
  • inbox_persons gender varchar(1)
  • inbox_results format_id varchar(1)
  • inbox_results round_type_id varchar(1)
  • persons gender varchar(1)
  • results format_id varchar(1)
  • results round_type_id varchar(1)
  • rounds old_type varchar(1)
  • round_types id varchar(1)
  • scrambles round_type_id varchar(1)
  • countries iso2 varchar(2)
  • country_bands iso2 varchar(2)
  • inbox_persons country_iso2 varchar(2)

Changing fields to char types can be regarded as a non-breaking change.

Summary

These are things that I happened to notice whilst reviewing the v2 results export.

After the transition from results export v1 to v2, perhaps there is some appetite for further improvements?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions