Skip to content

Comments

Write server stats to postgres for catabalancer#1389

Merged
mjh1 merged 5 commits intomainfrom
mh/redis
Jan 13, 2025
Merged

Write server stats to postgres for catabalancer#1389
mjh1 merged 5 commits intomainfrom
mh/redis

Conversation

@mjh1
Copy link
Member

@mjh1 mjh1 commented Jan 7, 2025

Switch to postgres to store node stats for catabalancer. Previously when we used serf to propagate this data throughout the cluster we ran into bottlenecks with messages queueing up, it didn't seem it was designed for that sort of use case.

  • The catalyst nodes (the ones running mist) will periodically update the db with their stats.
  • The catalyst-api pods will read the data from the db for all nodes and make the load balancing decisions.

Future PRs:

  • strip out the unused serf message handling code
  • cache the db results from RefreshNodes() at the moment it can be called multiple times concurrently, possibly just change it to a periodic refresh.

}

// JSON representation is deliberately truncated to keep the message size small
type NodeUpdateEvent struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just moved this code from the events package to make the dependencies easier

@mjh1 mjh1 changed the title Write server stats to redis Write server stats to postgres for catabalancer Jan 9, 2025
@mjh1 mjh1 requested review from leszko and thomshutt January 9, 2025 15:15
@mjh1 mjh1 marked this pull request as ready for review January 9, 2025 15:15
Copy link
Contributor

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other 2 comments. Other than that LGTM

NodeMetrics: make(map[string]NodeMetrics),
metricTimeout: metricTimeout,
ingestStreamTimeout: ingestStreamTimeout,
MetricsDB: nodeStatsDB,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC This will be a different database than our metrics DB. Then, could you rename it to something like NodeStatsDB?

continue
}

c.UpdateNodes(event.NodeID, event.NodeMetrics)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to keep any internal state for nodes? If we make a query to DB with every playback redirect request, then couldn't we reason everything from there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep sorry I totally forgot to mention refactoring that as a plan for another PR, in two minds about whether I should do that in this PR but I think since it's been approved I think I'll just do it now but in a fresh PR.

@mjh1 mjh1 merged commit bc27392 into main Jan 13, 2025
6 of 7 checks passed
@mjh1 mjh1 deleted the mh/redis branch January 13, 2025 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants