core: Add health_monitor service #3701
Draft: patrickelectric wants to merge 7 commits into bluerobotics:master from patrickelectric:system-health
Reviewer's Guide
Introduce a new Health Monitor backend service and frontend UI to aggregate system/vehicle health checks, expose them via an HTTP API, publish events over Zenoh, and surface them through a new /tools/health-monitor tool in the UI with filtering and history views.
Sequence diagram for frontend fetching health summary and history
sequenceDiagram
actor User
participant HealthMonitorView
participant HealthMonitorStore
participant Axios as back_axios
participant Nginx
participant HealthAPI as HealthMonitorAPI
User->>HealthMonitorView: Open /tools/health-monitor
HealthMonitorView->>HealthMonitorStore: mounted() refresh()
HealthMonitorStore->>HealthMonitorStore: setLoading(true)
HealthMonitorStore->>Axios: GET /health-monitor/v1.0/health/summary
Axios->>Nginx: HTTP request
Nginx->>HealthAPI: Proxy /health-monitor/...
HealthAPI-->>Nginx: 200 HealthSummary
Nginx-->>Axios: 200 HealthSummary
Axios-->>HealthMonitorStore: response.data
HealthMonitorStore->>HealthMonitorStore: setSummary(HealthSummary)
HealthMonitorStore->>HealthMonitorStore: setLoading(false)
HealthMonitorStore->>Axios: GET /health-monitor/v1.0/health/history?limit=200
Axios->>Nginx: HTTP request
Nginx->>HealthAPI: Proxy /health-monitor/...
HealthAPI-->>Nginx: 200 HealthHistory
Nginx-->>Axios: 200 HealthHistory
Axios-->>HealthMonitorStore: response.data
HealthMonitorStore->>HealthMonitorStore: setHistory(HealthHistory)
HealthMonitorView->>HealthMonitorView: Compute filteredActive, filteredHistory
HealthMonitorView-->>User: Render tables with filters
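The request order in the sequence diagram can be reproduced from a script when testing the API by hand. The following is a minimal Python sketch, not the frontend code from this PR: the endpoint paths and the `limit=200` default come from the diagram, while `refresh`, `http_get_json`, the injectable `get_json` parameter, and the base URL in the usage note are hypothetical names chosen for illustration.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen

# Path prefix as shown in the sequence diagram; the host is supplied by the caller.
API_PREFIX = "/health-monitor/v1.0/health"


def http_get_json(url: str) -> Any:
    """Default transport (assumption): plain HTTP GET that decodes a JSON body."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def refresh(
    base_url: str,
    limit: int = 200,
    get_json: Callable[[str], Any] = http_get_json,
) -> Dict[str, Any]:
    """Fetch the summary first and the history second, mirroring the diagram."""
    base = base_url.rstrip("/")
    summary = get_json(f"{base}{API_PREFIX}/summary")
    history = get_json(f"{base}{API_PREFIX}/history?limit={limit}")
    return {"summary": summary, "history": history}
```

Calling `refresh("http://<vehicle-address>")` would issue the two GET requests through whatever proxy fronts the service. Passing a fake `get_json` keeps the sketch testable without a running vehicle.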
Class diagram for health_monitor backend service
classDiagram
class HealthProblem {
+str id
+str severity
+str title
+str details
+str source
+int timestamp
+Dict metadata
+int first_seen_ms
+int last_seen_ms
}
class HealthEvent {
+str id
+str severity
+str title
+str details
+str source
+int timestamp
+Dict metadata
+int first_seen_ms
+int last_seen_ms
+str type
}
class HealthSummary {
+List~HealthProblem~ active
+int updated_at
}
class HealthHistory {
+List~HealthEvent~ events
}
class HealthCheckResult {
+Dict~str,HealthProblem~ active
+Dict~str,HealthProblem~ resolved
}
class ProblemRecord {
+HealthProblem problem
+int first_seen_ms
+int last_seen_ms
}
class HealthStateTracker {
-Dict~str,ProblemRecord~ _active
+diff_and_update(new_active, resolved_info) List~HealthEvent~
+active_problems() List~HealthProblem~
+_event_from_problem(event_type, problem, first_seen_ms, last_seen_ms) HealthEvent
+_build_resolved_problem(problem, timestamp) HealthProblem
}
class KernelErrorTracker {
+set ERROR_LEVELS
-int _window_ms
-int _last_sequence
-int _last_error_ms
-str _last_message
-str _last_level
+evaluate(messages, now) HealthCheckResult
}
class UsbTracker {
-Dict~str,Dict~str,Any~~ _known_devices
-Dict~str,Dict~str,Any~~ _disconnected_devices
-bool _initialized
+evaluate(serial_ports, now) HealthCheckResult
+_problem_id(device_key) str
+_device_key(port) str
+_safe_str(value) str
}
class VersionComparator {
+fix_version(tag) str
+is_semver(tag) bool
+compare(a, b) int
+latest_semver(tags) str
+_parse_semver(tag) Tuple
}
class HealthConfig {
+float interval_sec
+int history_limit
+int disk_free_bytes
+float disk_free_percent
+float memory_warn_percent
+float memory_error_percent
+int kernel_error_window_ms
+float packet_loss_ratio
+int packet_loss_count
+float extension_cpu_percent
+float extension_memory_percent
+float extension_disk_percent
+float voltage_low
+float voltage_high
}
class HealthMonitor {
-HealthStateTracker _state
-deque _history
-KernelErrorTracker _kernel_tracker
-UsbTracker _usb_tracker
-ZenohSession _zenoh
-MavlinkMessenger _mavlink
-asyncio.Event _stop_event
-aiohttp.ClientSession _http_session
+stop()
+run()
+evaluate_once()
+summary() HealthSummary
+history_view(limit) HealthHistory
+config() HealthConfig
+_fetch_json(url, timeout) Any
+_get_vehicle_sysid() int
+_get_vehicle_voltage(vehicle_sysid) float
+_is_factory_mode() bool
+_bootstrap_tag() str
+_publish_event(event)
}
HealthEvent --|> HealthProblem
HealthSummary "*" o-- HealthProblem
HealthHistory "*" o-- HealthEvent
HealthStateTracker "1" o-- "*" ProblemRecord
HealthStateTracker ..> HealthCheckResult
KernelErrorTracker ..> HealthCheckResult
UsbTracker ..> HealthCheckResult
HealthMonitor ..> HealthStateTracker
HealthMonitor ..> KernelErrorTracker
HealthMonitor ..> UsbTracker
HealthMonitor ..> HealthSummary
HealthMonitor ..> HealthHistory
HealthMonitor ..> HealthConfig
HealthMonitor ..> HealthCheckResult
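To make the role of `HealthStateTracker` more concrete, here is a minimal Python sketch of how a diff between consecutive check results could produce events. It is an illustration under stated assumptions, not the implementation in this PR: the signature is simplified to take `now_ms` instead of `resolved_info`, `ProblemRecord` is folded into `HealthProblem`, and the event type strings (`problem_detected`, `problem_updated`, `problem_resolved`) are hypothetical.

```python
from dataclasses import asdict, dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class HealthProblem:
    id: str
    severity: str
    title: str
    details: str = ""
    source: str = "system"
    timestamp: int = 0
    metadata: Dict[str, Any] = field(default_factory=dict)
    first_seen_ms: int = 0
    last_seen_ms: int = 0


@dataclass
class HealthEvent(HealthProblem):
    type: str = ""  # hypothetical values: detected / updated / resolved


class HealthStateTracker:
    def __init__(self) -> None:
        self._active: Dict[str, HealthProblem] = {}

    def active_problems(self) -> List[HealthProblem]:
        return list(self._active.values())

    def diff_and_update(
        self, new_active: Dict[str, HealthProblem], now_ms: int
    ) -> List[HealthEvent]:
        """Compare the new check result with the stored state and emit events."""
        events: List[HealthEvent] = []
        for pid, problem in new_active.items():
            previous = self._active.get(pid)
            # Keep the original first-seen time while the problem stays active.
            first_seen = previous.first_seen_ms if previous else now_ms
            current = replace(problem, first_seen_ms=first_seen, last_seen_ms=now_ms)
            if previous is None:
                events.append(self._event("problem_detected", current))
            elif (previous.severity, previous.details) != (problem.severity, problem.details):
                events.append(self._event("problem_updated", current))
            self._active[pid] = current
        # Anything stored but absent from the new result counts as resolved.
        for pid in list(self._active):
            if pid not in new_active:
                events.append(self._event("problem_resolved", self._active.pop(pid)))
        return events

    @staticmethod
    def _event(event_type: str, problem: HealthProblem) -> HealthEvent:
        return HealthEvent(**asdict(problem), type=event_type)
```

In this sketch, an unchanged problem produces no event on later evaluations; only its `last_seen_ms` is refreshed. This keeps the bounded history (`_history` in `HealthMonitor`) from filling with repeats.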
Class diagram for Health Monitor frontend store and view
classDiagram
class HealthProblemTS {
+str id
+HealthSeverity severity
+str title
+str details
+HealthSource source
+number timestamp
+Record metadata
+number first_seen_ms
+number last_seen_ms
}
class HealthEventTS {
+HealthEventType type
+str id
+HealthSeverity severity
+str title
+str details
+HealthSource source
+number timestamp
+Record metadata
+number first_seen_ms
+number last_seen_ms
}
class HealthSummaryTS {
+HealthProblemTS[] active
+number updated_at
}
class HealthHistoryTS {
+HealthEventTS[] events
}
class HealthMonitorStore {
+string API_URL
+HealthSummaryTS summary
+HealthHistoryTS history
+boolean loading
+string error
+setSummary(value) void
+setHistory(value) void
+setLoading(value) void
+setError(message) void
+fetchSummary() Promise~void~
+fetchHistory(limit) Promise~void~
}
class HealthMonitorView {
+number activeTab
+string selectedSeverity
+string selectedSource
+string search
+string historySeverity
+string historySource
+string historySearch
+number refreshTimer
+activeHeaders
+historyHeaders
+summary
+history
+loading
+error
+severityOptions
+sourceOptions
+filteredActive
+filteredHistory
+mounted()
+beforeDestroy()
+refresh()
+matchesFilters(item, severity, source, text) boolean
+severityColor(severity) string
+eventColor(eventType) string
+formatTimestamp(timestamp) string
}
HealthEventTS --|> HealthProblemTS
HealthSummaryTS "*" o-- HealthProblemTS
HealthHistoryTS "*" o-- HealthEventTS
HealthMonitorStore --> HealthSummaryTS
HealthMonitorStore --> HealthHistoryTS
HealthMonitorView --> HealthMonitorStore
HealthMonitorView --> HealthSummaryTS
HealthMonitorView --> HealthHistoryTS
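The `matchesFilters(item, severity, source, text)` method in the view suggests a simple predicate applied to both the active and history tables. The actual view is a frontend component; the Python sketch below only illustrates the likely logic. Which fields the free-text search covers (`id`, `title`, `details` here) and the treatment of empty filters as "match everything" are assumptions.

```python
from typing import Any, Dict, List, Optional

# Fields searched by the free-text filter (assumption, not taken from the PR).
SEARCH_FIELDS = ("id", "title", "details")


def matches_filters(
    item: Dict[str, Any],
    severity: Optional[str],
    source: Optional[str],
    text: Optional[str],
) -> bool:
    """Return True when the item passes every filter that is set."""
    if severity and item.get("severity") != severity:
        return False
    if source and item.get("source") != source:
        return False
    if text:
        needle = text.strip().lower()
        haystack = " ".join(str(item.get(key, "")) for key in SEARCH_FIELDS).lower()
        return needle in haystack
    return True


def filtered(
    items: List[Dict[str, Any]],
    severity: Optional[str] = None,
    source: Optional[str] = None,
    text: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Counterpart of the filteredActive / filteredHistory computed properties."""
    return [item for item in items if matches_filters(item, severity, source, text)]
```

Using one predicate for both tables would let the history tab reuse the same behaviour with its own `historySeverity`, `historySource`, and `historySearch` values.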
28f4003 to 2f3477a
Signed-off-by: Patrick José Pereira <[email protected]>
2f3477a to 43b373f
Signed-off-by: Patrick José Pereira <[email protected]>
Summary by Sourcery
Introduce a Health Monitor backend service and frontend UI to track and surface system and vehicle health issues.