-
Notifications
You must be signed in to change notification settings - Fork 409
Add tracing service with in memory span store #2119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add proto definitions for TracingService with span ingestion and search capabilities. Includes generated code for backend (Go) and frontend (TypeScript) with ConnectRPC support.
- Add buf.build/redpandadata/otel dependency - Update protobuf to v1.36.11 - Add [email protected] for trace visualization - Update related transitive dependencies
Add new traces page accessible via /traces route with activity icon in navigation. Includes React Query integration for TracingService API.
Add enableTracingInConsole feature flag and TracingService feature constant. Regenerate OpenAPI specs with tracing endpoints.
…-store Resolved merge conflicts in: - frontend/src/components/constants.ts: Added both enablePipelineServiceAccount and enableTracingInConsole flags - frontend/src/components/routes.tsx: Integrated Activity icon into centralized icon system - frontend/src/components/icons/index.tsx: Added ActivityIcon export for tracing feature - Auto-generated files regenerated: openapi.json, openapi.v1alpha3.json, yarn.lock
|
The latest Buf updates on your PR. Results from workflow Buf CI / validate (pull_request).
|
- Fix CSS class ordering (transform utilities to end) - Simplify return statements (remove explicit undefined) - Fix import ordering in test files - Refactor span classifier to use OTel semantic conventions - Add time window refresh on manual refetch - Improve trace details panel overflow handling
| } | ||
|
|
||
| // Recursively extract the actual value from protobuf-es structure | ||
| const extractProtoValue = (value: unknown): unknown => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to use protovalidate-es package for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm not sure I understand this comment. We are handling the response here, which is served by the backend and can be considered "validated" I guess. The rest of the code is more transformation / extraction and not validation. What did you envision here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair - since we are not performing any mutations here, protovalidate-es is not required
frontend/src/protogen/redpanda/api/dataplane/v1alpha3/tracing_pb.ts
Outdated
Show resolved
Hide resolved
frontend/src/components/pages/traces/utils/span-tree-builder.ts
Outdated
Show resolved
Hide resolved
| // Cleanup timeout on unmount | ||
| useEffect( | ||
| () => () => { | ||
| if (copyTimeoutRef.current) { | ||
| clearTimeout(copyTimeoutRef.current); | ||
| } | ||
| }, | ||
| [] | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? I think this is opting out of some of the React renderer behavior, kind of hard to grasp, I would like to understand what's the problem we tried to solve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment:
// Reset "Copied!" message after 2 seconds. Without the cleanup (return statement),
// if the sheet closes within 2s, setTimeout would still fire and call setState on
// an unmounted component, causing React warnings and wasted work.
The explicit date-fns@^4.1.0 dependency added in commit 989fe49 caused build failures due to incompatibility with [email protected] used by @redpanda-data/[email protected]. date-fns-tz v2 imports internal date-fns paths (./format/index.js, ./_lib/cloneObject/index.js, etc.) that are not exported in date-fns v4. By removing the explicit dependency, date-fns naturally resolves to v2 through the dependency tree, which is compatible with all packages.
- franz-go v1.20.4 -> v1.20.6 - franz-go/pkg/sr v1.5.0 -> v1.6.0 - Update transitive dependencies Aligns with enterprise backend dependency versions.
- Rename serviceName to rootServiceName across traces UI - Use protobuf Duration type instead of durationMs bigint - Add spinner components to loading states - Update test helpers and test cases
- Use TraceHistogram from API instead of client-side bucketing - Add hover tooltips with success/error breakdown - Show visible window indicator for loaded traces - Add stacked bars for success/error counts - Add header with "Showing X of Y" summary - Add legend for chart elements
- Accumulate traces across pages with "Load 100 older traces" button - Add jump navigation by clicking histogram buckets - Show "Back to newest" button when in jumped mode - Add fastFailRetry utility for deterministic gRPC error handling - Track initial histogram separately from paginated results
Keep the "Load older traces" button visible with spinner while fetching instead of hiding it during the loading period.
- Add calculateNiceTicks using Nice Numbers algorithm for clean axis values (e.g., 0, 20, 40, 60 instead of 0, 33, 66) - Add Y-axis labels and subtle horizontal grid lines - Increase top padding for better visual spacing
…nt refresh state - Extract resetQueryState callback to DRY up duplicated reset logic - Replace useEffect timeRange watcher with handleTimeRangeChange handler to prevent double-queries from nuqs hydration - Remove isRefreshing state, use isLoading from query instead - Remove unused refetch() - nowMs change triggers automatic query refetch
| * in { case: 'typeValue', value: actualValue } objects. This function unwraps them | ||
| * into plain JavaScript values for easier handling in the UI. | ||
| */ | ||
| const extractProtoValue = (value: AnyValue | undefined): unknown => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if possible, could we generate some unit test for this? I think it'd give us some confidence when modifying this in the future and also serve as a documentation.
| } | ||
| }; | ||
|
|
||
| const getAttributeValue = (value: AnyValue | undefined): string => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if possible, could we generate some unit test for this? I think it'd give us some confidence when modifying this in the future and also serve as a documentation.
| }; | ||
|
|
||
| export const AttributesTab: FC<Props> = ({ span }) => { | ||
| const [searchQuery, setSearchQuery] = useState(''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we try to persist this to URL so that the URLs are shareable? (in case it makes sense semantically) We either can use useQueryState from nuqs or our wrapper useQueryStateWithCallback if we want to also involve localstorage default.
| <div className="relative"> | ||
| <Search className="pointer-events-none absolute top-1/2 left-2 h-3 w-3 -translate-y-1/2 text-muted-foreground" /> | ||
| <Input | ||
| className="h-7 pl-7 text-xs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add testid please, so that we can more easily generate e2e.
| const SMALL_PAYLOAD_THRESHOLD = 2 * 1024; // 2KB | ||
| const PREVIEW_LINES = 3; | ||
|
|
||
| const formatBytes = (bytes: number): string => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move to utils and unit test?
| className="flex h-auto w-full items-center justify-between px-0 py-0 text-left" | ||
| onClick={() => setIsExpanded(!isExpanded)} | ||
| type="button" | ||
| variant="ghost" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add testid, and possbily aria-expanded={isExpanded} for better e2e experience.
| ) : ( | ||
| <ChevronRight className="h-3 w-3 text-muted-foreground" /> | ||
| )} | ||
| <h5 className="font-medium text-[10px] text-muted-foreground uppercase tracking-wide">{title}</h5> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ideally we should try to stick with standard text-sm text-xl and avoid explicitelly setting px.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use Typography heading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use Typography heading
| if (messages.length === 0) { | ||
| return ( | ||
| <Empty> | ||
| <EmptyHeader> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally if we could also add some testids
| } | ||
| }; | ||
|
|
||
| const isJsonContent = (content: string): boolean => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also move this to utils?
| return `${content.slice(0, MAX_PAYLOAD_SIZE)}\n\n[... truncated ${content.length - MAX_PAYLOAD_SIZE} characters]`; | ||
| }; | ||
|
|
||
| const isJsonContent = (content: string): boolean => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplciated with other fn isJSON from raw-json-tab and if moved to utils I think it'd be better
jvorcak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of these are nits, but I think in general we should
- Aim for better a11y
- Provide proper testids where needed
- Try to generate e2e tests
- Move as much utility fns to utils and unit test them.
frontend/src/components/pages/traces/components/tool-call-tab.tsx
Outdated
Show resolved
Hide resolved
frontend/src/components/pages/traces/components/trace-details-sheet.tsx
Outdated
Show resolved
Hide resolved
frontend/src/components/pages/traces/components/trace-details-sheet.tsx
Outdated
Show resolved
Hide resolved
frontend/src/components/pages/traces/components/trace-waterfall.tsx
Outdated
Show resolved
Hide resolved
Extract helper components to reduce cognitive complexity below lint threshold (15): - trace-activity-chart: add block statements to getBarColorClass - traces-table: extract RootTraceServiceBadge, RootTraceDurationCell - trace-list-page: extract calculateVisibleWindow, TracesStatsRow, LoadMoreButton, TraceListToolbar
| "test": "bun run test:ci", | ||
| "test:unit": "TZ=GMT vitest run --config=vitest.config.unit.mts", | ||
| "test:integration": "NODE_OPTIONS='--loader=./tests/css-loader.mjs' TZ=GMT vitest run --config=vitest.config.integration.mts", | ||
| "test:integration": "NODE_OPTIONS='--max_old_space_size=4096 --loader=./tests/css-loader.mjs' TZ=GMT vitest run --config=vitest.config.integration.mts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this end up helping a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still looking into the tests, I added it because of the OOM logs we saw, but still unsure if this is helping
| ) : ( | ||
| <ChevronRight className="h-3 w-3 text-muted-foreground" /> | ||
| )} | ||
| <h5 className="font-medium text-[10px] text-muted-foreground uppercase tracking-wide">{title}</h5> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use Typography heading
| ) : ( | ||
| <ChevronRight className="h-3 w-3 text-muted-foreground" /> | ||
| )} | ||
| <h5 className="font-medium text-[10px] text-muted-foreground uppercase tracking-wide">{title}</h5> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use Typography heading
|
|
||
| // Use useState to lazily initialize the QueryClient once and keep it stable across re-renders | ||
| // This prevents the client from being recreated when the Wrapper re-renders | ||
| const [queryClient] = useState( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you notice a significant improvement? I guess when running locally we don't encounter this problem because query client is a singleton, but I suppose it can be quite counterproductive when running a lot of tests
- Move formatBytes, formatTime, getPreview to trace-formatters.ts - Move groupTracesByDate, calculateVisibleWindow to trace-statistics.ts - Add unit tests for both modules
- Add ARIA roles, labels, and controls for tabs, buttons, expandable sections - Add data-testid attributes for E2E testing - Mark decorative icons with aria-hidden
- Add trace-page utility with page object patterns - Add trace-workflow spec covering list, detail view, and navigation
Move truncateContent and formatJsonContent to trace-formatters.ts. Replace local isJsonContent with existing tryParseJson utility.
Persist tab selection in URL using nuqs. Replace manual copy state management with sonner toast notifications.
This PR covers the tracing UI that is served by the new TracingService RPC. The tracing sidebar item is hidden/shown conditionally depending on the console backend config.