Feature/Integrated Dataset Pages #3859
…nt datasets * feat(integrated-dataset): combine contributors from parent datasets * feat(integrated-dataset): deduplicate people by ORCiD if available, and sort by average index * fix(integrated-dataset): fix type errors * style(integrated-dataset): code style feedback
…ntegrated Datasets (#3887)
Co-authored-by: Matt Yoder <[email protected]>
Pull request overview
This PR introduces a new “integrated dataset” detail-page variant and supporting infrastructure for integrated/derived datasets, along with related UX and data-access improvements across search, detail pages, bulk download, and metadata handling.
Changes:
- Add an Integrated Dataset detail page variant (including visualization, files, integrated data tables, provenance, analysis details, bulk transfer, and attribution) and route-level wiring to select it based on is_integrated.
- Refactor shared table, metadata, and download utilities (e.g., EntitiesTables, metadata TSV download, useDownloadTSV, DownloadsDropdownMenu) to support richer actions (saving, workspace actions, bulk download, TSV export) and integrated entity workflows.
- Improve protocol URL handling (including GitHub links), visualization loading UX, publication data display (integrated data tables + collections), and several smaller UX/RBAC improvements (dataset access warnings, workspace buttons visibility, etc.).
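
To illustrate the protocol URL handling mentioned in the last item, here is a minimal sketch of separating GitHub links from other protocol links. The names `isGithubUrlSketch` and `splitProtocolUrlsSketch` are hypothetical, and the assumption that several URLs arrive in one comma- or whitespace-separated string is mine; the PR's own `isGithubUrl` and `useFormattedProtocolUrls` may behave differently.

```typescript
// Hypothetical helper: true when the host is github.com or a subdomain of it.
// The scheme is optional because protocol URLs are assumed to sometimes be
// stored without one (e.g. "dx.doi.org/...").
function isGithubUrlSketch(url: string): boolean {
  return /^(?:https?:\/\/)?(?:[\w-]+\.)*github\.com(?:[/:?#]|$)/i.test(url.trim());
}

interface SplitProtocolUrls {
  github: string[];
  other: string[];
}

// Assumption: multiple URLs are separated by commas and/or whitespace.
function splitProtocolUrlsSketch(raw: string): SplitProtocolUrls {
  const result: SplitProtocolUrls = { github: [], other: [] };
  raw
    .split(/[\s,]+/)
    .filter((url) => url.length > 0)
    .forEach((url) => {
      (isGithubUrlSketch(url) ? result.github : result.other).push(url);
    });
  return result;
}
```

Keeping the two groups apart lets a component render repository links separately from protocol links, which is the behavior the file summary attributes to Protocol.tsx.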
Reviewed changes
Copilot reviewed 82 out of 84 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| context/app/utils.py | Extends should_redirect_entity to not redirect integrated datasets, allowing them to have their own detail pages. |
| context/app/static/js/typings/search.ts | Extends search document typings to include donor BMI fields and dataset description, enabling new columns and filters. |
| context/app/static/js/shared-styles/tables/columns.tsx | Adds dataset access warning component, new donor and description columns, safer joins for array fields, and wiring for sortable/width-configured columns. |
| context/app/static/js/shared-styles/tables/NumSelectedHeader/style.ts | Makes the header wrapper optionally omit its bottom border for use inside composite header rows. |
| context/app/static/js/shared-styles/tables/NumSelectedHeader/NumSelectedHeader.tsx | Reuses the new HeaderWrapperProps for consistent typing and styling. |
| context/app/static/js/shared-styles/tables/EntitiesTable/types.ts | Extends EntitiesTabTypes to support per-tab headerActions, initial sort state, and tooltip text. |
| context/app/static/js/shared-styles/tables/EntitiesTable/EntityTableHeaderCell.tsx | Adjusts header cell flex alignment and text wrapping to better accommodate filter controls and long labels. |
| context/app/static/js/shared-styles/tables/EntitiesTable/EntityTable.tsx | Allows non-selectable tables, adds an optional extra sticky header row for selection and header actions, and correctly manages sticky offsets. |
| context/app/static/js/shared-styles/tables/EntitiesTable/EntitiesTables.tsx | Refactors into separate tabs/bodies components, adds skeleton loading, per-tab emptiness handling, selection reset on tab change, and support for inline header actions. |
| context/app/static/js/shared-styles/icons/sectionIconMap.ts | Adds icons for new integrated-data and protocols-and-workflow-details sections and normalizes visualization icons. |
| context/app/static/js/pages/Sample/Sample.tsx | Switches to DetailContextProvider and passes explicit entity type, simplifying and standardizing detail context setup. |
| context/app/static/js/pages/Publication/Publication.tsx | Uses DetailContextProvider, replaces publication-specific bulk transfer with the generic BulkDataTransfer (using ancestor dataset UUIDs), and wires in the new PublicationsDataSection. |
| context/app/static/js/pages/Donor/Donor.tsx | Uses DetailContextProvider with entity type, aligning donor detail pages with the new context model. |
| context/app/static/js/pages/Dataset/utils.ts | Adds combinePeopleLists to deduplicate and order contributors/contacts across multiple datasets using ORCID or name+affiliation. |
| context/app/static/js/pages/Dataset/hooks.ts | Renames the processed dataset section key to protocols-and-workflow-details to match new icons and section components. |
| context/app/static/js/pages/Dataset/IntegratedDataset.tsx | Implements the new Integrated Dataset detail page (summary, visualization, integrated data tables, metadata, files, bulk transfer, provenance, analysis details, collections, publications, and attribution). |
| context/app/static/js/pages/Dataset/DatasetPageSummaryChildren.tsx | Adds reusable summary content that links to assay docs and organ pages, aggregating organs across origin samples. |
| context/app/static/js/pages/Dataset/Dataset.tsx | Wires integrated dataset variant, reuses SummaryDataChildren, routes processed-data section through ProcessedData, and delegates to IntegratedDatasetPage when appropriate. |
| context/app/static/js/hooks/useUBKG.ts | Hardens cellTypeDetail against empty IDs, avoiding unnecessary requests with empty identifiers. |
| context/app/static/js/hooks/useSearchData.ts | Fixes multisearch to send the correct number of Elasticsearch endpoints (one per request), aligning with multiFetch expectations. |
| context/app/static/js/hooks/useProtocolData.ts | Refactors protocol URL formatting to separate Protocols.io vs GitHub URLs and handle multiple/variant URL formats more robustly. |
| context/app/static/js/hooks/useProtocolData.spec.ts | Adds TypeScript tests for useFormattedProtocolUrls and isGithubUrl, covering mixed URL formats and edge cases. |
| context/app/static/js/hooks/useProtocolData.spec.js | Removes the legacy JS test file in favor of the new TS test suite. |
| context/app/static/js/hooks/useEntityData.ts | Extends useEntitiesData with shouldFetch and useDefaultQuery options to better control downstream useSearchHits behavior. |
| context/app/static/js/hooks/useDownloadTSV.ts | Introduces a reusable hook for TSV downloads driven by search/lineup entities, including analytics and error handling. |
| context/app/static/js/components/types.ts | Extends Entity and Dataset types with immediate ancestor/descendant IDs and is_integrated, enabling integrated-dataset-specific logic. |
| context/app/static/js/components/search/MetadataMenu/DownloadTSVItem.tsx | Refactors TSV download item to delegate to useDownloadTSV, simplifying local state and behavior. |
| context/app/static/js/components/publications/PublicationsDataSection/index.ts | Exports the new TypeScript PublicationsDataSection implementation. |
| context/app/static/js/components/publications/PublicationsDataSection/PublicationsDataSection.tsx | Reimplements publication “Data” section using integrated data tables plus collections, even when an associated collection is present. |
| context/app/static/js/components/publications/PublicationsDataSection/PublicationsDataSection.jsx | Removes the old JS implementation that used PublicationRelatedEntities. |
| context/app/static/js/components/publications/PublicationRelatedEntities/index.ts | Removes unused index barrel for the old related-entities component. |
| context/app/static/js/components/publications/PublicationRelatedEntities/hooks.ts | Removes the specialized publication-related-entities search hooks in favor of integrated data logic. |
| context/app/static/js/components/publications/PublicationRelatedEntities/PublicationRelatedEntities.tsx | Removes the old related-entities UI, replaced by integrated data tables. |
| context/app/static/js/components/publications/PublicationCollections/PublicationCollections.tsx | Simplifies the collections sub-section to render within the new “Data” section instead of owning its own section wrapper. |
| context/app/static/js/components/detailPage/visualization/style.ts | Centralizes the vitessceFixedHeight constant for reuse across visualization wrappers and skeletons. |
| context/app/static/js/components/detailPage/visualization/VitessceSkeleton/index.ts | Adds barrel export for the new Vitessce skeleton component. |
| context/app/static/js/components/detailPage/visualization/VitessceSkeleton/VisualizationSkeleton.tsx | Implements a skeleton layout approximating Vitessce’s multi-panel visualization while data is loading. |
| context/app/static/js/components/detailPage/visualization/VisualizationWrapper/style.ts | Simplifies wrapper styling to use the shared vitessceFixedHeight and drops the now-unused generic background wrapper. |
| context/app/static/js/components/detailPage/visualization/VisualizationWrapper/VisualizationWrapper.tsx | Switches Suspense fallback to the new visualization skeleton for a smoother loading experience. |
| context/app/static/js/components/detailPage/visualization/VisualizationWrapper/VisualizationSuspenseFallback.tsx | Reworks fallback to use the skeleton plus header, decoupled from being the Suspense fallback. |
| context/app/static/js/components/detailPage/visualization/Visualization/style.ts | Moves the fixed Vitessce height into a shared style file and keeps visualization styling/export surface consistent. |
| context/app/static/js/components/detailPage/visualization/Visualization/Visualization.tsx | Uses the new shared height and skeleton when Vitessce config is unavailable, improving user feedback during load. |
| context/app/static/js/components/detailPage/visualization/IntegratedDatasetVisualizationSection/index.ts | Barrel export for the integrated-dataset visualization section. |
| context/app/static/js/components/detailPage/visualization/IntegratedDatasetVisualizationSection/IntegratedDatasetVisualizationSection.tsx | Adds a dedicated “Visualization” section for integrated datasets with descriptive copy and wrapped Vitessce viewer. |
| context/app/static/js/components/detailPage/provenance/ProvTabs/ProvTabs.tsx | Adds an integratedDataset mode that skips tabs and shows only the graph (with large-graph warning) for integrated datasets. |
| context/app/static/js/components/detailPage/provenance/ProvSection/ProvSection.tsx | Wires integratedDataset through to ProvTabs, preserving other provenance behavior. |
| context/app/static/js/components/detailPage/multi-assay/MultiAssayMetadataTabs/MultiAssayMetadataTabs.tsx | Enhances multi-assay metadata tabs to disambiguate duplicate labels by appending HuBMAP IDs. |
| context/app/static/js/components/detailPage/files/file-fixtures.spec.ts | Extends test fixtures with entityType to match the new DetailContext shape. |
| context/app/static/js/components/detailPage/files/MultiFileDownloader/MultiFileDownloader.tsx | Changes to accept UnprocessedFile[] and generate download URLs via useFileLinks. |
| context/app/static/js/components/detailPage/files/IntegratedDatasetFiles/index.ts | Barrel export for the integrated dataset files section. |
| context/app/static/js/components/detailPage/files/IntegratedDatasetFiles/IntegratedDatasetFiles.tsx | Adds an integrated dataset-specific “Files” section wired to FilesTabs and DetailContext. |
| context/app/static/js/components/detailPage/files/FilesTabs/index.ts | Barrel export for the new shared FilesTabs component. |
| context/app/static/js/components/detailPage/files/FilesTabs/FilesTabs.tsx | Consolidates “Data Products” and “File Browser” into a single tabbed UI, tracking tab changes for analytics. |
| context/app/static/js/components/detailPage/files/Files/Files.spec.tsx | Updates tests to use DataProductProvider instead of processed dataset context for file-link generation. |
| context/app/static/js/components/detailPage/files/FileBrowserNode/FileBrowserNode.spec.tsx | Adjusts tests to wrap in DataProductProvider to satisfy new context requirements. |
| context/app/static/js/components/detailPage/files/FileBrowserFile/FileBrowserFile.spec.tsx | Similar test adjustments to use DataProductProvider. |
| context/app/static/js/components/detailPage/files/FileBrowser/FileBrowser.spec.tsx | Updates tests for the new DataProduct-based context usage. |
| context/app/static/js/components/detailPage/files/DataProducts/hooks.ts | Refactors file-link creation to use a new DataProductContext for dataset UUIDs instead of processed dataset context. |
| context/app/static/js/components/detailPage/files/DataProducts/DataProducts.tsx | Switches “Download All” behavior to pass raw UnprocessedFile[] into MultiFileDownloader; continues using file links internally. |
| context/app/static/js/components/detailPage/files/DataProducts/DataProducts.spec.tsx | Adds DataProductProvider to tests to match new context requirements and extends detail context with entityType. |
| context/app/static/js/components/detailPage/files/DataProducts/DataProductContext.tsx | Introduces a simple context/provider for the dataset UUID used in data-product file links. |
| context/app/static/js/components/detailPage/files/DataProducts/DataProduct.tsx | Performs minor cleanup (removal of commented pipeline-info import) while still using useFileLink. |
| context/app/static/js/components/detailPage/entityHeader/EntityHeaderActionButtons/EntityHeaderActionButtons.tsx | Gates the Processed Data Workspace menu behind isWorkspacesUser, consistent with workspace access controls. |
| context/app/static/js/components/detailPage/Protocol/Protocol.tsx | Integrates new formatted protocol URL structure and adds support for rendering GitHub repository links separately. |
| context/app/static/js/components/detailPage/ProcessedData/ProcessedDataset/ProcessedDataset.tsx | Refactors file tabbing to use shared FilesTabs, standardizes the protocols/workflow section id, and updates copy/analytics hooks. |
| context/app/static/js/components/detailPage/MetadataSection/hooks.ts | Adds a hook to compute per-entity metadata tables and TSV download URLs using UBKG field descriptions. |
| context/app/static/js/components/detailPage/MetadataSection/columns.ts | Expands TSV columns to include HuBMAP ID and entity label alongside key/value/description. |
| context/app/static/js/components/detailPage/MetadataSection/MetadataSection.tsx | Refactors metadata rendering to use the new hook and table layout, adds a primary “Download” button, and adjusts dataset descriptions for integrated datasets. |
| context/app/static/js/components/detailPage/IntegratedData/index.ts | Barrel export for the integrated data section. |
| context/app/static/js/components/detailPage/IntegratedData/IntegratedDataTables.tsx | Implements integrated data tables over donors, samples, and datasets using EntitiesTables with rich header actions. |
| context/app/static/js/components/detailPage/IntegratedData/IntegratedData.tsx | Adds the “Integrated Data” section (with TSV download for all integrated entities) used on integrated dataset and publication pages. |
| context/app/static/js/components/detailPage/DetailContext.tsx | Extends detail context with entityType and provides a typed DetailContextProvider used across entity pages. |
| context/app/static/js/components/detailPage/BulkDataTransfer/PublicationBulkDataTransfer.tsx | Removes the publication-specific bulk transfer wrapper in favor of the generalized section. |
| context/app/static/js/components/detailPage/BulkDataTransfer/BulkDataTransferSection.tsx | Generalizes bulk data transfer to support both regular datasets and integrated entities (dataset or publication) including CLT instructions and UUID sets. |
| context/app/static/js/components/detailPage/Attribution/Attribution.tsx | Adds specialized attribution descriptions for integrated and external datasets while retaining existing behavior for standard datasets. |
| context/app/static/js/components/detailPage/AnalysisDetails/AnalysisDetailsSection.tsx | Introduces a “Protocols & Workflow Details” section combining protocol links and analysis details with graceful fallbacks. |
| context/app/static/js/components/detailPage/AnalysisDetails/AnalysisDetails.tsx | Small type-cleanup (rename of props interface) for clarity. |
| context/app/static/js/components/data-transfer/DownloadsDropdownMenu/index.ts | Barrel export for the new downloads dropdown menu. |
| context/app/static/js/components/data-transfer/DownloadsDropdownMenu/DownloadsDropdownMenu.tsx | Adds a reusable “Download” dropdown (bulk datasets + TSV metadata) wired to selection state and access checks. |
| context/app/static/js/components/Routes/Routes.jsx | Threads the integrated flag from Flask data into the Dataset route to select the integrated dataset variant. |
| context/app/static/js/components/Contexts.tsx | Extends FlaskDataContextType with an integrated flag for integrated datasets. |
| context/app/routes_browse.py | Reads is_integrated from entities and injects it into flask_data for the React app to consume. |
| CHANGELOG-integrated-dataset-pages.md | Adds changelog entry for the introduction of integrated dataset pages. |
| CHANGELOG-integrated-dataset-pages-continued.md | Adds follow-on changelog entries for bulk data transfer and publication data section behavior. |
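
The combinePeopleLists entry above, together with the commit note about deduplicating people by ORCID and sorting by average index, can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the `Person` field names and the function name `combinePeopleSketch` are hypothetical, and keeping the first-seen record for a duplicate is my choice.

```typescript
// Hypothetical person shape; the real contributor/contact type may differ.
interface Person {
  name: string;
  affiliation: string;
  orcid?: string;
}

// Prefer ORCID as the identity key; fall back to name + affiliation.
function personKey({ name, affiliation, orcid }: Person): string {
  return orcid ? `orcid:${orcid}` : `name:${name}|${affiliation}`;
}

// Merge several ordered lists into one, keeping the first record seen for
// each person and ordering by the person's average position across the
// lists in which they appear.
function combinePeopleSketch(lists: Person[][]): Person[] {
  const seen = new Map<string, { person: Person; indexSum: number; count: number }>();
  for (const list of lists) {
    list.forEach((person, index) => {
      const key = personKey(person);
      const entry = seen.get(key);
      if (entry) {
        entry.indexSum += index;
        entry.count += 1;
      } else {
        seen.set(key, { person, indexSum: index, count: 1 });
      }
    });
  }
  return Array.from(seen.values())
    .sort((a, b) => a.indexSum / a.count - b.indexSum / b.count)
    .map(({ person }) => person);
}
```

Averaging positions means someone listed early in most parent datasets stays near the top of the combined list, even if one dataset lists them late.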
See #3899 for up-to-date screenshots. The review of that PR covered all currently present functionality.

    const labelCounts: Record<string, number> = {};
    entities.forEach(({ label }) => {
      labelCounts[label] = (labelCounts[label] || 0) + 1;
    });

    const deduplicated = entities.map(({ label, hubmap_id, tableRows, ...rest }) => {
      const count = labelCounts[label];
      return {
        hubmap_id,
        tableRows,
        label:
          count > 1 ? (
            <Stack key={hubmap_id} spacing={0} alignItems="center">
              <div>{label}</div>
              <div>({hubmap_id})</div>
            </Stack>
          ) : (
            label
          ),
        ...rest,
      };
    });
    return deduplicated;

Should we dedupe before counting? Also counting feels like the use case for reduce.
This logic is meant to disambiguate cases where there is more than one ancestor of a given type - e.g. if an object x analyte has 20 different datasets of the same data type, but with different metadata, this processing avoids having two labels with the same title by appending the HuBMAP ID.
As a result, we do need the counts - but calling it "deduplicated" is definitely unclear.
I'll revise this to use a .reduce call for the initial count aggregation and rename things accordingly, but the logic is otherwise sound.
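
One possible shape for that revision, sketched without JSX so it stands alone: the counts are aggregated with `.reduce`, and the result is named for what it does (disambiguating labels) rather than "deduplicated". The names `countLabels` and `disambiguateLabels` are hypothetical, and the plain-string label stands in for the `Stack` element in the snippet under review.

```typescript
// Hypothetical minimal shape; the entities in the PR also carry tableRows
// and other fields, which the generic parameter below passes through.
interface LabeledEntity {
  label: string;
  hubmap_id: string;
}

// Aggregate how many entities share each label. A Map avoids collisions
// with inherited object keys such as "constructor".
function countLabels(entities: LabeledEntity[]): Map<string, number> {
  return entities.reduce(
    (counts, { label }) => counts.set(label, (counts.get(label) ?? 0) + 1),
    new Map<string, number>(),
  );
}

// Append the HuBMAP ID only to labels that occur more than once, so every
// entity is kept and unique labels stay untouched.
function disambiguateLabels<T extends LabeledEntity>(entities: T[]): T[] {
  const labelCounts = countLabels(entities);
  return entities.map((entity) =>
    (labelCounts.get(entity.label) ?? 0) > 1
      ? { ...entity, label: `${entity.label} (${entity.hubmap_id})` }
      : entity,
  );
}
```

No entity is dropped here, which matches the reply above: the counts exist to decide which labels need the ID suffix, not to remove duplicates.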
john-conroy left a comment
Looks good! Most of the comments resolved in earlier PRs. Thanks!
Summary
This branch contains the integrated dataset pages functionality.
Context:
We've received recurring asks for dedicated detail pages for EPICs, rather than having them redirect to the parent dataset. I am taking care of early prototyping then handing off any remaining TODOs to polish the functionality and accelerate development.
Lisa also highlighted that we have received previous feedback that the representation of SNARE-Seq2 data in the portal is not ideal. EPICs and SNARE-Seq2 are both instances where users' interest is in processed outputs that integrate data from multiple component datasets, rather than individual datasets' raw data. Therefore, an Integrated Dataset detail page handles both use-cases, while also allowing us to avoid presenting the internal "EPIC" name which users are not familiar with.
When users navigate directly to an integrated dataset's detail page (from dataset search, from a publication's list of datasets, from a copied/pasted URL, etc.), they no longer see the unified view that focuses on the raw data. Instead, an alternative view prioritizes the visualization of the processed data and provides tools to read the metadata and download the bulk data for the integrated analysis and its component datasets.
Integrated datasets will also remain visible in the processed data section of their component datasets.
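
The navigation behavior described above can be summarized in a small sketch. Only the `is_integrated` flag comes from this PR; the `is_processed` field, the view names, and the function `resolveDatasetViewSketch` are hypothetical stand-ins, and the rule that other processed datasets redirect to their parent is an assumption based on the redirect behavior mentioned in the Context section.

```typescript
// Hypothetical view names for illustration only.
type DatasetView = "integrated-detail" | "redirect-to-parent" | "unified-detail";

interface DatasetLike {
  uuid: string;
  is_integrated?: boolean; // flag introduced by this PR
  is_processed?: boolean; // assumed stand-in for the existing redirect condition
}

function resolveDatasetViewSketch(dataset: DatasetLike): DatasetView {
  // Integrated datasets get their own detail page instead of redirecting.
  if (dataset.is_integrated) {
    return "integrated-detail";
  }
  // Assumption: other processed datasets keep redirecting to the parent.
  if (dataset.is_processed) {
    return "redirect-to-parent";
  }
  // Everything else renders the unified view focused on the raw data.
  return "unified-detail";
}
```

Checking `is_integrated` first is the important part: an integrated dataset is also a processed output, so it would otherwise fall into the redirect branch.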
Designs are here: https://www.figma.com/design/LExciZTIeYVDkSYnqY7ebe/Integrated-Dataset-Detail-Pages?node-id=1371-3815&t=D21hJHYEi5MjcAV3-1
TODOs
- is_integrated logic to search api transformations
Design Documentation/Original Tickets
https://www.figma.com/design/LExciZTIeYVDkSYnqY7ebe/Integrated-Dataset-Detail-Pages?node-id=1371-3815&t=gNbcm7gK0d7Eb4FG-1
Testing
Describe how the feature has been tested, including both automated and manual testing strategies.
Screenshots/Video
Include screenshots or video demonstrating any significant visual or behavioral changes.
Checklist
- CHANGELOG-your-feature-name-here.md is present in the root directory, describing the change(s) in full sentences.
Additional Notes