Releases: data-dot-all/dataall

v2.9.0

20 Oct 11:14
e9f7d11

Release v2.9.0

What's Changed

🚀 Major Features

🔧 Technical Upgrades

🔒 Security & Permission Hardening

✨ Enhancements

🐛 Bug Fixes

📦 Dependency Updates

New Contributors

Full Changelog: v2.8.0...v2.9.0

v2.8.0

08 Jul 02:21
12f5893

Summary

🚨 This release includes a major infrastructure update that automates the migration from Aurora V1 to V2 with minimal configuration changes. 🚨 It also delivers key improvements such as asynchronous notifications in metadata form for enforcement rules and the migration of the user guide to GitHub Pages. As part of keeping the infrastructure up to date, Node.js 18 (now deprecated and nearing end-of-life) has been removed and replaced with Node.js 22 for CDK code builds and cdk synth. UI bugs identified after the 2.7.0 release have been addressed, and additional issues in the metadata forms module have been resolved. Integration test performance has been improved through faster execution and role configuration, and database session handling has been made more robust. The release also includes important security and compatibility updates to dependencies like urllib3 and requests.

What's Changed

Major changes

Other changes

Features and Enhancements

Bug Fixes

Security

Dependency Updates

Full Changelog: v2.7.0...v2.8.0-rc

v2.7.0

30 May 09:08
667c2ad

The data.all 2.7.0 release places a strong emphasis on fortifying platform security, while simultaneously delivering significant new capabilities. Major advancements, such as the robust Amazon Redshift integration with enhanced sharing controls and the introduction of row and column level data filtering, dramatically improve granular access governance for diverse data assets. Furthermore, dynamic metadata forms now enable programmatic enforcement of security policies, adding another layer of data protection. These pivotal features are backed by comprehensive security enhancements including strengthened input validation, critical dependency upgrades, platform hardening (like S3 bucket versioning), improved logging and monitoring, and advanced network security controls, all contributing to a more secure and resilient data ecosystem.

Finally, a warm welcome to @anushka-singh, @rbernotas, and @TejasRGitHub from Yahoo, who join data.all's maintainers team.

What's Changed

Security Related Changes

  • fix DatabaseResourceArn SSM param by @petrkalos in #1398
  • Add init for resource lock by @noah-paige in #1426
  • Fix: Typo, missing @staticmethod in ResourcePolicyRepository method by @dlpzx in #1439
  • Redshift data sharing - Cluster encryption guardrails and information by @dlpzx in #1447
  • update checkov baseline for cdk synth output by @noah-paige in #1450
  • Updated glue crawler security config by @mourya-33 in #1434
  • allow dbmigrations lambda to invoke any alembic command by @petrkalos in #1488
  • Import Datasets: Validate that bucket is unique by @SofiaSazonova in #1498
  • check bucket encryption type: key|alias by @SofiaSazonova in #1499
  • Validate imported resource names via NamingConventionService by @SofiaSazonova in #1501
  • S3Bucket WRITE/MODIFY permissions by @petrkalos in #1472
  • Allow origins conf changes by @mourya-33 in #1486
  • fix importing sse encrypted buckets by @petrkalos in #1514
  • Redshift data sharing - Add interface for share validations and Redshift guardrails by @dlpzx in #1484
  • Update baseline removing checkov exception for glue security config by @noah-paige in #1516
  • Add External Id Conditions to Deployment Roles by @noah-paige in #1521
  • Add bucket versioning by @noah-paige in #1522
  • Add bucket versioning pt 2 by @noah-paige in #1529
  • Increase access point creation buffer time and fix bug in share cross account if condition by @SofiaSazonova in #1552
  • Bandit fix: explicitly install typing-extensions by @SofiaSazonova in #1600
  • New permission model for Redshift ADMIN connections by @dlpzx in #1573
  • warn users when evaluating a non-readonly share request by @petrkalos in #1568
  • try to create AP every time, catch if already exists by @SofiaSazonova in #1609
  • Restrict invitation to Redshift Connections and edit permission name by @dlpzx in #1638
  • Add forceDelete to shareObjects to clean-up all shareItems by @dlpzx in #1646
  • Add permission checks to markNotificationAsRead + deleteNotification by @noah-paige in #1654
  • Add Removal Policy Retain to Bucket Policy IaC by @noah-paige in #1660
  • Extend Tenant Perms Coverage by @noah-paige in #1630
  • add custom domain support for apigw by @petrkalos in #1679
  • add warning to untrust data.all account when removing an environment by @petrkalos in #1685
  • Restrict pivotRole permissions with DENY statement by @dlpzx in #1681
  • Added Token Validations by @noah-paige in #1682
  • Updating overly permissive policies tagged by checkov for environment role using least privilege principles by @mourya-33 in #1632
  • Update sanitization technique by @noah-paige in #1692
  • Fix/input validation by @noah-paige in #1693
  • Add MANAGE_SHARES permissions by @dlpzx in #1702
  • Disable introspection on prod sizing by @noah-paige in #1704
  • Bump python runtime to bump cdk klayers cryptography version by @noah-paige in #1707
  • tenant-permission tests by @dlpzx in #1694
  • Added permission check - is tenant to update SSM parameters API by @dlpzx in #1714
  • Add GET_SHARE_OBJECT permissions to get data filters API by @dlpzx in #1717
  • Add permissions on list datasets for env group + cosmetic S3 Datasets by @dlpzx in #1718
  • Add GET_WORKSHEET permission in RUN_SQL_QUERY by @dlpzx in #1716
  • Added permissions to Quicksight monitoring service layer by @dlpzx in #1715
  • Add LIST_ENVIRONMENT_DATASETS permission for listing shared datasets and cleanup unused code by @dlpzx in #1719
  • Add omics create_run unauthorized test and improve other tests by @dlpzx in #1723
  • Introduce is_owner permissions to Glossary mutations + add new integration tests by @dlpzx in #1721
  • Refactor env permissions + modify getTrustAccount by @dlpzx in #1712
  • Avoid infinite loop in glossaries checks by @dlpzx in #1725
  • Feed consistent permissions by @dlpzx in #1722
  • Votes consistent permissions by @dlpzx in #1724
  • Consistent get_<DATA_ASSET> permissions - Dashboards by @dlpzx in #1729
  • add resource permission checks by @petrkalos in #1711
  • Consistent get_<DATA_ASSET> permissions - S3_Datasets by @dlpzx in #1727
  • [BUGFIX] gh-1734 by @TejasRGitHub in #1741
  • [Gh 884] IAM policy splitting for requestor IAM policies by @TejasRGitHub in #1650
  • [Bugfix] - Changes in logic to delete share db by @TejasRGitHub in #1706
  • [Bugfix] | GH-1749 - Fixing share expiration task by @TejasRGitHub in #1750
  • Fix: Add conditional to not lock empty list of resources by @dlpzx in #1760
  • disable apigw data tracing to avoid leaking sensitive information by @petrkalos in #1798
  • allow customization of waf rate limits and api gateway throttling limits by @petrkalos in #1800
  • add s3 server access logging by @petrkalos in #1811
  • git CodeBuild baseline role permissions to use GitHub connection by @SofiaSazonova in #1813
  • create a new access logs bucket instead of importing by @petrkalos in #1815
  • Fix/custom auth 500 by @petrkalos in #1792
  • change all lambdas to structured logging by @petrkalos in #1801
  • add explicit token duration config for both JWTs by @noah-paige in #1698
  • Userguide signout flow by @noah-paige in #1629
  • log API handler response only for LOG_LEVEL DEBUG. Set log level INFO for prod deployments by @dlpzx in #1662
  • Separating Out Access Logging by @noah-paige in #1695

Major Changes

Redshift Integration

This section details significant advancements in integrating Amazon Redshift, enabling better management, sharing, and security of Redshift datasets within the platform.

  • Add Redshift datasets module by @dlpzx in #1424 - Introduces a new module for managing Redshift datasets.
  • Redshift dataset module testing: Re-added client factories, mocking clients by @dlpzx in #1449 - Enhances testing capabilities for the Redshift dataset module by re-adding client factories and mocking clients.
  • Redshift data sharing - Redshift connection types and namespace Id by @dlpzx in #1451 - Adds support for different Redshift connection types and namespace IDs for data sharing.
  • Redshift data sharing - Boilerplate for redshift dataset sharing module by @dlpzx in #1461 - Provides foundational code for the Redshift dataset sharing module.
  • Redshift data sharing - Make ShareObject.IAMRole a generic "Role" by @dlpzx in #1462 - Generalizes the IAM Role definition within ShareObject for Redshift data sharing.
  • Redshift data sharing - Polish frontend views for Redshift shares by @dlpzx in #1477 - Improves the user interface for managing Redshift shares.
  • Redshift data sharing - Add sharing tasks to process Redshift datashares by @dlpzx in #1467 - Implements tasks to process Redshift data shares.
  • Redshift data sharing - Added methods from sharing back to redshift datasets (check_on_delete, list_shared_datasets...) by @dlpzx in #1511 - Adds methods for managing shared Redshift datasets, including checks on deletion and listing shared datasets.
  • Redshift data sharing - Documentation 1 - Redshift Connections and Datasets by @dlpzx in #1512 - First part of the Redshift connections and datasets documentation.
  • Redshift data sharing - Documentation 2 - Redshift Sharing by @dlpzx in #1519 - Second part of the Redshift sharing documentation.
  • Redshift data sharing - frontend changes in the Catalog - clean by @dlpzx in #1458 - Cleans up frontend changes related to Redshift data sharing in the Catalog.
  • Fix wrong environment in the verification of redshift role by @dlpzx in #1587 - Corrects an issue with Redshift role verification related to environments.
  • Add Redshift connection tooltips and info + restrict to DATA_USER connections for import Redshift Dataset by @dlpzx in #1565 - Adds helpful tooltips and restricts Redshift dataset import to DATA_USER connections.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Connections by @dlpzx in #1628 - Introduces integration tests for Redshift connections within the CI/CD pipeline, executed on a real deployment.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Datasets by @dlpzx in #1636 - Adds integration tests for Redshift datasets within the CI/CD pipeline, executed on a real deployment.
  • Fix error message of Redshift share verifier by @dlpzx in #1647 - Resolves an issue with the error message from the Redshift share verifier.
  • Fix: check if Redshift table exists before publishing it to data.all by @dlpzx in #1644 - Ensures Redshift tables exist before being published to data.all.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Shares by @dlpzx in #1643 - Implements integration tests for Redshift shares within the CI/CD pipeline, executed on a real deployment.

Test improvements

This section highlights a series of enhancements to the tes...

v2.7.0-rc1

17 Feb 13:24
c36d337

v2.7.0-rc1 Pre-release

What's Changed

v2.6.2

15 Jan 15:54
749ffc3

🔐 Security

The data.all permission model has been reviewed to ensure that all Mutations and Queries have proper permissions:

  • Add MANAGE_SHARES permissions by @dlpzx in #1702
  • Add permission check - is tenant to update SSM parameters API by @dlpzx in #1714
  • Add GET_SHARE_OBJECT permissions to get data filters API by @dlpzx in #1717
  • Add permissions on list datasets for env group + cosmetic S3 Datasets by @dlpzx in #1718
  • Add GET_WORKSHEET permission in RUN_SQL_QUERY by @dlpzx in #1716
  • Add permissions to Quicksight monitoring service layer by @dlpzx in #1715
  • Add LIST_ENVIRONMENT_DATASETS permission for listing shared datasets and cleanup unused code by @dlpzx in #1719
  • Add is_owner permissions to Glossary mutations + add new integration tests by @dlpzx in #1721
  • Refactor env permissions + modify getTrustAccount by @dlpzx in #1712
  • Add Feed consistent permissions by @dlpzx in #1722
  • Add Votes consistent permissions by @dlpzx in #1724
  • Consistent get_<DATA_ASSET> permissions - Dashboards by @dlpzx in #1729

🧪 Test improvements

Integration tests are in sync with main without 2.7 planned features. In this PR all core modules, optional modules and submodules are tested. That includes tenant-permissions, omics, mlstudio, votes, notifications and backwards compatibility of S3 shares. By @SofiaSazonova, @noah-paige, @petrkalos and @dlpzx

In addition, the following PR adds functional tests that ensure the permission model of data.all is not corrupted.

Dependencies

v2.6.1

08 Nov 22:49
4b9d784

What's Changed

This release is focused on security enhancements:

  • Added Token Validations (#1682) + small fix in get-parameter CloudfrontDistributionDomainName from us-east-1 (#1687)
  • Add warning to untrust data.all account when removing an environment (#1685)
  • Add custom domain support for apigw (#1679)
  • Lambda Event Logs Handling (#1678)
  • Upgrade Spark version to 3.3 (#1675)
  • ES Search Query Collect All Response (#1631)
  • Extend Tenant Perms Coverage (#1630)
  • Limit Response info dataset queries (#1665)
  • Add Removal Policy Retain to Bucket Policy IaC (#1660)
  • log API handler response only for LOG_LEVEL DEBUG. Set log level INFO for prod deployments (#1662)
  • Add permission checks to markNotificationAsRead + deleteNotification (#1654)
  • Added error view and unified utility to check tenant user (#1657)
  • Userguide signout flow (#1629)

Full Changelog: v2.6.0...v2.6.1

v2.6.0

16 Jul 15:48
5e421fe

What's Changed

New features 🆕

Refactoring 💻

Enhancements 🥇

Tests 🧪

  • Automate bootstrapping of integrations tests by @petrkalos in #1289
  • Codebuild integration tests reads cognito-test-users param from environment account by @petrkalos in #1295
  • Add environment tests by @petrkalos in #1371, #1334 and Update gql apis + update_environment tests by @petrkalos in #1348
  • Add group/consumption_role invite/remove tests by @petrkalos in #1387
  • Add Dataset integration tests - Dataset CRUD + actions outside of data.all by @dlpzx in #1379
  • Add Worksheet integration tests - all except run sql query by @dlpzx in #1393
  • Add Notebook integration tests by @noah-paige in #1400

Fixes 🪲

Dependencies 📦

  • Safety checks - Ignore disputed issue on pip by @dlpzx in #1271
  • Bump certifi from 2023.7.22 to 2024.7.4 in /deploy/custom_resources/custom_authorizer by @dependabot in #1390
  • Upgrade ejs to 3.1.10 in yarn npm by @dlpzx in #1265
  • Bump requests from 2.31.0 to 2.32.0 in /backend by @dependabot in #1291
  • Bump requests from 2.31.0 to 2.32.0 in /backend/dataall/base/cdkproxy by @dependabot in #1293
  • Bump requests from 2.31.0 to 2.32.2 in /deploy/custom_resources/custom_authorizer by @dependabot in #1309
  • Upgrade flask packages to satisfy safety check by @petrkalos in #1313
  • Fix npm audit findings by @noah-paige in #1341
  • Bump urllib3 from 1.26.18 to 1.26.19 in /deploy/custom_resources/custom_authorizer by @dependabot in #1339
  • Update version auth at edge to use node v20 by @noah-paige in #1327

New Contributors

Full Changelog: v2.5.0...v2.6.0

v2.5.0

13 May 12:02
93ff772

What's Changed

New features 🆕

  • Make visibility of auto-approval toggle configurable based on confidentiality by @anushka-singh in #1223

Refactoring 💻

Enhancements 🥇

  • Enable encryption for lambda environment variables by @mourya-33 in #1225
  • Add integration tests on a real API client and integrate the tests in CICD by @dlpzx in #1219
  • Update lambda_api.py to add encryption for lambda env vars by @mourya-33 in #1255

Fixes 🪲

Dependencies 📦

  • Bump werkzeug from 3.0.1 to 3.0.3 in /tests_new/integration_tests by @dependabot in #1254
  • Bump werkzeug from 3.0.1 to 3.0.3 in /backend/dataall/base/cdkproxy by @dependabot in #1252
  • Bump werkzeug from 3.0.1 to 3.0.3 in /tests by @dependabot in #1253

Full Changelog: v2.4.0...2.5.0

v2.4.0

25 Apr 12:52
5df6100

What's Changed

⚠️ ⚠️ Important: Review the warnings in #1064 if you want to use environments in multiple regions.

New features 🆕

Big Refactoring 💻

Enhancements 🥇

  • Remove allowAll bucket policy statement by @dlpzx in #1106
  • Adding check to remove any spaces in confidentiality names by @TejasRGitHub in #1126
  • Worksheet UI improvements - fix Team and list Environments of Team by @dlpzx in #1111
  • WAF rule parameters in cdk.json + Documentation by @SofiaSazonova in #1140
  • Update cdkExecPolicy.yaml to cleanup overly excessive permissions by @mourya-33 in #1085
  • Add grants to pivot role in verify tables functions by @dlpzx in #1149
  • Implement guardrails and mechanisms to deal with deleted IAM roles in share requests by @SofiaSazonova in #1161
  • Implement least privilege principle for cloudfront, lambda and db migration stacks by @mourya-33 in #1134
  • Implement less restrictive trust policy for local development pivot roles by @dlpzx in #1176

Fixes 🪲

  • Fix EnvUri to check GET_ENV permission for worksheet by @noah-paige in #1125
  • Grant IAM permissions to read data to environment team IAM roles independently from CREATE_DATASET permissions by @SofiaSazonova in #1137
  • Allow ListEnv to get associated organization information by @noah-paige in #1139
  • Redirect the user to correct URL after login by @TejasRGitHub in #1094
  • Fixes for email notifications not sending share link in the body by @TejasRGitHub in #1143
  • Fix folder pagination missing page by @dlpzx in #1158
  • Add "/" to prefix in crawlers if it is not specified in input by @dlpzx in #1156
  • Add Athena List permissions to use AWS SDK for Pandas in SageMaker by @dlpzx in #1155
  • Add new data.all permissions REMOVE_ORGANIZATION_GROUP, INVITE_ORGANIZATION_GROUP to teams invited to an Organization by @SofiaSazonova in #1162
  • Fix missing GET_FOLDER permissions by @dlpzx in #1163
  • Fix input parameters for get credentials get environment group by @dlpzx in #1198
  • Update CDK exec role Policy name with region in template by @dlpzx in #1197
  • Remove creation of log-groups in Lambdas by @dlpzx in #1192
  • Fix missing session in resolve_environment by @dlpzx in #1199
  • Fix missing $ in CDK custom policy by @dlpzx in #1204
  • Fix unnecessary permission check in resolve_stack functions (failure in list datasets when there are shared datasets) by @dlpzx in #1205
  • Fix reference to locationUri by @dlpzx in #1209
  • Fix sagemaker tagging permissions by @dlpzx in #1211

Documentation 📚

  • Documentation in GitHub pages for release 2.4.0 by @dlpzx in #1191
  • Documentation in Userguide for release 2.4 by @dlpzx in #1218

Dependencies 📦

  • Upgrade follow-redirects and webpack-dev-middleware dependencies in frontend by @dlpzx in #1121
  • Upgrade express in frontend by @dlpzx in #1152
  • Bump idna from 3.4 to 3.7 in /deploy/custom_resources/custom_authorizer by @dependabot in #1166

v2.3.0

13 Mar 08:11
e10a043

What's Changed

⚠️ ⚠️ Important: After upgrading to v2.3.0, environment stacks need to be updated before executing data sharing requests. If the environment stack is not updated, data sharing will fail. There are 3 options to update the environment stacks:

  1. Using cdk.json parameter enable_update_dataall_stacks_in_cicd_pipeline --> automatically updates the environment and dataset stacks in the CICD pipeline
  2. Waiting for the overnight update stack task --> same as the above, but it runs on a daily schedule.
  3. Updating environments in Environment > Stack tab > click on Update button --> manual update
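
As an illustration of option 1, here is a minimal Python sketch that switches the flag on in a cdk.json-style dictionary. Only the parameter name enable_update_dataall_stacks_in_cicd_pipeline comes from these notes. The surrounding layout (a "context" object holding a "DeploymentEnvs" list whose entries carry an "envname") and the helper name enable_stack_updates are assumptions made for the example, so check them against your own cdk.json before relying on it.

```python
import json

# Parameter name taken from the release notes above.
FLAG = "enable_update_dataall_stacks_in_cicd_pipeline"


def enable_stack_updates(cdk_config: dict, envname: str) -> dict:
    """Return a copy of cdk_config with FLAG set to True for one deployment environment.

    Hypothetical helper: the context/DeploymentEnvs/envname layout is an
    assumption about cdk.json, not something stated in these notes.
    """
    # Deep copy via a JSON round-trip so the caller's dict is left untouched.
    updated = json.loads(json.dumps(cdk_config))
    for env in updated.get("context", {}).get("DeploymentEnvs", []):
        if env.get("envname") == envname:
            env[FLAG] = True
            return updated
    raise ValueError(f"no deployment environment named {envname!r}")


# Stand-in for the parsed contents of a cdk.json file (illustrative values only).
sample = {"context": {"DeploymentEnvs": [{"envname": "dev"}]}}
result = enable_stack_updates(sample, "dev")
print(json.dumps(result, indent=2))
```

In practice the dictionary would come from json.load on the real file and be written back with json.dump. The CICD pipeline then has to run for the environment and dataset stacks to be updated.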

New features 🆕

  • Introduce dataset lock for data sharing, increasing robustness of parallel data sharing by @anushka-singh in #1072
  • Add verification of data sharing and reapplying if "unhealthy" by @noah-paige in #1062
  • Enable Central Catalog Glue databases import by @TejasRGitHub in #1021 and list them in worksheets in #1079
  • Replace IAM inline policies by configurable Managed Policies for folder and bucket sharing by @SofiaSazonova and @dlpzx in #1068
  • Simplify LakeFormation Glue database shares - single shared_db and single resource link table by @dlpzx in #1016 and add sharing guardrails drop permissions in #1055 and update Worksheet database names in UI in #1063
  • Add data sharing auto-approval option for datasets by @SofiaSazonova in #988
  • Introduce feature flags for topics and confidentiality and custom confidentiality list by @TejasRGitHub in #1049

Enhancements 🥇

Fixes 🪲

  • Fix reAuth re-renders glitch by @noah-paige in #918
  • Fix s3 bucket sharing for federated roles by @zsaltys in #920
  • Fix Disappearing Env Value Request Access Modal by @noah-paige in #919
  • Fix Frontend Config Role Issue while switching from Cognito Idp to Custom Auth by @TejasRGitHub in #938
  • Investigate why some shares did not go to failed state (issue 932), but remained stuck or in-progress by @anushka-singh in #933
  • Fix when migrating from Manually Created Pivot Role to Auto Create Pivot Role by @TejasRGitHub in #948
  • Validate consumer roles by @SofiaSazonova in #951
  • Fix local dev environment is broken after recent changes by @TejasRGitHub in #967
  • Bugfix 956 by @anushka-singh in #961
  • Add lakeformation in trust policy of dataset role by @dlpzx in #970
  • Add else if condition to get tables into InSync state by @TejasRGitHub in #980
  • Fix consumption role filtering by @TejasRGitHub in #975
  • Replace dataall prefix by resourcePrefix in data pipeline creation by @dlpzx in #985
  • Remove AWS Managed Lake Formation Service Linked Role from Pivot Role Nested Stack by @TejasRGitHub in #999
  • Fix created dataset naming convention by @noah-paige in #1002
  • Add CloudFormation permission to PivotRoleNestedStack by @TejasRGitHub in #1040
  • Fix userguide dockerfile by @dlpzx in #1089
  • Create DatasetLock for new datasets by @noah-paige in #1090
  • Fix verify share table items and access point share no bucket policy by @noah-paige in #1095
  • Add check and reapply for attaching S3 IAM policy by @dlpzx in #1096
  • Fix counter on paged responses by @petrkalos in #1091
  • Handle Error on clean up share and not get stuck in IN_PROGRESS status by @noah-paige in #1099
  • Fix issue in SageMaker Create permissions by @dlpzx in #1102

Refactoring 💻

Documentation 📚

Dependencies 📦

  • Upgrade Aurora PostgreSQL engine 11 --> 13 by @noah-paige in #963
  • Upgrade axios package to resolve follow-redirect vulnerability by @noah-paige in #952
  • Remove unused packages: jinja2, deprecated by @dlpzx in #969
  • Upgrade npm packages: axios, css-tools by @dlpzx in #1052
  • Upgrade postcss and add yarn resolutions by @dlpzx in #1059
  • Apply boto3==1.34.35 in DeployFrontend action by @anandsumit2000 in #1054
  • Upgrade starlette version and dependencies to avoid ReDoS by @dlpzx in #1038
  • Upgrade ip package in frontend for yarn and npm by @dlpzx in #1070

New Contributors 👨‍💻 👩‍💻

Full Changelog: v2.2.0...v2.3.0