Adding Index Setting for Data compression #20355

darjisagar7 wants to merge 4 commits into opensearch-project:feature/datafusion from
Conversation
❌ Gradle check result for f60cfd8: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
```java
// Enhanced native methods that handle validation and provide better error reporting
public static native void createWriter(String file, long schemaAddress) throws IOException;
public static native void createWriter(String file, long schemaAddress, boolean isCompressionEnabled) throws IOException;
```
This change makes the method very specific to compression. Can we instead pass a map that the Rust side can read, with attribute names as keys? Also, there are certain settings we may want to apply only to specific columns (some of which may be handled via mappings as well). Can you look into whether the argument passed to Rust can be generalized to handle index- and field-level settings?
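For illustration, here is a minimal sketch of the map-based signature the comment suggests; the class name and the key format are assumptions, not part of this PR:

```java
import java.io.IOException;
import java.util.Map;

public final class ParquetWriterNative {
    // Hypothetical generic entry point: one native method instead of an
    // overload per setting. Keys could encode index-level settings
    // ("index.parquet.compression_codec") as well as field-level ones
    // ("field.col1.bloom_filter.enabled") for the Rust side to parse.
    public static native void createWriter(String file, long schemaAddress,
                                           Map<String, String> settings) throws IOException;
}
```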
Iceberg configuration for reference:
| Property | Default | Description |
|---|---|---|
| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |
| write.delete.format.default | data file format | Default delete file format for the table; parquet, avro, or orc |
| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size |
| write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
| write.parquet.page-row-limit | 20000 | Parquet page row limit |
| write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
| write.parquet.compression-codec | zstd | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed |
| write.parquet.compression-level | null | Parquet compression level |
| write.parquet.bloom-filter-enabled.column.col1 | (not set) | Hint to Parquet to write a bloom filter for column 'col1' |
| write.parquet.bloom-filter-max-bytes | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset |
| write.parquet.bloom-filter-fpp.column.col1 | 0.01 | The false positive probability for a bloom filter applied to 'col1' (must be > 0.0 and < 1.0) |
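To make the reference concrete, a rough sketch of assembling a few of these properties as plain key/value pairs, the shape a map-based writer API (as suggested above) could consume; the property names come from the table, everything else is assumed:

```java
import java.util.HashMap;
import java.util.Map;

class WriterPropsExample {
    // Names taken from the Iceberg table above; values are illustrative.
    static Map<String, String> defaultWriterProps() {
        Map<String, String> props = new HashMap<>();
        props.put("write.parquet.compression-codec", "zstd");
        props.put("write.parquet.row-group-size-bytes", "134217728"); // 128 MB
        props.put("write.parquet.bloom-filter-enabled.column.col1", "true");
        return props;
    }
}
```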
```java
public static final Setting<Boolean> INDEX_COMPRESSION_ENABLED_SETTING = Setting.boolSetting(
    "index.compression.enabled",
```
Should we reuse the existing codec setting here? We can validate which values to support in the Parquet plugin itself.
Also, some settings may come from Parquet directly. If a setting is specific to a data format, it should be declared in the plugin itself.
```java
public List<Setting<?>> getSettings() {
    return Collections.emptyList();
}
```
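Along those lines, a hedged sketch of the plugin declaring its own format-specific setting with validation; the setting name, allowed values, and class name are assumptions (import paths may differ by OpenSearch version):

```java
import java.util.List;
import java.util.Set;
import org.opensearch.common.settings.Setting;

public class ParquetPluginSettings {
    private static final Set<String> SUPPORTED_CODECS =
        Set.of("zstd", "lz4", "gzip", "snappy", "uncompressed");

    // Assumed name; the validator rejects codecs the Parquet writer cannot handle.
    public static final Setting<String> PARQUET_COMPRESSION_CODEC = Setting.simpleString(
        "index.parquet.compression_codec",
        "zstd",
        value -> {
            if (!SUPPORTED_CODECS.contains(value)) {
                throw new IllegalArgumentException("Unsupported Parquet codec: " + value);
            }
        },
        Setting.Property.IndexScope
    );

    public List<Setting<?>> getSettings() {
        return List.of(PARQUET_COMPRESSION_CODEC);
    }
}
```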
Force-pushed from f60cfd8 to 04f4f7d (compare)
❌ Gradle check result for 04f4f7d: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
Force-pushed from 04f4f7d to fe1b049 (compare)
Force-pushed from fe1b049 to c01b518 (compare)
❌ Gradle check result for c01b518: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
| "index.parquet.row_group_size_bytes", | ||
| new ByteSizeValue(128, ByteSizeUnit.MB), | ||
| Setting.Property.IndexScope, | ||
| Setting.Property.Dynamic |
For now, it's OK for this not to be dynamic.
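Concretely, the non-dynamic version would drop `Setting.Property.Dynamic` and keep only the index scope; a sketch completing the snippet above (the field name and import paths are assumed):

```java
import org.opensearch.common.settings.Setting;
import org.opensearch.common.unit.ByteSizeUnit;
import org.opensearch.common.unit.ByteSizeValue;

public static final Setting<ByteSizeValue> ROW_GROUP_SIZE_SETTING = Setting.byteSizeSetting(
    "index.parquet.row_group_size_bytes",
    new ByteSizeValue(128, ByteSizeUnit.MB),
    Setting.Property.IndexScope  // not Dynamic: fixed at index creation for now
);
```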
…rence Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
Force-pushed from c01b518 to 969e025 (compare)
❌ Gradle check result for 969e025: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
```rust
    .set_compression(Compression::ZSTD(ZstdLevel::try_new(3).unwrap()))
    .build();
```
```rust
let props = WriterPropertiesBuilder::build(&config);
```
```java
// Push current settings to Rust store once on construction, then keep in sync on updates
pushSettingsToRust(indexSettings);
indexSettings.getScopedSettings().addSettingsUpdateConsumer(
```
Since the settings are not dynamic for now, we can remove these.
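With only static settings, the one-time push at construction is enough; a minimal sketch of that simplification (the constructor and class name are assumed, `pushSettingsToRust` is the helper from the diff):

```java
// Settings are index-scoped but not dynamic, so a single push at
// construction suffices; no addSettingsUpdateConsumer registration needed.
public ParquetEngine(IndexSettings indexSettings) {
    pushSettingsToRust(indexSettings);
}
```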
```rust
// Read configuration from the global settings store
let config: NativeSettings = SETTINGS_STORE.lock().unwrap().clone();
```
Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
❌ Gradle check result for 178887e: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.