
Adding Index Setting for Data compression #20355

Open

darjisagar7 wants to merge 4 commits into opensearch-project:feature/datafusion from darjisagar7:IndexSettingCompression

Conversation

@darjisagar7 commented Jan 2, 2026

Description

[Describe what this change achieves]

>....
        "compression.enabled": "true",
        "optimized.enabled": "true",
        "index.parquet.page_row_limit": 50000,
        "index.parquet.compression_type": "ZSTD"
    },
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword"
            },
            "name": {
                "type": "keyword"
            },
            "age": {
                "type": "integer"
            },
            "salary": {
                "type": "long"
            },
            "score": {
                "type": "double"
            },
            "active": {
                "type": "boolean"
            },
            "created_date": {
                "type": "date"
            }
        }
    }
}'

{"acknowledged":true,"shards_acknowledged":true,"index":"index-7"}%                                                                                                                                                                                                                                                                                         
parquet-tools inspect OpenSearch/build/testclusters/runTask-0/data/nodes/0/indices/jhYkYQ8PQNGg-UAsLQEq8g/0/parquet/_parquet_file_generation_16.parquet

############ file meta data ############
created_by: parquet-rs version 53.4.1
num_columns: 15
num_rows: 6
num_row_groups: 1
format_version: 1.0
serialized_size: 2793


############ Columns ############
_routing
_doc_count
active
salary
_ignored
_seq_no
score
name
_id
id
created_date
_version
age
___row_id
_primary_term

############ Column(_routing) ############
name: _routing
path: _routing
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: ZSTD (space_saved: -47%)

############ Column(_doc_count) ############
name: _doc_count
path: _doc_count
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: -47%)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai bot commented Jan 2, 2026

Review skipped

Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@github-actions bot commented Jan 2, 2026

❌ Gradle check result for f60cfd8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


// Enhanced native methods that handle validation and provide better error reporting
public static native void createWriter(String file, long schemaAddress) throws IOException;
public static native void createWriter(String file, long schemaAddress, boolean isCompressionEnabled) throws IOException;
Member

This change makes the API specific to compression. Can we instead pass a map of attribute names and values here? Also, there are certain settings we may want to apply only to certain columns (some of these may get handled via mappings as well). Can you look into whether the argument passed to Rust can be shaped to handle both index-level and field-level settings?
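One possible shape of the suggested generic argument, sketched under the assumption of a simple string map (class and parameter names here are hypothetical, not the PR's actual API):

import java.io.IOException;
import java.util.Map;

// Hypothetical sketch (not the PR's API): a single options map can carry
// index-level and field-level settings, e.g. "compression.codec" or
// per-column keys like "bloom-filter.enabled.col1", without adding a new
// native overload for every setting.
public final class ParquetWriterJNI {
    public static native void createWriter(String file, long schemaAddress, Map<String, String> writerOptions)
        throws IOException;
}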

Contributor

Iceberg configuration, for reference:

Property                                        Default                Description
write.format.default                            parquet                Default file format for the table; parquet, avro, or orc
write.delete.format.default                     data file format       Default delete file format for the table; parquet, avro, or orc
write.parquet.row-group-size-bytes              134217728 (128 MB)     Parquet row group size
write.parquet.page-size-bytes                   1048576 (1 MB)         Parquet page size
write.parquet.page-row-limit                    20000                  Parquet page row limit
write.parquet.dict-size-bytes                   2097152 (2 MB)         Parquet dictionary page size
write.parquet.compression-codec                 zstd                   Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed
write.parquet.compression-level                 null                   Parquet compression level
write.parquet.bloom-filter-enabled.column.col1  (not set)              Hint to Parquet to write a bloom filter for column 'col1'
write.parquet.bloom-filter-max-bytes            1048576 (1 MB)         Maximum number of bytes for a bloom filter bitset
write.parquet.bloom-filter-fpp.column.col1      0.01                   False positive probability for a bloom filter applied to 'col1' (must be > 0.0 and < 1.0)

);

public static final Setting<Boolean> INDEX_COMPRESSION_ENABLED_SETTING = Setting.boolSetting(
"index.compression.enabled",
Member

Should we use the existing codec setting here? We can validate which values to support in the Parquet plugin itself.

Member

Also, some settings may come from Parquet directly. If a setting is specific to a data format, it should be declared in the plugin itself:

    public List<Setting<?>> getSettings() {
        return Collections.emptyList();
    }
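A minimal sketch of that direction, with hypothetical class and setting names (this is not the PR's code): the format-specific setting lives in the Parquet plugin and is registered via getSettings(), rather than in core.

import java.util.List;

import org.opensearch.common.settings.Setting;
import org.opensearch.plugins.Plugin;

// Hypothetical sketch: the Parquet plugin declares its own
// index-scoped setting and exposes it through getSettings().
public class ParquetFormatPlugin extends Plugin {

    public static final Setting<Integer> PARQUET_PAGE_ROW_LIMIT_SETTING = Setting.intSetting(
        "index.parquet.page_row_limit",
        20_000,                       // default, mirroring Iceberg's page-row-limit
        1,                            // minimum
        Setting.Property.IndexScope
    );

    @Override
    public List<Setting<?>> getSettings() {
        return List.of(PARQUET_PAGE_ROW_LIMIT_SETTING);
    }
}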

@darjisagar7 force-pushed the IndexSettingCompression branch from f60cfd8 to 04f4f7d on January 20, 2026 06:17
@github-actions bot

❌ Gradle check result for 04f4f7d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
@darjisagar7 force-pushed the IndexSettingCompression branch from 04f4f7d to fe1b049 on February 19, 2026 09:42
@darjisagar7 force-pushed the IndexSettingCompression branch from fe1b049 to c01b518 on February 19, 2026 10:05
@github-actions bot

❌ Gradle check result for c01b518: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

"index.parquet.row_group_size_bytes",
new ByteSizeValue(128, ByteSizeUnit.MB),
Setting.Property.IndexScope,
Setting.Property.Dynamic
Member

For now, it's OK for this to not be dynamic.
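Concretely, the Dynamic property would just be dropped; a sketch of the non-dynamic form, mirroring the excerpt above (import paths may differ across OpenSearch versions):

import org.opensearch.common.settings.Setting;
import org.opensearch.core.common.unit.ByteSizeUnit;
import org.opensearch.core.common.unit.ByteSizeValue;

// Non-dynamic variant: index-scoped only, fixed at index creation time.
public static final Setting<ByteSizeValue> INDEX_PARQUET_ROW_GROUP_SIZE_SETTING = Setting.byteSizeSetting(
    "index.parquet.row_group_size_bytes",
    new ByteSizeValue(128, ByteSizeUnit.MB),
    Setting.Property.IndexScope
);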

…rence

Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
@darjisagar7 force-pushed the IndexSettingCompression branch from c01b518 to 969e025 on February 19, 2026 10:42
@github-actions bot

❌ Gradle check result for 969e025: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

.set_compression(Compression::ZSTD(ZstdLevel::try_new(3).unwrap()))
.build();

let props = WriterPropertiesBuilder::build(&config);
Member

Handle this for merges as well.


// Push current settings to Rust store once on construction, then keep in sync on updates
pushSettingsToRust(indexSettings);
indexSettings.getScopedSettings().addSettingsUpdateConsumer(
Member

Since the settings are not dynamic for now, we can remove these.
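That is, with static settings the constructor only needs the one-time push; a sketch against the excerpt above (method and variable names as in that excerpt):

// Settings are fixed at index creation, so a single push suffices;
// the addSettingsUpdateConsumer registration can be dropped.
pushSettingsToRust(indexSettings);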

}

// Read configuration from the global settings store
let config: NativeSettings = SETTINGS_STORE.lock().unwrap().clone();
Member

How is this index-level? Reading from a single global settings store looks node-wide rather than per-index.

Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
@github-actions bot

❌ Gradle check result for 178887e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
