[Benchmark] Support ScienceOlympiad Galaxy10DECaLS VRSBench#1410
Open
zhouyujin wants to merge 2 commits into open-compass:main from
Add three datasets: ScienceOlympiad, Galaxy10DECaLS, VRSBench
TSV link: https://huggingface.co/datasets/YuJJJJin/ScienceOlympiad.tsv
ScienceOlympiad consists of competition-level physics and chemistry problems with multimodal content. It evaluates models on scientific reasoning and visual comprehension.
TSV link: https://huggingface.co/datasets/YuJJJJin/Galaxy10DECaLS.tsv
Galaxy10DECaLS is a curated image classification dataset with 1,774 galaxy images across 10 classes. It evaluates models’ ability to classify astronomical objects based on visual features.
TSV link: https://huggingface.co/datasets/YuJJJJin/VRSBench.tsv
VRSBench is derived from the VQA test set of the VRSBench benchmark and evaluates multimodal understanding of remote‑sensing imagery.
Two variants are provided:
• VRSBench.tsv: Full evaluation set with 37,409 VQA samples.
• VRSBench_MINI.tsv: Compact evaluation set with 3,735 samples (10% stratified sampling from the full set, seed=42).
Both variants cover 12 question categories and assess a model’s ability to answer remote-sensing questions through visual analysis and reasoning.
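For reviewers who want to see what a 10% stratified sample with a fixed seed looks like, here is a minimal standard-library sketch. It is not the script used to build VRSBench_MINI.tsv: the function name `stratified_sample`, the `category` and `index` field names, and the toy rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, frac=0.1, seed=42):
    """Sample roughly `frac` of `rows` separately within each `key` group.

    Hypothetical helper: `rows` is a list of dicts and `key` names the
    field that holds the question category (an assumed column name).
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sampled = []
    for name in sorted(groups):  # sorted so group order is deterministic
        members = groups[name]
        # Keep at least one sample per category so none disappears.
        k = max(1, round(len(members) * frac))
        sampled.extend(rng.sample(members, k))
    return sampled


# Toy stand-in for the full set: 12 categories with 100 rows each.
rows = [{"index": i, "category": f"cat{i % 12}"} for i in range(1200)]
mini = stratified_sample(rows, key="category")
```

Because the sample size is rounded per category, the total of such a subset need not equal exactly 10% of the full set.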