-
Notifications
You must be signed in to change notification settings - Fork 2
Added Huggingface support #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
TheCedarPrince
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @ParamThakkar123 for your work! I left a bunch of comments based on our call; apologies for the delay in reviewing with being out sick! Very excited here and happy to review multiple times as needed. Just let me know via ping again!
I think we also will need to implement #14 first before continuing with this PR however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ParamThakkar123 , I think that some of the dependencies in here got mixed up. Some of these should instead be moved to testing dependencies in the test folder. You can read about this exact workflow here: https://docs.julialang.org/en/v1/stdlib/Test/#Workflow-for-Testing-Packages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ParamThakkar123 , this still does not appear to be addressed -- could you take a look again? Thanks!
|
Hey @ParamThakkar123 , I think you are unblocked on this PR now! Could you rebase and address the comments I left here? Thanks and keep up the good work! :D |
Yes @TheCedarPrince I will start working on this now!! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #13 +/- ##
=======================================
Coverage ? 29.54%
=======================================
Files ? 5
Lines ? 44
Branches ? 0
=======================================
Hits ? 13
Misses ? 31
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
TheCedarPrince
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Param, very very close here! Some final comments; thanks for the hard work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ParamThakkar123 , this still does not appear to be addressed -- could you take a look again? Thanks!
| using HealthSampleData | ||
|
|
||
| """ | ||
| register_huggingface_dataset(name::String, repo::String, filename::String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there -- this really should be explicitly hardcoded like what we have for the Eunomia example. Here is what I am imagining the flow to be:
download_hf_dataset(name = "Synthea")- There should be a registered DataDep for Synthea in the data.jl file (like Eunomia)
- The
download_hf_datasetshould return that DataDep
You may want to create the Synthea DataDep as a constant or something that you can call from within data.jl.
| using HealthSampleData | ||
| using Test | ||
|
|
||
| @testset "HealthSampleData.jl" begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests for downloading; you could upload a very small file to HF JuliaHealthDatasets and use that for testing.
TheCedarPrince
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Param, very very close here! Some final comments; thanks for the hard work!
@TheCedarPrince
This PR Fixes #12
Added a
_huggingface_dataset_registerto register datasets from huggingface hub using DataDeps.jl and implemented aloadfunction to load the dataset usingHealthSampleData.jlmodule for the users