Hi,
First of all, many thanks for making this repository accessible. Great job!
I have two questions/comments:
- It would be nice to have a rough indication of how much memory is needed to load/prepare the datasets. I started with your SlimPajama-6B example and initially ran out of memory. Increasing the number of CPUs did the job, but it was hard for me to figure out how much is actually needed.
- It would also be super helpful to have some benchmarks, if you have them available: for example, for a given model and dataset, the best train/val loss you have reached so far and the optimizer settings that reached it. I don't know whether you did many runs yourself, but if you have this information, it would be awesome to compile an overview so that everyone can configure a good baseline without having to do the tuning.
Thanks, and kind regards,
Fabian