Hi,
First of all, I really appreciate the course, and I am very grateful that you have made it openly available!
For assignment 4, you use the Together cluster to store data for Stanford students, and you provide non-cluster alternatives up until section 4. I would not expect you to host a 375 GB download, but I was wondering if you could specify the dump (CC-MAIN-2025-18?) and which segments of it you used to create the 5,000 WET files stored on the cluster. If those were listed in the assignment, non-students could recreate the dataset and compare their results directly to the leaderboard. The Paloma validation dataset is available on Hugging Face, so I think that's the only missing piece.