Description
OK, so it's taken me a while to break this down and figure out what I actually need, but I have no experience with Ruby, so I'm hoping you'll consider adding this as an option for me.
Essentially, I have discovered that the 'USED' space of a snapshot counts only the data uniquely referenced by that snapshot. My setup has two overlapping automatic snapshot schedules: one is frequent but sparse (empty snapshots are removed), and the other is infrequent but dense (empty snapshots are kept). The aim is to keep the last 500 changes in a dataset at 5-minute intervals, plus hourly snapshots for the last 4 weeks. The problem is that where the two sequences overlap (i.e. on the hour), the sparse snapshot always registers as zero-sized, because its data is shared with the hourly snapshot, and so it gets removed.
There is, however, a property called written@snapshot, which reports how much data was written to the target since the specified snapshot. For example:
zfs get written@zfs-auto-snap_frequent-2017-01-30-06h00U tank/root@zfs-auto-snap_frequent-2017-01-30-06h05U
reports how much data was written to the 06h05U snapshot that did not exist in the 06h00U snapshot. This doesn't tell us the size of the 06h00U snapshot, but if the value is non-zero we know that 06h00U must also contain some data that 06h05U doesn't (if only because the metadata addressing the newly written data must have been updated). 06h00U is therefore the last snapshot on that 'interval' to reference that data. This means that even if 06h00U currently has a zero size, it will become non-zero-sized at some point in the future if older snapshots are removed.
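For illustration, the query above could be wrapped in Ruby roughly as follows. This is a minimal sketch, not code from the existing tool: `written_command`, `parse_written` and `written_since` are hypothetical helper names. It assumes the `zfs` binary is on the PATH and relies on the standard `zfs get` options `-H` (no header), `-p` (exact byte values) and `-o value` (value column only), so the output is a single integer.

```ruby
require "open3"

# Hypothetical helper: argv for a `zfs get` asking how many bytes were written
# to +newer+ (full "pool/dataset@snap" name) since +older_snap+ (the short
# name after the "@", interpreted in the same dataset).
def written_command(older_snap, newer)
  # -H: scripted mode (no header), -p: exact byte counts, -o value: value column only
  ["zfs", "get", "-H", "-p", "-o", "value", "written@#{older_snap}", newer]
end

# Parse the single integer printed by the command above.
def parse_written(output)
  Integer(output.strip, 10)
end

# Run the query (assumes `zfs` is installed and the caller may read properties).
def written_since(older_snap, newer)
  out, status = Open3.capture2(*written_command(older_snap, newer))
  raise "zfs get failed for #{newer}" unless status.success?
  parse_written(out)
end
```

With the snapshots from the example, `written_since("zfs-auto-snap_frequent-2017-01-30-06h00U", "tank/root@zfs-auto-snap_frequent-2017-01-30-06h05U")` would run the same query and return the byte count as an Integer.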
If we used only this metric, we might double the number of snapshots we keep, because the previous kept snapshot may already hold the same data. In that case the snapshot we now keep would merely mark the last point at which the kept snapshot before it was still unchanged. To rule this out, we also need to check that this snapshot differs from the previous kept snapshot.
So, essentially, what I am asking for is a filter for each zero-sized snapshot that checks whether data was written between the previous snapshot and itself, and between itself and the next snapshot. If data was written in both cases, I would like to preserve the snapshot, because its 'USED' property would become non-zero if all snapshots not on its 'interval' were removed. The edge case of the most recent snapshot doesn't apply, because it isn't subject to zero-sized removal anyway, and the oldest snapshot should treat its written-since-previous value as non-zero, since its data must be unique on this 'interval'.
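The requested filter could look roughly like this in Ruby. It is only a sketch of the rule described above, not an implementation from the tool: `Snap` and `snapshots_to_keep` are hypothetical names, the series is assumed to hold the snapshots of a single 'interval' ordered oldest to newest, and `written` is any callable returning the bytes written between two snapshots (e.g. backed by a `zfs get written@...` query like the example command).

```ruby
# Hypothetical record for one snapshot in a single interval's series.
Snap = Struct.new(:name, :used)

# series:  Snaps of ONE interval (e.g. "frequent"), ordered oldest -> newest.
# written: callable (older_snap, newer_snap) -> bytes written between them.
# Returns the names of the snapshots that should be kept.
def snapshots_to_keep(series, written)
  kept = []
  series.each_with_index do |snap, i|
    newest = (i == series.length - 1)
    if snap.used > 0 || newest
      # Non-zero snapshots and the newest one are never zero-size candidates.
      kept << snap.name
      next
    end
    # The oldest snapshot treats "written since previous" as non-zero.
    changed_since_prev = i.zero? || written.call(series[i - 1], snap) > 0
    # Only query the next snapshot when the first check already passed.
    keep = changed_since_prev && written.call(snap, series[i + 1]) > 0
    kept << snap.name if keep
  end
  kept
end
```

Checking both neighbours, rather than only the next snapshot, is what keeps the filter from retaining a snapshot that merely duplicates the previous one, as discussed above.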