Skip to content

Comments

Improve management of temporary files#108

Merged
stevenpawley merged 2 commits intoOSGeo:mainfrom
Kodiologist:temp-file-cleanup
Aug 29, 2025
Merged

Improve management of temporary files#108
stevenpawley merged 2 commits intoOSGeo:mainfrom
Kodiologist:temp-file-cleanup

Conversation

@Kodiologist
Copy link
Contributor

I noticed that if an R process uses rgrass a lot, temporary files can really accumulate. In one case, in which I was running execGRASS and read_RAST in concurrent loops for 15 hours, I had over 7 million files. It's hard to clean these up without nuking tempdir() entirely, because many of the temporary files are created with generic names at the root of tempdir().

Here my approach has been to give initGRASS a new parameter that can be used to set the root of all further temporary files (other than the ones created by GRASS in g.tempfile). I can then delete them with on.exit in my calling code.

This PR is incomplete (e.g., I didn't document the new parameter), but I figured I'd submit it as-is to see if you're interested before I polish it up.

@stevenpawley
Copy link
Member

Hi @Kodiologist, thanks for this contribution. Definitely agree that finer control over tempfile handling would be beneficial especially when moving large amounts of data between R and rgrass. Just to check, before introducing a new argument, could this issue be mitigated in your use case by (a) explicitly specifying the gisDbase location (even if it represents a disposable location/project), and (b) using terra::tmpFiles(remove = TRUE) (assuming that these files are being produced by transferring data)? However, I don't think that this would be enough to capture all intermediate files, and being able to specify tempdir for all GRASS related data would be useful for other scenarios, like when working on a remote instances where the OS drive is small.

@Kodiologist
Copy link
Contributor Author

could this issue be mitigated in your use case by (a) explicitly specifying the gisDbase location (even if it represents a disposable location/project), and (b) using terra::tmpFiles(remove = TRUE) (assuming that these files are being produced by transferring data)?

No, I did try specifying gisDbase earlier and rgrass still produced a lot of temporary files in the root of tempdir(), and my changes in this PR appeared to solve the issue without ever calling terra::tmpFiles. So I think the guilty party was rgrass's internal calls to tempfile.

@stevenpawley
Copy link
Member

Yes, these approaches won't catch everything, for example tempfiles that are generated when data is exported from the GRASS database during read_RAST. Several other geospatial packages also allow tempdir to be specified, so thanks for adding this.

Would you mind also creating a test for this that checks that everything is contained in the user-specified tempdir location? This will need to use skip_on_cran() unless the tmpdir location for the test is a subdirectory of tmpdir.

@Kodiologist
Copy link
Contributor Author

Would you mind also creating a test for this that checks that everything is contained in the user-specified tempdir location?

I've extended "testing basic doGRASS, execGRASS, stringexecGRASS" for this purpose. I've also polished things up in general, so this PR should be ready to go now.

The test named "testing read_RAST using sp" fails for me, but it's already failing on main.

My feeling is that an environment variable probably isn't the right way to store this temporary-directory name, but it seemed consistent with how initGRASS stores other state, so that's how I did it. I don't know if my approach is thread-safe.

@Kodiologist
Copy link
Contributor Author

It may be possible to add on.exit(unlink(…)) in a few more cases. I couldn't always tell whether a temporary file would still be needed after the function returned.

@stevenpawley
Copy link
Member

Could you just re-document - otherwise it looks great.

Regarding the environment variables, I've been working on an overhaul of the initGRASS process. Making use of options() via an rgrassOptions setter/getter is something that I was thinking could be applied to override, ignore.stderr, verbose, and now tempdir.

@Kodiologist
Copy link
Contributor Author

Sure, done.

@Kodiologist
Copy link
Contributor Author

The safest method to manage rgrass's state would probably be for initGRASS to return a state-storing object that must be passed to subsequent function calls, but that would break a lot of code.

@stevenpawley stevenpawley self-assigned this Aug 29, 2025
@stevenpawley stevenpawley merged commit 7740c98 into OSGeo:main Aug 29, 2025
6 checks passed
@stevenpawley
Copy link
Member

Thanks @Kodiologist, all tests pass and the tempdir addition has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants