-
-
Notifications
You must be signed in to change notification settings - Fork 48
Updating String.copy_cstring() and String.copy_cpointer() to return iso^ #218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| - Feature Name: Modify String.copy\_cpointer() and String.copy\_cstring() to return iso^ | ||
| - Start Date: 2025-09-10 | ||
| - RFC PR: | ||
| - Pony Issue: | ||
|
|
||
| # Summary | ||
|
|
||
| The two current constructors `String.copy_cpointer()` and `String.copy_cstring()` both take a `Pointer[U8] box` and return a `String ref^`. Given that these two constructors make copies, returning a `String iso^` would be preferable and safe. | ||
|
|
||
| Note: In the interests of clarity of writing, I will use `String.copy_cstring()` throughout the rest of this RFC. All of the commentary below should be read to apply to both `String.copy_cstring()`, and `String.copy_cpointer()`. | ||
|
|
||
| # Motivation | ||
|
|
||
| In many cases, as a library author I want to provide my end-users a mutable and sendable copy of a String received via C-FFI. In other words, a `String iso^`. With the way that the String library is currently written, there are two ways to do this: | ||
|
|
||
| ## Create a String ref^ using String.from\_cstring() or from\_cpointer() and clone(). | ||
|
|
||
| ```pony | ||
| var str: String iso = String.from_cstring(ptr).clone() | ||
| ``` | ||
|
|
||
| The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays". | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I disagree with this being "the main issue". It is important to highlight taht the pattern you have requires two string allocations. That is wasteful, especially if you do it a lot.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup - the double allocation is motivation for the change. |
||
|
|
||
| ## Create a String ref^ using String.copy\_cstring() and clone() | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. again, here you are doing a clone and that is wasteful. You create a string solely to clone it to get a new string. eeewww. unclean! unclean!
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed - this is the motivation for the change… To avoid this second copy. |
||
|
|
||
| ```pony | ||
| var str: String iso = String.copy_cstring(ptr).clone() | ||
| ``` | ||
|
|
||
| This does not violate the documentation and does a clean copy into a new `String ref^`. Unfortunately, the clone() that follows to generate the `String iso^` we need results in a second copying of the data. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as i note above, it worse than the copying of the data. |
||
|
|
||
| Since the `String.copy_cstring()` does a clean copy, it would be a safe operation to return a `String iso^`, making the second copying redundant. | ||
|
|
||
| Ideally, the change would result in the following use: | ||
|
|
||
| ```pony | ||
| var str: String iso = String.copy_cstring(ptr) | ||
| ``` | ||
|
|
||
| # Detailed design | ||
|
|
||
| The existing `String.copy_cstring()` and `String.copy_cpointer()` have the following signatures: | ||
|
|
||
| ```pony | ||
| new ref copy_cstring(str: Pointer[U8 val] box) | ||
| new ref copy_cpointer(str: Pointer[U8 val] box, len: USize val) | ||
| ``` | ||
|
|
||
| The reason that we cannot just change the return type to `String iso^` is that `str: Pointer[U8] box` is not sendable. The ideal outcome would be to not change the `Pointer[U8] box` to `Pointer[U8] val` in order to not create a breaking change. Here are two proposed ways to achieve this: | ||
|
|
||
| ## Convert from constructors to functions | ||
|
|
||
| The main disadvantage to this approach is that it does cause an additional creation of an (empty) `String` in addition to the `String iso^` it would return. That feels "impure"™. An implementation might look something like this: | ||
|
|
||
| ```pony | ||
| fun copy_cpointer(ptr: Pointer[U8] box, len: USize): String iso^ => | ||
| """ | ||
| Create a string by copying a fixed number of bytes from a pointer. | ||
| """ | ||
| let str: String iso = recover iso String(len + 1) end | ||
| if not ptr.is_null() then | ||
| ptr._copy_to(str._ptr._unsafe(), len) | ||
| str._set(len, 0) | ||
| str.recalc() | ||
| end | ||
| consume str | ||
| ``` | ||
|
Comment on lines
+52
to
+67
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is no better what you are already having to do AND it adds extra allocations for everyone. do not like. |
||
|
|
||
| ## Retain the constructor, but change the receiver to Pointer[U8] tag | ||
|
|
||
| ```pony | ||
| new iso copy_cstring(str: Pointer[U8 val] tag) | ||
| ``` | ||
|
|
||
| As there is a copying mechanism this doesn't break safety, but it does break the expectation that you cannot get data out of a `Pointer[U8] tag`. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what expectation that you can't get data out of a Pointer[U8] tag.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think my expectation came from the fact that there is no (current) way to get the content of a Pointer[U8] tag without resorting to C-FFI because all the functions to create Array[U8] or Strings require box/ref/val. So, creating the first constructor which allowed you to create a String from a Pointer[U8] tag would be a change in expectation. It comes down to the motivation as to why these constructors don't exist. Is it only to prevent creation of other objects that use the same underlying Pointers (thus breaking all guarantees…), or was there something more? I don't know - so I err'd on the side of caution.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Found it - this (from the tutorial). This one is easy: tag variables are opaque! They can’t be read from. So in my mind, if I could pass a Pointer[U8] tag into a function and get a copy of that string, it counted in my mind as "reading" from it. |
||
|
|
||
| # How We Teach This | ||
|
|
||
| The aim of this RFC is for there to be no user impact, so no new information will need to be shared. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this could break existing code, it needs to be taught in the sense of we must inform people this is happening. At minimum that is in release notes, perhaps more. |
||
|
|
||
| # How We Test This | ||
|
|
||
| There are currently no tests for `String.copy_cstring()` or `String.copy_cpointer()` in builtin_tests. My initial PR included some. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this isnt sufficient for an RFC. If we get close to accepting a version of the RFC, then there needs to be detailed information about how we test. This section must be fulfilled for a PR implementing the RFC to be merged so a sufficiently detailed answer will be required here. |
||
|
|
||
| # Drawbacks | ||
|
|
||
| The first proposal doesn't feel "correct", creating an object via a function, and not using the calling object. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and it adds a tax on everyone even those who are fine with the existing functionality. |
||
|
|
||
| The second proposal although not breaking safety, does break expectations that have been set over time. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please show where these expectations have been set. |
||
|
|
||
| # Alternatives | ||
|
|
||
| ## Change the signature to only accept `Pointer[U8] val` | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I dont see any reason to make a breaking change of this sort when the same can be accomplished with a new constructor. |
||
|
|
||
| This would be a trivial but breaking change. A review of stdlib and the libraries in the github ponylang repositories finds no clear cases where I think this would break. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. trivial can mean a lot of things. i would not be in favor of an RFC that that's this is trivial without defining what is trivial about it. |
||
|
|
||
| ```pony | ||
| new iso copy_cstring(str: Pointer[U8 val] val) | ||
| ``` | ||
|
|
||
| ## Leave copy_cstring() as is, add copy_cstring_iso() | ||
|
|
||
| This would be a trivial non-breaking change, and there is existing precedent for including refcaps in function names: `String.from_iso_array()`. | ||
|
|
||
| ```pony | ||
| new iso copy_cstring_iso(str: Pointer[U8 val] val) | ||
| ``` | ||
|
|
||
| # Unresolved questions | ||
|
|
||
| Although the ideal of not making a breaking change is strong, it may result in being the most correct way forward. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct is a loaded term. i would not be in favor of an rfc that uses that language without detailing exactly what is meant by "correct". |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: that documentation should probably be updated as if it doesn't state a reason. to be safe, it must be assured that the allocator of the memory won't free it for as long as we hold on to it.
what should really be said is that it should only be done with a pointer to memory that we can take ownership of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… and define what "we can take ownership of" means.
I always assumed (I know) that meant that when the Pointer[U8] refcount reached zero that ponyrt would attempt to free it via GC. So if not that, it means more that the library author wrote "You are responsible for calling free() on this allocation." or equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what I meant was "won't be freed while we hold a pointer to". Freeing it is an entirely different issue.
If it is pony allocated then we know it won't be freed under us.