-
-
Notifications
You must be signed in to change notification settings - Fork 48
Updating String.copy_cstring() and String.copy_cpointer() to return iso^ #218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| var str: String iso = String.from_cstring(ptr).clone() | ||
| ``` | ||
|
|
||
| The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: that documentation should probably be updated as if it doesn't state a reason. to be safe, it must be assured that the allocator of the memory won't free it for as long as we hold on to it.
what should really be said is that it should only be done with a pointer to memory that we can take ownership of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… and define what "we can take ownership of" means.
I always assumed (I know) that meant that when the Pointer[U8] refcount reached zero that ponyrt would attempt to free it via GC. So if not that, it means more that the library author wrote "You are responsible for calling free() on this allocation." or equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what I meant was "won't be freed while we hold a pointer to". Freeing it is an entirely different issue.
If it is pony allocated then we know it won't be freed under us.
| var str: String iso = String.from_cstring(ptr).clone() | ||
| ``` | ||
|
|
||
| The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with this being "the main issue". It is important to highlight taht the pattern you have requires two string allocations. That is wasteful, especially if you do it a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - the double allocation is motivation for the change.
|
|
||
| The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays". | ||
|
|
||
| ## Create a String ref^ using String.copy\_cstring() and clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, here you are doing a clone and that is wasteful. You create a string solely to clone it to get a new string. eeewww. unclean! unclean!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - this is the motivation for the change… To avoid this second copy.
| var str: String iso = String.copy_cstring(ptr).clone() | ||
| ``` | ||
|
|
||
| This does not violate the documentation and does a clean copy into a new `String ref^`. Unfortunately, the clone() that follows to generate the `String iso^` we need results in a second copying of the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as i note above, it worse than the copying of the data.
|
|
||
| The main disadvantage to this approach is that it does cause an additional creation of an (empty) `String` in addition to the `String iso^` it would return. That feels "impure"™. An implementation might look something like this: | ||
|
|
||
| ```pony | ||
| fun copy_cpointer(ptr: Pointer[U8] box, len: USize): String iso^ => | ||
| """ | ||
| Create a string by copying a fixed number of bytes from a pointer. | ||
| """ | ||
| let str: String iso = recover iso String(len + 1) end | ||
| if not ptr.is_null() then | ||
| ptr._copy_to(str._ptr._unsafe(), len) | ||
| str._set(len, 0) | ||
| str.recalc() | ||
| end | ||
| consume str | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is no better what you are already having to do AND it adds extra allocations for everyone. do not like.
| new iso copy_cstring(str: Pointer[U8 val] tag) | ||
| ``` | ||
|
|
||
| As there is a copying mechanism this doesn't break safety, but it does break the expectation that you cannot get data out of a `Pointer[U8] tag`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what expectation that you can't get data out of a Pointer[U8] tag. tag is identity. If I have the identity of a pointer, I can copy it. It is why Pointer is only available to manipulate from inside builtin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my expectation came from the fact that there is no (current) way to get the content of a Pointer[U8] tag without resorting to C-FFI because all the functions to create Array[U8] or Strings require box/ref/val.
So, creating the first constructor which allowed you to create a String from a Pointer[U8] tag would be a change in expectation.
It comes down to the motivation as to why these constructors don't exist.
Is it only to prevent creation of other objects that use the same underlying Pointers (thus breaking all guarantees…), or was there something more?
I don't know - so I err'd on the side of caution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found it - this (from the tutorial).
This one is easy: tag variables are opaque! They can’t be read from.
So in my mind, if I could pass a Pointer[U8] tag into a function and get a copy of that string, it counted in my mind as "reading" from it.
|
|
||
| # How We Teach This | ||
|
|
||
| The aim of this RFC is for there to be no user impact, so no new information will need to be shared. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this could break existing code, it needs to be taught in the sense of we must inform people this is happening. At minimum that is in release notes, perhaps more.
|
|
||
| # Drawbacks | ||
|
|
||
| The first proposal doesn't feel "correct", creating an object via a function, and not using the calling object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and it adds a tax on everyone even those who are fine with the existing functionality.
|
|
||
| The first proposal doesn't feel "correct", creating an object via a function, and not using the calling object. | ||
|
|
||
| The second proposal although not breaking safety, does break expectations that have been set over time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please show where these expectations have been set.
|
|
||
| # Alternatives | ||
|
|
||
| ## Change the signature to only accept `Pointer[U8] val` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont see any reason to make a breaking change of this sort when the same can be accomplished with a new constructor.
|
|
||
| ## Change the signature to only accept `Pointer[U8] val` | ||
|
|
||
| This would be a trivial but breaking change. A review of stdlib and the libraries in the github ponylang repositories finds no clear cases where I think this would break. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial can mean a lot of things. i would not be in favor of an RFC that that's this is trivial without defining what is trivial about it.
|
|
||
| # Unresolved questions | ||
|
|
||
| Although the ideal of not making a breaking change is strong, it may result in being the most correct way forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct is a loaded term. i would not be in favor of an rfc that uses that language without detailing exactly what is meant by "correct".
|
|
||
| # How We Test This | ||
|
|
||
| There are currently no tests for `String.copy_cstring()` or `String.copy_cpointer()` in builtin_tests. My initial PR included some. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isnt sufficient for an RFC. If we get close to accepting a version of the RFC, then there needs to be detailed information about how we test. This section must be fulfilled for a PR implementing the RFC to be merged so a sufficiently detailed answer will be required here.
|
In my original comment / commit I highlighted that my preferred solution was the addition of a set of The more I look at the problem now, the more I think that even the addition of this function is unnecessary as the existing functions can be coaxed to return an iso as long as you provide a I think I focused too hard on the documented function signature which, in my reading meant that it could only provide a ref^. My understanding missed the fact that the documentation cites the most permissive refcaps for inputs and the most restrictive on outputs. I think I'll write-up my thoughts on this subtlety later today on Zulip and take a look through the documentation to see if I can work out where I missed this subtle detail. The RFC was a very useful process for me in getting my thoughts on the subject to ground, but I no longer believe this is a useful RFC to pursue. |
The act of writing this RFC made me question my life choices.
I'm now leaning more towards just adding an additional constructor (from the Alternatives section)
An RFC for discussion!