Skip to content

Conversation

@redvers
Copy link

@redvers redvers commented Sep 10, 2025

The act of writing this RFC made me question my life choices.

I'm now leaning more towards just adding an additional constructor (from the Alternatives section)

new iso copy_cstring_iso(ptr: Pointer[U8] val)

An RFC for discussion!

@ponylang-main ponylang-main added discuss during sync Should be discussed during an upcoming sync status - new The RFC is new and ready for discussion. labels Sep 10, 2025
@redvers redvers changed the title Initial draft Updating String.copy_cstring() and String.copy_cpointer() to return iso^ Sep 10, 2025
var str: String iso = String.from_cstring(ptr).clone()
```

The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: that documentation should probably be updated as if it doesn't state a reason. to be safe, it must be assured that the allocator of the memory won't free it for as long as we hold on to it.

what should really be said is that it should only be done with a pointer to memory that we can take ownership of.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and define what "we can take ownership of" means.

I always assumed (I know) that meant that when the Pointer[U8] refcount reached zero that ponyrt would attempt to free it via GC. So if not that, it means more that the library author wrote "You are responsible for calling free() on this allocation." or equivalent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what I meant was "won't be freed while we hold a pointer to". Freeing it is an entirely different issue.

If it is pony allocated then we know it won't be freed under us.

var str: String iso = String.from_cstring(ptr).clone()
```

The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this being "the main issue". It is important to highlight taht the pattern you have requires two string allocations. That is wasteful, especially if you do it a lot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - the double allocation is motivation for the change.


The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays".

## Create a String ref^ using String.copy\_cstring() and clone()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, here you are doing a clone and that is wasteful. You create a string solely to clone it to get a new string. eeewww. unclean! unclean!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - this is the motivation for the change… To avoid this second copy.

var str: String iso = String.copy_cstring(ptr).clone()
```

This does not violate the documentation and does a clean copy into a new `String ref^`. Unfortunately, the clone() that follows to generate the `String iso^` we need results in a second copying of the data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i note above, it worse than the copying of the data.

Comment on lines +52 to +67

The main disadvantage to this approach is that it does cause an additional creation of an (empty) `String` in addition to the `String iso^` it would return. That feels "impure"™. An implementation might look something like this:

```pony
fun copy_cpointer(ptr: Pointer[U8] box, len: USize): String iso^ =>
"""
Create a string by copying a fixed number of bytes from a pointer.
"""
let str: String iso = recover iso String(len + 1) end
if not ptr.is_null() then
ptr._copy_to(str._ptr._unsafe(), len)
str._set(len, 0)
str.recalc()
end
consume str
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no better what you are already having to do AND it adds extra allocations for everyone. do not like.

new iso copy_cstring(str: Pointer[U8 val] tag)
```

As there is a copying mechanism this doesn't break safety, but it does break the expectation that you cannot get data out of a `Pointer[U8] tag`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what expectation that you can't get data out of a Pointer[U8] tag. tag is identity. If I have the identity of a pointer, I can copy it. It is why Pointer is only available to manipulate from inside builtin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my expectation came from the fact that there is no (current) way to get the content of a Pointer[U8] tag without resorting to C-FFI because all the functions to create Array[U8] or Strings require box/ref/val.

So, creating the first constructor which allowed you to create a String from a Pointer[U8] tag would be a change in expectation.

It comes down to the motivation as to why these constructors don't exist.

Is it only to prevent creation of other objects that use the same underlying Pointers (thus breaking all guarantees…), or was there something more?

I don't know - so I err'd on the side of caution.

Copy link
Author

@redvers redvers Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it - this (from the tutorial).

This one is easy: tag variables are opaque! They can’t be read from.

So in my mind, if I could pass a Pointer[U8] tag into a function and get a copy of that string, it counted in my mind as "reading" from it.


# How We Teach This

The aim of this RFC is for there to be no user impact, so no new information will need to be shared.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this could break existing code, it needs to be taught in the sense of we must inform people this is happening. At minimum that is in release notes, perhaps more.


# Drawbacks

The first proposal doesn't feel "correct", creating an object via a function, and not using the calling object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it adds a tax on everyone even those who are fine with the existing functionality.


The first proposal doesn't feel "correct", creating an object via a function, and not using the calling object.

The second proposal although not breaking safety, does break expectations that have been set over time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please show where these expectations have been set.


# Alternatives

## Change the signature to only accept `Pointer[U8] val`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see any reason to make a breaking change of this sort when the same can be accomplished with a new constructor.


## Change the signature to only accept `Pointer[U8] val`

This would be a trivial but breaking change. A review of stdlib and the libraries in the github ponylang repositories finds no clear cases where I think this would break.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial can mean a lot of things. i would not be in favor of an RFC that that's this is trivial without defining what is trivial about it.


# Unresolved questions

Although the ideal of not making a breaking change is strong, it may result in being the most correct way forward.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct is a loaded term. i would not be in favor of an rfc that uses that language without detailing exactly what is meant by "correct".


# How We Test This

There are currently no tests for `String.copy_cstring()` or `String.copy_cpointer()` in builtin_tests. My initial PR included some.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isnt sufficient for an RFC. If we get close to accepting a version of the RFC, then there needs to be detailed information about how we test. This section must be fulfilled for a PR implementing the RFC to be merged so a sufficiently detailed answer will be required here.

@redvers
Copy link
Author

redvers commented Sep 11, 2025

In my original comment / commit I highlighted that my preferred solution was the addition of a set of create_c*_iso constructors but pushed I still pushed the RFC as written to start a discussion.

The more I look at the problem now, the more I think that even the addition of this function is unnecessary as the existing functions can be coaxed to return an iso as long as you provide a Pointer[U8] val - my proposed change.

I think I focused too hard on the documented function signature which, in my reading meant that it could only provide a ref^.

My understanding missed the fact that the documentation cites the most permissive refcaps for inputs and the most restrictive on outputs.

I think I'll write-up my thoughts on this subtlety later today on Zulip and take a look through the documentation to see if I can work out where I missed this subtle detail.

The RFC was a very useful process for me in getting my thoughts on the subject to ground, but I no longer believe this is a useful RFC to pursue.

@redvers redvers closed this Sep 11, 2025
@ponylang-main ponylang-main added discuss during sync Should be discussed during an upcoming sync and removed discuss during sync Should be discussed during an upcoming sync labels Sep 11, 2025
@redvers redvers deleted the string_copy_to_return_iso branch September 11, 2025 17:38
@redvers redvers removed status - new The RFC is new and ready for discussion. discuss during sync Should be discussed during an upcoming sync labels Sep 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants