Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 111 additions & 0 deletions text/0000-string-copy_cpointer-copy_cstring-returns-iso.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
- Feature Name: Modify String.copy\_cpointer() and String.copy\_cstring() to return iso^
- Start Date: 2025-09-10
- RFC PR:
- Pony Issue:

# Summary

The two current constructors `String.copy_cpointer()` and `String.copy_cstring()` both take a `Pointer[U8] box` and return a `String ref^`. Given that these two constructors make copies, returning a `String iso^` would be preferable and safe.

Note: In the interests of clarity of writing, I will use `String.copy_cstring()` throughout the rest of this RFC. All of the commentary below should be read to apply to both `String.copy_cstring()`, and `String.copy_cpointer()`.

# Motivation

In many cases, as a library author I want to provide my end-users a mutable and sendable copy of a String received via C-FFI. In other words, a `String iso^`. With the way that the String library is currently written, there are two ways to do this:

## Create a String ref^ using String.from\_cstring() or from\_cpointer() and clone().

```pony
var str: String iso = String.from_cstring(ptr).clone()
```

The main issue with this is that it directly violates the documentation which states: "This must be done only with C-FFI functions that return pony\_alloc'd character arrays".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: that documentation should probably be updated as if it doesn't state a reason. to be safe, it must be assured that the allocator of the memory won't free it for as long as we hold on to it.

what should really be said is that it should only be done with a pointer to memory that we can take ownership of.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and define what "we can take ownership of" means.

I always assumed (I know) that meant that when the Pointer[U8] refcount reached zero that ponyrt would attempt to free it via GC. So if not that, it means more that the library author wrote "You are responsible for calling free() on this allocation." or equivalent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what I meant was "won't be freed while we hold a pointer to". Freeing it is an entirely different issue.

If it is pony allocated then we know it won't be freed under us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this being "the main issue". It is important to highlight taht the pattern you have requires two string allocations. That is wasteful, especially if you do it a lot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - the double allocation is motivation for the change.


## Create a String ref^ using String.copy\_cstring() and clone()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, here you are doing a clone and that is wasteful. You create a string solely to clone it to get a new string. eeewww. unclean! unclean!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - this is the motivation for the change… To avoid this second copy.


```pony
var str: String iso = String.copy_cstring(ptr).clone()
```

This does not violate the documentation and does a clean copy into a new `String ref^`. Unfortunately, the clone() that follows to generate the `String iso^` we need results in a second copying of the data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i note above, it worse than the copying of the data.


Since the `String.copy_cstring()` does a clean copy, it would be a safe operation to return a `String iso^`, making the second copying redundant.

Ideally, the change would result in the following use:

```pony
var str: String iso = String.copy_cstring(ptr)
```

# Detailed design

The existing `String.copy_cstring()` and `String.copy_cpointer()` have the following signatures:

```pony
new ref copy_cstring(str: Pointer[U8 val] box)
new ref copy_cpointer(str: Pointer[U8 val] box, len: USize val)
```

The reason that we cannot just change the return type to `String iso^` is that `str: Pointer[U8] box` is not sendable. The ideal outcome would be to not change the `Pointer[U8] box` to `Pointer[U8] val` in order to not create a breaking change. Here are two proposed ways to achieve this:

## Convert from constructors to functions

The main disadvantage to this approach is that it does cause an additional creation of an (empty) `String` in addition to the `String iso^` it would return. That feels "impure"™. An implementation might look something like this:

```pony
fun copy_cpointer(ptr: Pointer[U8] box, len: USize): String iso^ =>
"""
Create a string by copying a fixed number of bytes from a pointer.
"""
let str: String iso = recover iso String(len + 1) end
if not ptr.is_null() then
ptr._copy_to(str._ptr._unsafe(), len)
str._set(len, 0)
str.recalc()
end
consume str
```
Comment on lines +52 to +67
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no better what you are already having to do AND it adds extra allocations for everyone. do not like.


## Retain the constructor, but change the receiver to Pointer[U8] tag

```pony
new iso copy_cstring(str: Pointer[U8 val] tag)
```

As there is a copying mechanism this doesn't break safety, but it does break the expectation that you cannot get data out of a `Pointer[U8] tag`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what expectation that you can't get data out of a Pointer[U8] tag. tag is identity. If I have the identity of a pointer, I can copy it. It is why Pointer is only available to manipulate from inside builtin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my expectation came from the fact that there is no (current) way to get the content of a Pointer[U8] tag without resorting to C-FFI because all the functions to create Array[U8] or Strings require box/ref/val.

So, creating the first constructor which allowed you to create a String from a Pointer[U8] tag would be a change in expectation.

It comes down to the motivation as to why these constructors don't exist.

Is it only to prevent creation of other objects that use the same underlying Pointers (thus breaking all guarantees…), or was there something more?

I don't know - so I err'd on the side of caution.

Copy link
Author

@redvers redvers Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it - this (from the tutorial).

This one is easy: tag variables are opaque! They can’t be read from.

So in my mind, if I could pass a Pointer[U8] tag into a function and get a copy of that string, it counted in my mind as "reading" from it.


# How We Teach This

The aim of this RFC is for there to be no user impact, so no new information will need to be shared.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this could break existing code, it needs to be taught in the sense of we must inform people this is happening. At minimum that is in release notes, perhaps more.


# How We Test This

There are currently no tests for `String.copy_cstring()` or `String.copy_cpointer()` in builtin_tests. My initial PR included some.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isnt sufficient for an RFC. If we get close to accepting a version of the RFC, then there needs to be detailed information about how we test. This section must be fulfilled for a PR implementing the RFC to be merged so a sufficiently detailed answer will be required here.


# Drawbacks

The first proposal doesn't feel "correct", creating an object via a function, and not using the calling object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it adds a tax on everyone even those who are fine with the existing functionality.


The second proposal although not breaking safety, does break expectations that have been set over time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please show where these expectations have been set.


# Alternatives

## Change the signature to only accept `Pointer[U8] val`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see any reason to make a breaking change of this sort when the same can be accomplished with a new constructor.


This would be a trivial but breaking change. A review of stdlib and the libraries in the github ponylang repositories finds no clear cases where I think this would break.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial can mean a lot of things. i would not be in favor of an RFC that that's this is trivial without defining what is trivial about it.


```pony
new iso copy_cstring(str: Pointer[U8 val] val)
```

## Leave copy_cstring() as is, add copy_cstring_iso()

This would be a trivial non-breaking change, and there is existing precedent for including refcaps in function names: `String.from_iso_array()`.

```pony
new iso copy_cstring_iso(str: Pointer[U8 val] val)
```

# Unresolved questions

Although the ideal of not making a breaking change is strong, it may result in being the most correct way forward.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct is a loaded term. i would not be in favor of an rfc that uses that language without detailing exactly what is meant by "correct".