Skip to content

Conversation

@tobil4sk
Copy link
Member

@tobil4sk tobil4sk commented Apr 18, 2023

Currently, there are a lot of redundant string conversions when using the file dialog api. This was mentioned briefly in #1622.

Here is a rough overview of the current situation:

Linux/macOS:

  • C++: hxstring -> std::wstring -> std::string, passed into tinyfiledialogs utf-8 api, then std::string -> std::wstring -> hxstring
  • HL: hl_vstring -> std::wstring -> std::string, passed into tinyfiledialogs utf-8 api, then std::string -> std::wstring -> utf-8 bytes -> native utf-16 hashlink string (on Haxe side via String.fromUtf8())

Windows:

  • C++: hxstring -> std::wstring, passed into tinyfiledialogs utf-16 api, then std::wstring -> hxstring
  • HL: hl_vstring -> std::wstring, passed into tinyfiledialogs utf-16 api, then std::wstring -> utf-8 bytes -> native utf-16 hashlink string (on Haxe side via String.fromUtf8())

This cleans things up to avoid unnecessary conversions, so now it looks like this in most cases:

Linux/macOS:

  • C++: hxstring -> utf-8[0], passed into tinyfiledialogs utf-8 api, then utf-8 -> hxstring
  • HL: hl_vstring -> utf-8, passed into tinyfiledialogs utf-8 api, then utf-8 -> utf-16 hashlink string (on Haxe side via String.fromUtf8())

Windows:

  • C++: hxstring -> utf-16[0], passed into tinyfiledialogs utf-16 api, then utf-16 -> hxstring
  • HL: hl_vstring, passed into tinyfiledialogs utf-16 api, then utf-16 -> utf8 -> utf-16 hashlink string (on Haxe side via String.fromUtf8())

[0] hxcpp can use ASCII (which is compatible with utf-8) or utf-16 depending on the string, so these conversions are avoided in some cases.

We can improve Hashlink further (for Windows), by returning utf-16 strings from the apis, however, I'm not sure if this would be considered a breaking change (new lime.hdlls would be incompatible with old lime haxe files).

In #1622, we briefly discussed using the utf-8 tinyfiledialog functions on Windows to unify the api, however, I looked into this and on Windows these functions just convert back to utf-16, so there would still be extra conversions. So I think this is the cleanest solution.

I've tested on Linux and Windows with the code from #1622, and everything works well.

@nixbody
Copy link
Contributor

nixbody commented Apr 18, 2023

Just a little correction here, HXCPP (with smart strings enabled) never uses UTF-8 for strings. It uses ASCII and when any character in a string doesn't fit within ASCII range then the entire string is converted to UTF-16. HXCPP, however, provides a function to convert it's strings to UTF-8.

@tobil4sk
Copy link
Member Author

Yes, that's correct. However, an ASCII string is valid utf-8, so when using hxs_utf8 on an hxcpp string which is ASCII encoded, it just returns a pointer to the ASCII string without any conversion being necessary.

@nixbody
Copy link
Contributor

nixbody commented Apr 18, 2023

I was wondering if you point that out :) I thought you might know, my apologies for stating the obvious.

@tobil4sk
Copy link
Member Author

No worries, it's worth making the distinction :)

@tobil4sk
Copy link
Member Author

There are a few more places where wstrings are converted to utf8 still. These include most of the system functions, though these are windows only, where wchar is equal to char16_t so hl_to_utf8/alloc_wstring can be used there. For System::GetDirectory, it can be modified to avoid wstrings.

The bigger issue is Font::GetFamilyName() in text/Font.cpp. Currently it assumes that wchar is 16 bits, which is only the case on Windows. Even if it used char16_t, the alloc_hxs_utf16() function returns an HxString, and I can't seem to figure out to turn that into a value.

No need to iterate through the whole string to find the length, just
to see if it is empty.
@tobil4sk
Copy link
Member Author

I fixed conflicts in the PR and updated it to clean things up and reduce duplicate code. I've tested on linux and windows and it works as expected.

Aside from cleaning things up and simplicity, it is also a good idea to make this change because the std::wstring currently being created on non-windows systems are actually utf8 with widened characters, which is a bit confusing and doesn't play nicely with apis (e.g. hxcpp's alloc_wstring). This confusion has caused issues in the past like #1622.

@player-03
Copy link
Contributor

I was going to ask about memory leaks, since some delete statements were removed. But then I tracked everything down in tinyfiledialogs.c and found that it only returns static variables. If it's called again, it reuses the variable, overwriting the old value. And that's fine because we copy the value into a Haxe string before calling again. As far as I can tell, everything works as it should.

Not that I don't trust you to check, I just wanted to understand it myself. Plus, I learned some basics of gdb, which is nice.

@tobil4sk
Copy link
Member Author

I was going to ask about memory leaks, since some delete statements were removed. But then I tracked everything down in tinyfiledialogs.c and found that it only returns static variables.

This is something I found confusing too so perhaps a comment might be useful. I can't remember if tfd has some documentation explaining this or if i just had to check the source?

Also, I suppose the static string might also not be thread safe, though that is regardless of this PR. Though, given it is file dialogs it might be unlikely to cause issues in practice.

Not that I don't trust you to check, I just wanted to understand it myself.

I agree, we want code that can be easily verified/understood by anyone reading it, rather than just assuming it works based on trust in one contributor.

@player-03
Copy link
Contributor

I can't remember if tfd has some documentation explaining this or if i just had to check the source?

I checked the source when I wrote that, but it turns out they also mention it in the readme: "String memory is preallocated statically for all the returned values."

Also, I suppose the static string might also not be thread safe, though that is regardless of this PR. Though, given it is file dialogs it might be unlikely to cause issues in practice.

Agreed on all counts. I doubt most OSes allow opening dialogs from non-UI threads, but if they did, both threads would read/write that same memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants