[RFC] windows: ucrt build on win10+: support unicode file names #438
I'm a bit torn about this one. On one hand, this will enable unicode file names and env vars in modern builds of less. On the other hand, exactly the same effect can be achieved without any changes to the code at all, by placing a small XML UTF-8 manifest file alongside the executable. As a bonus, this will also work with msvcrt builds of less.

less.exe.manifest:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="less.exe" version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

However, using the manifest requires shipping this manifest, or additional work to embed it as a resource in the binary, which likely won't happen by itself, and also it won't run on XP. I don't know how important XP is. I still think this PR is worth having, mainly because it is useful, not invasive, and can be ripped out quite easily if needed, but it feels wrong when the same effect can happen without any code changes at all using the manifest...
Hmm.. somewhat tangential, but I just noticed that there are two msvc makefiles - I don't think we need two makefiles? The differences between them are:
Shall we delete (and I'll update this PR to add The build tests were with the vs2015 command line env. Also, it wouldn't hurt to add msvc and mingw build instructions. I can do that later.
Makefile.win-vc64 was created by @Konstantin-Glukhov in 2bf6792. I don't know if there was a specific reason to create a separate Makefile, but I support merging the Makefiles if there are no strong reasons not to do so.
I vote for having one Makefile.
There are three methods to support unicode file names on Windows:
1. Use the W Windows APIs (_wfopen, CreateFileW, etc). This works on Windows XP and later, but requires pervasive-ish code changes.
2. Attach/use a UTF-8 manifest without any code changes (i.e. keep using the ANSI APIs - fopen, CreateFile[A], etc). This works on Windows 10 1903 and later compiled with msvcrt or ucrt, can be done at build time or post-build, has no effect on win7/8[.1], but the binary doesn't run on Windows XP.
3. (try to) change the locale at runtime to UTF-8. Works on Windows 10 1803 and later compiled with ucrt only (msvc 2015+ or mingw ucrt). With msvcrt and/or earlier windows this has no effect, but it runs on XP and later. Requires small-ish code changes, because argv/env are already ANSI on entry, so they need to become UTF-8.
This commit takes approach 3. If approach 1 or 2 is taken later, then this commit can be reverted.
A UTF-8 manifest can still be used (e.g. if built with msvcrt), in which case the new code becomes a no-op (when GetACP() is CP_UTF8).
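As a rough illustration of approach 3, the sketch below tries to switch the C runtime locale to UTF-8 and does nothing when the process ANSI code page is already UTF-8 (the manifest case). It is not the code from this PR: the helper names `acp_is_utf8` and `try_utf8_locale` are hypothetical, the non-Windows branch exists only so the sketch compiles elsewhere, and the argv/env conversion that the commit message mentions is left out.

```c
#include <locale.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: 1 if the process ANSI code page is already UTF-8
 * (for example because a UTF-8 manifest is in effect), else 0. */
int acp_is_utf8(void)
{
#ifdef _WIN32
    return GetACP() == CP_UTF8;
#else
    return 0; /* assumption: the question only applies on Windows */
#endif
}

/* Hypothetical helper: returns 1 if the CRT locale was switched to UTF-8,
 * 0 if nothing was done (manifest already active, or the runtime/OS
 * does not accept a UTF-8 locale). */
int try_utf8_locale(void)
{
    if (acp_is_utf8())
        return 0; /* ANSI APIs are already UTF-8: nothing to do */

    if (setlocale(LC_ALL, ".UTF8") == NULL)
        return 0; /* locale not accepted by this runtime */

    /* argv and the environment were converted to the ANSI code page
     * before main() ran. A full implementation would rebuild them as
     * UTF-8 from the wide originals; that step is omitted here. */
    return 1;
}
```

A caller would invoke `try_utf8_locale()` once, early in `main`, before any file name from the command line is used.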
It's effectively identical to Makefile.wnm, and since commit 0713eaf Makefile.wng can build 64 bits too. While at it, update the comment at Makefile.wnm to mention 64 too.
Thanks. Rebased, updated the PR to modify only
Thanks. Probably worth mentioning in the changelog/news too ("UCRT builds on win10 and later now support Unicode file names").
This is a heads-up notice that apparently setting the UTF-8 locale doesn't behave exactly as I thought it would. Docs for setting UTF-8 locale with UCRT: (search Note this from the docs:
and:
So I was expecting it to behave exactly like the manifest, but without So while the initial claim to open unicode file name does seem correct (and tested in "less"), because
So in "less", we know that unicode file names do open correctly, but I don't know what, if anything, doesn't work correctly. For comparison, when using the UTF-8 manifest (docs), then
My recommendation is to leave it as is for now, but be aware that the unicode file handling might be incomplete, and watch out for relevant bug reports.
For what it's worth, the "official" releases for windows here are built with UCRT, so, once a new release is made, we can expect it to support unicode file names using this feature (on Windows 10+). The current latest windows build (643) was made before this feature was added. FYI.
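One way to check the "file names open correctly" part independently of less would be a small probe like the one below. It creates a scratch file through `fopen` using a UTF-8 encoded name, then (on Windows) looks for the same file through the wide API. The function name `libc_paths_are_utf8` and the scratch file name are made up for this sketch, and the non-Windows branch simply assumes byte-transparent file names.

```c
#include <stdio.h>

#ifdef _WIN32
#include <wchar.h>
#endif

/* Hypothetical scratch file: "gr_probe_" + U+00E9 + ".tmp" */
#define PROBE_NAME_UTF8 "gr_probe_\xC3\xA9.tmp"
#define PROBE_NAME_WIDE L"gr_probe_\u00E9.tmp"

/* Returns 1 if a UTF-8 name passed to fopen() ends up as the intended
 * Unicode file name, 0 otherwise (or if the file cannot be created). */
int libc_paths_are_utf8(void)
{
    int ok;
    FILE *f = fopen(PROBE_NAME_UTF8, "wb");

    if (f == NULL)
        return 0;
    fclose(f);

#ifdef _WIN32
    /* If fopen() treated the bytes as UTF-8, the wide name exists. */
    f = _wfopen(PROBE_NAME_WIDE, L"rb");
    ok = (f != NULL);
    if (f != NULL)
        fclose(f);
    if (ok)
        _wremove(PROBE_NAME_WIDE);
    else
        remove(PROBE_NAME_UTF8); /* clean up the mis-named file */
#else
    /* Assumption: file names are plain byte strings here. */
    ok = 1;
    remove(PROBE_NAME_UTF8);
#endif
    return ok;
}
```

The probe only covers the libc path (`fopen`); it says nothing about Win32 calls made directly by the program.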
As of now, my estimation is that the So as a rule of thumb, if the API begins in a capital letter, like
I think these are the affected, non-libc APIs in "less" - which don't become UTF-8 (observed in the binary, could be used implicitly by the compiler). Note that the name in the source code is typically without the final
EDIT: found it. So overall, I think only the console title get/set is broken with unicode file names.
While both are true, they're actually unrelated issues. Get/Set ConsoleTitleA are only used to save/restore the title on startup/exit, and indeed the restoration may fail if the original title was unicode. The fact that the console title is gibberish with unicode file names relates to the fact that it's converted to wide char using CP_ACP instead of I'll make a PR to fix these issues.
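To show why a title converted with the wrong code page turns into gibberish, here is a portable sketch that widens the same bytes in two ways. Latin-1 stands in for a legacy single-byte ANSI code page (an assumption, since the real CP_ACP depends on the system), the UTF-8 decoder handles only 1 to 3 byte sequences without validation, and both function names are hypothetical. On Windows this conversion would normally go through MultiByteToWideChar with the chosen code page.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen bytes as if they were in a single-byte code page (Latin-1 here):
 * every byte becomes one UTF-16 unit. Returns the number of units. */
size_t widen_as_latin1(const char *s, uint16_t *out, size_t cap)
{
    size_t n = 0;

    for (; *s != '\0' && n < cap; s++)
        out[n++] = (unsigned char)*s;
    return n;
}

/* Widen bytes as UTF-8 (BMP only, no validation). Anything else,
 * including 4-byte sequences, yields U+FFFD per byte. */
size_t widen_as_utf8(const char *s, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != 0 && n < cap) {
        if (p[0] < 0x80) {
            out[n++] = p[0];
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 && p[1] != 0) {
            out[n++] = (uint16_t)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 && p[1] != 0 && p[2] != 0) {
            out[n++] = (uint16_t)(((p[0] & 0x0F) << 12) |
                                  ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
            p += 3;
        } else {
            out[n++] = 0xFFFD;
            p += 1;
        }
    }
    return n;
}
```

For the UTF-8 bytes of "café", the single-byte path produces five units ending in U+00C3 U+00A9 (displayed as "Ã©"), while the UTF-8 path produces four units ending in U+00E9.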