Making EPERM friendlier
Error reporting from the kernel (and low-level system libraries such as the C library) has been a primitive affair since the earliest UNIX systems. One of the consequences of this is that end users and system administrators often encounter error messages that provide quite limited information about the cause of the error, making it difficult to diagnose the underlying problem. Some recent discussions on the libc-alpha and Linux kernel mailing lists were started by developers who would like to improve this state of affairs by having the kernel provide more detailed error information to user space.
The traditional UNIX (and Linux) method of error reporting is via the (per-thread) global errno variable. The C library wrapper functions that invoke system calls indicate an error by returning -1 as the function result and setting errno to a positive integer value that identifies the cause of the error.
The fact that errno is a global variable is a source of complications for user-space programs. Because each system call may overwrite the global value, it is sometimes necessary to save a copy of the value if it needs to be preserved while making another system call. The fact that errno is global also means that signal handlers that make system calls must save a copy of errno on entry to the handler and restore it on exit, to prevent the possibility of overwriting an errno value that had previously been set in the main program.
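The save-and-restore pattern for signal handlers can be sketched as follows. This is a minimal illustration, not code from the discussion: the names on_signal() and errno_after_handler() are made up for the example, and the handler deliberately makes a failing write() call so that errno would be clobbered without the save and restore.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Handler that makes a system call which would otherwise clobber errno. */
static void on_signal(int signo)
{
    int saved_errno = errno;            /* save on entry */
    ssize_t n;

    (void)signo;
    /* Writing to an invalid descriptor fails and sets errno to EBADF. */
    n = write(-1, "x", 1);
    (void)n;

    errno = saved_errno;                /* restore on exit */
}

/*
 * Hypothetical helper: sets errno to `initial`, delivers SIGUSR1 to the
 * caller, and returns the errno value seen after the handler has run.
 * Returns -1 if the handler could not be installed.
 */
int errno_after_handler(int initial)
{
    struct sigaction sa, old;
    int observed;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) == -1)
        return -1;

    errno = initial;
    raise(SIGUSR1);                     /* handler runs before raise() returns */
    observed = errno;

    sigaction(SIGUSR1, &old, NULL);     /* put the previous disposition back */
    return observed;
}
```

Without the two lines that touch saved_errno, the main program would see EBADF from the handler's write() rather than the value it had set.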
Another problem with errno is that the information it reports is rather minimal: one of somewhat more than one hundred integer codes. Given that the kernel provides hundreds of system calls, many of which have multiple error cases, the mapping of errors to errno values inevitably means a loss of information.
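To make the limitation concrete, the sketch below shows essentially everything a program has to work with after a failed call: an integer, and the short fixed string that strerror() maps it to. The helper name format_error() is invented for this example; it mimics the "prefix: message" layout that perror() produces.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats "<what>: <strerror text>" into buf.
 * Returns the value returned by snprintf().
 */
int format_error(char *buf, size_t len, const char *what, int err)
{
    /* strerror() can only map the integer to one fixed string per code. */
    return snprintf(buf, len, "%s: %s", what, strerror(err));
}
```

Whatever the reason for the denial, a call such as format_error(buf, sizeof buf, "open", EACCES) yields the same text every time, because the integer is the only input.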
That loss of information can be particularly acute when it comes to certain commonly used errno values. In a message to the libc-alpha mailing list, Dan Walsh explained the problem for two errors that are frequently encountered by end users:
Those two errors have been defined on UNIX systems since early times. POSIX defines EACCES as "an attempt was made to access a file in a way forbidden by its file access permissions" and EPERM as "an attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource". These definitions were fairly comprehensible on early UNIX systems, where the kernel was much less complex, the only method of controlling file access was via classical rwx file permissions, and the only kind of privilege separation was via user and group IDs and superuser versus non-superuser. However, life is rather more complex on modern UNIX systems.
In all, EPERM and EACCES are returned by more than 3000 locations across the Linux 3.7 kernel source code. However, it is not so much the number of return paths yielding these errors that is the problem. Rather, the problem for end users is determining the underlying cause of the errors. The possible causes are many, including denial of file access because of insufficient (classical) file permissions or because of permissions in an ACL, lack of the right capability, denial of an operation by a Linux Security Module or by the seccomp mechanism, and any of a number of other reasons. Dan summarized the problem faced by the end user:
Dan's mail linked to a wiki page ("Friendly EPERM") with a proposal on how to deal with the problem. That proposal involves changes to both the kernel and the GNU C library (glibc). The kernel changes would add a mechanism for exposing a "failure cookie" to user space that would provide more detailed information about the error delivered in errno. On the glibc side, strerror() and related calls (e.g., perror()) would access the failure cookie in order to obtain information that could be used to provide a more detailed error message to the user.
Roland McGrath was quick to point out that the solution is not so simple. The problem is that it is quite common for applications to call strerror() only some time after a failed system call, or to do things such as saving errno in a temporary location and then restoring it later. In the meantime, the application is likely to have performed further system calls that may have changed the value of the failure cookie.
Roland went on to identify some of the problems inherent in trying to extend existing standardized interfaces in order to provide useful error information to end users:
Frankly, I don't see any practical way to achieve what you're after. In most cases, you can't even add new different errno codes for different kinds of permission errors, because POSIX specifies the standard code for certain errors and you'd break both standards compliance and all applications that test for standard errno codes to treat known classes of errors in particular ways.
In response, Eric Paris, one of the other proponents of the failure-cookie idea, acknowledged Roland's points, noting that since the standard APIs can't be extended, changes would be required to each application that wanted to take advantage of any additional error information provided by the kernel.
Eric subsequently posted a note to the kernel mailing list with a proposal on the kernel changes required to support improved error reporting. In essence, he proposes exposing some form of binary structure to user space that describes the cause of the last EPERM or EACCES error returned to the process by the kernel. That structure might, for example, be exposed via a thread-specific file in the /proc filesystem.
The structure would take the form of an initial field that indicates the subsystem that triggered the error—for example, capabilities, SELinux, or file permissions—followed by a union of substructures that provide subsystem-specific detail on the circumstances that triggered the error. Thus, for a file permissions error, the substructure might return the effective user and group ID of the process, the file user ID and group ID, and the file permission bits. At the user-space level, the binary structure could be read and translated to human-readable strings, perhaps via a glibc function that Eric suggested might be named something like get_extended_error_info().
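A rough sketch of what such a structure and its user-space translation might look like is shown below. Everything here is hypothetical: the proposal did not fix any type names, field names, or layout, so extended_error_info, the DENY_* constants, and describe_denial() are invented purely to illustrate the "subsystem tag plus union" shape described above.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical tag: which subsystem denied the operation. */
enum deny_subsystem {
    DENY_FILE_PERMS,
    DENY_CAPABILITY,
    DENY_LSM
};

/* Hypothetical layout: a tag followed by subsystem-specific detail. */
struct extended_error_info {
    enum deny_subsystem subsystem;
    union {
        struct {
            uid_t euid;         /* effective user ID of the process */
            gid_t egid;         /* effective group ID of the process */
            uid_t file_uid;     /* owner of the file */
            gid_t file_gid;     /* group of the file */
            mode_t file_mode;   /* file permission bits */
        } file;
        struct {
            int capability;     /* number of the missing capability */
        } cap;
    } u;
};

/* Translates the binary record into a human-readable string. */
int describe_denial(const struct extended_error_info *info,
                    char *buf, size_t len)
{
    switch (info->subsystem) {
    case DENY_FILE_PERMS:
        return snprintf(buf, len,
            "file permissions: process uid=%lu gid=%lu, "
            "file uid=%lu gid=%lu mode=%04lo",
            (unsigned long)info->u.file.euid,
            (unsigned long)info->u.file.egid,
            (unsigned long)info->u.file.file_uid,
            (unsigned long)info->u.file.file_gid,
            (unsigned long)info->u.file.file_mode);
    case DENY_CAPABILITY:
        return snprintf(buf, len, "missing capability %d",
                        info->u.cap.capability);
    case DENY_LSM:
        return snprintf(buf, len, "denied by a security module");
    }
    return snprintf(buf, len, "unknown cause");
}
```

A tagged union keeps the record small and fixed-size, which matters if it is to be copied into per-task kernel state on every failing call.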
Each of the kernel call sites that returned an EPERM or EACCES error would then need to be patched to update this information. But, patching all of those call sites would not be necessary to make the feature useful. As Eric noted:
There were various comments on Eric's proposal. In response to concerns from Stephen Smalley that this feature might leak information (such as file attributes) that could be considered sensitive in systems with a strict security policy (enforced by an LSM), Eric responded that the system could provide a sysctl to disable the feature:
Reasoning that it's best to use an existing format and its tools rather than inventing a new format for error reporting, Casey Schaufler suggested that audit records should be used instead:
Eric expressed concerns that copying an audit record to the process's task_struct would carry more of a performance hit than copying a few integers to that structure, concluding:
Jakub Jelinek wondered which system call Eric's mechanism should return information about, and whether its state would be reset if a subsequent system call succeeded. In many cases, there is no one-to-one mapping between C library calls and system calls, so that some library functions may make one system call, save errno, then make some other system call (that may or may not also fail), and then restore the first system call's errno before returning to the caller. Other C library functions themselves set errno. "So, when would it be safe to call this new get_extended_error_info function and how to determine to which syscall it was relevant?"
Eric's opinion was that the mechanism should return information about the last kernel system call. "It would be really neat for libc to have a way to save and restore the extended errno information, maybe even supply its own if it made the choice in userspace, but that sounds really hard for the first pass."
However, there are problems with such a bare-bones approach. If the value returned by get_extended_error_info() corresponds to the last system call, rather than the errno value actually returned to user space, this risks confusing user-space applications (and users). Carlos O'Donell, who had earlier raised some of the same questions as Jakub and pointed out the need to properly handle the extended error information when a signal handler interrupts the main program, agreed with Casey's assessment that get_extended_error_info() should always return a value that corresponds to the current content of errno. That implies the need for a user-space function that can save and restore the extended error information.
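The kind of save-and-restore interface this implies might look something like the sketch below. It is only an illustration under loud assumptions: no such kernel or glibc interface exists, so a thread-local variable stands in for the per-thread detail the kernel would record, and all of the names (ext_err, error_state_save(), error_state_restore(), simulate_failure(), ext_err_get()) are invented for the example.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-thread detail the kernel would keep. */
struct ext_err {
    int subsystem;
    int detail;
};

static _Thread_local struct ext_err current_ext_err;

/* errno and its matching extended detail, saved as one unit. */
struct error_state {
    int saved_errno;
    struct ext_err saved_ext;
};

/* Simulates a failing call that records both errno and the detail. */
void simulate_failure(int err, int subsystem, int detail)
{
    current_ext_err.subsystem = subsystem;
    current_ext_err.detail = detail;
    errno = err;
}

/* Returns the detail that currently accompanies errno. */
struct ext_err ext_err_get(void)
{
    return current_ext_err;
}

void error_state_save(struct error_state *st)
{
    st->saved_errno = errno;
    st->saved_ext = current_ext_err;
}

void error_state_restore(const struct error_state *st)
{
    current_ext_err = st->saved_ext;
    errno = st->saved_errno;    /* restore errno last */
}
```

A library function (or signal handler) that needs to make further calls would bracket them with error_state_save() and error_state_restore(), so that the detail reported to the caller always matches the errno value the caller sees.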
Finally, David Gilbert suggested that it would be useful to broaden Eric's proposal to handle errors beyond EPERM and EACCES. "I've wasted way too much time trying to figure out why mmap (for example) has given me an EINVAL; there are just too many holes you can fall into."
In the last few days, discussion in the thread has gone quiet. However, it's clear that Dan and Eric have identified a very real and practical problem (and one that has been identified by others in the past). The solution would probably need to address the concerns raised in the discussion (most notably the need to have get_extended_error_info() always correspond to the current value of errno) and might possibly also be generalized beyond EPERM and EACCES. However, that should all be feasible, assuming someone takes on the (not insignificant) work of fleshing out the design and implementing it. If they do, the lives of system administrators and end users should become considerably easier when it comes to diagnosing the causes of software error reports.
Index entries for this article | |
---|---|
Kernel | User-space API/Error reporting |
Posted Jan 19, 2013 3:45 UTC (Sat)
by dlang (guest, #313)
[Link] (45 responses)
If errors are rare, this is easy (look for the error with grep, or just look at the logs and notice the error)
if errors are common you have a bigger problem (both in running your system, and in finding the error :-) but finding what error message is unusual in a pile of common error messages is a common problem when dealing with logs.
Posted Jan 19, 2013 10:16 UTC (Sat)
by gdt (subscriber, #6284)
[Link] (1 responses)
Posted Jan 19, 2013 10:35 UTC (Sat)
by dlang (guest, #313)
[Link]
routine, or are you listing this as an advantage? In any case it's far easier with logs than with the other options being listed.
> access permissions
configurable by the sysadmin
> information leakage
configurable by the sysadmin, just like all other information in the logs. This is even ignoring the sysctl to disable it that was mentioned.
> file formats
Yes, this is a wonderful advantage, the data can be put in whatever file format the sysadmin wants.
> high cost of error path allowing denial of service.
Only if you configure it to be a denial of service. Again, this is up to the sysadmin; some admins may want to run a system so locked down that if the log cannot be written they want the system to stop. Most admins won't want this, and this behavior is configurable in the logging daemons.
everything you mention is either a solved problem, or a strong advantage of having this information in logs rather than in some temporary memory structure that requires that applications be modified to gather the information (and in almost every case, that gathered information ends up in the logs from the application)
Logs already contain sensitive information. In fact, any substantial body of logs is going to contain user passwords, from the simple fact that it's valuable to track failed login attempts and _someone_ will get out of sync with the software and type their password in the userid field, followed pretty quickly afterwards by a successful login by that same user.
This is part of the reason that system logs (at least authentication related logs) need to have their access restricted to the admins. I don't see any reason that this extra information about denied access would be any different.
And I flatly reject the concept that the reason for denying access needs to be kept secret from the sysadmin who's running the box (who may need to grant the access)
Posted Jan 20, 2013 0:13 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link] (25 responses)
Posted Jan 20, 2013 0:30 UTC (Sun)
by dlang (guest, #313)
[Link] (24 responses)
If you were to get all the *BSDs to sign on, that would probably be enough.
Posted Jan 20, 2013 0:50 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (22 responses)
Posted Jan 20, 2013 0:54 UTC (Sun)
by ebiederm (subscriber, #35028)
[Link] (6 responses)
Posted Jan 20, 2013 1:03 UTC (Sun)
by dlang (guest, #313)
[Link] (5 responses)
Both groups benefit from being able to use the work that came before them, and both groups want to prevent others from benefiting from the work that they are doing (or are arrogant enough to believe that what they are doing is perfect and there will never be any need to build on what they are doing)
I fully expect people to take offence at this comparison, but after you calm down a bit, think about it and you will hopefully be a bit uncomfortable at how close the comparison matches.
Posted Jan 20, 2013 1:13 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link] (2 responses)
Nobody is obliged to let history chain them in place; if you think you can do better than POSIX, you have the right to try and it's sad that you'll have to endure abuse to do so. Everyone benefits from the work that came before them; *nix systems have been harvesting features from Windows and Apple for years, and designing systems that don't work on those OSes, with no apology.
Posted Jan 20, 2013 21:51 UTC (Sun)
by deater (subscriber, #11746)
[Link] (1 responses)
Of course it does! I've successfully run it under the simh emulator, enough to port my assembly-language version of linux_logo to it (http://www.deater.net/weave/vmwprod/asm/ll/ll.html).
Sadly it seems like development has died off at some point, and the top hit for a website doesn't have much info.
Posted Jan 21, 2013 14:59 UTC (Mon)
by jjs (guest, #10315)
[Link]
According to that (if I read it correctly), they have it running on 2.6.18 kernel, at least to the CLI.
Posted Jan 20, 2013 1:21 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link]
I'm lurking on a lot of mailing lists and too often I see responses that can be summed up as: "It's not POSIX! Burn the heretic!" Meanwhile, competitors who don't care about POSIX beyond the very basics eat up their marketshare.
Posted Jan 20, 2013 9:33 UTC (Sun)
by alankila (guest, #47141)
[Link]
Posted Jan 20, 2013 1:07 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link] (14 responses)
Posted Jan 20, 2013 1:09 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
Posted Jan 20, 2013 1:14 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link]
Posted Jan 20, 2013 1:21 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link] (1 responses)
I believe that file names should be valid strings of Unicode characters. But if you do that, there's going to be edge problems where POSIX programs can't access certain files, can't create certain files for reasons inexplicable to them, or the POSIX filename-native filename mapping is confusing. The question is going to be is it worth it?
Posted Jan 20, 2013 1:23 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jan 20, 2013 1:25 UTC (Sun)
by dlang (guest, #313)
[Link] (9 responses)
You act as if the POSIX (and Single Unix Specification) standard is something handed down from on high that hasn't changed in 20 years.
The last revision to POSIX and SUS took place within the last couple of years, and the next one will take place within the next few years.
These standards work by looking at the things that people are developing, and getting consensus between the different developers as to what they can agree on. They then have those developers go and implement what they are proposing, and it only gets into the standard after there are running implementations.
by definition this means that they encourage new, non-standard, things to be developed and deployed (they can't add something to the standard if it hasn't been deployed yet)
The problem isn't with the idea of enhancing things, it's with the idea that standards don't matter, nobody else matters, only develop for yourself and to #$% with everyone else.
Posted Jan 20, 2013 1:31 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link] (6 responses)
90% of companies and organizations fail quickly no matter what they do.
Posted Jan 21, 2013 23:13 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (5 responses)
http://msdn.microsoft.com/en-us/library/cc231199.aspx
So this kind of rant is offtopic in more ways than one...
Posted Jan 22, 2013 17:51 UTC (Tue)
by ssmith32 (subscriber, #72404)
[Link] (1 responses)
ERROR_SUCCESS, of course :)
-stu
Posted Jan 23, 2013 0:36 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
PS: I thought Windows 7 got rid of the "Start" name but I just found it is still showing as a tooltip.
Posted Jan 24, 2013 11:19 UTC (Thu)
by sorokin (guest, #88478)
[Link] (2 responses)
Posted Jan 30, 2013 5:53 UTC (Wed)
by cmccabe (guest, #60281)
[Link] (1 responses)
Posted Jan 30, 2013 7:33 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jan 20, 2013 1:36 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
> You act as if the POSIX (and Single Unix Specification) standard is something handed down from on high that hasn't changed in 20 years.
For example, another pet peeve of mine: signals are useless for library writers because there's no mechanism to allocate/reserve them or to pass parameters to a signal handler.
> The last revision to POSIX and SUS took place within the last couple of years, and the next one will take place within the next few years.
> The problem isn't with the idea of enhancing things, it's with the idea that standards don't matter, nobody else matters, only develop for yourself and to #$% with everyone else.
Posted Jan 20, 2013 14:17 UTC (Sun)
by RobSeace (subscriber, #4435)
[Link]
You may wish to look into sigaction(SA_SIGINFO) and sigqueue() used with POSIX.1b real-time signals... That at least solves your second issue... As for your first, I'd think just using sigaction() to peek at the current handler would tell you if a signal is currently already in use or not...
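For readers unfamiliar with those interfaces, here is a minimal sketch of passing an integer to a handler with sigqueue() and sigaction(SA_SIGINFO). The function name send_value_to_self() and the choice of SIGRTMIN are just for illustration; a real library would still have to coordinate which signal number it uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t received_value;
static volatile sig_atomic_t got_signal;

/* SA_SIGINFO handlers receive a siginfo_t carrying the queued value. */
static void on_rt_signal(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    received_value = info->si_value.sival_int;
    got_signal = 1;
}

/*
 * Queues `value` to the calling process on SIGRTMIN and returns the
 * value the handler saw, or -1 if something failed.
 */
int send_value_to_self(int value)
{
    struct sigaction sa, old;
    union sigval sv;
    int i;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_rt_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, &old) == -1)
        return -1;

    got_signal = 0;
    sv.sival_int = value;
    if (sigqueue(getpid(), SIGRTMIN, sv) == -1) {
        sigaction(SIGRTMIN, &old, NULL);
        return -1;
    }

    /* Normally already delivered; wait briefly in case it was not. */
    for (i = 0; i < 1000 && !got_signal; i++) {
        struct timespec ts = { 0, 1000000 };    /* 1 ms */
        nanosleep(&ts, NULL);
    }

    sigaction(SIGRTMIN, &old, NULL);
    return got_signal ? (int)received_value : -1;
}
```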
Posted Jan 20, 2013 1:00 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link]
With all due respect to *BSD maintainers, I don't see it. Lots of stuff uses udev, despite it being Linux-only. If you want to provide decent error messages to the majority of your users, you'll support it; if you don't care, then you won't.
Posted Jan 20, 2013 9:58 UTC (Sun)
by epa (subscriber, #39769)
[Link] (10 responses)
Many of the original UNIX designers clearly did not share the extreme conservatism of some of their followers. Plan 9 introduced errstr, an error string set by system calls and maintained in parallel with the old errno. That seems like a simple and elegant solution which Linux could also adopt.
Posted Jan 20, 2013 21:52 UTC (Sun)
by dlang (guest, #313)
[Link] (1 responses)
it lets the admin of the box see all the access that was denied. This can frequently identify 'bad actors' (unless they know the system intimately, they will have to poke around a bit before they find the hole they can get through)
And if you have a lot of permission denied errors, you would want to fix the software that's generating them to do something different.
all of this without any need to tie it in to a specific return code.
It happens to also give you a way to get more detail on the specific error (when you can tie the error to a specific time), and it nicely addresses the fact that you may not want the user to know all the details of why the permission was denied, but you do want to let the admin know.
Posted Jan 21, 2013 5:53 UTC (Mon)
by epa (subscriber, #39769)
[Link]
Returning meaningful error indicators to userspace does not preclude writing to a log file as well. In some cases, yes, security requires giving a terse 'permission denied' error with no further details. That situation is not the norm.
Posted Jan 20, 2013 23:09 UTC (Sun)
by skissane (subscriber, #38675)
[Link] (6 responses)
Posted Jan 21, 2013 0:10 UTC (Mon)
by ebiederm (subscriber, #35028)
[Link] (2 responses)
Posted Jan 25, 2013 4:17 UTC (Fri)
by skissane (subscriber, #38675)
[Link] (1 responses)
And, while this approach is popular with e.g. the gettext API, I see a couple of drawbacks:
Posted Jan 25, 2013 4:25 UTC (Fri)
by dlang (guest, #313)
[Link]
Posted Jan 21, 2013 10:44 UTC (Mon)
by micka (subscriber, #38720)
[Link] (2 responses)
I don't think English-speaking people can really see what the problem is here: when the error message is internationalized by default, you can't "google" it; the search will only return a handful of results, all of them from someone asking about the same problem you have, with no answers.
When you got this sort of error message and you must perform a search, you know you must reproduce the problem with i18n disabled. Sometimes it's as simple as
LANG=C <myprogram> <myparams>
but sometimes (I mostly have the problem when on windows, I don't know this system much and have no clue how i18n works there), that doesn't work.
The worst I have seen is the Oracle database server (really bad software anyway) giving you i18n'ed error messages; you can't change the language on the client, you must do it on the server (if you are allowed to do so)!
Posted Jan 21, 2013 11:06 UTC (Mon)
by epa (subscriber, #39769)
[Link]
Any translated or 'friendly' message should be accompanied by a 'more details' button which gives the original string you can Google for.
Posted Jan 25, 2013 4:07 UTC (Fri)
by skissane (subscriber, #38675)
[Link]
Your comment "you can't change the language on the client, you must do it on the server (if you are allowed to do so)" doesn't appear to be true:
SQL> select * from fred;
SQL> alter session set nls_language = german;
Session altered.
SQL> select * from fred;
(Disclosure: I work for Oracle; these are my personal opinions, not my employer's.)
Posted Feb 2, 2013 18:06 UTC (Sat)
by quanstro (guest, #77996)
[Link]
Posted Jan 21, 2013 8:40 UTC (Mon)
by jorgegv (subscriber, #60484)
[Link] (5 responses)
If all of those error-returning calls each get a line or two sent to syslog with the reasons for the denial, you'll fill your log partition pretty quickly, apart from the huge workload the syslog process would have.
The policy ('I want this error logged' or 'I don't want this error logged') belongs in userspace, not kernel.
Posted Jan 21, 2013 9:07 UTC (Mon)
by dlang (guest, #313)
[Link] (4 responses)
that said, if the logs are not written sanely, filtering them can be expensive, but given that we are talking about adding the logging now, we should be able to make this be something easy to filter.
Posted Jan 21, 2013 10:16 UTC (Mon)
by meuh (guest, #22042)
[Link] (3 responses)
Sadly, I'm not really convinced myself: even if there's only one POSIX function in the "critical" section, this call could be translated in multiple library function and syscall calls. And this is going to create some annoyance when reading logs.
Posted Jan 21, 2013 10:26 UTC (Mon)
by meuh (guest, #22042)
[Link] (2 responses)
Posted Jan 21, 2013 23:19 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (1 responses)
We need ErrnoKit, a DBUS-enabled daemon that sends XML messages to a Mono process, which logs them in a custom binary format to the GNOME3 registry. Then they can be retrieved by the client application through SOAP requests to a CORBA object broker.
Posted Jan 22, 2013 9:17 UTC (Tue)
by niner (subscriber, #26151)
[Link]
Posted Jan 19, 2013 5:03 UTC (Sat)
by luto (subscriber, #39314)
[Link]
[1] http://en.wikipedia.org/wiki/Padding_oracle_attack
Posted Jan 19, 2013 5:24 UTC (Sat)
by josh (subscriber, #17465)
[Link] (5 responses)
Posted Jan 19, 2013 11:23 UTC (Sat)
by khim (subscriber, #9252)
[Link]
Posted Jan 20, 2013 8:17 UTC (Sun)
by geofft (subscriber, #59789)
[Link] (3 responses)
Posted Jan 21, 2013 9:36 UTC (Mon)
by pabs (subscriber, #43278)
[Link] (2 responses)
Posted Jan 21, 2013 9:49 UTC (Mon)
by rahulsundaram (subscriber, #21946)
[Link]
Posted Jan 21, 2013 10:49 UTC (Mon)
by micka (subscriber, #38720)
[Link]
Posted Jan 19, 2013 10:39 UTC (Sat)
by bokr (subscriber, #58369)
[Link] (1 responses)
Posted Jan 21, 2013 19:21 UTC (Mon)
by dave_malcolm (subscriber, #15013)
[Link]
Posted Jan 19, 2013 16:18 UTC (Sat)
by apoelstra (subscriber, #75205)
[Link]
People arguing about the security implications here should be arguing about how to secure their logs, not how to sanitize them.
Posted Jan 19, 2013 20:05 UTC (Sat)
by dkg (subscriber, #55359)
[Link] (1 responses)
I'm grateful to see the additional error reporting (i do think that obscure errors limit the usability of our systems) but there are some tricky tradeoffs that need to be balanced to do it right.
Posted Jan 20, 2013 0:23 UTC (Sun)
by dvdeug (subscriber, #10998)
[Link]
Posted Jan 20, 2013 6:17 UTC (Sun)
by wahern (subscriber, #37304)
[Link] (3 responses)
Yes, it's tedious to deal with errors early, but everything is tedious in C. It's the nature of the language. C isn't a RAD environment.
Let's not pretend that this proposal is a better errno. It's a work-around for broken software. It's equivalent to wrapping every error in an exception, and pretending that exceptions fix the tedium, instead of what they usually do--kick the bucket down the road.
Now, that doesn't make it a bad proposal, per se, just not what it's advertised as.
As for dealing with errno munging, the simplest answer is to not use errno. Capture the errno value immediately after a system call fails. And stop writing library APIs which write through errno; instead, return a friggin' int directly. Why people use kernel error reporting semantics as a prototype, I'll never understand. When I see application routines which return -1 to signal an error, I want to tear my hair out.
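A minimal sketch of the style being advocated here, assuming a made-up wrapper named open_readonly(): the errno value is captured immediately after the failing call and handed back as the function's return value, with the descriptor delivered through an output parameter.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Opens `path` read-only. Returns 0 and stores the descriptor in
 * *fd_out on success; returns the errno value on failure.
 */
int open_readonly(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return errno;   /* captured before anything else can change it */

    *fd_out = fd;
    return 0;
}
```

Callers then test the return value directly and never need to read the global errno, so later calls (or signal handlers) cannot disturb the result.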
Posted Jan 20, 2013 6:34 UTC (Sun)
by dlang (guest, #313)
[Link]
The problem that's being addressed here is that a well written application that is going to tell the user what went wrong only knows that permission was denied.
The application cannot provide any more information to the user, because it doesn't have the information.
And there is no sane way for the admin/support person to figure out _why_ the permission was denied.
In the 'old days', this was fairly simple, there was only one place to check (the rwx permissions).
However today, it's much harder.
When you don't give admins sane ways to figure out the cause of the permission problems within the context of the more complex security model, the result is going to be that admins disable the more complex security model, a secure system that doesn't get the job done is worthless.
Posted Jan 20, 2013 9:40 UTC (Sun)
by alankila (guest, #47141)
[Link] (1 responses)
If only C, or the userspace-kernel API could have something like that...
Posted Jan 21, 2013 8:40 UTC (Mon)
by epa (subscriber, #39769)
[Link]
Posted Jan 20, 2013 8:29 UTC (Sun)
by akeane (guest, #85436)
[Link] (1 responses)
The only jmp codes that should be popped off the stack should be EFAIL and SSUCCESS, computers are binary after all, if my processor doesn't need more than two states to let me play Doom, then I fail to see why the arrogant C library and POSIX standard should need more. It's complexity for its own sake...
Maybe as a compromise, some kind soul should hack the C library so it associates an "E" number with a NULL terminated series of bytes which can then be written to a terminal or dot matrix printer.
Posted Jan 25, 2013 19:36 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Posted Jan 20, 2013 14:08 UTC (Sun)
by justincormack (subscriber, #70439)
[Link] (4 responses)
A high bit mask with more detailed information would suffice, so you get the traditional error code in the low 8 bits, then information about the error location in the kernel in other bits. The kernel could export a map of more detailed information, so you could match up (and document) the reasons.
Obviously this is a breaking change, so your binary might have to set some flag to get the extended bits from the kernel.
Posted Jan 20, 2013 15:44 UTC (Sun)
by andreasb (guest, #80258)
[Link] (1 responses)
Posted Jan 20, 2013 18:59 UTC (Sun)
by akeane (guest, #85436)
[Link]
_open is the normal one
So, you need a set of extra syscalls in the kernel to add more info to the ret value, (luckily this will add even more lines of code and complexity to the kernel, what could go wrong? yay!)
and a switch in the C lib:
cc -o my_earthly_soul p_audio.cs -DOMGMOREERRNOSSUCKA
But this is assuming that anybody actually goes around checking error codes in this modern era; no one really bothers anyway; if it's a real problem and not just the kernel nagging at you then something else will break properly later on and you get a nice SEGV which you can blame on a third party device driver.
It also becomes increasingly difficult to add additional lines of error checking code when you reach a certain age, and your monocle has seen better days (also you waste valuable bytes on your winchester disk)
I stand by my assertion that only two ERR codes are necessary in your typical unix warez:
fd = open("~/Music/a-dreadful-din.mp1");
#ifdef YOUNG_PERSON
if(fd == E:-( ))
#ifdef MOI
if(fd == EGETOFFMYLAWN)
#endif
Posted Jan 20, 2013 19:03 UTC (Sun)
by jreiser (subscriber, #11027)
[Link] (1 responses)
Posted Jan 20, 2013 20:02 UTC (Sun)
by justincormack (subscriber, #70439)
[Link]
Posted Jan 20, 2013 23:50 UTC (Sun)
by imunsie (guest, #68550)
[Link] (2 responses)
http://libexplain.sourceforge.net/
Posted Jan 21, 2013 0:15 UTC (Mon)
by ebiederm (subscriber, #35028)
[Link]
Posted Jan 21, 2013 21:59 UTC (Mon)
by PaulWay (subscriber, #45600)
[Link]
Paul
Posted Jan 21, 2013 2:18 UTC (Mon)
by PaulWay (subscriber, #45600)
[Link]
I don't think this is the basis of any in-Kernel expanded messaging, but I do think that the knowledge that Peter has picked up and put in libexplain of why things go wrong and what various codes mean is a useful reference when trying to build a system that improves on the current error reporting.
Hope this helps,
Paul
Posted Jan 21, 2013 10:17 UTC (Mon)
by etienne (guest, #25256)
[Link] (6 responses)
Posted Jan 21, 2013 10:24 UTC (Mon)
by mpr22 (subscriber, #60784)
[Link] (5 responses)
Posted Jan 21, 2013 11:45 UTC (Mon)
by johill (subscriber, #25196)
[Link] (4 responses)
Posted Jan 21, 2013 18:47 UTC (Mon)
by dtlin (subscriber, #36537)
[Link] (3 responses)
Posted Jan 21, 2013 18:56 UTC (Mon)
by apoelstra (subscriber, #75205)
[Link] (2 responses)
Posted Jan 22, 2013 7:22 UTC (Tue)
by itvirta (guest, #49997)
[Link] (1 responses)
Btw, this is the first I've heard of errno_t, apparently on the systems I checked, errno is defined as just (extern) int errno. Where does errno_t come from?
Posted Jan 22, 2013 14:00 UTC (Tue)
by etienne (guest, #25256)
[Link]
I do not remember where I have seen it first, probably someone defined it locally when going from 32 bits int to 64 bits int, but a bit of internet search leads to:
Note that increasing the size of the memory referenced by errno is fully backward compatible with already compiled software. My comment was a bit early on Monday morning; to activate a "bigger" errno you would need to define something like '__STDC_WANT_BIG_ERRNO__' before including "errno.h", and check a 32-bit signature just after the old standard code in case you are running on a libc which does not provide the big errno.
Posted Jan 22, 2013 0:00 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted Jan 23, 2013 9:37 UTC (Wed)
by ymmv (subscriber, #4375)
[Link]
Never had a detailed answer when I pretty please asked for a flame thrower.
Posted Jan 22, 2013 11:51 UTC (Tue)
by vonbrand (guest, #4458)
[Link] (2 responses)
If errno is an integer, why not just use the full range? I.e., low 8 bits contain "traditional" errno, high 24 bits (uint32_t should be plenty... famous last words) contain details for whoever is groping for them. Frob perror(3) to use the full range on some feature macro, i.e., LINUX_VERSION >= 0x030900. Or am I missing something critical here?
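A minimal sketch of the bit-packing idea above, in C. Everything named xerr_* or XERR_* is hypothetical and not part of any existing kernel or libc interface; the sketch also assumes that the classic errno values fit in 8 bits, which holds for x86 Linux but not for every architecture.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout: low 8 bits = classic errno, high 24 bits = detail code. */
#define XERR_ERRNO_BITS 8u
#define XERR_ERRNO_MASK 0xFFu
#define XERR_DETAIL_MAX 0xFFFFFFu

/* Combine a classic errno value with a 24-bit detail code. */
static inline uint32_t xerr_pack(int classic, uint32_t detail)
{
    assert(classic >= 0 && (uint32_t)classic <= XERR_ERRNO_MASK);
    assert(detail <= XERR_DETAIL_MAX);
    return (detail << XERR_ERRNO_BITS) | (uint32_t)classic;
}

/* Recover the classic errno value from a packed value. */
static inline int xerr_classic(uint32_t packed)
{
    return (int)(packed & XERR_ERRNO_MASK);
}

/* Recover the detail code from a packed value. */
static inline uint32_t xerr_detail(uint32_t packed)
{
    return packed >> XERR_ERRNO_BITS;
}
```

Note that xerr_pack(EPERM, 42) is no longer equal to EPERM, so existing code that compares errno directly against a constant would stop matching as soon as a detail code is present; such code would have to go through something like xerr_classic() first.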
Posted Jan 22, 2013 12:00 UTC (Tue)
by andresfreund (subscriber, #69562)
[Link]
Posted Jan 23, 2013 1:57 UTC (Wed)
by gdt (subscriber, #6284)
[Link]
What an application wants to know is: what happened, what should I tell the user, and what should I do next?
Take what to do next: Does it matter? Should you loop around and retry? Should you back up to the file selection interface and retry? Should you terminate cleanly (i.e., telling the user WTF just happened, at the cost of more resources)? Is it so severe you should throw up your hands and exit uncleanly to give the system the best chance to bounce back?
If you go and add a bazillion more integer values then the "what to do next" problem becomes a bazillion times harder. One thing which a redesign should do is to stop the overloading of errno with meaning.
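To make the "what to do next" question concrete, here is a rough sketch in C of the kind of table an application ends up writing today. The enum, the function name, and the particular mapping are all assumptions made for illustration; a real program would choose differently.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical set of follow-up actions, mirroring the questions above. */
enum next_action {
    ACT_RETRY,       /* loop around and retry */
    ACT_RESELECT,    /* back up to the file selection interface */
    ACT_EXIT_CLEAN,  /* tell the user what happened, then terminate */
    ACT_ABORT        /* exit uncleanly and let the system recover */
};

/* Map a classic errno value to a follow-up action (illustrative mapping only). */
static enum next_action classify_errno(int err)
{
    assert(err >= 0);   /* errno values are never negative */
    switch (err) {
    case EINTR:
    case EAGAIN:
        return ACT_RETRY;
    case ENOENT:
    case EACCES:
    case EPERM:
        return ACT_RESELECT;
    case ENOMEM:
        return ACT_ABORT;
    default:
        return ACT_EXIT_CLEAN;
    }
}
```

Every new integer value either needs its own case here or silently falls into the default branch, which is the scaling problem described above.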
Posted Jan 22, 2013 20:33 UTC (Tue)
by scripter (subscriber, #2654)
[Link] (4 responses)
For example, C++ streams haven't provided programmers with a standardized way to get at errno/strerror. There's a proposal to fix it:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n...
For g++, there's a non-standard workaround: use C calls and convert the file descriptor into a C++ stream using __gnu_cxx::stdio_filebuf<char>:
http://stackoverflow.com/questions/2746168
It's also nice to get good error feedback in higher-level languages like Java.
I get frustrated when applications swallow error messages and provide high-level "Something went wrong" messages because it makes it difficult to find and fix the root problem. I suppose that's why tools like strace and ltrace are so useful.
Posted Feb 2, 2013 12:43 UTC (Sat)
by MrWim (subscriber, #47432)
[Link] (3 responses)
This is also a frustration of mine. Most exception propagation schemes make only two options easy: It is far too difficult to provide an error message like "Loading simulation failed because cell B74 of sim.csv contains 'abc' when it should contain a number". It would be nice if it were possible to attach more and more context to an exception as it propagates up the stack. Java has exception.getCause() and C++'s boost::exception has the ability to attach more data to an exception. I prefer boost's approach but it still sucks as: I don't know if other languages have solved this in a nicer way.
Posted Feb 2, 2013 23:22 UTC (Sat)
by etienne (guest, #25256)
[Link] (1 responses)
Posted Feb 3, 2013 0:37 UTC (Sun)
by nix (subscriber, #2304)
[Link]
Doing this gets even more painful when you consider localization. Even GNU gettext, with its support for %s-style elements whose order depends on language, would have trouble here, I fear.
Posted Feb 3, 2013 19:16 UTC (Sun)
by MrWim (subscriber, #47432)
[Link]
log why the permission is denied
http://vax-linux.org/
Mailing list still active as of Dec 2012
Don't let us stop you.
No. Since introduction of COM it uses IErrorInfo in addition to HRESULT.
And with even more companies that decided to "stick to standards" and stop innovating (e.g. basically all commercial UNIX vendors).
Yep. Not much has changed in important areas, changes are mostly cosmetic (and yes, we've actually paid for copies of official POSIX standards).
Will it include cgroups, namespaces, kqueue? No?
And yet, the recent history shows us that this very attitude works. Most "community projects" end up dead after extensive bike-shedding flamewars.
> writers because there's no mechanism to allocate/reserve them or to pass
> parameters to a signal handler.
Returning an error string has a problem - it doesn't internationalize well. I think it is better to define a catalog of error numbers; each error number has attached to it the number and types of allowed parameters and the English text. Additional files can contain translations to other languages. The kernel then just makes available to user space a buffer containing the error code and its parameters - it is up to user space to do the message formatting. You'd need to make sure user space is using the same message catalog as the kernel - but that should not be too hard.
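A small C sketch of the catalog idea, under stated assumptions: the kernel hands over only an error code and its raw parameter, and user space picks the format string for the wanted language. The struct names, the error numbers, the catalog contents, and the German strings are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record handed to user space: a code plus one raw parameter. */
struct err_record {
    int code;
    const char *arg;
};

/* Hypothetical catalog entry: one format string per supported language. */
struct catalog_entry {
    int code;
    const char *en;
    const char *de;
};

static const struct catalog_entry catalog[] = {
    { 1, "File %s not found",        "Datei %s nicht gefunden" },
    { 2, "Permission denied for %s", "Zugriff auf %s verweigert" },
};

/* Format rec into buf using the catalog for lang ("de", else English). */
static int format_error(const struct err_record *rec, const char *lang,
                        char *buf, size_t len)
{
    assert(rec != NULL && lang != NULL && buf != NULL);
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++) {
        if (catalog[i].code != rec->code)
            continue;
        const char *fmt = strcmp(lang, "de") == 0 ? catalog[i].de
                                                  : catalog[i].en;
        return snprintf(buf, len, fmt, rec->arg);
    }
    /* Code missing from this catalog: the version-mismatch case. */
    return snprintf(buf, len, "Unknown error %d", rec->code);
}
```

The fallback branch is where a mismatch between the kernel's catalog and the user-space one would show up, so keeping the two in sync is the part that needs care.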
Well, you can't just use the English text as-is. You actually have to use the English text with substitution variables, e.g. "File %s not found", and then pass the substitution variables separately - so passing a single string variable from user-space doesn't work very well, you'd need to do something like pass a structure containing the format string and the arguments to go with it...
select * from fred
*
ERROR at line 1:
ORA-00942: table or view does not exist
select * from fred
*
ERROR at line 1:
ORA-00942: Tabelle oder View nicht vorhanden
errno in plan 9; they are not maintained in parallel.
Then have the kernel export, as part of the VDSO, an error decoder library.
But to be expandable, a user space library might be better.
It's only mystifying the very first time: after you have spent a day or two trying to understand what's wrong with your file (redownloading it, unpacking it again, trying to give it 777 permissions, etc.) and finally ask on a mailing list and get the answer "oh, well, it's obvious: you need the 32-bit compatibility subsystem; the file which is not found is actually /lib/ld-linux.so.2, not the file which you are trying to run"... it's frustrating enough that you remember that fiasco for a looong time.
I hadn't thought of that when I wrote the proposal, but I like that idea, and I've added it to the "Scope" section of the wiki page.
Thanks!
This is definitely potentially a two-edged sword. It's worth noting CVE-2013-0157 (aka debian bug 697464) is a recent and simple example of a way in which more-detailed error reporting causes a data leak that might not be acceptable on some systems.
$ mount --guess-fstype /root/.ssh/../../dev/sda1
Even the error-reporting problem only looks like an error-reporting one, because
$ mount /root/.ssh/../../dev/cdrom
mounting the cdrom confirms the existence of /root/.ssh as much as an error message would.
Making EPERM (un)friendlier
__open_ret_32 is the OMG MORE ERROR CODEZ!!!
if(fd == E:-) )
return cool;
return opens_gonna_hate;
#endif
if(fd == EAKEANE) /* Clearly a measure of success */
{
/* Remove rubbish modern so-called "music" */
unlink("~/Music");
/* Check for errors from unlink? nah... */
return heh!;
}
unlink("~"); /* There's probably some bad music there somewhere */
A solved problem
Right now errno is the address of an errno_t area of memory.
Why not enlarge that area a bit so that it can also hold:
- a signature confirming the extended errno_t
- the size of this errno_t
- what service created the error
- a better description of the error
- a serial number?
It would be fully backward compatible.
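As a rough illustration of the fields listed above, the extended area might look something like the following C sketch. The struct layout, the magic value, the field widths, and the function names are all assumptions for the sake of the example, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary, hypothetical signature marking an extended record. */
#define BIG_ERRNO_MAGIC 0x45525258u

/* Hypothetical extended errno area; the classic int value comes first. */
struct big_errno {
    int      code;             /* classic errno value */
    uint32_t signature;        /* confirms the extended layout */
    uint32_t size;             /* size of this record in bytes */
    uint32_t service;          /* which service created the error */
    uint32_t serial;           /* serial number of the error */
    char     description[64];  /* better description of the error */
};

/* Return nonzero only if the record carries a usable extended part. */
static int big_errno_valid(const struct big_errno *e)
{
    assert(e != NULL);
    return e->signature == BIG_ERRNO_MAGIC && e->size >= sizeof *e;
}

/* Return the description, or NULL when only the classic code is present. */
static const char *big_errno_describe(const struct big_errno *e)
{
    return big_errno_valid(e) ? e->description : NULL;
}
```

A consumer would fall back to the classic code whenever big_errno_describe() returns NULL, which is the case the signature field exists to detect.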
Congratulations, you just broke switch (errno) { /* ... */ }.
Not really. As far as I understand he's basically saying
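/* Assumes <errno.h> has not already defined errno as a macro. */
struct ext_err_no {
    int /* or whatever */ errno;
    /* ... extended info ... */
};
struct ext_err_no errno_storage;
/* errno must expand to a modifiable lvalue, not to an address */
#define errno (errno_storage.errno)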
(I'd guess this could be made to work and still be compliant)
The question of course is how to determine that the extended info is there?
I'm pretty sure etienne was not suggesting that, because if you have been reading the article, you know this breaks existing programs.
do_something_that_might_fail();
{   /* maybe inside an interrupt, logging routine,
     * or anything else that happens between where
     * the error occurs and its consumer */
    errno_t saved_errno = errno;
    do_something_else();    /* might change errno */
    errno = saved_errno;    /* so put errno back */
}
Now you've just clobbered a single field in errno_storage, which might have been saving information from a different call. If you make the variable errno encompass the entire extended storage space, then that code is fine, but then you can't treat it as an int, which is certainly used in many places too.
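The clobbering can be shown with a small self-contained C sketch. The names ext_err, ext_storage, xerrno, fail_with, and log_something are hypothetical stand-ins (xerrno plays the role of errno so that the example does not collide with the real macro).

```c
#include <assert.h>

/* Hypothetical extended storage: classic code plus one extra detail field. */
struct ext_err {
    int code;
    int detail;
};

static struct ext_err ext_storage;

/* Stand-in for errno: exposes only the classic int field. */
#define xerrno (ext_storage.code)

/* Simulate a failing call that fills in both fields. */
static void fail_with(int code, int detail)
{
    assert(code > 0);
    ext_storage.code = code;
    ext_storage.detail = detail;
}

/* Classic save/restore idiom: it only knows about the int. */
static void log_something(void)
{
    int saved = xerrno;
    fail_with(2, 77);   /* the logging path hits its own error */
    xerrno = saved;     /* restores code, but detail stays at 77 */
}
```

After fail_with(1, 10) followed by log_something(), the storage holds code 1 with detail 77: the code from the first failure paired with the detail from the second.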
In the world of Standard C, the type 'errno_t' is defined by TR24731-1 (see http://stackoverflow.com/questions/372980/ for more information) and you have to 'activate it' by defining '__STDC_WANT_LIB_EXT1__'.
Making EPERM friendlier, even for C++ programs