Package: x2goserver Version: 4.0.1.20
At present, x2goserver sanitises usernames with a regexp in x2goutils.pm and in x2gosqlitewrapper.pl (same in both places). That's:
if ($string =~
/^([a-zA-Z\_][a-zA-Z0-9\_\-\.\@]{0,47}[\$]?)\-([\d]{2,4})\-([\d]{9,12})\_[a-zA-Z0-9\_\-\.]*\_dp[\d]{1,2}$/) {
A username of, eg, '1234567x' fails this test, and the x2go session fails to start. This is a valid username on CentOS (which is the OS I'm using in this case, but CentOS is far from unique here), therefore such a username should _not_ be rejected.
I have verified that the above code is indeed the source of my login problems, since if I hack the two files above, to have the regexp start with [a-zA-Z0-9\_], then my users can log in without difficulty. This hacking is obviously not a great solution.
An alternative test would be to use getpwent(3). This would verify that the proffered username is valid, absolutely authoritatively, without making any assumptions about what is or isn't valid on the current platform. x2go should not second-guess getpwent(3).
In Perl terms, the above test could be replace with:
$uid = getpwnam($string);
if ($uid) {
# username is OK
} else {
# user $string does not exist
}
Note that the test may in fact be redundant, since if this code is being run, then the corresponding user has already logged on to the system, so that the username has already been verified as valid and existing.
Other observations:
If the system (or specifically getpwent) regards a given username
as valid, then it will be valid for calls to other local library calls.
If this were not the case, that would be a major system bug on that
platform.
POSIX/Single Unix says of the username simply "To be portable across systems conforming to POSIX.1-2008, the value is composed of characters from the portable filename character set. The <hyphen-minus> character should not be used as the first character of a portable user name." (see <http://pubs.opengroup.org/onlinepubs/9699919799/>, paragraph 3.437)
The Debian useradd(8) page recommends something matching /^[a-z_][a-z0-9_-]*$/, but goes on to say "On Debian, the only constraints are that usernames must neither start with a dash ('-') nor contain a colon (':') or a whitespace (space: ' ', end of line: '\n', tabulation: '\t', etc.). Note that using a slash ('/') may break the default algorithm for the definition of the user's home directory." (see eg <https://www.unix.com/man-page/linux/8/useradd/>)
The corresponding RedHat/CentOS manpage doesn't even include that,
and instead says only "Usernames may only be up to 32 characters long."
FreeBSD is similarly laid-back about the username.
The GNU Coreutils manual <https://www.gnu.org/software/coreutils/manual/coreutils.html#Disambiguating-names-and-IDs> which explicitly acknowledges that an all-digits username is legitimate, and describes how the coreutils resolve the potential ambiguity.
It may have been true in the past that some unixes objected to all-digits usernames. I personally am not aware of any current unixes which do so.
It is not an option to change the usernames ('1234567x') on this system, since they are widely deployed in other systems. Also, they're valid username as far as the local system is concerned.
This issue was discussed on the user list a little while ago <http://lists.x2go.org/pipermail/x2go-user/2015-April/003161.html> (that's what gave me the aha!). There, Mihai Moldovan said "That's non-standard-compliant and you're basically on your own when doing "funky stuff"." To be clear, I agree such usernames are less than ideal, but I don't think they count as funky or non-compliant.
The issue was discussed on the x2go-dev list more recently, starting at <http://lists.x2go.org/pipermail/x2go-dev/2017-October/012140.html>
Best wishes,
Norman
-- Norman Gray : https://nxg.me.uk
On 10/27/2017 06:51 PM, Norman Gray wrote:
At present, x2goserver sanitises usernames with a regexp in x2goutils.pm and in x2gosqlitewrapper.pl (same in both places). [...]
Just to make it clear - we're not really "validating user names". I couldn't care less about the user name as such - it's the system's responsibility to deal with user names and if users managed to login, we can assume the user name is valid, like you've already written on the -dev mailing list and here.
What we really do in this part of code is validating a session ID, which happens to contain a user name. Sadly, as such, what we see as a user name must be correctly represented in order to check the session ID.
Generally, a session ID should look like that: <username>-<DISPLAY number>-<UNIX timestamp denoting session creation time>_st<string representation of session type>_dp<DISPLAY bit depth>
We want to make sure that this ID is valid and, further, that specific parts can be extracted from it.
A username of, eg, '1234567x' fails this test, and the x2go session fails to start. This is a valid username on CentOS (which is the OS I'm using in this case, but CentOS is far from unique here), therefore such a username should _not_ be rejected.
Okay, so we want to accept numeric and user names starting with digits.
An alternative test would be to use getpwent(3). This would verify that the proffered username is valid, absolutely authoritatively, without making any assumptions about what is or isn't valid on the current platform. x2go should not second-guess getpwent(3).
This is probably appropriate for other checks, yes, if we really need to make sure that a user name is valid on the system.
Note that the test may in fact be redundant, since if this code is being run, then the corresponding user has already logged on to the system, so that the username has already been verified as valid and existing.
In theory it's redundant. But there is a possibility that we are reading garbage data, where ever that might come from. Any bug (including our scripts messing up splitting up something, or inserting something invalid into the database and reading it again later) could trigger such a situation, so IMHO validation of input strings is really not redundant.
- POSIX/Single Unix says of the username simply "To be portable across systems conforming to POSIX.1-2008, the value is composed of characters from the portable filename character set. The <hyphen-minus> character should not be used as the first character of a portable user name." (see <http://pubs.opengroup.org/onlinepubs/9699919799/>, paragraph 3.437)
So, hyphen is prohibited as the first character. Also, SUS recommends (but does not enforce) using the portable filename character set[0] only for portability, which is restricted to [A-Za-z0-9._-]. Specifically, this does not include any special characters like umlauts, accented characters or generally any other Unicode character.
- The Debian useradd(8) page recommends something matching /^[a-z_][a-z0-9_-]*$/, but goes on to say "On Debian, the only constraints are that usernames must neither start with a dash ('-') nor contain a colon (':') or a whitespace (space: ' ', end of line: '\n', tabulation: '\t', etc.). Note that using a slash ('/') may break the default algorithm for the definition of the user's home directory." (see eg <https://www.unix.com/man-page/linux/8/useradd/>)
This is a bit stricter than the SUS definition (ignoring the portability recommendation). If taken at face value, Debian allows any Unicode character but the restricted ones. Interestingly, the recommended matching regexp doesn't include uppercase characters and, arguably more interestingly, doesn't allow a user name to start with a digit (which would be problematic for you).
- The corresponding RedHat/CentOS manpage doesn't even include that, and instead says only "Usernames may only be up to 32 characters long."
FreeBSD is similarly laid-back about the username.
This is interesting as well, since it's the first document that mentions a maximum length. If interpreted directly, the previous documents did not restrict the length (unless you haven't pasted some information relating to the string length).
Currently, we seem to limit the user name to a length of at least one to 48 characters. Not sure where this limit is coming from, but it's higher than what RHEL/CentOS documents. Specifically, if the 32 characters limit is a hard one, parsing longer user names than this successfully would be invalid. Then again, I'm not inclined to implement per-system limits, so I'd rather go with the most restricted set...
In the current code, for some reason we are also allowing the characters $ and @, if they are not the first characters. I'm not sure why that is, but allowing @ might be a consequence of multi-server support (i.e., that @ acts a separator for the host name.) Also, a trailing character of $ is allowed, which ramps up the section to 49 characters in total.
Going through the history, I can see that previously, a 32 (or 33) character limit was enforced, but changed to 48 (or 49) as part of this bug report: https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=574
The @ character indeed has been added to allow email-address-like user names as part of https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=573
Allowing $ as trailing characters has been part from the start, though I honestly don't understand why.
So, with all that, the most unrestrained option would be to use /^[^-].*/ for the user name part.
If taking the SUS recommendation for portable user names into account, that would change to /^[A-Za-z0-9._][A-Za-z0-9._-]*/.
If also applying the RHEL restriction of 32 characters, we'd come out at /^[A-Za-z0-9._][A-Za-z0-9._-]{0,31}/.
My initial idea was to just use the last regexp, but this would break setups with user@domain user names or user names longer than 32 characters.
In theory, /^[A-Za-z0-9._][A-Za-z0-9._-@]*/ should be more liberal, not restrict the length, allow portable user names and expand on that by allowing domain-based user names as well. I'd drop the trailing $.
Mihai
[0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_...
Mihai, hello.
On 28 Oct 2017, at 9:09, Mihai Moldovan wrote:
On 10/27/2017 06:51 PM, Norman Gray wrote:
At present, x2goserver sanitises usernames with a regexp in x2goutils.pm and in x2gosqlitewrapper.pl (same in both places). [...]
Just to make it clear - we're not really "validating user names". I couldn't care less about the user name as such - it's the system's responsibility to deal with user names and if users managed to login, we can assume the user name is valid, like you've already written on the -dev mailing list and here. [...] What we really do in this part of code is validating a session ID, which happens to contain a user name. Sadly, as such, what we see as a user name must be correctly represented in order to check the session ID.
Generally, a session ID should look like that: <username>-<DISPLAY number>-<UNIX timestamp denoting session creation time>_st<string representation of session type>_dp<DISPLAY bit depth>
Righto -- that makes perfect sense: there did look like there was more going on there than mere validation.
Parenthetically (because it would imply changes well beyond the scope of this bug report), something like u<uid>-... would be easy to assemble at this point, and be totally bomb-proof (but obviously doesn't help if it's the user _name_ you need later on).
But returning to your points...
Note that the test may in fact be redundant, since if this code is being run, then the corresponding user has already logged on to the system, so that the username has already been verified as valid and existing.
In theory it's redundant. But there is a possibility that we are reading garbage data, where ever that might come from. Any bug (including our scripts messing up splitting up something, or inserting something invalid into the database and reading it again later) could trigger such a situation, so IMHO validation of input strings is really not redundant.
Sanity checks are good!
- POSIX/Single Unix says of the username simply "To be portable across systems conforming to POSIX.1-2008, the value is composed of characters from the portable filename character set. The <hyphen-minus> character should not be used as the first character of a portable user name." (see <http://pubs.opengroup.org/onlinepubs/9699919799/>, paragraph 3.437)
So, hyphen is prohibited as the first character. Also, SUS recommends (but does not enforce) using the portable filename character set[0] only for portability, which is restricted to [A-Za-z0-9._-]. Specifically, this does not include any special characters like umlauts, accented characters or generally any other Unicode character.
- The Debian useradd(8) page recommends something matching /^[a-z_][a-z0-9_-]*$/, but goes on to say "On Debian, the only constraints are that usernames must neither start with a dash ('-') nor contain a colon (':') or a whitespace (space: ' ', end of line: '\n', tabulation: '\t', etc.). Note that using a slash ('/') may break the default algorithm for the definition of the user's home directory." (see eg <https://www.unix.com/man-page/linux/8/useradd/>)
This is a bit stricter than the SUS definition (ignoring the portability recommendation). If taken at face value, Debian allows any Unicode character but the restricted ones. Interestingly, the recommended matching regexp doesn't include uppercase characters and, arguably more interestingly, doesn't allow a user name to start with a digit (which would be problematic for you).
- The corresponding RedHat/CentOS manpage doesn't even include that, and instead says only "Usernames may only be up to 32 characters long." FreeBSD is similarly laid-back about the username.
This is interesting as well, since it's the first document that mentions a maximum length. If interpreted directly, the previous documents did not restrict the length (unless you haven't pasted some information relating to the string length).
I wouldn't be at all astonished to see unicode usernames before long.
It's the sort of thing Apple or RedHat would do, and which it appears
Debian might already do in principle (if not much in practice); and
since IRIs, for example, can now at least indirectly support all of
Unicode in the DNS, the idea of a non-ASCII üsernamé is less
outlandish than it might once have been. It would go beyond SUS, and so
would be a big deal, but it might be worth having x2go aim for a
solution which is robust against that, and so which would solve the
issue once and for all.
Hmm: one possibility would be to put the uid in the session string (though I appreciate, as above, that may not work for x2go for other reasons).
Another would be to run the username through a punycode converter <https://en.wikipedia.org/wiki/Punycode> as with IRIs: any characters in [a-zA-Z0-9-] would come through that unchanged, but others would be normalised. This would be an invisible change for most usernames. As a normalisation, it also has the advantage that it's reversible if need be. I can't remember -- and that Wikipedia page doesn't refresh my memory -- exactly what subset of characters comes through a punycode conversion unchanged, so this would require a little further thought.
But again, these go beyond the immediate scope of this present issue.
The @ character indeed has been added to allow email-address-like user names as part of https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=573
Allowing $ as trailing characters has been part from the start, though I honestly don't understand why.
I share your puzzlement with that one. Is it possible this was the end-of-string pattern in the regexp, which got in to the allowable trailing characters by mistake?
In theory, /^[A-Za-z0-9._][A-Za-z0-9._-@]*/ should be more liberal, not restrict the length, allow portable user names and expand on that by allowing domain-based user names as well. I'd drop the trailing $.
That looks very plausible to me.
Best wishes,
Norman
-- Norman Gray : https://nxg.me.uk