I need help creating a regex (for JavaScript .match and PHP preg_match) that validates a Unix-style absolute path to a file (with international characters such as åäöøæð and so on) so that:
The regex needs to handle paths regardless of their depth (/path/to or /path/to/somewhere or /path/to/somewhere/else)
I have a regexp, /^\/.+[^\/]$/, that marks 1 to 3 as valid. The problem is to make it reject 3, since it contains // without any other character in between.
Regex isn't really needed here. As far as I can see, there are three things you want to ensure:
The string starts with /
The string does not end with /, unless the whole string is /
The string does not contain //
All three of the above can be done with string functions.
In PHP:
if ($string !== '/' && ($string === '' || $string[0] !== '/' || substr($string, -1) === '/' || strpos($string, '//') !== false))
{
// string is invalid
}
In JavaScript:
if (string != '/' && (string.charAt(0) != '/' || string.charAt(string.length - 1) == '/' || string.indexOf('//') > -1))
{
// string is invalid
}
A Solution for PHP:
$lines = array(
"/path/to/someWhere",
"/path/tø/sömewhere",
"/path/to//somewhere",
"path/to/somewhere",
"/path/to/somewhere/",
);
foreach($lines as $line){
var_dump(preg_match('#^(/[^/]+)+$#',$line)); // dumps int(1) int(1) int(0) int(0) int(0)
}
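Since the question also asks for JavaScript, the same pattern can be tried there. This is a minimal sketch rather than a tested port; the helper name isValidPath and the constant PATH_PATTERN are illustrative assumptions. Like the PHP version, it rejects the bare root path /.

```javascript
// Same pattern as the PHP example: one or more "/segment" groups,
// where a segment is one or more non-slash characters.
// isValidPath is a hypothetical helper name used only for illustration.
const PATH_PATTERN = /^(\/[^\/]+)+$/;

function isValidPath(path) {
  return PATH_PATTERN.test(path);
}

const lines = [
  "/path/to/someWhere",
  "/path/tø/sömewhere",
  "/path/to//somewhere",
  "path/to/somewhere",
  "/path/to/somewhere/",
];

for (const line of lines) {
  console.log(line, isValidPath(line));
}
```

Only the first two sample paths should be reported as valid, mirroring the int(1) int(1) int(0) int(0) int(0) output above.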
I think this will do it:
^(?:\/$|(?:\/[^/]+)+$)
That says to accept any string that's either just a /, or any string formed from a sequence of one or more repetitions of a / followed by one or more non-/ characters.
This uses all greedy quantifiers so it should be fast; also, for performance, the ^ anchor is factored out.
That's a JavaScript regex. I'm not a PHP programmer so the main thing I don't know is whether the non-capturing group syntax works in PHP. Also, I'm not sure how you'd handle "quoting" the slash characters.
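On the two open points: non-capturing groups are written (?:...), and PCRE, which preg_match uses, supports that syntax as well. As for quoting slashes, a / only needs escaping when it is also the delimiter. The sketch below builds the pattern from a string with the RegExp constructor, so no escaping is needed; acceptsPath is a hypothetical helper name, and the pattern is written out in full here on the assumption that (?:...) is what was intended.

```javascript
// Built from a string, the pattern has no delimiter, so "/" is not escaped.
// (?:...) groups without capturing.
// acceptsPath is a hypothetical helper name used only for illustration.
const pattern = new RegExp("^(?:/$|(?:/[^/]+)+$)");

function acceptsPath(path) {
  return pattern.test(path);
}

console.log(acceptsPath("/"));                    // bare root path
console.log(acceptsPath("/path/to/somewhere"));   // ordinary absolute path
console.log(acceptsPath("/path/to//somewhere"));  // contains "//"
console.log(acceptsPath("/path/to/somewhere/"));  // trailing slash
```

The same idea carries over to PHP by picking a delimiter other than /, for example #.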
This should work:
^/[^/]?$|^/[^/]([^/]|/[^/])*?[^/]$
It allows any character except /, or a / followed by any character except /. It also makes sure that the last character isn’t a /, and that the second character isn’t one either.
Finally, this uses / without escaping. To use it in PHP, don’t use / as the regex delimiter, since that just makes the regular expression hard to read. Use any other character, e.g. ; to delimit the expression instead:
;^/[^/]?$|^/[^/]([^/]|/[^/])*?[^/]$;
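EDIT: Added special handling for the root path, "/", and paths that consist of a single-letter directory. A single-letter last component below the first level, as in /a/b, still does not match, because the final [^/] needs a character of its own after the repeated group.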
If the path matches ^[^\/]|\/\/|.\/$, it is invalid. Otherwise it is valid.
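A minimal JavaScript sketch of this "match means invalid" approach follows; the names INVALID_PATH and isValidByNegation are illustrative assumptions. One thing to watch for: the empty string matches none of the three alternatives, so this check treats it as valid unless it is handled separately.

```javascript
// The pattern describes an INVALID path:
//   ^[^\/]  the first character is not "/"
//   \/\/    the path contains "//"
//   .\/$    the path ends with "/" preceded by some character
// isValidByNegation is a hypothetical helper name used only for illustration.
const INVALID_PATH = /^[^\/]|\/\/|.\/$/;

function isValidByNegation(path) {
  return !INVALID_PATH.test(path);
}

for (const p of ["/", "/path/to", "path/to", "/a//b", "/a/"]) {
  console.log(p, isValidByNegation(p));
}
```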
It's not a regex, but it works just as well if you would rather normalize the path than reject it:
str_replace('//', '/', $file)
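Note that this repairs the path instead of validating it, and a single str_replace pass turns /// into //, since replaced text is not rescanned. For the JavaScript side, a rough equivalent that collapses runs of any length might look like the sketch below; collapseSlashes is a hypothetical helper name.

```javascript
// Collapse every run of two or more slashes into a single slash.
// This normalizes the path; it does not report whether the input was valid.
// collapseSlashes is a hypothetical helper name used only for illustration.
function collapseSlashes(path) {
  return path.replace(/\/{2,}/g, "/");
}

console.log(collapseSlashes("/path/to//somewhere"));
console.log(collapseSlashes("/path///to////somewhere"));
```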