URL Pattern Standard Commit b62405015d6ed4c5d54b46436319bf529c51a815 Snapshot

1. The `URLPattern` class

A URLPattern consists of several components, each of which represents a pattern which could be matched against the corresponding component of a URL.

It can be constructed using a string for each component, or from a shorthand string. It can optionally be resolved relative to a base URL.

The shorthand "https://example.com/:category/*" corresponds to the following components:

protocol: "https"
username: ""
password: ""
hostname: "example.com"
port: ""
pathname: "/:category/*"
search: ""
hash: ""

It matches the following URLs:

https://example.com/products/
https://example.com/blog/our-greatest-product-ever

It does not match the following URLs:

https://example.com/
http://example.com/products/
https://example.com:8443/blog/our-greatest-product-ever
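
As an informal illustration (not part of the specification text), the shorthand above can be exercised from TypeScript. The sketch assumes a runtime that exposes a global URLPattern. The `shorthandMatches` helper and the dynamic constructor lookup are hypothetical names introduced here, and the helper returns undefined where the constructor is unavailable.

```typescript
// Look the constructor up dynamically so the sketch still loads in runtimes
// that do not ship URLPattern (assumption: it is exposed on globalThis).
const URLPatternCtor: any = (globalThis as any).URLPattern;

// Hypothetical helper: tests a URL against the shorthand from the example.
function shorthandMatches(url: string): boolean | undefined {
  if (typeof URLPatternCtor !== "function") return undefined;
  const pattern = new URLPatternCtor("https://example.com/:category/*");
  return pattern.test(url);
}

// The URLs listed above, paired with the outcome the text states for each.
const shorthandChecks: Array<[string, boolean]> = [
  ["https://example.com/products/", true],
  ["https://example.com/blog/our-greatest-product-ever", true],
  ["https://example.com/", false],
  ["http://example.com/products/", false],
  ["https://example.com:8443/blog/our-greatest-product-ever", false],
];

for (const [url, expected] of shorthandChecks) {
  console.log(url, "stated:", expected, "observed:", shorthandMatches(url));
}
```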

typedef (USVString or URLPatternInit) URLPatternInput;

[Exposed=(Window,Worker)]
interface URLPattern {
  constructor(URLPatternInput input, USVString baseURL, optional URLPatternOptions options = {});
  constructor(optional URLPatternInput input = {}, optional URLPatternOptions options = {});

  boolean test(optional URLPatternInput input = {}, optional USVString baseURL);

  URLPatternResult? exec(optional URLPatternInput input = {}, optional USVString baseURL);

  readonly attribute USVString protocol;
  readonly attribute USVString username;
  readonly attribute USVString password;
  readonly attribute USVString hostname;
  readonly attribute USVString port;
  readonly attribute USVString pathname;
  readonly attribute USVString search;
  readonly attribute USVString hash;
};

dictionary URLPatternInit {
  USVString protocol;
  USVString username;
  USVString password;
  USVString hostname;
  USVString port;
  USVString pathname;
  USVString search;
  USVString hash;
  USVString baseURL;
};

dictionary URLPatternOptions {
  boolean ignoreCase = false;
};

dictionary URLPatternResult {
  sequence<URLPatternInput> inputs;

  URLPatternComponentResult protocol;
  URLPatternComponentResult username;
  URLPatternComponentResult password;
  URLPatternComponentResult hostname;
  URLPatternComponentResult port;
  URLPatternComponentResult pathname;
  URLPatternComponentResult search;
  URLPatternComponentResult hash;
};

dictionary URLPatternComponentResult {
  USVString input;
  record<USVString, (USVString or undefined)> groups;
};

Each URLPattern object has an associated protocol component, a component, which must be set upon creation.

Each URLPattern object has an associated username component, a component, which must be set upon creation.

Each URLPattern object has an associated password component, a component, which must be set upon creation.

Each URLPattern object has an associated hostname component, a component, which must be set upon creation.

Each URLPattern object has an associated port component, a component, which must be set upon creation.

Each URLPattern object has an associated pathname component, a component, which must be set upon creation.

Each URLPattern object has an associated search component, a component, which must be set upon creation.

Each URLPattern object has an associated hash component, a component, which must be set upon creation.

urlPattern = new URLPattern(input): Constructs a new URLPattern object. The input is an object containing separate patterns for each URL component; e.g. hostname, pathname, etc. Missing components will default to a wildcard pattern. In addition, input can contain a baseURL property that provides static text patterns for any missing components.
urlPattern = new URLPattern(patternString, baseURL): Constructs a new URLPattern object. patternString is a URL string containing pattern syntax for one or more components. If baseURL is provided, then patternString can be relative. This constructor will always set at least an empty string value and does not default any components to wildcard patterns.
urlPattern = new URLPattern(input, options): Constructs a new URLPattern object. The options is an object containing the additional configuration options that can affect how the components are matched. Currently it has only one property ignoreCase which can be set to true to enable case-insensitive matching.
Note that by default, that is in the absence of the options argument, matching is always case-sensitive.
urlPattern = new URLPattern(patternString, baseURL, options): Constructs a new URLPattern object. This overload supports a URLPatternOptions object when constructing a pattern from a patternString, which describes the patterns for individual components, and a baseURL.
matches = urlPattern.test(input): Tests if urlPattern matches the given arguments. The input is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, input can contain a baseURL property that provides values for any missing components. If urlPattern matches the input on a component-by-component basis then true is returned. Otherwise, false is returned.
matches = urlPattern.test(url, baseURL): Tests if urlPattern matches the given arguments. url is a URL string. If baseURL is provided, then url can be relative.
If urlPattern matches the input on a component-by-component basis then true is returned. Otherwise, false is returned.
result = urlPattern.exec(input): Executes the urlPattern against the given arguments. The input is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, input can contain a baseURL property that provides values for any missing components.
If urlPattern matches the input on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the result object; e.g. result.pathname.groups.id. If urlPattern does not match the input, then result is null.
result = urlPattern.exec(url, baseURL): Executes the urlPattern against the given arguments. url is a URL string. If baseURL is provided, then url can be relative.
If urlPattern matches the input on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the result object; e.g. result.pathname.groups.id. If urlPattern does not match the input, then result is null.
urlPattern.protocol: Returns urlPattern’s normalized protocol pattern string.
urlPattern.username: Returns urlPattern’s normalized username pattern string.
urlPattern.password: Returns urlPattern’s normalized password pattern string.
urlPattern.hostname: Returns urlPattern’s normalized hostname pattern string.
urlPattern.port: Returns urlPattern’s normalized port pattern string.
urlPattern.pathname: Returns urlPattern’s normalized pathname pattern string.
urlPattern.search: Returns urlPattern’s normalized search pattern string.
urlPattern.hash: Returns urlPattern’s normalized hash pattern string.

The new URLPattern(input, baseURL, options) constructor steps are:

Run initialize given this, input, baseURL, and options.

The new URLPattern(input, options) constructor steps are:

Run initialize given this, input, null, and options.

To initialize a URLPattern given a URLPattern this, URLPatternInput input, string or null baseURL, and URLPatternOptions options:

Let init be null.
If input is a scalar value string then:
1. Set init to the result of running parse a constructor string given input.
2. If baseURL is null and init["protocol"] is null, then throw a TypeError.
3. Set init["baseURL"] to baseURL.
Otherwise:
1. Assert: input is a URLPatternInit.
2. If baseURL is not null, then throw a TypeError.
3. Set init to input.
Let processedInit be the result of process a URLPatternInit given init, "pattern", null, null, null, null, null, null, null, and null.
If processedInit["protocol"] is a special scheme and processedInit["port"] is its corresponding default port, then set processedInit["port"] to the empty string.
Set this’s protocol component to the result of compiling a component given processedInit["protocol"], canonicalize a protocol, and default options.
Set this’s username component to the result of compiling a component given processedInit["username"], canonicalize a username, and default options.
Set this’s password component to the result of compiling a component given processedInit["password"], canonicalize a password, and default options.
If the result of running hostname pattern is an IPv6 address given processedInit["hostname"] is true, then set this’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize an IPv6 hostname, and hostname options.
Otherwise, set this’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize a hostname, and hostname options.
Set this’s port component to the result of compiling a component given processedInit["port"], canonicalize a port, and default options.
Let compileOptions be a copy of the default options with the ignore case property set to options["ignoreCase"].
If the result of running protocol component matches a special scheme given this’s protocol component is true, then:
1. Let pathCompileOptions be a copy of the pathname options with the ignore case property set to options["ignoreCase"].
2. Set this’s pathname component to the result of compiling a component given processedInit["pathname"], canonicalize a pathname, and pathCompileOptions.
Otherwise set this’s pathname component to the result of compiling a component given processedInit["pathname"], canonicalize an opaque pathname, and compileOptions.
Set this’s search component to the result of compiling a component given processedInit["search"], canonicalize a search, and compileOptions.
Set this’s hash component to the result of compiling a component given processedInit["hash"], canonicalize a hash, and compileOptions.
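
The default-port step of initialize can be sketched in isolation. This is a minimal illustration, not the normative algorithm: `normalizePatternPort` and the lookup table are names introduced here, the port values are the special-scheme defaults from the URL Standard, and the protocol pattern string is assumed to be compared literally against the scheme names.

```typescript
// Default ports of the special schemes, taken from the URL Standard.
// "file" is a special scheme without a default port.
const SPECIAL_SCHEME_DEFAULT_PORTS: Record<string, string | null> = {
  ftp: "21",
  file: null,
  http: "80",
  https: "443",
  ws: "80",
  wss: "443",
};

// Hypothetical helper: returns the empty string when `port` is the default
// port of the special scheme named by `protocol`, otherwise returns `port`.
function normalizePatternPort(protocol: string, port: string): string {
  return SPECIAL_SCHEME_DEFAULT_PORTS[protocol] === port ? "" : port;
}

console.log(JSON.stringify(normalizePatternPort("https", "443")));
console.log(JSON.stringify(normalizePatternPort("https", "8443")));
```

With this step, a pattern written with an explicit default port behaves the same as one that omits the port.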

The protocol getter steps are:

Return this's protocol component's pattern string.

The username getter steps are:

Return this's username component's pattern string.

The password getter steps are:

Return this's password component's pattern string.

The hostname getter steps are:

Return this's hostname component's pattern string.

The port getter steps are:

Return this's port component's pattern string.

The pathname getter steps are:

Return this's pathname component's pattern string.

The search getter steps are:

Return this's search component's pattern string.

The hash getter steps are:

Return this's hash component's pattern string.

The test(input, baseURL) method steps are:

Let result be the result of match given this, input, and baseURL if given.
If result is null, return false.
Return true.

The exec(input, baseURL) method steps are:

Return the result of match given this, input, and baseURL if given.

1.1. Internals

A URLPattern is associated with multiple component structs.

A component has an associated pattern string, a well formed pattern string, which must be set upon creation.

A component has an associated regular expression, a RegExp, which must be set upon creation.

A component has an associated group name list, a list of strings, which must be set upon creation.

To compile a component given a string input, encoding callback encoding callback, and options options:

If input is null, then set input to "*".
Let part list be the result of running parse a pattern string given input, options, and encoding callback.
Let (regular expression string, name list) be the result of running generate a regular expression and name list given part list and options.
Let flags be an empty string.
If options’s ignore case is true then set flags to "vi".
Otherwise set flags to "v".
Let regular expression be RegExpCreate(regular expression string, flags). If this throws an exception, catch it, and throw a TypeError.

The specification uses regular expressions to perform all matching, but this is not mandated. Implementations are free to perform matching directly against the part list when possible; e.g. when there are no custom regexp matching groups. If there are custom regular expressions, however, it's important that they be immediately evaluated in the compile a component algorithm so an error can be thrown if they are invalid.
Let pattern string be the result of running generate a pattern string given part list and options.
Return a new component whose pattern string is pattern string, regular expression is regular expression, and group name list is name list.
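
The flag selection and eager regular expression creation in compile a component might be sketched as follows. The function names are hypothetical, `new RegExp` stands in for RegExpCreate, and the sketch assumes an engine that supports the "v" flag; on engines without it, every call reports a TypeError.

```typescript
// Chooses the RegExp flags from the ignore case option, as in the steps above.
function regExpFlags(ignoreCase: boolean): string {
  return ignoreCase ? "vi" : "v";
}

// Creates the regular expression eagerly so that an invalid custom group is
// reported at construction time; any failure is rethrown as a TypeError.
function createComponentRegExp(source: string, ignoreCase: boolean): RegExp {
  try {
    return new RegExp(source, regExpFlags(ignoreCase));
  } catch {
    throw new TypeError("Invalid regular expression in pattern: " + source);
  }
}

try {
  createComponentRegExp("(", false); // unbalanced group, always invalid
} catch (error) {
  console.log(error instanceof TypeError);
}
```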

To perform a match given a URLPattern urlpattern, a URLPatternInput input, and an optional string baseURLString:

Let protocol be the empty string.
Let username be the empty string.
Let password be the empty string.
Let hostname be the empty string.
Let port be the empty string.
Let pathname be the empty string.
Let search be the empty string.
Let hash be the empty string.
Let inputs be an empty list.
Append input to inputs.
If input is a URLPatternInit then:
1. If baseURLString was given, throw a TypeError.
2. Let applyResult be the result of process a URLPatternInit given input, "url", protocol, username, password, hostname, port, pathname, search, and hash. If this throws an exception, catch it, and return null.
3. Set protocol to applyResult["protocol"].
4. Set username to applyResult["username"].
5. Set password to applyResult["password"].
6. Set hostname to applyResult["hostname"].
7. Set port to applyResult["port"].
8. Set pathname to applyResult["pathname"].
9. Set search to applyResult["search"].
10. Set hash to applyResult["hash"].
Otherwise:
1. Let baseURL be null.
2. If baseURLString was given, then:
  1. Set baseURL to the result of parsing baseURLString.
  2. If baseURL is failure, return null.
  3. Append baseURLString to inputs.
3. Let url be the result of parsing input given baseURL.
4. If url is failure, return null.
5. Set protocol to url’s scheme.
6. Set username to url’s username.
7. Set password to url’s password.
8. Set hostname to url’s host or the empty string if the value is null.
9. Set port to url’s port or the empty string if the value is null.
10. Set pathname to the result of URL path serializing url.
11. Set search to url’s query or the empty string if the value is null.
12. Set hash to url’s fragment or the empty string if the value is null.
Let protocolExecResult be RegExpBuiltinExec(urlpattern’s protocol component's regular expression, protocol).
Let usernameExecResult be RegExpBuiltinExec(urlpattern’s username component's regular expression, username).
Let passwordExecResult be RegExpBuiltinExec(urlpattern’s password component's regular expression, password).
Let hostnameExecResult be RegExpBuiltinExec(urlpattern’s hostname component's regular expression, hostname).
Let portExecResult be RegExpBuiltinExec(urlpattern’s port component's regular expression, port).
Let pathnameExecResult be RegExpBuiltinExec(urlpattern’s pathname component's regular expression, pathname).
Let searchExecResult be RegExpBuiltinExec(urlpattern’s search component's regular expression, search).
Let hashExecResult be RegExpBuiltinExec(urlpattern’s hash component's regular expression, hash).
If any of protocolExecResult, usernameExecResult, passwordExecResult, hostnameExecResult, portExecResult, pathnameExecResult, searchExecResult, or hashExecResult is null, then return null.
Let result be a new URLPatternResult.
Set result["inputs"] to inputs.
Set result["protocol"] to the result of creating a component match result given urlpattern’s protocol component, protocol, and protocolExecResult.
Set result["username"] to the result of creating a component match result given urlpattern’s username component, username, and usernameExecResult.
Set result["password"] to the result of creating a component match result given urlpattern’s password component, password, and passwordExecResult.
Set result["hostname"] to the result of creating a component match result given urlpattern’s hostname component, hostname, and hostnameExecResult.
Set result["port"] to the result of creating a component match result given urlpattern’s port component, port, and portExecResult.
Set result["pathname"] to the result of creating a component match result given urlpattern’s pathname component, pathname, and pathnameExecResult.
Set result["search"] to the result of creating a component match result given urlpattern’s search component, search, and searchExecResult.
Set result["hash"] to the result of creating a component match result given urlpattern’s hash component, hash, and hashExecResult.
Return result.
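
The component-by-component part of match can be sketched as below. The names are illustrative, and the regular expressions are hand-written stand-ins for what compiling "https://example.com/:category/*" might produce, not the output of the generate a regular expression algorithm. The sketch returns early on the first failing component, which gives the same outcome as running every component and then checking for null.

```typescript
const COMPONENT_NAMES = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash",
] as const;
type ComponentName = (typeof COMPONENT_NAMES)[number];

// Runs each component's regular expression against the matching input value.
// Expressions are assumed to carry no "g" or "y" flag, so exec is stateless.
function matchAllComponents(
  expressions: Record<ComponentName, RegExp>,
  values: Record<ComponentName, string>,
): Record<ComponentName, RegExpExecArray> | null {
  const execResults = {} as Record<ComponentName, RegExpExecArray>;
  for (const componentName of COMPONENT_NAMES) {
    const execResult = expressions[componentName].exec(values[componentName]);
    if (execResult === null) return null; // one failing component fails the match
    execResults[componentName] = execResult;
  }
  return execResults;
}

// Hand-written stand-ins for the compiled components of the shorthand example.
const exampleExpressions: Record<ComponentName, RegExp> = {
  protocol: /^https$/, username: /^$/, password: /^$/,
  hostname: /^example\.com$/, port: /^$/,
  pathname: /^\/([^\/]+?)\/(.*)$/, search: /^$/, hash: /^$/,
};
const exampleValues: Record<ComponentName, string> = {
  protocol: "https", username: "", password: "", hostname: "example.com",
  port: "", pathname: "/products/", search: "", hash: "",
};
console.log(matchAllComponents(exampleExpressions, exampleValues) !== null);
```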

To create a component match result given a component component, a string input, and an array representing the output of RegExpBuiltinExec execResult:

Let result be a new URLPatternComponentResult.
Set result["input"] to input.
Let groups be a record<USVString, (USVString or undefined)>.
Let index be 1.
While index is less than Get(execResult, "length"):
1. Let name be component’s group name list[index − 1].
2. Let value be Get(execResult, ToString(index)).
3. Set groups[name] to value.
4. Increment index by 1.
Set result["groups"] to groups.
Return result.
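
A minimal sketch of create a component match result, assuming the exec result is an ordinary JavaScript RegExpExecArray. The interface and function names are introduced here for illustration, and the group names in the usage lines are example values.

```typescript
interface ComponentMatchResult {
  input: string;
  groups: Record<string, string | undefined>;
}

// Maps capture group i (1-based) of the exec result to the (i - 1)th entry of
// the component's group name list, following the loop in the steps above.
function createComponentMatchResult(
  groupNameList: string[],
  input: string,
  execResult: RegExpExecArray,
): ComponentMatchResult {
  const groups: Record<string, string | undefined> = {};
  for (let index = 1; index < execResult.length; index++) {
    groups[groupNameList[index - 1]] = execResult[index];
  }
  return { input, groups };
}

// Illustrative usage with a hand-written pathname expression.
const pathnameExec = /^\/([^\/]+?)\/(.*)$/.exec("/products/");
if (pathnameExec !== null) {
  const componentResult = createComponentMatchResult(["category", "0"], "/products/", pathnameExec);
  console.log(componentResult.groups["category"]);
}
```

Index 0 of the exec result holds the whole match, which is why the loop starts at 1.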

The default options is an options struct with delimiter code point set to the empty string and prefix code point set to the empty string.

The hostname options is an options struct with delimiter code point set to "." and prefix code point set to the empty string.

The pathname options is an options struct with delimiter code point set to "/" and prefix code point set to "/".

To determine if a protocol component matches a special scheme given a component protocol component:

Let special scheme list be a list populated with all of the special schemes.
For each scheme of special scheme list:
1. Let test result be RegExpBuiltinExec(protocol component’s regular expression, scheme).
2. If test result is not null, then return true.
Return false.
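
The check above can be sketched with an ordinary RegExp standing in for the protocol component's regular expression. The function name is illustrative, and the scheme list is the set of special schemes from the URL Standard.

```typescript
// The special schemes defined by the URL Standard.
const SPECIAL_SCHEMES = ["ftp", "file", "http", "https", "ws", "wss"];

// Returns true if the protocol expression matches at least one special scheme.
// The expression is assumed to carry no "g" or "y" flag.
function protocolMatchesSpecialScheme(protocolExpression: RegExp): boolean {
  return SPECIAL_SCHEMES.some((scheme) => protocolExpression.exec(scheme) !== null);
}

console.log(protocolMatchesSpecialScheme(/^https?$/));
console.log(protocolMatchesSpecialScheme(/^data$/));
```

Because the test runs the pattern against each scheme, a wildcard protocol pattern also counts as matching a special scheme.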

To determine if a hostname pattern is an IPv6 address given a pattern string input:

If input’s code point length is less than 2, then return false.
Let input code points be input interpreted as a list of code points.
If input code points[0] is U+005B ([), then return true.
If input code points[0] is U+007B ({) and input code points[1] is U+005B ([), then return true.
If input code points[0] is U+005C (\) and input code points[1] is U+005B ([), then return true.
Return false.
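
The IPv6 heuristic above translates almost line for line. In this sketch the function name is illustrative, and `Array.from` is used to split the string into code points rather than UTF-16 code units.

```typescript
// Returns true if the hostname pattern string looks like an IPv6 address:
// it starts with "[", or with "{[" (grouped), or with "\[" (escaped).
function hostnamePatternIsIPv6Address(input: string): boolean {
  const inputCodePoints = Array.from(input);
  if (inputCodePoints.length < 2) return false;
  if (inputCodePoints[0] === "[") return true;
  if (inputCodePoints[0] === "{" && inputCodePoints[1] === "[") return true;
  if (inputCodePoints[0] === "\\" && inputCodePoints[1] === "[") return true;
  return false;
}

console.log(hostnamePatternIsIPv6Address("[::1]"));
console.log(hostnamePatternIsIPv6Address("example.com"));
```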

1.2. Constructor string parsing

A constructor string parser is a struct.

A constructor string parser has an associated input, a string, which must be set upon creation.

A constructor string parser has an associated token list, a token list, which must be set upon creation.

A constructor string parser has an associated result, a URLPatternInit, initially set to a new URLPatternInit.

A constructor string parser has an associated component start, a number, initially set to 0.

A constructor string parser has an associated token index, a number, initially set to 0.

A constructor string parser has an associated token increment, a number, initially set to 1.

A constructor string parser has an associated group depth, a number, initially set to 0.

A constructor string parser has an associated hostname IPv6 bracket depth, a number, initially set to 0.

A constructor string parser has an associated protocol matches a special scheme flag, a boolean, initially set to false.

A constructor string parser has an associated state, a string, initially set to "init". It must be one of the following:

"init"
"protocol"
"authority"
"username"
"password"
"hostname"
"port"
"pathname"
"search"
"hash"
"done"

The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.

First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, the basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without getting confused with the port number.

Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like the basic URL parser does. Instead we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.

Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.

To parse a constructor string given a string input:

Let parser be a new constructor string parser whose input is input and token list is the result of running tokenize given input and "lenient".

When constructing a pattern using a URLPatternInit like new URLPattern({ pathname: 'foo' }) any missing components will be defaulted to wildcards. In the constructor string case, however, all components are precisely defined as either empty string or a longer value. This is due to there being no way to simply "leave out" a component when writing a URL.

To implement this we initialize components in parser’s result with empty string in advance.

We can’t, however, do this immediately. We want to allow the baseURL to provide information for relative URLs, so we only want to set the default empty string values for components following the first component in the relative URL. We therefore wait to set the default component values until after we exit the "init" state.
While parser’s token index is less than parser’s token list size:
1. Set parser’s token increment to 1.
  
  On every iteration of the parse loop the parser’s token index will be incremented by its token increment value. Typically this means incrementing by 1, but at certain times it is set to zero. The token increment is then always reset back to 1 at the top of the loop.
2. If parser’s token list[parser’s token index]'s type is "end" then:
  1. If parser’s state is "init":
    
    If we reached the end of the string in the "init" state, then we failed to find a protocol terminator and this has to be a relative URLPattern constructor string.
    1. Run rewind given parser.
      
      We next determine at which component the relative pattern begins. Relative pathnames are most common, but URLs and URLPattern constructor strings can begin with the search or hash components as well.
    2. If the result of running is a hash prefix given parser is true, then run change state given parser, "hash" and 1.
    3. Otherwise if the result of running is a search prefix given parser is true:
      1. Run change state given parser, "search" and 1.
      2. Set parser’s result["hash"] to the empty string.
    4. Otherwise:
      1. Run change state given parser, "pathname" and 0.
      2. Set parser’s result["search"] to the empty string.
      3. Set parser’s result["hash"] to the empty string.
    5. Increment parser’s token index by parser’s token increment.
    6. Continue.
  2. If parser’s state is "authority":
    
    If we reached the end of the string in the "authority" state, then we failed to find an "@". Therefore there is no username or password.
    1. Run rewind and set state given parser and "hostname".
    2. Increment parser’s token index by parser’s token increment.
    3. Continue.
  3. Run change state given parser, "done" and 0.
  4. Break.
3. If the result of running is a group open given parser is true:
  
  We ignore all code points within "{ ... }" pattern groupings. It would not make sense to allow a URL component boundary to lie within a grouping; e.g. "https://example.c{om/fo}o". While not supported within well formed pattern strings, we handle nested groupings here to avoid parser confusion.
  
  It is not necessary to perform this logic for regexp or named groups since those values are collapsed into individual tokens by the tokenize algorithm.
  1. Increment parser’s group depth by 1.
  2. Increment parser’s token index by parser’s token increment.
  3. Continue.
4. If parser’s group depth is greater than 0:
  1. If the result of running is a group close given parser is true, then decrement parser’s group depth by 1.
  2. Otherwise:
    1. Increment parser’s token index by parser’s token increment.
    2. Continue.
5. Switch on parser’s state and run the associated steps:
  "init"
  1. If the result of running is a protocol suffix given parser is true:
    
    We found a protocol suffix, so this is an absolute URLPattern constructor string. Therefore initialize all components to the empty string.
    1. Set parser’s result["username"] to the empty string.
    2. Set parser’s result["password"] to the empty string.
    3. Set parser’s result["hostname"] to the empty string.
    4. Set parser’s result["port"] to the empty string.
    5. Set parser’s result["pathname"] to the empty string.
    6. Set parser’s result["search"] to the empty string.
    7. Set parser’s result["hash"] to the empty string.
    8. Run rewind and set state given parser and "protocol".
  "protocol"
  1. If the result of running is a protocol suffix given parser is true:
    1. Run compute protocol matches a special scheme flag given parser.
      
      We need to eagerly compile the protocol component to determine if it matches any special schemes. If it does then certain special rules apply. It determines if the pathname defaults to a "/" and also whether we will look for the username, password, hostname, and port components. Authority slashes can also cause us to look for these components. Otherwise we treat this as an "opaque path URL" and go straight to the pathname component.
    2. If parser’s protocol matches a special scheme flag is true, then set parser’s result["pathname"] to "/".
    3. Let next state be "pathname".
    4. Let skip be 1.
    5. If the result of running next is authority slashes given parser is true:
      1. Set next state to "authority".
      2. Set skip to 3.
    6. Otherwise if parser’s protocol matches a special scheme flag is true, then set next state to "authority".
    7. Run change state given parser, next state, and skip.
  "authority"
  1. If the result of running is an identity terminator given parser is true, then run rewind and set state given parser and "username".
  2. Otherwise if any of the following are true:
    
    the result of running is a pathname start given parser;
    the result of running is a search prefix given parser; or
    the result of running is a hash prefix given parser,
    
    then run rewind and set state given parser and "hostname".
  "username"
  1. If the result of running is a password prefix given parser is true, then run change state given parser, "password", and 1.
  2. Otherwise if the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.
  "password"
  1. If the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.
  "hostname"
  1. If the result of running is an IPv6 open given parser is true, then increment parser’s hostname IPv6 bracket depth by 1.
  2. Otherwise if the result of running is an IPv6 close given parser is true, then decrement parser’s hostname IPv6 bracket depth by 1.
  3. Otherwise if the result of running is a port prefix given parser is true and parser’s hostname IPv6 bracket depth is zero, then run change state given parser, "port", and 1.
  4. Otherwise if the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.
  5. Otherwise if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
  6. Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
  "port"
  1. If the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.
  2. Otherwise if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
  3. Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
  "pathname"
  1. If the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
  2. Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
  "search"
  1. If the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
  "hash"
  1. Do nothing.
  "done"
  1. Assert: This step is never reached.
6. Increment parser’s token index by parser’s token increment.
Return parser’s result.

To change state given a constructor string parser parser, a state new state, and a number skip:

If parser’s state is not "init", not "authority", and not "done", then set parser’s result[parser’s state] to the result of running make a component string given parser.
Set parser’s state to new state.
Increment parser’s token index by skip.
Set parser’s component start to parser’s token index.
Set parser’s token increment to 0.

To rewind given a constructor string parser parser:

Set parser’s token index to parser’s component start.
Set parser’s token increment to 0.

To rewind and set state given a constructor string parser parser and a state state:

Run rewind given parser.
Set parser’s state to state.
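
The three helpers above mutate only a handful of parser fields, which the following sketch models directly. The field and function names are illustrative. Because make a component string is specified separately, it is passed in here as a hypothetical callback rather than implemented.

```typescript
type ParserState =
  | "init" | "protocol" | "authority" | "username" | "password"
  | "hostname" | "port" | "pathname" | "search" | "hash" | "done";

// Only the parser fields touched by these helpers are modeled.
interface ConstructorStringParser {
  state: ParserState;
  tokenIndex: number;
  tokenIncrement: number;
  componentStart: number;
  result: Record<string, string>;
}

// "change state": stores the finished component, then starts the next one
// `skip` tokens ahead. The callback stands in for make a component string.
function changeState(
  parser: ConstructorStringParser,
  newState: ParserState,
  skip: number,
  makeComponentString: (parser: ConstructorStringParser) => string,
): void {
  if (parser.state !== "init" && parser.state !== "authority" && parser.state !== "done") {
    parser.result[parser.state] = makeComponentString(parser);
  }
  parser.state = newState;
  parser.tokenIndex += skip;
  parser.componentStart = parser.tokenIndex;
  parser.tokenIncrement = 0; // the parse loop must not advance again this turn
}

// "rewind": go back to the start of the current component.
function rewind(parser: ConstructorStringParser): void {
  parser.tokenIndex = parser.componentStart;
  parser.tokenIncrement = 0;
}

// "rewind and set state": re-read the same tokens under a different state.
function rewindAndSetState(parser: ConstructorStringParser, state: ParserState): void {
  rewind(parser);
  parser.state = state;
}
```

Setting the token increment to 0 is what lets the parse loop re-examine the token at the new index instead of stepping past it.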

To get a safe token given a constructor string parser parser and a number index:

If index is less than parser’s token list's size, then return parser’s token list[index].
Assert: parser’s token list's size is greater than or equal to 1.
Let last index be parser’s token list's size − 1.
Let token be parser’s token list[last index].
Assert: token’s type is "end".
Return token.

To run is a non-special pattern char given a constructor string parser parser, a number index, and a string value:

Let token be the result of running get a safe token given parser and index.
If token’s value is not value, then return false.
If any of the following are true:
- token’s type is "char";
- token’s type is "escaped-char"; or
- token’s type is "invalid-char",
then return true.
Return false.
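
Get a safe token and is a non-special pattern char can be sketched together. The `PatternToken` shape models only the two token fields these steps read, the function names are illustrative, and the token list in the usage lines is built by hand rather than produced by the tokenize algorithm.

```typescript
// Minimal token shape for this sketch; only the fields used here are modeled.
interface PatternToken {
  type: string;
  value: string;
}

// "get a safe token": out-of-range indexes resolve to the trailing "end" token.
function getSafeToken(tokenList: PatternToken[], index: number): PatternToken {
  if (index < tokenList.length) return tokenList[index];
  if (tokenList.length < 1) throw new Error("Assertion failed: token list is empty");
  const token = tokenList[tokenList.length - 1];
  if (token.type !== "end") throw new Error('Assertion failed: last token is not "end"');
  return token;
}

// "is a non-special pattern char": the token must carry the given value and
// must be a plain, escaped, or invalid character rather than pattern syntax.
function isNonSpecialPatternChar(tokenList: PatternToken[], index: number, value: string): boolean {
  const token = getSafeToken(tokenList, index);
  if (token.value !== value) return false;
  return token.type === "char" || token.type === "escaped-char" || token.type === "invalid-char";
}

// Hand-built tokens for illustration only.
const handBuiltTokens: PatternToken[] = [
  { type: "char", value: "/" },
  { type: "name", value: "id" },
  { type: "end", value: "" },
];
console.log(isNonSpecialPatternChar(handBuiltTokens, 0, "/"));
```

Reading past the end of the list is safe by construction, which is what lets lookahead checks such as next is authority slashes probe token index + 2 without a bounds check.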

To run is a protocol suffix given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

To run next is authority slashes given a constructor string parser parser:

If the result of running is a non-special pattern char given parser, parser’s token index + 1, and "/" is false, then return false.
If the result of running is a non-special pattern char given parser, parser’s token index + 2, and "/" is false, then return false.
Return true.

To run is an identity terminator given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and "@".

To run is a password prefix given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

To run is a port prefix given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

To run is a pathname start given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and "/".

To run is a search prefix given a constructor string parser parser:

If the result of running is a non-special pattern char given parser, parser’s token index, and "?" is true, then return true.
If parser’s token list[parser’s token index]'s value is not "?", then return false.
Let previous index be parser’s token index − 1.
If previous index is less than 0, then return true.
Let previous token be the result of running get a safe token given parser and previous index.
If any of the following are true, then return false:
- previous token’s type is "name".
- previous token’s type is "regexp".
- previous token’s type is "close".
- previous token’s type is "asterisk".
Return true.
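
The "?" disambiguation above can be sketched as follows. This is a simplified illustration that assumes tokens are plain `(type, value)` pairs; the function names mirror the algorithm names but are otherwise hypothetical.

```python
def is_non_special_pattern_char(tokens, index, value):
    # Clamp to the final "end" token, as "get a safe token" does.
    token = tokens[index] if index < len(tokens) else tokens[-1]
    if token[1] != value:
        return False
    return token[0] in ("char", "escaped-char", "invalid-char")

def is_search_prefix(tokens, index):
    if is_non_special_pattern_char(tokens, index, "?"):
        return True
    if tokens[index][1] != "?":
        return False
    previous_index = index - 1
    if previous_index < 0:
        return True
    # After a group-like token, "?" is read as an optional modifier instead.
    previous_type = tokens[previous_index][0]
    return previous_type not in ("name", "regexp", "close", "asterisk")
```

With these definitions, the "?" in "/:id?" is treated as a modifier because it follows a "name" token, while the "?" in "/a?q" starts the search component because it follows a "char" token.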

To run is a hash prefix given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and "#".

To run is a group open given a constructor string parser parser:

If parser’s token list[parser’s token index]'s type is "open", then return true.
Otherwise return false.

To run is a group close given a constructor string parser parser:

If parser’s token list[parser’s token index]'s type is "close", then return true.
Otherwise return false.

To run is an IPv6 open given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and "[".

To run is an IPv6 close given a constructor string parser parser:

Return the result of running is a non-special pattern char given parser, parser’s token index, and "]".

To run make a component string given a constructor string parser parser:

Assert: parser’s token index is less than parser’s token list's size.
Let token be parser’s token list[parser’s token index].
Let component start token be the result of running get a safe token given parser and parser’s component start.
Let component start input index be component start token’s index.
Let end index be token’s index.
Return the code point substring from component start input index to end index within parser’s input.
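
A minimal sketch of the slicing performed by make a component string, assuming the token list is reduced to a list of input positions (`token_indices`, a hypothetical simplification in which entry i is the index of the i-th token and the last entry belongs to the "end" token):

```python
def make_component_string(text, token_indices, token_index, component_start):
    assert token_index < len(token_indices)
    end_index = token_indices[token_index]
    # Clamp the start to the final ("end") token, mirroring "get a safe token".
    start_index = token_indices[min(component_start, len(token_indices) - 1)]
    return text[start_index:end_index]

text = "https://example.com"
# Assumption: every code point of this input becomes its own token, followed by "end".
token_indices = list(range(len(text) + 1))
protocol = make_component_string(text, token_indices, 5, 0)
hostname = make_component_string(text, token_indices, len(text), 8)
```

Here `protocol` is the text before the ":" token and `hostname` is the text from position 8 up to the "end" token.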

To compute protocol matches a special scheme flag given a constructor string parser parser:

Let protocol string be the result of running make a component string given parser.
Let protocol component be the result of compiling a component given protocol string, canonicalize a protocol, and default options.
If the result of running protocol component matches a special scheme given protocol component is true, then set parser’s protocol matches a special scheme flag to true.

2. Patterns

A pattern string is a string that is written to match a set of target strings. A well-formed pattern string conforms to a particular pattern syntax. This pattern syntax is directly based on the syntax used by the popular path-to-regexp JavaScript library.

It can be parsed to produce a part list which describes, in order, what must appear in a component string for the pattern string to match.

Pattern strings can contain capture groups, which by default match the shortest possible string, up to a component-specific separator (/ in the pathname, . in the hostname). For example, the pathname pattern "/blog/:title" will match "/blog/hello-world" but not "/blog/2012/02".

A regular expression can also be used instead, so the pathname pattern "/blog/:year(\\d+)/:month(\\d+)" will match "/blog/2012/02".

A group can also be made optional, or repeated, by using a modifier. For example, the pathname pattern "/products/:id?" will match both "/products" and "/products/2" (but not "/products/"). In the pathname specifically, groups automatically require a leading /; to avoid this, the group can be explicitly delimited, as in the pathname pattern "/products/{:id}?".

A full wildcard * can also be used to match as much as possible, as in the pathname pattern "/products/*".
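
The examples above can be checked against hand-derived regular expressions. The sketch below is illustrative only: the expressions were worked out by hand from the conversion rules in section 2.2, assuming "/" is both the delimiter code point and the prefix code point for pathnames, and they are run with Python's `re` module rather than a JavaScript engine. The `EXAMPLES` table and `matches` helper are hypothetical names.

```python
import re

# Pattern string -> hand-derived regular expression (assumption, see lead-in).
EXAMPLES = {
    "/blog/:title": r"^\/blog(?:\/([^\/]+?))$",
    "/blog/:year(\\d+)/:month(\\d+)": r"^\/blog(?:\/(\d+))(?:\/(\d+))$",
    "/products/:id?": r"^\/products(?:\/([^\/]+?))?$",
    "/products/{:id}?": r"^\/products\/([^\/]+?)?$",
    "/products/*": r"^\/products(?:\/(.*))$",
}

def matches(pattern, pathname):
    """Return True if the hand-derived expression for pattern matches pathname."""
    return re.match(EXAMPLES[pattern], pathname) is not None

print(matches("/blog/:title", "/blog/hello-world"))   # True
print(matches("/blog/:title", "/blog/2012/02"))       # False
print(matches("/products/:id?", "/products/"))        # False
print(matches("/products/{:id}?", "/products/"))      # True
```

Note how "/products/:id?" makes the leading "/" optional together with the group, while "/products/{:id}?" keeps the "/" as required fixed text.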

2.1. Parsing patterns

2.1.1. Tokens

A token list is a list containing zero or more token structs.

A token is a struct representing a single lexical token within a pattern string.

A token has an associated type, a string, initially "invalid-char". It must be one of the following:

"open": The token represents a U+007B ({) code point.
"close": The token represents a U+007D (}) code point.
"regexp": The token represents a string of the form "(<regular expression>)". The regular expression is required to consist of only ASCII code points.
"name": The token represents a string of the form ":<name>". The name value is restricted to code points that are consistent with JavaScript identifiers.
"char": The token represents a valid pattern code point without any special syntactical meaning.
"escaped-char": The token represents a code point escaped using a backslash like "\<char>".
"other-modifier": The token represents a matching group modifier that is either the U+003F (?) or U+002B (+) code points.
"asterisk": The token represents a U+002A (*) code point that can be either a wildcard matching group or a matching group modifier.
"end": The token represents the end of the pattern string.
"invalid-char": The token represents a code point that is invalid in the pattern. This could be because of the code point value itself or due to its location within the pattern relative to other syntactic elements.

A token has an associated index, a number, initially 0. It is the position of the first code point in the pattern string represented by the token.

A token has an associated value, a string, initially the empty string. It contains the code points from the pattern string represented by the token.

2.1.2. Tokenizing

A tokenize policy is a string that must be either "strict" or "lenient".

A tokenizer is a struct.

A tokenizer has an associated input, a pattern string, initially the empty string.

A tokenizer has an associated policy, a tokenize policy, initially "strict".

A tokenizer has an associated token list, a token list, initially an empty list.

A tokenizer has an associated index, a number, initially 0.

A tokenizer has an associated next index, a number, initially 0.

A tokenizer has an associated code point, a Unicode code point, initially null.

To tokenize a given string input and tokenize policy policy:

Let tokenizer be a new tokenizer.
Set tokenizer’s input to input.
Set tokenizer’s policy to policy.
While tokenizer’s index is less than tokenizer’s input's code point length:
1. Run seek and get the next code point given tokenizer and tokenizer’s index.
2. If tokenizer’s code point is U+002A (*):
  1. Run add a token with default position and length given tokenizer and "asterisk".
  2. Continue.
3. If tokenizer’s code point is U+002B (+) or U+003F (?):
  1. Run add a token with default position and length given tokenizer and "other-modifier".
  2. Continue.
4. If tokenizer’s code point is U+005C (\):
  1. If tokenizer’s index is equal to tokenizer’s input's code point length − 1:
    1. Run process a tokenizing error given tokenizer, tokenizer’s next index, and tokenizer’s index.
    2. Continue.
  2. Let escaped index be tokenizer’s next index.
  3. Run get the next code point given tokenizer.
  4. Run add a token with default length given tokenizer, "escaped-char", tokenizer’s next index, and escaped index.
  5. Continue.
5. If tokenizer’s code point is U+007B ({):
  1. Run add a token with default position and length given tokenizer and "open".
  2. Continue.
6. If tokenizer’s code point is U+007D (}):
  1. Run add a token with default position and length given tokenizer and "close".
  2. Continue.
7. If tokenizer’s code point is U+003A (:):
  1. Let name position be tokenizer’s next index.
  2. Let name start be name position.
  3. While name position is less than tokenizer’s input's code point length:
    1. Run seek and get the next code point given tokenizer and name position.
    2. Let first code point be true if name position equals name start and false otherwise.
    3. Let valid code point be the result of running is a valid name code point given tokenizer’s code point and first code point.
    4. If valid code point is false, then break.
    5. Set name position to tokenizer’s next index.
  4. If name position is less than or equal to name start:
    1. Run process a tokenizing error given tokenizer, name start, and tokenizer’s index.
    2. Continue.
  5. Run add a token with default length given tokenizer, "name", name position, and name start.
  6. Continue.
8. If tokenizer’s code point is U+0028 (():
  1. Let depth be 1.
  2. Let regexp position be tokenizer’s next index.
  3. Let regexp start be regexp position.
  4. Let error be false.
  5. While regexp position is less than tokenizer’s input's code point length:
    1. Run seek and get the next code point given tokenizer and regexp position.
    2. If the result of running is ASCII given tokenizer’s code point is false:
      1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
      2. Set error to true.
      3. Break.
    3. If regexp position equals regexp start and tokenizer’s code point is U+003F (?):
      1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
      2. Set error to true.
      3. Break.
    4. If tokenizer’s code point is U+005C (\):
      1. If regexp position equals tokenizer’s input's code point length − 1:
        1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
        2. Set error to true.
        3. Break.
      2. Run get the next code point given tokenizer.
      3. If the result of running is ASCII given tokenizer’s code point is false:
        1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
        2. Set error to true.
        3. Break.
      4. Set regexp position to tokenizer’s next index.
      5. Continue.
    5. If tokenizer’s code point is U+0029 ()):
      1. Decrement depth by 1.
      2. If depth is 0:
        1. Set regexp position to tokenizer’s next index.
        2. Break.
    6. Otherwise if tokenizer’s code point is U+0028 (():
      1. Increment depth by 1.
      2. If regexp position equals tokenizer’s input's code point length − 1:
        1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
        2. Set error to true.
        3. Break.
      3. Let temporary position be tokenizer’s next index.
      4. Run get the next code point given tokenizer.
      5. If tokenizer’s code point is not U+003F (?):
        1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
        2. Set error to true.
        3. Break.
      6. Set tokenizer’s next index to temporary position.
    7. Set regexp position to tokenizer’s next index.
  6. If error is true continue.
  7. If depth is not zero:
    1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
    2. Continue.
  8. Let regexp length be regexp position − regexp start − 1.
  9. If regexp length is zero:
    1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
    2. Continue.
  10. Run add a token given tokenizer, "regexp", regexp position, regexp start, and regexp length.
  11. Continue.
9. Run add a token with default position and length given tokenizer and "char".
Run add a token with default length given tokenizer, "end", tokenizer’s index, and tokenizer’s index.
Return tokenizer’s token list.
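
The tokenize steps above can be condensed into a short Python sketch. It is a simplification under stated assumptions, not a reference implementation: the tokenizer struct is folded into local variables, and `is_name_char` only approximates the IdentifierStart and IdentifierPart sets using Python's own identifier rules plus "$".

```python
from dataclasses import dataclass

@dataclass
class Token:
    type: str
    index: int
    value: str

def is_name_char(ch, first):
    # Approximation (assumption): Python identifier rules, plus "$".
    if ch == "$":
        return True
    return ch.isidentifier() if first else ("a" + ch).isidentifier()

def tokenize(text, policy="strict"):
    tokens, i, n = [], 0, len(text)

    def add(type_, next_pos, value_pos, value_len=None):
        nonlocal i
        if value_len is None:
            value_len = next_pos - value_pos
        tokens.append(Token(type_, i, text[value_pos:value_pos + value_len]))
        i = next_pos

    def error(next_pos, value_pos):
        if policy == "strict":
            raise TypeError(f"invalid pattern syntax at index {i}")
        add("invalid-char", next_pos, value_pos)

    while i < n:
        ch, nxt = text[i], i + 1
        if ch == "*":
            add("asterisk", nxt, i)
        elif ch in "+?":
            add("other-modifier", nxt, i)
        elif ch == "\\":
            if i == n - 1:
                error(nxt, i)
            else:
                add("escaped-char", nxt + 1, nxt)
        elif ch == "{":
            add("open", nxt, i)
        elif ch == "}":
            add("close", nxt, i)
        elif ch == ":":
            pos = nxt
            while pos < n and is_name_char(text[pos], pos == nxt):
                pos += 1
            if pos <= nxt:
                error(nxt, i)
            else:
                add("name", pos, nxt)
        elif ch == "(":
            depth, start, failed = 1, nxt, False
            pos = start
            while pos < n:
                c = text[pos]
                if ord(c) > 0x7F or (pos == start and c == "?"):
                    failed = True
                    break
                if c == "\\":
                    if pos == n - 1 or ord(text[pos + 1]) > 0x7F:
                        failed = True
                        break
                    pos += 2
                    continue
                if c == ")":
                    depth -= 1
                    if depth == 0:
                        pos += 1
                        break
                elif c == "(":
                    depth += 1
                    # A nested group has to be followed by "?".
                    if pos == n - 1 or text[pos + 1] != "?":
                        failed = True
                        break
                pos += 1
            if failed or depth != 0 or pos - start - 1 == 0:
                error(start, i)
            else:
                add("regexp", pos, start, pos - start - 1)
        else:
            add("char", nxt, i)
    add("end", i, i)
    return tokens
```

For the input "/:foo(bar)?" this yields "char", "name", "regexp", "other-modifier", and "end" tokens. With the "lenient" policy, an error such as the leading "?" in "/(?bad)" produces an "invalid-char" token for the "(" and tokenizing resumes at the next code point.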

To get the next code point for a given tokenizer tokenizer:

Set tokenizer’s code point to the Unicode code point in tokenizer’s input at the position indicated by tokenizer’s next index.
Increment tokenizer’s next index by 1.

To seek and get the next code point for a given tokenizer tokenizer and number index:

Set tokenizer’s next index to index.
Run get the next code point given tokenizer.

To add a token for a given tokenizer tokenizer, type type, number next position, number value position, and number value length:

Let token be a new token.
Set token’s type to type.
Set token’s index to tokenizer’s index.
Set token’s value to the code point substring from value position with length value length within tokenizer’s input.
Append token to the back of tokenizer’s token list.
Set tokenizer’s index to next position.

To add a token with default length for a given tokenizer tokenizer, type type, number next position, and number value position:

Let computed length be next position − value position.
Run add a token given tokenizer, type, next position, value position, and computed length.

To add a token with default position and length for a given tokenizer tokenizer and type type:

Run add a token with default length given tokenizer, type, tokenizer’s next index, and tokenizer’s index.

To process a tokenizing error for a given tokenizer tokenizer, a number next position, and a number value position:

If tokenizer’s policy is "strict", then throw a TypeError.
Assert: tokenizer’s policy is "lenient".
Run add a token with default length given tokenizer, "invalid-char", next position, and value position.

To perform is a valid name code point given a Unicode code point code point and a boolean first:

If first is true return the result of checking if code point is contained in the IdentifierStart set of code points.
Otherwise return the result of checking if code point is contained in the IdentifierPart set of code points.

To determine if a Unicode code point is ASCII:

If code point is between U+0000 and U+007F inclusive, then return true.
Otherwise return false.

2.1.3. Parts

A part list is a list of zero or more parts.

A part is a struct representing one piece of a parsed pattern string. It can contain at most one matching group, a fixed text prefix, a fixed text suffix, and a modifier. It can contain as little as a single fixed text string or a single matching group.

A part has an associated type, a string, which must be set upon creation. It must be one of the following:

"fixed-text": The part represents a simple fixed text string.
"regexp": The part represents a matching group with a custom regular expression.
"segment-wildcard": The part represents a matching group that matches code points up to the next separator code point. This is typically used for a named group like ":foo" that does not have a custom regular expression.
"full-wildcard": The part represents a matching group that greedily matches all code points. This is typically used for the "*" wildcard matching group.

A part has an associated value, a string, which must be set upon creation.

A part has an associated modifier, a string, which must be set upon creation. It must be one of the following:

"none": The part does not have a modifier.
"optional": The part has an optional modifier indicated by the U+003F (?) code point.
"zero-or-more": The part has a "zero or more" modifier indicated by the U+002A (*) code point.
"one-or-more": The part has a "one or more" modifier indicated by the U+002B (+) code point.

A part has an associated name, a string, initially the empty string.

A part has an associated prefix, a string, initially the empty string.

A part has an associated suffix, a string, initially the empty string.
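
One possible way to model the part struct is a Python dataclass. The names `Part`, `PartType`, and `Modifier` are illustrative only. The example part list is what the parsing algorithm in section 2.1.5 would be expected to produce for the pathname pattern "/foo/:bar?", assuming "/" is the prefix code point.

```python
from dataclasses import dataclass
from typing import Literal

PartType = Literal["fixed-text", "regexp", "segment-wildcard", "full-wildcard"]
Modifier = Literal["none", "optional", "zero-or-more", "one-or-more"]

@dataclass
class Part:
    # type, value, and modifier must be set upon creation.
    type: PartType
    value: str
    modifier: Modifier
    # name, prefix, and suffix default to the empty string.
    name: str = ""
    prefix: str = ""
    suffix: str = ""

# Hand-built part list for "/foo/:bar?" (assumption: prefix code point is "/").
example_part_list = [
    Part("fixed-text", "/foo", "none"),
    Part("segment-wildcard", "", "optional", name="bar", prefix="/"),
]
```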

2.1.4. Options

An options struct contains different settings that control how a pattern string behaves. These options originally come from path-to-regexp. We only include the options that are modified within the URLPattern specification and exclude the other options. For the purposes of comparison, this specification acts like path-to-regexp where strict, start, and end are always set to false.

An options has an associated delimiter code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. This code point is treated as a segment separator and is used for determining how far a :foo named group should match by default. For example, if the delimiter code point is "/" then "/:foo" will match "/bar", but not "/bar/baz". If the delimiter code point is the empty string then the example pattern would match both strings.

An options has an associated prefix code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. The code point is treated as an automatic prefix if found immediately preceding a match group. This matters when a match group is modified to be optional or repeating. For example, if prefix code point is "/" then "/foo/:bar?/baz" will treat the "/" before ":bar" as a prefix that becomes optional along with the named group. So in this example the pattern would match "/foo/baz".

An options has an associated ignore case, a boolean, which must be set upon creation. It defaults to false. When set to true, matching is case-insensitive; when set to false, matching is case-sensitive. For the purpose of comparison, this can be thought of as the negated sensitive option in path-to-regexp.

2.1.5. Parsing

An encoding callback is an abstract algorithm that takes a given string input. The input will be a simple text piece of a pattern string. An implementing algorithm will validate and encode the input. It must return the encoded string or throw an exception.

A pattern parser is a struct.

A pattern parser has an associated token list, a token list, initially an empty list.

A pattern parser has an associated encoding callback, an encoding callback, that must be set upon creation.

A pattern parser has an associated segment wildcard regexp, a string, that must be set upon creation.

A pattern parser has an associated part list, a part list, initially an empty list.

A pattern parser has an associated pending fixed value, a string, initially the empty string.

A pattern parser has an associated index, a number, initially 0.

A pattern parser has an associated next numeric name, a number, initially 0.

To parse a pattern string given a pattern string input, options options, and encoding callback encoding callback:

Let parser be a new pattern parser whose encoding callback is encoding callback and segment wildcard regexp is the result of running generate a segment wildcard regexp given options.
Set parser’s token list to the result of running tokenize given input and "strict".
While parser’s index is less than parser’s token list's size:

This first section is looking for the sequence: <prefix char><name><regexp><modifier>. There could be zero to all of these tokens.

"/:foo(bar)?": All four tokens.
"/": One "char" token.
":foo": One "name" token.
"(bar)": One "regexp" token.
"/:foo": "char" and "name" tokens.
"/(bar)": "char" and "regexp" tokens.
"/:foo?": "char", "name", and "other-modifier" tokens.
"/(bar)?": "char", "regexp", and "other-modifier" tokens.
1. Let char token be the result of running try to consume a token given parser and "char".
2. Let name token be the result of running try to consume a token given parser and "name".
3. Let regexp or wildcard token be the result of running try to consume a regexp or wildcard token given parser and name token.
4. If name token is not null or regexp or wildcard token is not null:
  
  If there is a matching group, we need to add the part immediately.
  1. Let prefix be the empty string.
  2. If char token is not null then set prefix to char token’s value.
  3. If prefix is not the empty string and not options’s prefix code point:
    1. Append prefix to the end of parser’s pending fixed value.
    2. Set prefix to the empty string.
  4. Run maybe add a part from the pending fixed value given parser.
  5. Let modifier token be the result of running try to consume a modifier token given parser.
  6. Run add a part given parser, prefix, name token, regexp or wildcard token, the empty string, and modifier token.
  7. Continue.
5. Let fixed token be char token.
  
  If there was no matching group, then we need to buffer any fixed text. We want to collect as much text as possible before adding it as a "fixed-text" part.
6. If fixed token is null, then set fixed token to the result of running try to consume a token given parser and "escaped-char".
7. If fixed token is not null:
  1. Append fixed token’s value to parser’s pending fixed value.
  2. Continue.
8. Let open token be the result of running try to consume a token given parser and "open".
  
  Next we look for the sequence <open><char prefix><name><regexp><char suffix><close><modifier>. The open and close are necessary, but the other tokens are not.
  
  "{a:foo(bar)b}?": All tokens are present.
  "{:foo}?": "open", "name", "close", and "other-modifier" tokens.
  "{(bar)}?": "open", "regexp", "close", and "other-modifier" tokens.
  "{ab}?": "open", "char", "close", and "other-modifier" tokens.
9. If open token is not null:
  1. Set prefix to the result of running consume text given parser.
  2. Set name token to the result of running try to consume a token given parser and "name".
  3. Set regexp or wildcard token to the result of running try to consume a regexp or wildcard token given parser and name token.
  4. Let suffix be the result of running consume text given parser.
  5. Run consume a required token given parser and "close".
  6. Set modifier token to the result of running try to consume a modifier token given parser.
  7. Run add a part given parser, prefix, name token, regexp or wildcard token, suffix, and modifier token.
  8. Continue.
10. Run maybe add a part from the pending fixed value given parser.
11. Run consume a required token given parser and "end".
Return parser’s part list.

The full wildcard regexp value is the string ".*".

To generate a segment wildcard regexp given an options options:

Let result be "[^".
Append the result of running escape a regexp string given options’s delimiter code point to the end of result.
Append "]+?" to the end of result.
Return result.
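
As a small sketch, the segment wildcard regexp can be built by string concatenation. The helper names are hypothetical, and `escape_regexp_string` is a compact stand-in for the escape a regexp string algorithm in section 2.2.

```python
REGEXP_SPECIAL = set(".+*?^${}()[]|/\\")

def escape_regexp_string(text):
    # Prefix each regular expression syntax character with a backslash.
    return "".join("\\" + c if c in REGEXP_SPECIAL else c for c in text)

def generate_segment_wildcard_regexp(delimiter_code_point):
    return "[^" + escape_regexp_string(delimiter_code_point) + "]+?"

pathname_wildcard = generate_segment_wildcard_regexp("/")
hostname_wildcard = generate_segment_wildcard_regexp(".")
no_delimiter_wildcard = generate_segment_wildcard_regexp("")
```

With an empty delimiter code point the result is "[^]+?", which in JavaScript regular expression syntax matches any code point; that is why a ":foo" group can then span what would otherwise be separate segments.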

To try to consume a token given a pattern parser parser and type type:

Assert: parser’s index is less than parser’s token list's size.
Let next token be parser’s token list[parser’s index].
If next token’s type is not type return null.
Increment parser’s index by 1.
Return next token.

To try to consume a modifier token given a pattern parser parser:

Let token be the result of running try to consume a token given parser and "other-modifier".
If token is not null, then return token.
Set token to the result of running try to consume a token given parser and "asterisk".
Return token.

To try to consume a regexp or wildcard token given a pattern parser parser and token name token:

Let token be the result of running try to consume a token given parser and "regexp".
If name token is null and token is null, then set token to the result of running try to consume a token given parser and "asterisk".
Return token.

To consume a required token given a pattern parser parser and type type:

Let result be the result of running try to consume a token given parser and type.
If result is null, then throw a TypeError.
Return result.

To consume text given a pattern parser parser:

Let result be the empty string.
While true:
1. Let token be the result of running try to consume a token given parser and "char".
2. If token is null, then set token to the result of running try to consume a token given parser and "escaped-char".
3. If token is null, then break.
4. Append token’s value to the end of result.
Return result.

To maybe add a part from the pending fixed value given a pattern parser parser:

If parser’s pending fixed value is the empty string, then return.
Let encoded value be the result of running parser’s encoding callback given parser’s pending fixed value.
Set parser’s pending fixed value to the empty string.
Let part be a new part whose type is "fixed-text", value is encoded value, and modifier is "none".
Append part to parser’s part list.

To add a part given a pattern parser parser, a string prefix, a token name token, a token regexp or wildcard token, a string suffix, and a token modifier token:

Let modifier be "none".
If modifier token is not null:
1. If modifier token’s value is "?" then set modifier to "optional".
2. Otherwise if modifier token’s value is "*" then set modifier to "zero-or-more".
3. Otherwise if modifier token’s value is "+" then set modifier to "one-or-more".
If name token is null and regexp or wildcard token is null and modifier is "none":

This was a "{foo}" grouping. We add this to the pending fixed value so that it will be combined with any previous or subsequent text.
1. Append prefix to the end of parser’s pending fixed value.
2. Return.
Run maybe add a part from the pending fixed value given parser.
If name token is null and regexp or wildcard token is null:

This was a "{foo}?" grouping. The modifier means we cannot combine it with other text. Therefore we add it as a part immediately.
1. Assert: suffix is the empty string.
2. If prefix is the empty string, then return.
3. Let encoded value be the result of running parser’s encoding callback given prefix.
4. Let part be a new part whose type is "fixed-text", value is encoded value, and modifier is modifier.
5. Append part to parser’s part list.
6. Return.
Let regexp value be the empty string.

Next, we convert the regexp or wildcard token into a regular expression.
If regexp or wildcard token is null, then set regexp value to parser’s segment wildcard regexp.
Otherwise if regexp or wildcard token’s type is "asterisk", then set regexp value to the full wildcard regexp value.
Otherwise set regexp value to regexp or wildcard token’s value.
Let type be "regexp".

Next, we convert regexp value into a part type. We make sure to go to a regular expression first so that an equivalent "regexp" token will be treated the same as a "name" or "asterisk" token.
If regexp value is parser’s segment wildcard regexp:
1. Set type to "segment-wildcard".
2. Set regexp value to the empty string.
Otherwise if regexp value is the full wildcard regexp value:
1. Set type to "full-wildcard".
2. Set regexp value to the empty string.
Let name be the empty string.

Next, we determine the part name. This can be explicitly provided by a "name" token or be automatically assigned.
If name token is not null, then set name to name token’s value.
Otherwise if regexp or wildcard token is not null:
1. Set name to parser’s next numeric name.
2. Increment parser’s next numeric name by 1.
If the result of running is a duplicate name given parser and name is true, then throw a TypeError.
Let encoded prefix be the result of running parser’s encoding callback given prefix.

Finally, we encode the fixed text values and create the part.
Let encoded suffix be the result of running parser’s encoding callback given suffix.
Let part be a new part whose type is type, value is regexp value, modifier is modifier, name is name, prefix is encoded prefix, and suffix is encoded suffix.
Append part to parser’s part list.

To determine if a value is a duplicate name given a pattern parser parser and a string name:

For each part of parser’s part list:
1. If part’s name is name, then return true.
Return false.
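
The parsing algorithms in this section can be sketched together in Python. This is a simplified illustration under several assumptions: tokens are pre-built `(type, value)` pairs rather than tokenizer output, the encoding callback defaults to the identity function, and `Part` and `PatternParser` are hypothetical names.

```python
from collections import namedtuple

Part = namedtuple("Part", "type value modifier name prefix suffix", defaults=("", "", ""))
FULL_WILDCARD = ".*"
MODIFIERS = {"?": "optional", "*": "zero-or-more", "+": "one-or-more"}

class PatternParser:
    def __init__(self, tokens, segment_wildcard, prefix_code_point, encode=lambda s: s):
        self.tokens = tokens  # (type, value) pairs ending with ("end", "")
        self.segment_wildcard = segment_wildcard
        self.prefix_code_point = prefix_code_point
        self.encode = encode  # identity stand-in for the encoding callback
        self.parts, self.pending, self.index, self.next_numeric_name = [], "", 0, 0

    def try_consume(self, type_):
        token = self.tokens[self.index]
        if token[0] != type_:
            return None
        self.index += 1
        return token

    def try_consume_modifier(self):
        return self.try_consume("other-modifier") or self.try_consume("asterisk")

    def try_consume_regexp_or_wildcard(self, name_token):
        token = self.try_consume("regexp")
        if name_token is None and token is None:
            token = self.try_consume("asterisk")
        return token

    def consume_required(self, type_):
        token = self.try_consume(type_)
        if token is None:
            raise TypeError(f"expected a {type_!r} token")
        return token

    def consume_text(self):
        result = ""
        while token := self.try_consume("char") or self.try_consume("escaped-char"):
            result += token[1]
        return result

    def flush_pending(self):
        if self.pending:
            self.parts.append(Part("fixed-text", self.encode(self.pending), "none"))
            self.pending = ""

    def add_part(self, prefix, name_token, rw_token, suffix, modifier_token):
        modifier = MODIFIERS[modifier_token[1]] if modifier_token else "none"
        if name_token is None and rw_token is None and modifier == "none":
            self.pending += prefix  # "{foo}": merge with neighbouring fixed text
            return
        self.flush_pending()
        if name_token is None and rw_token is None:
            if prefix:  # "{foo}?": a fixed-text part carrying the modifier
                self.parts.append(Part("fixed-text", self.encode(prefix), modifier))
            return
        value = self.segment_wildcard if rw_token is None else rw_token[1]
        if rw_token is not None and rw_token[0] == "asterisk":
            value = FULL_WILDCARD
        type_ = "regexp"
        if value == self.segment_wildcard:
            type_, value = "segment-wildcard", ""
        elif value == FULL_WILDCARD:
            type_, value = "full-wildcard", ""
        name = name_token[1] if name_token else str(self.next_numeric_name)
        if name_token is None:
            self.next_numeric_name += 1
        if any(part.name == name for part in self.parts):
            raise TypeError(f"duplicate group name {name!r}")
        self.parts.append(Part(type_, value, modifier, name, self.encode(prefix), self.encode(suffix)))

    def parse(self):
        while self.index < len(self.tokens):
            char_token = self.try_consume("char")
            name_token = self.try_consume("name")
            rw_token = self.try_consume_regexp_or_wildcard(name_token)
            if name_token or rw_token:
                prefix = char_token[1] if char_token else ""
                if prefix and prefix != self.prefix_code_point:
                    self.pending, prefix = self.pending + prefix, ""
                self.flush_pending()
                self.add_part(prefix, name_token, rw_token, "", self.try_consume_modifier())
            elif fixed := char_token or self.try_consume("escaped-char"):
                self.pending += fixed[1]
            elif self.try_consume("open"):
                prefix = self.consume_text()
                name_token = self.try_consume("name")
                rw_token = self.try_consume_regexp_or_wildcard(name_token)
                suffix = self.consume_text()
                self.consume_required("close")
                self.add_part(prefix, name_token, rw_token, suffix, self.try_consume_modifier())
            else:
                self.flush_pending()
                self.consume_required("end")
        return self.parts
```

Given the tokens for "/foo/:bar?" with "/" as the prefix code point, `parse()` returns a "fixed-text" part for "/foo" followed by an optional "segment-wildcard" part named "bar" whose prefix is "/". Because a regexp value equal to the segment wildcard regexp is reclassified, a hand-written group such as ":bar([^\/]+?)" ends up as the same part as a plain ":bar".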

2.2. Converting part lists to regular expressions

To generate a regular expression and name list from a given part list part list and options options:

Let result be "^".
Let name list be a new list.
For each part of part list:
1. If part’s type is "fixed-text":
  1. If part’s modifier is "none", then append the result of running escape a regexp string given part’s value to the end of result.
  2. Otherwise:
    
    A "fixed-text" part with a modifier uses a non capturing group. It uses the following form.
    
    (?:<fixed text>)<modifier>
    1. Append "(?:" to the end of result.
    2. Append the result of running escape a regexp string given part’s value to the end of result.
    3. Append ")" to the end of result.
    4. Append the result of running convert a modifier to a string given part’s modifier to the end of result.
  3. Continue.
2. Assert: part’s name is not the empty string.
3. Append part’s name to name list.
  
  We collect the list of matching group names in a parallel list. This is largely done for legacy reasons to match path-to-regexp. We could attempt to convert this to use regular expression named capture groups, but given the complexity of this algorithm there is a real risk of introducing unintended bugs. In addition, if we ever end up exposing the generated regular expressions to the web we would like to maintain compatibility with path-to-regexp, which has indicated it is unlikely to switch to using named capture groups.
4. Let regexp value be part’s value.
5. If part’s type is "segment-wildcard", then set regexp value to the result of running generate a segment wildcard regexp given options.
6. Otherwise if part’s type is "full-wildcard", then set regexp value to full wildcard regexp value.
7. If part’s prefix is the empty string and part’s suffix is the empty string:
  
  If there is no prefix or suffix then generation depends on the modifier. If there is no modifier or just the optional modifier, it uses the following simple form:
  
  (<regexp value>)<modifier>
  
  If there is a repeating modifier, however, we will use the more complex form:
  
  ((?:<regexp value>)<modifier>)
  1. If part’s modifier is "none" or "optional", then:
    1. Append "(" to the end of result.
    2. Append regexp value to the end of result.
    3. Append ")" to the end of result.
    4. Append the result of running convert a modifier to a string given part’s modifier to the end of result.
  2. Otherwise:
    1. Append "((?:" to the end of result.
    2. Append regexp value to the end of result.
    3. Append ")" to the end of result.
    4. Append the result of running convert a modifier to a string given part’s modifier to the end of result.
    5. Append ")" to the end of result.
  3. Continue.
8. If part’s modifier is "none" or "optional":
  
  This section handles non-repeating parts with a prefix or suffix. There is an inner capturing group that contains the primary regexp value. The inner group is then combined with the prefix or suffix in an outer non-capturing group. Finally the modifier is applied. The resulting form is as follows.
  
  (?:<prefix>(<regexp value>)<suffix>)<modifier>
  1. Append "(?:" to the end of result.
  2. Append the result of running escape a regexp string given part’s prefix to the end of result.
  3. Append "(" to the end of result.
  4. Append regexp value to the end of result.
  5. Append ")" to the end of result.
  6. Append the result of running escape a regexp string given part’s suffix to the end of result.
  7. Append ")" to the end of result.
  8. Append the result of running convert a modifier to a string given part’s modifier to the end of result.
  9. Continue.
9. Assert: part’s modifier is "zero-or-more" or "one-or-more".
10. Assert: part’s prefix is not the empty string or part’s suffix is not the empty string.
  
  Repeating parts with a prefix or suffix are dramatically more complicated. We want to exclude the initial prefix and the final suffix, but include them between any repeated elements. To achieve this we provide a separate initial expression that excludes the prefix. Then the expression is duplicated with the prefix/suffix values included in an optional repeating element. If zero values are permitted then a final optional modifier can be appended. The resulting form is as follows.
  
  (?:<prefix>((?:<regexp value>)(?:<suffix><prefix>(?:<regexp value>))*)<suffix>)?
11. Append "(?:" to the end of result.
12. Append the result of running escape a regexp string given part’s prefix to the end of result.
13. Append "((?:" to the end of result.
14. Append regexp value to the end of result.
15. Append ")(?:" to the end of result.
16. Append the result of running escape a regexp string given part’s suffix to the end of result.
17. Append the result of running escape a regexp string given part’s prefix to the end of result.
18. Append "(?:" to the end of result.
19. Append regexp value to the end of result.
20. Append "))*)" to the end of result.
21. Append the result of running escape a regexp string given part’s suffix to the end of result.
22. Append ")" to the end of result.
23. If part’s modifier is "zero-or-more" then append "?" to the end of result.
Append "$" to the end of result.
Return (result, name list).
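
The two prefix/suffix forms described in steps 8 to 23 can be sketched in Python. This is an illustrative sketch rather than the standard's own text: `part_regexp` and `escape_regexp` are hypothetical names, only the branches for parts that have a prefix or suffix are covered, and Python's `re` module stands in for the ECMAScript regular expression engine.

```python
import re

SPECIAL = ".+*?^${}()[]|/\\"


def escape_regexp(text):
    # Compact form of "escape a regexp string": backslash-escape special characters.
    return "".join("\\" + c if c in SPECIAL else c for c in text)


def part_regexp(prefix, regexp_value, suffix, modifier):
    """Build the regexp fragment for one part that has a prefix or a suffix."""
    p, s = escape_regexp(prefix), escape_regexp(suffix)
    if modifier in ("none", "optional"):
        # (?:<prefix>(<regexp value>)<suffix>)<modifier>
        tail = "?" if modifier == "optional" else ""
        return f"(?:{p}({regexp_value}){s}){tail}"
    # Repeating part: the first prefix and the last suffix stay outside the
    # capturing group, while "<suffix><prefix>" separates repeated values inside it.
    body = f"(?:{p}((?:{regexp_value})(?:{s}{p}(?:{regexp_value}))*){s})"
    return body + ("?" if modifier == "zero-or-more" else "")


# A repeating pathname part with prefix "/" and a segment-style regexp value.
fragment = part_regexp("/", "[^/]+?", "", "one-or-more")
match = re.fullmatch(fragment, "/a/b/c")
print(fragment)
print(match.group(1))  # a/b/c
```

The capture excludes the leading "/" but keeps the separators between repeated segments, which is the behavior the note before step 11 describes.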

To escape a regexp string given a string input:

Assert: input is an ASCII string.
Let result be the empty string.
Let index be 0.
While index is less than input’s length:
1. Let c be input[index].
2. Increment index by 1.
3. If c is one of:
  - U+002E (.);
  - U+002B (+);
  - U+002A (*);
  - U+003F (?);
  - U+005E (^);
  - U+0024 ($);
  - U+007B ({);
  - U+007D (});
  - U+0028 (();
  - U+0029 ());
  - U+005B ([);
  - U+005D (]);
  - U+007C (|);
  - U+002F (/); or
  - U+005C (\),
  then append "\" to the end of result.
4. Append c to the end of result.
Return result.
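
As a minimal sketch, the escaping loop above might look as follows in Python. The function name `escape_regexp_string` is an assumption made for illustration, and Python's `re` module is used only to show that the escaped text matches its literal input.

```python
import re

REGEXP_SPECIAL_CHARS = set(".+*?^${}()[]|/\\")


def escape_regexp_string(text):
    assert text.isascii(), "input is expected to be an ASCII string"
    result = []
    for c in text:
        if c in REGEXP_SPECIAL_CHARS:
            result.append("\\")  # prefix each special character with a backslash
        result.append(c)
    return "".join(result)


escaped = escape_regexp_string("/v1.0/(beta)")
print(escaped)  # \/v1\.0\/\(beta\)
# The escaped text matches only the literal input.
print(re.fullmatch(escaped, "/v1.0/(beta)") is not None)  # True
print(re.fullmatch(escaped, "/v1x0/(beta)") is not None)  # False
```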

2.3. Converting part lists to pattern strings

To generate a pattern string from a given part list part list and options options:

Let result be the empty string.
Let index list be the result of getting the indices for part list.
For each index of index list:
1. Let part be part list[index].
2. Let previous part be part list[index - 1] if index is greater than 0, otherwise let it be null.
3. Let next part be part list[index + 1] if index is less than index list’s size - 1, otherwise let it be null.
4. If part’s type is "fixed-text" then:
  1. If part’s modifier is "none" then:
    1. Append the result of running escape a pattern string given part’s value to the end of result.
    2. Continue.
  2. Append "{" to the end of result.
  3. Append the result of running escape a pattern string given part’s value to the end of result.
  4. Append "}" to the end of result.
  5. Append the result of running convert a modifier to a string given part’s modifier to the end of result.
  6. Continue.
5. Let custom name be true if part’s name[0] is not an ASCII digit; otherwise false.
6. Let needs grouping be true if at least one of the following are true, otherwise let it be false:
  - part’s suffix is not the empty string.
  - part’s prefix is not the empty string and is not options’s prefix code point.
7. If all of the following are true:
  - needs grouping is false; and
  - custom name is true; and
  - part’s type is "segment-wildcard"; and
  - part’s modifier is "none"; and
  - next part is not null; and
  - next part’s prefix is the empty string; and
  - next part’s suffix is the empty string
  then:
  1. If next part’s type is "fixed-text":
    1. Set needs grouping to true if the result of running is a valid name code point given next part’s value's first code point and the boolean false is true.
  2. Otherwise:
    1. Set needs grouping to true if next part’s name[0] is an ASCII digit.
8. If all of the following are true:
  - needs grouping is false; and
  - part’s prefix is the empty string; and
  - previous part is not null; and
  - previous part’s type is "fixed-text"; and
  - previous part’s value's last code point is options’s prefix code point.
  then set needs grouping to true.
9. Assert: part’s name is not the empty string or null.
10. If needs grouping is true, then append "{" to the end of result.
11. Append the result of running escape a pattern string given part’s prefix to the end of result.
12. If custom name is true:
  1. Append ":" to the end of result.
  2. Append part’s name to the end of result.
13. If part’s type is "regexp" then:
  1. Append "(" to the end of result.
  2. Append part’s value to the end of result.
  3. Append ")" to the end of result.
14. Otherwise if part’s type is "segment-wildcard" and custom name is false:
  1. Append "(" to the end of result.
  2. Append the result of running generate a segment wildcard regexp given options to the end of result.
  3. Append ")" to the end of result.
15. Otherwise if part’s type is "full-wildcard":
  1. If custom name is false and one of the following is true:
    - previous part is null; or
    - previous part’s type is "fixed-text"; or
    - previous part’s modifier is not "none"; or
    - needs grouping is true; or
    - part’s prefix is not the empty string
    then append "*" to the end of result.
  2. Otherwise:
    1. Append "(" to the end of result.
    2. Append full wildcard regexp value to the end of result.
    3. Append ")" to the end of result.
16. If all of the following are true:
  - part’s type is "segment-wildcard"; and
  - custom name is true; and
  - part’s suffix is not the empty string; and
  - The result of running is a valid name code point given part’s suffix's first code point and the boolean false is true
  then append U+005C (\) to the end of result.
17. Append the result of running escape a pattern string given part’s suffix to the end of result.
18. If needs grouping is true, then append "}" to the end of result.
19. Append the result of running convert a modifier to a string given part’s modifier to the end of result.
Return result.
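
The steps above can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: the `Part` dataclass and all function names are hypothetical, `is_name_char` only approximates "is a valid name code point" by using Python's identifier rules, and the segment wildcard and full wildcard regexp values are assumed to be `[^<delimiter>]+?` and `.*` as defined elsewhere in this standard.

```python
from dataclasses import dataclass

DIGITS = "0123456789"


@dataclass
class Part:
    type: str  # "fixed-text", "regexp", "segment-wildcard" or "full-wildcard"
    value: str = ""
    modifier: str = "none"
    name: str = ""
    prefix: str = ""
    suffix: str = ""


def escape_pattern(text):
    return "".join("\\" + c if c in "+*?:{}()\\" else c for c in text)


def escape_regexp(text):
    return "".join("\\" + c if c in ".+*?^${}()[]|/\\" else c for c in text)


def modifier_string(modifier):
    return {"zero-or-more": "*", "optional": "?", "one-or-more": "+"}.get(modifier, "")


def is_name_char(cp):
    # Approximates "is a valid name code point" (first = false) via Python identifiers.
    return cp in "$\u200c\u200d" or ("a" + cp).isidentifier()


def generate_pattern_string(parts, delimiter="/", prefix_cp="/"):
    out = []
    for i, part in enumerate(parts):
        prev = parts[i - 1] if i > 0 else None
        nxt = parts[i + 1] if i < len(parts) - 1 else None
        if part.type == "fixed-text":
            text = escape_pattern(part.value)
            out.append(text if part.modifier == "none"
                       else "{" + text + "}" + modifier_string(part.modifier))
            continue
        custom_name = part.name[0] not in DIGITS
        needs_grouping = part.suffix != "" or part.prefix not in ("", prefix_cp)
        if (not needs_grouping and custom_name and part.type == "segment-wildcard"
                and part.modifier == "none" and nxt is not None
                and nxt.prefix == "" and nxt.suffix == ""):
            if nxt.type == "fixed-text":
                needs_grouping = is_name_char(nxt.value[0])
            else:
                needs_grouping = nxt.name[0] in DIGITS
        if (not needs_grouping and part.prefix == "" and prev is not None
                and prev.type == "fixed-text" and prefix_cp != ""
                and prev.value.endswith(prefix_cp)):
            needs_grouping = True
        if needs_grouping:
            out.append("{")
        out.append(escape_pattern(part.prefix))
        if custom_name:
            out.append(":" + part.name)
        if part.type == "regexp":
            out.append("(" + part.value + ")")
        elif part.type == "segment-wildcard" and not custom_name:
            out.append("([^" + escape_regexp(delimiter) + "]+?)")
        elif part.type == "full-wildcard":
            if not custom_name and (prev is None or prev.type == "fixed-text"
                                    or prev.modifier != "none" or needs_grouping
                                    or part.prefix != ""):
                out.append("*")
            else:
                out.append("(.*)")
        if (part.type == "segment-wildcard" and custom_name and part.suffix != ""
                and is_name_char(part.suffix[0])):
            out.append("\\")
        out.append(escape_pattern(part.suffix))
        if needs_grouping:
            out.append("}")
        out.append(modifier_string(part.modifier))
    return "".join(out)


parts = [
    Part("segment-wildcard", name="category", prefix="/"),
    Part("full-wildcard", value=".*", name="0", prefix="/"),
]
print(generate_pattern_string(parts))  # /:category/*
```

Grouping braces appear only when dropping them would change how the string parses again, for example when a named group is directly followed by text that could be read as part of its name.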

To escape a pattern string given a string input:

Assert: input is an ASCII string.
Let result be the empty string.
Let index be 0.
While index is less than input’s length:
1. Let c be input[index].
2. Increment index by 1.
3. If c is one of:
  - U+002B (+);
  - U+002A (*);
  - U+003F (?);
  - U+003A (:);
  - U+007B ({);
  - U+007D (});
  - U+0028 (();
  - U+0029 ()); or
  - U+005C (\),
  then append U+005C (\) to the end of result.
4. Append c to the end of result.
Return result.
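
A short Python sketch of this algorithm follows, together with a comparison against the character set used by escape a regexp string. The names `escape_pattern_string`, `PATTERN_SPECIALS`, and `REGEXP_SPECIALS` are assumptions made for illustration.

```python
PATTERN_SPECIALS = set("+*?:{}()\\")
REGEXP_SPECIALS = set(".+*?^${}()[]|/\\")


def escape_pattern_string(text):
    assert text.isascii(), "input is expected to be an ASCII string"
    return "".join("\\" + c if c in PATTERN_SPECIALS else c for c in text)


print(escape_pattern_string("/docs/(draft)/a:b*"))  # /docs/\(draft\)/a\:b\*

# Only ":" is special in pattern syntax but not in the regexp escape set.
print(sorted(PATTERN_SPECIALS - REGEXP_SPECIALS))  # [':']
# These are special in regexps but are plain text in pattern syntax.
print(sorted(REGEXP_SPECIALS - PATTERN_SPECIALS))  # ['$', '.', '/', '[', ']', '^', '|']
```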

To convert a modifier to a string given a modifier modifier:

If modifier is "zero-or-more", then return "*".
If modifier is "optional", then return "?".
If modifier is "one-or-more", then return "+".
Return the empty string.

3. Canonicalization

3.1. Encoding callbacks

To canonicalize a protocol given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Let parseResult be the result of running the basic URL parser given value followed by "://dummy.test", with dummyURL as url.

Note, state override is not used here because it enforces restrictions that are only appropriate for the protocol setter. Instead we use the protocol to parse a dummy URL using the normal parsing entry point.
If parseResult is failure, then throw a TypeError.
Return dummyURL’s scheme.
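
For illustration, the outcome of this algorithm can be approximated in Python without a URL parser. This sketch assumes the input contains no whitespace or control characters (which a URL parser would strip before parsing) and relies on the URL Standard's scheme syntax: an ASCII letter followed by ASCII alphanumerics, "+", "-", or ".", serialized in lowercase. The function name is hypothetical, and Python's `TypeError` stands in for the JavaScript exception.

```python
import re

# Scheme syntax assumed from the URL Standard's scheme state.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def canonicalize_protocol(value):
    if value == "":
        return value
    if SCHEME_RE.fullmatch(value) is None:
        # Parsing "<value>://dummy.test" would fail for such input.
        raise TypeError(f"invalid protocol: {value!r}")
    return value.lower()


print(canonicalize_protocol("HTTPS"))  # https
```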

To canonicalize a username given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Set the username given dummyURL and value.
Return dummyURL’s username.

To canonicalize a password given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Set the password given dummyURL and value.
Return dummyURL’s password.

To canonicalize a hostname given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Let parseResult be the result of running the basic URL parser given value with dummyURL as url and hostname state as state override.
If parseResult is failure, then throw a TypeError.
Return dummyURL’s host.

To canonicalize an IPv6 hostname given a string value:

Let result be the empty string.
For each code point in value interpreted as a list of code points:
1. If all of the following are true:
  - code point is not an ASCII hex digit;
  - code point is not U+005B ([);
  - code point is not U+005D (]); and
  - code point is not U+003A (:),
  then throw a TypeError.
2. Append the result of running ASCII lowercase given code point to the end of result.
Return result.
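
A minimal Python sketch of the IPv6 hostname check follows. The function name is an assumption, and Python's `TypeError` stands in for the JavaScript exception.

```python
import string

ALLOWED = set(string.hexdigits) | set("[]:")


def canonicalize_ipv6_hostname(value):
    result = []
    for code_point in value:
        if code_point not in ALLOWED:
            raise TypeError(f"invalid IPv6 hostname code point: {code_point!r}")
        # Every allowed code point is ASCII, so lower() is an ASCII lowercase here.
        result.append(code_point.lower())
    return "".join(result)


print(canonicalize_ipv6_hostname("[2001:DB8::1]"))  # [2001:db8::1]
```

Note that this only filters and lowercases code points. It does not validate that the result is a well-formed IPv6 address.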

To canonicalize a port given a string portValue and optionally a string protocolValue:

If portValue is the empty string, return portValue.
Let dummyURL be a new URL record.
If protocolValue was given, then set dummyURL’s scheme to protocolValue.

Note, we set the URL record's scheme in order for the basic URL parser to recognize and normalize default port values.
Let parseResult be the result of running the basic URL parser given portValue with dummyURL as url and port state as state override.
If parseResult is failure, then throw a TypeError.
Return dummyURL’s port or empty string if it is null.
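
The effect of the protocol on port canonicalization can be sketched in Python. This is a simplified illustration with stated assumptions: the default port table is taken from the URL Standard's special schemes, the sketch accepts only strings of ASCII digits (the real parser's handling of other input is not reproduced), and the function name is hypothetical.

```python
# Default ports of the special schemes, as listed in the URL Standard.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def canonicalize_port(port_value, protocol_value=None):
    if port_value == "":
        return port_value
    # Simplification: anything other than ASCII digits is rejected in this sketch.
    if not (port_value.isascii() and port_value.isdigit()):
        raise TypeError(f"invalid port: {port_value!r}")
    port = int(port_value)
    if port > 65535:
        raise TypeError(f"port out of range: {port_value!r}")
    if protocol_value is not None and DEFAULT_PORTS.get(protocol_value) == port:
        return ""  # a default port canonicalizes to the empty string
    return str(port)


print(canonicalize_port("443", "https"))   # prints an empty line
print(canonicalize_port("8443", "https"))  # 8443
print(canonicalize_port("080"))            # 80
```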

To canonicalize a pathname given a string value:

If value is the empty string, then return value.
Let leading slash be true if the first code point in value is U+002F (/) and otherwise false.
Let modified value be "/-" if leading slash is false and otherwise the empty string.

The URL parser will automatically prepend a leading slash to the canonicalized pathname. Unfortunately, that does not work here: this algorithm is called for pieces of the pathname, instead of the entire pathname, when used as an encoding callback. Therefore we disable the prepending of the slash by inserting our own. An additional character is also inserted here in order to avoid inadvertently collapsing a leading dot due to the fake leading slash being interpreted as a "/." sequence. These inserted characters are then removed from the result below.

Note, implementations are free to simply disable slash prepending in their URL parsing code instead of paying the performance penalty of inserting and removing characters in this algorithm.
Append value to the end of modified value.
Let dummyURL be a new URL record.
Let parseResult be the result of running the basic URL parser given modified value with dummyURL as url and path start state as state override.
If parseResult is failure, then throw a TypeError.
Let result be the result of URL path serializing dummyURL.
If leading slash is false, then set result to the code point substring from 2 to the end of the string within result.
Return result.
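
The "/-" insertion can be illustrated with a toy example in Python. Here `toy_normalize_path` is a hypothetical stand-in for the URL parser that only resolves "." and ".." segments (no percent-encoding or other parser behavior), so this is a sketch of the technique rather than of the parser itself.

```python
def toy_normalize_path(path):
    """Hypothetical stand-in for the URL parser: resolves "." and ".." only."""
    segments = path.split("/")[1:]  # the path always starts with "/"
    out = []
    for i, segment in enumerate(segments):
        is_last = i == len(segments) - 1
        if segment == "..":
            if out:
                out.pop()
            if is_last:
                out.append("")
        elif segment == ".":
            if is_last:
                out.append("")
        else:
            out.append(segment)
    return "/" + "/".join(out)


def canonicalize_pathname(value, normalize=toy_normalize_path):
    if value == "":
        return value
    leading_slash = value.startswith("/")
    modified = ("" if leading_slash else "/-") + value
    result = normalize(modified)
    # Strip the two inserted code points again when no slash was supplied.
    return result if leading_slash else result[2:]


print(canonicalize_pathname("/a/../b"))    # /b
print(canonicalize_pathname("foo/./bar"))  # foo/bar
print(canonicalize_pathname("./foo"))      # ./foo
# With only a fake "/" and no "-", the leading "./" would be collapsed away.
print(toy_normalize_path("/" + "./foo")[1:])  # foo
```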

To canonicalize an opaque pathname given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Set dummyURL’s path to the empty string.
Let parseResult be the result of running the basic URL parser given value with dummyURL as url and opaque path state as state override.
If parseResult is failure, then throw a TypeError.
Return the result of URL path serializing dummyURL.

To canonicalize a search given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Set dummyURL’s query to the empty string.
Let parseResult be the result of running the basic URL parser given value with dummyURL as url and query state as state override.
If parseResult is failure, then throw a TypeError.
Return dummyURL’s query.

To canonicalize a hash given a string value:

If value is the empty string, return value.
Let dummyURL be a new URL record.
Set dummyURL’s fragment to the empty string.
Let parseResult be the result of running the basic URL parser given value with dummyURL as url and fragment state as state override.
If parseResult is failure, then throw a TypeError.
Return dummyURL’s fragment.

3.2. `URLPatternInit` processing

To process a URLPatternInit given a URLPatternInit init, a string type, a string or null protocol, a string or null username, a string or null password, a string or null hostname, a string or null port, a string or null pathname, a string or null search, and a string or null hash:

Let result be the result of creating a new URLPatternInit.
Set result["protocol"] to protocol.
Set result["username"] to username.
Set result["password"] to password.
Set result["hostname"] to hostname.
Set result["port"] to port.
Set result["pathname"] to pathname.
Set result["search"] to search.
Set result["hash"] to hash.
Let baseURL be null.
If init["baseURL"] is not null:
1. Set baseURL to the result of parsing init["baseURL"].
2. If baseURL is failure, then throw a TypeError.
3. Set result["protocol"] to the result of processing a base URL string given baseURL’s scheme and type.
4. Set result["username"] to the result of processing a base URL string given baseURL’s username and type.
5. Set result["password"] to the result of processing a base URL string given baseURL’s password and type.
6. Set result["hostname"] to the result of processing a base URL string given baseURL’s host and type.
7. Set result["port"] to the result of processing a base URL string given baseURL’s port and type.
8. Set result["pathname"] to the result of processing a base URL string given the result of URL path serializing baseURL and type.
9. Set result["search"] to the result of processing a base URL string given baseURL’s query and type.
10. Set result["hash"] to the result of processing a base URL string given baseURL’s fragment and type.
If init["protocol"] is not null then set result["protocol"] to the result of process protocol for init given init["protocol"] and type.
If init["username"] is not null then set result["username"] to the result of process username for init given init["username"] and type.
If init["password"] is not null then set result["password"] to the result of process password for init given init["password"] and type.
If init["hostname"] is not null then set result["hostname"] to the result of process hostname for init given init["hostname"] and type.
If init["port"] is not null then set result["port"] to the result of process port for init given init["port"], result["protocol"], and type.
If init["pathname"] is not null:
1. Set result["pathname"] to init["pathname"].
2. If the following are all true:
  - baseURL is not null;
  - baseURL does not have an opaque path; and
  - the result of running is an absolute pathname given result["pathname"] and type is false,
  then:
  1. Let baseURLPath be the result of running process a base URL string given the result of URL path serializing baseURL and type.
  2. Let slash index be the index of the last U+002F (/) code point found in baseURLPath, interpreted as a sequence of code points, or null if there are no instances of the code point.
  3. If slash index is not null:
    1. Let new pathname be the code point substring from 0 to slash index + 1 within baseURLPath.
    2. Append result["pathname"] to the end of new pathname.
    3. Set result["pathname"] to new pathname.
3. Set result["pathname"] to the result of process pathname for init given result["pathname"], result["protocol"], and type.
If init["search"] is not null then set result["search"] to the result of process search for init given init["search"] and type.
If init["hash"] is not null then set result["hash"] to the result of process hash for init given init["hash"] and type.
Return result.
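
The way a relative pathname is combined with the base URL's path can be sketched in Python. This covers only the slash index sub-steps of the pathname handling above and assumes the caller has already established that those sub-steps apply. The function name is hypothetical.

```python
def join_with_base_directory(pathname, base_url_path):
    """Prefix a relative pathname with the directory part of the base URL path."""
    slash_index = base_url_path.rfind("/")
    if slash_index == -1:
        # No "/" in the base path: the pathname is left unchanged.
        return pathname
    # Keep everything up to and including the last "/".
    return base_url_path[: slash_index + 1] + pathname


print(join_with_base_directory(":page.html", "/docs/guide/intro.html"))  # /docs/guide/:page.html
print(join_with_base_directory("*", "/"))  # /*
```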

To process a base URL string given a string input and a string type:

Assert: input is not null.
If type is not "pattern" return input.
Return the result of escaping a pattern string given input.

To run is an absolute pathname given a pattern string input and a string type:

If input is the empty string, then return false.
If input[0] is U+002F (/), then return true.
If type is "url", then return false.
If input’s code point length is less than 2, then return false.
If input[0] is U+005C (\) and input[1] is U+002F (/), then return true.
If input[0] is U+007B ({) and input[1] is U+002F (/), then return true.
Return false.
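
A direct Python transcription of this check might look as follows. The function and parameter names are assumptions made for illustration.

```python
def is_absolute_pathname(value, type_):
    if value == "":
        return False
    if value[0] == "/":
        return True
    if type_ == "url":
        return False
    if len(value) < 2:
        return False
    # In a pattern, an escaped slash ("\/") or a group that starts with a
    # slash ("{/") also makes the pathname absolute.
    if value[0] in ("\\", "{") and value[1] == "/":
        return True
    return False


print(is_absolute_pathname("/foo", "url"))         # True
print(is_absolute_pathname("{/foo}?", "pattern"))  # True
print(is_absolute_pathname("{/foo}?", "url"))      # False
print(is_absolute_pathname(":id", "pattern"))      # False
```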

To process protocol for init given a string value and a string type:

Let strippedValue be the given value with a single trailing U+003A (:) removed, if any.
If type is "pattern" then return strippedValue.
Return the result of running canonicalize a protocol given strippedValue.

To process username for init given a string value and a string type:

If type is "pattern" then return value.
Return the result of running canonicalize a username given value.

To process password for init given a string value and a string type:

If type is "pattern" then return value.
Return the result of running canonicalize a password given value.

To process hostname for init given a string value and a string type:

If type is "pattern" then return value.
Return the result of running canonicalize a hostname given value.

To process port for init given a string portValue, a string protocolValue, and a string type:

If type is "pattern" then return portValue.
Return the result of running canonicalize a port given portValue and protocolValue.

To process pathname for init given a string pathnameValue, a string protocolValue, and a string type:

If type is "pattern" then return pathnameValue.
If protocolValue is a special scheme or the empty string, then return the result of running canonicalize a pathname given pathnameValue.

If the protocolValue is the empty string then no value was provided for protocol in the constructor dictionary. Normally we do not special case empty string dictionary values, but in this case we treat it as a special scheme in order to default to the most common pathname canonicalization.
Return the result of running canonicalize an opaque pathname given pathnameValue.
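
The choice between the two pathname canonicalizers can be sketched in Python. The canonicalizers themselves are passed in as callbacks, and the stand-ins used below are hypothetical placeholders that only tag which one was chosen. The set of special schemes is taken from the URL Standard.

```python
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def process_pathname_for_init(pathname_value, protocol_value, type_,
                              canonicalize_pathname, canonicalize_opaque_pathname):
    if type_ == "pattern":
        return pathname_value
    # An empty protocol is treated like a special scheme.
    if protocol_value == "" or protocol_value in SPECIAL_SCHEMES:
        return canonicalize_pathname(pathname_value)
    return canonicalize_opaque_pathname(pathname_value)


# Hypothetical stand-ins that only tag which canonicalizer was chosen.
hierarchical = lambda value: "hierarchical:" + value
opaque = lambda value: "opaque:" + value

print(process_pathname_for_init("/a/b", "https", "url", hierarchical, opaque))
print(process_pathname_for_init("user@example.com", "mailto", "url", hierarchical, opaque))
```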

To process search for init given a string value and a string type:

Let strippedValue be the given value with a single leading U+003F (?) removed, if any.
If type is "pattern" then return strippedValue.
Return the result of running canonicalize a search given strippedValue.

To process hash for init given a string value and a string type:

Let strippedValue be the given value with a single leading U+0023 (#) removed, if any.
If type is "pattern" then return strippedValue.
Return the result of running canonicalize a hash given strippedValue.
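
The delimiter stripping performed by process protocol for init, process search for init, and process hash for init can be sketched in Python. Only the stripping step is shown, which is the whole result when type is "pattern". The function names are hypothetical.

```python
def strip_protocol(value):
    # Remove a single trailing ":" if present.
    return value[:-1] if value.endswith(":") else value


def strip_search(value):
    # Remove a single leading "?" if present.
    return value[1:] if value.startswith("?") else value


def strip_hash(value):
    # Remove a single leading "#" if present.
    return value[1:] if value.startswith("#") else value


print(strip_protocol("https:"))   # https
print(strip_search("?q=:term"))   # q=:term
print(strip_hash("##section"))    # #section
```

Only one delimiter is removed, so a doubled delimiter leaves one copy in the component.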

Acknowledgments

The editors would like to thank Alex Russell, Anne van Kesteren, Asa Kusuma, Blake Embrey, Cyrus Kasaaian, Daniel Murphy, Darwin Huang, Devlin Cronin, Domenic Denicola, Dominick Ng, Jake Archibald, Jeffrey Posnick, Jeremy Roman, Jimmy Shen, Joe Gregorio, Joshua Bell, Kenichi Ishibashi, Kenji Baheux, Kenneth Rohde Christiansen, Kingsley Ngan, Kinuko Yasuda, L. David Baron, Luca Casonato, Łukasz Anforowicz, Makoto Shimazu, Marijn Kruisselbrink, Matt Falkenhagen, Matt Giuca, Michael Landry, R. Samuel Klatchko, Rajesh Jagannathan, Ralph Chelala, Sangwhan Moon, Sayan Pal, Victor Costan, and Youenn Fablet for their contributions to this specification.

Special thanks to Blake Embrey and the other pillarjs/path-to-regexp contributors for building an excellent open source library that so many have found useful.

Also, special thanks to Kenneth Rohde Christiansen for his work on the polyfill. He put in extensive work to adapt to the changing URLPattern API.

This standard is written by Ben Kelly (Google, wanderview@chromium.org), Jeremy Roman (Google, jbroman@chromium.org), and 宍戸俊哉 (Shunya Shishido, Google, sisidovski@chromium.org).

Intellectual property rights

This Living Standard was originally developed in the W3C WICG, where it was available under the W3C Software and Document License.

Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License. To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.

URL Pattern (Commit b62405015d6ed4c5d54b46436319bf529c51a815)

Abstract

1. The `URLPattern` class

1.1. Internals

1.2. Constructor string parsing

2. Patterns

2.1. Parsing patterns

2.1.1. Tokens

2.1.2. Tokenizing

2.1.3. Parts

2.1.4. Options

2.1.5. Parsing

2.2. Converting part lists to regular expressions

2.3. Converting part lists to pattern strings

3. Canonicalization

3.1. Encoding callbacks

3.2. `URLPatternInit` processing

Acknowledgments

Intellectual property rights

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

IDL Index
