Uri
Added in version 2.66.
- class Uri(*args, **kwargs)
The GUri type and related functions can be used to parse URIs into their components, and build valid URIs from individual components.
Since GUri only represents absolute URIs, all GUris will have a URI scheme, so get_scheme will always return a non-NULL answer. Likewise, by definition, all URIs have a path component, so get_path will always return a non-NULL string (which may be empty).
If the URI string has an ‘authority’ component (that is, if the scheme is followed by :// rather than just :), then the GUri will contain a hostname, and possibly a port and ‘userinfo’. Additionally, depending on how the GUri was constructed/parsed (for example, using the G_URI_FLAGS_HAS_PASSWORD and G_URI_FLAGS_HAS_AUTH_PARAMS flags), the userinfo may be split out into a username, password, and additional authorization-related parameters.
Normally, the components of a GUri will have all %-encoded characters decoded. However, if you construct/parse a GUri with G_URI_FLAGS_ENCODED, then the %-encoding will be preserved instead in the userinfo, path, and query fields (and in the host field if also created with G_URI_FLAGS_NON_DNS). In particular, this is necessary if the URI may contain binary data or non-UTF-8 text, or if decoding the components might change the interpretation of the URI.
For example, with the encoded flag:
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_ENCODED, &err);
g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue");
While the default %-decoding behaviour would give:
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_NONE, &err);
g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http://host/path?param=value");
During decoding, if an invalid UTF-8 string is encountered, parsing will fail with an error indicating the bad string location:
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fbad%3D%00alue", G_URI_FLAGS_NONE, &err);
g_assert_error (err, G_URI_ERROR, G_URI_ERROR_BAD_QUERY);
You should pass G_URI_FLAGS_ENCODED or G_URI_FLAGS_ENCODED_QUERY if you need to handle that case manually. In particular, if the query string contains = characters that are %-encoded, you should let parse_params decode the query, so that it is decoded exactly once.
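The ordering issue can be illustrated outside GLib. The sketch below is plain Python that uses the standard library's urllib.parse.unquote as a stand-in for %-decoding; the helper name split_then_decode and the sample query are made up for illustration and are not part of the GUri API. It shows how decoding a query before splitting it changes which parameters come out.

```python
from urllib.parse import unquote


def split_then_decode(query):
    """Split on '&' and the first '=' while still encoded, then decode each piece once."""
    pairs = []
    for item in query.split("&"):
        name, _, value = item.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs


# Hypothetical query whose first value is itself a URL with its own parameters.
raw_query = "redirect=http%3A%2F%2Fhost%2Fpath%3Fa%3D1%26b%3D2&lang=en"

# Splitting the still-encoded query keeps the embedded URL in a single value.
decoded_once = split_then_decode(raw_query)

# Decoding first exposes the embedded '&', so 'b' is misread as a top-level parameter.
decoded_too_early = split_then_decode(unquote(raw_query))

print(decoded_once)
print(decoded_too_early)
```

This is the situation in which keeping the query encoded until the parameters have been separated matters.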
GUri is immutable once constructed, and can safely be accessed from multiple threads. Its reference counting is atomic.
Note that the scope of GUri is to help manipulate URIs in various applications, following RFC 3986. In particular, it doesn’t intend to cover web browser needs, and doesn’t implement the WHATWG URL standard. No APIs are provided to help prevent homograph attacks, so GUri is not suitable for formatting URIs for display to the user for making security-sensitive decisions.
Relative and absolute URIs
As defined in RFC 3986, the hierarchical nature of URIs means that they can either be ‘relative references’ (sometimes referred to as ‘relative URIs’) or ‘URIs’ (for clarity, ‘URIs’ are referred to in this documentation as ‘absolute URIs’, although in contrast to RFC 3986, fragment identifiers are always allowed).
Relative references have one or more components of the URI missing. In particular, they have no scheme. Any other component, such as hostname, query, etc. may be missing, apart from a path, which has to be specified (but may be empty). The path may be relative, starting with ./ rather than /.
For example, ./path?query, /?query#fragment and //example.com are all valid relative references.
Absolute URIs have a scheme specified. Any other components of the URI which are missing are specified as explicitly unset in the URI, rather than being resolved relative to a base URI using parse_relative.
For example, file:///home/bob and https://search.com?query=string are both valid absolute URIs.
A GUri instance is always an absolute URI. A string may be an absolute URI or a relative reference; see the documentation for individual functions as to what forms they accept.
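The distinction comes down to whether a scheme is present. As a rough illustration, the following Python sketch classifies the example strings above using urllib.parse.urlsplit from the standard library. This is a stand-in rather than GLib's parser, the classify helper is hypothetical, and edge cases may be treated differently by GUri.

```python
from urllib.parse import urlsplit


def classify(uri_string):
    """Label a string by whether a scheme can be split off the front."""
    return "absolute URI" if urlsplit(uri_string).scheme else "relative reference"


examples = [
    "./path?query",
    "/?query#fragment",
    "//example.com",
    "file:///home/bob",
    "https://search.com?query=string",
]

kinds = {s: classify(s) for s in examples}
for s, kind in kinds.items():
    print(f"{s!r}: {kind}")
```

Note that //example.com has an authority but still no scheme, so it remains a relative reference.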
Parsing URIs
The most minimalist APIs for parsing URIs are split and split_with_user. These split a URI into its component parts, and return the parts; the difference between the two is that split treats the ‘userinfo’ component of the URI as a single element, while split_with_user can (depending on the UriFlags you pass) treat it as containing a username, password, and authentication parameters. Alternatively, split_network can be used when you are only interested in the components that are needed to initiate a network connection to the service (scheme, host, and port).
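To make the "network components only" idea concrete, here is a small Python sketch built on urllib.parse.urlsplit from the standard library. It is not the GLib implementation: the function name split_network_sketch is hypothetical, and the use of -1 for a missing port simply mirrors the convention used for port parameters elsewhere in this documentation.

```python
from urllib.parse import urlsplit


def split_network_sketch(uri_string):
    """Return (scheme, host, port) or raise if the string cannot name a network host."""
    parts = urlsplit(uri_string)
    if not parts.scheme or not parts.hostname:
        raise ValueError("expected an absolute URI with a host: %r" % uri_string)
    # Assumption for this sketch: report an absent port as -1.
    port = parts.port if parts.port is not None else -1
    return parts.scheme, parts.hostname, port


print(split_network_sketch("https://user@example.com:8443/path?q=1"))
print(split_network_sketch("http://example.com/"))
```

A relative reference such as /just/a/path, or a URI without a host such as mailto:someone@example.com, makes the sketch raise ValueError.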
parse is similar to split, but instead of returning individual strings, it returns a GUri structure (and it requires that the URI be an absolute URI).
resolve_relative and parse_relative allow you to resolve a relative URI relative to a base URI. resolve_relative takes two strings and returns a string, and parse_relative takes a GUri and a string and returns a GUri.
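Reference resolution follows the algorithm in RFC 3986, section 5. The sketch below uses Python's standard urllib.parse.urljoin as a stand-in to show what resolving a reference against a base looks like; the base and references are the examples from RFC 3986 section 5.4.1. GLib is not called here, and its results could differ from urljoin in corner cases.

```python
from urllib.parse import urljoin

# Base URI and references taken from the examples in RFC 3986, section 5.4.1.
base = "http://a/b/c/d;p?q"
references = ["g", "../g", "?y", "//g", "g:h"]

resolved = {ref: urljoin(base, ref) for ref in references}
for ref, result in resolved.items():
    print(f"{ref!r} -> {result!r}")
```

The last reference already has a scheme, so it is returned as-is instead of being resolved against the base.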
All of the parsing functions take a UriFlags argument describing exactly how to parse the URI; see the documentation for that type for more details on the specific flags that you can pass. If you need to choose different flags based on the type of URI, you can use peek_scheme on the URI string to check the scheme first, and use that to decide what flags to parse it with.
For example, you might want to use G_URI_PARAMS_WWW_FORM when parsing the params for a web URI, so compare the result of peek_scheme against http and https.
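The decision described here can be sketched in a few lines of Python. The code does not call GLib: peek_scheme_sketch is a hypothetical helper that extracts a scheme using the grammar from RFC 3986 (a letter followed by letters, digits, +, - or .) and lowercases it, and wants_www_form is an illustrative name for the comparison against http and https.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":" (RFC 3986, section 3.1)
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def peek_scheme_sketch(uri):
    """Return the lowercased scheme of a URI string, or None if it has none."""
    match = _SCHEME_RE.match(uri)
    return match.group(1).lower() if match else None


def wants_www_form(uri):
    """Decide whether web-form parameter handling would be appropriate."""
    return peek_scheme_sketch(uri) in ("http", "https")


print(peek_scheme_sketch("HTTPS://example.com/?a=b+c"))
print(wants_www_form("HTTPS://example.com/?a=b+c"))
print(wants_www_form("mailto:someone@example.com"))
```

Lowercasing before the comparison matters because schemes are case-insensitive.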
Building URIs
join and join_with_user can be used to construct valid URI strings from a set of component strings. They are the inverse of split and split_with_user.
Similarly, build and build_with_user can be used to construct a GUri from a set of component strings.
As with the parsing functions, the building functions take a UriFlags argument. In particular, it is important to keep in mind whether the URI components you are using are already %-encoded. If so, you must pass the G_URI_FLAGS_ENCODED flag.
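The reason this matters is double encoding. The following Python sketch uses urllib.parse.quote from the standard library as a stand-in for the escaping a builder performs; encode_component and its already_encoded argument are hypothetical and only play the role that G_URI_FLAGS_ENCODED plays in the real API.

```python
from urllib.parse import quote, unquote


def encode_component(text, already_encoded):
    """Escape a component unless the caller says it is already %-encoded."""
    return text if already_encoded else quote(text, safe="")


raw = "my file.txt"
encoded = encode_component(raw, already_encoded=False)

# Telling the builder the truth leaves the component untouched.
correct = encode_component(encoded, already_encoded=True)

# Treating an encoded component as raw escapes the '%' itself.
double_encoded = encode_component(encoded, already_encoded=False)

print(encoded, correct, double_encoded)
```

Decoding the double-encoded value once yields my%20file.txt, not the original name.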
file:// URIs
Note that Windows and Unix both define special rules for parsing file:// URIs (involving non-UTF-8 character sets on Unix, and the interpretation of path separators on Windows). GUri does not implement these rules. Use filename_from_uri and filename_to_uri if you want to properly convert between file:// URIs and local filenames.
URI Equality
Note that there is no g_uri_equal () function, because comparing URIs usefully requires scheme-specific knowledge that GUri does not have. GUri can help with normalization if you use the various encoded UriFlags as well as G_URI_FLAGS_SCHEME_NORMALIZE; however, it is not comprehensive. For example, data:,foo and data:;base64,Zm9v resolve to the same thing according to the data: URI specification, which GLib does not handle.
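The data: example can be checked with a short Python sketch that uses only the standard library. The data_uri_payload helper is hypothetical and handles just the two forms shown above, not the full data: URI specification. It illustrates the kind of scheme-specific knowledge a useful comparison would need.

```python
import base64
from urllib.parse import unquote_to_bytes


def data_uri_payload(uri):
    """Decode the payload of a simple data: URI (media type parameters are ignored)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, body = uri[len("data:"):].partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(body)
    return unquote_to_bytes(body)


plain = "data:,foo"
encoded = "data:;base64,Zm9v"

print(plain == encoded)
print(data_uri_payload(plain) == data_uri_payload(encoded))
```

The two strings differ byte for byte, yet both carry the payload foo.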
Methods
- class Uri
- build(flags: UriFlags, scheme: str, userinfo: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) Uri
Creates a new Uri from the given components according to flags.
See also build_with_user(), which allows specifying the components of the “userinfo” separately.
Added in version 2.66.
- Parameters:
flags – flags describing how to build the Uri
scheme – the URI scheme
userinfo – the userinfo component, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- build_with_user(flags: UriFlags, scheme: str, user: str | None, password: str | None, auth_params: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) Uri
Creates a new Uri from the given components according to flags (HAS_PASSWORD is added unconditionally). The flags must be coherent with the passed values; in particular, use %-encoded values with ENCODED.
In contrast to build(), this allows specifying the components of the ‘userinfo’ field separately. Note that user must be non-None if either password or auth_params is non-None.
Added in version 2.66.
- Parameters:
flags – flags describing how to build the Uri
scheme – the URI scheme
user – the user component of the userinfo, or None
password – the password component of the userinfo, or None
auth_params – the auth params of the userinfo, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- escape_bytes(unescaped: Sequence[int], reserved_chars_allowed: str | None = None) str
Escapes arbitrary data for use in a URI.
Normally all characters that are not ‘unreserved’ (i.e. ASCII alphanumerical characters plus dash, dot, underscore and tilde) are escaped. But if you specify characters in reserved_chars_allowed they are not escaped. This is useful for the ‘reserved’ characters in the URI specification, since those are allowed unescaped in some portions of a URI.
Though technically incorrect, this will also allow escaping nul bytes as %00.
Added in version 2.66.
- Parameters:
unescaped – the unescaped input data.
reserved_chars_allowed – a string of reserved characters that are allowed to be used, or None.
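The escaping rule described above is simple enough to sketch in pure Python. This is an illustration of the rule as stated, not GLib's code; escape_bytes_sketch is a hypothetical name, and the uppercase hex digits are a choice made for this sketch.

```python
import string

# 'Unreserved' characters: ASCII letters and digits plus dash, dot, underscore and tilde.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def escape_bytes_sketch(data, reserved_chars_allowed=""):
    """Percent-encode every byte that is neither unreserved nor explicitly allowed."""
    keep = UNRESERVED | set(reserved_chars_allowed)
    out = []
    for byte in data:
        if byte < 128 and chr(byte) in keep:
            out.append(chr(byte))
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(escape_bytes_sketch(b"a b/c"))
print(escape_bytes_sketch(b"a b/c", "/"))
print(escape_bytes_sketch(b"\x00\xff"))
```

Passing "/" as an allowed reserved character is the typical choice when escaping a whole path, so that segment separators survive.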
- escape_string(unescaped: str, reserved_chars_allowed: str | None, allow_utf8: bool) str
Escapes a string for use in a URI.
Normally all characters that are not “unreserved” (i.e. ASCII alphanumerical characters plus dash, dot, underscore and tilde) are escaped. But if you specify characters in reserved_chars_allowed they are not escaped. This is useful for the “reserved” characters in the URI specification, since those are allowed unescaped in some portions of a URI.
Added in version 2.16.
- Parameters:
unescaped – the unescaped input string.
reserved_chars_allowed – a string of reserved characters that are allowed to be used, or None.
allow_utf8 – True if the result can include UTF-8 characters.
- get_auth_params() str | None
Gets uri’s authentication parameters, which may contain %-encoding, depending on the flags with which uri was created. (If uri was not created with HAS_AUTH_PARAMS then this will be None.)
Depending on the URI scheme, parse_params() may be useful for further parsing this information.
Added in version 2.66.
- get_fragment() str | None
Gets uri’s fragment, which may contain %-encoding, depending on the flags with which uri was created.
Added in version 2.66.
- get_host() str | None
Gets uri’s host. This will never have %-encoded characters, unless it is non-UTF-8 (which can only be the case if uri was created with NON_DNS).
If uri contained an IPv6 address literal, this value will be just that address, without the brackets around it that are necessary in the string form of the URI. Note that in this case there may also be a scope ID attached to the address, for example fe80::1234%em1 (or fe80::1234%25em1 if the string is still encoded).
Added in version 2.66.
- get_password() str | None
Gets uri’s password, which may contain %-encoding, depending on the flags with which uri was created. (If uri was not created with HAS_PASSWORD then this will be None.)
Added in version 2.66.
- get_path() str
Gets uri’s path, which may contain %-encoding, depending on the flags with which uri was created.
Added in version 2.66.
- get_query() str | None
Gets uri’s query, which may contain %-encoding, depending on the flags with which uri was created.
For queries consisting of a series of name=value parameters, UriParamsIter or parse_params() may be useful.
Added in version 2.66.
- get_scheme() str
Gets uri’s scheme. Note that this will always be all-lowercase, regardless of the string or strings that uri was created from.
Added in version 2.66.
- get_user() str | None
Gets the ‘username’ component of uri’s userinfo, which may contain %-encoding, depending on the flags with which uri was created. If uri was not created with HAS_PASSWORD or HAS_AUTH_PARAMS, this is the same as get_userinfo().
Added in version 2.66.
- get_userinfo() str | None
Gets uri’s userinfo, which may contain %-encoding, depending on the flags with which uri was created.
Added in version 2.66.
- is_valid(uri_string: str, flags: UriFlags) bool
Parses uri_string according to flags, to determine whether it is a valid absolute URI, i.e. it does not need to be resolved relative to another URI using parse_relative().
If it’s not a valid URI, an error is returned explaining how it’s invalid.
See split(), and the definition of UriFlags, for more information on the effect of flags.
Added in version 2.66.
- Parameters:
uri_string – a string containing an absolute URI
flags – flags for parsing uri_string
- join(flags: UriFlags, scheme: str | None, userinfo: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) str
Joins the given components together according to flags to create an absolute URI string. path may not be None (though it may be the empty string).
When host is present, path must either be empty or begin with a slash (/) character. When host is not present, path cannot begin with two slash characters (//). See RFC 3986, section 3.
See also join_with_user(), which allows specifying the components of the ‘userinfo’ separately.
HAS_PASSWORD and HAS_AUTH_PARAMS are ignored if set in flags.
Added in version 2.66.
- Parameters:
flags – flags describing how to build the URI string
scheme – the URI scheme, or None
userinfo – the userinfo component, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- join_with_user(flags: UriFlags, scheme: str | None, user: str | None, password: str | None, auth_params: str | None, host: str | None, port: int, path: str, query: str | None = None, fragment: str | None = None) str
Joins the given components together according to flags to create an absolute URI string. path may not be None (though it may be the empty string).
In contrast to join(), this allows specifying the components of the ‘userinfo’ separately. It otherwise behaves the same.
HAS_PASSWORD and HAS_AUTH_PARAMS are ignored if set in flags.
Added in version 2.66.
- Parameters:
flags – flags describing how to build the URI string
scheme – the URI scheme, or None
user – the user component of the userinfo, or None
password – the password component of the userinfo, or None
auth_params – the auth params of the userinfo, or None
host – the host component, or None
port – the port, or -1
path – the path component
query – the query component, or None
fragment – the fragment, or None
- list_extract_uris(uri_list: str) list[str]
Splits a URI list conforming to the text/uri-list MIME type defined in RFC 2483 into individual URIs, discarding any comments. The URIs are not validated.
Added in version 2.6.
- Parameters:
uri_list – a URI list
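For readers unfamiliar with text/uri-list: it holds one URI per line, and lines starting with # are comments. The pure-Python sketch below illustrates that splitting; extract_uris_sketch is a hypothetical name, and trimming surrounding whitespace is a choice made for this sketch. It does not call GLib.

```python
def extract_uris_sketch(uri_list):
    """Return the non-comment, non-empty lines of a text/uri-list document."""
    uris = []
    for line in uri_list.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


sample = "# dragged from a file manager\r\nhttp://example.com/a\r\nfile:///tmp/b.txt\r\n"
print(extract_uris_sketch(sample))
```

As in the description above, nothing is validated: any non-comment line is returned as-is.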
- parse(uri_string: str, flags: UriFlags) Uri
Parses uri_string according to flags. If the result is not a valid absolute URI, it will be discarded, and an error returned.
Added in version 2.66.
- Parameters:
uri_string – a string representing an absolute URI
flags – flags describing how to parse uri_string
- parse_params(params: str, length: int, separators: str, flags: UriParamsFlags) dict[str, str]
Many URI schemes include one or more attribute/value pairs as part of the URI value. This method can be used to parse them into a hash table. When an attribute has multiple occurrences, the last value is the final returned value. If you need to handle repeated attributes differently, use UriParamsIter.
The params string is assumed to still be %-encoded, but the returned values will be fully decoded. (Thus it is possible that the returned values may contain = or separators, if the value was encoded in the input.) Invalid %-encoding is treated as with the PARSE_RELAXED rules for parse(). (However, if params is the path or query string from a Uri that was parsed without PARSE_RELAXED and ENCODED, then you already know that it does not contain any invalid encoding.)
WWW_FORM is handled as documented for init().
If CASE_INSENSITIVE is passed to flags, attributes will be compared case-insensitively, so a params string attr=123&Attr=456 will only return a single attribute–value pair, Attr=456. Case will be preserved in the returned attributes.
If params cannot be parsed (for example, it contains two separators characters in a row), then error is set and None is returned.
Added in version 2.66.
- Parameters:
params – a %-encoded string containing attribute=value parameters
length – the length of params, or -1 if it is nul-terminated
separators – the separator byte character set between parameters (usually &, but sometimes ; or both &;). Note that this function works on bytes not characters, so it can’t be used to delimit UTF-8 strings for anything but ASCII characters. You may pass an empty set, in which case no splitting will occur.
flags – flags to modify the way the parameters are handled.
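The semantics described above (split first, decode afterwards, last value wins, optional case-insensitive attributes) can be modelled in pure Python. The sketch follows the prose of this entry rather than GLib's source. parse_params_sketch and its keyword arguments are hypothetical, it raises ValueError where the real method reports an error, and returning an empty result for empty input is an assumption.

```python
from urllib.parse import unquote


def parse_params_sketch(params, separators="&", case_insensitive=False):
    """Parse 'attr=value' pairs; later occurrences of an attribute replace earlier ones."""
    if params == "":
        return {}  # assumption: empty input yields no parameters

    # Split on any separator character while the text is still %-encoded.
    items, current = [], []
    for ch in params:
        if ch in separators:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))

    result = {}
    stored_name = {}  # comparison key -> attribute name as currently stored
    for item in items:
        name, eq, value = item.partition("=")
        if not eq:
            # Also covers two separators in a row, which produce an empty item.
            raise ValueError("missing '=' in parameter %r" % item)
        name, value = unquote(name), unquote(value)
        key = name.lower() if case_insensitive else name
        previous = stored_name.get(key)
        if previous is not None:
            del result[previous]
        stored_name[key] = name
        result[name] = value
    return result


print(parse_params_sketch("a=1&b=2&a=3"))
print(parse_params_sketch("attr=123&Attr=456", case_insensitive=True))
print(parse_params_sketch("k=a%3Db%26c"))
```

Because decoding happens only after splitting, the encoded = and & in the last example end up inside the value instead of creating extra parameters.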
- parse_relative(uri_ref: str, flags: UriFlags) Uri
Parses uri_ref according to flags and, if it is a relative URI, resolves it relative to base_uri. If the result is not a valid absolute URI, it will be discarded, and an error returned.
Added in version 2.66.
- Parameters:
uri_ref – a string representing a relative or absolute URI
flags – flags describing how to parse uri_ref
- parse_scheme(uri: str) str | None
Gets the scheme portion of a URI string. RFC 3986 decodes the scheme as:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Common schemes include file, https, svn+ssh, etc.
Added in version 2.16.
- Parameters:
uri – a valid URI.
- peek_scheme(uri: str) str | None
Gets the scheme portion of a URI string. RFC 3986 decodes the scheme as:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Common schemes include file, https, svn+ssh, etc.
Unlike parse_scheme(), the returned scheme is normalized to all-lowercase and does not need to be freed.
Added in version 2.66.
- Parameters:
uri – a valid URI.
- resolve_relative(base_uri_string: str | None, uri_ref: str, flags: UriFlags) str
Parses uri_ref according to flags and, if it is a relative URI, resolves it relative to base_uri_string. If the result is not a valid absolute URI, it will be discarded, and an error returned.
(If base_uri_string is None, this just returns uri_ref, or None if uri_ref is invalid or not absolute.)
Added in version 2.66.
- Parameters:
base_uri_string – a string representing a base URI
uri_ref – a string representing a relative or absolute URI
flags – flags describing how to parse uri_ref
- split(uri_ref: str, flags: UriFlags) tuple[bool, str, str, str, int, str, str, str]
Parses uri_ref (which can be an absolute or relative URI) according to flags, and returns the pieces. Any component that doesn’t appear in uri_ref will be returned as None (but note that all URIs always have a path component, though it may be the empty string).
If flags contains ENCODED, then %-encoded characters in uri_ref will remain encoded in the output strings. (If not, then all such characters will be decoded.) Note that decoding will only work if the URI components are ASCII or UTF-8, so you will need to use ENCODED if they are not.
Note that the HAS_PASSWORD and HAS_AUTH_PARAMS flags are ignored by split(), since it always returns only the full userinfo; use split_with_user() if you want it split up.
Added in version 2.66.
- Parameters:
uri_ref – a string containing a relative or absolute URI
flags – flags for parsing uri_ref
- split_network(uri_string: str, flags: UriFlags) tuple[bool, str, str, int]
Parses uri_string (which must be an absolute URI) according to flags, and returns the pieces relevant to connecting to a host. See the documentation for split() for more details; this is mostly a wrapper around that function with simpler arguments. However, it will return an error if uri_string is a relative URI, or does not contain a hostname component.
Added in version 2.66.
- Parameters:
uri_string – a string containing an absolute URI
flags – flags for parsing uri_string
- split_with_user(uri_ref: str, flags: UriFlags) tuple[bool, str, str, str, str, str, int, str, str, str]
Parses uri_ref (which can be an absolute or relative URI) according to flags, and returns the pieces. Any component that doesn’t appear in uri_ref will be returned as None (but note that all URIs always have a path component, though it may be the empty string).
See split(), and the definition of UriFlags, for more information on the effect of flags. Note that password will only be parsed out if flags contains HAS_PASSWORD, and auth_params will only be parsed out if flags contains HAS_AUTH_PARAMS.
Added in version 2.66.
- Parameters:
uri_ref – a string containing a relative or absolute URI
flags – flags for parsing uri_ref
- to_string() str
Returns a string representing uri.
This is not guaranteed to return a string which is identical to the string that uri was parsed from. However, if the source URI was syntactically correct (according to RFC 3986), and it was parsed with ENCODED, then to_string() is guaranteed to return a string which is at least semantically equivalent to the source URI (according to RFC 3986).
If uri might contain sensitive details, such as authentication parameters, or private data in its query string, and the returned string is going to be logged, then consider using to_string_partial() to redact parts.
Added in version 2.66.
- to_string_partial(flags: UriHideFlags) str
Returns a string representing uri, subject to the options in flags. See to_string() and UriHideFlags for more details.
Added in version 2.66.
- Parameters:
flags – flags describing what parts of uri to hide
- unescape_bytes(escaped_string: str, length: int, illegal_characters: str | None = None) Bytes
Unescapes a segment of an escaped string as binary data.
Note that in contrast to unescape_string(), this does allow nul bytes to appear in the output.
If any of the characters in illegal_characters appears as an escaped character in escaped_string, then that is an error and None will be returned. This is useful if you want to avoid, for instance, having a slash being expanded in an escaped path element, which might confuse pathname handling.
Added in version 2.66.
- Parameters:
escaped_string – A URI-escaped string
length – the length (in bytes) of escaped_string to unescape, or -1 if it is nul-terminated.
illegal_characters – a string of illegal characters not to be allowed, or None.
- unescape_segment(escaped_string: str | None = None, escaped_string_end: str | None = None, illegal_characters: str | None = None) str | None
Unescapes a segment of an escaped string.
If any of the characters in illegal_characters or the NUL character appears as an escaped character in escaped_string, then that is an error and None will be returned. This is useful if you want to avoid, for instance, having a slash being expanded in an escaped path element, which might confuse pathname handling.
Note: NUL byte is not accepted in the output, in contrast to unescape_bytes().
Added in version 2.16.
- Parameters:
escaped_string – A string, may be None
escaped_string_end – Pointer to end of escaped_string, may be None
illegal_characters – An optional string of illegal characters not to be allowed, may be None
- unescape_string(escaped_string: str, illegal_characters: str | None = None) str | None
Unescapes a whole escaped string.
If any of the characters in illegal_characters or the NUL character appears as an escaped character in escaped_string, then that is an error and None will be returned. This is useful if you want to avoid, for instance, having a slash being expanded in an escaped path element, which might confuse pathname handling.
Added in version 2.16.
- Parameters:
escaped_string – an escaped string to be unescaped.
illegal_characters – a string of illegal characters not to be allowed, or None.
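The role of illegal_characters is easiest to see with a path segment that hides an escaped slash. The pure-Python sketch below models the behaviour described above; unescape_string_sketch is a hypothetical name, and returning None for malformed escapes or for output that is not valid UTF-8 is a choice made for this sketch, not a statement about GLib.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def unescape_string_sketch(escaped, illegal_characters=""):
    """Decode %XX escapes; return None if an escape yields NUL or an illegal character."""
    data = escaped.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x25:  # '%'
            pair = data[i + 1:i + 3]
            if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
                return None  # sketch choice: malformed escape is an error
            byte = int(pair, 16)
            if byte == 0 or (byte < 128 and chr(byte) in illegal_characters):
                return None
            out.append(byte)
            i += 3
        else:
            out.append(data[i])
            i += 1
    try:
        return out.decode("utf-8")
    except UnicodeDecodeError:
        return None  # sketch choice: refuse output that is not UTF-8


print(unescape_string_sketch("dir%2Ffile"))
print(unescape_string_sketch("dir%2Ffile", "/"))
print(unescape_string_sketch("bad%00"))
```

Only escaped occurrences are rejected: a literal slash in the input passes through, since it was already visible to whatever split the path into segments.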