Overview

Introduction

Note: This documentation is currently still under development. Expect improvements in the near future.

Google Safe Browsing v5 is an evolution of Google Safe Browsing v4. The two key changes made in v5 are data freshness and IP privacy. In addition, the API surface has been improved to increase flexibility and efficiency and to reduce bloat. Furthermore, Google Safe Browsing v5 is designed to make migration from v4 easy.

Currently, Google offers both v4 and v5, and both are considered production ready. You may use either v4 or v5. We have not announced a date for sunsetting v4; if we do, we will give a minimum notice of one year. This page describes v5 and provides a migration guide from v4 to v5; the complete v4 documentation remains available.

Data Freshness

One significant improvement of Google Safe Browsing v5 over v4 (specifically, the v4 Update API) is data freshness and coverage. Since the protection depends heavily on the client-maintained local database, the delay and size of local database updates are the main contributors to missed protection. In v4, the typical client takes 20 to 50 minutes to obtain the most up-to-date version of threat lists. Unfortunately, phishing attacks spread fast: as of 2021, 60% of sites that deliver attacks live less than 10 minutes. Our analysis shows that around 25-30% of missed phishing protection is due to such data staleness. Further, some devices are not equipped to manage the entirety of the Google Safe Browsing threat lists, which continue to grow larger over time.

In v5, we introduce a mode of operation known as real-time protection. This circumvents the data staleness problem above. In v4, clients are expected to download and maintain a local database, perform checks against the locally downloaded threat lists, and then when there is a partial prefix match, perform a request to download the full hash. In v5, although clients should continue to download and maintain a local database of threat lists, clients are now also expected to download a list of likely-benign sites (called the Global Cache), perform both a local check for this Global Cache as well as a local threat list check, and finally when there is either a partial prefix match for threat lists or a no-match in the Global Cache, perform a request to download the full hashes. (For details on the local processing required by the client, please see the provided procedure below.) This represents a shift from allow-by-default to check-by-default, which can improve protection in light of faster propagation of threats on the web. In other words, this is a protocol that is designed to provide near-real-time protection: we aim to have clients benefit from fresher Google Safe Browsing data.

IP Privacy

Google Safe Browsing (v4 or v5) does not process anything associated with a user’s identity in the course of serving requests. Cookies, if sent, are ignored. The originating IP addresses of the requests are known to Google, but Google only uses the IP addresses for essential networking needs (i.e. for sending responses) and for anti-DoS purposes.

Concurrently with v5, we introduce a companion API known as the Safe Browsing Oblivious HTTP Gateway API. This uses Oblivious HTTP to hide end users' IP addresses from Google. It works by having a non-colluding third party handle an encrypted version of the user request and then forward it to Google. As a result, the third party only has access to the IP addresses, and Google only has access to the content of the request. The third party operates an Oblivious HTTP Relay (such as this service by Fastly), and Google operates the Oblivious HTTP Gateway. This is an optional companion API. When it is used in conjunction with Google Safe Browsing, end users' IP addresses are no longer sent to Google.

Appropriate Usage

Permitted Use

The Safe Browsing API is for non-commercial use only (meaning “not for sale or revenue generating purposes”). If you need a solution for commercial purposes, please refer to Web Risk.

Pricing

All Google Safe Browsing APIs are free of charge.

Quotas

Developers are allocated a default usage quota upon enabling the Safe Browsing API. Current allocation and usage can be viewed in the Google Developer Console. If you expect to use more than your currently allocated quota, you may request additional quota from the Developer Console's Quota interface. We review these requests and require a contact when applying for an increased quota to ensure that our service availability meets the needs of all users.

Appropriate URLs

Google Safe Browsing is designed to act on URLs that would be displayed in a browser's address bar. It is not designed to be used to check against subresources (such as a JavaScript or image referenced by an HTML file, or a WebSocket URL initiated by JavaScript). Such subresource URLs should not be checked against Google Safe Browsing.

If visiting a URL results in a redirect (such as HTTP 301), it is appropriate for the redirected URL to be checked against Google Safe Browsing. Client-side URL manipulation such as History.pushState does not produce new URLs that need to be checked against Google Safe Browsing.

User Warnings

If you use Google Safe Browsing to warn users about risks from particular webpages, the following guidelines apply.

These guidelines help protect both you and Google from misunderstandings by making clear that the page is not known with 100% certainty to be an unsafe web resource, and that the warnings merely identify possible risk.

  • In your user visible warning, you must not lead users to believe that the page in question is, without a doubt, an unsafe web resource. When you refer to the page being identified or the potential risks it may pose to users, you must qualify the warning using terms such as: suspected, potentially, possible, likely, may be.
  • Your warning must enable the user to learn more by reviewing Google's definition of various threats. The following links are suggested:
  • When you show warnings for pages identified as risky by the Safe Browsing Service, you must give attribution to Google by including the line "Advisory provided by Google" with a link to the Safe Browsing Advisory. If your product also shows warnings based on other sources, you must not include the Google attribution in warnings derived from non-Google data.
  • In your product documentation, you must provide a notice to let users know that the protection offered by Google Safe Browsing is not perfect. It must let them know that there is a chance of both false positives (safe sites flagged as risky) and false negatives (risky sites not flagged). We suggest using the following language:

    Google works to provide the most accurate and up-to-date information about unsafe web resources. However, Google cannot guarantee that its information is comprehensive and error-free: some risky sites may not be identified, and some safe sites may be identified in error.

The Modes of Operation

Google Safe Browsing v5 allows clients to choose from three modes of operation.

Real-Time Mode

When clients choose to use Google Safe Browsing v5 in real-time mode, clients will maintain in their local database: (i) a Global Cache of likely-benign sites, formatted as SHA256 hashes of host-suffix/path-prefix URL expressions, (ii) a set of threat lists, formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. The high-level idea is that whenever the client wishes to check a particular URL, a local check is performed using the Global Cache. If that check passes, a local threat lists check is performed. Otherwise, the client continues with the real-time hash check as detailed below.

Besides the local database, the client will maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.

A detailed specification of the procedure is available below.

Local List Mode

When clients choose to use Google Safe Browsing v5 in this mode, the client behavior is similar to the v4 Update API, except that it uses the improved API surface of v5. Clients will maintain in their local database a set of threat lists formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. Whenever the client wishes to check a particular URL, a check is performed using the local threat list. If and only if there is a match, the client connects to the server to continue the check.

As with the above, the client will also maintain a local cache that need not be in persistent storage.

No-Storage Real-Time Mode

When clients choose to use Google Safe Browsing v5 in the no-storage real-time mode, the client need not maintain any persistent local database. However, the client is still expected to maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.

Whenever the client wishes to check a particular URL, the client always connects to the server to perform a check. This mode is similar to what clients of the v4 Lookup API may implement.

Compared to the Real-Time Mode, this mode may use more network bandwidth but may be more suitable if it is inconvenient for the client to maintain persistent local state.

Checking URLs

This section contains detailed specifications of how clients check URLs.

Canonicalization of URLs

Before any URLs are checked, the client is expected to perform some canonicalization on that URL.

To begin, we assume that the client has parsed the URL and made it valid according to RFC 2396. If the URL uses an internationalized domain name (IDN), the client should convert the URL to the ASCII Punycode representation. The URL must include a path component; that is, it must have at least one slash following the domain (http://google.com/ instead of http://google.com).

First, remove tab (0x09), CR (0x0d), and LF (0x0a) characters from the URL. Do not remove escape sequences for these characters (e.g. %0a).

Second, if the URL ends in a fragment, remove the fragment. For example, shorten http://google.com/#frag to http://google.com/.

Third, repeatedly percent-unescape the URL until it has no more percent-escapes. (This may render the URL invalid.)
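The three preliminary steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name strip_and_unescape is hypothetical, the input is assumed to be already parsed and valid, and "no more percent-escapes" is interpreted as "unescaping no longer changes the value", so a stray % that is not followed by two hex digits is left in place.

```python
from urllib.parse import unquote_to_bytes


def strip_and_unescape(url: str) -> bytes:
    """Apply the three preliminary steps to a URL that is already valid."""
    raw = url.encode("utf-8")
    # First: drop literal tab, CR and LF bytes. Escaped forms such as %0a
    # contain none of these bytes at this point, so they are left alone.
    for control in (b"\t", b"\r", b"\n"):
        raw = raw.replace(control, b"")
    # Second: drop the fragment, if there is one.
    raw = raw.split(b"#", 1)[0]
    # Third: unescape until the value stops changing.
    while True:
        decoded = unquote_to_bytes(raw)
        if decoded == raw:
            return raw
        raw = decoded
```

The result is returned as bytes because repeated unescaping can produce byte sequences that are not valid UTF-8.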

To canonicalize the hostname:

Extract the hostname from the URL and then:

  1. Remove all leading and trailing dots.
  2. Replace consecutive dots with a single dot.
  3. If the hostname can be parsed as an IPv4 address, normalize it to 4 dot-separated decimal values. The client should handle any legal IP-address encoding, including octal, hex, and fewer than four components.
  4. If the hostname can be parsed as a bracketed IPv6 address, normalize it by removing unnecessary leading zeroes in the components and collapsing zero components by using the double-colon syntax. For example [2001:0db8:0000::1] should be transformed into [2001:db8::1]. If the hostname is one of the two following special IPv6 address types, transform them into IPv4:
    • An IPv4-mapped IPv6 address, such as [::ffff:1.2.3.4], which should be transformed into 1.2.3.4;
    • A NAT64 address using the well-known prefix 64:ff9b::/96, such as [64:ff9b::1.2.3.4], which should be transformed into 1.2.3.4.
  5. Lowercase the whole string.
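A possible sketch of the hostname steps is shown below. The names canonicalize_host and _parse_ipv4 are hypothetical, and the IPv4 parser is an assumption about what "any legal IP-address encoding" covers: it accepts decimal, octal (leading 0), and hex (leading 0x) components, and lets the last component fill the remaining bytes when fewer than four are given.

```python
import ipaddress
import re
from typing import Optional

NAT64 = ipaddress.IPv6Network("64:ff9b::/96")


def _parse_ipv4(host: str) -> Optional[int]:
    """Return a 32-bit integer for an IPv4 literal in any base, else None."""
    parts = host.split(".")
    if len(parts) > 4:
        return None
    nums = []
    for part in parts:
        if re.fullmatch(r"0[xX][0-9a-fA-F]+", part):
            nums.append(int(part, 16))
        elif re.fullmatch(r"0[0-7]*", part):
            nums.append(int(part, 8))
        elif re.fullmatch(r"[1-9][0-9]*", part):
            nums.append(int(part))
        else:
            return None
    value = 0
    for n in nums[:-1]:
        if n > 255:
            return None
        value = (value << 8) | n
    remaining_bits = 8 * (4 - (len(nums) - 1))
    if nums[-1] >= 1 << remaining_bits:
        return None
    return (value << remaining_bits) | nums[-1]


def canonicalize_host(host: str) -> str:
    host = host.strip(".")                      # step 1
    host = re.sub(r"\.{2,}", ".", host)         # step 2
    v4 = _parse_ipv4(host)                      # step 3
    if v4 is not None:
        return str(ipaddress.IPv4Address(v4))
    if host.startswith("[") and host.endswith("]"):  # step 4
        try:
            v6 = ipaddress.IPv6Address(host[1:-1])
        except ValueError:
            v6 = None
        if v6 is not None:
            if v6.ipv4_mapped is not None:
                return str(v6.ipv4_mapped)
            if v6 in NAT64:
                return str(ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF))
            return "[" + v6.compressed + "]"
    return host.lower()                         # step 5
```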

To canonicalize the path:

  1. Resolve the sequences /../ and /./ in the path by replacing /./ with /, and removing /../ along with the preceding path component.
  2. Replace runs of consecutive slashes with a single slash character.

Do not apply these path canonicalizations to the query parameters.
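As an illustration, the path rules could be implemented along these lines. The function name canonicalize_path is hypothetical, and the input is assumed to be the path plus an optional query string, with everything after the first ? passed through untouched.

```python
def canonicalize_path(path_and_query: str) -> str:
    """Resolve /./ and /../ and collapse slashes in the path only."""
    path, sep, query = path_and_query.partition("?")
    segments = path.split("/")
    kept = []
    for segment in segments:
        if segment == "" or segment == ".":
            continue            # empty segments come from runs of slashes
        if segment == "..":
            if kept:
                kept.pop()      # drop the preceding path component
            continue
        kept.append(segment)
    result = "/" + "/".join(kept)
    # Keep a trailing slash when the original path named a directory.
    if kept and (path.endswith("/") or segments[-1] in (".", "..")):
        result += "/"
    return result + sep + query
```

For example, //twoslashes?more//slashes becomes /twoslashes?more//slashes: the path is collapsed while the query keeps its double slash.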

In the URL, percent-escape all characters that are <= ASCII 32, >= 127, #, or %. The escapes should use uppercase hex characters.
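The escaping rule is small enough to show directly. In this sketch the function name escape is hypothetical, and the input is assumed to be the byte string that results from the earlier unescaping and canonicalization steps.

```python
def escape(data: bytes) -> str:
    """Percent-escape bytes <= 32, >= 127, '#' and '%' with uppercase hex."""
    out = []
    for byte in data:
        if byte <= 32 or byte >= 127 or byte in (0x23, 0x25):
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)
```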

Host-Suffix Path-Prefix Expressions

Once the URL is canonicalized, the next step is to create the suffix/prefix expressions. Each suffix/prefix expression consists of a host suffix (or full host) and a path prefix (or full path).

The client will form up to 30 different possible host suffix and path prefix combinations. These combinations use only the host and path components of the URL. The scheme, username, password, and port are discarded. If the URL includes query parameters, then at least one combination will include the full path and query parameters.

For the host, the client will try at most five different strings. They are:

  • If the hostname is not an IPv4 or IPv6 literal, up to four hostnames formed by starting with the eTLD+1 domain and adding successive leading components. The determination of eTLD+1 should be based on the Public Suffix List. For example, a.b.example.com would result in the eTLD+1 domain of example.com as well as the host with one additional host component b.example.com.
  • The exact hostname in the URL. Following the previous example, a.b.example.com would be checked.

For the path, the client will try at most six different strings. They are:

  • The exact path of the URL, including query parameters.
  • The exact path of the URL, without query parameters.
  • The four paths formed by starting at the root (/) and successively appending path components, including a trailing slash.

The following examples illustrate the check behavior:

For the URL http://a.b.com/1/2.html?param=1, the client will try these possible strings:

a.b.com/1/2.html?param=1
a.b.com/1/2.html
a.b.com/
a.b.com/1/
b.com/1/2.html?param=1
b.com/1/2.html
b.com/
b.com/1/

For the URL http://a.b.c.d.e.f.com/1.html, the client will try these possible strings:

a.b.c.d.e.f.com/1.html
a.b.c.d.e.f.com/
c.d.e.f.com/1.html
c.d.e.f.com/
d.e.f.com/1.html
d.e.f.com/
e.f.com/1.html
e.f.com/
f.com/1.html
f.com/

(Note: skip b.c.d.e.f.com, since we'll take only the last five hostname components, and the full hostname.)

For the URL http://1.2.3.4/1/, the client will try these possible strings:

1.2.3.4/1/
1.2.3.4/

For the URL http://example.co.uk/1, the client will try these possible strings:

example.co.uk/1
example.co.uk/
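The examples above can be reproduced with a short sketch. All function names here are hypothetical. The eTLD+1 is passed in as registrable_domain (None for IP literals) because deriving it requires the Public Suffix List, which this sketch does not bundle; the host and path are assumed to be canonicalized already.

```python
from typing import List, Optional


def host_candidates(host: str, registrable_domain: Optional[str]) -> List[str]:
    """Exact host first, then up to four suffixes ending at the eTLD+1."""
    hosts = [host]
    if registrable_domain is None:      # IPv4 or IPv6 literal
        return hosts
    extra = host[: -len(registrable_domain)].rstrip(".")
    labels = extra.split(".") if extra else []
    suffixes = []
    for n in range(min(len(labels), 3) + 1):
        leading = labels[len(labels) - n:]
        suffixes.append(".".join(leading + [registrable_domain]))
    for suffix in reversed(suffixes):   # longest suffix first
        if suffix not in hosts:
            hosts.append(suffix)
    return hosts


def path_candidates(path: str, query: str) -> List[str]:
    """Exact path with and without query, then up to four directory prefixes."""
    paths = []
    if query:
        paths.append(path + "?" + query)
    paths.append(path)
    prefix = "/"
    prefixes = [prefix]
    for component in path.split("/")[1:-1][:3]:
        prefix += component + "/"
        prefixes.append(prefix)
    for candidate in prefixes:
        if candidate not in paths:
            paths.append(candidate)
    return paths


def expressions(host: str, registrable_domain: Optional[str],
                path: str, query: str = "") -> List[str]:
    return [h + p
            for h in host_candidates(host, registrable_domain)
            for p in path_candidates(path, query)]
```

With at most five hosts and six paths, the result never exceeds 30 expressions.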

Hashing

Google Safe Browsing exclusively uses SHA256 as the hash function. This hash function should be applied to the above expressions.

The full 32-byte hash will, depending on the circumstances, be truncated to 4 bytes, 8 bytes, or 16 bytes:

  • When using the hashes.search method, we currently require the hashes in the request to be truncated to exactly 4 bytes. Sending additional bytes in this request will compromise user privacy.

  • When downloading the lists for the local database using the hashList.get method or the hashLists.batchGet method, the length of the hashes sent by the server is influenced by both the nature of the list and the client's preference of the hash length, communicated by the desired_hash_length parameter.
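Hashing and truncation can be illustrated with the standard library. The helper names full_hash and hash_prefix are hypothetical; the sketch assumes expressions are encoded as UTF-8 before hashing, which is equivalent to ASCII for fully escaped expressions.

```python
import hashlib


def full_hash(expression: str) -> bytes:
    """Return the 32-byte SHA256 hash of a suffix/prefix expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def hash_prefix(expression: str, length: int = 4) -> bytes:
    """Return the leading bytes of the full hash (4 bytes for hashes.search)."""
    if length not in (4, 8, 16, 32):
        raise ValueError("unsupported hash length")
    return full_hash(expression)[:length]
```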

The Real-Time URL Check Procedure

This procedure is used when the client chooses the real-time mode of operation.

This procedure takes a single URL u and returns SAFE, UNSAFE or UNSURE. If it returns SAFE, the URL is deemed safe by Google Safe Browsing. If it returns UNSAFE, the URL is deemed potentially unsafe by Google Safe Browsing and appropriate action should be taken, such as showing a warning to the end user, moving a received message to the spam folder, or requiring extra confirmation by the user before proceeding. If it returns UNSURE, the following local-check procedure should be used afterwards.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. For each hash of expressionHashes:
    1. If hash can be found in the global cache, return UNSURE.
  4. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  5. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  6. Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc), return UNSURE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc), as well as the cache expiration time expiration.
  7. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  8. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  9. Return SAFE.

While this protocol specifies when the client sends expressionHashPrefixes to the server, this protocol purposefully does not specify exactly how to send them. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
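The real-time procedure can be sketched in a few dozen lines. Everything below is illustrative: search_hashes stands in for whatever transport the client uses (it is a hypothetical callable that takes a list of 4-byte prefixes and returns the full hashes plus an expiration time), the local cache is modeled as a plain dictionary keyed by prefix, and caching an empty entry for a prefix with no matches is a choice made by this sketch rather than something the steps above prescribe.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

SAFE, UNSAFE, UNSURE = "SAFE", "UNSAFE", "UNSURE"

# Local cache: 4-byte prefix -> (expiration as Unix time, full hashes seen).
LocalCache = Dict[bytes, Tuple[float, Set[bytes]]]
# Hypothetical transport: prefixes in, (full hashes, expiration) out.
SearchFn = Callable[[List[bytes]], Tuple[List[bytes], float]]


def real_time_check(expressions: Sequence[str], global_cache: Set[bytes],
                    local_cache: LocalCache, search_hashes: SearchFn,
                    now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    # Steps 1-2: the caller supplies the expressions; hash each one.
    hashes = [hashlib.sha256(e.encode("utf-8")).digest() for e in expressions]
    # Step 3: a Global Cache hit defers to the local threat list procedure.
    if any(h in global_cache for h in hashes):
        return UNSURE
    # Steps 4-5: consult the local cache, dropping expired entries.
    to_send: List[bytes] = []
    for h in hashes:
        prefix = h[:4]
        entry = local_cache.get(prefix)
        if entry is not None and now > entry[0]:
            del local_cache[prefix]
            entry = None
        if entry is None:
            if prefix not in to_send:
                to_send.append(prefix)
        elif h in entry[1]:
            return UNSAFE
    if not to_send:
        return SAFE  # Sketch choice: nothing is left to ask the server.
    # Step 6: any failure yields UNSURE.
    try:
        full_hashes, expiration = search_hashes(to_send)
    except Exception:
        return UNSURE
    # Step 7: cache the results for every prefix that was sent.
    for prefix in to_send:
        matches = {f for f in full_hashes if f[:4] == prefix}
        local_cache[prefix] = (expiration, matches)
    # Steps 8-9: a returned full hash that matches an expression is UNSAFE.
    return UNSAFE if any(f in hashes for f in full_hashes) else SAFE
```

Injecting the transport as a callable keeps the procedure testable without network access and leaves the batching strategy open, in line with the paragraph above.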

The Local Threat List URL Check Procedure

This procedure is used when the client opts for the local list mode of operation. It is also used when the RealTimeCheck procedure above returns the value of UNSURE.

This procedure takes a single URL u and returns SAFE or UNSAFE.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  4. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  5. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local threat list database.
    2. If the expressionHashPrefix cannot be found in the local threat list database, remove it from expressionHashPrefixes.
  6. Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc), return SAFE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc), as well as the cache expiration time expiration.
  7. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  8. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  9. Return SAFE.
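A compact sketch of the local threat list procedure follows. As before, search_hashes is a hypothetical callable standing in for the network request, the local cache is a dictionary keyed by 4-byte prefix, and the local threat list database is simplified to a set of 4-byte prefixes (real lists may carry other prefix lengths). Note that errors resolve to SAFE here, as step 6 specifies.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

SAFE, UNSAFE = "SAFE", "UNSAFE"

LocalCache = Dict[bytes, Tuple[float, Set[bytes]]]
SearchFn = Callable[[List[bytes]], Tuple[List[bytes], float]]


def local_list_check(expressions: Sequence[str], threat_prefixes: Set[bytes],
                     local_cache: LocalCache, search_hashes: SearchFn,
                     now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    # Steps 1-3: hash the expressions; prefixes are the first 4 bytes.
    hashes = [hashlib.sha256(e.encode("utf-8")).digest() for e in expressions]
    # Step 4: consult the local cache, dropping expired entries.
    uncached: List[bytes] = []
    for h in hashes:
        prefix = h[:4]
        entry = local_cache.get(prefix)
        if entry is not None and now > entry[0]:
            del local_cache[prefix]
            entry = None
        if entry is None:
            uncached.append(prefix)
        elif h in entry[1]:
            return UNSAFE
    # Step 5: keep only prefixes present in the local threat list database.
    to_send = [p for p in dict.fromkeys(uncached) if p in threat_prefixes]
    if not to_send:
        return SAFE  # No local match, so the server is never contacted.
    # Step 6: unlike the real-time procedure, errors resolve to SAFE.
    try:
        full_hashes, expiration = search_hashes(to_send)
    except Exception:
        return SAFE
    # Step 7: cache results per prefix (empty sets are a sketch assumption).
    for prefix in to_send:
        matches = {f for f in full_hashes if f[:4] == prefix}
        local_cache[prefix] = (expiration, matches)
    # Steps 8-9.
    return UNSAFE if any(f in hashes for f in full_hashes) else SAFE
```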

The Real-Time URL Check Procedure Without a Local Database

This procedure is used when the client chooses the no-storage real-time mode of operation.

This procedure takes a single URL u and returns SAFE or UNSAFE.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  4. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  5. Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc), return SAFE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc), as well as the cache expiration time expiration.
  6. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  7. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  8. Return SAFE.

Just like the Real-Time URL Check Procedure, this procedure does not specify exactly how to send the hash prefixes to the server. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
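One way to use the freedom described above is to split the prefixes across several requests and pad each request with random decoys. The sketch below is only an illustration of that idea: build_requests and its parameters are hypothetical names, and the only constraint taken from the text is the limit of 30 prefixes per request.

```python
import secrets
from typing import List, Sequence

MAX_PREFIXES_PER_REQUEST = 30


def build_requests(prefixes: Sequence[bytes], per_request: int = 30,
                   decoys: int = 0) -> List[List[bytes]]:
    """Split prefixes into requests, padding each with random 4-byte decoys."""
    if per_request < 1 or decoys < 0:
        raise ValueError("per_request must be >= 1 and decoys >= 0")
    if per_request + decoys > MAX_PREFIXES_PER_REQUEST:
        raise ValueError("a request may carry at most 30 hash prefixes")
    rng = secrets.SystemRandom()
    requests = []
    for start in range(0, len(prefixes), per_request):
        batch = list(prefixes[start:start + per_request])
        batch += [secrets.token_bytes(4) for _ in range(decoys)]
        rng.shuffle(batch)  # avoid revealing which entries are real
        requests.append(batch)
    return requests
```

Full hashes returned for decoy prefixes will not match any entry in expressionHashes, so they do not affect the verdict of the procedure.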

Local Database Maintenance

Google Safe Browsing v5 expects the client to maintain a local database, except when the client chooses the No-Storage Real-Time Mode. The format and storage of this local database are up to the client. The contents of this local database can conceptually be thought of as a folder containing various lists as files, and the contents of these files are SHA256 hashes or hash prefixes.

Database Updates

The client will regularly call the hashList.get method or the hashLists.batchGet method to update the database. Since the typical client will want to update multiple lists at a time, it is recommended to use the hashLists.batchGet method.

Lists are identified by their distinct names. The names are short ASCII strings a few characters long.

Unlike v4, where lists are identified by the tuple of threat type, platform type, and threat entry type, in v5 lists are simply identified by name. This provides flexibility when multiple v5 lists could share the same threat type. Platform types and threat entry types are removed in v5.

Once a name has been chosen for a list, it will never be renamed. Furthermore, once a list has appeared, it will never be removed (if the list is no longer useful, it will become empty but will continue to exist). Therefore, it is appropriate to hard code these names in the Google Safe Browsing client code.

Both the hashList.get method and the hashLists.batchGet method support incremental updates. Using incremental updates saves bandwidth and improves performance. Incremental updates work by delivering a delta between the client's version of the list and the latest version of the list. (If a client is newly deployed and does not have any versions available, a full update is available.) The incremental update contains removal indices and additions. The client is first expected to remove the entries at the specified indices from its local database, and then apply the additions.

Finally, to prevent corruption, the client should check the stored data against the checksum provided by the server. Whenever the checksum does not match, the client should perform a full update.
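Applying an incremental update might look like the sketch below. The function name apply_update is hypothetical, and the list is assumed to be held as a lexicographically sorted sequence of byte strings with the removal indices referring to positions in that sorted order. The checksum computation shown (SHA256 over the concatenation of the sorted entries) is an assumption made for illustration; consult the API reference for the checksum the server actually provides.

```python
import hashlib
from typing import Iterable, List, Optional, Sequence


def apply_update(current: Sequence[bytes], removal_indices: Iterable[int],
                 additions: Iterable[bytes],
                 expected_sha256: Optional[bytes] = None) -> List[bytes]:
    """Remove entries by index, then merge additions, keeping sorted order."""
    doomed = set(removal_indices)
    kept = [entry for i, entry in enumerate(current) if i not in doomed]
    merged = sorted(set(kept) | set(additions))
    if expected_sha256 is not None:
        # Assumed checksum scheme; a mismatch means the local copy is corrupt.
        actual = hashlib.sha256(b"".join(merged)).digest()
        if actual != expected_sha256:
            raise ValueError("checksum mismatch; perform a full update")
    return merged
```

Removals are applied before additions, matching the order described above, so the indices always refer to the list as it was before the update.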

Decoding the List Content

Decoding Hashes and Hash Prefixes

All lists are delivered using a special encoding to reduce size. This encoding works by recognizing that Google Safe Browsing lists contain, conceptually, a set of hashes or hash prefixes, which are statistically indistinguishable from random integers. If we were to sort these integers and take their adjacent difference, such adjacent difference is expected to be "small" in a sense. Golomb-Rice encoding then exploits this smallness.

Suppose that three host-suffix path-prefix expressions, namely a.example.com/, b.example.com/, and y.example.com/, are to be transmitted using 4-byte hash prefixes. Further suppose that the Rice parameter, denoted by k, is chosen to be 30. The server would start by calculating the full hash for these strings, which are, respectively:

291bc5421f1cd54d99afcc55d166e2b9fe42447025895bf09dd41b2110a687dc  a.example.com/
1d32c5084a360e58f1b87109637a6810acad97a861a7769e8f1841410d2a960c  b.example.com/
f7a502e56e8b01c6dc242b35122683c9d25d07fb1f532d9853eb0ef3ff334f03  y.example.com/

The server then forms 4-byte hash prefixes for each of the above, which is the first 4 bytes of the 32-byte full hash, interpreted as big-endian 32-bit integers. The big endianness refers to the fact that the first byte of the full hash becomes the most significant byte of the 32-bit integer. This step results in the integers 0x291bc542, 0x1d32c508, and 0xf7a502e5.

It is necessary for the server to sort these three hash prefixes lexicographically (equivalent to numerical sorting in big endian), and the result of the sorting is 0x1d32c508, 0x291bc542, 0xf7a502e5. The first hash prefix is stored unchanged in the first_value field.

The server then calculates the two adjacent differences, which are 0xbe9003a and 0xce893da3 respectively. Given that k is chosen to be 30, the server splits these two numbers into the quotient parts and remainder parts that are 2 and 30 bits long respectively. For the first number, the quotient part is zero and the remainder is 0xbe9003a; for the second number, the quotient part is 3 because the most significant two bits are 11 in binary and the remainder is 0xe893da3. For a given quotient q it is encoded into (1 << q) - 1 using exactly 1 + q bits; the remainder is encoded directly using k bits. The quotient part of the first number is encoded as 0, and the remainder part is in binary 001011111010010000000000111010; the quotient part of the second number is encoded as 0111, and the remainder part is 001110100010010011110110100011.
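The arithmetic in this example can be checked with a few lines of Python. The sketch starts from the full hashes quoted above rather than recomputing them, and the variable names are illustrative only.

```python
FULL_HASHES_HEX = [
    "291bc5421f1cd54d99afcc55d166e2b9fe42447025895bf09dd41b2110a687dc",
    "1d32c5084a360e58f1b87109637a6810acad97a861a7769e8f1841410d2a960c",
    "f7a502e56e8b01c6dc242b35122683c9d25d07fb1f532d9853eb0ef3ff334f03",
]
K = 30  # Rice parameter used in the example

# First 4 bytes of each hash, read as a big-endian 32-bit integer, then sorted.
prefixes = sorted(int.from_bytes(bytes.fromhex(h)[:4], "big")
                  for h in FULL_HASHES_HEX)
first_value = prefixes[0]

# Adjacent differences between consecutive sorted prefixes.
deltas = [b - a for a, b in zip(prefixes, prefixes[1:])]

# Split each difference into a quotient (high bits) and a K-bit remainder.
splits = [(d >> K, d & ((1 << K) - 1)) for d in deltas]

for quotient, remainder in splits:
    unary = format((1 << quotient) - 1, "0{}b".format(quotient + 1))
    print(unary, format(remainder, "030b"))
```

The two printed lines correspond to the quotient and remainder bit patterns given in the text.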

When these numbers are formed into a byte string, little endian is used. Conceptually it may be easier to imagine a long bitstring being formed starting from the least significant bits: we take the quotient part of the first number and prepend the remainder part of the first number; we then further prepend the quotient part of the second number and prepend the remainder part. This should result in the following large number (linebreaks and comments added for clarity):

001110100010010011110110100011 # Second number, remainder part
0111 # Second number, quotient part
001011111010010000000000111010 # First number, remainder part
0 # First number, quotient part

Written in a single line this would be

00111010001001001111011010001101110010111110100100000000001110100

Obviously this number far exceeds the 8 bits available in a single byte. The little endian encoding then takes the least significant 8 bits in that number, and outputs it as the first byte which is 01110100. For clarity, we can group the above bitstring into groups of eight starting from the least significant bits:

0 01110100 01001001 11101101 00011011 10010111 11010010 00000000 01110100

The little endian encoding then takes each byte from the right and puts that into a bytestring:

01110100
00000000
11010010
10010111
00011011
11101101
01001001
01110100
00000000

It can be seen that since we conceptually prepend new parts to the large number on the left (i.e. adding more significant bits) but we encode from the right (i.e. the least significant bits), the encoding and decoding can be performed incrementally.

This finally results in

additions_four_bytes {
  first_value: 489866504
  rice_parameter: 30
  entries_count: 2
  encoded_data: "t\000\322\227\033\355It\000"
}

The client simply follows the above steps in reverse to decode the hash prefixes. Unlike v4, there is no need to perform a byte swap at the end due to the fact that the hash prefix integers are interpreted as big endian.
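The whole worked example can be reproduced with a small encoder and decoder. This is a sketch for illustration, with hypothetical function names; it uses Python's arbitrary-precision integers to model the long bitstring, which is convenient for small inputs but not how a production decoder would stream data.

```python
from typing import List, Sequence, Tuple


def rice_encode(sorted_values: Sequence[int], k: int) -> Tuple[int, bytes]:
    """Return (first_value, encoded_data) for a sorted list of integers."""
    bits = 0    # the conceptual large number, built from the least significant end
    nbits = 0
    for prev, cur in zip(sorted_values, sorted_values[1:]):
        delta = cur - prev
        quotient, remainder = delta >> k, delta & ((1 << k) - 1)
        bits |= ((1 << quotient) - 1) << nbits   # quotient in 1 + q bits
        nbits += quotient + 1
        bits |= remainder << nbits               # remainder in k bits
        nbits += k
    return sorted_values[0], bits.to_bytes((nbits + 7) // 8, "little")


def rice_decode(first_value: int, k: int, entries_count: int,
                encoded_data: bytes) -> List[int]:
    """Invert rice_encode, returning the sorted integers."""
    bits = int.from_bytes(encoded_data, "little")
    pos = 0
    values = [first_value]
    for _ in range(entries_count):
        quotient = 0
        while (bits >> pos) & 1:    # count one-bits up to the terminating zero
            quotient += 1
            pos += 1
        pos += 1                    # skip the terminating zero
        remainder = (bits >> pos) & ((1 << k) - 1)
        pos += k
        values.append(values[-1] + ((quotient << k) | remainder))
    return values


first, data = rice_encode([0x1d32c508, 0x291bc542, 0xf7a502e5], 30)
decoded = rice_decode(first, 30, 2, data)
```

Here first is 489866504 and data is the nine-byte string shown in encoded_data above; decoding returns the three original prefixes in sorted order.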

Decoding Removal Indices

Removal indices are encoded using the exact same technique as above using 32-bit integers. The encoding and decoding of removal indices have not changed between v4 and v5.

Available Lists

The following lists are recommended for use in v5alpha1:

List Name | Corresponding v4 ThreatType Enum | Description
gc | None | This list is a Global Cache list. It is a special list only used in the Real-Time mode of operation.
se | SOCIAL_ENGINEERING | This list contains threats of the SOCIAL_ENGINEERING threat type.
mw | MALWARE | This list contains threats of the MALWARE threat type for desktop platforms.
uws | UNWANTED_SOFTWARE | This list contains threats of the UNWANTED_SOFTWARE threat type for desktop platforms.
uwsa | UNWANTED_SOFTWARE | This list contains threats of the UNWANTED_SOFTWARE threat type for Android platforms.
pha | POTENTIALLY_HARMFUL_APPLICATION | This list contains threats of the POTENTIALLY_HARMFUL_APPLICATION threat type for Android platforms.

Additional lists will become available at a later date, at which time the above table will be expanded.

The client may operate a caching proxy server that retrieves some or all of the above lists and then contact that proxy server instead. If this is implemented, we recommend a short cache duration such as five minutes; in the future this cache duration may be communicated using the standard Cache-Control HTTP header.

Update Frequency

The client should inspect the value the server returns in the minimum_wait_duration field and use it to schedule the next update of the database. This value may be zero (in which case the minimum_wait_duration field is omitted entirely), and the client should then perform another update immediately.
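The scheduling rule reduces to a one-line computation. In this sketch, next_update_at is a hypothetical helper and the wait is assumed to have been converted to seconds already, with None representing a missing field.

```python
from typing import Optional


def next_update_at(now: float, minimum_wait_seconds: Optional[float]) -> float:
    """Return the earliest time at which the next update may be requested."""
    wait = minimum_wait_seconds or 0.0   # a missing field counts as zero
    return now + max(wait, 0.0)
```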

Example Requests

This section documents some examples of directly using the HTTP API to access Google Safe Browsing. It is generally recommended to use a generated language binding because it will automatically handle encoding and decoding in a convenient way. Please refer to the documentation for that binding.

Here is an example HTTP request using the hashes.search method:

GET https://safebrowsing.googleapis.com/v5/hashes:search?key=INSERT_YOUR_API_KEY_HERE&hashPrefixes=WwuJdQ

The response body is a protocol-buffer formatted payload that you may then decode.

Here is an example HTTP request using the hashLists.batchGet method:

GET https://safebrowsing.googleapis.com/v5alpha1/hashLists:batchGet?key=INSERT_YOUR_API_KEY_HERE&names=se&names=mw

The response body is, once again, a protocol-buffer formatted payload that you may then decode.
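The two request URLs above can be assembled programmatically. The helper names below are hypothetical, and the prefix encoding is an assumption inferred from the example value WwuJdQ: URL-safe base64 with the padding removed. Confirm the exact encoding against the API reference before relying on it.

```python
import base64
from typing import Iterable
from urllib.parse import urlencode

BASE = "https://safebrowsing.googleapis.com"


def encode_prefix(prefix: bytes) -> str:
    """Base64-encode a hash prefix (URL-safe alphabet, padding stripped)."""
    return base64.urlsafe_b64encode(prefix).decode("ascii").rstrip("=")


def search_url(api_key: str, prefixes: Iterable[bytes]) -> str:
    params = [("key", api_key)]
    params += [("hashPrefixes", encode_prefix(p)) for p in prefixes]
    return f"{BASE}/v5/hashes:search?{urlencode(params)}"


def batch_get_url(api_key: str, names: Iterable[str]) -> str:
    params = [("key", api_key)] + [("names", name) for name in names]
    return f"{BASE}/v5alpha1/hashLists:batchGet?{urlencode(params)}"
```

The value WwuJdQ in the example request corresponds to the four bytes 5b 0b 89 75 under this encoding.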

Migration Guide

If you are currently using the v4 Update API, there is a seamless migration path from v4 to v5 without having to reset or erase the local database. This section documents how to do that.

Converting List Updates

In v4, one would use the threatListUpdates.fetch method to download lists. In v5, one would switch to the hashLists.batchGet method.

The following changes should be made to the request:

  1. Remove the v4 ClientInfo object altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
  2. For each v4 ListUpdateRequest object:
    • Look up the corresponding v5 list name in the table above and supply that name in the v5 request.
    • Remove unneeded fields such as threat_entry_type or platform_type.
    • The state field in v4 is directly compatible with the v5 versions field. The same byte string that would be sent to the server using the state field in v4 can simply be sent in v5 using the versions field.
    • For the v4 constraints, v5 uses a simplified version called SizeConstraints. Additional fields such as region should be dropped.
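The list-name lookup in step 2 can be expressed as a small table. This sketch is illustrative only: the function names, the platform_class values "desktop" and "android", and the dictionary keys "name" and "version" are hypothetical stand-ins rather than the actual request schema, and how individual v4 platform types map to "desktop" is left to the caller.

```python
from typing import Dict

# (v4 threat type, platform class) -> v5 list name, following the table above.
_BY_TYPE_AND_PLATFORM: Dict[tuple, str] = {
    ("MALWARE", "desktop"): "mw",
    ("UNWANTED_SOFTWARE", "desktop"): "uws",
    ("UNWANTED_SOFTWARE", "android"): "uwsa",
    ("POTENTIALLY_HARMFUL_APPLICATION", "android"): "pha",
}


def v5_list_name(threat_type: str, platform_class: str) -> str:
    """Map a v4 threat type and a coarse platform class to a v5 list name."""
    if threat_type == "SOCIAL_ENGINEERING":
        return "se"   # the table lists no platform restriction for this list
    try:
        return _BY_TYPE_AND_PLATFORM[(threat_type, platform_class)]
    except KeyError:
        raise ValueError("no v5 list for this combination") from None


def convert_list_update_request(v4_request: dict, platform_class: str) -> dict:
    """Keep only the list name and the opaque state bytes."""
    return {
        "name": v5_list_name(v4_request["threatType"], platform_class),
        "version": v4_request.get("state", b""),   # v4 state is reused as-is
    }
```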

The following changes should be made to the response:

  1. The v4 enum ResponseType is simply replaced by a boolean field named partial_update.
  2. The minimum_wait_duration field can now be zero or omitted. If it is, the client is requested to immediately make another request. This only happens when the client specifies in SizeConstraints a smaller constraint on max update size than the max database size.
  3. The Rice decoding algorithm for 32-bit integers will need to be adjusted. The difference is that the encoded data are encoded with a different endianness. In both v4 and v5, 32-bit hash prefixes are sorted lexicographically. But in v4, those prefixes are treated as little endian when sorted, whereas in v5 those prefixes are treated as big endian when sorted. This means that the client does not need to do any sorting, since lexicographic sorting is identical to numeric sorting with big endian. An example of this sort in the Chromium implementation of v4 can be found here. Such sorting can be removed.
  4. The Rice decoding algorithm will need to be implemented for other hash lengths.

Converting Hash Searches

In v4, one would use the fullHashes.find method to get full hashes. The equivalent method in v5 is the hashes.search method.

The following changes should be made to the request:

  1. Structure the code to only send hash prefixes that are exactly 4 bytes in length.
  2. Remove the v4 ClientInfo objects altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
  3. Remove the client_states field. It is no longer necessary.
  4. It is no longer necessary to include threat_types and similar fields.

The following changes should be made to the response:

  1. The minimum_wait_duration field has been removed. The client can always issue a new request on an as-needed basis.
  2. The v4 ThreatMatch object has been simplified into the FullHash object.
  3. Caching has been simplified into a single cache duration. See the above procedures for interacting with the cache.