This document applies to the following method: Update API (v4): threatListUpdates.fetch.
About compression
Compression is a key feature of the Safe Browsing APIs (v4). Compression significantly reduces bandwidth requirements, which is particularly, but not exclusively, relevant for mobile devices. The Safe Browsing server currently supports Rice compression. Additional compression methods may be added in the future.
Compression is set using the supportedCompressions field and CompressionType. Clients should use the RICE and RAW compression types. Safe Browsing uses the COMPRESSION_TYPE_UNSPECIFIED type when the compression type is not set (RAW compression will be substituted).
The Safe Browsing server will also use standard HTTP compression to further compress responses, regardless of the compression type selected, as long as the client sets the correct HTTP compression header (see the Wikipedia article HTTP compression).
Rice compression
As noted, the Safe Browsing server currently supports Rice compression (see the Wikipedia article Golomb coding for a full discussion of Golomb-Rice coding).
Compression/decompression
The RiceDeltaEncoding object represents the Rice-Golomb encoded data and is used to send compressed removal indices or compressed 4-byte hash prefixes. (Hash prefixes longer than 4 bytes will not be compressed, and will be served in raw format instead.)
For removal indices, the list of indices is sorted in ascending order and then delta encoded using RICE encoding. For additions, the 4-byte hash prefixes are re-interpreted as little-endian uint32s, sorted in ascending order, and then delta encoded using RICE encoding. Note the difference in hash format between RICE compression and RAW: raw hashes are lexicographically sorted bytes, whereas Rice hashes are uint32s sorted in ascending order (after decompression).
That is, the list of integers [1, 5, 7, 13] will be encoded as 1 (the first value) and the deltas [4, 2, 6].
The first value is stored in the firstValue
field and the deltas are encoded using a Golomb-Rice
encoder. The Rice parameter k (see below) is stored in riceParameter. The numEntries
field
contains the number of deltas encoded in the Rice encoder (3 in our example above, not 4). The
encodedData
field contains the actual encoded deltas.
Encoder/decoder
In the Rice encoder/decoder every delta n is encoded as q and r where n = (q<<k) + r (or, n = q * (2**k) + r). k is a constant and a parameter of the Rice encoder/decoder. The values for q and r are encoded in the bit stream using different encoding schemes.
The quotient q is encoded in unary coding followed by a 0. That is, 3 would be encoded as 1110, 4 as 11110 and 7 as 11111110. The quotient q is decoded first.
The remainder r is encoded using truncated binary encoding. Only the least significant k bits of r are written (and therefore read) from the bit stream. The remainder r is decoded after having decoded q.
Bit encoder/decoder
The Rice encoder relies on a bit encoder/decoder where single bits can be appended to the bit encoder; that is, to encode a quotient q that could be only two bits long.
The bit encoder is a list of (8-bit) bytes. Bits are set from the lowest significant bit in the first byte to the highest significant bit in the first byte. If a byte has all its bits already set, a new byte (initialized to zero) is appended to the end of the byte list. If the last byte is not fully used, its highest significant bits are set to zero. Example:
Bits Added | BitEncoder After Adding Bits |
---|---|
[] | |
0 | [00000000] |
1 | [00000010] |
1 | [00000110] |
1,0,1 | [00101110] |
0,0,0 | [00101110, 00000000] |
1,1,0 | [00101110, 00000110] |