# API Differences between the libraries

We outline the key API differences between Apache DataSketches (`datasketches`, imported here as `asf`) and the `datasketch` library (imported as `ds`) that users should be aware of when working with sketches. Apache DataSketches is designed to expose, as far as possible, a consistent API across its sketch families. So although the examples below use only HyperLogLog, they map roughly onto the other sketches the library provides.

In [1]:
import numpy as np
import datasketches as asf
import datasketch as ds
import mmh3

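1. The `update()` method of `asf.hll_sketch` accepts integers, floats, strings, and bytes. By contrast, `datasketch.HyperLogLogPlusPlus` passes each input straight to its hash function: with its default SHA-1-based hash it requires bytes, and with the `mmh3.hash64` hash used below it accepts bytes and strings but rejects numbers.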

In [2]:
# The ASF HLL sketch accepts several input types.
# Each is hashed as a different item, so 1, 1.0, "1" and the bytes value count as four items.
asf_hll_types = asf.hll_sketch(14, asf.HLL_8)
asf_hll_types.update(1)
asf_hll_types.update(1.0)
asf_hll_types.update(str(1))

xx = 1
xx_bytes = xx.to_bytes(64, "little")
asf_hll_types.update(xx_bytes)

asf_hll_types.get_estimate()

4.000000029802323

In [3]:
# datasketch HLL hands inputs to its hashfunc; mmh3.hash64 accepts only bytes and str
dhll_type = ds.HyperLogLogPlusPlus(14, hashfunc=lambda x: mmh3.hash64(x, signed=False)[0])
try:
    dhll_type.update(1)
except TypeError:
    print("Exception on integer input")

try:
    dhll_type.update(1.0)
except TypeError:
    print("Exception on float input")

try:
    dhll_type.update(xx_bytes)
    print("Accepts bytes input")
except TypeError:
    print("Exception on bytes input")

try:
    dhll_type.update(str(1))
    print("Accepts string input")
except TypeError:
    print("Exception on string input")

print(dhll_type.count())  # only two distinct items were inserted into the sketch

Exception on integer input
Exception on float input
Accepts bytes input
Accepts string input
2.000122080247517
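
Because `datasketch` hands inputs straight to its `hashfunc`, numeric values have to be encoded by the caller. A minimal sketch of one way to do this: the hypothetical helper `to_hashable_bytes` (not part of either library) prefixes a one-byte type tag so that `1`, `1.0`, `"1"` and `b"1"` remain distinct items, as they are in `asf.hll_sketch`. The tags and encodings are our own assumption and will not reproduce ASF's internal hashing.

```python
import struct


def to_hashable_bytes(value):
    """Hypothetical helper: encode int, float, str or bytes as tagged bytes.

    The leading tag byte keeps values of different types distinct.
    """
    if isinstance(value, bool):
        # bool is a subclass of int; refuse it rather than guess an encoding
        raise TypeError("bool is ambiguous; convert it explicitly")
    if isinstance(value, int):
        # 8-byte little-endian signed; raises OverflowError outside the int64 range
        return b"i" + value.to_bytes(8, "little", signed=True)
    if isinstance(value, float):
        # IEEE 754 double, little-endian
        return b"f" + struct.pack("<d", value)
    if isinstance(value, str):
        return b"s" + value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        return b"b" + bytes(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Each value can then be inserted with `dhll_type.update(to_hashable_bytes(value))`, which works with both the default hash and the `mmh3` hash used above.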


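2. The ASF HLL implementation provides `get_lower_bound()` and `get_upper_bound()` methods. Each takes a number of standard deviations (1, 2, or 3) and returns the corresponding bound on the true count, so the user can judge how much confidence to place in the estimate. The `datasketch` implementation, on the other hand, returns only the point estimate from `count()`.
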
In [20]:
a_hll = asf.hll_sketch(14, asf.HLL_8)
d_hll = ds.HyperLogLogPlusPlus(14, hashfunc=lambda x: mmh3.hash64(x, signed=False)[0])

n = 1<<15
for x in range(n):
    a_hll.update(x)
    d_hll.update(str(x))

In [21]:
print(f"Lower bound (1 std. dev) as % of true value: {(100*a_hll.get_lower_bound(1) / n):.4f}")
print(f"ASF HyperLogLog estimate as % of true value: {(100*a_hll.get_estimate() / n):.4f}")
print(f"Upper bound (1 std. dev) as % of true value: {(100*a_hll.get_upper_bound(1) / n):.4f}")


Lower bound (1 std. dev) as % of true value: 99.5952
ASF HyperLogLog estimate as % of true value: 100.2430
Upper bound (1 std. dev) as % of true value: 100.8992


In [22]:
print(f"datasketch HyperLogLog estimate as % of true value: {(100*d_hll.count() / n):.4f}")

datasketch HyperLogLog estimate as % of true value: 100.6836
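
Since `datasketch` returns only `count()`, a user who needs an error estimate has to derive one. A minimal sketch, assuming the classical HyperLogLog relative standard error of about 1.04/sqrt(m) with m = 2^p registers: the helper `approx_hll_bounds` is hypothetical (not part of either library), mirrors ASF's 1/2/3 standard-deviation convention by choice, and uses an asymptotic formula that ignores HyperLogLog++'s small-cardinality corrections.

```python
import math


def approx_hll_bounds(estimate, p, num_std_devs=1):
    """Hypothetical helper: approximate bounds for a bare HLL estimate.

    Assumes the classical relative standard error 1.04 / sqrt(m), m = 2**p.
    This is not calibrated against datasketch's own estimator.
    """
    if num_std_devs not in (1, 2, 3):
        raise ValueError("num_std_devs must be 1, 2 or 3")
    rse = 1.04 / math.sqrt(1 << p)  # relative standard error per std dev
    return (estimate * (1 - num_std_devs * rse),
            estimate * (1 + num_std_devs * rse))
```

For example, `approx_hll_bounds(d_hll.count(), 14)` gives a relative half-width of 1.04/128, about 0.81%, around the `datasketch` estimate. That is wider than the roughly ±0.65% interval ASF reports above for the same `lg_k`; the ASF bounds come from the library's own error model, so the two intervals are not directly interchangeable.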
