pyspark.sql.functions.hll_sketch_estimate
pyspark.sql.functions.hll_sketch_estimate(col: ColumnOrName) → pyspark.sql.column.Column

Returns the estimated number of unique values, given the binary representation of a Datasketches HllSketch in col (for example, a sketch produced by hll_sketch_agg).
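To show what "estimated" means, here is a minimal plain-Python sketch of the classic HyperLogLog estimator (Flajolet et al.) with a linear-counting correction for small cardinalities. It is not the Datasketches HllSketch algorithm or its binary format, which use their own hashing, sketch modes, and estimators. The helper names `build_registers` and `estimate`, the SHA-256-based hash, and the precision `p=12` are all illustrative assumptions.

```python
import hashlib
import math


def _hash64(value) -> int:
    # Illustrative assumption: a stable 64-bit hash taken from SHA-256 of str(value).
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_registers(values, p: int = 12) -> list:
    """Hypothetical helper: fill 2**p registers with the max rank seen per bucket."""
    m = 1 << p
    registers = [0] * m
    width = 64 - p
    for v in values:
        h = _hash64(v)
        idx = h >> width                  # top p bits choose the register
        rest = h & ((1 << width) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is all zeros).
        rank = width - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    return registers


def estimate(registers: list) -> float:
    """Hypothetical helper: classic HLL estimate with linear counting for small n."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)      # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)    # linear counting
    return raw
```

Each register keeps only the maximum rank it has seen, so repeated values never change the registers. That is why the result estimates unique values rather than rows.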
New in version 3.5.0.
Examples
>>> from pyspark.sql.functions import hll_sketch_agg, hll_sketch_estimate
>>> df = spark.createDataFrame([1, 2, 2, 3], "INT")
>>> df = df.agg(hll_sketch_estimate(hll_sketch_agg("value")).alias("distinct_cnt"))
>>> df.show()
+------------+
|distinct_cnt|
+------------+
|           3|
+------------+