pyspark.sql.functions.grouping_id(*cols)
Aggregate function: returns the level of grouping, equal to

(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + … + grouping(cn)
New in version 2.0.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
cols : Column
    columns to check for.

Returns
Column
    the level of grouping it relates to.
Notes
The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).
Examples
>>> from pyspark.sql.functions import grouping_id, sum
>>> df = spark.createDataFrame([(1, "a", "a"),
...                             (3, "a", "a"),
...                             (4, "b", "c")], ["c1", "c2", "c3"])
>>> df.cube("c2", "c3").agg(grouping_id(), sum("c1")).orderBy("c2", "c3").show()
+----+----+-------------+-------+
|  c2|  c3|grouping_id()|sum(c1)|
+----+----+-------------+-------+
|null|null|            3|      8|
|null|   a|            2|      4|
|null|   c|            2|      4|
|   a|null|            1|      4|
|   a|   a|            0|      4|
|   b|null|            1|      4|
|   b|   c|            0|      4|
+----+----+-------------+-------+
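The bit formula above can be verified by hand without Spark. The following sketch (plain Python, with a hypothetical helper name `grouping_id_value`) combines the per-column `grouping()` bits into the integer that `grouping_id()` reports: bit i is 1 when grouping column c(i+1) is aggregated away (shown as null in the cube output), and 0 when it is present.

```python
def grouping_id_value(bits):
    """Combine per-column grouping bits into the grouping_id value.

    bits[i] is 1 when grouping column c(i+1) is aggregated away
    (null in the cube/rollup output row), 0 when it is present.
    Equivalent to (bits[0] << (n-1)) + ... + bits[n-1].
    """
    gid = 0
    for b in bits:
        gid = (gid << 1) | b  # shift previous bits left, append this one
    return gid

# Rows from the cube("c2", "c3") example above:
print(grouping_id_value([1, 1]))  # 3: grand-total row |null|null|
print(grouping_id_value([1, 0]))  # 2: rows like |null|   a|
print(grouping_id_value([0, 1]))  # 1: rows like |   a|null|
print(grouping_id_value([0, 0]))  # 0: fully grouped rows like |   a|   a|
```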