pyspark.sql.functions.max_by(col, ord)
Aggregate function: returns the value of the target column in the row where ord reaches its maximum.
New in version 3.3.0.
Parameters
col : Column or str
    target column whose value will be returned
ord : Column or str
    column to be maximized
Returns
Column
    value associated with the maximum value of ord.
Examples
>>> from pyspark.sql.functions import max_by
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(max_by("year", "earnings")).show()
+------+----------------------+
|course|max_by(year, earnings)|
+------+----------------------+
|  Java|                  2013|
|dotNET|                  2013|
+------+----------------------+
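To make the per-group semantics concrete, the following plain-Python sketch mimics what the aggregation in the example computes. It is only an illustration of the idea and not how Spark implements max_by. The helper name max_by_per_group is hypothetical, and the tie-breaking here (Python's max keeps the first maximal row) is a property of this sketch, not a statement about Spark.

```python
from collections import defaultdict

# Same rows as the example above: (course, year, earnings).
rows = [
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("dotNET", 2013, 48000), ("Java", 2013, 30000),
]

def max_by_per_group(rows):
    """Hypothetical helper: for each course, return the year taken from
    the row whose earnings value is the largest in that course."""
    groups = defaultdict(list)
    for course, year, earnings in rows:
        groups[course].append((year, earnings))
    # For each group, pick the (year, earnings) pair with the largest
    # earnings, then keep only its year.
    return {
        course: max(pairs, key=lambda pair: pair[1])[0]
        for course, pairs in groups.items()
    }

result = max_by_per_group(rows)
print(result)
```

Here year plays the role of col (the value returned) and earnings plays the role of ord (the value maximized); both courses earned the most in 2013, which matches the table above.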