pyspark.sql.DataFrame.sort

DataFrame.sort(*cols: Union[str, pyspark.sql.column.Column, List[Union[str, pyspark.sql.column.Column]]], **kwargs: Any) → pyspark.sql.dataframe.DataFrame[source]

Returns a new DataFrame sorted by the specified column(s).

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
colsstr, list, or Column, optional

list of Column or column names to sort by.

Returns
DataFrame

Sorted DataFrame.

Other Parameters
ascendingbool or list, optional, default True

boolean or list of boolean. Sort ascending vs. descending. Specify list for multiple sort orders. If a list is specified, the length of the list must equal the length of the cols.

Examples

>>> from pyspark.sql.functions import desc, asc
>>> df = spark.createDataFrame([
...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])

Sort the DataFrame in ascending order.

>>> df.sort(asc("age")).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+

Sort the DataFrame in descending order.

>>> df.sort(df.age.desc()).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
>>> df.orderBy(df.age.desc()).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+
>>> df.sort("age", ascending=False).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
+---+-----+

Specify multiple columns

>>> df = spark.createDataFrame([
...     (2, "Alice"), (2, "Bob"), (5, "Bob")], schema=["age", "name"])
>>> df.orderBy(desc("age"), "name").show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|Alice|
|  2|  Bob|
+---+-----+

Specify multiple columns for sorting order at ascending.

>>> df.orderBy(["age", "name"], ascending=[False, False]).show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  2|  Bob|
|  2|Alice|
+---+-----+