pyspark.sql.DataFrame.unpivot¶

DataFrame.unpivot(ids: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, …]], values: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, …], None], variableColumnName: str, valueColumnName: str) → DataFrame[source]¶

Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set. This is the reverse to groupBy(…).pivot(…).agg(…), except for the aggregation, which cannot be reversed.

This function is useful to massage a DataFrame into a format where some columns are identifier columns (“ids”), while all other columns (“values”) are “unpivoted” to the rows, leaving just two non-id columns, named as given by variableColumnName and valueColumnName.

When no “id” columns are given, the unpivoted DataFrame consists of only the “variable” and “value” columns.

The values columns must not be empty so at least one value must be given to be unpivoted. When values is None, all non-id columns will be unpivoted.

All “value” columns must share a least common data type. Unless they are the same data type, all “value” columns are cast to the nearest common data type. For instance, types IntegerType and LongType are cast to LongType, while IntegerType and StringType do not have a common data type and unpivot fails.

New in version 3.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

idsstr, Column, tuple, list: Column(s) to use as identifiers. Can be a single column or column name, or a list or tuple for multiple columns.
valuesstr, Column, tuple, list, optional: Column(s) to unpivot. Can be a single column or column name, or a list or tuple for multiple columns. If specified, must not be empty. If not specified, uses all columns that are not set as ids.
variableColumnNamestr: Name of the variable column.
valueColumnNamestr: Name of the value column.

Returns

DataFrame: Unpivoted DataFrame.

See also

DataFrame.melt

Examples

>>> df = spark.createDataFrame(
...     [(1, 11, 1.1), (2, 12, 1.2)],
...     ["id", "int", "double"],
... )
>>> df.show()
+---+---+------+
| id|int|double|
+---+---+------+
|  1| 11|   1.1|
|  2| 12|   1.2|
+---+---+------+

>>> df.unpivot("id", ["int", "double"], "var", "val").show()
+---+------+----+
| id|   var| val|
+---+------+----+
|  1|   int|11.0|
|  1|double| 1.1|
|  2|   int|12.0|
|  2|double| 1.2|
+---+------+----+

pyspark.sql.DataFrame.unpersist pyspark.sql.DataFrame.where