pyspark.sql.DataFrame.union

DataFrame.union(other: pyspark.sql.dataframe.DataFrame) → pyspark.sql.dataframe.DataFrame

Return a new DataFrame containing the union of rows in this and another DataFrame.

New in version 2.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
other : DataFrame

Another DataFrame to union with this one.

Returns
DataFrame

Notes

This is equivalent to UNION ALL in SQL. To do a SQL-style set union (which deduplicates rows), call distinct() on the result of this function.

As is standard in SQL, this function resolves columns by position, not by name.

Examples

>>> df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
>>> df1.union(df2).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   4|   5|   6|
+----+----+----+
>>> df1.union(df1).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   1|   2|   3|
+----+----+----+