pyspark.sql.DataFrameWriter.saveAsTable

DataFrameWriter.saveAsTable(name: str, format: Optional[str] = None, mode: Optional[str] = None, partitionBy: Union[str, List[str], None] = None, **options: OptionalPrimitiveType) → None

Saves the content of the DataFrame as the specified table.

If the table already exists, the behavior of this function depends on the save mode, specified by the mode function (by default, an exception is thrown). When mode is Overwrite, the schema of the DataFrame does not need to match that of the existing table.

  • append: Append contents of this DataFrame to existing data.

  • overwrite: Overwrite existing data.

  • error or errorifexists: Throw an exception if data already exists.

  • ignore: Silently ignore this operation if data already exists.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
name : str

the table name

format : str, optional

the format used to save

mode : str, optional

one of append, overwrite, error, errorifexists, ignore (default: error)

partitionBy : str or list, optional

names of partitioning columns

**options : dict

all other string options

Notes

When mode is Append and the table already exists, the format and options of the existing table are used. The column order in the schema of the DataFrame doesn’t need to be the same as that of the existing table. Unlike DataFrameWriter.insertInto(), which matches columns by position, DataFrameWriter.saveAsTable() uses the column names to find the correct column positions.

Examples

Create a table from a DataFrame and read it back.

>>> _ = spark.sql("DROP TABLE IF EXISTS tblA")
>>> spark.createDataFrame([
...     (100, "Hyukjin Kwon"), (120, "Hyukjin Kwon"), (140, "Haejoon Lee")],
...     schema=["age", "name"]
... ).write.saveAsTable("tblA")
>>> spark.read.table("tblA").sort("age").show()
+---+------------+
|age|        name|
+---+------------+
|100|Hyukjin Kwon|
|120|Hyukjin Kwon|
|140| Haejoon Lee|
+---+------------+
>>> _ = spark.sql("DROP TABLE tblA")