DataFrameWriter.bucketBy(numBuckets, col, *cols)
Buckets the output by the given columns. If specified, the output is laid out on the file system similarly to Hive’s bucketing scheme, but it uses a different bucket hash function and is therefore not compatible with Hive’s bucketing.
New in version 2.3.0.
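As a rough illustration of what bucketing by columns means, the sketch below assigns each row to one of a fixed number of buckets from a hash of its bucketing-column values. It runs without Spark and does not reproduce Spark's hash function: `zlib.crc32` is a stand-in chosen only because it is deterministic, and `bucket_id` and the sample rows are hypothetical names introduced for this example.

```python
import zlib


def bucket_id(values, num_buckets):
    """Illustrative only: map a tuple of bucketing-column values to a bucket."""
    # Stand-in hash (CRC32 of the tuple's repr). Spark uses its own hash
    # function, so real bucket numbers will differ from these.
    key = repr(tuple(values)).encode("utf-8")
    return zlib.crc32(key) % num_buckets


rows = [
    {"year": 2020, "month": 1, "amount": 10},
    {"year": 2020, "month": 1, "amount": 25},
    {"year": 2021, "month": 7, "amount": 5},
]

# Rows with equal (year, month) values always receive the same bucket id.
ids = [bucket_id((r["year"], r["month"]), 100) for r in rows]
```

The property that matters is that equal bucketing-column values always map to the same bucket, and that every id falls in the range 0 to numBuckets - 1.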
Parameters
numBuckets : int
    the number of buckets to save
col : str, list or tuple
    a name of a column, or a list of names.
cols : str
    additional names (optional). If col is a list it should be empty.
Notes
Applicable for file-based data sources in combination with DataFrameWriter.saveAsTable().
Examples
>>> (df.write.format('parquet')
...     .bucketBy(100, 'year', 'month')
...     .mode("overwrite")
...     .saveAsTable('bucketed_table'))