pyspark.sql.functions.partitioning.bucket
- pyspark.sql.functions.partitioning.bucket(numBuckets, col)
Partition transform function: A transform for any type that partitions by a hash of the input column.
New in version 4.0.0.
- Parameters
- numBuckets
Column
or int the number of buckets.
- col
Column
or str target column to work on.
- Returns
Column
data partitioned by the given columns.
Notes
This function can be used only in combination with the
partitionedBy()
method of DataFrameWriterV2.
Examples
>>> df.writeTo("catalog.db.table").partitionedBy(
...     partitioning.bucket(42, "ts")
... ).createOrReplace()
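The idea behind the transform can be illustrated with a small, self-contained sketch in plain Python. This is only a conceptual model, not Spark's implementation (Spark hashes with Murmur3 internally): equal input values always hash to the same bucket, and every bucket index falls in the range [0, numBuckets). The `assign_bucket` helper below is hypothetical, for illustration only.

```python
def assign_bucket(value, num_buckets):
    # Toy deterministic hash over the value's bytes; Spark's actual
    # bucket transform uses Murmur3, but the modulo step is the same.
    h = sum((i + 1) * b for i, b in enumerate(str(value).encode("utf-8")))
    return h % num_buckets

# Rows with equal values land in the same bucket...
assert assign_bucket("2024-01-01", 42) == assign_bucket("2024-01-01", 42)

# ...and every assignment is a valid bucket index in [0, 42).
timestamps = ["2024-01-01", "2024-01-02", "2024-02-15"]
assert all(0 <= assign_bucket(ts, 42) < 42 for ts in timestamps)
```

Because the bucket index depends only on the value's hash, bucketing spreads rows evenly across a fixed number of partitions regardless of the column's value distribution, which is why it works for any input type.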