Repartition by range
repartitionByRange.Rd
The following options for repartition by range are possible:

1. Return a new SparkDataFrame range partitioned by the given columns into numPartitions.

2. Return a new SparkDataFrame range partitioned by the given column(s), using spark.sql.shuffle.partitions as the number of partitions.

At least one partition-by expression must be specified. When no explicit sort order is specified, "ascending nulls first" is assumed.
Usage
repartitionByRange(x, ...)
# S4 method for class 'SparkDataFrame'
repartitionByRange(x, numPartitions = NULL, col = NULL, ...)
Details
Note that, for performance reasons, this method uses sampling to estimate the ranges. Hence, the output may not be consistent, since sampling can return different values. The sample size can be controlled by the config spark.sql.execution.rangeExchange.sampleSizePerPartition.
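The sampling granularity mentioned above can be inspected and tuned at runtime. A minimal sketch, assuming an active Spark session (the value 200 is an illustrative choice, not a recommendation):

# Read the current per-partition sample size used to estimate range boundaries
sparkR.conf("spark.sql.execution.rangeExchange.sampleSizePerPartition")
# Increase it via a SQL SET command; a larger sample tends to give more evenly
# sized range partitions at the cost of a more expensive sampling pass
sql("SET spark.sql.execution.rangeExchange.sampleSizePerPartition=200")

This config only affects how range boundaries are estimated; it does not change the number of output partitions.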
See also
Other SparkDataFrame functions: SparkDataFrame-class, agg(), alias(), arrange(), as.data.frame(), attach,SparkDataFrame-method, broadcast(), cache(), checkpoint(), coalesce(), collect(), colnames(), coltypes(), createOrReplaceTempView(), crossJoin(), cube(), dapply(), dapplyCollect(), describe(), dim(), distinct(), drop(), dropDuplicates(), dropna(), dtypes(), except(), exceptAll(), explain(), filter(), first(), gapply(), gapplyCollect(), getNumPartitions(), group_by(), head(), hint(), histogram(), insertInto(), intersect(), intersectAll(), isLocal(), isStreaming(), join(), limit(), localCheckpoint(), merge(), mutate(), ncol(), nrow(), persist(), printSchema(), randomSplit(), rbind(), rename(), repartition(), rollup(), sample(), saveAsTable(), schema(), select(), selectExpr(), show(), showDF(), storageLevel(), str(), subset(), summary(), take(), toJSON(), union(), unionAll(), unionByName(), unpersist(), unpivot(), with(), withColumn(), withWatermark(), write.df(), write.jdbc(), write.json(), write.orc(), write.parquet(), write.stream(), write.text()
Examples
if (FALSE) { # \dontrun{
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
# Range partition by col1 and col2, using spark.sql.shuffle.partitions partitions
newDF <- repartitionByRange(df, col = df$col1, df$col2)
# Range partition by col1 and col2 into 3 partitions
newDF <- repartitionByRange(df, 3L, col = df$col1, df$col2)
} # }