pyspark.sql.DataFrameStatFunctions

class pyspark.sql.DataFrameStatFunctions(df)

Provides statistical functions for a DataFrame.

New in version 1.4.

Methods

approxQuantile(col, probabilities, relativeError)

Calculates the approximate quantiles of numerical columns of a DataFrame.

corr(col1, col2[, method])

Calculates the correlation of two columns of a DataFrame as a double value.

cov(col1, col2)

Calculates the sample covariance of the given columns, specified by their names, as a double value.

crosstab(col1, col2)

Computes a pair-wise frequency table of the given columns.

freqItems(cols[, support])

Finds frequent items for columns, possibly with false positives.

sampleBy(col, fractions[, seed])

Returns a stratified sample without replacement based on the fraction given for each stratum.
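Examples

The sketch below illustrates, in plain Python, the quantities that cov, corr, crosstab and sampleBy are described as returning. It does not use Spark and is not how Spark computes them. The helper names (sample_cov, pearson_corr, crosstab, sample_by) and the sample rows are hypothetical, the correlation shown assumes the Pearson method, and the sampling helper assumes each row is kept independently with its stratum's fraction and that an unlisted stratum has fraction zero. approxQuantile and freqItems are approximate algorithms and are not sketched here.

```python
import random
from collections import Counter
from math import sqrt


def sample_cov(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_corr(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


def crosstab(rows, col1, col2):
    """Pair-wise frequency table: count of each (col1 value, col2 value) pair."""
    return Counter((row[col1], row[col2]) for row in rows)


def sample_by(rows, col, fractions, seed=None):
    """Stratified sample without replacement.

    Assumption: each row is kept independently with the probability given
    for its stratum; a stratum missing from `fractions` is treated as 0.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fractions.get(row[col], 0.0)]


# Hypothetical data standing in for DataFrame rows.
rows = [
    {"key": "a", "label": "u", "x": 1.0, "y": 2.0},
    {"key": "a", "label": "v", "x": 2.0, "y": 4.0},
    {"key": "b", "label": "u", "x": 3.0, "y": 6.0},
    {"key": "b", "label": "u", "x": 4.0, "y": 8.0},
]
xs = [row["x"] for row in rows]
ys = [row["y"] for row in rows]

covariance = sample_cov(xs, ys)
correlation = pearson_corr(xs, ys)
table = crosstab(rows, "key", "label")
sample = sample_by(rows, "key", {"a": 1.0, "b": 0.0}, seed=7)
```

With a real DataFrame df, the corresponding methods are reached through df.stat, for example df.stat.cov("x", "y") or df.stat.sampleBy("key", fractions={"a": 1.0, "b": 0.0}, seed=7).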