pyspark.sql.functions.corr

pyspark.sql.functions.corr(col1: ColumnOrName, col2: ColumnOrName) → pyspark.sql.column.Column

Returns a new Column for the Pearson Correlation Coefficient for col1 and col2.

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col1 : Column or str

first column used to calculate the correlation.

col2 : Column or str

second column used to calculate the correlation.

Returns
Column

Pearson Correlation Coefficient of these two column values.

Examples

>>> from pyspark.sql.functions import corr
>>> a = range(20)
>>> b = [2 * x for x in range(20)]
>>> df = spark.createDataFrame(zip(a, b), ["a", "b"])
>>> df.agg(corr("a", "b").alias('c')).collect()
[Row(c=1.0)]
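
For reference, the quantity being computed can be sketched in plain Python. This is only an illustration of the textbook Pearson formula (covariance divided by the product of the standard deviations), not Spark's actual implementation; the helper name `pearson_corr` is hypothetical and is not part of PySpark.

```python
import math


def pearson_corr(xs, ys):
    """Hypothetical helper: textbook Pearson correlation of two sequences.

    Not part of PySpark. Assumes both inputs are non-empty, have the same
    length, and have non-zero variance (otherwise the division fails).
    """
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Same data as the example above: b is an exact linear function of a.
a = range(20)
b = [2 * x for x in range(20)]
r = pearson_corr(a, b)
print(round(r, 6))  # 1.0
```

Because `b` is a positive multiple of `a`, the coefficient is 1.0, matching the result returned by `corr` above. Reversing the direction of one sequence would give -1.0.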