pyspark.pandas.DataFrame.corr
DataFrame.corr(method: str = 'pearson') → pyspark.pandas.frame.DataFrame

Compute pairwise correlation of columns, excluding NA/null values.
- Parameters
- method : {'pearson', 'spearman'}
    - pearson : standard correlation coefficient
    - spearman : Spearman rank correlation
- Returns
- y : DataFrame
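The two `method` options can be illustrated without Spark. The plain-Python sketch below uses hypothetical function names that are not part of pyspark.pandas. It computes the Pearson coefficient directly, obtains Spearman's coefficient as the Pearson coefficient of the ranks (tied values share their average rank), and then builds the column-by-column matrix that `corr()` returns. It only shows what the numbers mean, not how pandas-on-Spark computes them internally.

```python
import math
from typing import Dict, List, Sequence

# Illustrative sketch only: all names here are hypothetical, not pyspark.pandas API.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def corr_matrix(
    columns: Dict[str, Sequence[float]], method: str = "pearson"
) -> Dict[str, Dict[str, float]]:
    """Correlation of every column against every other column (symmetric matrix)."""
    funcs = {"pearson": pearson, "spearman": spearman}
    if method not in funcs:
        raise ValueError("method must be 'pearson' or 'spearman'")
    f = funcs[method]
    return {a: {b: f(columns[a], columns[b]) for b in columns} for a in columns}


# Same data as the Examples section.
cols = {"dogs": [0.2, 0.0, 0.6, 0.2], "cats": [0.3, 0.6, 0.0, 0.1]}
print(round(corr_matrix(cols, "pearson")["dogs"]["cats"], 6))   # -0.851064
print(round(corr_matrix(cols, "spearman")["dogs"]["cats"], 6))  # -0.948683
```

The off-diagonal values match the doctest output in the Examples section, and the diagonal is 1 (up to floating-point rounding), since every column is perfectly correlated with itself.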
See also
Notes
There are behavior differences between pandas-on-Spark and pandas.
- the method argument only accepts 'pearson' and 'spearman'
- the data should not contain NaNs; if it does, pandas-on-Spark returns an error
- pandas-on-Spark doesn't support the following argument(s):
    - min_periods argument is not supported
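Because pandas-on-Spark returns an error when the data contains NaNs, missing values have to be removed or filled before calling `corr()`, for example with `df.dropna().corr()`. The plain-Python sketch below uses hypothetical helper names and only illustrates what dropping incomplete rows (listwise deletion) does to the columns.

```python
import math
from typing import Dict, List, Optional, Sequence

# Illustrative sketch only: helper names are hypothetical, not pyspark.pandas API.
Value = Optional[float]


def is_missing(v: Value) -> bool:
    """Treat both None (null) and float NaN as missing."""
    return v is None or math.isnan(v)


def any_missing(columns: Dict[str, Sequence[Value]]) -> bool:
    """True if any column contains a missing value."""
    return any(is_missing(v) for col in columns.values() for v in col)


def drop_incomplete_rows(columns: Dict[str, Sequence[Value]]) -> Dict[str, List[float]]:
    """Listwise deletion: keep only the rows in which every column has a value."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    keep = [i for i in range(n_rows) if not any(is_missing(columns[c][i]) for c in names)]
    return {c: [columns[c][i] for i in keep] for c in names}


data = {"a": [1.0, float("nan"), 3.0, 4.0], "b": [2.0, 1.0, None, 5.0]}
print(any_missing(data))   # True
clean = drop_incomplete_rows(data)
print(clean)               # {'a': [1.0, 4.0], 'b': [2.0, 5.0]}
print(any_missing(clean))  # False
```

Listwise deletion removes a row if any column is missing, whereas pandas excludes missing values pair by pair. With three or more columns the two approaches can therefore produce different coefficients.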
Examples
>>> df = ps.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr('pearson')
          dogs      cats
dogs  1.000000 -0.851064
cats -0.851064  1.000000

>>> df.corr('spearman')
          dogs      cats
dogs  1.000000 -0.948683
cats -0.948683  1.000000