GroupBy.median
Compute median of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
Note
Unlike the pandas median, the median in pandas-on-Spark is an approximation based on approximate percentile computation, because computing an exact median across a large dataset is extremely expensive.
Parameters
numeric_only
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.
New in version 3.4.0.
Returns
Median of values within each group.
Examples
>>> psdf = ps.DataFrame({'a': [1., 1., 1., 1., 2., 2., 2., 3., 3., 3.],
...                      'b': [2., 3., 1., 4., 6., 9., 8., 10., 7., 5.],
...                      'c': [3., 5., 2., 5., 1., 2., 6., 4., 3., 6.]},
...                     columns=['a', 'b', 'c'],
...                     index=[7, 2, 4, 1, 3, 4, 9, 10, 5, 6])
>>> psdf
      a     b    c
7   1.0   2.0  3.0
2   1.0   3.0  5.0
4   1.0   1.0  2.0
1   1.0   4.0  5.0
3   2.0   6.0  1.0
4   2.0   9.0  2.0
9   2.0   8.0  6.0
10  3.0  10.0  4.0
5   3.0   7.0  3.0
6   3.0   5.0  6.0
DataFrameGroupBy
>>> psdf.groupby('a').median().sort_index()
       b    c
a
1.0  2.0  3.0
2.0  8.0  2.0
3.0  7.0  4.0
SeriesGroupBy
>>> psdf.groupby('a')['b'].median().sort_index()
a
1.0    2.0
2.0    8.0
3.0    7.0
Name: b, dtype: float64
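The results above can differ from what pandas returns for the same data. For group 1.0, column b holds 1, 2, 3 and 4, for which pandas would report the interpolated median 2.5, while the output above shows 2.0. The following pure-Python sketch illustrates one way this can arise. It assumes that the percentile-based median returns an observed value at the lower-median rank rather than interpolating between the two middle values. The helper `lower_rank_median` is hypothetical and written only for this illustration; it is not part of pandas-on-Spark.

```python
import math
import statistics
from collections import defaultdict


def lower_rank_median(values):
    """Return the observed value at rank ceil(n / 2) in sorted order.

    Hypothetical helper: assumes a percentile-based median that picks an
    element of the data and never interpolates between two values.
    """
    ordered = sorted(values)
    return ordered[math.ceil(len(ordered) / 2) - 1]


# Columns 'a' (group key) and 'b' from the example DataFrame.
a = [1., 1., 1., 1., 2., 2., 2., 3., 3., 3.]
b = [2., 3., 1., 4., 6., 9., 8., 10., 7., 5.]

# Collect the values of 'b' for each group key in 'a'.
groups = defaultdict(list)
for key, value in zip(a, b):
    groups[key].append(value)

rank_based = {k: lower_rank_median(v) for k, v in sorted(groups.items())}
interpolated = {k: statistics.median(v) for k, v in sorted(groups.items())}

print(rank_based)    # {1.0: 2.0, 2.0: 8.0, 3.0: 7.0}
print(interpolated)  # {1.0: 2.5, 2.0: 8.0, 3.0: 7.0}
```

Under this assumption, the two definitions agree for groups with an odd number of rows and can differ for groups with an even number of rows, which is the pattern seen in the SeriesGroupBy output.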