pyspark.pandas.DataFrame.prod
DataFrame.prod(axis: Union[int, str, None] = None, skipna: bool = True, numeric_only: bool = None, min_count: int = 0) → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, Series]
Return the product of the values.
Note
Unlike pandas, pandas-on-Spark emulates the product with the exp(sum(log(...))) trick. Therefore, it only works for positive numbers.
Parameters
- axis: {index (0), columns (1)}
Axis for the function to be applied on.
- skipna: bool, default True
Exclude NA/null values when computing the result.
Changed in version 3.4.0: Supported including NA/null values.
- numeric_only: bool, default None
Include only float, int, boolean columns. False is not supported. This parameter is mainly for pandas compatibility.
- min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
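The min_count rule described above can be sketched in plain Python. This is only an illustration of the documented semantics, not the pandas-on-Spark implementation: the helper names `_is_na` and `prod_min_count` are hypothetical, NA is assumed to mean `None` or a float NaN, and NA values are always skipped (the `skipna=True` case).

```python
import math


def _is_na(value):
    # Assumption: treat None and float NaN as NA/null.
    return value is None or (isinstance(value, float) and math.isnan(value))


def prod_min_count(values, min_count=0):
    """Product of the non-NA values, or NaN if fewer than min_count are valid."""
    valid = [v for v in values if not _is_na(v)]
    if len(valid) < min_count:
        return float("nan")
    # The product of zero valid values is the multiplicative identity, 1.
    return math.prod(valid, start=1.0)


empty_default = prod_min_count([])                 # no values, min_count=0
empty_strict = prod_min_count([], min_count=1)     # no values, one required
with_missing = prod_min_count([2.0, None, 3.0])    # None is skipped
```

With the default `min_count=0`, an empty input yields the identity `1.0`; requiring at least one valid value turns the same input into NaN.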
Examples
On a DataFrame:
Non-numeric columns are not included in the result.
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'A': [1, 2, 3, 4, 5],
...                      'B': [10, 20, 30, 40, 50],
...                      'C': ['a', 'b', 'c', 'd', 'e']})
>>> psdf
   A   B  C
0  1  10  a
1  2  20  b
2  3  30  c
3  4  40  d
4  5  50  e
>>> psdf.prod()
A         120
B    12000000
dtype: int64
If there are no numeric columns, an empty Series is returned.
>>> ps.DataFrame({"key": ['a', 'b', 'c'], "val": ['x', 'y', 'z']}).prod()
Series([], dtype: float64)
On a Series:
>>> ps.Series([1, 2, 3, 4, 5]).prod()
120
By default, the product of an empty or all-NA Series is 1.
>>> ps.Series([]).prod()
1.0
This can be controlled with the min_count parameter.
>>> ps.Series([]).prod(min_count=1)
nan
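The note near the top of this page says the product is emulated with exp(sum(log(...))). A minimal plain-Python sketch of that identity follows. It assumes strictly positive inputs, and `product_via_logs` is a hypothetical helper for illustration rather than part of pyspark.pandas. It shows why the trick lets a product be computed with a sum aggregation, and why non-positive inputs are a problem: the logarithm is undefined for them.

```python
import math


def product_via_logs(values):
    """Compute a product of strictly positive numbers as exp(sum(log(v)))."""
    if any(v <= 0 for v in values):
        # log() is undefined for zero and negative numbers.
        raise ValueError("the log-based product requires positive values")
    return math.exp(sum(math.log(v) for v in values))


approx = product_via_logs([1, 2, 3, 4, 5])
exact = math.prod([1, 2, 3, 4, 5])

# The log-based result is a float and may differ from the exact
# integer product by a small rounding error, so compare with a tolerance.
matches = math.isclose(approx, exact)
```

Because the result passes through floating-point logarithms, it is best compared with a tolerance rather than with exact equality. An empty input gives exp(0), which is 1.0, consistent with the default shown for an empty Series.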