pyspark.pandas.DataFrame.shift
DataFrame.shift(periods: int = 1, fill_value: Optional[Any] = None) → pyspark.pandas.frame.DataFrame
Shift DataFrame by desired number of periods.
Note
The current implementation of shift uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- periods : int
Number of periods to shift. Can be positive or negative.
- fill_value : object, optional
The scalar value to use for newly introduced missing values. The default depends on the dtype of self. For numeric data, np.nan is used.
- Returns
- Copy of input DataFrame, shifted.
Examples
>>> df = ps.DataFrame({'Col1': [10, 20, 15, 30, 45],
...                    'Col2': [13, 23, 18, 33, 48],
...                    'Col3': [17, 27, 22, 37, 52]},
...                   columns=['Col1', 'Col2', 'Col3'])

>>> df.shift(periods=3)
   Col1  Col2  Col3
0   NaN   NaN   NaN
1   NaN   NaN   NaN
2   NaN   NaN   NaN
3  10.0  13.0  17.0
4  20.0  23.0  27.0

>>> df.shift(periods=3, fill_value=0)
   Col1  Col2  Col3
0     0     0     0
1     0     0     0
2     0     0     0
3    10    13    17
4    20    23    27
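
To make the effect of the sign of periods and of fill_value concrete, the row movement for a single column can be modeled in plain Python. This is only an illustrative sketch: shift_values is a hypothetical helper, not part of pyspark.pandas and not how the method is implemented, and it uses None as the default filler where the real method uses np.nan for numeric data.

```python
from typing import Any, List, Optional


def shift_values(values: List[Any], periods: int = 1,
                 fill_value: Optional[Any] = None) -> List[Any]:
    """Hypothetical pure-Python model of shifting one column."""
    n = len(values)
    if periods == 0:
        return list(values)
    # Shifting by n or more rows pushes every original value out.
    k = min(abs(periods), n)
    pad = [fill_value] * k
    if periods > 0:
        # Positive: rows move down; the top k slots are newly introduced.
        return pad + values[:n - k]
    # Negative: rows move up; the bottom k slots are newly introduced.
    return values[k:] + pad


col1 = [10, 20, 15, 30, 45]
print(shift_values(col1, periods=3, fill_value=0))   # [0, 0, 0, 10, 20]
print(shift_values(col1, periods=-1, fill_value=0))  # [20, 15, 30, 45, 0]
```

The first call reproduces the Col1 column of the fill_value=0 example above; the second shows that a negative periods moves values toward the start and fills at the end.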