SparkSession(sparkContext[, jsparkSession, …])
The entry point to programming Spark with the Dataset and DataFrame API.
Catalog(sparkSession)
User-facing catalog API, accessible through SparkSession.catalog.
DataFrame(jdf, sql_ctx)
A distributed collection of data grouped into named columns.
Column(jc)
A column in a DataFrame.
Observation([name])
Class to observe (named) metrics on a DataFrame.
Row
A row in a DataFrame.
GroupedData(jgd, df)
A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().
PandasCogroupedOps(gd1, gd2)
A logical grouping of two GroupedData, created by GroupedData.cogroup().
DataFrameNaFunctions(df)
Functionality for working with missing data in a DataFrame.
DataFrameStatFunctions(df)
Statistical functions for use with a DataFrame.
Window
Utility functions for defining windows in DataFrames.
DataFrameReader(spark)
Interface used to load a DataFrame from external storage systems.
DataFrameWriter(df)
Interface used to write a DataFrame to external storage systems.
DataFrameWriterV2(df, table)
Interface used to write a DataFrame to external storage using the v2 API.
UDFRegistration(sparkSession)
Wrapper for user-defined function registration.
udf.UserDefinedFunction(func[, returnType, …])
User-defined function in Python.