pyspark.pandas.read_sql

pyspark.pandas.read_sql(sql: str, con: str, index_col: Union[str, List[str], None] = None, columns: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame

Read SQL query or database table into a DataFrame.

This function is a convenience wrapper around read_sql_table and read_sql_query (for backward compatibility). It delegates to one of them depending on the provided input: a SQL query is routed to read_sql_query, while a database table name is routed to read_sql_table. Note that the delegated functions might have more specific notes about their functionality that are not listed here.
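The routing described above can be pictured with a small, Spark-free sketch. The helper name choose_reader is hypothetical, and the whitespace test is an assumed heuristic for telling a table name from a query; the check pyspark actually performs may differ.

```python
def choose_reader(sql: str) -> str:
    """Return the name of the reader that a whitespace-based dispatch would pick.

    Hypothetical helper for illustration only; not part of pyspark.
    """
    # Assumption: a single token with no internal whitespace is a table name,
    # anything with several tokens is treated as a SQL query.
    tokens = sql.strip().split()
    return "read_sql_table" if len(tokens) == 1 else "read_sql_query"


print(choose_reader("stock"))
print(choose_reader("SELECT * FROM stock"))
```

Under this assumption, a bare name such as "stock" goes to read_sql_table, while a statement with several tokens goes to read_sql_query.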

Note

Some databases might hit the Spark issue SPARK-27596.

Parameters

sql : string
    SQL query to be executed or a table name.
con : str
    A JDBC URI, provided as a string.

    Note

    The URI must be a JDBC URI, not a Python database URI.
index_col : string or list of strings, optional, default: None
    Column(s) to set as the index (MultiIndex).
columns : list, default: None
    List of column names to select from the SQL table (only used when reading a table).
options : dict
    All other options passed directly into Spark's JDBC data source.
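Because con must be a JDBC URI rather than a Python database URL, it can help to see one way of converting between the two. The following is a minimal sketch, assuming a PostgreSQL-style URL of the form dialect://user:password@host:port/database; the helpers to_jdbc and read_table are hypothetical names, and other JDBC drivers use different URI layouts. The user and password keys are standard Spark JDBC data source options, passed here through **options.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def to_jdbc(python_url: str) -> Tuple[str, Dict[str, str]]:
    """Split a Python-style database URL into a JDBC URI and Spark JDBC options.

    Hypothetical helper; assumes a host-based URL such as PostgreSQL's.
    """
    parts = urlsplit(python_url)
    # Drop a SQLAlchemy-style driver suffix, e.g. "postgresql+psycopg2".
    dialect = parts.scheme.split("+")[0]
    uri = f"jdbc:{dialect}://{parts.hostname}"
    if parts.port is not None:
        uri += f":{parts.port}"
    uri += parts.path
    # Credentials travel as JDBC options instead of living in the URI.
    options: Dict[str, str] = {}
    if parts.username:
        options["user"] = parts.username
    if parts.password:
        options["password"] = parts.password
    return uri, options


def read_table(table: str, python_url: str, index_col: Optional[str] = None):
    """Read a table through pyspark.pandas.read_sql (hypothetical wrapper).

    Needs a Spark installation and the matching JDBC driver on the classpath.
    """
    import pyspark.pandas as ps  # imported lazily so to_jdbc works without Spark

    uri, options = to_jdbc(python_url)
    return ps.read_sql(table, uri, index_col=index_col, **options)


uri, options = to_jdbc("postgresql+psycopg2://alice:secret@db.example.com:5432/sales")
print(uri)
print(sorted(options))
```

Keeping the credentials in options rather than in the URI is a design choice of this sketch, not a requirement of read_sql.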

Returns

DataFrame