pyspark.pandas.read_sql
pyspark.pandas.read_sql(sql: str, con: str, index_col: Union[str, List[str], None] = None, columns: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame
Read SQL query or database table into a DataFrame.
This function is a convenience wrapper around read_sql_table and read_sql_query (for backward compatibility). It delegates to the specific function depending on the provided input: a SQL query is routed to read_sql_query, while a database table name is routed to read_sql_table. Note that the delegated functions might have more specific notes about their functionality that are not listed here.
Note
Some databases might hit the Spark issue SPARK-27596.
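The routing described above can be illustrated with a minimal sketch. The helper name `route_read_sql` is hypothetical, and the rule it applies (a string without internal whitespace is treated as a table name) is an assumption made for illustration; the rule actually used by the library may differ.

```python
def route_read_sql(sql: str) -> str:
    """Return the name of the function a read_sql call might be delegated to.

    Assumption (not confirmed by this page): input that contains no
    internal whitespace is a table name; anything else is a SQL query.
    """
    stripped = sql.strip()
    if " " not in stripped:
        # A single token such as 'table_name' is treated as a table.
        return "read_sql_table"
    # Anything with spaces, such as 'SELECT * FROM t', is treated as a query.
    return "read_sql_query"


print(route_read_sql("table_name"))
print(route_read_sql("SELECT * FROM table_name"))
```

Under this assumption, a table name that needs quoting and contains spaces would be misrouted as a query, so calling read_sql_table or read_sql_query directly is the safer choice when the input is ambiguous.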
- Parameters
- sql : string
SQL query to be executed or a table name.
- con : str
A JDBC URI, provided as a str.
Note
The URI must be a JDBC URI rather than Python’s database URI.
- index_col : string or list of strings, optional, default: None
Column(s) to set as index (MultiIndex).
- columns : list, default: None
List of column names to select from SQL table (only used when reading a table).
- options : dict
All other options passed directly into Spark’s JDBC data source.
- Returns
- DataFrame
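To make the JDBC URI requirement on con concrete, here is a small sketch contrasting the two URI styles. The helper `looks_like_jdbc_uri` is hypothetical and not part of pyspark, the host, port, and credentials are placeholders, and treating a SQLAlchemy-style string as "Python's database URI" is an assumption.

```python
def looks_like_jdbc_uri(con: str) -> bool:
    """Return True when `con` starts with the `jdbc:` scheme prefix."""
    return con.strip().lower().startswith("jdbc:")


# Hypothetical connection strings for the same PostgreSQL database.
jdbc_uri = "jdbc:postgresql://localhost:5432/db_name"
python_uri = "postgresql://user:password@localhost:5432/db_name"

print(looks_like_jdbc_uri(jdbc_uri))    # True
print(looks_like_jdbc_uri(python_uri))  # False
```

A prefix check like this only catches the most common mistake; it does not validate that the rest of the URI is well-formed for a given driver.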
See also
read_sql_table
Read SQL database table into a DataFrame.
read_sql_query
Read SQL query into a DataFrame.
Examples
>>> ps.read_sql('table_name', 'jdbc:postgresql:db_name')
>>> ps.read_sql('SELECT * FROM table_name', 'jdbc:postgresql:db_name')
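A slightly fuller sketch of how index_col, columns, and the extra options might be combined is shown below. The table name `people`, its columns, the credentials, and the helpers `jdbc_options` and `load_people` are all hypothetical; `user`, `password`, `driver`, and `fetchsize` are passed through as Spark JDBC data source options. Running `load_people` requires a Spark session, a reachable database, and the JDBC driver on the classpath.

```python
from typing import Any, Dict

# Hypothetical JDBC URI; replace host, port, and database as needed.
JDBC_URL = "jdbc:postgresql://localhost:5432/db_name"


def jdbc_options(user: str, password: str) -> Dict[str, Any]:
    """Build extra keyword arguments forwarded to Spark's JDBC data source."""
    return {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # JDBC driver class name
        "fetchsize": "1000",                # rows fetched per round trip
    }


def load_people():
    # Imported lazily so the helpers above work without a Spark installation.
    import pyspark.pandas as ps

    opts = jdbc_options("spark_user", "secret")  # placeholder credentials

    # Table name: `columns` selects a subset of the table's columns.
    by_table = ps.read_sql(
        "people", JDBC_URL, index_col="id", columns=["name", "age"], **opts
    )
    # SQL query: column selection is expressed in the query itself.
    by_query = ps.read_sql(
        "SELECT id, name, age FROM people", JDBC_URL, index_col="id", **opts
    )
    return by_table, by_query
```

Because columns is only used when reading a table, the query variant lists the desired columns in the SELECT clause instead.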