DataStreamReader.orc
Loads an ORC file stream, returning the result as a DataFrame.
New in version 2.3.0.
Parameters
mergeSchema: sets whether to merge schemas collected from all ORC part-files. This overrides spark.sql.orc.mergeSchema; if unset, the value of spark.sql.orc.mergeSchema is used.
pathGlobFilter: an optional glob pattern; only files whose paths match the pattern are included. The syntax follows org.apache.hadoop.fs.GlobFilter. It does not change the behavior of partition discovery.
recursiveFileLookup: recursively scans a directory for files. Using this option disables partition discovery.
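The options described above can be passed as keyword arguments to orc. The following is a minimal sketch of one way to do that, assuming spark is an active SparkSession. The helper read_orc_stream and its snake_case argument names are hypothetical and not part of PySpark; only readStream, schema, orc, and the three option names come from the API.

```python
def read_orc_stream(spark, path, schema, merge_schema=None,
                    path_glob_filter=None, recursive_file_lookup=None):
    """Open a streaming ORC source, forwarding only the options that were set.

    Hypothetical helper for illustration; `spark` is expected to be a
    SparkSession (or any object exposing the same readStream interface).
    """
    candidates = {
        "mergeSchema": merge_schema,
        "pathGlobFilter": path_glob_filter,
        "recursiveFileLookup": recursive_file_lookup,
    }
    # Drop unset options so Spark falls back to its own defaults,
    # e.g. spark.sql.orc.mergeSchema for mergeSchema.
    options = {name: value for name, value in candidates.items()
               if value is not None}
    # The schema is supplied up front, as in the Examples section.
    return spark.readStream.schema(schema).orc(path, **options)
```

A call such as read_orc_stream(spark, "/data/events", sdf_schema, path_glob_filter="*.orc") would then read only files whose paths match *.orc; the path /data/events is a placeholder. Because unset options are omitted, the session-level configuration still applies unless it is overridden explicitly.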
Examples
>>> orc_sdf = spark.readStream.schema(sdf_schema).orc(tempfile.mkdtemp())
>>> orc_sdf.isStreaming
True
>>> orc_sdf.schema == sdf_schema
True