pyspark.sql.functions.sentences(string, language=None, country=None)
Splits a string into an array of sentences, where each sentence is itself an array of words. The language and country arguments are optional; if they are omitted, the default locale is used.
New in version 3.2.0.
Parameters
string : Column
    a string to be split
language : Column, optional
    a language of the locale
country : Column, optional
    a country of the locale
Examples
>>> from pyspark.sql.functions import lit, sentences
>>> df = spark.createDataFrame([["This is an example sentence."]], ["string"])
>>> df.select(sentences(df.string, lit("en"), lit("US"))).show(truncate=False)
+-----------------------------------+
|sentences(string, en, US)          |
+-----------------------------------+
|[[This, is, an, example, sentence]]|
+-----------------------------------+
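To make the nested return shape concrete, the following is a rough pure-Python sketch of what a result looks like for a multi-sentence input. It is not how Spark implements sentences(): the helper name sentences_sketch and the regular expressions are illustrative assumptions, and the sketch ignores the locale entirely, so real output may differ for abbreviations, non-English text, or unusual punctuation.

```python
import re


def sentences_sketch(text):
    """Approximate the shape of sentences(): a list of sentences,
    each of which is a list of words. Hypothetical helper, not part of PySpark."""
    # Crude sentence boundary: whitespace that follows '.', '!' or '?'.
    # Spark's real splitting is locale-aware; this is only a stand-in.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Keep runs of word characters and apostrophes, which drops punctuation.
    return [re.findall(r"[\w']+", part) for part in parts if part]


result = sentences_sketch("This is an example sentence. Here is another one!")
print(result)
# [['This', 'is', 'an', 'example', 'sentence'], ['Here', 'is', 'another', 'one']]
```

The outer list has one entry per sentence and each inner list holds that sentence's words without punctuation, which matches the array-of-arrays column shown in the example above.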