RDD.saveAsTextFile(path, compressionCodecClass=None)
Save this RDD as a text file, using string representations of elements.
New in version 0.7.0.
Parameters
path : str
    path to text file
compressionCodecClass : str, optional
    fully qualified classname of the compression codec class, e.g. "org.apache.hadoop.io.compress.GzipCodec" (None by default)
See also
SparkContext.textFile()
SparkContext.wholeTextFiles()
Examples
>>> import os
>>> import tempfile
>>> from fileinput import input
>>> from glob import glob
>>> with tempfile.TemporaryDirectory() as d1:
...     path1 = os.path.join(d1, "text_file1")
...
...     # Write a temporary text file
...     sc.parallelize(range(10)).saveAsTextFile(path1)
...
...     # Load text file as an RDD
...     ''.join(sorted(input(glob(path1 + "/part-0000*"))))
'0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n'
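The output path names a directory containing one part file per partition, which is why the example globs for "part-0000*". A minimal sketch of that layout, assuming the same active sc and the imports above (the directory name "parts_demo" is illustrative):

>>> with tempfile.TemporaryDirectory() as d:
...     path = os.path.join(d, "parts_demo")
...
...     # Three partitions yield three part files in the output directory
...     sc.parallelize(range(6), 3).saveAsTextFile(path)
...     sorted(os.path.basename(f) for f in glob(path + "/part-*"))
['part-00000', 'part-00001', 'part-00002']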
Empty lines are tolerated when saving to text files.
>>> with tempfile.TemporaryDirectory() as d2:
...     path2 = os.path.join(d2, "text2_file2")
...
...     # Write another temporary text file
...     sc.parallelize(['', 'foo', '', 'bar', '']).saveAsTextFile(path2)
...
...     # Load text file as an RDD
...     ''.join(sorted(input(glob(path2 + "/part-0000*"))))
'\n\n\nbar\nfoo\n'
Using compressionCodecClass
>>> from fileinput import input, hook_compressed
>>> with tempfile.TemporaryDirectory() as d3:
...     path3 = os.path.join(d3, "text3")
...     codec = "org.apache.hadoop.io.compress.GzipCodec"
...
...     # Write another temporary text file with specified codec
...     sc.parallelize(['foo', 'bar']).saveAsTextFile(path3, codec)
...
...     # Load text file as an RDD
...     result = sorted(input(glob(path3 + "/part*.gz"), openhook=hook_compressed))
...     ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
'bar\nfoo\n'
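Because compression is handled through Hadoop codecs, files saved this way can also be read back with SparkContext.textFile(), which decompresses gzip part files transparently. A minimal round-trip sketch under the same assumptions (active sc; the names d4 and path4 are illustrative):

>>> with tempfile.TemporaryDirectory() as d4:
...     path4 = os.path.join(d4, "text4")
...     codec = "org.apache.hadoop.io.compress.GzipCodec"
...
...     # Write compressed output, then read it back through Spark itself
...     sc.parallelize(['foo', 'bar']).saveAsTextFile(path4, codec)
...     sorted(sc.textFile(path4).collect())
['bar', 'foo']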