- Accumulable<R,T> - Class in org.apache.spark
-
A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T.
- Accumulable(R, AccumulableParam<R, T>) - Constructor for class org.apache.spark.Accumulable
-
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, to which tasks can add values with +=.
- accumulableCollection(R, Function1<R, Growable<T>>, ClassTag<R>) - Method in class org.apache.spark.SparkContext
-
Create an accumulator from a "mutable collection" type.
- AccumulableParam<R,T> - Interface in org.apache.spark
-
Helper object defining how to accumulate values of a particular type.
- Accumulator<T> - Class in org.apache.spark
-
A simpler value of Accumulable where the result type being accumulated is the same as the types of elements being merged, i.e.
- Accumulator(T, AccumulatorParam<T>) - Constructor for class org.apache.spark.Accumulator
-
- accumulator(int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
- accumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the += method.
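For illustration, a minimal Scala sketch of the accumulator pattern described above, assuming an existing SparkContext named sc (the variable name is a placeholder):
    // Create an integer accumulator with initial value 0
    val acc = sc.accumulator(0)
    // Tasks add to it with +=; only the driver can read acc.value
    sc.parallelize(1 to 100).foreach(x => acc += x)
    println(acc.value)   // 5050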
- AccumulatorParam<T> - Interface in org.apache.spark
-
A simpler version of AccumulableParam where the only data type you can add in is the same type as the accumulated value.
- active() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- activeStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- actor() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- ActorHelper - Interface in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A receiver trait to be mixed in with your Actor to gain access to the API for pushing received data into Spark Streaming to be processed.
- actorStream(Props, String, StorageLevel, SupervisorStrategy) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel, SupervisorStrategy, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- ActorSupervisorStrategy - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A helper with a set of defaults for the supervisor strategy
- ActorSupervisorStrategy() - Constructor for class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
-
- actorSystem() - Method in class org.apache.spark.SparkEnv
-
- add(T) - Method in class org.apache.spark.Accumulable
-
Add more data to this accumulator / accumulable
- add(Vector) - Method in class org.apache.spark.util.Vector
-
- addAccumulator(R, T) - Method in interface org.apache.spark.AccumulableParam
-
Add additional data to the accumulator value.
- addAccumulator(T, T) - Method in interface org.apache.spark.AccumulatorParam
-
- addedFiles() - Method in class org.apache.spark.SparkContext
-
- addedJars() - Method in class org.apache.spark.SparkContext
-
- addFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addFile(String) - Method in class org.apache.spark.SparkContext
-
Add a file to be downloaded with this Spark job on every node.
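A brief sketch of how addFile pairs with SparkFiles.get on the executors; the file path and the SparkContext sc are assumptions for illustration:
    import org.apache.spark.SparkFiles
    sc.addFile("/tmp/lookup.txt")                       // hypothetical local file
    val lineCounts = sc.parallelize(1 to 4).map { _ =>
      // Each task resolves the copy that was shipped to its node
      scala.io.Source.fromFile(SparkFiles.get("lookup.txt")).getLines().size
    }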
- addInPlace(R, R) - Method in interface org.apache.spark.AccumulableParam
-
Merge two accumulated values together.
- addInPlace(double, double) - Method in class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
- addInPlace(float, float) - Method in class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
- addInPlace(int, int) - Method in class org.apache.spark.SparkContext.IntAccumulatorParam$
-
- addInPlace(long, long) - Method in class org.apache.spark.SparkContext.LongAccumulatorParam$
-
- addInPlace(Vector) - Method in class org.apache.spark.util.Vector
-
- addInPlace(Vector, Vector) - Method in class org.apache.spark.util.Vector.VectorAccumParam$
-
- addJar(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addJar(String) - Method in class org.apache.spark.SparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addLocalConfiguration(String, int, int, int, JobConf) - Static method in class org.apache.spark.rdd.HadoopRDD
-
Add Hadoop configuration specific to a single partition and attempt.
- addOnCompleteCallback(Function0<BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Add a callback function to be executed on task completion.
- addSparkListener(SparkListener) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Register a listener to receive up-calls from events that happen during execution.
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.StreamingContext
-
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
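As a sketch of the seqOp/combOp pattern (assuming a SparkContext sc), computing a mean by folding each partition into a (sum, count) pair and then merging the pairs:
    val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    val (sum, count) = data.aggregate((0.0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),             // seqOp: fold a value into the accumulator
      (a, b)   => (a._1 + b._1, a._2 + b._2))           // combOp: merge per-partition accumulators
    val mean = sum / count                              // 2.5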
- Aggregate - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Groups input data by groupingExpressions and computes the aggregateExpressions for each group.
- Aggregate(boolean, Seq<Expression>, Seq<NamedExpression>, SparkPlan, SQLContext) - Constructor for class org.apache.spark.sql.execution.Aggregate
-
- aggregate() - Method in class org.apache.spark.sql.execution.Aggregate.ComputedAggregate
-
- aggregate(Seq<Expression>) - Method in class org.apache.spark.sql.SchemaRDD
-
Performs an aggregation over all Rows in this RDD.
- Aggregate.ComputedAggregate - Class in org.apache.spark.sql.execution
-
An aggregate that needs to be computed for each row in a group.
- Aggregate.ComputedAggregate(AggregateExpression, AggregateExpression, AttributeReference) - Constructor for class org.apache.spark.sql.execution.Aggregate.ComputedAggregate
-
- Aggregate.ComputedAggregate$ - Class in org.apache.spark.sql.execution
-
- Aggregate.ComputedAggregate$() - Constructor for class org.apache.spark.sql.execution.Aggregate.ComputedAggregate$
-
- aggregateExpressions() - Method in class org.apache.spark.sql.execution.Aggregate
-
- Aggregator<K,V,C> - Class in org.apache.spark
-
:: DeveloperApi ::
A set of functions used to aggregate data.
- Aggregator(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Constructor for class org.apache.spark.Aggregator
-
- Algo - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to select the algorithm for the decision tree
- Algo() - Constructor for class org.apache.spark.mllib.tree.configuration.Algo
-
- algo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- algo() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- AlphaComponent - Annotation Type in org.apache.spark.annotation
-
A new component of Spark which may have unstable APIs.
- alreadyPlanned() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- ALS - Class in org.apache.spark.mllib.recommendation
-
Alternating Least Squares matrix factorization.
- ALS() - Constructor for class org.apache.spark.mllib.recommendation.ALS
-
Constructs an ALS instance with default parameters: {numBlocks: -1, rank: 10, iterations: 10,
lambda: 0.01, implicitPrefs: false, alpha: 1.0}.
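A minimal, illustrative use of ALS.train (the ratings data and parameter values below are made up; sc is an existing SparkContext):
    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    val ratings = sc.parallelize(Seq(
      Rating(1, 10, 4.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)))
    val model = ALS.train(ratings, 10, 10, 0.01)        // rank, iterations, lambda
    val predicted = model.predict(2, 20)                // score for user 2, product 20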
- analyzed() - Method in class org.apache.spark.sql.hive.test.TestHiveContext.QueryExecution
-
- ANY() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- appendBias(Vector) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Returns a new vector with 1.0 (bias) appended to the input vector.
- apply(int) - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- apply(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Gets the (i, j)-th element.
- apply(int) - Method in interface org.apache.spark.mllib.linalg.Vector
-
Gets the value of the ith element.
- apply(String) - Static method in class org.apache.spark.storage.BlockId
-
Converts a BlockId "name" String back into a BlockId.
- apply(String, String, int, int) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(ObjectInput) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object without setting useOffHeap.
- apply(boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object.
- apply(int, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object from its integer representation.
- apply(ObjectInput) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- apply(long) - Static method in class org.apache.spark.streaming.Milliseconds
-
- apply(long) - Static method in class org.apache.spark.streaming.Minutes
-
- apply(long) - Static method in class org.apache.spark.streaming.Seconds
-
- apply(TraversableOnce<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values.
- apply(Seq<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values passed as variable-length arguments.
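For example, assuming a SparkContext sc, a StatCounter can be built directly from values or obtained from an RDD of doubles:
    import org.apache.spark.SparkContext._              // implicit double RDD functions
    import org.apache.spark.util.StatCounter
    val stats = StatCounter(Seq(1.0, 2.0, 3.0, 4.0))    // count, mean, variance in one pass
    println(stats.mean + " " + stats.stdev)
    val rddStats = sc.parallelize(Seq(1.0, 2.0, 3.0)).stats()   // same summary from an RDD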
- apply(int) - Method in class org.apache.spark.util.Vector
-
- applySchema(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
Applies a schema to an RDD of Java Beans.
- appName() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- appName() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- appName() - Method in class org.apache.spark.SparkContext
-
- ApproxHist() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- areaUnderPR() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the precision-recall curve.
- areaUnderROC() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the receiver operating characteristic (ROC) curve.
- as(Symbol) - Method in class org.apache.spark.sql.SchemaRDD
-
Applies a qualifier to the attributes of this relation.
- asIterator() - Method in interface org.apache.spark.serializer.DeserializationStream
-
Read the elements of this stream through an iterator.
- asRDDId() - Method in class org.apache.spark.storage.BlockId
-
- AsyncRDDActions<T> - Class in org.apache.spark.rdd
-
:: Experimental ::
A set of asynchronous RDD actions available through an implicit conversion.
- AsyncRDDActions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.AsyncRDDActions
-
- attemptId() - Method in class org.apache.spark.TaskContext
-
- attributes() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- awaitTermination() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTermination() - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- cache() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cache() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cache() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
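A sketch of the usual caching pattern (the input path and SparkContext sc are placeholders): the first action materializes the RDD, later actions reuse the stored partitions.
    val words = sc.textFile("hdfs://host/input.txt")    // hypothetical path
      .flatMap(_.split(" "))
      .cache()                                          // default level MEMORY_ONLY
    words.count()                                       // computes and caches
    words.filter(_ == "spark").count()                  // served from the cache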
- CacheCommand - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- CacheCommand(String, boolean, SQLContext) - Constructor for class org.apache.spark.sql.execution.CacheCommand
-
- cacheManager() - Method in class org.apache.spark.SparkEnv
-
- cacheTable(String) - Method in class org.apache.spark.sql.SQLContext
-
Caches the specified table in-memory.
- cacheTables() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- calculate(double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
entropy calculation
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
- calculate(double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
Gini coefficient calculation
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
- calculate(double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
information calculation for binary classification
- calculate(double, double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
information calculation for regression
- calculate(double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
variance calculation
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.FlatMapFunction
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.FlatMapFunction2
-
- call(T1) - Method in interface org.apache.spark.api.java.function.Function
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.Function2
-
- call(T1, T2, T3) - Method in interface org.apache.spark.api.java.function.Function3
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.VoidFunction
-
- cancel() - Method in class org.apache.spark.ComplexFutureAction
-
- cancel() - Method in interface org.apache.spark.FutureAction
-
Cancels the execution of this action.
- cancel() - Method in class org.apache.spark.SimpleFutureAction
-
- cancelAllJobs() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelAllJobs() - Method in class org.apache.spark.SparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelJobGroup(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel active jobs for the specified group.
- cancelJobGroup(String) - Method in class org.apache.spark.SparkContext
-
Cancel active jobs for the specified group.
- cancelled() - Method in class org.apache.spark.ComplexFutureAction
-
Returns whether the promise has been cancelled.
- canEqual(Object) - Method in class org.apache.spark.util.MutablePair
-
- cartesian(JavaRDDLike<U, ?>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
- cartesian(RDD<U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
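For instance (sc assumed), cartesian pairs every element of one RDD with every element of the other:
    val a = sc.parallelize(Seq(1, 2))
    val b = sc.parallelize(Seq("x", "y"))
    a.cartesian(b).collect()   // Array((1,x), (1,y), (2,x), (2,y)), order may vary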
- CartesianProduct - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- CartesianProduct(SparkPlan, SparkPlan) - Constructor for class org.apache.spark.sql.execution.CartesianProduct
-
- Categorical() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- categoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- categories() - Method in class org.apache.spark.mllib.tree.model.Split
-
- checkpoint() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Mark this RDD for checkpointing.
- checkpoint() - Method in class org.apache.spark.rdd.HadoopRDD
-
- checkpoint() - Method in class org.apache.spark.rdd.RDD
-
Mark this RDD for checkpointing.
- checkpoint(Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets the context to periodically checkpoint the DStream operations for master
fault-tolerance.
- checkpoint(Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Enable periodic checkpointing of RDDs of this DStream
- checkpoint(String) - Method in class org.apache.spark.streaming.StreamingContext
-
Set the context to periodically checkpoint the DStream operations for driver
fault-tolerance.
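A rough streaming sketch, assuming an existing StreamingContext ssc and an input DStream lines (both hypothetical): set a checkpoint directory on the context, then checkpoint a stateful stream periodically.
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.StreamingContext._   // pair DStream operations
    ssc.checkpoint("hdfs://host/checkpoints")              // hypothetical directory
    val counts = lines.flatMap(_.split(" ")).map((_, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60))
    counts.checkpoint(Seconds(30))                         // checkpoint the windowed stream's RDDs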
- checkpointData() - Method in class org.apache.spark.rdd.RDD
-
- checkpointData() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDir() - Method in class org.apache.spark.SparkContext
-
- checkpointDir() - Method in class org.apache.spark.streaming.StreamingContext
-
- checkpointDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDuration() - Method in class org.apache.spark.streaming.StreamingContext
-
- child() - Method in class org.apache.spark.sql.execution.Aggregate
-
- child() - Method in class org.apache.spark.sql.execution.DescribeCommand
-
- child() - Method in class org.apache.spark.sql.execution.Exchange
-
- child() - Method in class org.apache.spark.sql.execution.Filter
-
- child() - Method in class org.apache.spark.sql.execution.Generate
-
- child() - Method in class org.apache.spark.sql.execution.Limit
-
- child() - Method in class org.apache.spark.sql.execution.Project
-
- child() - Method in class org.apache.spark.sql.execution.Sample
-
- child() - Method in class org.apache.spark.sql.execution.Sort
-
- child() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- child() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- child() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- child() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- children() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- children() - Method in class org.apache.spark.sql.execution.Union
-
- Classification() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- ClassificationModel - Interface in org.apache.spark.mllib.classification
-
:: Experimental ::
Represents a classification model that predicts to which of a set of categories an example
belongs.
- className() - Method in class org.apache.spark.ExceptionFailure
-
- classpathEntries() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- classTag() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaRDD
-
- classTag() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- classTag() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
- classTag() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- cleaner() - Method in class org.apache.spark.SparkContext
-
- clear() - Method in interface org.apache.spark.sql.SQLConf
-
- clearCallSite() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.clearCallSite.
- clearCallSite() - Method in class org.apache.spark.SparkContext
-
Support function for API backtraces.
- clearDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- clearFiles() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of files added by addFile so that they do not get downloaded to any new nodes.
- clearFiles() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of files added by addFile so that they do not get downloaded to any new nodes.
- clearJars() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of JARs added by addJar so that they do not get downloaded to any new nodes.
- clearJars() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of JARs added by addJar so that they do not get downloaded to any new nodes.
- clearJobGroup() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the current thread's job group ID and its description.
- clearJobGroup() - Method in class org.apache.spark.SparkContext
-
Clear the current thread's job group ID and its description.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Clears the threshold so that predict will output raw prediction scores.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Clears the threshold so that predict will output raw prediction scores.
- clone() - Method in class org.apache.spark.SparkConf
-
Copy this object
- clone() - Method in class org.apache.spark.storage.StorageLevel
-
- clone() - Method in class org.apache.spark.util.random.BernoulliSampler
-
- clone() - Method in class org.apache.spark.util.random.PoissonSampler
-
- clone() - Method in interface org.apache.spark.util.random.RandomSampler
-
- cloneComplement() - Method in class org.apache.spark.util.random.BernoulliSampler
-
Return a sampler that is the complement of the range specified for the current sampler.
- close() - Method in interface org.apache.spark.serializer.DeserializationStream
-
- close() - Method in interface org.apache.spark.serializer.SerializationStream
-
- closureSerializer() - Method in class org.apache.spark.SparkEnv
-
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- coalesce(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean, Ordering<Row>) - Method in class org.apache.spark.sql.SchemaRDD
-
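As an illustration (sc and the path are placeholders), coalesce shrinks the partition count without a shuffle unless shuffle is set to true:
    val logs = sc.textFile("hdfs://host/logs/*")        // possibly many small partitions
    val merged = logs.coalesce(10)                      // narrow dependency, no shuffle
    val spread = logs.coalesce(200, shuffle = true)     // with a shuffle, can also grow partitions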
- cogroup(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
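A small sketch of cogroup on two pair RDDs (the data and the SparkContext sc are made up):
    import org.apache.spark.SparkContext._              // brings in PairRDDFunctions
    val orders  = sc.parallelize(Seq((1, "book"), (1, "lamp"), (2, "pen")))
    val refunds = sc.parallelize(Seq((1, "book")))
    // For each key, a pair of the values from each side
    val grouped = orders.cogroup(refunds)
    grouped.collect()   // e.g. (1, ([book, lamp], [book])), (2, ([pen], []))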
- cogroup(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- CoGroupedRDD<K> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD that cogroups its parents.
- CoGroupedRDD(Seq<RDD<? extends Product2<K, ?>>>, Partitioner) - Constructor for class org.apache.spark.rdd.CoGroupedRDD
-
- collect() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in this RDD.
- collect() - Method in class org.apache.spark.rdd.RDD
-
Return an array that contains all of the elements in this RDD.
- collect(PartialFunction<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD that contains all matching values by applying f.
- collect() - Method in class org.apache.spark.sql.SchemaRDD
-
- collectAsMap() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsMap() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for retrieving all elements of this RDD.
- collectPartitions(int[]) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in a specific partition of this RDD.
- columnPruningPred() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the output RDD.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the resulting RDD using the existing
partitioner/parallelism level.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Simplified version of combineByKey that hash-partitions the output RDD.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Simplified version of combineByKey that hash-partitions the resulting RDD using the
existing partitioner/parallelism level.
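A typical use (sketch; sc assumed): compute a per-key average by combining each value into a (sum, count) pair.
    import org.apache.spark.SparkContext._              // PairRDDFunctions
    val scores = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 5.0)))
    val sumCount = scores.combineByKey(
      (v: Double) => (v, 1),                                         // createCombiner
      (c: (Double, Int), v: Double) => (c._1 + v, c._2 + 1),         // mergeValue
      (c1: (Double, Int), c2: (Double, Int)) => (c1._1 + c2._1, c1._2 + c2._2))  // mergeCombiners
    val avg = sumCount.mapValues { case (s, n) => s / n }            // ("a", 2.0), ("b", 5.0)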
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, ClassTag<C>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineCombinersByKey(Iterator<Tuple2<K, C>>) - Method in class org.apache.spark.Aggregator
-
- combineCombinersByKey(Iterator<Tuple2<K, C>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- Command - Interface in org.apache.spark.sql.execution
-
- commands() - Method in class org.apache.spark.sql.hive.test.TestHiveContext.TestTable
-
- compare(RDDInfo) - Method in class org.apache.spark.storage.RDDInfo
-
- completed() - Method in class org.apache.spark.TaskContext
-
- completedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- completionTime() - Method in class org.apache.spark.scheduler.StageInfo
-
Time when all tasks in the stage completed or when the stage was cancelled.
- ComplexFutureAction<T> - Class in org.apache.spark
-
:: Experimental ::
A FutureAction for actions that could trigger multiple Spark jobs.
- ComplexFutureAction() - Constructor for class org.apache.spark.ComplexFutureAction
-
- compressedInputStream(InputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- CompressionCodec - Interface in org.apache.spark.io
-
:: DeveloperApi ::
CompressionCodec allows the customization of choosing different compression implementations
to be used in block storage.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point.
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point,
add the gradient to a provided vector to avoid creating new objects, and return loss.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.L1Updater
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SimpleUpdater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SquaredL2Updater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.Updater
-
Compute an updated value for weights given the gradient, stepSize, iteration number and
regularization parameter.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.HadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.JdbcRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.PartitionPruningRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.ShuffledRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.UnionRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.sql.SchemaRDD
-
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Generate an RDD for the given duration
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Method that generates an RDD for the given Duration
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- compute(Time) - Method in class org.apache.spark.streaming.dstream.DStream
-
Method that generates an RDD for the given time
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Asks ReceiverInputTracker for received data blocks and generates RDDs with them.
- computeColumnSummaryStatistics() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes column-wise summary statistics.
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Return the K-means cost (sum of squared distances of points to their nearest center) for this
model on the given data.
- computeCovariance() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the covariance matrix, treating each row as an observation.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Computes the Gramian matrix A^T A.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the Gramian matrix A^T A.
- computePreferredLocations(Seq<InputFormatInfo>) - Static method in class org.apache.spark.scheduler.InputFormatInfo
-
Computes the preferred locations based on input(s) and returns a location-to-block map.
- computePrincipalComponents(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the top k principal components.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Computes the singular value decomposition of this matrix.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the singular value decomposition of this matrix.
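A small RowMatrix SVD sketch (the data is illustrative; sc assumed); the three arguments mirror the signature listed above: k, whether to compute U, and the reciprocal condition number.
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    val rows = sc.parallelize(Seq(Vectors.dense(1.0, 0.0), Vectors.dense(0.0, 2.0)))
    val mat  = new RowMatrix(rows)
    val svd  = mat.computeSVD(2, true, 1e-9)            // k = 2, compute U, rCond
    val (u, s, v) = (svd.U, svd.s, svd.V)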
- condition() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- condition() - Method in class org.apache.spark.sql.execution.Filter
-
- condition() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- conf() - Method in class org.apache.spark.SparkContext
-
- conf() - Method in class org.apache.spark.SparkEnv
-
- conf() - Method in class org.apache.spark.streaming.StreamingContext
-
- confidence() - Method in class org.apache.spark.partial.BoundedDouble
-
- configuration() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- connectionManager() - Method in class org.apache.spark.SparkEnv
-
- ConstantInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
An input stream that always returns the same RDD on each timestep.
- ConstantInputDStream(StreamingContext, RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- contains(String) - Method in class org.apache.spark.SparkConf
-
Does the configuration contain a given parameter?
- contains(String) - Method in interface org.apache.spark.sql.SQLConf
-
- containsCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
- context() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- context() - Method in class org.apache.spark.InterruptibleIterator
-
- context() - Method in class org.apache.spark.rdd.RDD
-
- context() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- context() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- context() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return the StreamingContext associated with this DStream
- Continuous() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- convertToCatalyst(Object) - Static method in class org.apache.spark.sql.execution.ExistingRdd
-
- CoordinateMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents a matrix in coordinate format.
- CoordinateMatrix(RDD<MatrixEntry>, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
- CoordinateMatrix(RDD<MatrixEntry>) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Alternative constructor leaving matrix dimensions to be determined automatically.
- copy() - Method in class org.apache.spark.util.StatCounter
-
Clone this StatCounter
- count() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the number of elements in the RDD.
- count() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample size.
- count() - Method in class org.apache.spark.rdd.RDD
-
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.sql.SchemaRDD
-
:: Experimental ::
Return the number of elements in the RDD.
- count() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.util.StatCounter
-
- countApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long, double) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApproxDistinct(double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(double) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Return approximate number of distinct elements in the RDD.
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for counting the number of elements in the RDD.
- countByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Count the number of elements for each key, and return the result to the master as a Map.
- countByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Count the number of elements for each key, and return the result to the master as a Map.
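For example (sc assumed), countByKey collects a per-key count to the driver as a local Map:
    import org.apache.spark.SparkContext._
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    pairs.countByKey()   // Map(a -> 2, b -> 1), returned to the driver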
- countByKeyApprox(long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByValue() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValueAndWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long, double, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Approximate version of countByValue().
- countByWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a window over this DStream.
- countByWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a sliding window over this DStream.
- create(boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Deprecated.
- create(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Create a new StorageLevel object.
- create(RDD<T>, Function1<Object, Object>) - Static method in class org.apache.spark.rdd.PartitionPruningRDD
-
Create a PartitionPruningRDD.
- create() - Method in interface org.apache.spark.streaming.api.java.JavaStreamingContextFactory
-
- createCodec(SparkConf) - Method in interface org.apache.spark.io.CompressionCodec
-
- createCodec(SparkConf, String) - Method in interface org.apache.spark.io.CompressionCodec
-
- createCombiner() - Method in class org.apache.spark.Aggregator
-
- createFilter(Expression) - Static method in class org.apache.spark.sql.parquet.ParquetFilters
-
- createParquetFile(Class<?>, String, boolean, Configuration) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
:: Experimental ::
Creates an empty parquet file with the schema of class beanClass, which can be registered as a table.
- createParquetFile(String, boolean, Configuration, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
:: Experimental ::
Creates an empty parquet file with the schema of class A, which can be registered as a table.
- createRecordFilter(Seq<Expression>) - Static method in class org.apache.spark.sql.parquet.ParquetFilters
-
- createSchemaRDD(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
Creates a SchemaRDD from an RDD of case classes.
- createStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(StreamingContext, String, String, Map<String, Object>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(StreamingContext, Map<String, String>, Map<String, Object>, StorageLevel, ClassTag<K>, ClassTag<V>, Manifest<U>, Manifest<T>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(JavaStreamingContext, Class<K>, Class<V>, Class<U>, Class<T>, Map<String, String>, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
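A hedged sketch of the ZooKeeper-based receiver (the connection string, group id and topic map are placeholders; ssc is an existing StreamingContext):
    import org.apache.spark.streaming.kafka.KafkaUtils
    val kafkaStream = KafkaUtils.createStream(ssc, "zk1:2181", "my-group", Map("events" -> 1))
    val messages = kafkaStream.map(_._2)                // keep the payload of each (key, message)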
- createStream(StreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(StreamingContext, Option<Authorization>, Seq<String>, StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, Authorization) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(StreamingContext, String, Subscribe, Function1<Seq<ByteString>, Iterator<T>>, StorageLevel, SupervisorStrategy, ClassTag<T>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel, SupervisorStrategy) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createTable(String, boolean, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.hive.HiveContext
-
Creates a table using the schema of the given class.
- creationSiteInfo() - Method in class org.apache.spark.rdd.RDD
-
User code that created this RDD (e.g.
- failed() - Method in class org.apache.spark.scheduler.TaskInfo
-
- failedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- failedTasks() - Method in class org.apache.spark.ui.jobs.ExecutorSummary
-
- failureReason() - Method in class org.apache.spark.scheduler.StageInfo
-
If the stage failed, the reason why.
- FAIR() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- feature() - Method in class org.apache.spark.mllib.tree.model.Split
-
- features() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- FeatureType - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to describe whether a feature is "continuous" or "categorical"
- FeatureType() - Constructor for class org.apache.spark.mllib.tree.configuration.FeatureType
-
- featureType() - Method in class org.apache.spark.mllib.tree.model.Split
-
- FetchFailed - Class in org.apache.spark
-
:: DeveloperApi ::
Task failed to fetch shuffle data from a remote node.
- FetchFailed(BlockManagerId, int, int, int) - Constructor for class org.apache.spark.FetchFailed
-
- field() - Method in class org.apache.spark.storage.BroadcastBlockId
-
- FIFO() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- files() - Method in class org.apache.spark.SparkContext
-
- fileStream(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Function1<Path, Object>, boolean, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- filter(Function<Double, Boolean>) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<T, Boolean>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function1<T, Object>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing only the elements that satisfy a predicate.
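For example, a minimal Scala sketch (sc is an assumed existing SparkContext):

    val nums  = sc.parallelize(Seq(1, 2, 3, 4, 5))
    val evens = nums.filter(_ % 2 == 0)  // keeps only the elements satisfying the predicate: 2, 4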
- filter(Function<Row, Boolean>) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- Filter - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Filter(Expression, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Filter
-
- filter(Function1<Row, Object>) - Method in class org.apache.spark.sql.SchemaRDD
-
- filter(Function<T, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function1<T, Object>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filterWith(Function1<Object, A>, Function2<T, A, Object>) - Method in class org.apache.spark.rdd.RDD
-
Filters this RDD with p, where p takes an additional parameter of type A.
- findExpression(CatalystFilter, Expression) - Static method in class org.apache.spark.sql.parquet.ParquetFilters
-
Try to find the given expression in the tree of filters in order to
determine whether it is safe to remove it from the higher level filters.
- finished() - Method in class org.apache.spark.scheduler.TaskInfo
-
- finishTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task has completed successfully (including the time to remotely fetch
results, if necessary).
- first() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- first() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- first() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the first element in this RDD.
- first() - Method in class org.apache.spark.rdd.RDD
-
Return the first element in this RDD.
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(Function1<T, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
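For example, a minimal Scala sketch (sc is an assumed existing SparkContext):

    val lines = sc.parallelize(Seq("a b", "c d e"))
    val words = lines.flatMap(_.split(" "))  // one flattened RDD: "a", "b", "c", "d", "e"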
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMap(Function1<T, Traversable<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- FlatMapFunction<T,R> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more output records from each input record.
- FlatMapFunction2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A function that takes two inputs and returns zero or more output records.
- flatMapToDouble(DoubleFlatMapFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function1<V, TraversableOnce<U>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a flatMap function to the value of each key-value pair in
'this' DStream without changing the key.
- flatMapValues(Function1<V, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a flatMap function to the value of each key-value pair in
'this' DStream without changing the key.
- flatMapWith(Function1<Object, A>, boolean, Function2<T, A, Seq<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
FlatMaps f over this RDD, where f takes an additional parameter of type A.
- floatToFloatWritable(float) - Static method in class org.apache.spark.SparkContext
-
- floatWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- floor(Duration) - Method in class org.apache.spark.streaming.Time
-
- FlumeUtils - Class in org.apache.spark.streaming.flume
-
- FlumeUtils() - Constructor for class org.apache.spark.streaming.flume.FlumeUtils
-
- flush() - Method in interface org.apache.spark.serializer.SerializationStream
-
- fMeasureByThreshold(double) - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve.
- fMeasureByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve with beta = 1.0.
- fold(T, Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".
- fold(T, Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".
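A minimal Scala sketch (sc is an assumed existing SparkContext):

    val sum = sc.parallelize(Seq(1, 2, 3, 4)).fold(0)(_ + _)  // 10; 0 is the neutral "zero value"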
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value"
which may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
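A minimal Scala sketch (sc is an assumed existing SparkContext; the pair-RDD functions are brought in through the usual implicit conversion):

    import org.apache.spark.SparkContext._
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val sums  = pairs.foldByKey(0)(_ + _)  // ("a", 3), ("b", 3)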
- foreach(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to all elements of this RDD.
- foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to all elements of this RDD.
- foreach(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreach(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachAsync(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to all elements of this RDD.
- foreachPartition(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to each partition of this RDD.
- foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to each partition of this RDD.
- foreachPartitionAsync(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to each partition of this RDD.
- foreachRDD(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
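For example, a Scala sketch that logs the size of every micro-batch (stream is an assumed existing DStream):

    stream.foreachRDD { (rdd, time) =>
      println(s"Batch at $time contained ${rdd.count()} records")
    }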
- foreachWith(Function1<Object, A>, Function2<T, A, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies f to each element of this RDD, where f takes an additional parameter of type A.
- formatExecutorId(String) - Method in class org.apache.spark.storage.StorageStatusListener
-
In the local mode, there is a discrepancy between the executor ID according to the
task ("localhost") and that according to SparkEnv ("").
- fraction() - Method in class org.apache.spark.sql.execution.Sample
-
- fromAvroFlumeEvent(AvroFlumeEvent) - Static method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- fromDStream(DStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaDStream
-
- fromInputDStream(InputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- fromInputDStream(InputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairInputDStream
-
- fromJavaDStream(JavaDStream<Tuple2<K, V>>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromJavaRDD(JavaRDD<Tuple2<K, V>>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
Convert a JavaRDD of key-value pairs to JavaPairRDD.
- fromPairDStream(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromProductRdd(RDD<A>, TypeTags.TypeTag<A>) - Static method in class org.apache.spark.sql.execution.ExistingRdd
-
- fromRDD(RDD<Object>) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
-
- fromRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
- fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.api.java.JavaRDD
-
- fromRdd(RDD<?>) - Static method in class org.apache.spark.storage.RDDInfo
-
- fromReceiverInputDStream(ReceiverInputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- fromReceiverInputDStream(ReceiverInputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- fromSparkContext(SparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
-
- fromStage(Stage) - Static method in class org.apache.spark.scheduler.StageInfo
-
Construct a StageInfo from a Stage.
- Function<T1,R> - Interface in org.apache.spark.api.java.function
-
Base interface for functions whose return types do not create special RDDs.
- Function2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A two-argument function that takes arguments of type T1 and T2 and returns an R.
- Function3<T1,T2,T3,R> - Interface in org.apache.spark.api.java.function
-
A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.
- FutureAction<T> - Interface in org.apache.spark
-
:: Experimental ::
A future for the result of an action to support cancellation.
- gain() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel> - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).
- GeneralizedLinearAlgorithm() - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
- GeneralizedLinearModel - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearModel (GLM) represents a model trained using
GeneralizedLinearAlgorithm.
- GeneralizedLinearModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
- Generate - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Applies a Generator to a stream of input rows, combining the
output of each into a new stream of rows.
- Generate(Generator, boolean, boolean, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Generate
-
- generate(Generator, boolean, boolean, Option<String>) - Method in class org.apache.spark.sql.SchemaRDD
-
:: Experimental ::
Applies the given Generator, or table generating function, to this relation.
- generatedRDDs() - Method in class org.apache.spark.streaming.dstream.DStream
-
- generateKMeansRDD(SparkContext, int, int, int, double, int) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
Generate an RDD containing test data for KMeans.
- generateLinearInput(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- generateLinearInputAsList(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Return a Java List of synthetic data randomly generated according to a multicollinear model.
- generateLinearRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso,
and unregularized variants.
- generateLogisticRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
Generate an RDD containing test data for LogisticRegression.
- generator() - Method in class org.apache.spark.sql.execution.Generate
-
- get() - Method in interface org.apache.spark.FutureAction
-
Blocks and returns the result of this job.
- get(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter; throws a NoSuchElementException if it's not set
- get(String, String) - Method in class org.apache.spark.SparkConf
-
Get a parameter, falling back to a default if not set
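A minimal Scala sketch:

    import org.apache.spark.SparkConf
    val conf = new SparkConf().setAppName("example")
    conf.get("spark.app.name")            // "example"
    conf.get("spark.master", "local[2]")  // falls back to "local[2]" if the key is unset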
- get() - Static method in class org.apache.spark.SparkEnv
-
Returns the ThreadLocal SparkEnv, if non-null.
- get(String) - Static method in class org.apache.spark.SparkFiles
-
Get the absolute path of a file added through SparkContext.addFile().
- get(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column `i`.
- get(String) - Method in interface org.apache.spark.sql.SQLConf
-
- get(String, String) - Method in interface org.apache.spark.sql.SQLConf
-
- getAkkaConf() - Method in class org.apache.spark.SparkConf
-
Get all akka conf variables set on this SparkConf
- getAll() - Method in class org.apache.spark.SparkConf
-
Get all parameters as a list of pairs
- getAll() - Method in interface org.apache.spark.sql.SQLConf
-
- getAllPools() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return pools for fair scheduler
- getBoolean(String, boolean) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a boolean, falling back to a default if not set
- getBoolean(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a boolean.
- getByte(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a byte.
- getCachedBlockManagerId(BlockManagerId) - Static method in class org.apache.spark.storage.BlockManagerId
-
- getCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
The three methods below are helpers for accessing the local map, a property of the SparkEnv of
the local process.
- getCheckpointDir() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- getCheckpointDir() - Method in class org.apache.spark.SparkContext
-
- getCheckpointFile() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Gets the name of the file to which this RDD was checkpointed
- getCheckpointFile() - Method in class org.apache.spark.rdd.RDD
-
Gets the name of the file to which this RDD was checkpointed
- getConf() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Return a copy of this JavaSparkContext's configuration.
- getConf() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getConf() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getConf() - Method in class org.apache.spark.SparkContext
-
Return a copy of this SparkContext's configuration.
- getDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- getDouble(String, double) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a double, falling back to a default if not set
- getDouble(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a double.
- getExecutorEnv() - Method in class org.apache.spark.SparkConf
-
Get all executor environment variables set on this SparkConf
- getExecutorMemoryStatus() - Method in class org.apache.spark.SparkContext
-
Return a map from the slave to the max memory available for caching and the remaining
memory available for caching.
- getExecutorStorageStatus() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about blocks stored in all of the slaves
- getFinalValue() - Method in class org.apache.spark.partial.PartialResult
-
Blocking method to wait for and return the final value.
- getFloat(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a float.
- getHiveFile(String) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- getInt(String, int) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an integer, falling back to a default if not set
- getInt(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as an int.
- getLocalProperty(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLocalProperty(String) - Method in class org.apache.spark.SparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLong(String, long) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a long, falling back to a default if not set
- getLong(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a long.
- getOption(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an Option
- getOption(String) - Method in interface org.apache.spark.sql.SQLConf
-
- getOrCreate(String, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Configuration, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Configuration, JavaStreamingContextFactory, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
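A minimal Scala sketch of the recover-or-create pattern (the checkpoint directory and batch interval are hypothetical; conf is an assumed existing SparkConf):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint("/tmp/spark-checkpoints")  // hypothetical checkpoint directory
      // ... set up the DStream graph here ...
      ssc
    }
    val ssc = StreamingContext.getOrCreate("/tmp/spark-checkpoints", createContext _)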
- getParents(int) - Method in class org.apache.spark.NarrowDependency
-
Get the parent partitions for a child partition.
- getParents(int) - Method in class org.apache.spark.OneToOneDependency
-
- getParents(int) - Method in class org.apache.spark.RangeDependency
-
- getPartition(Object) - Method in class org.apache.spark.HashPartitioner
-
- getPartition(Object) - Method in class org.apache.spark.Partitioner
-
- getPartition(Object) - Method in class org.apache.spark.RangePartitioner
-
- getPartitions() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.JdbcRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.UnionRDD
-
- getPartitions() - Method in class org.apache.spark.sql.SchemaRDD
-
- getPersistentRDDs() - Method in class org.apache.spark.SparkContext
-
Returns an immutable map of RDDs that have marked themselves as persistent via cache() call.
- getPoolForName(String) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return the pool associated with the given name, if one exists
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.UnionRDD
-
- getRDDStorageInfo() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about what RDDs are cached, whether they are in memory or on disk, how much space
they take, etc.
- getReceiver() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Gets the receiver object that will be sent to the worker nodes
to receive data.
- getRootDirectory() - Static method in class org.apache.spark.SparkFiles
-
Get the root directory that contains files added through SparkContext.addFile().
- getSchedulingMode() - Method in class org.apache.spark.SparkContext
-
Return current scheduling mode
- getSerializer(Serializer) - Method in interface org.apache.spark.serializer.Serializer
-
- getShort(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a short.
- getSparkHome() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get Spark's home location from either a value set through the constructor,
or the spark.home Java property, or the SPARK_HOME environment variable
(in that order of preference).
- getStorageLevel() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getStorageLevel() - Method in class org.apache.spark.rdd.RDD
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getString(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a String.
- getThreadLocal() - Static method in class org.apache.spark.SparkEnv
-
Returns the ThreadLocal SparkEnv.
- gettingResult() - Method in class org.apache.spark.scheduler.TaskInfo
-
- gettingResultTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task started remotely getting the result.
- Gini - Class in org.apache.spark.mllib.tree.impurity
-
:: Experimental ::
Class for calculating the Gini impurity during binary classification.
- Gini() - Constructor for class org.apache.spark.mllib.tree.impurity.Gini
-
- global() - Method in class org.apache.spark.sql.execution.Sort
-
- glom() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- glom() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- Gradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to compute the gradient for a loss function, given a single data point.
- Gradient() - Constructor for class org.apache.spark.mllib.optimization.Gradient
-
- GradientDescent - Class in org.apache.spark.mllib.optimization
-
Class used to solve an optimization problem using Gradient Descent.
- graph() - Method in class org.apache.spark.streaming.dstream.DStream
-
- graph() - Method in class org.apache.spark.streaming.StreamingContext
-
- groupBy(Function<T, K>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD of grouped elements.
- groupBy(Function<T, K>, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD of grouped elements.
- groupBy(Function1<T, K>, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped items.
- groupBy(Function1<T, K>, int, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped elements.
- groupBy(Function1<T, K>, Partitioner, ClassTag<K>, Ordering<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped items.
- groupBy(Seq<Expression>, Seq<Expression>) - Method in class org.apache.spark.sql.SchemaRDD
-
Performs a grouping followed by an aggregation.
- groupByKey(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
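For example, a minimal Scala sketch (sc is an assumed existing SparkContext):

    import org.apache.spark.SparkContext._
    val pairs   = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val grouped = pairs.groupByKey()  // ("a", [1, 2]), ("b", [3])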
- groupByKey() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey on each RDD of this DStream.
- groupByKey() - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey on each RDD.
- groupByKeyAndWindow(Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window on this DStream.
- groupByKeyAndWindow(Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window on this DStream.
- groupByKeyAndWindow(Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey over a sliding window on this DStream.
- groupByKeyAndWindow(Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Create a new DStream by applying groupByKey over a sliding window on this DStream.
- groupingExpressions() - Method in class org.apache.spark.sql.execution.Aggregate
-
- groupWith(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- main(String[]) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.MFDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.SVMDataGenerator
-
- makeRDD(Seq<T>, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD.
- makeRDD(Seq<Tuple2<T, Seq<String>>>, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD, with one or more
location preferences (hostnames of Spark nodes) for each object.
- map(Function<T, R>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- map(Function1<R, T>) - Method in class org.apache.spark.partial.PartialResult
-
Transform this PartialResult into a PartialResult of type T.
- map(Function1<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to all elements of this RDD.
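For example, a minimal Scala sketch (sc is an assumed existing SparkContext):

    val doubled = sc.parallelize(Seq(1, 2, 3)).map(_ * 2)  // 2, 4, 6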
- map(Function<T, R>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream.
- map(Function1<T, U>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream.
- mapId() - Method in class org.apache.spark.FetchFailed
-
- mapId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- mapOutputTracker() - Method in class org.apache.spark.SparkEnv
-
- mapPartitions(FlatMapFunction<Iterator<T>, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(FlatMapFunction<Iterator<T>, U>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(Function1<Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(FlatMapFunction<Iterator<T>, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitions(Function1<Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitionsToDouble(DoubleFlatMapFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToDouble(DoubleFlatMapFunction<Iterator<T>>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitionsWithContext(Function2<TaskContext, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsWithIndex(Function2<Integer, Iterator<T>, Iterator<R>>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapPartitionsWithIndex(Function2<Object, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapPartitionsWithSplit(Function2<Object, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapredInputFormat() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- mapreduceInputFormat() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- mapToDouble(DoubleFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- mapToPair(PairFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- mapToPair(PairFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream.
- mapValues(Function<V, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a map function without changing the keys;
this also retains the original RDD's partitioning.
- mapValues(Function1<V, U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a map function without changing the keys;
this also retains the original RDD's partitioning.
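For example, a minimal Scala sketch (sc is an assumed existing SparkContext):

    import org.apache.spark.SparkContext._
    val pairs  = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val bumped = pairs.mapValues(_ + 10)  // ("a", 11), ("b", 12); the parent RDD's partitioning is kept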
- mapValues(Function<V, U>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a map function to the value of each key-value pair in
'this' DStream without changing the key.
- mapValues(Function1<V, U>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a map function to the value of each key-value pair in
'this' DStream without changing the key.
- mapWith(Function1<Object, A>, boolean, Function2<T, A, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Maps f over this RDD, where f takes an additional parameter of type A.
- master() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- master() - Method in class org.apache.spark.SparkContext
-
- Matrices - Class in org.apache.spark.mllib.linalg
-
- Matrices() - Constructor for class org.apache.spark.mllib.linalg.Matrices
-
- Matrix - Interface in org.apache.spark.mllib.linalg
-
Trait for a local matrix.
- MatrixEntry - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents an entry in a distributed matrix.
- MatrixEntry(long, long, double) - Constructor for class org.apache.spark.mllib.linalg.distributed.MatrixEntry
-
- MatrixFactorizationModel - Class in org.apache.spark.mllib.recommendation
-
Model representing the result of matrix factorization.
- max(Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the maximum element from this RDD as defined by the specified
Comparator[T].
- max() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Maximum value of each column.
- max(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the max of this RDD as defined by the implicit Ordering[T].
- max(Duration) - Method in class org.apache.spark.streaming.Duration
-
- max(Time) - Method in class org.apache.spark.streaming.Time
-
- max() - Method in class org.apache.spark.util.StatCounter
-
- maxBins() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxDepth() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxMem() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
-
- maxMem() - Method in class org.apache.spark.storage.StorageStatus
-
- maxMemoryInMB() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- mean() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the mean of this RDD's elements.
- mean() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample mean vector.
- mean() - Method in class org.apache.spark.partial.BoundedDouble
-
- mean() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the mean of this RDD's elements.
- mean() - Method in class org.apache.spark.util.StatCounter
-
- meanApprox(long, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return the approximate mean of the elements in this RDD.
- meanApprox(long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the mean within a timeout.
- meanApprox(long, double) - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
:: Experimental ::
Approximate operation to return the mean within a timeout.
- MEMORY_AND_DISK - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_SER - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_SER() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_SER_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_SER_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_SER - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_SER() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_SER_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_SER_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- memoryBytesSpilled() - Method in class org.apache.spark.ui.jobs.ExecutorSummary
-
- memRemaining() - Method in class org.apache.spark.storage.StorageStatus
-
- memSize() - Method in class org.apache.spark.storage.BlockStatus
-
- memSize() - Method in class org.apache.spark.storage.RDDInfo
-
- memUsed() - Method in class org.apache.spark.storage.StorageStatus
-
- memUsedByRDD(int) - Method in class org.apache.spark.storage.StorageStatus
-
- merge(R) - Method in class org.apache.spark.Accumulable
-
Merge two accumulable objects together
- merge(double) - Method in class org.apache.spark.util.StatCounter
-
Add a value into this StatCounter, updating the internal statistics.
- merge(TraversableOnce<Object>) - Method in class org.apache.spark.util.StatCounter
-
Add multiple values into this StatCounter, updating the internal statistics.
- merge(StatCounter) - Method in class org.apache.spark.util.StatCounter
-
Merge another StatCounter into this one, adding up the internal statistics.
- mergeCombiners() - Method in class org.apache.spark.Aggregator
-
- mergeValue() - Method in class org.apache.spark.Aggregator
-
- metadataCleaner() - Method in class org.apache.spark.SparkContext
-
- metastorePath() - Method in class org.apache.spark.sql.hive.LocalHiveContext
-
- metastorePath() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- metrics() - Method in class org.apache.spark.ExceptionFailure
-
- metricsSystem() - Method in class org.apache.spark.SparkEnv
-
- MFDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate RDD(s) containing data for Matrix Factorization.
- MFDataGenerator() - Constructor for class org.apache.spark.mllib.util.MFDataGenerator
-
- milliseconds() - Method in class org.apache.spark.streaming.Duration
-
- Milliseconds - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing
a given number of milliseconds.
- Milliseconds() - Constructor for class org.apache.spark.streaming.Milliseconds
-
- milliseconds() - Method in class org.apache.spark.streaming.Time
-
- millisToString(long) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
Reformat a time interval in milliseconds to a prettier format for output
- min(Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the minimum element from this RDD as defined by the specified
Comparator[T].
- min() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Minimum value of each column.
- min(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the min of this RDD as defined by the implicit Ordering[T].
- min(Duration) - Method in class org.apache.spark.streaming.Duration
-
- min(Time) - Method in class org.apache.spark.streaming.Time
-
- min() - Method in class org.apache.spark.util.StatCounter
-
- MinMax() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- minutes() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- Minutes - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing
a given number of minutes.
- Minutes() - Constructor for class org.apache.spark.streaming.Minutes
-
- MLUtils - Class in org.apache.spark.mllib.util
-
Helper methods to load, save and pre-process data used in MLlib.
- MLUtils() - Constructor for class org.apache.spark.mllib.util.MLUtils
-
- MQTTUtils - Class in org.apache.spark.streaming.mqtt
-
- MQTTUtils() - Constructor for class org.apache.spark.streaming.mqtt.MQTTUtils
-
- multiply(Matrix) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Multiply this matrix by a local matrix on the right.
- multiply(Matrix) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Multiply this matrix by a local matrix on the right.
- multiply(double) - Method in class org.apache.spark.util.Vector
-
- MultivariateStatisticalSummary - Interface in org.apache.spark.mllib.stat
-
Trait for multivariate statistical summary of a data matrix.
- mustCheckpoint() - Method in class org.apache.spark.streaming.dstream.DStream
-
- MutablePair<T1,T2> - Class in org.apache.spark.util
-
:: DeveloperApi ::
A tuple of 2 elements.
- MutablePair(T1, T2) - Constructor for class org.apache.spark.util.MutablePair
-
- MutablePair() - Constructor for class org.apache.spark.util.MutablePair
-
No-arg constructor for serialization
- objectFile(String, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
BytesWritable values that contain a serialized partition.
- objectFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
BytesWritable values that contain a serialized partition.
- objectFile(String, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
BytesWritable values that contain a serialized partition.
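A minimal Scala sketch of the round trip via saveAsObjectFile (the output path is hypothetical; sc is an assumed existing SparkContext):

    sc.parallelize(1 to 100).saveAsObjectFile("/tmp/ints")  // hypothetical path
    val restored = sc.objectFile[Int]("/tmp/ints")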
- OFF_HEAP - Static variable in class org.apache.spark.api.java.StorageLevels
-
- OFF_HEAP() - Static method in class org.apache.spark.storage.StorageLevel
-
- onApplicationEnd(SparkListenerApplicationEnd) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when the application ends
- onApplicationStart(SparkListenerApplicationStart) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when the application starts
- onBatchCompleted(StreamingListenerBatchCompleted) - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- onBatchCompleted(StreamingListenerBatchCompleted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when processing of a batch of jobs has completed.
- onBatchStarted(StreamingListenerBatchStarted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when processing of a batch of jobs has started.
- onBatchSubmitted(StreamingListenerBatchSubmitted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a batch of jobs has been submitted for processing.
- onBlockManagerAdded(SparkListenerBlockManagerAdded) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a new block manager has joined
- onBlockManagerAdded(SparkListenerBlockManagerAdded) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onBlockManagerAdded(SparkListenerBlockManagerAdded) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onBlockManagerRemoved(SparkListenerBlockManagerRemoved) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when an existing block manager has been removed
- onBlockManagerRemoved(SparkListenerBlockManagerRemoved) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onBlockManagerRemoved(SparkListenerBlockManagerRemoved) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onComplete(Function1<Try<T>, U>, ExecutionContext) - Method in class org.apache.spark.ComplexFutureAction
-
- onComplete(Function1<Try<T>, U>, ExecutionContext) - Method in interface org.apache.spark.FutureAction
-
When this action is completed, either through an exception, or a value, applies the provided
function.
- onComplete(Function1<R, BoxedUnit>) - Method in class org.apache.spark.partial.PartialResult
-
Set a handler to be called when this PartialResult completes.
- onComplete(Function1<Try<T>, U>, ExecutionContext) - Method in class org.apache.spark.SimpleFutureAction
-
- onEnvironmentUpdate(SparkListenerEnvironmentUpdate) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when environment properties have been updated
- onEnvironmentUpdate(SparkListenerEnvironmentUpdate) - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- onEnvironmentUpdate(SparkListenerEnvironmentUpdate) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- ones(int) - Static method in class org.apache.spark.util.Vector
-
- OneToOneDependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a one-to-one dependency between partitions of the parent and child RDDs.
- OneToOneDependency(RDD<T>) - Constructor for class org.apache.spark.OneToOneDependency
-
- onFail(Function1<Exception, BoxedUnit>) - Method in class org.apache.spark.partial.PartialResult
-
Set a handler to be called if this PartialResult's job fails.
- onJobEnd(SparkListenerJobEnd) - Method in class org.apache.spark.scheduler.JobLogger
-
When a job ends, record the job completion status and close the log file
- onJobEnd(SparkListenerJobEnd) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a job ends
- onJobStart(SparkListenerJobStart) - Method in class org.apache.spark.scheduler.JobLogger
-
When a job starts, record the job properties and the stage graph
- onJobStart(SparkListenerJobStart) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a job starts
- onReceiverError(StreamingListenerReceiverError) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a receiver has reported an error
- onReceiverStarted(StreamingListenerReceiverStarted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a receiver has been started
- onReceiverStopped(StreamingListenerReceiverStopped) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a receiver has been stopped
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.scheduler.JobLogger
-
When a stage is completed, record the stage completion status
- onStageCompleted(SparkListenerStageCompleted) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a stage completes successfully or fails, with information on the completed stage.
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.scheduler.StatsReportListener
-
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.ui.storage.StorageListener
-
- onStageSubmitted(SparkListenerStageSubmitted) - Method in class org.apache.spark.scheduler.JobLogger
-
When a stage is submitted, record the stage submission info
- onStageSubmitted(SparkListenerStageSubmitted) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a stage is submitted
- onStageSubmitted(SparkListenerStageSubmitted) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
For FIFO scheduling, all stages are contained in the "default" pool, but the "default" pool here is meaningless
- onStageSubmitted(SparkListenerStageSubmitted) - Method in class org.apache.spark.ui.storage.StorageListener
-
- onStart() - Method in class org.apache.spark.streaming.receiver.Receiver
-
This method is called by the system when the receiver is started.
- onStop() - Method in class org.apache.spark.streaming.receiver.Receiver
-
This method is called by the system when the receiver is stopped.
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.scheduler.JobLogger
-
When a task ends, record the task completion status and metrics
- onTaskEnd(SparkListenerTaskEnd) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a task ends
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.scheduler.StatsReportListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.ui.exec.ExecutorsListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.ui.storage.StorageListener
-
Assumes the storage status list is fully up-to-date.
- onTaskGettingResult(SparkListenerTaskGettingResult) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a task begins remotely fetching its result (will not be called for tasks that do
not need to fetch the result remotely).
- onTaskGettingResult(SparkListenerTaskGettingResult) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onTaskStart(SparkListenerTaskStart) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a task starts
- onTaskStart(SparkListenerTaskStart) - Method in class org.apache.spark.ui.exec.ExecutorsListener
-
- onTaskStart(SparkListenerTaskStart) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onUnpersistRDD(SparkListenerUnpersistRDD) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when an RDD is manually unpersisted by the application
- onUnpersistRDD(SparkListenerUnpersistRDD) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onUnpersistRDD(SparkListenerUnpersistRDD) - Method in class org.apache.spark.ui.storage.StorageListener
-
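A minimal Scala sketch of wiring the listener callbacks above into a custom SparkListener, intended for the Spark shell where sc is the predefined SparkContext; the println messages are illustrative only.

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

    // Logs stage completions and task ends; all other callbacks keep their no-op defaults.
    class StageLogger extends SparkListener {
      override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
        println(s"Stage ${stageCompleted.stageInfo.stageId} completed")
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
        println(s"Task finished in stage ${taskEnd.stageId}")
    }

    sc.addSparkListener(new StageLogger)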
- optimize(RDD<Tuple2<Object, Vector>>, Vector) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
:: DeveloperApi ::
Runs gradient descent on the given training data.
- optimize(RDD<Tuple2<Object, Vector>>, Vector) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
- optimize(RDD<Tuple2<Object, Vector>>, Vector) - Method in interface org.apache.spark.mllib.optimization.Optimizer
-
Solve the provided convex optimization problem.
- optimizer() - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
- optimizer() - Method in class org.apache.spark.mllib.classification.SVMWithSGD
-
- Optimizer - Interface in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Trait for optimization problem solvers.
- optimizer() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
The optimizer to solve the problem.
- optimizer() - Method in class org.apache.spark.mllib.regression.LassoWithSGD
-
- optimizer() - Method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
- optimizer() - Method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
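As a hedged sketch of how the optimizer() handle is typically used, the snippet below tunes the GradientDescent optimizer of a LogisticRegressionWithSGD instance before training; the two training points and the chosen step size are made up, and sc is assumed to be an existing SparkContext (e.g. the Spark shell's).

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    val training = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
      LabeledPoint(0.0, Vectors.dense(-1.0, -0.5))))

    val algo = new LogisticRegressionWithSGD()
    // Configure the underlying GradientDescent optimizer before calling run().
    algo.optimizer.setNumIterations(200).setStepSize(0.1)
    val model = algo.run(training)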
- orderBy(Seq<SortOrder>) - Method in class org.apache.spark.sql.SchemaRDD
-
Sorts the results by the given expressions.
- OrderedRDDFunctions<K,V,P extends scala.Product2<K,V>> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs where the key is sortable through
an implicit conversion.
- OrderedRDDFunctions(RDD<P>, Ordering<K>, ClassTag<K>, ClassTag<V>, ClassTag<P>) - Constructor for class org.apache.spark.rdd.OrderedRDDFunctions
-
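A small sketch of the implicit conversion in action (Spark shell, sc predefined): importing SparkContext._ makes sortByKey available on any RDD of pairs whose key type has an Ordering; the pair data below is invented.

    import org.apache.spark.SparkContext._   // enables the implicit conversion to OrderedRDDFunctions

    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    val sorted = pairs.sortByKey()           // ascending by the String keys
    sorted.collect()                         // Array((a,1), (b,2), (c,3))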
- ordering() - Method in class org.apache.spark.sql.execution.Sort
-
- ordering() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- ordering() - Static method in class org.apache.spark.streaming.Time
-
- org.apache.spark - package org.apache.spark
-
Core Spark classes in Scala.
- org.apache.spark.annotation - package org.apache.spark.annotation
-
Spark annotations to mark an API experimental or intended only for advanced usages by developers.
- org.apache.spark.api.java - package org.apache.spark.api.java
-
Spark Java programming APIs.
- org.apache.spark.api.java.function - package org.apache.spark.api.java.function
-
Set of interfaces to represent functions in Spark's Java API.
- org.apache.spark.broadcast - package org.apache.spark.broadcast
-
Spark's broadcast variables, used to broadcast immutable datasets to all nodes.
- org.apache.spark.io - package org.apache.spark.io
-
IO codecs used for compression.
- org.apache.spark.mllib.classification - package org.apache.spark.mllib.classification
-
- org.apache.spark.mllib.clustering - package org.apache.spark.mllib.clustering
-
- org.apache.spark.mllib.evaluation - package org.apache.spark.mllib.evaluation
-
- org.apache.spark.mllib.linalg - package org.apache.spark.mllib.linalg
-
- org.apache.spark.mllib.linalg.distributed - package org.apache.spark.mllib.linalg.distributed
-
- org.apache.spark.mllib.optimization - package org.apache.spark.mllib.optimization
-
- org.apache.spark.mllib.recommendation - package org.apache.spark.mllib.recommendation
-
- org.apache.spark.mllib.regression - package org.apache.spark.mllib.regression
-
- org.apache.spark.mllib.stat - package org.apache.spark.mllib.stat
-
- org.apache.spark.mllib.tree - package org.apache.spark.mllib.tree
-
- org.apache.spark.mllib.tree.configuration - package org.apache.spark.mllib.tree.configuration
-
- org.apache.spark.mllib.tree.impurity - package org.apache.spark.mllib.tree.impurity
-
- org.apache.spark.mllib.tree.model - package org.apache.spark.mllib.tree.model
-
- org.apache.spark.mllib.util - package org.apache.spark.mllib.util
-
- org.apache.spark.partial - package org.apache.spark.partial
-
- org.apache.spark.rdd - package org.apache.spark.rdd
-
Provides implementations of various RDDs.
- org.apache.spark.scheduler - package org.apache.spark.scheduler
-
Spark's DAG scheduler.
- org.apache.spark.serializer - package org.apache.spark.serializer
-
Pluggable serializers for RDD and shuffle data.
- org.apache.spark.sql - package org.apache.spark.sql
-
- org.apache.spark.sql.api.java - package org.apache.spark.sql.api.java
-
- org.apache.spark.sql.execution - package org.apache.spark.sql.execution
-
- org.apache.spark.sql.hive - package org.apache.spark.sql.hive
-
- org.apache.spark.sql.hive.api.java - package org.apache.spark.sql.hive.api.java
-
- org.apache.spark.sql.hive.execution - package org.apache.spark.sql.hive.execution
-
- org.apache.spark.sql.hive.test - package org.apache.spark.sql.hive.test
-
- org.apache.spark.sql.parquet - package org.apache.spark.sql.parquet
-
- org.apache.spark.sql.test - package org.apache.spark.sql.test
-
- org.apache.spark.storage - package org.apache.spark.storage
-
- org.apache.spark.streaming - package org.apache.spark.streaming
-
- org.apache.spark.streaming.api.java - package org.apache.spark.streaming.api.java
-
Java APIs for Spark Streaming.
- org.apache.spark.streaming.dstream - package org.apache.spark.streaming.dstream
-
Various implementations of DStreams.
- org.apache.spark.streaming.flume - package org.apache.spark.streaming.flume
-
Spark Streaming receiver for Flume.
- org.apache.spark.streaming.kafka - package org.apache.spark.streaming.kafka
-
Kafka receiver for Spark Streaming.
- org.apache.spark.streaming.mqtt - package org.apache.spark.streaming.mqtt
-
MQTT receiver for Spark Streaming.
- org.apache.spark.streaming.receiver - package org.apache.spark.streaming.receiver
-
- org.apache.spark.streaming.scheduler - package org.apache.spark.streaming.scheduler
-
- org.apache.spark.streaming.twitter - package org.apache.spark.streaming.twitter
-
Twitter feed receiver for Spark Streaming.
- org.apache.spark.streaming.zeromq - package org.apache.spark.streaming.zeromq
-
ZeroMQ receiver for Spark Streaming.
- org.apache.spark.ui.env - package org.apache.spark.ui.env
-
- org.apache.spark.ui.exec - package org.apache.spark.ui.exec
-
- org.apache.spark.ui.jobs - package org.apache.spark.ui.jobs
-
- org.apache.spark.ui.storage - package org.apache.spark.ui.storage
-
- org.apache.spark.util - package org.apache.spark.util
-
Spark utilities.
- org.apache.spark.util.random - package org.apache.spark.util.random
-
Utilities for random number generation.
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.Aggregate
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.ExplainCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.Limit
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.SetCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.Union
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.NativeCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- otherCopyArgs() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- otherCopyArgs() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- otherInfo() - Method in class org.apache.spark.streaming.receiver.Statistics
-
- outer() - Method in class org.apache.spark.sql.execution.Generate
-
- output() - Method in class org.apache.spark.sql.execution.Aggregate
-
- output() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- output() - Method in class org.apache.spark.sql.execution.CacheCommand
-
- output() - Method in class org.apache.spark.sql.execution.CartesianProduct
-
- output() - Method in class org.apache.spark.sql.execution.DescribeCommand
-
- output() - Method in class org.apache.spark.sql.execution.Exchange
-
- output() - Method in class org.apache.spark.sql.execution.ExistingRdd
-
- output() - Method in class org.apache.spark.sql.execution.ExplainCommand
-
- output() - Method in class org.apache.spark.sql.execution.Filter
-
- output() - Method in class org.apache.spark.sql.execution.Generate
-
- output() - Method in class org.apache.spark.sql.execution.HashJoin
-
- output() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- output() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- output() - Method in class org.apache.spark.sql.execution.Limit
-
- output() - Method in class org.apache.spark.sql.execution.Project
-
- output() - Method in class org.apache.spark.sql.execution.Sample
-
- output() - Method in class org.apache.spark.sql.execution.SetCommand
-
- output() - Method in class org.apache.spark.sql.execution.Sort
-
- output() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- output() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- output() - Method in class org.apache.spark.sql.execution.Union
-
- output() - Method in class org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
-
- output() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- output() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- output() - Method in class org.apache.spark.sql.hive.execution.NativeCommand
-
- output() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- output() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- output() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- outputClass() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.Exchange
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.HashJoin
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.SparkPlan
-
Specifies how data is partitioned across different nodes in the cluster.
- overwrite() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- overwrite() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- PairDStreamFunctions<K,V> - Class in org.apache.spark.streaming.dstream
-
Extra functions available on DStream of (key, value) pairs through an implicit conversion.
- PairDStreamFunctions(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Constructor for class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
- PairFlatMapFunction<T,K,V> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more key-value pair records from each input record.
- PairFunction<T,K,V> - Interface in org.apache.spark.api.java.function
-
A function that returns key-value pairs (Tuple2), and can be used to construct PairRDDs.
- PairRDDFunctions<K,V> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
- PairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Constructor for class org.apache.spark.rdd.PairRDDFunctions
-
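A brief sketch of picking up these extra pair functions via the implicit conversion (Spark shell, sc predefined); the pet data is invented.

    import org.apache.spark.SparkContext._   // implicit conversion RDD[(K, V)] => PairRDDFunctions

    val pets = sc.parallelize(Seq(("cat", 1), ("dog", 1), ("cat", 2)))
    pets.groupByKey().collect()              // values grouped per key, e.g. (cat, [1, 2])
    pets.countByKey()                        // Map(cat -> 2, dog -> 1)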
- parallelize(List<T>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelize(List<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelize(Seq<T>, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD.
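A minimal example of the Scala form above (Spark shell, sc predefined); the element range and the partition count are arbitrary.

    val rdd = sc.parallelize(1 to 10, numSlices = 4)   // distribute into 4 partitions
    rdd.partitions.length                              // 4
    rdd.reduce(_ + _)                                  // 55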
- parallelizeDoubles(List<Double>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizeDoubles(List<Double>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizePairs(List<Tuple2<K, V>>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizePairs(List<Tuple2<K, V>>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- PARQUET_FILTER_DATA() - Static method in class org.apache.spark.sql.parquet.ParquetFilters
-
- PARQUET_FILTER_PUSHDOWN_ENABLED() - Static method in class org.apache.spark.sql.parquet.ParquetFilters
-
- parquetFile(String) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
- parquetFile(String) - Method in class org.apache.spark.sql.SQLContext
-
Loads a Parquet file, returning the result as a SchemaRDD.
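A hedged sketch of loading and querying a Parquet file (Spark shell, sc predefined); the HDFS path and table name are hypothetical.

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val people = sqlContext.parquetFile("hdfs:///tmp/people.parquet")   // hypothetical path
    people.registerAsTable("people")
    sqlContext.sql("SELECT COUNT(*) FROM people").collect()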
- ParquetFilters - Class in org.apache.spark.sql.parquet
-
- ParquetFilters() - Constructor for class org.apache.spark.sql.parquet.ParquetFilters
-
- ParquetTableScan - Class in org.apache.spark.sql.parquet
-
Parquet table scan operator.
- ParquetTableScan(Seq<Attribute>, ParquetRelation, Seq<Expression>, SQLContext) - Constructor for class org.apache.spark.sql.parquet.ParquetTableScan
-
- partial() - Method in class org.apache.spark.sql.execution.Aggregate
-
- PartialResult<R> - Class in org.apache.spark.partial
-
- PartialResult(R, boolean) - Constructor for class org.apache.spark.partial.PartialResult
-
- Partition - Interface in org.apache.spark
-
A partition of an RDD.
- partition() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- partitionBy(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a copy of the RDD partitioned using the specified partitioner.
- partitionBy(Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return a copy of the RDD partitioned using the specified partitioner.
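A short sketch of hash-partitioning a pair RDD (Spark shell, sc predefined); two partitions is an arbitrary choice.

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._

    val pairs = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"), (4, "d")))
    val partitioned = pairs.partitionBy(new HashPartitioner(2))
    partitioned.partitioner   // Some(...) — later joins on the same partitioner avoid a shuffle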
- Partitioner - Class in org.apache.spark
-
An object that defines how the elements in a key-value pair RDD are partitioned by key.
- Partitioner() - Constructor for class org.apache.spark.Partitioner
-
- partitioner() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- partitioner() - Method in class org.apache.spark.rdd.RDD
-
Optionally overridden by subclasses to specify how they are partitioned.
- partitioner() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- partitioner() - Method in class org.apache.spark.ShuffleDependency
-
- partitionId() - Method in class org.apache.spark.TaskContext
-
- partitionPruningPred() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- PartitionPruningRDD<T> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD used to prune RDD partitions so that we can avoid launching tasks on all partitions.
- PartitionPruningRDD(RDD<T>, Function1<Object, Object>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.PartitionPruningRDD
-
- partitions() - Method in class org.apache.spark.rdd.RDD
-
Get the array of partitions of this RDD, taking into account whether the
RDD is checkpointed or not.
- path() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- path() - Method in class org.apache.spark.scheduler.SplitInfo
-
- percentiles() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- percentilesHeader() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.rdd.RDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
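A small sketch contrasting the two persist forms above (Spark shell, sc predefined); the input path is hypothetical.

    import org.apache.spark.storage.StorageLevel

    val lines = sc.textFile("hdfs:///tmp/input.txt")    // hypothetical path
    lines.persist(StorageLevel.MEMORY_AND_DISK)         // explicit storage level
    // lines.persist() would use the default MEMORY_ONLY level instead
    lines.count()                                       // first action computes and caches the RDD
    lines.count()                                       // subsequent actions reuse the cached data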
- persist() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- persist(StorageLevel) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persist(StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist the RDDs of this DStream with the given storage level
- persist() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persist(StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist the RDDs of this DStream with the given storage level
- persist(StorageLevel) - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist the RDDs of this DStream with the given storage level
- persist() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persistentRdds() - Method in class org.apache.spark.SparkContext
-
- pi() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- pipe(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(List<String>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(List<String>, Map<String, String>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(String) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- pipe(String, Map<String, String>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- pipe(Seq<String>, Map<String, String>, Function1<Function1<String, BoxedUnit>, BoxedUnit>, Function2<T, Function1<String, BoxedUnit>, BoxedUnit>, boolean) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
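A minimal sketch of piping an RDD through an external command (Spark shell, sc predefined); it assumes the standard rev utility is available on the workers' PATH.

    val words = sc.parallelize(Seq("spark", "pipes", "data"), 1)
    // Each element is written to the process's stdin, one per line;
    // the process's stdout lines become the elements of the new RDD.
    val reversed = words.pipe("rev")
    reversed.collect()                       // Array(kraps, sepip, atad)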
- plusDot(Vector, Vector) - Method in class org.apache.spark.util.Vector
-
Return (this + plus) dot other, without creating any intermediate storage
- poisson() - Method in class org.apache.spark.util.random.PoissonSampler
-
- PoissonSampler<T> - Class in org.apache.spark.util.random
-
:: DeveloperApi ::
A sampler based on values drawn from Poisson distribution.
- PoissonSampler(double, Poisson) - Constructor for class org.apache.spark.util.random.PoissonSampler
-
- poolToActiveStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- port() - Method in class org.apache.spark.storage.BlockManagerId
-
- pr() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the precision-recall curve, which is an RDD of (recall, precision),
NOT (precision, recall), with (0.0, 1.0) prepended to it.
- precisionByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, precision) curve.
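A brief sketch of computing these curves (Spark shell, sc predefined); the (score, label) pairs are made up.

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    val scoreAndLabels = sc.parallelize(Seq((0.9, 1.0), (0.8, 1.0), (0.4, 0.0), (0.2, 0.0)))
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    metrics.pr().collect()                       // (recall, precision) points
    metrics.precisionByThreshold().collect()     // (threshold, precision) points
    metrics.areaUnderROC()                       // area under the ROC curve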
- predict(RDD<Vector>) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for a single data point using the model trained.
- predict(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for examples stored in a JavaRDD.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- predict(Vector) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- predict(Vector) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Returns the cluster index that a given point belongs to.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Maps given points to their cluster indices.
- predict(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Maps given points to their cluster indices.
- predict(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Predict the rating of one user for one product.
- predict(RDD<Tuple2<Object, Object>>) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Predict the rating of many users for many products.
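A hedged sketch of training a MatrixFactorizationModel with ALS and then calling the predict variants above (Spark shell, sc predefined); the ratings, rank and iteration count are toy values.

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    val ratings = sc.parallelize(Seq(
      Rating(1, 10, 5.0), Rating(1, 20, 1.0), Rating(2, 10, 4.0)))

    val model = new ALS().setRank(2).setIterations(10).setLambda(0.01).run(ratings)
    model.predict(2, 20)                                             // rating of product 20 by user 2
    model.predict(sc.parallelize(Seq((2, 20), (1, 10)))).collect()   // bulk prediction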
- predict(JavaRDD<byte[]>) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
:: DeveloperApi ::
Predict the rating of many users for many products.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict values for a single data point using the model trained.
- predict(RDD<Vector>) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for a single data point using the model trained.
- predict(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for examples stored in a JavaRDD.
- predict(Vector) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for a single data point using the model trained.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for the given data set using the model trained.
- predict() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- predict() - Method in class org.apache.spark.mllib.tree.model.Node
-
- predictIfLeaf(Vector) - Method in class org.apache.spark.mllib.tree.model.Node
-
Predict the value if the node is not a leaf
- preferredLocation() - Method in class org.apache.spark.streaming.receiver.Receiver
-
Override this to specify a preferred location (hostname).
- preferredLocations(Partition) - Method in class org.apache.spark.rdd.RDD
-
Get the preferred locations of a partition (as hostnames), taking into account whether the
RDD is checkpointed.
- preferredNodeLocationData() - Method in class org.apache.spark.SparkContext
-
- prettyPrint() - Method in class org.apache.spark.streaming.Duration
-
- prev() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- print() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Print the first ten elements of each RDD generated in this DStream.
- print() - Method in class org.apache.spark.streaming.dstream.DStream
-
Print the first ten elements of each RDD generated in this DStream.
- printStats() - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- probabilities() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- PROCESS_LOCAL() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- processingDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for all the jobs of this batch to finish processing, from the time they started processing.
- processingEndTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- processingStartTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- product() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- productFeatures() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- productToRowRdd(RDD<A>) - Static method in class org.apache.spark.sql.execution.ExistingRdd
-
- Project - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Project(Seq<NamedExpression>, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Project
-
- projectList() - Method in class org.apache.spark.sql.execution.Project
-
- properties() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- properties() - Method in class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- pruneColumns(Seq<Attribute>) - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
Applies a (candidate) projection.
- Pseudorandom - Interface in org.apache.spark.util.random
-
:: DeveloperApi ::
A class with pseudorandom behavior.
- putCachedMetadata(String, Object) - Static method in class org.apache.spark.rdd.HadoopRDD
-
- RACK_LOCAL() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- RANDOM() - Static method in class org.apache.spark.mllib.clustering.KMeans
-
- random(int, Random) - Static method in class org.apache.spark.util.Vector
-
Creates a Vector of the given length containing random numbers between 0.0 and 1.0.
- RandomSampler<T,U> - Interface in org.apache.spark.util.random
-
:: DeveloperApi ::
A pseudorandom sampler.
- randomSplit(double[], long) - Method in class org.apache.spark.rdd.RDD
-
Randomly splits this RDD with the provided weights.
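A small sketch of an 80/20 train-test split (Spark shell, sc predefined); the weights and seed are arbitrary.

    val data = sc.parallelize(1 to 1000)
    // Weights are normalized internally; the seed makes the split reproducible.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
    (train.count(), test.count())            // roughly (800, 200)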
- RangeDependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
- RangeDependency(RDD<T>, int, int, int) - Constructor for class org.apache.spark.RangeDependency
-
- RangePartitioner<K,V> - Class in org.apache.spark
-
A Partitioner that partitions sortable records by range into roughly equal ranges.
- RangePartitioner(int, RDD<? extends Product2<K, V>>, boolean, Ordering<K>, ClassTag<K>) - Constructor for class org.apache.spark.RangePartitioner
-
- rank() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- Rating - Class in org.apache.spark.mllib.recommendation
-
:: Experimental ::
A more compact class to represent a rating than Tuple3[Int, Int, Double].
- Rating(int, int, double) - Constructor for class org.apache.spark.mllib.recommendation.Rating
-
- rating() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- rawSocketStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- rawSocketStream(String, int) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- rawSocketStream(String, int, StorageLevel, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- rdd() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- rdd() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- rdd() - Method in class org.apache.spark.api.java.JavaRDD
-
- rdd() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- rdd() - Method in class org.apache.spark.Dependency
-
- RDD<T> - Class in org.apache.spark.rdd
-
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
- RDD(SparkContext, Seq<Dependency<?>>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.RDD
-
- RDD(RDD<?>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.RDD
-
Construct an RDD with just a one-to-one dependency on one parent
- rdd() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- rdd() - Method in class org.apache.spark.sql.execution.ExistingRdd
-
- RDD() - Static method in class org.apache.spark.storage.BlockId
-
- RDDBlockId - Class in org.apache.spark.storage
-
- RDDBlockId(int, int) - Constructor for class org.apache.spark.storage.RDDBlockId
-
- rddBlocks() - Method in class org.apache.spark.storage.StorageStatus
-
- rddId() - Method in class org.apache.spark.scheduler.SparkListenerUnpersistRDD
-
- rddId() - Method in class org.apache.spark.storage.RDDBlockId
-
- RDDInfo - Class in org.apache.spark.storage
-
- RDDInfo(int, String, int, StorageLevel) - Constructor for class org.apache.spark.storage.RDDInfo
-
- rddInfoList() - Method in class org.apache.spark.ui.storage.StorageListener
-
Filter RDD info to include only those with cached partitions
- rddInfos() - Method in class org.apache.spark.scheduler.StageInfo
-
- rdds() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- rdds() - Method in class org.apache.spark.rdd.UnionRDD
-
- rddToAsyncRDDActions(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.SparkContext
-
- rddToOrderedRDDFunctions(RDD<Tuple2<K, V>>, Ordering<K>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.SparkContext
-
- rddToPairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.SparkContext
-
- rddToSequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Static method in class org.apache.spark.SparkContext
-
- readExternal(ObjectInput) - Method in class org.apache.spark.serializer.JavaSerializer
-
- readExternal(ObjectInput) - Method in class org.apache.spark.storage.BlockManagerId
-
- readExternal(ObjectInput) - Method in class org.apache.spark.storage.StorageLevel
-
- readExternal(ObjectInput) - Method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- readObject(ClassTag<T>) - Method in interface org.apache.spark.serializer.DeserializationStream
-
- ready(Duration, CanAwait) - Method in class org.apache.spark.ComplexFutureAction
-
- ready(Duration, CanAwait) - Method in interface org.apache.spark.FutureAction
-
Blocks until this action completes.
- ready(Duration, CanAwait) - Method in class org.apache.spark.SimpleFutureAction
-
- reason() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- recallByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, recall) curve.
- receivedBlockInfo() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- Receiver<T> - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
Abstract class of a receiver that can be run on worker nodes to receive external data.
- Receiver(StorageLevel) - Constructor for class org.apache.spark.streaming.receiver.Receiver
-
- ReceiverInfo - Class in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
Class having information about a receiver
- ReceiverInfo(int, String, ActorRef, boolean, String, String, String) - Constructor for class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverError
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped
-
- receiverInputDStream() - Method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- receiverInputDStream() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- ReceiverInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
Abstract class for defining any InputDStream that has to start a receiver on worker nodes to receive external data.
- ReceiverInputDStream(StreamingContext, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- receiverStream(Receiver<T>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented receiver.
- receiverStream(Receiver<T>, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with any arbitrary user implemented receiver.
- reduce(Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Reduces the elements of this RDD using the specified commutative and associative binary
operator.
- reduce(Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Reduces the elements of this RDD using the specified commutative and
associative binary operator.
- reduce(Function2<T, T, T>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing each RDD
of this DStream.
- reduce(Function2<T, T, T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing each RDD
of this DStream.
- reduceByKey(Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
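The canonical word-count sketch using the Scala form above (Spark shell, sc predefined); the input words are invented.

    import org.apache.spark.SparkContext._

    val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _)   // merged map-side before the shuffle
    counts.collect()                                         // Array((a,3), (b,2), (c,1)) in some order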
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Create a new DStream by applying reduceByKey
over a sliding window on this
DStream.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by reducing over a sliding window using incremental computation.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, int, Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, Partitioner, Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window on this
DStream.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, int, Function1<Tuple2<K, V>, Object>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, Partitioner, Function1<Tuple2<K, V>, Object>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
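A hedged streaming sketch of the windowed form above (Spark shell, sc predefined); the socket host/port, batch interval and window sizes are illustrative, and window and slide durations must be multiples of the batch interval.

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._   // pair DStream functions

    val ssc = new StreamingContext(sc, Seconds(2))
    val lines = ssc.socketTextStream("localhost", 9999)     // hypothetical source
    val pairs = lines.flatMap(_.split(" ")).map(w => (w, 1))
    // Counts over the last 30 seconds, recomputed every 10 seconds.
    val windowedCounts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
    windowedCounts.print()
    ssc.start()                              // ssc.awaitTermination() would then block the driver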
- reduceByKeyLocally(Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function, but return the results
immediately to the master as a Map.
- reduceByKeyLocally(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function, but return the results
immediately to the master as a Map.
- reduceByKeyToDriver(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for reduceByKeyLocally
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Function2<T, T, T>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceId() - Method in class org.apache.spark.FetchFailed
-
- reduceId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- references() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- registerClasses(Kryo) - Method in interface org.apache.spark.serializer.KryoRegistrator
-
- registerRDDAsTable(JavaSchemaRDD, String) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
Registers the given RDD as a temporary table in the catalog.
- registerRDDAsTable(SchemaRDD, String) - Method in class org.apache.spark.sql.SQLContext
-
Registers the given RDD as a temporary table in the catalog.
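A hedged sketch of registering an RDD of case classes as a temporary table and querying it (Spark shell, sc predefined); the Person case class, the data and the query are illustrative.

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD        // implicit conversion RDD[Person] => SchemaRDD

    val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))
    sqlContext.registerRDDAsTable(people, "people")
    sqlContext.sql("SELECT name FROM people WHERE age > 26").collect()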
- registerTestTable(TestHiveContext.TestTable) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- Regression() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- RegressionModel - Interface in org.apache.spark.mllib.regression
-
- relation() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- relation() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- relation() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- remember(Duration) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets each DStream in this context to remember the RDDs it generated in the last given duration.
- remember(Duration) - Method in class org.apache.spark.streaming.StreamingContext
-
Set each DStream in this context to remember the RDDs it generated in the last given duration.
- rememberDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- remove(String) - Method in class org.apache.spark.SparkConf
-
Remove a parameter from the configuration
- repartition(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int, Ordering<Row>) - Method in class org.apache.spark.sql.SchemaRDD
-
- repartition(int) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartition(int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartition(int) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream with an increased or decreased level of parallelism.
- replication() - Method in class org.apache.spark.storage.StorageLevel
-
- reportError(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Report exceptions in receiving data.
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.Aggregate
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.HashJoin
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.Sort
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.SparkPlan
-
Specifies any partition requirements on the input data for this operator.
- reset() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
Resets the test instance by deleting any tables that have been created.
- restart(String) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- restart(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- restart(String, Throwable, int) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- Resubmitted - Class in org.apache.spark
-
:: DeveloperApi ::
A ShuffleMapTask that completed successfully earlier, but we lost the executor before the stage completed.
- Resubmitted() - Constructor for class org.apache.spark.Resubmitted
-
- result(Duration, CanAwait) - Method in class org.apache.spark.ComplexFutureAction
-
- result(Duration, CanAwait) - Method in interface org.apache.spark.FutureAction
-
Awaits and returns the result (of type T) of this action.
- result(Duration, CanAwait) - Method in class org.apache.spark.SimpleFutureAction
-
- resultAttribute() - Method in class org.apache.spark.sql.execution.Aggregate.ComputedAggregate
-
- resultSetToObjectArray(ResultSet) - Static method in class org.apache.spark.rdd.JdbcRDD
-
- retainedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- RidgeRegressionModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using RidgeRegression.
- RidgeRegressionWithSGD - Class in org.apache.spark.mllib.regression
-
Train a regression model with L2-regularization using Stochastic Gradient Descent.
- RidgeRegressionWithSGD() - Constructor for class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Construct a RidgeRegression object with default parameters: {stepSize: 1.0, numIterations: 100,
regParam: 1.0, miniBatchFraction: 1.0}.
- right() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
The Broadcast relation
- right() - Method in class org.apache.spark.sql.execution.CartesianProduct
-
- right() - Method in class org.apache.spark.sql.execution.HashJoin
-
- right() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
The Broadcast relation
- right() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- rightImpurity() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- rightKeys() - Method in class org.apache.spark.sql.execution.HashJoin
-
- rightKeys() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- rightNode() - Method in class org.apache.spark.mllib.tree.model.Node
-
- rightOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- roc() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the receiver operating characteristic (ROC) curve,
which is an RDD of (false positive rate, true positive rate)
with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- Row - Class in org.apache.spark.sql.api.java
-
A result row from a SparkSQL query.
- Row(Row) - Constructor for class org.apache.spark.sql.api.java.Row
-
- row() - Method in class org.apache.spark.sql.api.java.Row
-
- RowMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents a row-oriented distributed Matrix with no meaningful row indices.
- RowMatrix(RDD<Vector>, long, int) - Constructor for class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- RowMatrix(RDD<Vector>) - Constructor for class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Alternative constructor leaving matrix dimensions to be determined automatically.
- rows() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- rows() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- run(Function0<T>, ExecutionContext) - Method in class org.apache.spark.ComplexFutureAction
-
Executes some action enclosed in the closure.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
- run(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Train a K-means model on the given set of points; data should be cached for high performance, because this is an iterative algorithm.
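A small sketch of the K-means algorithm above via the KMeans.train convenience method (Spark shell, sc predefined); the points, k and iteration count are toy values.

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1))).cache()   // cache: K-means is iterative

    val model = KMeans.train(points, k = 2, maxIterations = 20)
    model.predict(Vectors.dense(8.9, 9.0))                         // cluster index of a new point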
- run(RDD<Rating>) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Run ALS with the configured parameters on an input RDD of (user, product, rating) triples.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Run the algorithm with the configured parameters on an input
RDD of LabeledPoint entries.
- run(RDD<LabeledPoint>, Vector) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Run the algorithm with the configured parameters on an input RDD
of LabeledPoint entries starting from the initial weights provided.
- runApproximateJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, <any>, long) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Run a job that can return approximate results.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, Function2<Object, U, BoxedUnit>, Function0<R>) - Method in class org.apache.spark.ComplexFutureAction
-
Runs a Spark job.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, boolean, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and pass the results to the given
handler function.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, boolean, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and return the results as an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, boolean, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on a given set of partitions of an RDD, but take a function of type Iterator[T] => U instead of (TaskContext, Iterator[T]) => U.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and return the results in an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and return the results in an array.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and pass the results to a handler function.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and pass the results to a handler function.
- runLBFGS(RDD<Tuple2<Object, Vector>>, Gradient, Updater, int, double, int, double, Vector) - Static method in class org.apache.spark.mllib.optimization.LBFGS
-
Run Limited-memory BFGS (L-BFGS) in parallel.
- runMiniBatchSGD(RDD<Tuple2<Object, Vector>>, Gradient, Updater, double, int, double, double, Vector) - Static method in class org.apache.spark.mllib.optimization.GradientDescent
-
Run stochastic gradient descent (SGD) in parallel using mini batches.
- running() - Method in class org.apache.spark.scheduler.TaskInfo
-
- runningLocally() - Method in class org.apache.spark.TaskContext
-
- runSqlHive(String) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- s() - Method in class org.apache.spark.mllib.linalg.SingularValueDecomposition
-
- sample(boolean, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a sampled subset of this RDD.
- sample(boolean, Double, long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.rdd.RDD
-
Return a sampled subset of this RDD.
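A minimal sketch of the Scala form above (Spark shell, sc predefined); fraction and seed are arbitrary.

    val data = sc.parallelize(1 to 100)
    // Roughly a 10% sample without replacement; the fixed seed makes it reproducible.
    val subset = data.sample(withReplacement = false, fraction = 0.1, seed = 7L)
    subset.count()                           // about 10, varies with the seed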
- Sample - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Sample(double, boolean, long, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Sample
-
- sample(boolean, double, long) - Method in class org.apache.spark.sql.SchemaRDD
-
:: Experimental ::
Returns a sampled version of the underlying dataset.
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.BernoulliSampler
-
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.PoissonSampler
-
- sample(Iterator<T>) - Method in interface org.apache.spark.util.random.RandomSampler
-
Take a random sample.
- sampleStdev() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the sample standard deviation of this RDD's elements (which corrects for bias in
estimating the standard deviation by dividing by N-1 instead of N).
- sampleStdev() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the sample standard deviation of this RDD's elements (which corrects for bias in
estimating the standard deviation by dividing by N-1 instead of N).
- sampleStdev() - Method in class org.apache.spark.util.StatCounter
-
Return the sample standard deviation of the values, which corrects for bias in estimating the
variance by dividing by N-1 instead of N.
- sampleVariance() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the sample variance of this RDD's elements (which corrects for bias in
estimating the variance by dividing by N-1 instead of N).
- sampleVariance() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the sample variance of this RDD's elements (which corrects for bias in
estimating the variance by dividing by N-1 instead of N).
- sampleVariance() - Method in class org.apache.spark.util.StatCounter
-
Return the sample variance, which corrects for bias in estimating the variance by dividing
by N-1 instead of N.
- saveAsHadoopDataset(JobConf) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
that storage system.
- saveAsHadoopDataset(JobConf) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
that storage system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>, JobConf) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>, Class<? extends CompressionCodec>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system, compressing with the supplied codec.
- saveAsHadoopFile(String, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<? extends CompressionCodec>, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Class<? extends CompressionCodec>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf, Option<Class<? extends CompressionCodec>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFiles(String, String) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, ClassTag<F>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHiveFile(RDD<Writable>, Class<?>, FileSinkDesc, JobConf, boolean) - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- saveAsLibSVMFile(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Save labeled data in LIBSVM format.
- saveAsNewAPIHadoopDataset(Configuration) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported storage system, using
a Configuration object for that storage system.
- saveAsNewAPIHadoopDataset(Configuration) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported storage system with new Hadoop API, using a Hadoop
Configuration object for that storage system.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<F>, Configuration) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsNewAPIHadoopFile(String, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
- saveAsNewAPIHadoopFiles(String, String) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, ClassTag<F>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsObjectFile(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a SequenceFile of serialized objects.
- saveAsObjectFile(String) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a SequenceFile of serialized objects.
- saveAsObjectFiles(String, String) - Method in class org.apache.spark.streaming.dstream.DStream
-
Save each RDD in this DStream as a Sequence file of serialized objects.
- saveAsSequenceFile(String, Option<Class<? extends CompressionCodec>>) - Method in class org.apache.spark.rdd.SequenceFileRDDFunctions
-
Output the RDD as a Hadoop SequenceFile using the Writable types we infer from the RDD's key
and value types.
- saveAsTextFile(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a text file, using string representations of elements.
- saveAsTextFile(String, Class<? extends CompressionCodec>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a compressed text file, using string representations of elements.
- saveAsTextFile(String) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a text file, using string representations of elements.
- saveAsTextFile(String, Class<? extends CompressionCodec>) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a compressed text file, using string representations of elements.
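For example, assuming an existing SparkContext sc, the plain and compressed variants look like this; the output paths are illustrative:
    import org.apache.hadoop.io.compress.GzipCodec
    val rdd = sc.parallelize(Seq("alpha", "beta", "gamma"))
    rdd.saveAsTextFile("hdfs:///tmp/output-plain")                      // one text file per partition
    rdd.saveAsTextFile("hdfs:///tmp/output-gzip", classOf[GzipCodec])   // gzip-compressed output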
- saveAsTextFiles(String, String) - Method in class org.apache.spark.streaming.dstream.DStream
-
Save each RDD in this DStream as a text file, using string representations
of elements.
- saveLabeledData(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
:: Experimental ::
Save labeled data to a file.
- sc() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- sc() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- sc() - Method in class org.apache.spark.streaming.StreamingContext
-
- scalaIntToJavaLong(DStream<Object>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- scalaToJavaLong(JavaPairDStream<K, Object>, ClassTag<K>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- scheduler() - Method in class org.apache.spark.streaming.StreamingContext
-
- schedulingDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for the first job of this batch to start processing from the time this batch
was submitted to the streaming scheduler.
- SchedulingMode - Class in org.apache.spark.scheduler
-
"FAIR" and "FIFO" determine which policy is used
to order tasks amongst a Schedulable's sub-queues;
"NONE" is used when a Schedulable has no sub-queues.
- SchedulingMode() - Constructor for class org.apache.spark.scheduler.SchedulingMode
-
- schedulingMode() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- SchemaRDD - Class in org.apache.spark.sql
-
:: AlphaComponent ::
An RDD of Row objects that has an associated schema.
- SchemaRDD(SQLContext, LogicalPlan) - Constructor for class org.apache.spark.sql.SchemaRDD
-
- script() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- ScriptTransformation - Class in org.apache.spark.sql.hive.execution
-
:: DeveloperApi ::
Transforms the input by forking and running the specified script.
- ScriptTransformation(Seq<Expression>, String, Seq<Attribute>, SparkPlan, HiveContext) - Constructor for class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- seconds() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- Seconds - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing
a given number of seconds.
- Seconds() - Constructor for class org.apache.spark.streaming.Seconds
-
- securityManager() - Method in class org.apache.spark.SparkEnv
-
- seed() - Method in class org.apache.spark.sql.execution.Sample
-
- select(Seq<Expression>) - Method in class org.apache.spark.sql.SchemaRDD
-
Changes the output of this relation to the given expressions, similar to the SELECT
clause in SQL.
- sequenceFile(String, Class<K>, Class<V>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, Class<K>, Class<V>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get an RDD for a Hadoop SequenceFile.
- sequenceFile(String, Class<K>, Class<V>, int) - Method in class org.apache.spark.SparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, Class<K>, Class<V>) - Method in class org.apache.spark.SparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, int, ClassTag<K>, ClassTag<V>, Function0<WritableConverter<K>>, Function0<WritableConverter<V>>) - Method in class org.apache.spark.SparkContext
-
Version of sequenceFile() for types implicitly convertible to Writables through a
WritableConverter.
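A sketch of reading a SequenceFile of (IntWritable, Text) records, assuming an existing SparkContext sc; the path is illustrative:
    import org.apache.hadoop.io.{IntWritable, Text}
    val pairs = sc.sequenceFile("hdfs:///data/events.seq", classOf[IntWritable], classOf[Text])
      .map { case (k, v) => (k.get, v.toString) }  // copy values out of Hadoop's reused Writables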
- SequenceFileRDDFunctions<K,V> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile,
through an implicit conversion.
- SequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Constructor for class org.apache.spark.rdd.SequenceFileRDDFunctions
-
- SerializableWritable<T extends org.apache.hadoop.io.Writable> - Class in org.apache.spark
-
- SerializableWritable(T) - Constructor for class org.apache.spark.SerializableWritable
-
- SerializationStream - Interface in org.apache.spark.serializer
-
:: DeveloperApi ::
A stream for writing serialized objects.
- serialize(T, ClassTag<T>) - Method in interface org.apache.spark.serializer.SerializerInstance
-
- serializedSize() - Method in class org.apache.spark.scheduler.TaskInfo
-
- serializeFilterExpressions(Seq<Expression>, Configuration) - Static method in class org.apache.spark.sql.parquet.ParquetFilters
-
Note: Inside the Hadoop API we only have access to Configuration, not to
SparkContext, so we cannot use broadcasts to convey the actual filter predicate.
- serializeMany(Iterator<T>, ClassTag<T>) - Method in interface org.apache.spark.serializer.SerializerInstance
-
- Serializer - Interface in org.apache.spark.serializer
-
:: DeveloperApi ::
A serializer.
- serializer() - Method in class org.apache.spark.ShuffleDependency
-
- serializer() - Method in class org.apache.spark.SparkEnv
-
- SerializerInstance - Interface in org.apache.spark.serializer
-
:: DeveloperApi ::
An instance of a serializer, for use by one thread at a time.
- serializeStream(OutputStream) - Method in interface org.apache.spark.serializer.SerializerInstance
-
- set(String, String) - Method in class org.apache.spark.SparkConf
-
Set a configuration variable.
- set(SparkEnv) - Static method in class org.apache.spark.SparkEnv
-
- set(String, String) - Method in class org.apache.spark.sql.hive.HiveContext
-
- set(Properties) - Method in interface org.apache.spark.sql.SQLConf
-
- set(String, String) - Method in interface org.apache.spark.sql.SQLConf
-
- setAll(Traversable<Tuple2<String, String>>) - Method in class org.apache.spark.SparkConf
-
Set multiple parameters together
- setAlpha(double) - Method in class org.apache.spark.mllib.recommendation.ALS
-
:: Experimental ::
Sets the constant used in computing confidence in implicit ALS.
- setAppName(String) - Method in class org.apache.spark.SparkConf
-
Set a name for your application.
- setBlocks(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of blocks to parallelize the computation into; pass -1 for an auto-configured
number of blocks.
- setCallSite(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.setCallSite.
- setCallSite(String) - Method in class org.apache.spark.SparkContext
-
Support function for API backtraces.
- setCheckpointDir(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Set the directory under which RDDs are going to be checkpointed.
- setCheckpointDir(String) - Method in class org.apache.spark.SparkContext
-
Set the directory under which RDDs are going to be checkpointed.
- SetCommand - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- SetCommand(Option<String>, Option<String>, Seq<Attribute>, SQLContext) - Constructor for class org.apache.spark.sql.execution.SetCommand
-
- setConvergenceTol(double) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the convergence tolerance of iterations for L-BFGS.
- setEpsilon(double) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the distance threshold within which we consider centers to have converged.
- setExecutorEnv(String, String) - Method in class org.apache.spark.SparkConf
-
Set an environment variable to be used when launching executors for this application.
- setExecutorEnv(Seq<Tuple2<String, String>>) - Method in class org.apache.spark.SparkConf
-
Set multiple environment variables to be used when launching executors.
- setExecutorEnv(Tuple2<String, String>[]) - Method in class org.apache.spark.SparkConf
-
Set multiple environment variables to be used when launching executors.
- setGradient(Gradient) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the gradient function (of the loss function of one single data example)
to be used for SGD.
- setGradient(Gradient) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the gradient function (of the loss function of one single data example)
to be used for L-BFGS.
- setIfMissing(String, String) - Method in class org.apache.spark.SparkConf
-
Set a parameter if it isn't already configured
- setImplicitPrefs(boolean) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets whether to use implicit preference.
- setInitializationMode(String) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the initialization algorithm.
- setInitializationSteps(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the number of steps for the k-means|| initialization mode.
- setIntercept(boolean) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Set if the algorithm should add an intercept.
- setIterations(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of iterations to run.
- setJars(Seq<String>) - Method in class org.apache.spark.SparkConf
-
Set JAR files to distribute to the cluster.
- setJars(String[]) - Method in class org.apache.spark.SparkConf
-
Set JAR files to distribute to the cluster.
- setJobDescription(String) - Method in class org.apache.spark.SparkContext
-
Set a human readable description of the current job.
- setJobGroup(String, String, boolean) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setJobGroup(String, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setJobGroup(String, String, boolean) - Method in class org.apache.spark.SparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setK(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the number of clusters to create (k).
- setLambda(double) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Set the smoothing parameter.
- setLambda(double) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the regularization parameter, lambda.
- setLocalProperty(String, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
- setLocalProperty(String, String) - Method in class org.apache.spark.SparkContext
-
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
- setMaster(String) - Method in class org.apache.spark.SparkConf
-
The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
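The SparkConf setters above chain, so a typical configuration is built in one expression; the app name, master URL, and property value are illustrative:
    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf()
      .setAppName("IndexExample")
      .setMaster("local[4]")
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)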
- setMaxIterations(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set maximum number of iterations to run.
- setMaxNumIterations(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the maximal number of iterations for L-BFGS.
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
:: Experimental ::
Set fraction of data to be used for each SGD iteration.
- setName(String) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.api.java.JavaRDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.rdd.RDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Assign a name to this RDD
- setNumCorrections(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the number of corrections used in the LBFGS update.
- setNumIterations(int) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the number of iterations for SGD.
- setRank(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the rank of the feature matrices computed (number of features).
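The ALS setters (setRank, setIterations, setLambda, and friends) chain in the same builder style. A hedged sketch, assuming an existing SparkContext sc; the ratings and parameter values are illustrative:
    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    val ratings = sc.parallelize(Seq(
      Rating(1, 10, 4.0), Rating(1, 20, 3.0), Rating(2, 10, 5.0), Rating(2, 30, 1.0)))
    val model = new ALS()
      .setRank(10)
      .setIterations(20)
      .setLambda(0.01)
      .run(ratings)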
- setRegParam(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the regularization parameter.
- setRegParam(double) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the regularization parameter.
- setRuns(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
:: Experimental ::
Set the number of runs of the algorithm to execute in parallel.
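Likewise for KMeans: the setters above configure an instance that is then run on an RDD of vectors. A sketch assuming an existing SparkContext sc; the points and parameters are illustrative:
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0), Vectors.dense(9.0, 9.0)))
    val model = new KMeans()
      .setK(2)
      .setMaxIterations(20)
      .run(points)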
- setSeed(long) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets a random seed to have deterministic results.
- setSeed(long) - Method in class org.apache.spark.util.random.BernoulliSampler
-
- setSeed(long) - Method in class org.apache.spark.util.random.PoissonSampler
-
- setSeed(long) - Method in interface org.apache.spark.util.random.Pseudorandom
-
Set random seed.
- setSerializer(Serializer) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- setSerializer(Serializer) - Method in class org.apache.spark.rdd.ShuffledRDD
-
- setSparkHome(String) - Method in class org.apache.spark.SparkConf
-
Set the location where Spark is installed on worker nodes.
- setStepSize(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the initial step size of SGD for the first step.
- setThreshold(double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Sets the threshold that separates positive predictions from negative predictions.
- setThreshold(double) - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Sets the threshold that separates positive predictions from negative predictions.
- settings() - Method in interface org.apache.spark.sql.SQLConf
-
- setUpdater(Updater) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the updater function to actually perform a gradient step in a given direction.
- setUpdater(Updater) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the updater function to actually perform a gradient step in a given direction.
- setValidateData(boolean) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Set if the algorithm should validate data before training.
- setValue(R) - Method in class org.apache.spark.Accumulable
-
Set the accumulator's value; only allowed on master
- showBytesDistribution(String, Function2<TaskInfo, TaskMetrics, Option<Object>>, Seq<Tuple2<TaskInfo, TaskMetrics>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showBytesDistribution(String, Option<org.apache.spark.util.Distribution>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showBytesDistribution(String, org.apache.spark.util.Distribution) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, org.apache.spark.util.Distribution, Function1<Object, String>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, Option<org.apache.spark.util.Distribution>, Function1<Object, String>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, Option<org.apache.spark.util.Distribution>, String) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, String, Function2<TaskInfo, TaskMetrics, Option<Object>>, Seq<Tuple2<TaskInfo, TaskMetrics>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showMillisDistribution(String, Option<org.apache.spark.util.Distribution>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showMillisDistribution(String, Function2<TaskInfo, TaskMetrics, Option<Object>>, Seq<Tuple2<TaskInfo, TaskMetrics>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showMillisDistribution(String, Function1<BatchInfo, Option<Object>>) - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- SHUFFLE() - Static method in class org.apache.spark.storage.BlockId
-
- ShuffleBlockId - Class in org.apache.spark.storage
-
- ShuffleBlockId(int, int, int) - Constructor for class org.apache.spark.storage.ShuffleBlockId
-
- ShuffleDependency<K,V> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a dependency on the output of a shuffle stage.
- ShuffleDependency(RDD<? extends Product2<K, V>>, Partitioner, Serializer) - Constructor for class org.apache.spark.ShuffleDependency
-
- ShuffledRDD<K,V,P extends scala.Product2<K,V>> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
The resulting RDD from a shuffle (e.g.
- ShuffledRDD(RDD<P>, Partitioner, ClassTag<P>) - Constructor for class org.apache.spark.rdd.ShuffledRDD
-
- shuffleFetcher() - Method in class org.apache.spark.SparkEnv
-
- shuffleId() - Method in class org.apache.spark.FetchFailed
-
- shuffleId() - Method in class org.apache.spark.ShuffleDependency
-
- shuffleId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- shuffleMemoryMap() - Method in class org.apache.spark.SparkEnv
-
- shuffleRead() - Method in class org.apache.spark.ui.jobs.ExecutorSummary
-
- shuffleWrite() - Method in class org.apache.spark.ui.jobs.ExecutorSummary
-
- sideEffectResult() - Method in interface org.apache.spark.sql.execution.Command
-
A concrete command should override this lazy field to wrap up any side effects caused by the
command or any other computation that should be evaluated exactly once.
- SimpleFutureAction<T> - Class in org.apache.spark
-
:: Experimental ::
A FutureAction holding the result of an action that triggers a single job.
- SimpleUpdater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
A simple updater for gradient descent *without* any regularization.
- SimpleUpdater() - Constructor for class org.apache.spark.mllib.optimization.SimpleUpdater
-
- SingularValueDecomposition<UType,VType> - Class in org.apache.spark.mllib.linalg
-
:: Experimental ::
Represents singular value decomposition (SVD) factors.
- SingularValueDecomposition(UType, Vector, VType) - Constructor for class org.apache.spark.mllib.linalg.SingularValueDecomposition
-
- size() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- size() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- size() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Size of the vector.
- slice(Time, Time) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return all the RDDs between 'fromDuration' and 'toDuration' (both included).
- slice(org.apache.spark.streaming.Interval) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return all the RDDs defined by the Interval object (both end times included)
- slice(Time, Time) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return all the RDDs between 'fromTime' and 'toTime' (both included).
- slideDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
Time interval after which the DStream generates an RDD.
- slideDuration() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- SnappyCompressionCodec - Class in org.apache.spark.io
-
- SnappyCompressionCodec(SparkConf) - Constructor for class org.apache.spark.io.SnappyCompressionCodec
-
- socketStream(String, int, Function<InputStream, Iterable<T>>, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port.
- socketStream(String, int, Function1<InputStream, Iterator<T>>, StorageLevel, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from a TCP source hostname:port.
- socketTextStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port.
- socketTextStream(String, int) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port.
- socketTextStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from a TCP source hostname:port.
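A minimal streaming sketch built on socketTextStream, assuming an existing StreamingContext ssc; the host, port, and storage level are illustrative:
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits (pre-1.3)
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()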
- Sort() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- Sort - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Sort(Seq<SortOrder>, boolean, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Sort
-
- sortByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements in
ascending order.
- sortByKey(boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(Comparator<K>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(Comparator<K>, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(Comparator<K>, boolean, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(boolean, int) - Method in class org.apache.spark.rdd.OrderedRDDFunctions
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
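sortByKey is available on pair RDDs through an implicit conversion; a small sketch assuming an existing SparkContext sc:
    import org.apache.spark.SparkContext._  // ordered/pair RDD implicits (pre-1.3)
    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    val ascending  = pairs.sortByKey()           // ascending order by default
    val descending = pairs.sortByKey(false, 2)   // descending, written into 2 partitions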
- sortOrder() - Method in class org.apache.spark.sql.execution.Sort
-
- sortOrder() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- SPARK_JOB_DESCRIPTION() - Static method in class org.apache.spark.SparkContext
-
- SPARK_JOB_GROUP_ID() - Static method in class org.apache.spark.SparkContext
-
- SPARK_JOB_INTERRUPT_ON_CANCEL() - Static method in class org.apache.spark.SparkContext
-
- SPARK_UNKNOWN_USER() - Static method in class org.apache.spark.SparkContext
-
- SPARK_VERSION() - Static method in class org.apache.spark.SparkContext
-
- SparkConf - Class in org.apache.spark
-
Configuration for a Spark application.
- SparkConf(boolean) - Constructor for class org.apache.spark.SparkConf
-
- SparkConf() - Constructor for class org.apache.spark.SparkConf
-
Create a SparkConf that loads defaults from system properties and the classpath
- sparkContext() - Method in class org.apache.spark.rdd.RDD
-
The SparkContext that created this RDD.
- SparkContext - Class in org.apache.spark
-
Main entry point for Spark functionality.
- SparkContext(SparkConf) - Constructor for class org.apache.spark.SparkContext
-
- SparkContext() - Constructor for class org.apache.spark.SparkContext
-
Create a SparkContext that loads settings from system properties (for instance, when
launching with ./bin/spark-submit).
- SparkContext(SparkConf, Map<String, Set<SplitInfo>>) - Constructor for class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Alternative constructor for setting preferred locations where Spark will create executors.
- SparkContext(String, String, SparkConf) - Constructor for class org.apache.spark.SparkContext
-
Alternative constructor that allows setting common Spark properties directly
- SparkContext(String, String, String, Seq<String>, Map<String, String>, Map<String, Set<SplitInfo>>) - Constructor for class org.apache.spark.SparkContext
-
Alternative constructor that allows setting common Spark properties directly
- sparkContext() - Method in class org.apache.spark.sql.SQLContext
-
- sparkContext() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
The underlying SparkContext
- sparkContext() - Method in class org.apache.spark.streaming.StreamingContext
-
Return the associated Spark context
- SparkContext.DoubleAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.DoubleAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
- SparkContext.FloatAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.FloatAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
- SparkContext.IntAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.IntAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.IntAccumulatorParam$
-
- SparkContext.LongAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.LongAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.LongAccumulatorParam$
-
- SparkEnv - Class in org.apache.spark
-
:: DeveloperApi ::
Holds all the runtime environment objects for a running Spark instance (either master or worker),
including the serializer, Akka actor system, block manager, map output tracker, etc.
- SparkEnv(String, ActorSystem, Serializer, Serializer, CacheManager, MapOutputTracker, ShuffleFetcher, org.apache.spark.broadcast.BroadcastManager, org.apache.spark.storage.BlockManager, ConnectionManager, SecurityManager, HttpFileServer, String, org.apache.spark.metrics.MetricsSystem, SparkConf) - Constructor for class org.apache.spark.SparkEnv
-
- SparkException - Exception in org.apache.spark
-
- SparkException(String, Throwable) - Constructor for exception org.apache.spark.SparkException
-
- SparkException(String) - Constructor for exception org.apache.spark.SparkException
-
- SparkFiles - Class in org.apache.spark
-
Resolves paths to files added through SparkContext.addFile().
- SparkFiles() - Constructor for class org.apache.spark.SparkFiles
-
- sparkFilesDir() - Method in class org.apache.spark.SparkEnv
-
- SparkFlumeEvent - Class in org.apache.spark.streaming.flume
-
A wrapper class for AvroFlumeEvents with a custom serialization format.
- SparkFlumeEvent() - Constructor for class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- SparkListener - Interface in org.apache.spark.scheduler
-
:: DeveloperApi ::
Interface for listening to events from the Spark scheduler.
- SparkListenerApplicationEnd - Class in org.apache.spark.scheduler
-
- SparkListenerApplicationEnd(long) - Constructor for class org.apache.spark.scheduler.SparkListenerApplicationEnd
-
- SparkListenerApplicationStart - Class in org.apache.spark.scheduler
-
- SparkListenerApplicationStart(String, long, String) - Constructor for class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- SparkListenerBlockManagerAdded - Class in org.apache.spark.scheduler
-
- SparkListenerBlockManagerAdded(BlockManagerId, long) - Constructor for class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
-
- SparkListenerBlockManagerRemoved - Class in org.apache.spark.scheduler
-
- SparkListenerBlockManagerRemoved(BlockManagerId) - Constructor for class org.apache.spark.scheduler.SparkListenerBlockManagerRemoved
-
- SparkListenerEnvironmentUpdate - Class in org.apache.spark.scheduler
-
- SparkListenerEnvironmentUpdate(Map<String, Seq<Tuple2<String, String>>>) - Constructor for class org.apache.spark.scheduler.SparkListenerEnvironmentUpdate
-
- SparkListenerEvent - Interface in org.apache.spark.scheduler
-
- SparkListenerJobEnd - Class in org.apache.spark.scheduler
-
- SparkListenerJobEnd(int, JobResult) - Constructor for class org.apache.spark.scheduler.SparkListenerJobEnd
-
- SparkListenerJobStart - Class in org.apache.spark.scheduler
-
- SparkListenerJobStart(int, Seq<Object>, Properties) - Constructor for class org.apache.spark.scheduler.SparkListenerJobStart
-
- SparkListenerStageCompleted - Class in org.apache.spark.scheduler
-
- SparkListenerStageCompleted(StageInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerStageCompleted
-
- SparkListenerStageSubmitted - Class in org.apache.spark.scheduler
-
- SparkListenerStageSubmitted(StageInfo, Properties) - Constructor for class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- SparkListenerTaskEnd - Class in org.apache.spark.scheduler
-
- SparkListenerTaskEnd(int, String, TaskEndReason, TaskInfo, TaskMetrics) - Constructor for class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- SparkListenerTaskGettingResult - Class in org.apache.spark.scheduler
-
- SparkListenerTaskGettingResult(TaskInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerTaskGettingResult
-
- SparkListenerTaskStart - Class in org.apache.spark.scheduler
-
- SparkListenerTaskStart(int, TaskInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerTaskStart
-
- SparkListenerUnpersistRDD - Class in org.apache.spark.scheduler
-
- SparkListenerUnpersistRDD(int) - Constructor for class org.apache.spark.scheduler.SparkListenerUnpersistRDD
-
- SparkLogicalPlan - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Allows already planned SparkQueries to be linked into logical query plans.
- SparkLogicalPlan(SparkPlan) - Constructor for class org.apache.spark.sql.execution.SparkLogicalPlan
-
- SparkPlan - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- SparkPlan() - Constructor for class org.apache.spark.sql.execution.SparkPlan
-
- sparkProperties() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- sparkUser() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- sparkUser() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- sparkUser() - Method in class org.apache.spark.SparkContext
-
- sparse(int, int[], double[]) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a sparse vector providing its index array and value array.
- sparse(int, Seq<Tuple2<Object, Object>>) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a sparse vector using unordered (index, value) pairs.
- sparse(int, Iterable<Tuple2<Integer, Double>>) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a sparse vector using unordered (index, value) pairs in a Java friendly way.
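The Vectors.sparse factories above, together with Vectors.dense, cover the common construction patterns; the values here are illustrative:
    import org.apache.spark.mllib.linalg.Vectors
    val sv1 = Vectors.sparse(5, Array(1, 3), Array(2.0, 4.0))   // index array + value array
    val sv2 = Vectors.sparse(5, Seq((1, 2.0), (3, 4.0)))        // unordered (index, value) pairs
    val dv  = Vectors.dense(0.0, 2.0, 0.0, 4.0, 0.0)            // dense equivalent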
- SparseVector - Class in org.apache.spark.mllib.linalg
-
A sparse vector represented by an index array and a value array.
- SparseVector(int, int[], double[]) - Constructor for class org.apache.spark.mllib.linalg.SparseVector
-
- split() - Method in class org.apache.spark.mllib.tree.model.Node
-
- Split - Class in org.apache.spark.mllib.tree.model
-
:: DeveloperApi ::
Split applied to a feature
- Split(int, double, Enumeration.Value, List<Object>) - Constructor for class org.apache.spark.mllib.tree.model.Split
-
- splitId() - Method in class org.apache.spark.TaskContext
-
- splitIndex() - Method in class org.apache.spark.storage.RDDBlockId
-
- SplitInfo - Class in org.apache.spark.scheduler
-
- SplitInfo(Class<?>, String, String, long, Object) - Constructor for class org.apache.spark.scheduler.SplitInfo
-
- splits() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Set of partitions in this RDD.
- sql(String) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
Executes a query expressed in SQL, returning the result as a JavaSchemaRDD
- sql() - Method in class org.apache.spark.sql.hive.execution.NativeCommand
-
- sql(String) - Method in class org.apache.spark.sql.SQLContext
-
Executes a SQL query using Spark, returning the result as a SchemaRDD.
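A hedged sketch of issuing a query through SQLContext, assuming an existing SparkContext sc and that a table named people has already been registered against this context; the query itself is illustrative:
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.collect().foreach(println)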
- SQLConf - Interface in org.apache.spark.sql
-
SQLConf holds mutable config parameters and hints.
- sqlContext() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- sqlContext() - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
- sqlContext() - Method in class org.apache.spark.sql.hive.api.java.JavaHiveContext
-
- sqlContext() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- sqlContext() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- sqlContext() - Method in class org.apache.spark.sql.SchemaRDD
-
- SQLContext - Class in org.apache.spark.sql
-
:: AlphaComponent ::
The entry point for running relational queries using Spark.
- SQLContext(SparkContext) - Constructor for class org.apache.spark.sql.SQLContext
-
- squaredDist(Vector) - Method in class org.apache.spark.util.Vector
-
- SquaredL2Updater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Updater for L2 regularized problems.
- SquaredL2Updater() - Constructor for class org.apache.spark.mllib.optimization.SquaredL2Updater
-
- srdd() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- ssc() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- ssc() - Method in class org.apache.spark.streaming.dstream.DStream
-
- stackTrace() - Method in class org.apache.spark.ExceptionFailure
-
- stageFailed(String) - Method in class org.apache.spark.scheduler.StageInfo
-
- stageId() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- stageId() - Method in class org.apache.spark.scheduler.SparkListenerTaskStart
-
- stageId() - Method in class org.apache.spark.scheduler.StageInfo
-
- stageId() - Method in class org.apache.spark.TaskContext
-
- stageIds() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- stageIdToDescription() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToDiskBytesSpilled() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToExecutorSummaries() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToMemoryBytesSpilled() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToPool() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToShuffleRead() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToShuffleWrite() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToTaskData() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToTasksActive() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToTasksComplete() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToTasksFailed() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToTime() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageInfo() - Method in class org.apache.spark.scheduler.SparkListenerStageCompleted
-
- stageInfo() - Method in class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- StageInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Stores information about a stage to pass from the scheduler to SparkListeners.
- StageInfo(int, String, int, Seq<RDDInfo>) - Constructor for class org.apache.spark.scheduler.StageInfo
-
- start() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Start the execution of the streams.
- start() - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- start() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
Method called to start receiving data.
- start() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- start() - Method in class org.apache.spark.streaming.StreamingContext
-
Start the execution of the streams.
- startTime() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- startTime() - Method in class org.apache.spark.SparkContext
-
- StatCounter - Class in org.apache.spark.util
-
A class for tracking the statistics of a set of numbers (count, mean and variance) in a
numerically robust way.
- StatCounter(TraversableOnce<Object>) - Constructor for class org.apache.spark.util.StatCounter
-
- StatCounter() - Constructor for class org.apache.spark.util.StatCounter
-
Initialize the StatCounter with no values.
- state() - Method in class org.apache.spark.streaming.StreamingContext
-
- Statistics - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
Statistics for querying the supervisor about state of workers.
- Statistics(int, int, int, String) - Constructor for class org.apache.spark.streaming.receiver.Statistics
-
- stats() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a StatCounter object that captures the mean, variance and
count of the RDD's elements in one operation.
- stats() - Method in class org.apache.spark.mllib.tree.model.Node
-
- stats() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Return a StatCounter object that captures the mean, variance and
count of the RDD's elements in one operation.
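For example, assuming an existing SparkContext sc, stats() computes all of the StatCounter fields in a single pass over the data:
    import org.apache.spark.SparkContext._  // implicit conversion to DoubleRDDFunctions (pre-1.3)
    val nums = sc.parallelize(Array(1.0, 2.0, 3.0, 4.0))
    val s = nums.stats()
    println(s.mean)         // 2.5
    println(s.sampleStdev)  // standard deviation with the N-1 correction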
- StatsReportListener - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Simple SparkListener that logs a few summary statistics when each stage completes
- StatsReportListener() - Constructor for class org.apache.spark.scheduler.StatsReportListener
-
- StatsReportListener - Class in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
A simple StreamingListener that logs summary statistics across Spark Streaming batches
- StatsReportListener(int) - Constructor for class org.apache.spark.streaming.scheduler.StatsReportListener
-
- status() - Method in class org.apache.spark.scheduler.TaskInfo
-
- stdev() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the standard deviation of this RDD's elements.
- stdev() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the standard deviation of this RDD's elements.
- stdev() - Method in class org.apache.spark.util.StatCounter
-
Return the standard deviation of the values.
- stop() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Shut down the SparkContext.
- stop() - Method in interface org.apache.spark.broadcast.BroadcastFactory
-
- stop() - Method in class org.apache.spark.broadcast.HttpBroadcastFactory
-
- stop() - Method in class org.apache.spark.broadcast.TorrentBroadcastFactory
-
- stop() - Method in class org.apache.spark.SparkContext
-
Shut down the SparkContext.
- stop() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Stop the execution of the streams.
- stop(boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Stop the execution of the streams.
- stop(boolean, boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Stop the execution of the streams.
- stop() - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- stop() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
Method called to stop receiving data.
- stop() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- stop(String) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Stop the receiver completely.
- stop(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Stop the receiver completely due to an exception
- stop(boolean) - Method in class org.apache.spark.streaming.StreamingContext
-
Stop the execution of the streams immediately (does not wait for all received data
to be processed).
- stop(boolean, boolean) - Method in class org.apache.spark.streaming.StreamingContext
-
Stop the execution of the streams, with option of ensuring all received data
has been processed.
- storageLevel() - Method in class org.apache.spark.storage.BlockStatus
-
- storageLevel() - Method in class org.apache.spark.storage.RDDInfo
-
- StorageLevel - Class in org.apache.spark.storage
-
:: DeveloperApi ::
Flags for controlling the storage of an RDD.
- StorageLevel() - Constructor for class org.apache.spark.storage.StorageLevel
-
- storageLevel() - Method in class org.apache.spark.streaming.dstream.DStream
-
- storageLevel() - Method in class org.apache.spark.streaming.receiver.Receiver
-
- storageLevelCache() - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- StorageLevels - Class in org.apache.spark.api.java
-
Expose some commonly useful storage level constants.
- StorageLevels() - Constructor for class org.apache.spark.api.java.StorageLevels
-
- StorageListener - Class in org.apache.spark.ui.storage
-
:: DeveloperApi ::
A SparkListener that prepares information to be displayed on the BlockManagerUI.
- StorageListener(StorageStatusListener) - Constructor for class org.apache.spark.ui.storage.StorageListener
-
- StorageStatus - Class in org.apache.spark.storage
-
:: DeveloperApi ::
Storage information for each BlockManager.
- StorageStatus(BlockManagerId, long, Map<BlockId, BlockStatus>) - Constructor for class org.apache.spark.storage.StorageStatus
-
- storageStatusList() - Method in class org.apache.spark.storage.StorageStatusListener
-
- storageStatusList() - Method in class org.apache.spark.ui.exec.ExecutorsListener
-
- storageStatusList() - Method in class org.apache.spark.ui.storage.StorageListener
-
- StorageStatusListener - Class in org.apache.spark.storage
-
:: DeveloperApi ::
A SparkListener that maintains executor storage status.
- StorageStatusListener() - Constructor for class org.apache.spark.storage.StorageStatusListener
-
- store(Iterator<T>) - Method in interface org.apache.spark.streaming.receiver.ActorHelper
-
Store an iterator of received data as a data block into Spark's memory.
- store(ByteBuffer) - Method in interface org.apache.spark.streaming.receiver.ActorHelper
-
Store the bytes of received data as a data block into Spark's memory.
- store(T) - Method in interface org.apache.spark.streaming.receiver.ActorHelper
-
Store a single item of received data to Spark's memory.
- store(T) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store a single item of received data to Spark's memory.
- store(ArrayBuffer<T>) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an ArrayBuffer of received data as a data block into Spark's memory.
- store(ArrayBuffer<T>, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an ArrayBuffer of received data as a data block into Spark's memory.
- store(Iterator<T>) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(Iterator<T>, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(Iterator<T>) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(Iterator<T>, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(ByteBuffer) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store the bytes of received data as a data block into Spark's memory.
- store(ByteBuffer, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store the bytes of received data as a data block into Spark's memory.
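A minimal custom-receiver sketch tying the store variants together; the class name and the data it pushes are purely illustrative, and a real receiver would read from an external source on the background thread:
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class DemoReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
      def onStart(): Unit = {
        // onStart must not block, so push data from a separate thread.
        new Thread("demo-receiver") {
          override def run(): Unit = Seq("a", "b", "c").foreach(s => store(s))
        }.start()
      }
      def onStop(): Unit = { }  // nothing to clean up in this sketch
    }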
- Strategy - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Stores all the configuration options for tree construction
- Strategy(Enumeration.Value, Impurity, int, int, Enumeration.Value, Map<Object, Object>, int) - Constructor for class org.apache.spark.mllib.tree.configuration.Strategy
-
- STREAM() - Static method in class org.apache.spark.storage.BlockId
-
- StreamBlockId - Class in org.apache.spark.storage
-
- StreamBlockId(int, long) - Constructor for class org.apache.spark.storage.StreamBlockId
-
- streamed() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- streamed() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- streamedKeys() - Method in class org.apache.spark.sql.execution.HashJoin
-
- streamedKeys() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- streamedPlan() - Method in class org.apache.spark.sql.execution.HashJoin
-
- streamedPlan() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- streamId() - Method in class org.apache.spark.storage.StreamBlockId
-
- streamId() - Method in class org.apache.spark.streaming.receiver.Receiver
-
Get the unique identifier of the receiver input stream that this
receiver is associated with.
- streamId() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- StreamingContext - Class in org.apache.spark.streaming
-
Main entry point for Spark Streaming functionality.
- StreamingContext(SparkContext, Duration) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Create a StreamingContext using an existing SparkContext.
- StreamingContext(SparkConf, Duration) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Create a StreamingContext by providing the configuration necessary for a new SparkContext.
- StreamingContext(String, String, Duration, String, Seq<String>, Map<String, String>) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Create a StreamingContext by providing the details necessary for creating a new SparkContext.
- StreamingContext(String, Configuration) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Recreate a StreamingContext from a checkpoint file.
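Putting the constructors together, a typical setup looks like the following; the app name, master, and batch interval are illustrative:
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    val conf = new SparkConf().setAppName("StreamingExample").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // ... define input DStreams and transformations here ...
    ssc.start()
    ssc.awaitTermination()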
- StreamingContextState() - Method in class org.apache.spark.streaming.StreamingContext
-
Accessor for nested Scala object
- StreamingListener - Interface in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
A listener interface for receiving information about an ongoing streaming
computation.
- StreamingListenerBatchCompleted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerBatchCompleted(BatchInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted
-
- StreamingListenerBatchStarted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerBatchStarted(BatchInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerBatchStarted
-
- StreamingListenerBatchSubmitted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerBatchSubmitted(BatchInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerBatchSubmitted
-
- StreamingListenerEvent - Interface in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
Base trait for events related to StreamingListener
- StreamingListenerReceiverError - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerReceiverError(ReceiverInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerReceiverError
-
- StreamingListenerReceiverStarted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerReceiverStarted(ReceiverInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted
-
- StreamingListenerReceiverStopped - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerReceiverStopped(ReceiverInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped
-
- streamSideKeyGenerator() - Method in class org.apache.spark.sql.execution.HashJoin
-
- streamSideKeyGenerator() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- stringToText(String) - Static method in class org.apache.spark.SparkContext
-
- stringWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- submissionTime() - Method in class org.apache.spark.scheduler.StageInfo
-
When this stage was submitted from the DAGScheduler to a TaskScheduler.
- submissionTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- submitJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, Function2<Object, U, BoxedUnit>, Function0<R>) - Method in class org.apache.spark.SparkContext
-
:: Experimental ::
Submit a job for execution and return a FutureJob holding the result.
- subtract(JavaDoubleRDD) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaDoubleRDD, int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaDoubleRDD, Partitioner) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaPairRDD<K, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaPairRDD<K, V>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaPairRDD<K, V>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaRDD<T>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaRDD<T>, int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaRDD<T>, Partitioner) - Method in class org.apache.spark.api.java.JavaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(RDD<T>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD with the elements from `this` that are not in `other`.
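For example, a small sketch of subtract on plain RDDs (the SparkContext `sc` and the sample data are assumed):

```scala
// Assuming an existing SparkContext `sc`:
val a = sc.parallelize(Seq(1, 2, 3, 4, 5))
val b = sc.parallelize(Seq(3, 4, 5, 6))

// Elements of `a` that do not appear in `b`.
val diff = a.subtract(b)
diff.collect()  // e.g. Array(1, 2)
```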
- subtract(RDD<T>, int) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(RDD<T>, Partitioner, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaSchemaRDD) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaSchemaRDD, int) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaSchemaRDD, Partitioner) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(RDD<Row>) - Method in class org.apache.spark.sql.SchemaRDD
-
- subtract(RDD<Row>, int) - Method in class org.apache.spark.sql.SchemaRDD
-
- subtract(RDD<Row>, Partitioner, Ordering<Row>) - Method in class org.apache.spark.sql.SchemaRDD
-
- subtract(Vector) - Method in class org.apache.spark.util.Vector
-
- subtractByKey(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(RDD<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(RDD<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(RDD<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
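A similar sketch for subtractByKey on pair RDDs (again assuming an existing SparkContext `sc`; the import supplies the implicit pair-RDD functions in the 1.x API):

```scala
import org.apache.spark.SparkContext._  // implicit PairRDDFunctions in 1.x

val left  = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
val right = sc.parallelize(Seq(("b", 99)))

// Pairs from `left` whose keys are not present in `right`.
val result = left.subtractByKey(right)
result.collect()  // e.g. Array(("a", 1), ("c", 3))
```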
- succeededTasks() - Method in class org.apache.spark.ui.jobs.ExecutorSummary
-
- Success - Class in org.apache.spark
-
:: DeveloperApi ::
Task succeeded.
- Success() - Constructor for class org.apache.spark.Success
-
- successful() - Method in class org.apache.spark.scheduler.TaskInfo
-
- sum() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Add up the elements in this RDD.
- sum() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Add up the elements in this RDD.
- sum() - Method in class org.apache.spark.util.StatCounter
-
- sum() - Method in class org.apache.spark.util.Vector
-
- sumApprox(long, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the sum within a timeout.
- sumApprox(long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the sum within a timeout.
- sumApprox(long, double) - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
:: Experimental ::
Approximate operation to return the sum within a timeout.
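A brief sketch combining sum and sumApprox (the timeout and confidence values are illustrative, and the partial result is only inspected through its initial value):

```scala
import org.apache.spark.SparkContext._  // implicit DoubleRDDFunctions in 1.x

// Assuming an existing SparkContext `sc`:
val values = sc.parallelize(1 to 1000000).map(_.toDouble)

val exact = values.sum()               // exact sum of all elements

// Approximate sum: wait at most 1000 ms with 95% confidence (illustrative values).
val approx = values.sumApprox(1000L, 0.95)
println(approx.initialValue)           // BoundedDouble with mean and bounds
```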
- SVMDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate sample data used for SVM.
- SVMDataGenerator() - Constructor for class org.apache.spark.mllib.util.SVMDataGenerator
-
- SVMModel - Class in org.apache.spark.mllib.classification
-
Model for Support Vector Machines (SVMs).
- SVMWithSGD - Class in org.apache.spark.mllib.classification
-
Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
- SVMWithSGD() - Constructor for class org.apache.spark.mllib.classification.SVMWithSGD
-
Construct an SVM object with default parameters.
- systemProperties() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- t() - Method in class org.apache.spark.SerializableWritable
-
- table() - Method in class org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
-
- table() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- table(String) - Method in class org.apache.spark.sql.SQLContext
-
Returns the specified table as a SchemaRDD
- tableName() - Method in class org.apache.spark.sql.execution.CacheCommand
-
- tachyonFolderName() - Method in class org.apache.spark.SparkContext
-
- tachyonSize() - Method in class org.apache.spark.storage.BlockStatus
-
- tachyonSize() - Method in class org.apache.spark.storage.RDDInfo
-
- take(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Take the first num elements of the RDD.
- take(int) - Method in class org.apache.spark.rdd.RDD
-
Take the first num elements of the RDD.
- take(int) - Method in class org.apache.spark.sql.SchemaRDD
-
- takeAsync(int) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for retrieving the first num elements of the RDD.
- takeOrdered(int, Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the first K elements from this RDD as defined by
the specified Comparator[T] and maintains the order.
- takeOrdered(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the first K elements from this RDD using the
natural ordering for T while maintaining the order.
- takeOrdered(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the first K (smallest) elements from this RDD as defined by the specified
implicit Ordering[T] and maintains the ordering.
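A short sketch contrasting takeOrdered and top (sample data assumed):

```scala
// Assuming an existing SparkContext `sc`:
val nums = sc.parallelize(Seq(10, 4, 2, 12, 3))

nums.takeOrdered(3)  // smallest elements first: Array(2, 3, 4)
nums.top(3)          // largest elements first:  Array(12, 10, 4)
```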
- TakeOrdered - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Take the first limit elements as defined by the sortOrder.
- TakeOrdered(int, Seq<SortOrder>, SparkPlan, SQLContext) - Constructor for class org.apache.spark.sql.execution.TakeOrdered
-
- takeSample(boolean, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- takeSample(boolean, int, long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- takeSample(boolean, int, long) - Method in class org.apache.spark.rdd.RDD
-
- TaskContext - Class in org.apache.spark
-
:: DeveloperApi ::
Contextual information about a task which can be read or mutated during execution.
- TaskContext(int, int, long, boolean, TaskMetrics) - Constructor for class org.apache.spark.TaskContext
-
- TaskEndReason - Interface in org.apache.spark
-
:: DeveloperApi ::
Various possible reasons why a task ended.
- TaskFailedReason - Interface in org.apache.spark
-
:: DeveloperApi ::
Various possible reasons why a task failed.
- taskId() - Method in class org.apache.spark.scheduler.TaskInfo
-
- taskId() - Method in class org.apache.spark.storage.TaskResultBlockId
-
- taskInfo() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- taskInfo() - Method in class org.apache.spark.scheduler.SparkListenerTaskGettingResult
-
- taskInfo() - Method in class org.apache.spark.scheduler.SparkListenerTaskStart
-
- TaskInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Information about a running task attempt inside a TaskSet.
- TaskInfo(long, int, long, String, String, Enumeration.Value) - Constructor for class org.apache.spark.scheduler.TaskInfo
-
- taskInfo() - Method in class org.apache.spark.ui.jobs.TaskUIData
-
- TaskKilled - Class in org.apache.spark
-
:: DeveloperApi ::
Task was killed intentionally and needs to be rescheduled.
- TaskKilled() - Constructor for class org.apache.spark.TaskKilled
-
- TaskKilledException - Exception in org.apache.spark
-
:: DeveloperApi ::
Exception thrown when a task is explicitly killed (i.e., task failure is expected).
- TaskKilledException() - Constructor for exception org.apache.spark.TaskKilledException
-
- taskLocality() - Method in class org.apache.spark.scheduler.TaskInfo
-
- TaskLocality - Class in org.apache.spark.scheduler
-
- TaskLocality() - Constructor for class org.apache.spark.scheduler.TaskLocality
-
- taskMetrics() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- taskMetrics() - Method in class org.apache.spark.TaskContext
-
- taskMetrics() - Method in class org.apache.spark.ui.jobs.TaskUIData
-
- TASKRESULT() - Static method in class org.apache.spark.storage.BlockId
-
- TaskResultBlockId - Class in org.apache.spark.storage
-
- TaskResultBlockId(long) - Constructor for class org.apache.spark.storage.TaskResultBlockId
-
- TaskResultLost - Class in org.apache.spark
-
:: DeveloperApi ::
The task finished successfully, but the result was lost from the executor's block manager before
it was fetched.
- TaskResultLost() - Constructor for class org.apache.spark.TaskResultLost
-
- taskScheduler() - Method in class org.apache.spark.SparkContext
-
- taskTime() - Method in class org.apache.spark.ui.jobs.ExecutorSummary
-
- taskType() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- TaskUIData - Class in org.apache.spark.ui.jobs
-
- TaskUIData(TaskInfo, Option<TaskMetrics>, Option<String>) - Constructor for class org.apache.spark.ui.jobs.TaskUIData
-
- TEST() - Static method in class org.apache.spark.storage.BlockId
-
- TestHive - Class in org.apache.spark.sql.hive.test
-
- TestHive() - Constructor for class org.apache.spark.sql.hive.test.TestHive
-
- TestHiveContext - Class in org.apache.spark.sql.hive.test
-
A locally running test instance of Spark's Hive execution engine.
- TestHiveContext(SparkContext) - Constructor for class org.apache.spark.sql.hive.test.TestHiveContext
-
- TestHiveContext.QueryExecution - Class in org.apache.spark.sql.hive.test
-
Override QueryExecution with special debug workflow.
- TestHiveContext.QueryExecution() - Constructor for class org.apache.spark.sql.hive.test.TestHiveContext.QueryExecution
-
- TestHiveContext.TestTable - Class in org.apache.spark.sql.hive.test
-
- TestHiveContext.TestTable(String, Seq<Function0<BoxedUnit>>) - Constructor for class org.apache.spark.sql.hive.test.TestHiveContext.TestTable
-
- TestSQLContext - Class in org.apache.spark.sql.test
-
A SQLContext that can be used for local testing.
- TestSQLContext() - Constructor for class org.apache.spark.sql.test.TestSQLContext
-
- testTables() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
A list of test tables and the DDL required to initialize them.
- textFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
- textFile(String, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
- textFile(String, int) - Method in class org.apache.spark.SparkContext
-
Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
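For instance, a sketch of reading a text file into an RDD (the HDFS URI and partition count are placeholders):

```scala
// Assuming an existing SparkContext `sc` and an accessible path (illustrative URI):
val lines = sc.textFile("hdfs://namenode:9000/data/input.txt", 4)  // 4 minimum partitions

// Each element of `lines` is one line of the input file(s).
val lineCount = lines.count()
```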
- textFileStream(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them as text files (using key as LongWritable, value
as Text and input format as TextInputFormat).
- textFileStream(String) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them as text files (using key as LongWritable, value
as Text and input format as TextInputFormat).
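A corresponding sketch for the streaming case (the StreamingContext `ssc` and the monitored directory are assumed):

```scala
// Assuming an existing StreamingContext `ssc` and a monitored directory (illustrative path):
val files = ssc.textFileStream("hdfs://namenode:9000/incoming/")

// Print a few lines from each new file as it appears in the directory.
files.print()
```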
- theta() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- threshold() - Method in class org.apache.spark.mllib.tree.model.Split
-
- thresholds() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns thresholds in descending order.
- time() - Method in class org.apache.spark.scheduler.SparkListenerApplicationEnd
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- Time - Class in org.apache.spark.streaming
-
This is a simple class that represents an absolute instant of time.
- Time(long) - Constructor for class org.apache.spark.streaming.Time
-
- to(Time, Duration) - Method in class org.apache.spark.streaming.Time
-
- toArray() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- toArray() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
- toArray() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- toArray() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Converts to a dense array in column major.
- toArray() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- toArray() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Converts the instance to a double array.
- toArray() - Method in class org.apache.spark.rdd.RDD
-
Return an array that contains all of the elements in this RDD.
- toBreeze() - Method in interface org.apache.spark.mllib.linalg.distributed.DistributedMatrix
-
Collects data and assembles a local dense breeze matrix (for test only).
- toBreeze() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Converts to a breeze matrix.
- toBreeze() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Converts the instance to a breeze vector.
- toDataType(String) - Static method in class org.apache.spark.sql.hive.HiveMetastoreTypes
-
- toDebugString() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
A description of this RDD and its recursive dependencies for debugging.
- toDebugString() - Method in class org.apache.spark.rdd.RDD
-
A description of this RDD and its recursive dependencies for debugging.
- toDebugString() - Method in class org.apache.spark.SparkConf
-
Return a string listing all keys and values, one per line.
- toDebugString() - Method in interface org.apache.spark.sql.SQLConf
-
- toErrorString() - Method in class org.apache.spark.ExceptionFailure
-
- toErrorString() - Static method in class org.apache.spark.ExecutorLostFailure
-
- toErrorString() - Method in class org.apache.spark.FetchFailed
-
- toErrorString() - Static method in class org.apache.spark.Resubmitted
-
- toErrorString() - Method in interface org.apache.spark.TaskFailedReason
-
Error message displayed in the web UI.
- toErrorString() - Static method in class org.apache.spark.TaskKilled
-
- toErrorString() - Static method in class org.apache.spark.TaskResultLost
-
- toErrorString() - Static method in class org.apache.spark.UnknownReason
-
- toFormattedString() - Method in class org.apache.spark.streaming.Duration
-
- toIndexedRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Converts to IndexedRowMatrix.
- toInt() - Method in class org.apache.spark.storage.StorageLevel
-
- toJavaDStream() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Convert to a JavaDStream
- toJavaRDD() - Method in class org.apache.spark.rdd.RDD
-
- toJavaSchemaRDD() - Method in class org.apache.spark.sql.SchemaRDD
-
Returns this RDD as a JavaSchemaRDD.
- toLocalIterator() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an iterator that contains all of the elements in this RDD.
- toLocalIterator() - Method in class org.apache.spark.rdd.RDD
-
Return an iterator that contains all of the elements in this RDD.
- toMetastoreType(DataType) - Static method in class org.apache.spark.sql.hive.HiveMetastoreTypes
-
- top(int, Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the top K elements from this RDD as defined by
the specified Comparator[T].
- top(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the top K elements from this RDD using the
natural ordering for T.
- top(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
- toPairDStreamFunctions(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.streaming.StreamingContext
-
- topNode() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- toRDD(JavaDoubleRDD) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
-
- toRDD(JavaPairRDD<K, V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
- toRDD(JavaRDD<T>) - Static method in class org.apache.spark.api.java.JavaRDD
-
- toRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Converts to RowMatrix, dropping row indices after grouping by row index.
- toRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Drops row indices and converts this matrix to a RowMatrix.
- TorrentBroadcastFactory - Class in org.apache.spark.broadcast
-
A Broadcast implementation that uses a BitTorrent-like protocol to do a distributed transfer of the broadcasted data to the executors.
- TorrentBroadcastFactory() - Constructor for class org.apache.spark.broadcast.TorrentBroadcastFactory
-
- toSchemaRDD() - Method in class org.apache.spark.sql.SchemaRDD
-
Returns this RDD as a SchemaRDD.
- toSparkContext(JavaSparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
-
- toSplitInfo(Class<?>, String, InputSplit) - Static method in class org.apache.spark.scheduler.SplitInfo
-
- toSplitInfo(Class<?>, String, InputSplit) - Static method in class org.apache.spark.scheduler.SplitInfo
-
- toString() - Method in class org.apache.spark.Accumulable
-
- toString() - Method in class org.apache.spark.api.java.JavaRDD
-
- toString() - Method in class org.apache.spark.broadcast.Broadcast
-
- toString() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- toString() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
- toString() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- toString() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- toString() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- toString() - Method in class org.apache.spark.mllib.tree.model.Node
-
- toString() - Method in class org.apache.spark.mllib.tree.model.Split
-
- toString() - Method in class org.apache.spark.partial.BoundedDouble
-
- toString() - Method in class org.apache.spark.partial.PartialResult
-
- toString() - Method in class org.apache.spark.rdd.RDD
-
- toString() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- toString() - Method in class org.apache.spark.scheduler.SplitInfo
-
- toString() - Method in class org.apache.spark.SerializableWritable
-
- toString() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- toString() - Method in class org.apache.spark.storage.BlockId
-
- toString() - Method in class org.apache.spark.storage.BlockManagerId
-
- toString() - Method in class org.apache.spark.storage.RDDInfo
-
- toString() - Method in class org.apache.spark.storage.StorageLevel
-
- toString() - Method in class org.apache.spark.streaming.Duration
-
- toString() - Method in class org.apache.spark.streaming.Time
-
- toString() - Method in class org.apache.spark.util.MutablePair
-
- toString() - Method in class org.apache.spark.util.StatCounter
-
- toString() - Method in class org.apache.spark.util.Vector
-
- totalDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for all the jobs of this batch to finish processing from the time they
were submitted.
- totalShuffleRead() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- totalShuffleWrite() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- totalTime() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- train(RDD<LabeledPoint>, int, double, double, Vector) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
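A minimal training sketch for the overloads above (labels, feature values, and the iteration count are illustrative):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Assuming an existing SparkContext `sc`:
val data = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(2.0, 1.0)),
  LabeledPoint(0.0, Vectors.dense(-1.0, 0.5))
))

// Train with 100 iterations of SGD (an illustrative setting).
val model = LogisticRegressionWithSGD.train(data, 100)
val prediction = model.predict(Vectors.dense(1.5, 0.8))
```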
- train(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
- train(RDD<LabeledPoint>, double) - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
- train(RDD<LabeledPoint>, int, double, double, double, Vector) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<Vector>, int, int, int, String) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using the given set of parameters.
- train(RDD<Vector>, int, int) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using specified parameters and the default values for unspecified.
- train(RDD<Vector>, int, int, int) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using specified parameters and the default values for unspecified.
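A small k-means sketch under the same assumptions (an existing SparkContext `sc`; points and parameters are illustrative):

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.2)
))

// k = 2 clusters, at most 20 iterations.
val model = KMeans.train(points, 2, 20)
model.clusterCenters.foreach(println)
```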
- train(RDD<Rating>, int, int, double, int, long) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<Rating>, int, int, double, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<Rating>, int, int, double) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<Rating>, int, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
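An ALS training sketch (user IDs, product IDs, ratings, and hyperparameters are illustrative assumptions):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Assuming an existing SparkContext `sc`:
val ratings = sc.parallelize(Seq(
  Rating(1, 10, 4.0),
  Rating(1, 20, 1.0),
  Rating(2, 10, 5.0)
))

// rank = 8 latent factors, 10 iterations, regularization 0.01 (illustrative values).
val model = ALS.train(ratings, 8, 10, 0.01)
val predicted = model.predict(2, 20)
```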
- train(RDD<LabeledPoint>, int, double, double, double, Vector) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, Vector) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a Linear Regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double, Vector) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model over an RDD
- trainImplicit(RDD<Rating>, int, int, double, int, double, long) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users
to some products, in the form of (userID, productID, preference) pairs.
- trainImplicit(RDD<Rating>, int, int, double, int, double) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users
to some products, in the form of (userID, productID, preference) pairs.
- trainImplicit(RDD<Rating>, int, int, double, double) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users to
some products, in the form of (userID, productID, preference) pairs.
- trainImplicit(RDD<Rating>, int, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by
users to some products, in the form of (userID, productID, rating) pairs.
- transform(Function<R, JavaRDD<U>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(Function2<R, Time, JavaRDD<U>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(List<JavaDStream<?>>, Function2<List<JavaRDD<?>>, Time, JavaRDD<T>>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create a new DStream in which each RDD is generated by applying a function on RDDs of
the DStreams.
- transform(Function1<RDD<T>, RDD<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(Function2<RDD<T>, Time, RDD<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
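A sketch of transform applying arbitrary per-batch RDD operations (the input DStream[String] `lines` is assumed; the import provides the implicit pair- and ordered-RDD functions in the 1.x API):

```scala
import org.apache.spark.SparkContext._  // implicit pair/ordered RDD functions in 1.x

// For every batch, compute sorted word counts from the lines in that batch.
val sortedWordCounts = lines.transform { rdd =>
  rdd.flatMap(_.split(" "))
     .map(word => (word, 1))
     .reduceByKey(_ + _)
     .sortByKey()
}
```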
- transform(Seq<DStream<?>>, Function2<Seq<RDD<?>>, Time, RDD<T>>, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create a new DStream in which each RDD is generated by applying a function on RDDs of
the DStreams.
- transformToPair(Function<R, JavaPairRDD<K2, V2>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transformToPair(Function2<R, Time, JavaPairRDD<K2, V2>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transformToPair(List<JavaDStream<?>>, Function2<List<JavaRDD<?>>, Time, JavaPairRDD<K, V>>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create a new DStream in which each RDD is generated by applying a function on RDDs of
the DStreams.
- transformWith(JavaDStream<U>, Function3<R, JavaRDD<U>, Time, JavaRDD<W>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWith(JavaPairDStream<K2, V2>, Function3<R, JavaPairRDD<K2, V2>, Time, JavaRDD<W>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWith(DStream<U>, Function2<RDD<T>, RDD<U>, RDD<V>>, ClassTag<U>, ClassTag<V>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWith(DStream<U>, Function3<RDD<T>, RDD<U>, Time, RDD<V>>, ClassTag<U>, ClassTag<V>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWithToPair(JavaDStream<U>, Function3<R, JavaRDD<U>, Time, JavaPairRDD<K2, V2>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWithToPair(JavaPairDStream<K2, V2>, Function3<R, JavaPairRDD<K2, V2>, Time, JavaPairRDD<K3, V3>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- TwitterUtils - Class in org.apache.spark.streaming.twitter
-
- TwitterUtils() - Constructor for class org.apache.spark.streaming.twitter.TwitterUtils
-