- abs(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the absolute value.
- abs() - Method in class org.apache.spark.sql.types.Decimal
-
- AbsoluteError - Class in org.apache.spark.mllib.tree.loss
-
:: DeveloperApi ::
Class for absolute error loss calculation (for regression).
- AbsoluteError() - Constructor for class org.apache.spark.mllib.tree.loss.AbsoluteError
-
- accessTime() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
-
- accId() - Method in class org.apache.spark.CleanAccum
-
- Accumulable<R,T> - Class in org.apache.spark
-
A data type that can be accumulated, i.e., one that has a commutative and associative "add" operation,
but where the result type, R, may be different from the element type being added, T.
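The distinction between the result type R and the element type T can be sketched in plain Java without any Spark dependency. Everything below is hypothetical illustration code: `SketchParam` only mirrors the shape of the `addAccumulator` and `addInPlace` operations listed in this index (plus a `zero` operation assumed here for starting a fresh partial result), and the nested lists stand in for per-task data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the operations an AccumulableParam<R, T> provides.
interface SketchParam<R, T> {
    R addAccumulator(R acc, T element); // fold one element of type T into a result of type R
    R addInPlace(R left, R right);      // merge two partial results
    R zero(R initial);                  // fresh, empty partial result (assumed for this sketch)
}

class AccumulableSketch {
    // Accumulates each partition separately, then merges the partial results.
    static <R, T> R accumulate(List<List<T>> partitions, R initial, SketchParam<R, T> param) {
        R total = initial;
        for (List<T> partition : partitions) {
            R partial = param.zero(initial);
            for (T element : partition) {
                partial = param.addAccumulator(partial, element);
            }
            total = param.addInPlace(total, partial);
        }
        return total;
    }

    // Result type R = Set<String>, element type T = String.
    static Set<String> distinctWords(List<List<String>> partitions) {
        SketchParam<Set<String>, String> param = new SketchParam<Set<String>, String>() {
            public Set<String> addAccumulator(Set<String> acc, String element) {
                acc.add(element);
                return acc;
            }
            public Set<String> addInPlace(Set<String> left, Set<String> right) {
                left.addAll(right);
                return left;
            }
            public Set<String> zero(Set<String> initial) {
                return new HashSet<>();
            }
        };
        return accumulate(partitions, new HashSet<String>(), param);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("a", "b"), Arrays.asList("b", "c"));
        System.out.println(distinctWords(partitions).size()); // three distinct words
    }
}
```

Because set union is commutative and associative, the order in which partial results are merged does not change the final set, which is the property the description above requires.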
- Accumulable(R, AccumulableParam<R, T>, Option<String>) - Constructor for class org.apache.spark.Accumulable
-
- Accumulable(R, AccumulableParam<R, T>) - Constructor for class org.apache.spark.Accumulable
-
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks
can "add" values with add.
- accumulable(T, String, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks
can "add" values with add.
- accumulable(R, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, to which tasks can add values
with +=.
- accumulable(R, String, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, with a name for display in the
Spark UI.
- accumulableCollection(R, Function1<R, Growable<T>>, ClassTag<R>) - Method in class org.apache.spark.SparkContext
-
Create an accumulator from a "mutable collection" type.
- AccumulableInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Information about an Accumulable modified during a task or stage.
- AccumulableInfo - Class in org.apache.spark.status.api.v1
-
- AccumulableParam<R,T> - Interface in org.apache.spark
-
Helper object defining how to accumulate values of a particular type.
- accumulables() - Method in class org.apache.spark.scheduler.StageInfo
-
Terminal values of accumulables updated during this stage.
- accumulables() - Method in class org.apache.spark.scheduler.TaskInfo
-
Intermediate updates to accumulables during this task.
- Accumulator<T> - Class in org.apache.spark
-
A simpler value of Accumulable where the result type being accumulated is the same
as the type of the elements being merged.
- Accumulator(T, AccumulatorParam<T>, Option<String>) - Constructor for class org.apache.spark.Accumulator
-
- Accumulator(T, AccumulatorParam<T>) - Constructor for class org.apache.spark.Accumulator
-
- accumulator(int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values
to using the add method.
- accumulator(int, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values
to using the add method.
- accumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values
to using the add method.
- accumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values
to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add"
values to using the add method.
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add"
values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, which tasks can "add"
values to using the += method.
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, with a name for display
in the Spark UI.
- AccumulatorParam<T> - Interface in org.apache.spark
-
A simpler version of AccumulableParam where the only data type you can add
in is the same type as the accumulated value.
- AccumulatorParam.DoubleAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.DoubleAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
-
- AccumulatorParam.FloatAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.FloatAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
-
- AccumulatorParam.IntAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.IntAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
-
- AccumulatorParam.LongAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.LongAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
-
- accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.StageData
-
- accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.TaskData
-
- accuracy() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns accuracy
- acos(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine inverse of the given value; the returned angle is in the range
0.0 through pi.
- acos(String) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine inverse of the given column; the returned angle is in the range
0.0 through pi.
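The stated range can be checked with the JDK's own inverse cosine. This is a minimal sketch that assumes the SQL function follows the same contract as `java.lang.Math.acos`; the class and method names are hypothetical and no Spark API is called.

```java
class AcosRangeSketch {
    // Inverse cosine in radians; java.lang.Math.acos returns values from 0.0 through pi.
    static double inverseCosine(double x) {
        return Math.acos(x);
    }

    public static void main(String[] args) {
        System.out.println(inverseCosine(1.0));  // lower end of the range
        System.out.println(inverseCosine(-1.0)); // upper end of the range
        System.out.println(inverseCosine(2.0));  // outside [-1, 1], so the result is NaN
    }
}
```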
- active() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- activeJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- activeStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- activeTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- ActorHelper - Interface in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A receiver trait to be mixed in with your Actor to gain access to
the API for pushing received data into Spark Streaming for processing.
- actorStream(Props, String, StorageLevel, SupervisorStrategy) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel, SupervisorStrategy, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- ActorSupervisorStrategy - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A helper with a set of defaults for the supervisor strategy.
- ActorSupervisorStrategy() - Constructor for class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
-
- actorSystem() - Method in class org.apache.spark.SparkEnv
-
- add(T) - Method in class org.apache.spark.Accumulable
-
Add more data to this accumulator / accumulable
- add(double, Vector) - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
Adds a new training instance to this LogisticAggregator, and updates the loss and gradient
of the objective function.
- add(double, Vector) - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
Adds a new training instance to this LeastSquaresAggregator, and updates the loss and gradient
of the objective function.
- add(double[], MultivariateGaussian[], ExpectationSum, Vector<Object>) - Static method in class org.apache.spark.mllib.clustering.ExpectationSum
-
- add(Vector) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
-
Adds a new document.
- add(BlockMatrix) - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Adds two block matrices together.
- add(Vector) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Add a new sample to this summarizer, and update the statistical summary.
- add(StructField) - Method in class org.apache.spark.sql.types.StructType
-
- add(String, DataType) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new nullable field with no metadata.
- add(String, DataType, boolean) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field with no metadata.
- add(String, DataType, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field and specifying metadata.
- add(String, String) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new nullable field with no metadata where the
dataType is specified as a String.
- add(String, String, boolean) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field with no metadata where the
dataType is specified as a String.
- add(String, String, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field and specifying metadata where the
dataType is specified as a String.
- add(Vector) - Method in class org.apache.spark.util.Vector
-
- add_months(Column, int) - Static method in class org.apache.spark.sql.functions
-
Returns the date that is numMonths after startDate.
- addAccumulator(R, T) - Method in interface org.apache.spark.AccumulableParam
-
Add additional data to the accumulator value.
- addAccumulator(T, T) - Method in interface org.apache.spark.AccumulatorParam
-
- addAppArgs(String...) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds command line arguments for the application.
- addedFiles() - Method in class org.apache.spark.SparkContext
-
- addedJars() - Method in class org.apache.spark.SparkContext
-
- addFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a file to be submitted with the application.
- addFile(String) - Method in class org.apache.spark.SparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addFile(String, boolean) - Method in class org.apache.spark.SparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addGrid(Param<T>, Iterable<T>) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a param with multiple values (overwrites if the input param exists).
- addGrid(DoubleParam, double[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a double param with multiple values.
- addGrid(IntParam, int[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds an int param with multiple values.
- addGrid(FloatParam, float[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a float param with multiple values.
- addGrid(LongParam, long[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a long param with multiple values.
- addGrid(BooleanParam) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a boolean param with true and false.
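How the addGrid overloads combine can be sketched in plain Java. The class below is a hypothetical stand-in, not the Spark builder: params are identified by plain strings, values are untyped, and the `build` method is modeled on the assumption that the real builder expands the registered values into every combination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ParamGridSketch {
    private final Map<String, List<Object>> grid = new LinkedHashMap<>();

    // Registers candidate values for a param; a repeated name overwrites the earlier values.
    ParamGridSketch addGrid(String name, List<?> values) {
        grid.put(name, new ArrayList<Object>(values));
        return this;
    }

    // Boolean shortcut: the candidates are true and false.
    ParamGridSketch addGrid(String name) {
        return addGrid(name, Arrays.asList(true, false));
    }

    // Expands the registered values into every combination (a Cartesian product).
    List<Map<String, Object>> build() {
        List<Map<String, Object>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<String, Object>());
        for (Map.Entry<String, List<Object>> entry : grid.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> partial : combos) {
                for (Object value : entry.getValue()) {
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> combos = new ParamGridSketch()
            .addGrid("regParam", Arrays.asList(0.01, 0.1, 1.0))
            .addGrid("fitIntercept")
            .build();
        System.out.println(combos.size()); // 3 values x 2 values = 6 combinations
    }
}
```

The param names `regParam` and `fitIntercept` are used only as example labels; in this sketch any string works.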
- addInPlace(R, R) - Method in interface org.apache.spark.AccumulableParam
-
Merge two accumulated values together.
- addInPlace(double, double) - Method in class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
-
- addInPlace(float, float) - Method in class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
-
- addInPlace(int, int) - Method in class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
-
- addInPlace(long, long) - Method in class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
-
- addInPlace(double, double) - Method in class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
- addInPlace(float, float) - Method in class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
- addInPlace(int, int) - Method in class org.apache.spark.SparkContext.IntAccumulatorParam$
-
- addInPlace(long, long) - Method in class org.apache.spark.SparkContext.LongAccumulatorParam$
-
- addInPlace(Vector) - Method in class org.apache.spark.util.Vector
-
- addInPlace(Vector, Vector) - Method in class org.apache.spark.util.Vector.VectorAccumParam$
-
- addIntercept() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Whether to add intercept (default: false).
- addJar(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addJar(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a jar file to be submitted with the application.
- addJar(String) - Method in class org.apache.spark.SparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addLocalConfiguration(String, int, int, int, JobConf) - Static method in class org.apache.spark.rdd.HadoopRDD
-
Add Hadoop configuration specific to a single partition and attempt.
- addOnCompleteCallback(Function0<BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Adds a callback function to be executed on task completion.
- addPartToPGroup(Partition, PartitionGroup) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- addPyFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a Python file / zip / egg to be submitted with the application.
- address() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
-
- addSparkArg(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a no-value argument to the Spark invocation.
- addSparkArg(String, String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds an argument with a value to the Spark invocation.
- addSparkListener(SparkListener) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Register a listener to receive up-calls from events that happen during execution.
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.StreamingContext
-
- addTaskCompletionListener(TaskCompletionListener) - Method in class org.apache.spark.TaskContext
-
Adds a (Java friendly) listener to be executed on task completion.
- addTaskCompletionListener(Function1<TaskContext, BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Adds a listener in the form of a Scala closure to be executed on task completion.
- addVector(Vector) - Method in class org.apache.spark.ml.feature.VectorIndexer.CategoryStats
-
Add a new vector to this index, updating sets of unique feature values
- agg(Column, Column...) - Method in class org.apache.spark.sql.DataFrame
-
Aggregates on the entire DataFrame without groups.
- agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Aggregates on the entire DataFrame without groups.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Aggregates on the entire DataFrame without groups.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
(Java-specific) Aggregates on the entire DataFrame without groups.
- agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Aggregates on the entire DataFrame without groups.
- agg(Column, Column...) - Method in class org.apache.spark.sql.GroupedData
-
Compute aggregates by specifying a series of aggregate columns.
- agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.GroupedData
-
(Scala-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.GroupedData
-
(Scala-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.GroupedData
-
(Java-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.GroupedData
-
Compute aggregates by specifying a series of aggregate columns.
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
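The two-level aggregation described above can be sketched in plain Java. This is a hypothetical model rather than Spark code: each inner list stands in for one partition, `seqOp` folds elements within a partition, and `combOp` merges the per-partition results. The zero value is applied once per partition and once for the final merge, which is why it has to be neutral.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

class AggregateSketch {
    // Folds each partition with seqOp starting from zero, then merges partials with combOp.
    static <T, U> U aggregate(List<List<T>> partitions, U zero,
                              BiFunction<U, T, U> seqOp, BinaryOperator<U> combOp) {
        U result = zero;
        for (List<T> partition : partitions) {
            U partial = zero;
            for (T element : partition) {
                partial = seqOp.apply(partial, element);
            }
            result = combOp.apply(result, partial);
        }
        return result;
    }

    // Element type String, result type Integer: total number of characters.
    static int sumOfLengths(List<List<String>> partitions) {
        return aggregate(partitions, 0, (acc, s) -> acc + s.length(), Integer::sum);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("a", "bb"), Arrays.asList("ccc"));
        System.out.println(sumOfLengths(partitions)); // 1 + 2 + 3 = 6
    }
}
```

The sketch shares one immutable zero value across partitions; with a mutable zero value each partition would need its own copy.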
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
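The per-key variant follows the same pattern, applied separately to each key. The plain-Java sketch below is hypothetical and uses no Spark classes: inner lists model partitions of key-value pairs, `seqFunc` folds a value into the running result for its key, and `combFunc` merges results for the same key across partitions.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

class AggregateByKeySketch {
    // Convenience factory for a key-value pair.
    static <K, V> Map.Entry<K, V> entry(K key, V value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    static <K, V, U> Map<K, U> aggregateByKey(List<List<Map.Entry<K, V>>> partitions, U zero,
                                              BiFunction<U, V, U> seqFunc,
                                              BinaryOperator<U> combFunc) {
        Map<K, U> merged = new LinkedHashMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Fold values per key within this partition, starting from the zero value.
            Map<K, U> partial = new LinkedHashMap<>();
            for (Map.Entry<K, V> pair : partition) {
                U current = partial.containsKey(pair.getKey()) ? partial.get(pair.getKey()) : zero;
                partial.put(pair.getKey(), seqFunc.apply(current, pair.getValue()));
            }
            // Merge this partition's per-key results into the overall result.
            for (Map.Entry<K, U> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), combFunc);
            }
        }
        return merged;
    }

    // Largest value per key; Integer.MIN_VALUE is neutral for max.
    static Map<String, Integer> maxPerKey(List<List<Map.Entry<String, Integer>>> partitions) {
        return aggregateByKey(partitions, Integer.MIN_VALUE, Integer::max, Integer::max);
    }

    public static void main(String[] args) {
        Map<String, Integer> result = maxPerKey(Arrays.asList(
            Arrays.asList(entry("a", 1), entry("b", 5)),
            Arrays.asList(entry("a", 3))));
        System.out.println(result.get("a") + " " + result.get("b")); // 3 5
    }
}
```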
- AggregatedDialect - Class in org.apache.spark.sql.jdbc
-
:: DeveloperApi ::
AggregatedDialect can unify multiple dialects into one virtual Dialect.
- AggregatedDialect(List<JdbcDialect>) - Constructor for class org.apache.spark.sql.jdbc.AggregatedDialect
-
- aggregateMessages(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, ClassTag<A>) - Method in class org.apache.spark.graphx.Graph
-
Aggregates values from the neighboring edges and vertices of each vertex.
- aggregateMessagesWithActiveSet(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, Option<Tuple2<VertexRDD<?>, EdgeDirection>>, ClassTag<A>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.VertexRDD
-
Aggregates vertices in messages that have the same ids using reduceFunc, returning a
VertexRDD co-indexed with this.
- AggregatingEdgeContext<VD,ED,A> - Class in org.apache.spark.graphx.impl
-
- AggregatingEdgeContext(Function2<A, A, A>, Object, BitSet) - Constructor for class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- Aggregator<K,V,C> - Class in org.apache.spark
-
:: DeveloperApi ::
A set of functions used to aggregate data.
- Aggregator(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Constructor for class org.apache.spark.Aggregator
-
- aggregator() - Method in class org.apache.spark.ShuffleDependency
-
- Algo - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to select the algorithm for the decision tree
- Algo() - Constructor for class org.apache.spark.mllib.tree.configuration.Algo
-
- algo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- algo() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- algo() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- algo() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- algorithm() - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
- algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
The algorithm to use for updating.
- algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
- alias(String) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias.
- All - Static variable in class org.apache.spark.graphx.TripletFields
-
Expose all the fields (source, edge, and destination).
- AlphaComponent - Annotation Type in org.apache.spark.annotation
-
A new component of Spark which may have unstable APIs.
- ALS - Class in org.apache.spark.ml.recommendation
-
:: Experimental ::
Alternating Least Squares (ALS) matrix factorization.
- ALS(String) - Constructor for class org.apache.spark.ml.recommendation.ALS
-
- ALS() - Constructor for class org.apache.spark.ml.recommendation.ALS
-
- ALS - Class in org.apache.spark.mllib.recommendation
-
Alternating Least Squares matrix factorization.
- ALS() - Constructor for class org.apache.spark.mllib.recommendation.ALS
-
Constructs an ALS instance with default parameters: {numBlocks: -1, rank: 10, iterations: 10,
lambda: 0.01, implicitPrefs: false, alpha: 1.0}.
- ALS.Rating<ID> - Class in org.apache.spark.ml.recommendation
-
:: DeveloperApi ::
Rating class for better code readability.
- ALS.Rating(ID, ID, float) - Constructor for class org.apache.spark.ml.recommendation.ALS.Rating
-
- ALS.Rating$ - Class in org.apache.spark.ml.recommendation
-
- ALS.Rating$() - Constructor for class org.apache.spark.ml.recommendation.ALS.Rating$
-
- ALSModel - Class in org.apache.spark.ml.recommendation
-
:: Experimental ::
Model fitted by ALS.
- AnalysisException - Exception in org.apache.spark.sql
-
:: DeveloperApi ::
Thrown when a query fails to analyze, usually because the query itself is invalid.
- AnalysisException(String, Option<Object>, Option<Object>) - Constructor for exception org.apache.spark.sql.AnalysisException
-
- analyze(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
Analyzes the given table in the current database to generate statistics, which will be
used in query optimizations.
- analyzed() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- analyzer() - Method in class org.apache.spark.sql.hive.HiveContext
-
- analyzer() - Method in class org.apache.spark.sql.SQLContext
-
- and(Column) - Method in class org.apache.spark.sql.Column
-
Boolean AND.
- And - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true iff both left and right evaluate to true.
- And(Filter, Filter) - Constructor for class org.apache.spark.sql.sources.And
-
- antecedent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
-
- ANY() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- anyNull() - Method in interface org.apache.spark.sql.Row
-
Returns true if there are any NULL values in this row.
- appAttemptId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- appendBias(Vector) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Returns a new vector with 1.0 (bias) appended to the input vector.
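The effect on a dense vector can be sketched with a plain `double[]`. This is a hypothetical helper that only illustrates the described behavior; it does not use the MLlib Vector type.

```java
import java.util.Arrays;

class AppendBiasSketch {
    // Returns a copy of the input with a trailing 1.0 (the bias term); the input is not modified.
    static double[] appendBias(double[] features) {
        double[] out = Arrays.copyOf(features, features.length + 1);
        out[features.length] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(appendBias(new double[] {2.0, 3.0}))); // [2.0, 3.0, 1.0]
    }
}
```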
- appId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- applicationAttemptId() - Method in class org.apache.spark.SparkContext
-
- ApplicationAttemptInfo - Class in org.apache.spark.status.api.v1
-
- applicationId() - Method in class org.apache.spark.SparkContext
-
A unique identifier for the Spark application.
- ApplicationInfo - Class in org.apache.spark.status.api.v1
-
- ApplicationStatus - Enum in org.apache.spark.status.api.v1
-
- apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
-
Construct a graph from a collection of vertices and
edges with attributes.
- apply(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from edges, setting referenced vertices to `defaultVertexAttr`.
- apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from vertices and edges, setting missing vertices to `defaultVertexAttr`.
- apply(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices.
- apply(Graph<VD, ED>, A, int, EdgeDirection, Function3<Object, VD, A, VD>, Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, ClassTag<VD>, ClassTag<ED>, ClassTag<A>) - Static method in class org.apache.spark.graphx.Pregel
-
Execute a Pregel-like iterative vertex-parallel abstraction.
- apply(RDD<Tuple2<Object, VD>>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an
EdgeRDD) from an RDD of vertex-attribute pairs.
- apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, Function2<VD, VD, VD>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- apply(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its name.
- apply(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its index.
- apply(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
-
Gets the value of the input param or its default value if it does not exist.
- apply(int, int) - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
- apply(int) - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- apply(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Gets the (i, j)-th element.
- apply(int, int) - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- apply(int) - Method in interface org.apache.spark.mllib.linalg.Vector
-
Gets the value of the ith element.
- apply(int, Predict, double, boolean) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Construct a node with nodeIndex, predict, impurity and isLeaf parameters.
- apply(String) - Static method in class org.apache.spark.rdd.PartitionGroup
-
- apply(long, String, Option<String>, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(long, String, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(long, TaskMetrics) - Static method in class org.apache.spark.scheduler.RuntimePercentage
-
- apply(Object) - Method in class org.apache.spark.sql.Column
-
Extracts a value or values from a complex type.
- apply(String) - Method in class org.apache.spark.sql.DataFrame
-
Selects a column based on the column name and returns it as a Column.
- apply(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the given Columns as input arguments.
- apply(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the given Columns as input arguments.
- apply(DataFrame, Seq<Expression>, GroupedData.GroupType) - Static method in class org.apache.spark.sql.GroupedData
-
- apply(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i.
- apply(Object[], Object[]) - Static method in class org.apache.spark.sql.types.ArrayBasedMapData
-
- apply(DataType) - Static method in class org.apache.spark.sql.types.ArrayType
-
Construct an ArrayType object with the given element type.
- apply(double) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(long) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(long, int, int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(String) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply() - Static method in class org.apache.spark.sql.types.DecimalType
-
- apply(Option<PrecisionInfo>) - Static method in class org.apache.spark.sql.types.DecimalType
-
- apply(DataType, DataType) - Static method in class org.apache.spark.sql.types.MapType
-
Construct a MapType object with the given key type and value type.
- apply(String) - Method in class org.apache.spark.sql.types.StructType
-
- apply(Set<String>) - Method in class org.apache.spark.sql.types.StructType
-
Returns a StructType containing the StructFields of the given names, preserving the
original order of fields.
- apply(int) - Method in class org.apache.spark.sql.types.StructType
-
- apply(Seq<Column>) - Method in class org.apache.spark.sql.UserDefinedFunction
-
- apply(String) - Static method in class org.apache.spark.storage.BlockId
-
Converts a BlockId "name" String back into a BlockId.
- apply(String, String, int) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(ObjectInput) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object without setting useOffHeap.
- apply(boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object.
- apply(int, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object from its integer representation.
- apply(ObjectInput) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- apply(String, int) - Static method in class org.apache.spark.streaming.kafka.Broker
-
- apply(String, int, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- apply(TopicAndPartition, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- apply(long) - Static method in class org.apache.spark.streaming.Milliseconds
-
- apply(long) - Static method in class org.apache.spark.streaming.Minutes
-
- apply(long) - Static method in class org.apache.spark.streaming.Seconds
-
- apply(TraversableOnce<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values.
- apply(Seq<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values passed as variable-length arguments.
- apply(int) - Method in class org.apache.spark.util.Vector
-
- applySchema(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- applySchema(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- applySchema(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- applySchema(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- applySchemaToPythonRDD(RDD<Object[]>, String) - Method in class org.apache.spark.sql.SQLContext
-
- applySchemaToPythonRDD(RDD<Object[]>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- appName() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- appName() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- appName() - Method in class org.apache.spark.SparkContext
-
- approxCountDistinct(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(Column, double) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(String, double) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- ApproxHist() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- areaUnderPR() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the precision-recall curve.
- areaUnderROC() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Computes the area under the receiver operating characteristic (ROC) curve.
- areaUnderROC() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the receiver operating characteristic (ROC) curve.
- argmax() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- argmax() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- argmax() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Find the index of a maximal element.
- arr() - Method in class org.apache.spark.rdd.PartitionGroup
-
- array(DataType) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type array.
- array(Column...) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array() - Method in class org.apache.spark.sql.types.GenericArrayData
-
- array_contains(Column, Object) - Static method in class org.apache.spark.sql.functions
-
Returns true if the array contains the value.
- ArrayBasedMapData - Class in org.apache.spark.sql.types
-
- ArrayBasedMapData(ArrayData, ArrayData) - Constructor for class org.apache.spark.sql.types.ArrayBasedMapData
-
- ArrayData - Class in org.apache.spark.sql.types
-
- ArrayData() - Constructor for class org.apache.spark.sql.types.ArrayData
-
- arrayLengthGt(double) - Static method in class org.apache.spark.ml.param.ParamValidators
-
Check that the array length is greater than lowerBound.
- ArrayType - Class in org.apache.spark.sql.types
-
- ArrayType(DataType, boolean) - Constructor for class org.apache.spark.sql.types.ArrayType
-
- ArrayType() - Constructor for class org.apache.spark.sql.types.ArrayType
-
No-arg constructor for kryo.
- as(String) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias.
- as(Seq<String>) - Method in class org.apache.spark.sql.Column
-
(Scala-specific) Assigns the given aliases to the results of a table generating function.
- as(String[]) - Method in class org.apache.spark.sql.Column
-
Assigns the given aliases to the results of a table generating function.
- as(Symbol) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias.
- as(String, Metadata) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias with metadata.
- as(String) - Method in class org.apache.spark.sql.DataFrame
-
- as(Symbol) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Returns a new DataFrame with an alias set.
- asc() - Method in class org.apache.spark.sql.Column
-
Returns an ordering used in sorting.
- asc(String) - Static method in class org.apache.spark.sql.functions
-
Returns a sort expression based on ascending order of the column.
- ascii(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the numeric value of the first character of the string column, and returns the
result as an int column.
- asin(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the sine inverse of the given value; the returned angle is in the range
-pi/2 through pi/2.
- asin(String) - Static method in class org.apache.spark.sql.functions
-
Computes the sine inverse of the given column; the returned angle is in the range
-pi/2 through pi/2.
- asIntegral() - Method in class org.apache.spark.sql.types.DecimalType
-
- asIntegral() - Method in class org.apache.spark.sql.types.DoubleType
-
- asIntegral() - Method in class org.apache.spark.sql.types.FloatType
-
- asIterator() - Method in class org.apache.spark.serializer.DeserializationStream
-
Read the elements of this stream through an iterator.
- asJavaPairRDD() - Method in class org.apache.spark.api.r.PairwiseRRDD
-
- asJavaRDD() - Method in class org.apache.spark.api.r.RRDD
-
- asJavaRDD() - Method in class org.apache.spark.api.r.StringRRDD
-
- asKeyValueIterator() - Method in class org.apache.spark.serializer.DeserializationStream
-
Read the elements of this stream through an iterator over key-value pairs.
- AskPermissionToCommitOutput - Class in org.apache.spark.scheduler
-
- AskPermissionToCommitOutput(int, int, int) - Constructor for class org.apache.spark.scheduler.AskPermissionToCommitOutput
-
- askTimeout(SparkConf) - Static method in class org.apache.spark.util.RpcUtils
-
- asRDDId() - Method in class org.apache.spark.storage.BlockId
-
- assertAnalyzed() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- assertValid() - Method in class org.apache.spark.broadcast.Broadcast
-
Check if this broadcast is valid.
- assignments() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- AssociationRules - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- AssociationRules() - Constructor for class org.apache.spark.mllib.fpm.AssociationRules
-
Constructs a default instance with default parameters {minConfidence = 0.8}.
- AssociationRules.Rule<Item> - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- AsyncRDDActions<T> - Class in org.apache.spark.rdd
-
A set of asynchronous RDD actions available through an implicit conversion.
- AsyncRDDActions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.AsyncRDDActions
-
- atan(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the tangent inverse of the given value.
- atan(String) - Static method in class org.apache.spark.sql.functions
-
Computes the tangent inverse of the given column.
- atan2(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(Column, String) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(String, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(String, String) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(Column, double) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(String, double) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(double, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(double, String) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
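The rectangular-to-polar conversion described by the atan2 entries can be sketched with plain `java.lang.Math`. This is an illustration only, not the Spark column function: the class name `PolarSketch` and its helpers are hypothetical, and the argument order shown (y first, then x) is that of `Math.atan2`, which the index itself does not spell out.

```java
// Illustrative sketch only: uses java.lang.Math, not org.apache.spark.sql.functions.
final class PolarSketch {
    /** Angle theta of the point (x, y); y is passed first, as in Math.atan2. */
    static double theta(double y, double x) {
        return Math.atan2(y, x);
    }

    /** Distance r of the point (x, y) from the origin. */
    static double radius(double y, double x) {
        return Math.hypot(x, y);
    }

    public static void main(String[] args) {
        double y = 1.0;
        double x = 1.0;
        System.out.println("theta = " + theta(y, x) + ", r = " + radius(y, x));
    }
}
```

Unlike atan, which sees only a single ratio, atan2 takes both coordinates and can therefore place the angle in the correct quadrant.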
- attempt() - Method in class org.apache.spark.scheduler.TaskInfo
-
- attempt() - Method in class org.apache.spark.status.api.v1.TaskData
-
- attemptId() - Method in class org.apache.spark.scheduler.StageInfo
-
- attemptId() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
-
- attemptId() - Method in class org.apache.spark.status.api.v1.StageData
-
- attemptId() - Method in class org.apache.spark.TaskContext
-
- attemptNumber() - Method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
-
- attemptNumber() - Method in class org.apache.spark.scheduler.TaskInfo
-
- attemptNumber() - Method in class org.apache.spark.TaskCommitDenied
-
- attemptNumber() - Method in class org.apache.spark.TaskContext
-
How many times this task has been attempted.
- attempts() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
-
- attr() - Method in class org.apache.spark.graphx.Edge
-
- attr() - Method in class org.apache.spark.graphx.EdgeContext
-
The attribute associated with the edge.
- attr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- Attribute - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
Abstract class for ML attributes.
- Attribute() - Constructor for class org.apache.spark.ml.attribute.Attribute
-
- attribute() - Method in class org.apache.spark.sql.sources.EqualNullSafe
-
- attribute() - Method in class org.apache.spark.sql.sources.EqualTo
-
- attribute() - Method in class org.apache.spark.sql.sources.GreaterThan
-
- attribute() - Method in class org.apache.spark.sql.sources.GreaterThanOrEqual
-
- attribute() - Method in class org.apache.spark.sql.sources.In
-
- attribute() - Method in class org.apache.spark.sql.sources.IsNotNull
-
- attribute() - Method in class org.apache.spark.sql.sources.IsNull
-
- attribute() - Method in class org.apache.spark.sql.sources.LessThan
-
- attribute() - Method in class org.apache.spark.sql.sources.LessThanOrEqual
-
- attribute() - Method in class org.apache.spark.sql.sources.StringContains
-
- attribute() - Method in class org.apache.spark.sql.sources.StringEndsWith
-
- attribute() - Method in class org.apache.spark.sql.sources.StringStartsWith
-
- AttributeGroup - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
Attributes that describe a vector ML column.
- AttributeGroup(String) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group without attribute info.
- AttributeGroup(String, int) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group knowing only the number of attributes.
- AttributeGroup(String, Attribute[]) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group with attributes.
- attributes() - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Optional array of attributes.
- AttributeType - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal,
and AttributeType$.Binary.
- AttributeType(String) - Constructor for class org.apache.spark.ml.attribute.AttributeType
-
- attrType() - Method in class org.apache.spark.ml.attribute.Attribute
-
Attribute type.
- attrType() - Method in class org.apache.spark.ml.attribute.BinaryAttribute
-
- attrType() - Method in class org.apache.spark.ml.attribute.NominalAttribute
-
- attrType() - Method in class org.apache.spark.ml.attribute.NumericAttribute
-
- attrType() - Static method in class org.apache.spark.ml.attribute.UnresolvedAttribute
-
- available() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- avg(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the average of the values in a group.
- avg(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the average of the values in a group.
- avg(String...) - Method in class org.apache.spark.sql.GroupedData
-
Compute the mean value of each numeric column for each group.
- avg(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
-
Compute the mean value of each numeric column for each group.
- avgMetrics() - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- awaitTermination() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.3.0, replaced by awaitTerminationOrTimeout(Long).
- awaitTermination() - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.StreamingContext
-
Deprecated.
As of 1.3.0, replaced by awaitTerminationOrTimeout(Long).
- awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- cache() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.graphx.Graph
-
Caches the vertices and edges associated with this graph at the previously-specified target
storage levels, which default to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
Persists the edge partitions using `targetStorageLevel`, which defaults to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- cache() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
Persists the vertex partitions at `targetStorageLevel`, which defaults to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Caches the underlying RDD.
- cache() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.sql.DataFrame
-
- cache() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
- cache() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
- cache() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
- cachedLeafStatuses() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- cacheManager() - Method in class org.apache.spark.SparkEnv
-
- cacheManager() - Method in class org.apache.spark.sql.SQLContext
-
- cacheTable(String) - Method in class org.apache.spark.sql.SQLContext
-
Caches the specified table in-memory.
- calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.classification.LogisticCostFun
-
- calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.regression.LeastSquaresCostFun
-
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
variance calculation
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
variance calculation
- calculate(double[], double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
information calculation for regression
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
variance calculation
- CalendarIntervalType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing calendar time intervals.
- CalendarIntervalType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the CalendarIntervalType object.
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.FlatMapFunction
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.FlatMapFunction2
-
- call(T1) - Method in interface org.apache.spark.api.java.function.Function
-
- call() - Method in interface org.apache.spark.api.java.function.Function0
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.Function2
-
- call(T1, T2, T3) - Method in interface org.apache.spark.api.java.function.Function3
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.VoidFunction
-
- call(T1) - Method in interface org.apache.spark.sql.api.java.UDF1
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) - Method in interface org.apache.spark.sql.api.java.UDF10
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11) - Method in interface org.apache.spark.sql.api.java.UDF11
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12) - Method in interface org.apache.spark.sql.api.java.UDF12
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13) - Method in interface org.apache.spark.sql.api.java.UDF13
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14) - Method in interface org.apache.spark.sql.api.java.UDF14
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15) - Method in interface org.apache.spark.sql.api.java.UDF15
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16) - Method in interface org.apache.spark.sql.api.java.UDF16
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17) - Method in interface org.apache.spark.sql.api.java.UDF17
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18) - Method in interface org.apache.spark.sql.api.java.UDF18
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19) - Method in interface org.apache.spark.sql.api.java.UDF19
-
- call(T1, T2) - Method in interface org.apache.spark.sql.api.java.UDF2
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20) - Method in interface org.apache.spark.sql.api.java.UDF20
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21) - Method in interface org.apache.spark.sql.api.java.UDF21
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22) - Method in interface org.apache.spark.sql.api.java.UDF22
-
- call(T1, T2, T3) - Method in interface org.apache.spark.sql.api.java.UDF3
-
- call(T1, T2, T3, T4) - Method in interface org.apache.spark.sql.api.java.UDF4
-
- call(T1, T2, T3, T4, T5) - Method in interface org.apache.spark.sql.api.java.UDF5
-
- call(T1, T2, T3, T4, T5, T6) - Method in interface org.apache.spark.sql.api.java.UDF6
-
- call(T1, T2, T3, T4, T5, T6, T7) - Method in interface org.apache.spark.sql.api.java.UDF7
-
- call(T1, T2, T3, T4, T5, T6, T7, T8) - Method in interface org.apache.spark.sql.api.java.UDF8
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9) - Method in interface org.apache.spark.sql.api.java.UDF9
-
- callUDF(String, Column...) - Static method in class org.apache.spark.sql.functions
-
Call a user-defined function.
- callUDF(Function0<?>, DataType) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function1<?, ?>, DataType, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function2<?, ?, ?>, DataType, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function3<?, ?, ?, ?>, DataType, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function4<?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function5<?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function6<?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function7<?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function8<?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function9<?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(Function10<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf()
- callUDF(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Call a user-defined function.
- callUdf(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it was not coherent to have two functions callUdf and callUDF
- cancel() - Method in class org.apache.spark.ComplexFutureAction
-
- cancel() - Method in interface org.apache.spark.FutureAction
-
Cancels the execution of this action.
- cancel() - Method in class org.apache.spark.SimpleFutureAction
-
- cancelAllJobs() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelAllJobs() - Method in class org.apache.spark.SparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelJobGroup(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel active jobs for the specified group.
- cancelJobGroup(String) - Method in class org.apache.spark.SparkContext
-
Cancel active jobs for the specified group.
- canEqual(Object) - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
-
- canEqual(Object) - Method in class org.apache.spark.util.MutablePair
-
- canHandle(String) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
-
- canHandle(String) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Check if this dialect instance can handle a certain jdbc url.
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.NoopDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- cartesian(JavaRDDLike<U, ?>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
- cartesian(RDD<U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
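The "all pairs (a, b)" definition used by the cartesian entries can be illustrated on ordinary in-memory lists. This is a minimal sketch under stated assumptions: `CartesianSketch` and its `cartesian` helper are hypothetical names, no Spark classes are involved, and the ordering of the pairs here says nothing about the ordering Spark produces across partitions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Spark API.
final class CartesianSketch {
    /** Returns every pair (a, b) with a taken from left and b taken from right. */
    static <A, B> List<Map.Entry<A, B>> cartesian(List<A> left, List<B> right) {
        List<Map.Entry<A, B>> pairs = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                pairs.add(new SimpleEntry<>(a, b));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two elements on the left and three on the right yield 2 * 3 pairs.
        System.out.println(cartesian(Arrays.asList(1, 2), Arrays.asList("a", "b", "c")));
    }
}
```

The result size is the product of the two input sizes, which is why a Cartesian product grows quickly on large inputs.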
- caseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
Whether to do a case-sensitive comparison over the stop words.
Default: false
- cast(DataType) - Method in class org.apache.spark.sql.Column
-
Casts the column to a different data type.
- cast(String) - Method in class org.apache.spark.sql.Column
-
Casts the column to a different data type, using the canonical string representation
of the type.
- catalog() - Method in class org.apache.spark.sql.hive.HiveContext
-
- catalog() - Method in class org.apache.spark.sql.SQLContext
-
- CatalystScan - Interface in org.apache.spark.sql.sources
-
:: Experimental ::
An interface for experimenting with a more direct connection to the query planner.
- Categorical() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- categoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- CategoricalSplit - Class in org.apache.spark.ml.tree
-
:: DeveloperApi ::
Split which tests a categorical feature.
- categories() - Method in class org.apache.spark.mllib.tree.model.Split
-
- categoryMaps() - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- cbrt(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the cube-root of the given value.
- cbrt(String) - Static method in class org.apache.spark.sql.functions
-
Computes the cube-root of the given column.
- ceil(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the ceiling of the given value.
- ceil(String) - Static method in class org.apache.spark.sql.functions
-
Computes the ceiling of the given column.
- changePrecision(int, int) - Method in class org.apache.spark.sql.types.Decimal
-
Update precision and scale while keeping our value the same, and return true if successful.
- checkpoint() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Mark this RDD for checkpointing.
- checkpoint() - Method in class org.apache.spark.graphx.Graph
-
Mark this Graph for checkpointing.
- checkpoint() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- checkpoint() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- checkpoint() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- checkpoint() - Method in class org.apache.spark.rdd.HadoopRDD
-
- checkpoint() - Method in class org.apache.spark.rdd.RDD
-
Mark this RDD for checkpointing.
- checkpoint(Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets the context to periodically checkpoint the DStream operations for master
fault-tolerance.
- checkpoint(Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.StreamingContext
-
Set the context to periodically checkpoint the DStream operations for driver
fault-tolerance.
- checkpointData() - Method in class org.apache.spark.rdd.RDD
-
- checkpointData() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDir() - Method in class org.apache.spark.SparkContext
-
- checkpointDir() - Method in class org.apache.spark.streaming.StreamingContext
-
- checkpointDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDuration() - Method in class org.apache.spark.streaming.StreamingContext
-
- checkpointFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
- checkpointFile(String, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
- checkpointInterval() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
- checkpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- checkSplits(double[]) - Static method in class org.apache.spark.ml.feature.Bucketizer
-
We require splits to be of length >= 3 and to be in strictly increasing order.
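The stated requirement (at least three split points, in strictly increasing order) can be expressed as a small standalone check. The sketch below is an assumption-laden illustration, not Bucketizer's actual implementation: `SplitsCheckSketch.isValid` is a hypothetical name, and rejecting NaN values is a side effect of the comparison used here rather than something the entry states.

```java
// Hypothetical validation helper; mirrors only the rule stated in the entry above.
final class SplitsCheckSketch {
    /** True when splits has length >= 3 and every element is strictly greater than the previous one. */
    static boolean isValid(double[] splits) {
        if (splits.length < 3) {
            return false;
        }
        for (int i = 1; i < splits.length; i++) {
            // Written as a negated "<" so that equal values and NaN both fail the check.
            if (!(splits[i - 1] < splits[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new double[] {0.0, 0.5, 1.0}));
        System.out.println(isValid(new double[] {0.0, 1.0}));
    }
}
```

Three split points are the minimum that define two buckets, which is presumably why shorter arrays are rejected.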
- child() - Method in class org.apache.spark.sql.sources.Not
-
- ChiSqSelector - Class in org.apache.spark.mllib.feature
-
:: Experimental ::
Creates a ChiSquared feature selector.
- ChiSqSelector(int) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelector
-
- ChiSqSelectorModel - Class in org.apache.spark.mllib.feature
-
:: Experimental ::
Chi Squared selector model.
- ChiSqSelectorModel(int[]) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelectorModel
-
- chiSqTest(Vector, Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's chi-squared goodness of fit test of the observed data against the
expected distribution.
- chiSqTest(Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform
distribution, with each category having an expected frequency of 1 / observed.size.
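For the uniform case, the Pearson statistic is the sum over categories of (observed - expected)^2 / expected, where each expected count is the total multiplied by 1 / observed.size. The following standalone sketch computes only that statistic; `ChiSqSketch.statistic` is a hypothetical name, it does not compute a p-value, and it is not the MLlib implementation.

```java
// Illustrative only: computes the Pearson chi-squared statistic against a uniform expectation.
final class ChiSqSketch {
    /** Sum of (o - e)^2 / e, where e = total / observed.length for every category. */
    static double statistic(double[] observed) {
        double total = 0.0;
        for (double o : observed) {
            total += o;
        }
        // Uniform expectation: each category receives the same share of the total.
        double expected = total / observed.length;
        double stat = 0.0;
        for (double o : observed) {
            double diff = o - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        System.out.println(statistic(new double[] {5.0, 15.0}));
    }
}
```

A statistic of zero means the observed counts match the uniform expectation exactly; larger values indicate a larger departure from it.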
- chiSqTest(Matrix) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's independence test on the input contingency matrix, which cannot contain
negative entries or columns or rows that sum up to 0.
- chiSqTest(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's independence test for every feature against the label across the input RDD.
- chiSqTest(JavaRDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Java-friendly version of chiSqTest()
- ChiSqTestResult - Class in org.apache.spark.mllib.stat.test
-
:: Experimental ::
Object containing the test results for the chi-squared hypothesis test.
- Classification() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- ClassificationModel<FeaturesType,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
-
:: DeveloperApi ::
- ClassificationModel() - Constructor for class org.apache.spark.ml.classification.ClassificationModel
-
- ClassificationModel - Interface in org.apache.spark.mllib.classification
-
:: Experimental ::
Represents a classification model that predicts to which of a set of categories an example
belongs.
- Classifier<FeaturesType,E extends Classifier<FeaturesType,E,M>,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
-
:: DeveloperApi ::
- Classifier() - Constructor for class org.apache.spark.ml.classification.Classifier
-
- className() - Method in class org.apache.spark.ExceptionFailure
-
- classpathEntries() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- classTag() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaRDD
-
- classTag() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
- classTag() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- clean(long, boolean) - Method in class org.apache.spark.streaming.util.WriteAheadLog
-
Clean all the records that are older than the threshold time.
- CleanAccum - Class in org.apache.spark
-
- CleanAccum(long) - Constructor for class org.apache.spark.CleanAccum
-
- CleanBroadcast - Class in org.apache.spark
-
- CleanBroadcast(long) - Constructor for class org.apache.spark.CleanBroadcast
-
- CleanCheckpoint - Class in org.apache.spark
-
- CleanCheckpoint(int) - Constructor for class org.apache.spark.CleanCheckpoint
-
- CleanRDD - Class in org.apache.spark
-
- CleanRDD(int) - Constructor for class org.apache.spark.CleanRDD
-
- CleanShuffle - Class in org.apache.spark
-
- CleanShuffle(int) - Constructor for class org.apache.spark.CleanShuffle
-
- CleanupTask - Interface in org.apache.spark
-
Classes that represent cleaning tasks.
- CleanupTaskWeakReference - Class in org.apache.spark
-
A WeakReference associated with a CleanupTask.
- CleanupTaskWeakReference(CleanupTask, Object, ReferenceQueue<Object>) - Constructor for class org.apache.spark.CleanupTaskWeakReference
-
- clear(Param<?>) - Method in interface org.apache.spark.ml.param.Params
-
Clears the user-supplied value for the input param.
- clearCache() - Method in class org.apache.spark.sql.SQLContext
-
Removes all cached tables from the in-memory cache.
- clearCallSite() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.clearCallSite.
- clearCallSite() - Method in class org.apache.spark.SparkContext
-
Clear the thread-local property for overriding the call sites
of actions and RDDs.
- clearDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.RDD
-
Clears the dependencies of this RDD.
- clearDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- clearFiles() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of files added by addFile so that they do not get downloaded to
any new nodes.
- clearFiles() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of files added by addFile so that they do not get downloaded to
any new nodes.
- clearJars() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of JARs added by addJar so that they do not get downloaded to
any new nodes.
- clearJars() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of JARs added by addJar so that they do not get downloaded to
any new nodes.
- clearJobGroup() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the current thread's job group ID and its description.
- clearJobGroup() - Method in class org.apache.spark.SparkContext
-
Clear the current thread's job group ID and its description.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Clears the threshold so that predict will output raw prediction scores.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Clears the threshold so that predict will output raw prediction scores.
- clone() - Method in class org.apache.spark.SparkConf
-
Copy this object
- clone() - Method in class org.apache.spark.sql.types.Decimal
-
- clone() - Method in class org.apache.spark.storage.StorageLevel
-
- clone() - Method in class org.apache.spark.util.random.BernoulliCellSampler
-
- clone() - Method in class org.apache.spark.util.random.BernoulliSampler
-
- clone() - Method in class org.apache.spark.util.random.PoissonSampler
-
- clone() - Method in interface org.apache.spark.util.random.RandomSampler
-
Return a copy of the RandomSampler object.
- cloneComplement() - Method in class org.apache.spark.util.random.BernoulliCellSampler
-
Return a sampler that is the complement of the range specified by the current sampler.
- close() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- close() - Method in class org.apache.spark.input.PortableDataStream
-
Close the file (if it is currently open)
- close() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
-
- close() - Method in class org.apache.spark.serializer.DeserializationStream
-
- close() - Method in class org.apache.spark.serializer.SerializationStream
-
- close() - Method in class org.apache.spark.sql.sources.OutputWriter
-
- close() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- close() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
-
- close() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- close() - Method in class org.apache.spark.streaming.util.WriteAheadLog
-
Close this log and release any resources.
- closeLogWriter(int) - Method in class org.apache.spark.scheduler.JobLogger
-
Close log file, and clean the stage relationship in stageIdToJobId
- closureSerializer() - Method in class org.apache.spark.SparkEnv
-
- cls() - Method in class org.apache.spark.util.MethodIdentifier
-
- cluster() - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
-
- clusterCenters() - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
-
- clusterWeights() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
-
- cn() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- coalesce(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame that has exactly numPartitions partitions.
- coalesce(Column...) - Static method in class org.apache.spark.sql.functions
-
Returns the first column that is not null, or null if all inputs are null.
- coalesce(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Returns the first column that is not null, or null if all inputs are null.
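The "first non-null" rule that coalesce applies to its inputs can be illustrated on plain Java values. This is only a sketch of the selection rule for a single row of values; FirstNonNull is a hypothetical helper, not part of Spark, and it works on ordinary objects rather than Column expressions.

```java
final class FirstNonNull {
    /** Returns the first argument that is not null, or null if every argument is null. */
    @SafeVarargs
    static <T> T coalesce(T... values) {
        for (T value : values) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(coalesce(null, "b", "c"));
    }
}
```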
- code() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- codegenEnabled() - Method in class org.apache.spark.sql.SQLContext.SparkPlanner
-
- codeLen() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- cogroup(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
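The grouping that the two-input cogroup describes can be sketched with ordinary Java collections. The names below (CogroupSketch, Grouped, pair) are hypothetical, and the code runs in a single JVM without Spark, so it ignores partitioning entirely; it only shows that every key from either input appears once, paired with the values from each side.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CogroupSketch {
    /** Values collected for one key: left comes from "this", right from "other". */
    static final class Grouped<V, W> {
        final List<V> left = new ArrayList<>();
        final List<W> right = new ArrayList<>();
    }

    /** Convenience factory for a key-value pair. */
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    static <K, V, W> Map<K, Grouped<V, W>> cogroup(
            List<Map.Entry<K, V>> self, List<Map.Entry<K, W>> other) {
        Map<K, Grouped<V, W>> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : self) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<V, W>()).left.add(e.getValue());
        }
        for (Map.Entry<K, W> e : other) {
            // A key seen only in "other" still gets an entry, with an empty left list.
            out.computeIfAbsent(e.getKey(), k -> new Grouped<V, W>()).right.add(e.getValue());
        }
        return out;
    }
}
```

A key present on only one side keeps an empty list for the other side, which is what distinguishes this grouping from an inner join.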
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
- cogroup(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- CoGroupedRDD<K> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD that cogroups its parents.
- CoGroupedRDD(Seq<RDD<? extends Product2<K, ?>>>, Partitioner) - Constructor for class org.apache.spark.rdd.CoGroupedRDD
-
- col(String) - Method in class org.apache.spark.sql.DataFrame
-
Selects a column based on the column name and returns it as a Column.
- col(String) - Static method in class org.apache.spark.sql.functions
-
Returns a Column based on the given column name.
- collect() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in this RDD.
- collect() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- collect() - Method in class org.apache.spark.rdd.RDD
-
Return an array that contains all of the elements in this RDD.
- collect(PartialFunction<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD that contains all matching values by applying f.
- collect() - Method in class org.apache.spark.sql.DataFrame
-
Returns an array that contains all of the Rows in this DataFrame.
- collectAsList() - Method in class org.apache.spark.sql.DataFrame
-
Returns a Java list that contains all of the Rows in this DataFrame.
- collectAsMap() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsMap() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of collect, which returns a future for
retrieving an array containing all of the elements in this RDD.
- collectAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for retrieving all elements of this RDD.
- collectEdges(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
-
Returns an RDD that contains for each vertex v its local edges,
i.e., the edges that are incident on v, in the user-specified direction.
- collectNeighborIds(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
-
Collect the neighbor vertex ids for each vertex.
- collectNeighbors(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
-
Collect the neighbor vertex attributes for each vertex.
- collectPartitions(int[]) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in a specific partition of this RDD.
- colPtrs() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- colsPerBlock() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
- colStats(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Computes column-wise summary statistics for the input RDD[Vector].
- Column - Class in org.apache.spark.sql
-
- Column(Expression) - Constructor for class org.apache.spark.sql.Column
-
- Column(String) - Constructor for class org.apache.spark.sql.Column
-
- column(String) - Static method in class org.apache.spark.sql.functions
-
Returns a Column based on the given column name.
- ColumnName - Class in org.apache.spark.sql
-
:: Experimental ::
A convenient class used for constructing schema.
- ColumnName(String) - Constructor for class org.apache.spark.sql.ColumnName
-
- ColumnPruner - Class in org.apache.spark.ml.feature
-
Utility transformer for removing temporary columns from a DataFrame.
- ColumnPruner(Set<String>) - Constructor for class org.apache.spark.ml.feature.ColumnPruner
-
- columns() - Method in class org.apache.spark.sql.DataFrame
-
Returns all column names as an array.
- columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Compute all cosine similarities between columns of this matrix using the brute-force
approach of computing normalized dot products.
- columnSimilarities(double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Compute similarities between columns of this matrix using a sampling approach.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
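The three function arguments of combineByKey are called createCombiner, mergeValue and mergeCombiners in Spark's documentation. The sketch below shows how they might cooperate, using plain Java collections instead of RDDs. The class CombineByKeySketch, its helper methods, and the two in-memory "partitions" are all hypothetical; the example builds a per-key {sum, count} pair, which is enough to derive an average.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class CombineByKeySketch {
    /** Within one partition: create a combiner on first sight of a key, then merge values into it. */
    static <K, V, C> Map<K, C> combinePartition(
            List<Map.Entry<K, V>> partition,
            Function<V, C> createCombiner,
            BiFunction<C, V, C> mergeValue) {
        Map<K, C> out = new HashMap<>();
        for (Map.Entry<K, V> e : partition) {
            K key = e.getKey();
            out.put(key, out.containsKey(key)
                    ? mergeValue.apply(out.get(key), e.getValue())
                    : createCombiner.apply(e.getValue()));
        }
        return out;
    }

    /** Across partitions: merge the per-partition combiners that share a key. */
    static <K, C> Map<K, C> mergePartitions(
            List<Map<K, C>> partials, BinaryOperator<C> mergeCombiners) {
        Map<K, C> out = new HashMap<>();
        for (Map<K, C> partial : partials) {
            partial.forEach((key, combiner) -> out.merge(key, combiner, mergeCombiners));
        }
        return out;
    }

    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    /** Demo: per-key {sum, count} over two hypothetical partitions. */
    static Map<String, int[]> demo() {
        Function<Integer, int[]> create = v -> new int[] {v, 1};
        BiFunction<int[], Integer, int[]> mergeValue = (c, v) -> new int[] {c[0] + v, c[1] + 1};
        BinaryOperator<int[]> mergeCombiners = (x, y) -> new int[] {x[0] + y[0], x[1] + y[1]};
        List<Map.Entry<String, Integer>> p1 = Arrays.asList(pair("a", 1), pair("a", 3), pair("b", 10));
        List<Map.Entry<String, Integer>> p2 = Arrays.asList(pair("a", 5));
        return mergePartitions(
                Arrays.asList(
                        combinePartition(p1, create, mergeValue),
                        combinePartition(p2, create, mergeValue)),
                mergeCombiners);
    }
}
```

Keeping the combiner type C separate from the value type V is what lets the aggregate (here a two-element array) differ from the values being aggregated.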
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the output RDD and uses map-side
aggregation.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the resulting RDD using the existing
partitioner/parallelism level and using map-side aggregation.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Simplified version of combineByKey that hash-partitions the output RDD.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom function.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom function.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, ClassTag<C>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineCombinersByKey(Iterator<Product2<K, C>>) - Method in class org.apache.spark.Aggregator
-
- combineCombinersByKey(Iterator<Product2<K, C>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- compare(PartitionGroup, PartitionGroup) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- compare(Option<PartitionGroup>, Option<PartitionGroup>) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- compare(Decimal) - Method in class org.apache.spark.sql.types.Decimal
-
- compare(RDDInfo) - Method in class org.apache.spark.storage.RDDInfo
-
- compareTo(SparkShutdownHook) - Method in class org.apache.spark.util.SparkShutdownHook
-
- completed() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
-
- completedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- completedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- completedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- completionTime() - Method in class org.apache.spark.scheduler.StageInfo
-
Time when all tasks in the stage completed or when the stage was cancelled.
- completionTime() - Method in class org.apache.spark.status.api.v1.JobData
-
- ComplexFutureAction<T> - Class in org.apache.spark
-
A FutureAction for actions that could trigger multiple Spark jobs.
- ComplexFutureAction() - Constructor for class org.apache.spark.ComplexFutureAction
-
- compressed() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Returns a vector in either dense or sparse format, whichever uses less storage.
- compressedInputStream(InputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- CompressionCodec - Interface in org.apache.spark.io
-
:: DeveloperApi ::
CompressionCodec allows the customization of choosing different compression implementations
to be used in block storage.
- compute(Partition, TaskContext) - Method in class org.apache.spark.api.r.BaseRRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.EdgeRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.VertexRDD
-
Provides the RDD[(VertexId, VD)] equivalent output.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point.
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point,
add the gradient to a provided vector to avoid creating new objects, and return loss.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.L1Updater
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SimpleUpdater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SquaredL2Updater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.Updater
-
Compute an updated value for weights given the gradient, stepSize, iteration number and
regularization parameter.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.HadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.JdbcRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.PartitionPruningRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.ShuffledRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.UnionRDD
-
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Generate an RDD for the given duration
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Method that generates an RDD for the given Duration
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- compute(Time) - Method in class org.apache.spark.streaming.dstream.DStream
-
Method that generates an RDD for the given time
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- computeColumnSummaryStatistics() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes column-wise summary statistics.
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Return the K-means cost (sum of squared distances of points to their nearest center) for this
model on the given data.
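The cost described above, the sum of squared distances from each point to its nearest center, can be written out directly. The sketch below uses plain arrays rather than RDD<Vector>, and KMeansCostSketch is a hypothetical name; it assumes all points and centers have the same dimension.

```java
final class KMeansCostSketch {
    /** Sum over points of the squared Euclidean distance to the closest center. */
    static double cost(double[][] points, double[][] centers) {
        double total = 0.0;
        for (double[] point : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] center : centers) {
                double squared = 0.0;
                for (int i = 0; i < point.length; i++) {
                    double diff = point[i] - center[i];
                    squared += diff * diff;
                }
                best = Math.min(best, squared);
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 0}, {9, 1}};
        double[][] centers = {{0, 0}, {10, 0}};
        System.out.println(cost(points, centers));
    }
}
```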
- computeCovariance() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the covariance matrix, treating each row as an observation.
- computeError(org.apache.spark.mllib.tree.model.TreeEnsembleModel, RDD<LabeledPoint>) - Method in interface org.apache.spark.mllib.tree.loss.Loss
-
Method to calculate error of the base learner for the gradient boosting calculation.
- computeError(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
-
Method to calculate loss when the predictions are already known.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Computes the Gramian matrix A^T A.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the Gramian matrix A^T A.
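For a small in-memory matrix, the Gramian A^T A can be accumulated one row at a time, since it equals the sum of the outer products of the rows with themselves. The sketch below is plain Java with a hypothetical class name (GramianSketch); it assumes a rectangular, row-major double[][] and does not reflect how the distributed implementation is organised.

```java
final class GramianSketch {
    /** Returns A^T A for a row-major matrix a (m rows, n columns); the result is n x n. */
    static double[][] gramian(double[][] a) {
        int n = a.length == 0 ? 0 : a[0].length;
        double[][] g = new double[n][n];
        for (double[] row : a) {
            // Add the outer product row^T * row to the running sum.
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] g = gramian(new double[][] {{1, 2}, {3, 4}});
        System.out.println(java.util.Arrays.deepToString(g));
    }
}
```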
- computeInitialPredictionAndError(RDD<LabeledPoint>, double, DecisionTreeModel, Loss) - Static method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
Compute the initial predictions and errors for a dataset for the first
iteration of gradient boosting.
- computePreferredLocations(Seq<InputFormatInfo>) - Static method in class org.apache.spark.scheduler.InputFormatInfo
-
Computes the preferred locations based on input(s) and returns a location-to-block map.
- computePrincipalComponents(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the top k principal components.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Computes the singular value decomposition of this IndexedRowMatrix.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes singular value decomposition of this matrix.
- concat(Column...) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column.
- concat(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column.
- concat_ws(String, Column...) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column,
using the given separator.
- concat_ws(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column,
using the given separator.
- conf() - Method in class org.apache.spark.SparkEnv
-
- conf() - Method in class org.apache.spark.sql.hive.HiveContext.SQLSession
-
- conf() - Method in class org.apache.spark.sql.SQLContext
-
- conf() - Method in class org.apache.spark.sql.SQLContext.SQLSession
-
- conf() - Method in class org.apache.spark.streaming.StreamingContext
-
- confidence() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
-
Returns the confidence of the rule.
- confidence() - Method in class org.apache.spark.partial.BoundedDouble
-
- configuration() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.HadoopRDD
-
Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456).
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.NewHadoopRDD
-
Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456).
- configure() - Method in class org.apache.spark.sql.hive.HiveContext
-
Overridden by child classes that need to set configuration before the client init.
- confusionMatrix() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the confusion matrix: predicted classes are in columns,
ordered by class label ascending, as in "labels".
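The layout described above (predicted classes in columns, labels in ascending order) can be sketched for integer labels in plain Java. This sketch assumes that rows hold the actual classes and that the label set is the union of actual and predicted labels; both are assumptions of the example, and ConfusionMatrixSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class ConfusionMatrixSketch {
    /** Distinct labels in ascending order (assumption: union of actual and predicted). */
    static int[] labels(int[] actual, int[] predicted) {
        return IntStream.concat(Arrays.stream(actual), Arrays.stream(predicted))
                .distinct()
                .sorted()
                .toArray();
    }

    /** m[row][col] counts examples whose actual label is labels[row] and predicted label is labels[col]. */
    static long[][] confusion(int[] actual, int[] predicted) {
        int[] labels = labels(actual, predicted);
        long[][] m = new long[labels.length][labels.length];
        for (int i = 0; i < actual.length; i++) {
            int row = Arrays.binarySearch(labels, actual[i]);
            int col = Arrays.binarySearch(labels, predicted[i]);
            m[row][col]++;
        }
        return m;
    }
}
```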
- connectedComponents() - Method in class org.apache.spark.graphx.GraphOps
-
Compute the connected component membership of each vertex and return a graph with the vertex
value containing the lowest vertex id in the connected component containing that vertex.
- ConnectedComponents - Class in org.apache.spark.graphx.lib
-
Connected components algorithm.
- ConnectedComponents() - Constructor for class org.apache.spark.graphx.lib.ConnectedComponents
-
- consequent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
-
- ConstantInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
An input stream that always returns the same RDD on each timestep.
- ConstantInputDStream(StreamingContext, RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- contains(Param<?>) - Method in class org.apache.spark.ml.param.ParamMap
-
Checks whether a parameter is explicitly specified.
- contains(String) - Method in class org.apache.spark.SparkConf
-
Does the configuration contain a given parameter?
- contains(Object) - Method in class org.apache.spark.sql.Column
-
Contains the other element.
- contains(String) - Method in class org.apache.spark.sql.types.Metadata
-
Tests whether this Metadata contains a binding for a key.
- containsBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
-
Return whether the given block is stored in this block manager in O(1) time.
- containsCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
- containsNull() - Method in class org.apache.spark.sql.types.ArrayType
-
- context() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- context() - Method in class org.apache.spark.InterruptibleIterator
-
- context() - Method in class org.apache.spark.rdd.RDD
-
- context() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- context() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return the StreamingContext associated with this DStream.
- Continuous() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- ContinuousSplit - Class in org.apache.spark.ml.tree
-
:: DeveloperApi ::
Split which tests a continuous feature.
- conv(Column, int, int) - Static method in class org.apache.spark.sql.functions
-
Convert a number in a string column from one base to another.
- CONVERT_CTAS() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- CONVERT_METASTORE_PARQUET() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- convertCTAS() - Method in class org.apache.spark.sql.hive.HiveContext
-
When true, a table created by a Hive CTAS statement (no USING clause) will be
converted to a data source table, using the data source set by spark.sql.sources.default.
- convertMetastoreParquet() - Method in class org.apache.spark.sql.hive.HiveContext
-
When true, enables an experimental feature where metastore tables that use the parquet SerDe
are automatically converted to use the Spark SQL parquet table scan, instead of the Hive
SerDe.
- convertMetastoreParquetWithSchemaMerging() - Method in class org.apache.spark.sql.hive.HiveContext
-
When true, also tries to merge possibly different but compatible Parquet schemas in different
Parquet data files.
- convertToCanonicalEdges(Function2<ED, ED, ED>) - Method in class org.apache.spark.graphx.GraphOps
-
Convert bi-directional edges into uni-directional ones.
- CoordinateMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents a matrix in coordinate format.
- CoordinateMatrix(RDD<MatrixEntry>, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
- CoordinateMatrix(RDD<MatrixEntry>) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Alternative constructor leaving matrix dimensions to be determined automatically.
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayes
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRestModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeans
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Estimator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.Evaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Binarizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ColumnPruner
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.HashingTF
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDF
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDFModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IndexToString
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCA
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCAModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormula
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormulaModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Tokenizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Model
-
- copy() - Method in class org.apache.spark.ml.param.ParamMap
-
Creates a copy of this param map.
- copy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
-
Creates a copy of this instance with the same UID and some extra params.
- copy(ParamMap) - Method in class org.apache.spark.ml.Pipeline
-
- copy(ParamMap) - Method in class org.apache.spark.ml.PipelineModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.PipelineStage
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Predictor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALS
-
- copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Transformer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.UnaryTransformer
-
- copy() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
- copy() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- copy() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Get a deep copy of the matrix.
- copy() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- copy() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- copy() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Makes a deep copy of this vector.
- copy() - Method in class org.apache.spark.mllib.random.ExponentialGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.GammaGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.LogNormalGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- copy() - Method in interface org.apache.spark.mllib.random.RandomDataGenerator
-
Returns a copy of the RandomDataGenerator with a new instance of the rng object used in the
class when applicable for non-locking concurrent usage.
- copy() - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.UniformGenerator
-
- copy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
Returns a shallow copy of this instance.
- copy() - Method in interface org.apache.spark.sql.Row
-
Make a copy of the current
Row
object.
- copy() - Method in class org.apache.spark.sql.types.ArrayBasedMapData
-
- copy() - Method in class org.apache.spark.sql.types.ArrayData
-
- copy() - Method in class org.apache.spark.sql.types.GenericArrayData
-
- copy() - Method in class org.apache.spark.sql.types.MapData
-
- copy() - Method in class org.apache.spark.util.StatCounter
-
Clone this StatCounter.
- copyValues(T, ParamMap) - Method in interface org.apache.spark.ml.param.Params
-
Copies param values from this instance to another instance for params shared by them.
- corr(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the Pearson correlation matrix for the input RDD of Vectors.
- corr(RDD<Vector>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the correlation matrix for the input RDD of Vectors using the specified method.
- corr(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the Pearson correlation for the input RDDs.
- corr(JavaRDD<Double>, JavaRDD<Double>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Java-friendly version of corr()
- corr(RDD<Object>, RDD<Object>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the correlation for the input RDDs using the specified method.
- corr(JavaRDD<Double>, JavaRDD<Double>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Java-friendly version of corr()
- corr(String, String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Calculates the correlation of two columns of a DataFrame.
- corr(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
- cos(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine of the given value.
- cos(String) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine of the given column.
- cosh(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic cosine of the given value.
- cosh(String) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic cosine of the given column.
- count() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
The number of edges in the RDD.
- count() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
The number of vertices in the RDD.
- count() - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
- count() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
- count() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Sample size.
- count() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample size.
- count() - Method in class org.apache.spark.rdd.RDD
-
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.sql.DataFrame
-
- count(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of items in a group.
- count(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of items in a group.
- count() - Method in class org.apache.spark.sql.GroupedData
-
Count the number of rows for each group.
- count() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
Number of messages this OffsetRange refers to.
- count() - Method in class org.apache.spark.util.StatCounter
-
- countApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long, double) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApproxDistinct(double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(int, int) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(double) - Method in class org.apache.spark.rdd.RDD
-
Return approximate number of distinct elements in the RDD.
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(int, int, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of count
, which returns a
future for counting the number of elements in this RDD.
- countAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for counting the number of elements in the RDD.
- countByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Count the number of elements for each key, and return the result to the master as a Map.
- countByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Count the number of elements for each key, collecting the results to a local Map.
- countByKeyApprox(long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByValue() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
- countByValue() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValueAndWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long, double, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Approximate version of countByValue().
- countByWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a window over this DStream.
- countByWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a sliding window over this DStream.
- countDistinct(Column, Column...) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- countDistinct(String, String...) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- countDistinct(Column, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- countDistinct(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- CountVectorizer - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Extracts a vocabulary from document collections and generates a
CountVectorizerModel
.
- CountVectorizer(String) - Constructor for class org.apache.spark.ml.feature.CountVectorizer
-
- CountVectorizer() - Constructor for class org.apache.spark.ml.feature.CountVectorizer
-
- CountVectorizerModel - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Converts a text document to a sparse vector of token counts.
- CountVectorizerModel(String, String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
-
- CountVectorizerModel(String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
-
- cov(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Calculate the sample covariance of two numerical columns of a DataFrame.
- crc32(Column) - Static method in class org.apache.spark.sql.functions
-
Calculates the cyclic redundancy check value (CRC32) of a binary column and
returns the value as a bigint.
- CreatableRelationProvider - Interface in org.apache.spark.sql.sources
-
- create(boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Deprecated.
- create(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Create a new StorageLevel object.
- create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int, Function<ResultSet, T>) - Static method in class org.apache.spark.rdd.JdbcRDD
-
Create an RDD that executes an SQL query on a JDBC connection and reads results.
- create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int) - Static method in class org.apache.spark.rdd.JdbcRDD
-
Create an RDD that executes an SQL query on a JDBC connection and reads results.
- create(RDD<T>, Function1<Object, Object>) - Static method in class org.apache.spark.rdd.PartitionPruningRDD
-
Create a PartitionPruningRDD.
- create(Object...) - Static method in class org.apache.spark.sql.RowFactory
-
Create a
Row
from the given arguments.
- create() - Method in interface org.apache.spark.streaming.api.java.JavaStreamingContextFactory
-
- create(String, int) - Static method in class org.apache.spark.streaming.kafka.Broker
-
- create(String, int, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- create(TopicAndPartition, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- createArrayType(DataType) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates an ArrayType by specifying the data type of elements (elementType
).
- createArrayType(DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates an ArrayType by specifying the data type of elements (elementType
) and
whether the array contains null values (containsNull
).
- createCombiner() - Method in class org.apache.spark.Aggregator
-
- createDataFrame(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(Seq<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- createDecimalType(int, int) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a DecimalType by specifying the precision and scale.
- createDecimalType() - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a DecimalType with default precision and scale, which are 10 and 0.
- createDirectStream(StreamingContext, Map<String, String>, Map<TopicAndPartition, Object>, Function1<MessageAndMetadata<K, V>, R>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>, ClassTag<R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createDirectStream(StreamingContext, Map<String, String>, Set<String>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createDirectStream(JavaStreamingContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Class<R>, Map<String, String>, Map<TopicAndPartition, Long>, Function<MessageAndMetadata<K, V>, R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createDirectStream(JavaStreamingContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Map<String, String>, Set<String>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createExternalTable(String, String) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, String) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createJDBCTable(String, String, boolean) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().jdbc()
.
- createLogDir() - Method in class org.apache.spark.scheduler.JobLogger
-
Create a folder for log files; the folder's name is the creation time of the JobLogger.
- createLogWriter(int) - Method in class org.apache.spark.scheduler.JobLogger
-
Create a log file for one job.
- createMapType(DataType, DataType) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a MapType by specifying the data type of keys (keyType
) and values
(valueType
).
- createMapType(DataType, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a MapType by specifying the data type of keys (keyType
), the data type of
values (valueType
), and whether values contain any null value
(valueContainsNull
).
- createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.SVMWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Create a model given the weights and intercept.
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.LassoWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
- createPollingStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createRDD(SparkContext, Map<String, String>, OffsetRange[], ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDD(SparkContext, Map<String, String>, OffsetRange[], Map<TopicAndPartition, Broker>, Function1<MessageAndMetadata<K, V>, R>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>, ClassTag<R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDD(JavaSparkContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Map<String, String>, OffsetRange[]) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDD(JavaSparkContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Class<R>, Map<String, String>, OffsetRange[], Map<TopicAndPartition, Broker>, Function<MessageAndMetadata<K, V>, R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDDFromArray(JavaSparkContext, byte[][]) - Static method in class org.apache.spark.api.r.RRDD
-
Create an RRDD given a sequence of byte arrays.
- createRDDWithLocalProperties(Time, Function0<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Wrap a body of code such that the call site and operation scope
information are passed to the RDDs created in this body properly.
- createRelation(SQLContext, SaveMode, Map<String, String>, DataFrame) - Method in interface org.apache.spark.sql.sources.CreatableRelationProvider
-
Creates a relation with the given parameters based on the contents of the given
DataFrame.
- createRelation(SQLContext, String[], Option<StructType>, Option<StructType>, Map<String, String>) - Method in interface org.apache.spark.sql.sources.HadoopFsRelationProvider
-
Returns a new base relation with the given parameters, a user defined schema, and a list of
partition columns.
- createRelation(SQLContext, Map<String, String>) - Method in interface org.apache.spark.sql.sources.RelationProvider
-
Returns a new base relation with the given parameters.
- createRelation(SQLContext, Map<String, String>, StructType) - Method in interface org.apache.spark.sql.sources.SchemaRelationProvider
-
Returns a new base relation with the given parameters and user defined schema.
- createRWorker(int) - Static method in class org.apache.spark.api.r.RRDD
-
ProcessBuilder used to launch worker R processes.
- createSession() - Method in class org.apache.spark.sql.hive.HiveContext
-
- createSession() - Method in class org.apache.spark.sql.SQLContext
-
- createSparkContext(String, String, String, String[], Map<Object, Object>, Map<Object, Object>) - Static method in class org.apache.spark.api.r.RRDD
-
- createStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(StreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(StreamingContext, String, String, Map<String, Object>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(StreamingContext, Map<String, String>, Map<String, Object>, StorageLevel, ClassTag<K>, ClassTag<V>, ClassTag<U>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(JavaStreamingContext, Class<K>, Class<V>, Class<U>, Class<T>, Map<String, String>, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, int, Duration, StorageLevel, String, String) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
-
- createStream(StreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(StreamingContext, Option<Authorization>, Seq<String>, StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, Authorization) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(StreamingContext, String, Subscribe, Function1<Seq<ByteString>, Iterator<T>>, StorageLevel, SupervisorStrategy, ClassTag<T>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel, SupervisorStrategy) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a zeromq publisher.
- createStructField(String, DataType, boolean, Metadata) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructField by specifying the name (name), data type (dataType) and
whether values of this field can be null values (nullable).
- createStructField(String, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructField with empty metadata.
- createStructType(List<StructField>) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructType with the given list of StructFields (fields).
- createStructType(StructField[]) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructType with the given StructField array (fields).
- createTransformFunc() - Method in class org.apache.spark.ml.feature.DCT
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.NGram
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.Normalizer
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.Tokenizer
-
- createTransformFunc() - Method in class org.apache.spark.ml.UnaryTransformer
-
Creates the transform function using the given param map.
- creationSite() - Method in class org.apache.spark.rdd.RDD
-
User code that created this RDD (e.g.
- creationSite() - Method in class org.apache.spark.streaming.dstream.DStream
-
- crosstab(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Computes a pair-wise frequency table of the given columns.
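As a rough illustration of what a pair-wise frequency table contains, the sketch below counts value pairs in plain Python. It is a model of the idea only, not a call into Spark; the helper name `crosstab`, the row layout, and the sample data are assumptions made for this example.

```python
from collections import Counter


def crosstab(rows, col1, col2):
    """Count how often each (col1 value, col2 value) pair occurs in rows."""
    counts = Counter((row[col1], row[col2]) for row in rows)
    row_keys = sorted({a for a, _ in counts})
    col_keys = sorted({b for _, b in counts})
    # One output row per distinct col1 value, one column per distinct col2 value.
    return {a: {b: counts.get((a, b), 0) for b in col_keys} for a in row_keys}


# Hypothetical sample data, used only for illustration.
rows = [
    {"key": 1, "value": "a"},
    {"key": 1, "value": "b"},
    {"key": 2, "value": "a"},
    {"key": 1, "value": "a"},
]
table = crosstab(rows, "key", "value")
```

Pairs that never occur together appear with a count of zero, so every row of the table has the same set of columns.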
- CrossValidator - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
K-fold cross validation.
- CrossValidator(String) - Constructor for class org.apache.spark.ml.tuning.CrossValidator
-
- CrossValidator() - Constructor for class org.apache.spark.ml.tuning.CrossValidator
-
- CrossValidatorModel - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
Model from k-fold cross validation.
- cube(Column...) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
- cube(String, String...) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
- cube(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
- cube(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
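A cube aggregates over every combination of the specified columns, including the empty combination that yields the grand total. The plain-Python sketch below only enumerates those grouping sets; it does not use Spark, and the helper name `cube_grouping_sets` and the column names are assumptions for illustration.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of columns, from the full set down to the empty set."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


# Hypothetical column names; the empty tuple stands for the grand total.
grouping_sets = cube_grouping_sets(["department", "gender"])
```

For n columns this produces 2^n grouping sets, which is why cubes over many columns can become expensive.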
- cumeDist() - Static method in class org.apache.spark.sql.functions
-
Window function: returns the cumulative distribution of values within a window partition,
i.e.
- current_date() - Static method in class org.apache.spark.sql.functions
-
Returns the current date as a date column.
- current_timestamp() - Static method in class org.apache.spark.sql.functions
-
Returns the current timestamp as a timestamp column.
- currentAttemptId() - Method in interface org.apache.spark.SparkStageInfo
-
- currentAttemptId() - Method in class org.apache.spark.SparkStageInfoImpl
-
- currentSession() - Method in class org.apache.spark.sql.SQLContext
-
- currPrefLocs(Partition) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- databaseTypeDefinition() - Method in class org.apache.spark.sql.jdbc.JdbcType
-
- dataDistribution() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- DataFrame - Class in org.apache.spark.sql
-
:: Experimental ::
A distributed collection of data organized into named columns.
- DataFrame(SQLContext, LogicalPlan) - Constructor for class org.apache.spark.sql.DataFrame
-
A constructor that automatically analyzes the logical plan.
- DataFrameNaFunctions - Class in org.apache.spark.sql
-
:: Experimental ::
Functionality for working with missing data in DataFrames.
- DataFrameReader - Class in org.apache.spark.sql
-
:: Experimental ::
Interface used to load a DataFrame from external storage systems (e.g.
- DataFrameStatFunctions - Class in org.apache.spark.sql
-
:: Experimental ::
Statistic functions for DataFrames.
- DataFrameWriter - Class in org.apache.spark.sql
-
:: Experimental ::
Interface used to write a DataFrame to external storage systems (e.g.
- dataSchema() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
Specifies schema of actual data files.
- DataSourceRegister - Interface in org.apache.spark.sql.sources
-
:: DeveloperApi ::
Data sources should implement this trait so that they can register an alias to their data source.
- dataStream() - Method in class org.apache.spark.api.r.BaseRRDD
-
- dataType() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
- DataType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The base type of all Spark SQL data types.
- DataType() - Constructor for class org.apache.spark.sql.types.DataType
-
- dataType() - Method in class org.apache.spark.sql.types.StructField
-
- dataType() - Method in class org.apache.spark.sql.UserDefinedFunction
-
- DataTypes - Class in org.apache.spark.sql.types
-
To get or create a specific data type, users should use singleton objects and factory methods
provided by this class.
- DataTypes() - Constructor for class org.apache.spark.sql.types.DataTypes
-
- DataValidators - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
A collection of methods used to validate data before applying ML algorithms.
- DataValidators() - Constructor for class org.apache.spark.mllib.util.DataValidators
-
- date() - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type date.
- date_add(Column, int) - Static method in class org.apache.spark.sql.functions
-
Returns the date that is the given number of days (days) after the start date (start).
- date_format(Column, String) - Static method in class org.apache.spark.sql.functions
-
Converts a date/timestamp/string to a value of string in the format specified by the date
format given by the second argument.
- date_sub(Column, int) - Static method in class org.apache.spark.sql.functions
-
Returns the date that is the given number of days (days) before the start date (start).
- datediff(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the number of days from start to end.
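The date arithmetic described by date_add, date_sub and datediff can be modeled with the Python standard library. This is a sketch of the semantics on plain date values, not the Spark column functions; `days_between` is a hypothetical helper whose explicit (start, end) parameter order is an assumption of this example.

```python
from datetime import date, timedelta


def date_add(start, days):
    """Date that is `days` days after `start`."""
    return start + timedelta(days=days)


def date_sub(start, days):
    """Date that is `days` days before `start`."""
    return start - timedelta(days=days)


def days_between(start, end):
    """Number of days from `start` to `end` (negative if `end` is earlier)."""
    return (end - start).days


start = date(2015, 7, 27)
later = date_add(start, 4)
earlier = date_sub(start, 1)
gap = days_between(start, date(2015, 8, 1))
```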
- DateType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the DateType object.
- DateType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
A date type, supporting "0001-01-01" through "9999-12-31".
- dayofmonth(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the day of the month as an integer from a given date/timestamp/string.
- dayofyear(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the day of the year as an integer from a given date/timestamp/string.
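For comparison, the same two extractions on a plain Python date look like the sketch below. It models the meaning of "day of the month" and "day of the year" only and does not call Spark; the function names mirror the index entries for readability.

```python
from datetime import date


def dayofmonth(d):
    """Day of the month, starting at 1."""
    return d.day


def dayofyear(d):
    """Day of the year, where January 1 is day 1."""
    return d.timetuple().tm_yday


sample = date(2015, 7, 27)
month_day = dayofmonth(sample)
year_day = dayofyear(sample)
```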
- DCT - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A feature transformer that takes the 1D discrete cosine transform of a real vector.
- DCT(String) - Constructor for class org.apache.spark.ml.feature.DCT
-
- DCT() - Constructor for class org.apache.spark.ml.feature.DCT
-
- ddlParser() - Method in class org.apache.spark.sql.SQLContext
-
- decayFactor() - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
- decimal() - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type decimal.
- decimal(int, int) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type decimal.
- Decimal - Class in org.apache.spark.sql.types
-
A mutable implementation of BigDecimal that can hold a Long if values are small enough.
- Decimal() - Constructor for class org.apache.spark.sql.types.Decimal
-
- DecimalType - Class in org.apache.spark.sql.types
-
- DecimalType(int, int) - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecimalType(int) - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecimalType() - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecimalType(Option<PrecisionInfo>) - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecisionTree - Class in org.apache.spark.mllib.tree
-
:: Experimental ::
A class which implements a decision tree learning algorithm for classification and regression.
- DecisionTree(Strategy) - Constructor for class org.apache.spark.mllib.tree.DecisionTree
-
- DecisionTreeClassificationModel - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Decision tree model for classification.
- DecisionTreeClassifier - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Decision tree learning algorithm for classification.
- DecisionTreeClassifier(String) - Constructor for class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- DecisionTreeClassifier() - Constructor for class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- DecisionTreeModel - Class in org.apache.spark.mllib.tree.model
-
:: Experimental ::
Decision tree model for classification or regression.
- DecisionTreeModel(Node, Enumeration.Value) - Constructor for class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- DecisionTreeRegressionModel - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Decision tree model for regression.
- DecisionTreeRegressor - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Decision tree learning algorithm for regression.
- DecisionTreeRegressor(String) - Constructor for class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- DecisionTreeRegressor() - Constructor for class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- decode(Column, String) - Static method in class org.apache.spark.sql.functions
-
Decodes the first argument from binary into a string using the provided character set
(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
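Turning binary data into a string under a named character set can be sketched in plain Python as follows. This models the behavior on a single bytes value rather than a Spark column; the helper name `decode` and the explicit check against the six listed character sets are assumptions of this example.

```python
# Character sets named in the entry above.
SUPPORTED_CHARSETS = {
    "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16",
}


def decode(data, charset):
    """Decode bytes into a str using one of the supported character sets."""
    if charset not in SUPPORTED_CHARSETS:
        raise ValueError("unsupported character set: " + charset)
    return data.decode(charset)


utf8_text = decode("héllo".encode("UTF-8"), "UTF-8")
utf16_text = decode(b"\x00h\x00i", "UTF-16BE")
```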
- decodeLabel(Vector) - Static method in class org.apache.spark.ml.classification.LabelConverter
-
Converts a vector to a label.
- defaultAttr() - Static method in class org.apache.spark.ml.attribute.BinaryAttribute
-
The default binary attribute.
- defaultAttr() - Static method in class org.apache.spark.ml.attribute.NominalAttribute
-
The default nominal attribute.
- defaultAttr() - Static method in class org.apache.spark.ml.attribute.NumericAttribute
-
The default numeric attribute.
- defaultClassLoader() - Method in class org.apache.spark.serializer.Serializer
-
Default ClassLoader to use in deserialization.
- defaultCopy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
-
Default implementation of copy with extra params.
- defaultMinPartitions() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Default min number of partitions for Hadoop RDDs when not given by user
- defaultMinPartitions() - Method in class org.apache.spark.SparkContext
-
Default min number of partitions for Hadoop RDDs when not given by user.
Note that math.min is used, so defaultMinPartitions cannot be higher than 2.
- defaultMinSplits() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- defaultMinSplits() - Method in class org.apache.spark.SparkContext
-
Default min number of partitions for Hadoop RDDs when not given by user
- defaultParallelism() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Default level of parallelism to use when not given by user (e.g.
- defaultParallelism() - Method in class org.apache.spark.SparkContext
-
Default level of parallelism to use when not given by user (e.g.
- defaultParamMap() - Method in interface org.apache.spark.ml.param.Params
-
Internal param map for default values.
- defaultParams(String) - Static method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
Returns default configuration for the boosting algorithm
- defaultParams(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
Returns default configuration for the boosting algorithm
- defaultPartitioner(RDD<?>, Seq<RDD<?>>) - Static method in class org.apache.spark.Partitioner
-
Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
- defaultSession() - Method in class org.apache.spark.sql.SQLContext
-
- defaultSize() - Method in class org.apache.spark.sql.types.ArrayType
-
The default size of a value of the ArrayType is 100 * the default size of the element type.
- defaultSize() - Method in class org.apache.spark.sql.types.BinaryType
-
The default size of a value of the BinaryType is 4096 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.BooleanType
-
The default size of a value of the BooleanType is 1 byte.
- defaultSize() - Method in class org.apache.spark.sql.types.ByteType
-
The default size of a value of the ByteType is 1 byte.
- defaultSize() - Method in class org.apache.spark.sql.types.CalendarIntervalType
-
- defaultSize() - Method in class org.apache.spark.sql.types.DataType
-
The default size of a value of this data type, used internally for size estimation.
- defaultSize() - Method in class org.apache.spark.sql.types.DateType
-
The default size of a value of the DateType is 4 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.DecimalType
-
The default size of a value of the DecimalType is 4096 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.DoubleType
-
The default size of a value of the DoubleType is 8 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.FloatType
-
The default size of a value of the FloatType is 4 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.IntegerType
-
The default size of a value of the IntegerType is 4 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.LongType
-
The default size of a value of the LongType is 8 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.MapType
-
The default size of a value of the MapType is
100 * (the default size of the key type + the default size of the value type).
- defaultSize() - Method in class org.apache.spark.sql.types.NullType
-
- defaultSize() - Method in class org.apache.spark.sql.types.ShortType
-
The default size of a value of the ShortType is 2 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.StringType
-
The default size of a value of the StringType is 4096 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.StructType
-
The default size of a value of the StructType is the total default sizes of all field types.
- defaultSize() - Method in class org.apache.spark.sql.types.TimestampType
-
The default size of a value of the TimestampType is 8 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.UserDefinedType
-
The default size of a value of the UserDefinedType is 4096 bytes.
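The defaultSize rules listed above compose recursively for nested types. The sketch below restates them as a small plain-Python estimator; it is a model, not the Spark implementation, and the tuple encoding of types and the name `default_size` are assumptions made for this example. Types whose entries give no size (CalendarIntervalType, NullType) are left out.

```python
# Sizes in bytes, as stated in the entries above.
PRIMITIVE_SIZES = {
    "binary": 4096, "boolean": 1, "byte": 1, "date": 4, "decimal": 4096,
    "double": 8, "float": 4, "integer": 4, "long": 8, "short": 2,
    "string": 4096, "timestamp": 8, "udt": 4096,
}


def default_size(data_type):
    """Estimate the default size of a type given as a name or a nested tuple."""
    if isinstance(data_type, str):
        return PRIMITIVE_SIZES[data_type]
    kind = data_type[0]
    if kind == "array":   # ("array", element_type)
        return 100 * default_size(data_type[1])
    if kind == "map":     # ("map", key_type, value_type)
        return 100 * (default_size(data_type[1]) + default_size(data_type[2]))
    if kind == "struct":  # ("struct", [field_type, ...])
        return sum(default_size(field) for field in data_type[1])
    raise ValueError("unknown type: %r" % (data_type,))


array_size = default_size(("array", "integer"))
map_size = default_size(("map", "string", "long"))
struct_size = default_size(("struct", ["integer", "double", ("array", "short")]))
```

The factor of 100 means a single collection-typed column can dominate a size estimate even when its element type is small.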
- defaultStategy(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- defaultStrategy(String) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- defaultStrategy(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- defaultStrategy() - Static method in class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
-
- degree() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
The polynomial degree to expand, which should be >= 1.
- degrees() - Method in class org.apache.spark.graphx.GraphOps
-
The degree of each vertex in the graph.
- degreesOfFreedom() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- degreesOfFreedom() - Method in class org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult
-
- degreesOfFreedom() - Method in interface org.apache.spark.mllib.stat.test.TestResult
-
Returns the degree(s) of freedom of the hypothesis test.
- delegate() - Method in class org.apache.spark.InterruptibleIterator
-
- dense(int, int, double[]) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Creates a column-major dense matrix.
- dense(double, double...) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a dense vector from its values.
- dense(double, Seq<Object>) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a dense vector from its values.
- dense(double[]) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a dense vector from a double array.
- DenseMatrix - Class in org.apache.spark.mllib.linalg
-
Column-major dense matrix.
- DenseMatrix(int, int, double[], boolean) - Constructor for class org.apache.spark.mllib.linalg.DenseMatrix
-
- DenseMatrix(int, int, double[]) - Constructor for class org.apache.spark.mllib.linalg.DenseMatrix
-
Column-major dense matrix.
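In a column-major layout the values array stores the first column, then the second, and so on, so entry (i, j) of a matrix with numRows rows sits at index i + numRows * j. The plain-Python sketch below shows that indexing; `column_major_get` is a hypothetical helper, not part of the Spark API.

```python
def column_major_get(num_rows, num_cols, values, i, j):
    """Return entry (i, j) of a matrix stored column by column in values."""
    if len(values) != num_rows * num_cols:
        raise ValueError("values must hold num_rows * num_cols entries")
    if not (0 <= i < num_rows and 0 <= j < num_cols):
        raise IndexError("index out of range")
    return values[i + num_rows * j]


# A 3 x 2 matrix whose columns are (1, 3, 5) and (2, 4, 6):
#   1.0  2.0
#   3.0  4.0
#   5.0  6.0
values = [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
middle_right = column_major_get(3, 2, values, 1, 1)
bottom_left = column_major_get(3, 2, values, 2, 0)
```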
- denseRank() - Static method in class org.apache.spark.sql.functions
-
Window function: returns the rank of rows within a window partition, without any gaps.
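"Without any gaps" means that tied rows share a rank and the next distinct value receives the next consecutive rank. The plain-Python sketch below contrasts this with a gapped rank over one already-selected partition; it does not use Spark window functions, and both helper names are assumptions for illustration.

```python
def dense_rank(values):
    """Rank each value by its position among the distinct sorted values."""
    order = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [order[v] for v in values]


def rank_with_gaps(values):
    """Rank each value as one plus the number of strictly smaller values."""
    return [1 + sum(1 for other in values if other < v) for v in values]


scores = [10, 20, 20, 30]
dense = dense_rank(scores)
gapped = rank_with_gaps(scores)
```

With the tie at 20, the dense ranking continues at 3 while the gapped ranking jumps to 4.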
- DenseVector - Class in org.apache.spark.mllib.linalg
-
A dense vector represented by a value array.
- DenseVector(double[]) - Constructor for class org.apache.spark.mllib.linalg.DenseVector
-
- dependencies() - Method in class org.apache.spark.rdd.RDD
-
Get the list of dependencies of this RDD, taking into account whether the
RDD is checkpointed or not.
- dependencies() - Method in class org.apache.spark.streaming.dstream.DStream
-
List of parent DStreams on which this DStream depends.
- dependencies() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- Dependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Base class for dependencies.
- Dependency() - Constructor for class org.apache.spark.Dependency
-
- depth() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Get depth of tree.
- desc() - Method in class org.apache.spark.sql.Column
-
Returns an ordering used in sorting.
- desc(String) - Static method in class org.apache.spark.sql.functions
-
Returns a sort expression based on the descending order of the column.
- desc() - Method in class org.apache.spark.util.MethodIdentifier
-
- describe(String...) - Method in class org.apache.spark.sql.DataFrame
-
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
- describe(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
- describeTopics(int) - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- describeTopics(int) - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Return the topics described by weighted terms.
- describeTopics() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Return the topics described by weighted terms.
- describeTopics(int) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- description() - Method in class org.apache.spark.ExceptionFailure
-
- description() - Method in class org.apache.spark.status.api.v1.JobData
-
- description() - Method in class org.apache.spark.storage.StorageLevel
-
- DeserializationStream - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
A stream for reading serialized objects.
- DeserializationStream() - Constructor for class org.apache.spark.serializer.DeserializationStream
-
- deserialize(Object) - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- deserialize(ByteBuffer, ClassLoader, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- deserialize(ByteBuffer, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- deserialize(ByteBuffer, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
-
- deserialize(ByteBuffer, ClassLoader, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
-
- deserialize(Object) - Method in class org.apache.spark.sql.types.UserDefinedType
-
Convert a SQL datum to the user type
- deserialized() - Method in class org.apache.spark.storage.MemoryEntry
-
- deserialized() - Method in class org.apache.spark.storage.StorageLevel
-
- deserializeStream(InputStream) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- deserializeStream(InputStream) - Method in class org.apache.spark.serializer.SerializerInstance
-
- destroy() - Method in class org.apache.spark.broadcast.Broadcast
-
Destroy all data and metadata related to this broadcast variable.
- detachSession() - Method in class org.apache.spark.sql.SQLContext
-
- details() - Method in class org.apache.spark.scheduler.StageInfo
-
- details() - Method in class org.apache.spark.status.api.v1.StageData
-
- determineBounds(ArrayBuffer<Tuple2<K, Object>>, int, Ordering<K>, ClassTag<K>) - Static method in class org.apache.spark.RangePartitioner
-
Determines the bounds for range partitioning from candidates with weights indicating how many
items each represents.
- deterministic() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Returns true iff this function is deterministic, i.e.
- DeveloperApi - Annotation Type in org.apache.spark.annotation
-
A lower-level, unstable API intended for developers.
- diag(Vector) - Static method in class org.apache.spark.mllib.linalg.DenseMatrix
-
Generate a diagonal matrix in DenseMatrix format from the supplied values.
- diag(Vector) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a diagonal matrix in Matrix format from the supplied values.
- dialectClassName() - Method in class org.apache.spark.sql.hive.HiveContext
-
- dialectClassName() - Method in class org.apache.spark.sql.SQLContext
-
- diff(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- diff(VertexRDD<VD>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- diff(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.VertexRDD
-
For each vertex present in both this and other, diff returns only those vertices with
differing values; for values that are different, keeps the values from other.
- diff(VertexRDD<VD>) - Method in class org.apache.spark.graphx.VertexRDD
-
For each vertex present in both this and other, diff returns only those vertices with
differing values; for values that are different, keeps the values from other.
- disableOutputSpecValidation() - Static method in class org.apache.spark.rdd.PairRDDFunctions
-
- DISK_ONLY - Static variable in class org.apache.spark.api.java.StorageLevels
-
- DISK_ONLY() - Static method in class org.apache.spark.storage.StorageLevel
-
- DISK_ONLY_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- DISK_ONLY_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.StageData
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- diskSize() - Method in class org.apache.spark.storage.BlockStatus
-
- diskSize() - Method in class org.apache.spark.storage.BlockUpdatedInfo
-
- diskSize() - Method in class org.apache.spark.storage.RDDInfo
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- diskUsed() - Method in class org.apache.spark.storage.StorageStatus
-
Return the disk space used by this block manager.
- diskUsedByRdd(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the disk space used by the given RDD in this block manager in O(1) time.
- dist(Vector) - Method in class org.apache.spark.util.Vector
-
- distinct() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.sql.DataFrame
-
- distinct(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the distinct values of the given Columns as input arguments.
- distinct(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the distinct values of the given Columns as input arguments.
- DistributedLDAModel - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- DistributedMatrix - Interface in org.apache.spark.mllib.linalg.distributed
-
Represents a distributively stored matrix backed by one or more RDDs.
- div(Duration) - Method in class org.apache.spark.streaming.Duration
-
- divide(Object) - Method in class org.apache.spark.sql.Column
-
Divides this expression by another expression.
- divide(double) - Method in class org.apache.spark.util.Vector
-
- doc() - Method in class org.apache.spark.ml.param.Param
-
- docConcentration() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- docConcentration() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
- docConcentration() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- docConcentration() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- doDestroy(boolean) - Method in class org.apache.spark.broadcast.Broadcast
-
Actually destroy all data and metadata related to this broadcast variable.
- dot(Vector) - Method in class org.apache.spark.util.Vector
-
- doubleAccumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- doubleAccumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- DoubleArrayParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Array[Double]] for Java.
- DoubleArrayParam(Params, String, String, Function1<double[], Object>) - Constructor for class org.apache.spark.ml.param.DoubleArrayParam
-
- DoubleArrayParam(Params, String, String) - Constructor for class org.apache.spark.ml.param.DoubleArrayParam
-
- DoubleDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
-
- DoubleFlatMapFunction<T> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more records of type Double from each input record.
- DoubleFunction<T> - Interface in org.apache.spark.api.java.function
-
A function that returns Doubles, and can be used to construct DoubleRDDs.
- DoubleParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Double] for Java.
- DoubleParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleParam(String, String, String) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleRDDFunctions - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of Doubles through an implicit conversion.
- DoubleRDDFunctions(RDD<Object>) - Constructor for class org.apache.spark.rdd.DoubleRDDFunctions
-
- doubleRDDToDoubleRDDFunctions(RDD<Object>) - Static method in class org.apache.spark.rdd.RDD
-
- doubleRDDToDoubleRDDFunctions(RDD<Object>) - Static method in class org.apache.spark.SparkContext
-
- doubleToDoubleWritable(double) - Static method in class org.apache.spark.SparkContext
-
- doubleToMultiplier(double) - Static method in class org.apache.spark.util.Vector
-
- DoubleType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the DoubleType object.
- DoubleType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing Double values.
- doubleWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- doUnpersist(boolean) - Method in class org.apache.spark.broadcast.Broadcast
-
Actually unpersist the broadcasted value on the executors.
- DRIVER_EXTRA_CLASSPATH - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver class path.
- DRIVER_EXTRA_JAVA_OPTIONS - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver VM options.
- DRIVER_EXTRA_LIBRARY_PATH - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver native library path.
- DRIVER_IDENTIFIER() - Static method in class org.apache.spark.SparkContext
-
Executor id for the driver.
- DRIVER_MEMORY - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver memory.
- driverActorSystemName() - Static method in class org.apache.spark.SparkEnv
-
- driverLogs() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- drop(String) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame with a column dropped.
- drop(Column) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame with a column dropped.
- drop() - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing any null or NaN values.
- drop(String) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing null or NaN values.
- drop(String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing any null or NaN values
in the specified columns.
- drop(Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values
in the specified columns.
- drop(String, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing null or NaN values
in the specified columns.
- drop(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values
in the specified columns.
- drop(int) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing less than minNonNulls
non-null and non-NaN values.
- drop(int, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing less than minNonNulls
non-null and non-NaN values in the specified columns.
- drop(int, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that drops rows containing less than minNonNulls
non-null and non-NaN values in the specified columns.
- dropDuplicates() - Method in class org.apache.spark.sql.DataFrame
-
- dropDuplicates(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Returns a new DataFrame with duplicate rows removed, considering only
the subset of columns.
- dropDuplicates(String[]) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame with duplicate rows removed, considering only
the subset of columns.
- dropLast() - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
Whether to drop the last category in the encoded vector (default: true)
- dropTempTable(String) - Method in class org.apache.spark.sql.SQLContext
-
- Dst - Static variable in class org.apache.spark.graphx.TripletFields
-
Expose the destination and edge fields but not the source field.
- dstAttr() - Method in class org.apache.spark.graphx.EdgeContext
-
The vertex attribute of the edge's destination vertex.
- dstAttr() - Method in class org.apache.spark.graphx.EdgeTriplet
-
The destination vertex attribute
- dstAttr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- dstId() - Method in class org.apache.spark.graphx.Edge
-
- dstId() - Method in class org.apache.spark.graphx.EdgeContext
-
The vertex id of the edge's destination vertex.
- dstId() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- dstream() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
- dstream() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- dstream() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- DStream<T> - Class in org.apache.spark.streaming.dstream
-
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous
sequence of RDDs (of the same type) representing a continuous stream of data (see
org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).
- DStream(StreamingContext, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.DStream
-
- dtypes() - Method in class org.apache.spark.sql.DataFrame
-
Returns all column names and their data types as an array.
- DummySerializerInstance - Class in org.apache.spark.serializer
-
Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
- duration() - Method in class org.apache.spark.scheduler.TaskInfo
-
- Duration - Class in org.apache.spark.streaming
-
- Duration(long) - Constructor for class org.apache.spark.streaming.Duration
-
- Durations - Class in org.apache.spark.streaming
-
- Durations() - Constructor for class org.apache.spark.streaming.Durations
-
- f() - Method in class org.apache.spark.sql.UserDefinedFunction
-
- f1Measure() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns document-based f1-measure averaged by the number of documents
- f1Measure(double) - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns f1-measure for a given label (category)
- factorial(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the factorial of the given value.
- failed() - Method in class org.apache.spark.scheduler.TaskInfo
-
- failedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- failedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- failedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- failedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- failureReason() - Method in class org.apache.spark.scheduler.StageInfo
-
If the stage failed, the reason why.
- FAIR() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- falsePositiveRate(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns false positive rate for a given label (category)
- feature() - Method in class org.apache.spark.mllib.tree.model.Split
-
- featureImportances() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
Estimate of the importance of each feature.
- featureImportances() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
Estimate of the importance of each feature.
- featureIndex() - Method in class org.apache.spark.ml.tree.CategoricalSplit
-
- featureIndex() - Method in class org.apache.spark.ml.tree.ContinuousSplit
-
- featureIndex() - Method in interface org.apache.spark.ml.tree.Split
-
Index of feature which this split tests
- features() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- featuresCol() - Method in class org.apache.spark.ml.regression.LinearRegressionTrainingSummary
-
- featuresDataType() - Method in class org.apache.spark.ml.PredictionModel
-
Returns the SQL DataType corresponding to the FeaturesType type parameter.
- FeatureType - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to describe whether a feature is "continuous" or "categorical"
- FeatureType() - Constructor for class org.apache.spark.mllib.tree.configuration.FeatureType
-
- featureType() - Method in class org.apache.spark.mllib.tree.model.Split
-
- FetchFailed - Class in org.apache.spark
-
:: DeveloperApi ::
Task failed to fetch shuffle data from a remote node.
- FetchFailed(BlockManagerId, int, int, int, String) - Constructor for class org.apache.spark.FetchFailed
-
- fetchPct() - Method in class org.apache.spark.scheduler.RuntimePercentage
-
- fetchWaitTime() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- fetchWaitTime() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- field() - Method in class org.apache.spark.storage.BroadcastBlockId
-
- fieldIndex(String) - Method in interface org.apache.spark.sql.Row
-
Returns the index of a given field name.
- fieldIndex(String) - Method in class org.apache.spark.sql.types.StructType
-
Returns index of a given field
- fieldNames() - Method in class org.apache.spark.sql.types.StructType
-
Returns all field names in an array.
- fields() - Method in class org.apache.spark.sql.types.StructType
-
- FIFO() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- files() - Method in class org.apache.spark.SparkContext
-
- fileStream(String, Class<K>, Class<V>, Class<F>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path, Boolean>, boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path, Boolean>, boolean, Configuration) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Function1<Path, Object>, boolean, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Function1<Path, Object>, boolean, Configuration, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fill(double) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
- fill(String) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null values in string columns with value.
- fill(double, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
- fill(double, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified
numeric columns.
- fill(String, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null values in specified string columns.
- fill(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that replaces null values in
specified string columns.
- fill(Map<String, Object>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null values.
- fill(Map<String, Object>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that replaces null values.
- filter(Function<Double, Boolean>) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<T, Boolean>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function1<Graph<VD, ED>, Graph<VD2, ED2>>, Function1<EdgeTriplet<VD2, ED2>, Object>, Function2<Object, VD2, Object>, ClassTag<VD2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.GraphOps
-
Filter the graph by computing some values to filter on, and applying the predicates.
- filter(Function1<EdgeTriplet<VD, ED>, Object>, Function2<Object, VD, Object>) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- filter(Function1<Tuple2<Object, VD>, Object>) - Method in class org.apache.spark.graphx.VertexRDD
-
Restricts the vertex set to the set of vertices satisfying the given predicate.
- filter(Params) - Method in class org.apache.spark.ml.param.ParamMap
-
Filters this param map for the given parent.
- filter(Function1<T, Object>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Column) - Method in class org.apache.spark.sql.DataFrame
-
Filters rows using the given condition.
- filter(String) - Method in class org.apache.spark.sql.DataFrame
-
Filters rows using the given SQL expression.
- Filter - Class in org.apache.spark.sql.sources
-
A filter predicate for data sources.
- Filter() - Constructor for class org.apache.spark.sql.sources.Filter
-
- filter(Function<T, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function1<T, Object>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filterByRange(K, K) - Method in class org.apache.spark.rdd.OrderedRDDFunctions
-
Returns an RDD containing only the elements in the inclusive range lower to upper.
- filterWith(Function1<Object, A>, Function2<T, A, Object>) - Method in class org.apache.spark.rdd.RDD
-
Filters this RDD with p, where p takes an additional parameter of type A.
- findSplitsBins(RDD<LabeledPoint>, org.apache.spark.mllib.tree.impl.DecisionTreeMetadata) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Returns splits and bins for decision tree calculation.
- findSynonyms(String, int) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Find "num" number of words closest in similarity to the given word.
- findSynonyms(Vector, int) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Find "num" number of words closest in similarity to the given vector representation
of the word.
- findSynonyms(String, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
Find synonyms of a word
- findSynonyms(Vector, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
Find synonyms of the vector representation of a word
- finished() - Method in class org.apache.spark.scheduler.TaskInfo
-
- finishTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task has completed successfully (including the time to remotely fetch
results, if necessary).
- first() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- first() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- first() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the first element in this RDD.
- first() - Method in class org.apache.spark.rdd.RDD
-
Return the first element in this RDD.
- first() - Method in class org.apache.spark.sql.DataFrame
-
Returns the first row.
- first(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the first value in a group.
- first(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the first value of a column in a group.
- firstParent(ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Returns the first parent RDD
- fit(DataFrame) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- fit(DataFrame) - Method in class org.apache.spark.ml.clustering.KMeans
-
- fit(DataFrame, ParamPair<?>, ParamPair<?>...) - Method in class org.apache.spark.ml.Estimator
-
Fits a single model to the input data with optional parameters.
- fit(DataFrame, ParamPair<?>, Seq<ParamPair<?>>) - Method in class org.apache.spark.ml.Estimator
-
Fits a single model to the input data with optional parameters.
- fit(DataFrame, ParamMap) - Method in class org.apache.spark.ml.Estimator
-
Fits a single model to the input data with provided parameter map.
- fit(DataFrame) - Method in class org.apache.spark.ml.Estimator
-
Fits a model to the input data.
- fit(DataFrame, ParamMap[]) - Method in class org.apache.spark.ml.Estimator
-
Fits multiple models to the input data with multiple sets of parameters.
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.IDF
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.PCA
-
Computes a PCAModel that contains the principal components of the input vectors.
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.RFormula
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- fit(DataFrame) - Method in class org.apache.spark.ml.Pipeline
-
Fits the pipeline to the input dataset with additional parameters.
- fit(DataFrame) - Method in class org.apache.spark.ml.Predictor
-
- fit(DataFrame) - Method in class org.apache.spark.ml.recommendation.ALS
-
- fit(DataFrame) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- fit(DataFrame) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- fit(DataFrame) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- fit(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.feature.ChiSqSelector
-
Returns a ChiSquared feature selector.
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
-
Computes the inverse document frequency.
- fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
-
Computes the inverse document frequency.
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.PCA
-
Computes a PCAModel that contains the principal components of the input vectors.
- fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.PCA
-
Java-friendly version of fit()
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.StandardScaler
-
Computes the mean and variance and stores as a model to be used for later scaling.
- fit(RDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Computes the vector representation of each word in vocabulary.
- fit(JavaRDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Computes the vector representation of each word in vocabulary (Java version).
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(Function1<T, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(Function1<Row, TraversableOnce<R>>, ClassTag<R>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new RDD by first applying a function to all rows of this DataFrame,
and then flattening the results.
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMap(Function1<T, Traversable<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- FlatMapFunction<T,R> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more output records from each input record.
- FlatMapFunction2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A function that takes two inputs and returns zero or more output records.
- flatMapToDouble(DoubleFlatMapFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function1<V, TraversableOnce<U>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a flatmap function to the value of each key-value pair in
'this' DStream without changing the key.
- flatMapValues(Function1<V, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a flatmap function to the value of each key-value pair in
'this' DStream without changing the key.
- flatMapWith(Function1<Object, A>, boolean, Function2<T, A, Seq<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
FlatMaps f over this RDD, where f takes an additional parameter of type A.
- FloatDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
-
- FloatParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Float] for Java.
- FloatParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- FloatParam(String, String, String) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- FloatParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- FloatParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- floatToFloatWritable(float) - Static method in class org.apache.spark.SparkContext
-
- FloatType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the FloatType object.
- FloatType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing Float values.
- floatWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- floor(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the floor of the given value.
- floor(String) - Static method in class org.apache.spark.sql.functions
-
Computes the floor of the given column.
- floor(Duration) - Method in class org.apache.spark.streaming.Time
-
- floor(Duration, Time) - Method in class org.apache.spark.streaming.Time
-
- FlumeUtils - Class in org.apache.spark.streaming.flume
-
- FlumeUtils() - Constructor for class org.apache.spark.streaming.flume.FlumeUtils
-
- flush() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
-
- flush() - Method in class org.apache.spark.serializer.SerializationStream
-
- flush() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
-
- fMeasure(double, double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f-measure for a given label (category)
- fMeasure(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f1-measure for a given label (category)
- fMeasure() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f-measure (equal to precision and recall, because precision equals recall).
- fMeasureByThreshold() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Returns a DataFrame with two fields (threshold, F-Measure), the curve computed with beta = 1.0.
- fMeasureByThreshold(double) - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve.
- fMeasureByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve with beta = 1.0.
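For readers unfamiliar with the beta parameter in the fMeasure entries above, the standard F-beta score combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below is plain Java with hypothetical names, not Spark code; returning 0 when both inputs are 0 is an assumption of this sketch.

```java
class FMeasureSketch {
    // Standard F-beta score; the zero guard is an assumption of this sketch.
    static double fMeasure(double precision, double recall, double beta) {
        double betaSq = beta * beta;
        double denominator = betaSq * precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return (1.0 + betaSq) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        // With beta = 1.0 and precision equal to recall, the score equals both.
        System.out.println(fMeasure(0.5, 0.5, 1.0)); // prints 0.5
    }
}
```

With beta = 1.0 this reduces to the harmonic mean of precision and recall, which is why the score coincides with both when they are equal.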
- fold(T, Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative and commutative function and a neutral "zero value".
- fold(T, Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative and commutative function and a neutral "zero value".
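The two-level aggregation described for fold can be sketched in plain Java as below. This is not Spark code: partitions are modelled as nested lists, the names are hypothetical, and the sketch assumes the zero value seeds every partition and also the final merge, which is why it has to be neutral for the function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class FoldSketch {
    // Hypothetical helper: fold each partition from zero, then merge the partials from zero.
    static <T> T fold(List<List<T>> partitions, T zero, BinaryOperator<T> op) {
        T result = zero;
        for (List<T> partition : partitions) {
            T partial = zero;
            for (T x : partition) {
                partial = op.apply(partial, x);
            }
            result = op.apply(result, partial);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));
        System.out.println(fold(partitions, 0, Integer::sum)); // prints 10
        // A non-neutral zero is counted once per partition plus once for the merge.
        System.out.println(fold(partitions, 1, Integer::sum)); // prints 13
    }
}
```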
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value"
which may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
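The per-key merge described for foldByKey can be illustrated without Spark. In this plain-Java sketch the names are hypothetical, the input is an in-memory list of pairs, and the accumulator for each key starts from the zero value the first time the key is seen.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class FoldByKeySketch {
    // Hypothetical helper: merges the values of each key, seeding each key with zero.
    static <K, V> Map<K, V> foldByKey(List<Map.Entry<K, V>> pairs, V zero, BinaryOperator<V> op) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            V acc = out.containsKey(pair.getKey()) ? out.get(pair.getKey()) : zero;
            out.put(pair.getKey(), op.apply(acc, pair.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>("b", 2));
        pairs.add(new SimpleEntry<>("a", 3));
        System.out.println(foldByKey(pairs, 0, Integer::sum)); // prints {a=4, b=2}
    }
}
```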
- foreach(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to all elements of this RDD.
- foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to all elements of this RDD.
- foreach(Function1<Row, BoxedUnit>) - Method in class org.apache.spark.sql.DataFrame
-
Applies a function f to all rows.
- foreach(DataType, Function2<Object, Object, BoxedUnit>) - Method in class org.apache.spark.sql.types.ArrayData
-
- foreach(DataType, DataType, Function2<Object, Object, BoxedUnit>) - Method in class org.apache.spark.sql.types.MapData
-
- foreach(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Deprecated.
As of 0.9.0, replaced by foreachRDD.
- foreach(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Deprecated.
As of 0.9.0, replaced by foreachRDD.
- foreachActive(Function3<Object, Object, Object, BoxedUnit>) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Applies a function f to all the active elements of dense and sparse matrix.
- foreachActive(Function2<Object, Object, BoxedUnit>) - Method in interface org.apache.spark.mllib.linalg.Vector
-
Applies a function f to all the active elements of dense and sparse vector.
- foreachAsync(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of the foreach action, which
applies a function f to all the elements of this RDD.
- foreachAsync(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to all elements of this RDD.
- foreachPartition(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to each partition of this RDD.
- foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to each partition of this RDD.
- foreachPartition(Function1<Iterator<Row>, BoxedUnit>) - Method in class org.apache.spark.sql.DataFrame
-
Applies a function f to each partition of this DataFrame.
- foreachPartitionAsync(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of the foreachPartition action, which
applies a function f to each partition of this RDD.
- foreachPartitionAsync(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to each partition of this RDD.
- foreachRDD(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachWith(Function1<Object, A>, Function2<T, A, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies f to each element of this RDD, where f takes an additional parameter of type A.
- format(String) - Method in class org.apache.spark.sql.DataFrameReader
-
Specifies the input data source format.
- format(String) - Method in class org.apache.spark.sql.DataFrameWriter
-
Specifies the underlying output data source.
- format_number(Column, int) - Static method in class org.apache.spark.sql.functions
-
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places,
and returns the result as a string column.
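The grouping-and-rounding format described for format_number can be approximated with java.text.DecimalFormat. This is a plain-Java sketch with hypothetical names, not the Spark implementation. Padding with trailing zeros up to d places, the US locale, and DecimalFormat's default rounding mode are all assumptions of the sketch.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class FormatNumberSketch {
    // Hypothetical helper: thousands separators plus exactly d decimal places.
    static String formatNumber(double x, int d) {
        StringBuilder pattern = new StringBuilder("#,##0");
        if (d > 0) {
            pattern.append('.');
            for (int i = 0; i < d; i++) {
                pattern.append('0');
            }
        }
        DecimalFormat format =
                new DecimalFormat(pattern.toString(), DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(x);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.891, 2)); // prints 1,234,567.89
    }
}
```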
- format_string(String, Column...) - Static method in class org.apache.spark.sql.functions
-
Formats the arguments in printf-style and returns the result as a string column.
- format_string(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Formats the arguments in printf-style and returns the result as a string column.
- formatVersion() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- formatVersion() - Method in class org.apache.spark.mllib.classification.SVMModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- formatVersion() - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
- formatVersion() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.LassoModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.LinearRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- formatVersion() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- formatVersion() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- formatVersion() - Method in interface org.apache.spark.mllib.util.Saveable
-
Current version of model save/load format.
- formula() - Method in class org.apache.spark.ml.feature.RFormula
-
R formula parameter.
- FPGrowth - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- FPGrowth() - Constructor for class org.apache.spark.mllib.fpm.FPGrowth
-
Constructs a default instance with default parameters {minSupport: 0.3, numPartitions: same
as the input data}.
- FPGrowth.FreqItemset<Item> - Class in org.apache.spark.mllib.fpm
-
Frequent itemset.
- FPGrowth.FreqItemset(Object, long) - Constructor for class org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
-
- FPGrowthModel<Item> - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- FPGrowthModel(RDD<FPGrowth.FreqItemset<Item>>, ClassTag<Item>) - Constructor for class org.apache.spark.mllib.fpm.FPGrowthModel
-
- fractional() - Method in class org.apache.spark.sql.types.DecimalType
-
- fractional() - Method in class org.apache.spark.sql.types.DoubleType
-
- fractional() - Method in class org.apache.spark.sql.types.FloatType
-
- freq() - Method in class org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
-
- freq() - Method in class org.apache.spark.mllib.fpm.PrefixSpan.FreqSequence
-
- freqItems(String[], double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Finding frequent items for columns, possibly with false positives.
- freqItems(String[]) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Finding frequent items for columns, possibly with false positives.
- freqItems(Seq<String>, double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
(Scala-specific) Finding frequent items for columns, possibly with false positives.
- freqItems(Seq<String>) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
(Scala-specific) Finding frequent items for columns, possibly with false positives.
- freqItemsets() - Method in class org.apache.spark.mllib.fpm.FPGrowthModel
-
- freqSequences() - Method in class org.apache.spark.mllib.fpm.PrefixSpanModel
-
- from_unixtime(Column) - Static method in class org.apache.spark.sql.functions
-
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
- from_unixtime(Column, String) - Static method in class org.apache.spark.sql.functions
-
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
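The conversion described for from_unixtime can be pictured with java.time. The sketch below is plain Java with hypothetical names, not Spark code. It takes the time zone as an explicit parameter so the result is reproducible, whereas the entries above refer to the current system time zone, and the pattern "yyyy-MM-dd HH:mm:ss" used in the example is an assumed choice.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class FromUnixtimeSketch {
    // Hypothetical helper: seconds since the Unix epoch -> formatted string in the given zone.
    static String fromUnixtime(long seconds, String pattern, ZoneId zone) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(zone)
                .format(Instant.ofEpochSecond(seconds));
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        System.out.println(fromUnixtime(0L, "yyyy-MM-dd HH:mm:ss", utc)); // prints 1970-01-01 00:00:00
    }
}
```

Passing ZoneId.systemDefault() instead of a fixed zone would follow the system time zone, at the cost of output that varies between machines.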
- from_utc_timestamp(Column, String) - Static method in class org.apache.spark.sql.functions
-
Assumes given timestamp is UTC and converts to given timezone.
- fromAttributes(Seq<Attribute>) - Static method in class org.apache.spark.sql.types.StructType
-
- fromAvroFlumeEvent(AvroFlumeEvent) - Static method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- fromCaseClassString(String) - Static method in class org.apache.spark.sql.types.DataType
-
Deprecated.
As of 1.2.0, replaced by DataType.fromJson()
- fromCOO(int, int, Iterable<Tuple3<Object, Object, Object>>) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate a SparseMatrix from Coordinate List (COO) format.
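A coordinate list is a set of (row, column, value) triples. The plain-Java sketch below shows one way such triples can be packed into compressed sparse column (CSC) arrays. It is not Spark's implementation: the names are hypothetical, the CSC target layout is an assumption of the sketch, and it assumes no coordinate appears twice and does not sort rows within a column.

```java
import java.util.Arrays;

class CooToCscSketch {
    // Simple holder for the three CSC arrays.
    static final class Csc {
        final int[] colPtrs;
        final int[] rowIndices;
        final double[] values;

        Csc(int[] colPtrs, int[] rowIndices, double[] values) {
            this.colPtrs = colPtrs;
            this.rowIndices = rowIndices;
            this.values = values;
        }
    }

    // Hypothetical helper: entry k is (rows[k], cols[k], vals[k]).
    static Csc fromCOO(int numRows, int numCols, int[] rows, int[] cols, double[] vals) {
        int nnz = vals.length;
        int[] colPtrs = new int[numCols + 1];
        for (int k = 0; k < nnz; k++) {
            if (rows[k] < 0 || rows[k] >= numRows || cols[k] < 0 || cols[k] >= numCols) {
                throw new IllegalArgumentException("entry " + k + " is out of bounds");
            }
            colPtrs[cols[k] + 1]++; // count the entries in each column
        }
        for (int j = 0; j < numCols; j++) {
            colPtrs[j + 1] += colPtrs[j]; // prefix sums give each column's start offset
        }
        int[] next = Arrays.copyOf(colPtrs, numCols); // next free slot per column
        int[] rowIndices = new int[nnz];
        double[] values = new double[nnz];
        for (int k = 0; k < nnz; k++) {
            int slot = next[cols[k]]++;
            rowIndices[slot] = rows[k];
            values[slot] = vals[k];
        }
        return new Csc(colPtrs, rowIndices, values);
    }

    public static void main(String[] args) {
        // 3 x 3 matrix with entries (0,0)=1.0, (2,1)=3.0, (1,0)=2.0
        Csc m = fromCOO(3, 3, new int[] {0, 2, 1}, new int[] {0, 1, 0}, new double[] {1.0, 3.0, 2.0});
        System.out.println(Arrays.toString(m.colPtrs)); // prints [0, 2, 3, 3]
    }
}
```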
- fromDStream(DStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaDStream
-
- fromEdgePartitions(RDD<Tuple2<Object, EdgePartition<ED, VD>>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from EdgePartitions, setting referenced vertices to `defaultVertexAttr`.
- fromEdges(RDD<Edge<ED>>, ClassTag<ED>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.EdgeRDD
-
Creates an EdgeRDD from a set of edges.
- fromEdges(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
-
Construct a graph from a collection of edges.
- fromEdges(EdgeRDD<?>, int, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a VertexRDD containing all vertices referred to in edges.
- fromEdgeTuples(RDD<Tuple2<Object, Object>>, VD, Option<PartitionStrategy>, StorageLevel, StorageLevel, ClassTag<VD>) - Static method in class org.apache.spark.graphx.Graph
-
Construct a graph from a collection of edges encoded as vertex id pairs.
- fromExistingRDDs(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the
vertices.
- fromInputDStream(InputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- fromInputDStream(InputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairInputDStream
-
- fromJavaDStream(JavaDStream<Tuple2<K, V>>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromJavaRDD(JavaRDD<Tuple2<K, V>>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
Convert a JavaRDD of key-value pairs to JavaPairRDD.
- fromJson(String) - Static method in class org.apache.spark.sql.types.DataType
-
- fromJson(String) - Static method in class org.apache.spark.sql.types.Metadata
-
Creates a Metadata instance from JSON.
- fromName(String) - Static method in class org.apache.spark.ml.attribute.AttributeType
-
- fromOffset() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
- fromOld(DecisionTreeModel, DecisionTreeClassifier, Map<Object, Object>) - Static method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
(private[ml]) Convert a model from the old API
- fromOld(GradientBoostedTreesModel, GBTClassifier, Map<Object, Object>) - Static method in class org.apache.spark.ml.classification.GBTClassificationModel
-
(private[ml]) Convert a model from the old API
- fromOld(NaiveBayesModel, NaiveBayes) - Static method in class org.apache.spark.ml.classification.NaiveBayesModel
-
Convert a model from the old API
- fromOld(RandomForestModel, RandomForestClassifier, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
(private[ml]) Convert a model from the old API
- fromOld(DecisionTreeModel, DecisionTreeRegressor, Map<Object, Object>) - Static method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
(private[ml]) Convert a model from the old API
- fromOld(GradientBoostedTreesModel, GBTRegressor, Map<Object, Object>) - Static method in class org.apache.spark.ml.regression.GBTRegressionModel
-
(private[ml]) Convert a model from the old API
- fromOld(RandomForestModel, RandomForestRegressor, Map<Object, Object>) - Static method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
(private[ml]) Convert a model from the old API
- fromOld(Node, Map<Object, Object>) - Static method in class org.apache.spark.ml.tree.Node
-
Create a new Node from the old Node format, recursively creating child nodes as needed.
- fromPairDStream(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromPairRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.mllib.rdd.MLPairRDDFunctions
-
Implicit conversion from a pair RDD to MLPairRDDFunctions.
- fromRDD(RDD<Object>) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
-
- fromRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
- fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.api.java.JavaRDD
-
- fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.mllib.rdd.RDDFunctions
-
Implicit conversion from an RDD to RDDFunctions.
- fromRdd(RDD<?>) - Static method in class org.apache.spark.storage.RDDInfo
-
- fromReceiverInputDStream(ReceiverInputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- fromReceiverInputDStream(ReceiverInputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- fromSparkContext(SparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
-
- fromStage(Stage, int, Option<Object>, Seq<Seq<TaskLocation>>) - Static method in class org.apache.spark.scheduler.StageInfo
-
Construct a StageInfo from a Stage.
- fromString(String) - Static method in enum org.apache.spark.JobExecutionStatus
-
- fromString(String) - Static method in class org.apache.spark.mllib.tree.loss.Losses
-
- fromString(String) - Static method in enum org.apache.spark.status.api.v1.ApplicationStatus
-
- fromString(String) - Static method in enum org.apache.spark.status.api.v1.StageStatus
-
- fromString(String) - Static method in enum org.apache.spark.status.api.v1.TaskSorting
-
- fromString(String) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Return the StorageLevel object with the specified name.
- fromStructField(StructField) - Static method in class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group from a StructField instance.
- fullOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a full outer join of this and other.
- fullOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a full outer join of this and other.
- fullOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a full outer join of this and other.
- fullOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a full outer join of this and other.
- fullOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a full outer join of this and other.
- fullOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a full outer join of this and other.
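A full outer join keeps every key that occurs on either side and marks the missing side as absent. The following plain-Java sketch illustrates that shape with java.util.Optional as a stand-in for the optional values. It does not use Spark, the names are hypothetical, and it assumes at most one value per key on each side.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class FullOuterJoinSketch {
    // Hypothetical helper: joins two single-valued maps on their keys.
    static <K, V, W> Map<K, Map.Entry<Optional<V>, Optional<W>>> fullOuterJoin(
            Map<K, V> left, Map<K, W> right) {
        Set<K> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<K, Map.Entry<Optional<V>, Optional<W>>> out = new LinkedHashMap<>();
        for (K key : keys) {
            // A key missing on one side yields an empty Optional for that side.
            Map.Entry<Optional<V>, Optional<W>> row = new SimpleEntry<>(
                    Optional.ofNullable(left.get(key)), Optional.ofNullable(right.get(key)));
            out.put(key, row);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> left = new LinkedHashMap<>();
        left.put("a", 1);
        left.put("b", 2);
        Map<String, String> right = new LinkedHashMap<>();
        right.put("b", "x");
        right.put("c", "y");
        // "a" has no right value, "c" has no left value, "b" has both.
        System.out.println(fullOuterJoin(left, right));
    }
}
```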
- fullOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullStackTrace() - Method in class org.apache.spark.ExceptionFailure
-
- Function<T1,R> - Interface in org.apache.spark.api.java.function
-
Base interface for functions whose return types do not create special RDDs.
- Function0<R> - Interface in org.apache.spark.api.java.function
-
A zero-argument function that returns an R.
- Function2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A two-argument function that takes arguments of type T1 and T2 and returns an R.
- Function3<T1,T2,T3,R> - Interface in org.apache.spark.api.java.function
-
A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.
- functionRegistry() - Method in class org.apache.spark.sql.hive.HiveContext
-
- functionRegistry() - Method in class org.apache.spark.sql.SQLContext
-
- functions - Class in org.apache.spark.sql
-
- functions() - Constructor for class org.apache.spark.sql.functions
-
- FutureAction<T> - Interface in org.apache.spark
-
A future for the result of an action to support cancellation.
- futureExecutionContext() - Static method in class org.apache.spark.rdd.AsyncRDDActions
-
- gain() - Method in class org.apache.spark.ml.tree.InternalNode
-
- gain() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- gamma1() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- gamma2() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- gamma6() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- gamma7() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- GammaGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d.
- GammaGenerator(double, double) - Constructor for class org.apache.spark.mllib.random.GammaGenerator
-
- gammaJavaRDD(JavaSparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaRDD(JavaSparkContext, double, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaVectorRDD(JavaSparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaVectorRDD(JavaSparkContext, double, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaVectorRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaRDD(SparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD comprised of i.i.d. samples from the gamma distribution with the input
shape and scale.
- gammaShape() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- gammaShape() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Shape parameter for random initialization of variational parameter gamma.
- gammaShape() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- gammaVectorRDD(SparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the
gamma distribution with the input shape and scale.
- gaps() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
Indicates whether regex splits on gaps (true) or matches tokens (false).
- GaussianMixture - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- GaussianMixture() - Constructor for class org.apache.spark.mllib.clustering.GaussianMixture
-
Constructs a default instance.
- GaussianMixtureModel - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- GaussianMixtureModel(double[], MultivariateGaussian[]) - Constructor for class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- gaussians() - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- GBTClassificationModel - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Gradient-Boosted Trees (GBTs) model for classification.
- GBTClassificationModel(String, DecisionTreeRegressionModel[], double[]) - Constructor for class org.apache.spark.ml.classification.GBTClassificationModel
-
- GBTClassifier - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Gradient-Boosted Trees (GBTs) learning algorithm for classification.
- GBTClassifier(String) - Constructor for class org.apache.spark.ml.classification.GBTClassifier
-
- GBTClassifier() - Constructor for class org.apache.spark.ml.classification.GBTClassifier
-
- GBTRegressionModel - Class in org.apache.spark.ml.regression
-
:: Experimental ::
- GBTRegressionModel(String, DecisionTreeRegressionModel[], double[]) - Constructor for class org.apache.spark.ml.regression.GBTRegressionModel
-
- GBTRegressor - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Gradient-Boosted Trees (GBTs) learning algorithm for regression.
- GBTRegressor(String) - Constructor for class org.apache.spark.ml.regression.GBTRegressor
-
- GBTRegressor() - Constructor for class org.apache.spark.ml.regression.GBTRegressor
-
- GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel> - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).
- GeneralizedLinearAlgorithm() - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
- GeneralizedLinearModel - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearModel (GLM) represents a model trained using
GeneralizedLinearAlgorithm.
- GeneralizedLinearModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
- generate(String, String, int, int) - Static method in class org.apache.spark.examples.streaming.KinesisWordProducerASL
-
- generateAssociationRules(double) - Method in class org.apache.spark.mllib.fpm.FPGrowthModel
-
Generates association rules for the Items in freqItemsets.
- generatedRDDs() - Method in class org.apache.spark.streaming.dstream.DStream
-
- generateKMeansRDD(SparkContext, int, int, int, double, int) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
Generate an RDD containing test data for KMeans.
- generateLinearInput(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
For compatibility, data generated without specifying the mean and variance
has zero mean and a variance of 1.0/3.0: the original output range is
[-1, 1] with a uniform distribution, and the variance of a uniform distribution
on [a, b] is (b - a)^2 / 12, which here is 1.0/3.0.
- generateLinearInput(double, double[], double[], double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- generateLinearInputAsList(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Return a Java List of synthetic data randomly generated according to a multi
collinear model.
- generateLinearRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso,
and unregularized variants.
- generateLogisticRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
Generate an RDD containing test data for LogisticRegression.
- generateRandomEdges(int, int, int, long) - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
- GenericArrayData - Class in org.apache.spark.sql.types
-
- GenericArrayData(Object[]) - Constructor for class org.apache.spark.sql.types.GenericArrayData
-
- geq(Object) - Method in class org.apache.spark.sql.Column
-
Greater than or equal to an expression.
- get() - Method in interface org.apache.spark.FutureAction
-
Blocks and returns the result of this job.
- get(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
-
Optionally returns the value associated with a param.
- get(Param<T>) - Method in interface org.apache.spark.ml.param.Params
-
Optionally returns the user-supplied value of a param.
- get(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter; throws a NoSuchElementException if it's not set
- get(String, String) - Method in class org.apache.spark.SparkConf
-
Get a parameter, falling back to a default if not set
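The difference between the two SparkConf get overloads above is only in how a missing key is handled. A minimal plain-Java sketch of that contract follows. It is not the SparkConf class: the class name and backing map are hypothetical and only the two lookup behaviours described above are modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class ConfGetSketch {
    private final Map<String, String> settings = new HashMap<>();

    // Stores a value and returns this object so calls can be chained.
    ConfGetSketch set(String key, String value) {
        settings.put(key, value);
        return this;
    }

    // Strict lookup: a missing key is an error.
    String get(String key) {
        String value = settings.get(key);
        if (value == null) {
            throw new NoSuchElementException(key);
        }
        return value;
    }

    // Lenient lookup: a missing key falls back to the supplied default.
    String get(String key, String defaultValue) {
        return settings.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        ConfGetSketch conf = new ConfGetSketch().set("spark.app.name", "demo");
        System.out.println(conf.get("spark.app.name"));        // prints demo
        System.out.println(conf.get("spark.master", "local")); // prints local
    }
}
```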
- get() - Static method in class org.apache.spark.SparkEnv
-
Returns the SparkEnv.
- get(String) - Static method in class org.apache.spark.SparkFiles
-
Get the absolute path of a file added through SparkContext.addFile().
- get(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i.
- get(int, DataType) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- get() - Static method in class org.apache.spark.TaskContext
-
Return the currently active TaskContext.
- getActive() - Static method in class org.apache.spark.streaming.StreamingContext
-
:: Experimental ::
- getActiveJobIds() - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns an array containing the ids of all active jobs.
- getActiveJobIds() - Method in class org.apache.spark.SparkStatusTracker
-
Returns an array containing the ids of all active jobs.
- getActiveOrCreate(Function0<StreamingContext>) - Static method in class org.apache.spark.streaming.StreamingContext
-
:: Experimental ::
- getActiveOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
-
:: Experimental ::
- getActiveStageIds() - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns an array containing the ids of all active stages.
- getActiveStageIds() - Method in class org.apache.spark.SparkStatusTracker
-
Returns an array containing the ids of all active stages.
- getAkkaConf() - Method in class org.apache.spark.SparkConf
-
Get all akka conf variables set on this SparkConf
- getAlgo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getAll() - Method in class org.apache.spark.SparkConf
-
Get all parameters as a list of pairs
- getAllConfs() - Method in class org.apache.spark.sql.SQLContext
-
Return all the configuration properties that have been set (i.e.
- getAllPools() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return pools for fair scheduler
- getAlpha() - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for getDocConcentration
- getAppId() - Method in class org.apache.spark.SparkConf
-
Returns the Spark application id, valid in the Driver after TaskScheduler registration and
from the start in the Executor.
- getArray(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getAs(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i.
- getAs(String) - Method in interface org.apache.spark.sql.Row
-
Returns the value of a given fieldName.
- getAsymmetricAlpha() - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for getAsymmetricDocConcentration
- getAsymmetricDocConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- getAttr(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its name.
- getAttr(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its index.
- getAvroSchema() - Method in class org.apache.spark.SparkConf
-
Gets all the avro schemas in the configuration used in the generic Avro record serializer
- getBeta() - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for getTopicConcentration
- getBinary(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
-
Return the given block stored in this block manager in O(1) time.
- getBoolean(String, boolean) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a boolean, falling back to a default if not set
- getBoolean(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive boolean.
- getBoolean(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getBoolean(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Boolean.
- getBooleanArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Boolean array.
- getByte(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive byte.
- getByte(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getCachedBlockManagerId(BlockManagerId) - Static method in class org.apache.spark.storage.BlockManagerId
-
- getCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
The three methods below are helpers for accessing the local map, a property of the SparkEnv of
the local process.
- getCaseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- getCatalystType(int, String, int, MetadataBuilder) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Get the custom datatype mapping for the given jdbc meta information.
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- getCategoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getCategoryMaps() - Method in class org.apache.spark.ml.feature.VectorIndexer.CategoryStats
-
Based on stats collected, decide which features are categorical,
and choose indices for categories.
- getCheckpointDir() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- getCheckpointDir() - Method in class org.apache.spark.SparkContext
-
- getCheckpointFile() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Gets the name of the file to which this RDD was checkpointed
- getCheckpointFile() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- getCheckpointFile() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- getCheckpointFile() - Method in class org.apache.spark.rdd.RDD
-
Gets the name of the directory to which this RDD was checkpointed.
- getCheckpointFiles() - Method in class org.apache.spark.graphx.Graph
-
Gets the name of the files to which this Graph was checkpointed.
- getCheckpointFiles() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- getCheckpointInterval() - Method in class org.apache.spark.mllib.clustering.LDA
-
Period (in iterations) between checkpoints.
- getCheckpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getConf() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Return a copy of this JavaSparkContext's configuration.
- getConf() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getConf() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getConf() - Method in class org.apache.spark.SparkContext
-
Return a copy of this SparkContext's configuration.
- getConf(String) - Method in class org.apache.spark.sql.SQLContext
-
Return the value of Spark SQL configuration property for the given key.
- getConf(String, String) - Method in class org.apache.spark.sql.SQLContext
-
Return the value of Spark SQL configuration property for the given key.
- getConnection() - Method in interface org.apache.spark.rdd.JdbcRDD.ConnectionFactory
-
- getConvergenceTol() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the largest change in log-likelihood at which convergence is
considered to have occurred.
- getDate(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of date type as java.sql.Date.
- getDecimal(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of decimal type as java.math.BigDecimal.
- getDecimal(int, int, int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getDefault(Param<T>) - Method in interface org.apache.spark.ml.param.Params
-
Gets the default value of a parameter.
- getDegree() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- getDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.RDD
-
Implemented by subclasses to return how this RDD depends on parent RDDs.
- getDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- getDeprecatedConfig(String, SparkConf) - Static method in class org.apache.spark.SparkConf
-
Looks for available deprecated keys for the given config option, and returns the first
value available.
- getDocConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- getDouble(String, double) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a double, falling back to a default if not set
- getDouble(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive double.
- getDouble(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getDouble(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Double.
- getDoubleArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Double array.
- getEpsilon() - Method in class org.apache.spark.mllib.clustering.KMeans
-
The distance threshold within which centers are considered to have converged.
- getExecutorEnv() - Method in class org.apache.spark.SparkConf
-
Get all executor environment variables set on this SparkConf
- getExecutorMemoryStatus() - Method in class org.apache.spark.SparkContext
-
Return a map from the slave to the max memory available for caching and the remaining
memory available for caching.
- getExecutorStorageStatus() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about blocks stored in all of the slaves
- getField(String) - Method in class org.apache.spark.sql.Column
-
An expression that gets a field by name in a StructType.
- getFinalValue() - Method in class org.apache.spark.partial.PartialResult
-
Blocking method to wait for and return the final value.
- getFloat(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive float.
- getFloat(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getFormula() - Method in class org.apache.spark.ml.feature.RFormula
-
- getGaps() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getImpurity() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getIndices() - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- getInitializationMode() - Method in class org.apache.spark.mllib.clustering.KMeans
-
The initialization algorithm.
- getInitializationSteps() - Method in class org.apache.spark.mllib.clustering.KMeans
-
Number of steps for the k-means|| initialization mode
- getInitialModel() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the user-supplied initial GMM, if any
- getInitialPositionInStream(int) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
-
- getInputFormat(JobConf) - Method in class org.apache.spark.rdd.HadoopRDD
-
- getInt(String, int) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an integer, falling back to a default if not set
- getInt(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive int.
- getInt(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getInterval(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getInverse() - Method in class org.apache.spark.ml.feature.DCT
-
- getItem(Object) - Method in class org.apache.spark.sql.Column
-
An expression that gets an item at position ordinal out of an array,
or gets a value by key key in a MapType.
- getJavaMap(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of map type as a java.util.Map.
- getJDBCType(DataType) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
-
- getJDBCType(DataType) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Retrieve the jdbc / sql type for a given datatype.
- getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- getJobConf() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getJobIdsForGroup(String) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Return a list of all known jobs in a particular job group.
- getJobIdsForGroup(String) - Method in class org.apache.spark.SparkStatusTracker
-
Return a list of all known jobs in a particular job group.
- getJobInfo(int) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns job information, or null if the job info could not be found or was garbage collected.
- getJobInfo(int) - Method in class org.apache.spark.SparkStatusTracker
-
Returns job information, or None if the job info could not be found or was garbage collected.
- getK() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the number of Gaussians in the mixture model
- getK() - Method in class org.apache.spark.mllib.clustering.KMeans
-
Number of clusters to create (k).
- getK() - Method in class org.apache.spark.mllib.clustering.LDA
-
Number of topics to infer.
- getKappa() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Learning rate: exponential decay rate
- getLabels() - Method in class org.apache.spark.ml.feature.IndexToString
-
Optional labels provided by the user; if not supplied, column
metadata is read for labels.
- getLambda() - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Get the smoothing parameter.
- getLDAModel(double[]) - Method in interface org.apache.spark.mllib.clustering.LDAOptimizer
-
- getLearningRate() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getLeastGroupHash(String) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
Sorts and gets the least element of the list associated with key in groupHash.
The returned PartitionGroup is the least loaded of all groups that represent the machine "key".
- getList(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of array type as a List.
- getLocalProperty(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLocalProperty(String) - Method in class org.apache.spark.SparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLong(String, long) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a long, falling back to a default if not set
- getLong(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive long.
- getLong(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getLong(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Long.
- getLongArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Long array.
- getLoss() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getLossType() - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- getLossType() - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- getMap(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of map type as a Scala Map.
- getMap(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getMap() - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Returns the immutable version of this map.
- getMaxBins() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMaxDepth() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the maximum number of iterations to run
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.KMeans
-
Maximum number of iterations to run.
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.LDA
-
Maximum number of iterations for learning.
- getMaxLocalProjDBSize() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Gets the maximum number of items allowed in a projected database before local processing.
- getMaxMemoryInMB() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMaxPatternLength() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Gets the maximal pattern length (i.e.
- getMessage() - Method in exception org.apache.spark.sql.AnalysisException
-
- getMetadata(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Metadata.
- getMetadataArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Metadata array.
- getMetricName() - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- getMetricName() - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- getMetricName() - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- getMetricsSources(String) - Method in class org.apache.spark.TaskContext
-
:: DeveloperApi ::
Returns all metrics sources with the given name which are associated with the instance
which runs the task.
- getMiniBatchFraction() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration
- getMinInfoGain() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMinInstancesPerNode() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMinSupport() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Get the minimal support (i.e.
- getMinTokenLength() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getModelType() - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Get the model type.
- getN() - Method in class org.apache.spark.ml.feature.NGram
-
- getNames() - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- getNode(int, Node) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Traces down from a root node to get the node with the given node index.
- getNumClasses() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getNumFeatures() - Method in class org.apache.spark.ml.feature.HashingTF
-
- getNumFeatures() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
The dimension of training features.
- getNumIterations() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getNumValues() - Method in class org.apache.spark.ml.attribute.NominalAttribute
-
Get the number of values, either from numValues or from values.
- getOptimizeDocConcentration() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Optimize docConcentration: indicates whether docConcentration (Dirichlet parameter for
document-topic distribution) will be optimized during training.
- getOptimizer() - Method in class org.apache.spark.mllib.clustering.LDA
-
:: DeveloperApi ::
- getOption(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an Option
- getOrCreate(SparkConf) - Static method in class org.apache.spark.SparkContext
-
This function may be used to get or instantiate a SparkContext and register it as a
singleton object.
- getOrCreate() - Static method in class org.apache.spark.SparkContext
-
This function may be used to get or instantiate a SparkContext and register it as a
singleton object.
- getOrCreate(SparkContext) - Static method in class org.apache.spark.sql.SQLContext
-
Get the singleton SQLContext if it exists or create a new one using the given SparkContext.
- getOrCreate(String, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
- getOrCreate(String, Configuration, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
- getOrCreate(String, Configuration, JavaStreamingContextFactory, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
- getOrCreate(String, Function0<JavaStreamingContext>) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<JavaStreamingContext>, Configuration) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<JavaStreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrDefault(Param<T>) - Method in interface org.apache.spark.ml.param.Params
-
Gets the value of a param in the embedded param map or its default value.
- getOrElse(Param<T>, T) - Method in class org.apache.spark.ml.param.ParamMap
-
Returns the value associated with a param or a default value.
- getP() - Method in class org.apache.spark.ml.feature.Normalizer
-
- getParam(String) - Method in interface org.apache.spark.ml.param.Params
-
Gets a param by its name.
- getParents(int) - Method in class org.apache.spark.NarrowDependency
-
Get the parent partitions for a child partition.
- getParents(int) - Method in class org.apache.spark.OneToOneDependency
-
- getParents(int) - Method in class org.apache.spark.RangeDependency
-
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut$
-
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.EdgePartition1D$
-
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
-
- getPartition(long, long, int) - Method in interface org.apache.spark.graphx.PartitionStrategy
-
Returns the partition number for a given edge.
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.RandomVertexCut$
-
- getPartition(Object) - Method in class org.apache.spark.HashPartitioner
-
- getPartition(Object) - Method in class org.apache.spark.Partitioner
-
- getPartition(Object) - Method in class org.apache.spark.RangePartitioner
-
- getPartitionId() - Static method in class org.apache.spark.TaskContext
-
Returns the partition id of currently active TaskContext.
- getPartitions() - Method in class org.apache.spark.api.r.BaseRRDD
-
- getPartitions() - Method in class org.apache.spark.graphx.EdgeRDD
-
- getPartitions() - Method in class org.apache.spark.graphx.VertexRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.JdbcRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- getPartitions() - Method in class org.apache.spark.rdd.PartitionPruningRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.RDD
-
Implemented by subclasses to return the set of partitions in this RDD.
- getPartitions() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.UnionRDD
-
- getPath() - Method in class org.apache.spark.input.PortableDataStream
-
- getPattern() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getPersistentRDDs() - Method in class org.apache.spark.SparkContext
-
Returns an immutable map of RDDs that have marked themselves as persistent via a cache() call.
- getPoolForName(String) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return the pool associated with the given name, if one exists
- getPreferredLocations(Partition) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.RDD
-
Optionally overridden by subclasses to specify placement preferences.
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.UnionRDD
-
- getQuantileCalculationStrategy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getRDDStorageInfo() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about which RDDs are cached, whether they are in memory or on disk, how much
space they take, etc.
- getReceiver() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Gets the receiver object that will be sent to the worker nodes
to receive data.
- getRootDirectory() - Static method in class org.apache.spark.SparkFiles
-
Get the root directory that contains files added through SparkContext.addFile().
- getRuns() - Method in class org.apache.spark.mllib.clustering.KMeans
-
:: Experimental ::
Number of runs of the algorithm to execute in parallel.
- getScalingVec() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
- getSchedulingMode() - Method in class org.apache.spark.SparkContext
-
Return current scheduling mode
- getSchema(Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- getSeed() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the random seed
- getSeed() - Method in class org.apache.spark.mllib.clustering.KMeans
-
The random seed for cluster initialization.
- getSeed() - Method in class org.apache.spark.mllib.clustering.LDA
-
Random seed
- getSeq(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of array type as a Scala Seq.
- getSerializer(Serializer) - Static method in class org.apache.spark.serializer.Serializer
-
- getSerializer(Option<Serializer>) - Static method in class org.apache.spark.serializer.Serializer
-
- getShort(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive short.
- getShort(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getSizeAsBytes(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as bytes; throws a NoSuchElementException if it's not set.
- getSizeAsBytes(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as bytes, falling back to a default if not set.
- getSizeAsBytes(String, long) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as bytes, falling back to a default if not set.
- getSizeAsGb(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Gibibytes; throws a NoSuchElementException if it's not set.
- getSizeAsGb(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Gibibytes, falling back to a default if not set.
- getSizeAsKb(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Kibibytes; throws a NoSuchElementException if it's not set.
- getSizeAsKb(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Kibibytes, falling back to a default if not set.
- getSizeAsMb(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Mebibytes; throws a NoSuchElementException if it's not set.
- getSizeAsMb(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Mebibytes, falling back to a default if not set.
- getSparkHome() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get Spark's home location from either a value set through the constructor,
or the spark.home Java property, or the SPARK_HOME environment variable
(in that order of preference).
- getSplits() - Method in class org.apache.spark.ml.feature.Bucketizer
-
- getSQLDialect() - Method in class org.apache.spark.sql.SQLContext
-
- getStageInfo(int) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns stage information, or null if the stage info could not be found or was
garbage collected.
- getStageInfo(int) - Method in class org.apache.spark.SparkStatusTracker
-
Returns stage information, or None if the stage info could not be found or was
garbage collected.
- getStages() - Method in class org.apache.spark.ml.Pipeline
-
- getState() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
:: DeveloperApi ::
- getState() - Method in class org.apache.spark.streaming.StreamingContext
-
:: DeveloperApi ::
- getStopWords() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- getStorageLevel() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getStorageLevel() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- getStorageLevel() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- getStorageLevel() - Method in class org.apache.spark.rdd.RDD
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getString(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a String object.
- getString(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a String.
- getStringArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a String array.
- getStruct(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of struct type as a Row object.
- getStruct(int, int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getSubsamplingRate() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getTau0() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
A (positive) learning parameter that downweights early iterations.
- getThreadLocal() - Static method in class org.apache.spark.SparkEnv
-
Returns the ThreadLocal SparkEnv.
- getThreshold() - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- getThreshold() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- getThreshold() - Method in class org.apache.spark.ml.feature.Binarizer
-
- getThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
- getThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
- getThresholds() - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- getThresholds() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- getTimeAsMs(String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as milliseconds; throws a NoSuchElementException if it's not set.
- getTimeAsMs(String, String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as milliseconds, falling back to a default if not set.
- getTimeAsSeconds(String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as seconds; throws a NoSuchElementException if it's not set.
- getTimeAsSeconds(String, String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as seconds, falling back to a default if not set.
- getTimestamp(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of timestamp type as java.sql.Timestamp.
- gettingResult() - Method in class org.apache.spark.scheduler.TaskInfo
-
- gettingResultTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task started remotely getting the result.
- getTopicConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
- getTreeStrategy() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getUseNodeIdCache() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getUTF8String(int) - Method in class org.apache.spark.sql.types.GenericArrayData
-
- getValidationTol() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getValue() - Method in class org.apache.spark.broadcast.Broadcast
-
Actually get the broadcasted value.
- getValue(int) - Method in class org.apache.spark.ml.attribute.NominalAttribute
-
Gets a value given its index.
- getValuesMap(Seq<String>) - Method in interface org.apache.spark.sql.Row
-
Returns a Map(name -> value) for the requested fieldNames
- getVectors() - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Returns a DataFrame with two fields, "word" and "vector", where "word" is a String
and "vector" is the DenseVector that it is mapped to.
- getVectors() - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
Returns a map of words to their vector representations.
- Gini - Class in org.apache.spark.mllib.tree.impurity
-
:: Experimental ::
Class for calculating the
Gini impurity
during binary classification.
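As a rough illustration of the quantity this class computes, Gini impurity for a node is 1 minus the sum of squared class proportions. The sketch below is plain Java and does not use Spark; the class name `GiniSketch` and the treatment of an empty node as pure are assumptions made for the example.

```java
public class GiniSketch {
    /** Gini impurity 1 - sum(p_i^2) for the given per-class counts. */
    public static double impurity(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0; // assumption: an empty node is treated as pure
        }
        double sumSquares = 0.0;
        for (double c : counts) {
            double p = c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    public static void main(String[] args) {
        // An evenly split binary node has the highest impurity for two classes.
        System.out.println(impurity(new double[] {5, 5}));
        // A node holding a single class is pure.
        System.out.println(impurity(new double[] {10, 0}));
    }
}
```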
- Gini() - Constructor for class org.apache.spark.mllib.tree.impurity.Gini
-
- globalTopicTotals() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- globalTopicTotals() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
Aggregate distributions over topics from all term vertices.
- glom() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- glom() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- gradient() - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
- gradient() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
- Gradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to compute the gradient for a loss function, given a single data point.
- Gradient() - Constructor for class org.apache.spark.mllib.optimization.Gradient
-
- gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.AbsoluteError
-
Method to calculate the gradients for the gradient boosting calculation for least
absolute error calculation.
- gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.LogLoss
-
Method to calculate the loss gradients for the gradient boosting calculation for binary
classification.
The gradient with respect to F(x) is: -4 y / (1 + exp(2 y F(x)))
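The gradient formula above can be checked numerically. The sketch below is plain Java, not the Spark class. It assumes labels y in {-1, +1}, a loss of the form 2 log(1 + exp(-2 y F(x))) (chosen because its derivative matches the stated gradient), and a (prediction, label) argument order; the class name `LogLossSketch` is hypothetical.

```java
public class LogLossSketch {
    /** Assumed loss: 2 * log(1 + exp(-2 y F)), with y in {-1, +1}. */
    public static double loss(double prediction, double label) {
        return 2.0 * Math.log1p(Math.exp(-2.0 * label * prediction));
    }

    /** Gradient with respect to F(x): -4 y / (1 + exp(2 y F(x))). */
    public static double gradient(double prediction, double label) {
        return -4.0 * label / (1.0 + Math.exp(2.0 * label * prediction));
    }

    public static void main(String[] args) {
        double f = 0.3;
        double y = 1.0;
        double h = 1e-6;
        // A central finite difference of the loss should agree with the closed form.
        double numeric = (loss(f + h, y) - loss(f - h, y)) / (2 * h);
        System.out.println(gradient(f, y) + " vs " + numeric);
    }
}
```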
- gradient(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
-
Method to calculate the gradients for the gradient boosting calculation.
- gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.SquaredError
-
Method to calculate the gradients for the gradient boosting calculation for least
squares error calculation.
- GradientBoostedTrees - Class in org.apache.spark.mllib.tree
-
:: Experimental ::
A class that implements
Stochastic Gradient Boosting
for regression and binary classification.
- GradientBoostedTrees(BoostingStrategy) - Constructor for class org.apache.spark.mllib.tree.GradientBoostedTrees
-
- GradientBoostedTreesModel - Class in org.apache.spark.mllib.tree.model
-
:: Experimental ::
Represents a gradient boosted trees model.
- GradientBoostedTreesModel(Enumeration.Value, DecisionTreeModel[], double[]) - Constructor for class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- GradientDescent - Class in org.apache.spark.mllib.optimization
-
Class used to solve an optimization problem using Gradient Descent.
- Graph<VD,ED> - Class in org.apache.spark.graphx
-
The Graph abstractly represents a graph with arbitrary objects
associated with vertices and edges.
- Graph(ClassTag<VD>, ClassTag<ED>) - Constructor for class org.apache.spark.graphx.Graph
-
- graph() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- graph() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
The following fields will only be initialized through the initialize() method
- graph() - Method in class org.apache.spark.streaming.dstream.DStream
-
- graph() - Method in class org.apache.spark.streaming.StreamingContext
-
- GraphGenerators - Class in org.apache.spark.graphx.util
-
A collection of graph generating functions.
- GraphGenerators() - Constructor for class org.apache.spark.graphx.util.GraphGenerators
-
- GraphImpl<VD,ED> - Class in org.apache.spark.graphx.impl
-
An implementation of
Graph
to support computation on graphs.
- GraphImpl(VertexRDD<VD>, ReplicatedVertexView<VD, ED>, ClassTag<VD>, ClassTag<ED>) - Constructor for class org.apache.spark.graphx.impl.GraphImpl
-
- GraphImpl(ClassTag<VD>, ClassTag<ED>) - Constructor for class org.apache.spark.graphx.impl.GraphImpl
-
Default constructor is provided to support serialization
- GraphKryoRegistrator - Class in org.apache.spark.graphx
-
Registers GraphX classes with Kryo for improved performance.
- GraphKryoRegistrator() - Constructor for class org.apache.spark.graphx.GraphKryoRegistrator
-
- GraphLoader - Class in org.apache.spark.graphx
-
Provides utilities for loading Graphs from files.
- GraphLoader() - Constructor for class org.apache.spark.graphx.GraphLoader
-
- GraphOps<VD,ED> - Class in org.apache.spark.graphx
-
Contains additional functionality for Graph.
- GraphOps(Graph<VD, ED>, ClassTag<VD>, ClassTag<ED>) - Constructor for class org.apache.spark.graphx.GraphOps
-
- graphToGraphOps(Graph<VD, ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
-
Implicitly extracts the
GraphOps
member from a graph.
- GraphXUtils - Class in org.apache.spark.graphx
-
- GraphXUtils() - Constructor for class org.apache.spark.graphx.GraphXUtils
-
- greater(Duration) - Method in class org.apache.spark.streaming.Duration
-
- greater(Time) - Method in class org.apache.spark.streaming.Time
-
- greaterEq(Duration) - Method in class org.apache.spark.streaming.Duration
-
- greaterEq(Time) - Method in class org.apache.spark.streaming.Time
-
- GreaterThan - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true iff the attribute evaluates to a value greater than value.
- GreaterThan(String, Object) - Constructor for class org.apache.spark.sql.sources.GreaterThan
-
- GreaterThanOrEqual - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.
- GreaterThanOrEqual(String, Object) - Constructor for class org.apache.spark.sql.sources.GreaterThanOrEqual
-
- greatest(Column...) - Static method in class org.apache.spark.sql.functions
-
Returns the greatest value of the list of values, skipping null values.
- greatest(String, String...) - Static method in class org.apache.spark.sql.functions
-
Returns the greatest value of the list of column names, skipping null values.
- greatest(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Returns the greatest value of the list of values, skipping null values.
- greatest(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Returns the greatest value of the list of column names, skipping null values.
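The "skipping null values" behaviour of the greatest overloads can be pictured with a small plain-Java sketch over boxed integers. It does not call Spark; the class name `GreatestSketch` is hypothetical, and returning null when every input is null is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.Objects;

public class GreatestSketch {
    /** Largest non-null value, or null when every input is null (assumed behaviour). */
    public static Integer greatest(Integer... values) {
        return Arrays.stream(values)
                .filter(Objects::nonNull)   // nulls are skipped, not compared
                .max(Integer::compare)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(greatest(null, 3, 7));
        System.out.println(greatest(null, null));
    }
}
```

The least overloads would follow the same pattern with `min` in place of `max`.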
- gridGraph(SparkContext, int, int) - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
Create a rows by cols grid graph with each vertex connected to its
row+1 and col+1 neighbors.
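To make the connectivity concrete, the following plain-Java sketch enumerates the edges of such a grid without using GraphX. The row-major vertex numbering (id = row * cols + col) and the class name `GridGraphSketch` are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class GridGraphSketch {
    /** Edges of a rows-by-cols grid as {srcId, dstId}, using assumed row-major ids. */
    public static List<long[]> gridEdges(int rows, int cols) {
        List<long[]> edges = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                long id = (long) r * cols + c;
                if (r + 1 < rows) {
                    edges.add(new long[] {id, id + cols}); // row+1 neighbor
                }
                if (c + 1 < cols) {
                    edges.add(new long[] {id, id + 1});    // col+1 neighbor
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        for (long[] e : gridEdges(2, 3)) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Under these assumptions a grid has (rows - 1) * cols + rows * (cols - 1) edges.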
- groupArr() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- groupBy(Function<T, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD of grouped elements.
- groupBy(Function<T, U>, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD of grouped elements.
- groupBy(Function1<T, K>, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped items.
- groupBy(Function1<T, K>, int, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped elements.
- groupBy(Function1<T, K>, Partitioner, ClassTag<K>, Ordering<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped items.
- groupBy(Column...) - Method in class org.apache.spark.sql.DataFrame
-
Groups the
DataFrame
using the specified columns, so we can run aggregation on them.
- groupBy(String, String...) - Method in class org.apache.spark.sql.DataFrame
-
Groups the
DataFrame
using the specified columns, so we can run aggregation on them.
- groupBy(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Groups the
DataFrame
using the specified columns, so we can run aggregation on them.
- groupBy(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Groups the
DataFrame
using the specified columns, so we can run aggregation on them.
- groupByKey(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
to each RDD.
- groupByKey(int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
to each RDD.
- groupByKey(Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
on each RDD of this
DStream.
- groupByKey() - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey
to each RDD.
- groupByKey(int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey
to each RDD.
- groupByKey(Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey
on each RDD.
- groupByKeyAndWindow(Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
over a sliding window.
- groupByKeyAndWindow(Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
over a sliding window.
- groupByKeyAndWindow(Duration, Duration, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
over a sliding window on this
DStream.
- groupByKeyAndWindow(Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey
over a sliding window on this
DStream.
- groupByKeyAndWindow(Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey
over a sliding window.
- groupByKeyAndWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey
over a sliding window.
- groupByKeyAndWindow(Duration, Duration, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey
over a sliding window on this
DStream.
- groupByKeyAndWindow(Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Create a new DStream by applying groupByKey
over a sliding window on this
DStream.
- GroupedData - Class in org.apache.spark.sql
-
:: Experimental ::
A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.
- GroupedData(DataFrame, Seq<Expression>, GroupedData.GroupType) - Constructor for class org.apache.spark.sql.GroupedData
-
- groupEdges(Function2<ED, ED, ED>) - Method in class org.apache.spark.graphx.Graph
-
Merges multiple edges between two vertices into a single edge.
- groupEdges(Function2<ED, ED, ED>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- groupHash() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- groupWith(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- gt(double) - Static method in class org.apache.spark.ml.param.ParamValidators
-
Check if value > lowerBound
- gt(Object) - Method in class org.apache.spark.sql.Column
-
Greater than.
- gtEq(double) - Static method in class org.apache.spark.ml.param.ParamValidators
-
Check if value >= lowerBound
- L1Updater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Updater for L1 regularized problems.
- L1Updater() - Constructor for class org.apache.spark.mllib.optimization.L1Updater
-
- label() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- labelCol() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
- labelCol() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
-
Field in "predictions" which gives the true label of each sample.
- labelCol() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
- LabelConverter - Class in org.apache.spark.ml.classification
-
Label to vector converter.
- LabelConverter() - Constructor for class org.apache.spark.ml.classification.LabelConverter
-
- LabeledPoint - Class in org.apache.spark.mllib.regression
-
Class that represents the features and labels of a data point.
- LabeledPoint(double, Vector) - Constructor for class org.apache.spark.mllib.regression.LabeledPoint
-
- LabelPropagation - Class in org.apache.spark.graphx.lib
-
Label Propagation algorithm.
- LabelPropagation() - Constructor for class org.apache.spark.graphx.lib.LabelPropagation
-
- labels() - Method in class org.apache.spark.ml.feature.IndexToString
-
Param for array of labels.
- labels() - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- labels() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- labels() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the sequence of labels in ascending order
- labels() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns the sequence of labels in ascending order
- lag(Column, int) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows before the current row, and null
if there are fewer than offset rows before the current row.
- lag(String, int) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows before the current row, and null
if there are fewer than offset rows before the current row.
- lag(String, int, Object) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows before the current row, and defaultValue
if there are fewer than offset rows before the current row.
- lag(Column, int, Object) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows before the current row, and defaultValue
if there are fewer than offset rows before the current row.
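The lag semantics described in these entries can be sketched over an already ordered list in plain Java. This is an illustration of the described behaviour, not Spark code; the class name `LagSketch` and the use of a `List` in place of a window partition are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LagSketch {
    /** For each position i, the value offset rows earlier, or defaultValue if no such row exists. */
    public static <T> List<T> lag(List<T> ordered, int offset, T defaultValue) {
        List<T> out = new ArrayList<>(ordered.size());
        for (int i = 0; i < ordered.size(); i++) {
            int j = i - offset;
            boolean inRange = j >= 0 && j < ordered.size();
            out.add(inRange ? ordered.get(j) : defaultValue);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, 20, 30, 40);
        // The first two rows have fewer than 2 rows before them, so they get the default.
        System.out.println(lag(values, 2, 0));
        // Passing null as the default mirrors the two-argument overloads.
        System.out.println(lag(values, 1, null));
    }
}
```

A lead sketch would be the same loop with `i + offset` in place of `i - offset`.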
- LassoModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using Lasso.
- LassoModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.LassoModel
-
- LassoWithSGD - Class in org.apache.spark.mllib.regression
-
Train a regression model with L1-regularization using Stochastic Gradient Descent.
- LassoWithSGD() - Constructor for class org.apache.spark.mllib.regression.LassoWithSGD
-
Construct a Lasso object with default parameters: {stepSize: 1.0, numIterations: 100,
regParam: 0.01, miniBatchFraction: 1.0}.
- last(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the last value in a group.
- last(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the last value of the column in a group.
- last_day(Column) - Static method in class org.apache.spark.sql.functions
-
Given a date column, returns the last day of the month which the given date belongs to.
- lastError() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- lastErrorMessage() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- lastErrorTime() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- lastValidTime() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- latestModel() - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Return the latest model.
- latestModel() - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Return the latest model.
- launch() - Method in class org.apache.spark.launcher.SparkLauncher
-
Launches a sub-process that will start the configured Spark application.
- launchTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
- launchTime() - Method in class org.apache.spark.status.api.v1.TaskData
-
- layers() - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
-
- LBFGS - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to solve an optimization problem using Limited-memory BFGS.
- LBFGS(Gradient, Updater) - Constructor for class org.apache.spark.mllib.optimization.LBFGS
-
- LDA - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- LDA() - Constructor for class org.apache.spark.mllib.clustering.LDA
-
Constructs an LDA instance with default parameters.
- LDAModel - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- LDAOptimizer - Interface in org.apache.spark.mllib.clustering
-
:: DeveloperApi ::
- lead(String, int) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows after the current row, and null
if there are fewer than offset rows after the current row.
- lead(Column, int) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows after the current row, and null
if there are fewer than offset rows after the current row.
- lead(String, int, Object) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows after the current row, and defaultValue
if there are fewer than offset rows after the current row.
- lead(Column, int, Object) - Static method in class org.apache.spark.sql.functions
-
Window function: returns the value that is offset rows after the current row, and defaultValue
if there are fewer than offset rows after the current row.
- LeafNode - Class in org.apache.spark.ml.tree
-
:: DeveloperApi ::
Decision tree leaf node.
- learningRate() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- least(Column...) - Static method in class org.apache.spark.sql.functions
-
Returns the least value of the list of values, skipping null values.
- least(String, String...) - Static method in class org.apache.spark.sql.functions
-
Returns the least value of the list of column names, skipping null values.
- least(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Returns the least value of the list of values, skipping null values.
- least(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Returns the least value of the list of column names, skipping null values.
- LeastSquaresAggregator - Class in org.apache.spark.ml.regression
-
LeastSquaresAggregator computes the gradient and loss for a Least-squared loss function,
as used in linear regression, for samples in sparse or dense vector form, in an online fashion.
- LeastSquaresAggregator(Vector, double, double, boolean, double[], double[]) - Constructor for class org.apache.spark.ml.regression.LeastSquaresAggregator
-
- LeastSquaresCostFun - Class in org.apache.spark.ml.regression
-
LeastSquaresCostFun implements Breeze's DiffFunction[T] for Least Squares cost.
- LeastSquaresCostFun(RDD<Tuple2<Object, Vector>>, double, double, boolean, boolean, double[], double[], double) - Constructor for class org.apache.spark.ml.regression.LeastSquaresCostFun
-
- LeastSquaresGradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Compute gradient and loss for a Least-squared loss function, as used in linear regression.
- LeastSquaresGradient() - Constructor for class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- left() - Method in class org.apache.spark.sql.sources.And
-
- left() - Method in class org.apache.spark.sql.sources.Or
-
- leftCategories() - Method in class org.apache.spark.ml.tree.CategoricalSplit
-
Get sorted categories which split to the left
- leftChild() - Method in class org.apache.spark.ml.tree.InternalNode
-
- leftChildIndex(int) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Return the index of the left child of this node.
- leftImpurity() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- leftJoin(RDD<Tuple2<Object, VD2>>, Function3<Object, VD, Option<VD2>, VD3>, ClassTag<VD2>, ClassTag<VD3>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- leftJoin(RDD<Tuple2<Object, VD2>>, Function3<Object, VD, Option<VD2>, VD3>, ClassTag<VD2>, ClassTag<VD3>) - Method in class org.apache.spark.graphx.VertexRDD
-
Left joins this VertexRDD with an RDD containing vertex attribute pairs.
- leftNode() - Method in class org.apache.spark.mllib.tree.model.Node
-
- leftOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a left outer join of this and other.
- leftOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a left outer join of this and other.
- leftOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a left outer join of this and other.
- leftOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a left outer join of this and other.
- leftOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a left outer join of this and other.
- leftOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a left outer join of this and other.
- leftOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'left outer join' between RDDs of this
DStream and
other
DStream.
- leftOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'left outer join' between RDDs of this
DStream and
other
DStream.
- leftOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'left outer join' between RDDs of this
DStream and
other
DStream.
- leftOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'left outer join' between RDDs of this
DStream and
other
DStream.
- leftOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'left outer join' between RDDs of this
DStream and
other
DStream.
- leftOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'left outer join' between RDDs of this
DStream and
other
DStream.
- leftPredict() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- leftZipJoin(VertexRDD<VD2>, Function3<Object, VD, Option<VD2>, VD3>, ClassTag<VD2>, ClassTag<VD3>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- leftZipJoin(VertexRDD<VD2>, Function3<Object, VD, Option<VD2>, VD3>, ClassTag<VD2>, ClassTag<VD3>) - Method in class org.apache.spark.graphx.VertexRDD
-
Left joins this RDD with another VertexRDD with the same index.
- LEGACY_DRIVER_IDENTIFIER() - Static method in class org.apache.spark.SparkContext
-
Legacy version of DRIVER_IDENTIFIER, retained for backwards-compatibility.
- length() - Method in class org.apache.spark.scheduler.SplitInfo
-
- length(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the length of a given string or binary column.
- length() - Method in interface org.apache.spark.sql.Row
-
Number of elements in the Row.
- length() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
-
- length() - Method in class org.apache.spark.sql.types.StructType
-
- length() - Method in class org.apache.spark.util.Vector
-
- leq(Object) - Method in class org.apache.spark.sql.Column
-
Less than or equal to.
- less(Duration) - Method in class org.apache.spark.streaming.Duration
-
- less(Time) - Method in class org.apache.spark.streaming.Time
-
- lessEq(Duration) - Method in class org.apache.spark.streaming.Duration
-
- lessEq(Time) - Method in class org.apache.spark.streaming.Time
-
- LessThan - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true iff the attribute evaluates to a value less than value.
- LessThan(String, Object) - Constructor for class org.apache.spark.sql.sources.LessThan
-
- LessThanOrEqual - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.
- LessThanOrEqual(String, Object) - Constructor for class org.apache.spark.sql.sources.LessThanOrEqual
-
- levenshtein(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Computes the Levenshtein distance of the two given string columns.
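For readers unfamiliar with the metric, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The plain-Java sketch below shows the standard two-row dynamic programming approach; it is not Spark's implementation, the class name `LevenshteinSketch` is hypothetical, and it compares UTF-16 `char` units rather than full code points.

```java
public class LevenshteinSketch {
    /** Edit distance between a and b using two rolling rows of the DP table. */
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```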
- like(String) - Method in class org.apache.spark.sql.Column
-
SQL like expression.
- limit(int) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new
DataFrame
by taking the first
n
rows.
- line() - Method in exception org.apache.spark.sql.AnalysisException
-
- LinearDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate sample data used for Linear Data.
- LinearDataGenerator() - Constructor for class org.apache.spark.mllib.util.LinearDataGenerator
-
- LinearRegression - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Linear regression.
- LinearRegression(String) - Constructor for class org.apache.spark.ml.regression.LinearRegression
-
- LinearRegression() - Constructor for class org.apache.spark.ml.regression.LinearRegression
-
- LinearRegressionModel - Class in org.apache.spark.ml.regression
-
- LinearRegressionModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using LinearRegression.
- LinearRegressionModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.LinearRegressionModel
-
- LinearRegressionSummary - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Linear regression results evaluated on a dataset.
- LinearRegressionTrainingSummary - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Linear regression training results.
- LinearRegressionWithSGD - Class in org.apache.spark.mllib.regression
-
Train a linear regression model with no regularization using Stochastic Gradient Descent.
- LinearRegressionWithSGD() - Constructor for class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Construct a LinearRegression object with default parameters: {stepSize: 1.0,
numIterations: 100, miniBatchFraction: 1.0}.
- listener() - Method in class org.apache.spark.sql.SQLContext
-
- listenerBus() - Method in class org.apache.spark.SparkContext
-
- listLeafFiles(FileSystem, FileStatus) - Static method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- listLeafFilesInParallel(String[], Configuration, SparkContext) - Static method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- lit(Object) - Static method in class org.apache.spark.sql.functions
-
Creates a
Column
of literal value.
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.classification.SVMModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.clustering.KMeansModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.feature.Word2VecModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Load a model from the given path.
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.regression.LassoModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.regression.LinearRegressionModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- load(SparkContext, String) - Static method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- load(SparkContext, String) - Method in interface org.apache.spark.mllib.util.Loader
-
Load a model from the given path.
- load(String) - Method in class org.apache.spark.sql.DataFrameReader
-
Loads input in as a DataFrame, for data sources that require a path.
- load() - Method in class org.apache.spark.sql.DataFrameReader
-
Loads input in as a DataFrame, for data sources that don't require a path.
- load(String) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().load(path).
- load(String, String) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().format(source).load(path).
- load(String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().format(source).options(options).load().
- load(String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().format(source).options(options).load().
- load(String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
- load(String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
- Loader<M extends Saveable> - Interface in org.apache.spark.mllib.util
-
:: DeveloperApi ::
- loadLabeledData(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLabeledPoints(SparkContext, String, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile.
- loadLabeledPoints(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile with the default number of partitions.
- loadLibSVMFile(SparkContext, String, int, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled data in the LIBSVM format into an RDD[LabeledPoint].
- loadLibSVMFile(SparkContext, String, boolean, int, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLibSVMFile(SparkContext, String, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the default number of
partitions.
- loadLibSVMFile(SparkContext, String, boolean, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLibSVMFile(SparkContext, String, boolean) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLibSVMFile(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads binary labeled data in the LIBSVM format into an RDD[LabeledPoint], with number of
features determined automatically and the default number of partitions.
- loadVectors(SparkContext, String, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads vectors saved using RDD[Vector].saveAsTextFile.
- loadVectors(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads vectors saved using RDD[Vector].saveAsTextFile with the default number of partitions.
- localBlocksFetched() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- localBlocksFetched() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- localCheckpoint() - Method in class org.apache.spark.rdd.RDD
-
Mark this RDD for local checkpointing using Spark's existing caching layer.
- LocalLDAModel - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- localProperties() - Method in class org.apache.spark.SparkContext
-
- localValue() - Method in class org.apache.spark.Accumulable
-
Get the current value of this accumulator from within a task.
- locate(String, Column) - Static method in class org.apache.spark.sql.functions
-
Locate the position of the first occurrence of substr.
- locate(String, Column, int) - Static method in class org.apache.spark.sql.functions
-
Locate the position of the first occurrence of substr in a string column, after position pos.
- location() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- log() - Method in interface org.apache.spark.Logging
-
- log(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the natural logarithm of the given value.
- log(String) - Static method in class org.apache.spark.sql.functions
-
Computes the natural logarithm of the given column.
- log(double, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the first argument-base logarithm of the second argument.
- log(double, String) - Static method in class org.apache.spark.sql.functions
-
Returns the first argument-base logarithm of the second argument.
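The two-argument log overloads take the base first and the value second. As a rough illustration of that argument order, the sketch below uses the change-of-base identity in plain Java. It does not call Spark, and the names LogBaseSketch and logBase are hypothetical.

```java
// Illustration only: plain Java, no Spark dependency.
final class LogBaseSketch {
    // Base-`base` logarithm of `value`, computed as ln(value) / ln(base).
    static double logBase(double base, double value) {
        return Math.log(value) / Math.log(base);
    }

    public static void main(String[] args) {
        System.out.println(logBase(2.0, 8.0));     // approximately 3
        System.out.println(logBase(10.0, 1000.0)); // approximately 3
    }
}
```

Results are subject to floating-point rounding, so compare them with a tolerance rather than with ==.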
- log10(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the logarithm of the given value in base 10.
- log10(String) - Static method in class org.apache.spark.sql.functions
-
Computes the logarithm of the given value in base 10.
- log1p(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the natural logarithm of the given value plus one.
- log1p(String) - Static method in class org.apache.spark.sql.functions
-
Computes the natural logarithm of the given column plus one.
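A dedicated log1p exists because evaluating ln(x + 1) directly loses precision when x is very small. The sketch below shows the difference using java.lang.Math in plain Java. It makes no claim about how Spark evaluates the function, and the names Log1pSketch and naiveLog1p are hypothetical.

```java
// Illustration only: plain Java, no Spark dependency.
final class Log1pSketch {
    // Naive evaluation of ln(x + 1); 1.0 + x rounds to 1.0 when x is tiny.
    static double naiveLog1p(double x) {
        return Math.log(1.0 + x);
    }

    public static void main(String[] args) {
        double x = 1e-17;
        System.out.println(Math.log1p(x));  // a tiny positive number close to x
        System.out.println(naiveLog1p(x));  // 0.0, because 1.0 + 1e-17 rounds to 1.0
    }
}
```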
- log2(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the logarithm of the given column in base 2.
- log2(String) - Static method in class org.apache.spark.sql.functions
-
Computes the logarithm of the given value in base 2.
- log_() - Method in interface org.apache.spark.Logging
-
- logDebug(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logDebug(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- logDeprecationWarning(String) - Static method in class org.apache.spark.SparkConf
-
Logs a warning message if the given config key is deprecated.
- logDirName() - Method in class org.apache.spark.scheduler.JobLogger
-
- logError(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logError(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- Logging - Interface in org.apache.spark
-
:: DeveloperApi ::
Utility trait for classes that want to log data.
- logical() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- logicalPlan() - Method in class org.apache.spark.sql.DataFrame
-
- logInfo(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logInfo(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- LogisticAggregator - Class in org.apache.spark.ml.classification
-
LogisticAggregator computes the gradient and loss for the binary logistic loss function, as used
in binary classification for samples in sparse or dense vectors, in an online fashion.
- LogisticAggregator(Vector, int, boolean, double[], double[]) - Constructor for class org.apache.spark.ml.classification.LogisticAggregator
-
- LogisticCostFun - Class in org.apache.spark.ml.classification
-
LogisticCostFun implements Breeze's DiffFunction[T] for a multinomial logistic loss function,
as used in multi-class classification (it is also used in binary logistic regression).
- LogisticCostFun(RDD<Tuple2<Object, Vector>>, int, boolean, boolean, double[], double[], double) - Constructor for class org.apache.spark.ml.classification.LogisticCostFun
-
- LogisticGradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Compute gradient and loss for a multinomial logistic loss function, as used
in multi-class classification (it is also used in binary logistic regression).
- LogisticGradient(int) - Constructor for class org.apache.spark.mllib.optimization.LogisticGradient
-
- LogisticGradient() - Constructor for class org.apache.spark.mllib.optimization.LogisticGradient
-
- LogisticRegression - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Logistic regression.
- LogisticRegression(String) - Constructor for class org.apache.spark.ml.classification.LogisticRegression
-
- LogisticRegression() - Constructor for class org.apache.spark.ml.classification.LogisticRegression
-
- LogisticRegressionDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate test data for LogisticRegression.
- LogisticRegressionDataGenerator() - Constructor for class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
- LogisticRegressionModel - Class in org.apache.spark.ml.classification
-
- LogisticRegressionModel - Class in org.apache.spark.mllib.classification
-
Classification model trained using Multinomial/Binary Logistic Regression.
- LogisticRegressionModel(Vector, double, int, int) - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- LogisticRegressionModel(Vector, double) - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- LogisticRegressionSummary - Interface in org.apache.spark.ml.classification
-
Abstraction for Logistic Regression Results for a given model.
- LogisticRegressionTrainingSummary - Interface in org.apache.spark.ml.classification
-
Abstraction for multinomial Logistic Regression Training results.
- LogisticRegressionWithLBFGS - Class in org.apache.spark.mllib.classification
-
Train a classification model for Multinomial/Binary Logistic Regression using
Limited-memory BFGS.
- LogisticRegressionWithLBFGS() - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
-
- LogisticRegressionWithSGD - Class in org.apache.spark.mllib.classification
-
Train a classification model for Binary Logistic Regression
using Stochastic Gradient Descent.
- LogisticRegressionWithSGD() - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Construct a LogisticRegression object with default parameters: {stepSize: 1.0,
numIterations: 100, regParam: 0.01, miniBatchFraction: 1.0}.
- logLikelihood() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Log likelihood of the observed tokens in the training set,
given the current parameter estimates:
log P(docs | topics, topic distributions for docs, alpha, eta)
- logLikelihood() - Method in class org.apache.spark.mllib.clustering.ExpectationSum
-
- logLikelihood(RDD<Tuple2<Object, Vector>>) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
Calculates a lower bound on the log likelihood of the entire corpus.
- logLikelihood(JavaPairRDD<Long, Vector>) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
Java-friendly version of logLikelihood
- LogLoss - Class in org.apache.spark.mllib.tree.loss
-
:: DeveloperApi ::
Class for log loss calculation (for classification).
- LogLoss() - Constructor for class org.apache.spark.mllib.tree.loss.LogLoss
-
- logName() - Method in interface org.apache.spark.Logging
-
- LogNormalGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d. samples from the log normal distribution.
- LogNormalGenerator(double, double) - Constructor for class org.apache.spark.mllib.random.LogNormalGenerator
-
- logNormalGraph(SparkContext, int, int, double, double, long) - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
Generate a graph whose vertex out degree distribution is log normal.
- logNormalJavaRDD(JavaSparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- logNormalJavaRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- logNormalJavaRDD(JavaSparkContext, double, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- logNormalJavaVectorRDD(JavaSparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- logNormalJavaVectorRDD(JavaSparkContext, double, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- logNormalJavaVectorRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- logNormalRDD(SparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD comprised of i.i.d. samples from the log normal distribution with the input
mean and standard deviation.
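For intuition, a log normal sample can be drawn by exponentiating a normal sample. The sketch below does this in plain Java with java.util.Random. It does not use Spark, the names LogNormalSketch and sample are hypothetical, and it assumes that mean and std describe the underlying normal distribution, which this index does not confirm.

```java
import java.util.Random;

// Illustration only: plain Java, no Spark dependency.
final class LogNormalSketch {
    // Draws one sample as exp(mean + std * Z), where Z is standard normal.
    // Assumption: mean and std parametrise the underlying normal distribution.
    static double sample(Random rng, double mean, double std) {
        return Math.exp(mean + std * rng.nextGaussian());
    }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed for repeatable output
        for (int i = 0; i < 3; i++) {
            System.out.println(sample(rng, 0.0, 1.0)); // always strictly positive
        }
    }
}
```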
- logNormalVectorRDD(SparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from a
log normal distribution.
- logpdf(Vector) - Method in class org.apache.spark.mllib.stat.distribution.MultivariateGaussian
-
Returns the log-density of this multivariate Gaussian at the given point, x.
- logPerplexity(RDD<Tuple2<Object, Vector>>) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
Calculate an upper bound on perplexity.
- logPerplexity(JavaPairRDD<Long, Vector>) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
Java-friendly version of logPerplexity
- logPrior() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Log probability of the current parameter estimate:
log P(topics, topic distributions for docs | alpha, eta)
- logTrace(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logTrace(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- logUrlMap() - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
-
- logWarning(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logWarning(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- LongDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
-
- LongParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Long] for Java.
- LongParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.LongParam
-
- LongParam(String, String, String) - Constructor for class org.apache.spark.ml.param.LongParam
-
- LongParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.LongParam
-
- LongParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.LongParam
-
- longToLongWritable(long) - Static method in class org.apache.spark.SparkContext
-
- LongType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the LongType object.
- LongType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing Long values.
- longWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- lookup(K) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the list of values in the RDD for key key.
- lookup(K) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return the list of values in the RDD for key key.
- lookupTimeout(SparkConf) - Static method in class org.apache.spark.util.RpcUtils
-
- loss() - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
- loss() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
- loss() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- Loss - Interface in org.apache.spark.mllib.tree.loss
-
:: DeveloperApi ::
Trait for adding "pluggable" loss functions for the gradient boosting algorithm.
- Losses - Class in org.apache.spark.mllib.tree.loss
-
- Losses() - Constructor for class org.apache.spark.mllib.tree.loss.Losses
-
- lossType() - Method in class org.apache.spark.ml.classification.GBTClassifier
-
Loss function which GBT tries to minimize.
- lossType() - Method in class org.apache.spark.ml.regression.GBTRegressor
-
Loss function which GBT tries to minimize.
- low() - Method in class org.apache.spark.partial.BoundedDouble
-
- lower(Column) - Static method in class org.apache.spark.sql.functions
-
Converts a string column to lower case.
- lpad(Column, int, String) - Static method in class org.apache.spark.sql.functions
-
Left-pad the string column with the given pad string to the given length.
- lt(double) - Static method in class org.apache.spark.ml.param.ParamValidators
-
Check if value < upperBound
- lt(Object) - Method in class org.apache.spark.sql.Column
-
Less than.
- ltEq(double) - Static method in class org.apache.spark.ml.param.ParamValidators
-
Check if value <= upperBound
- ltrim(Column) - Static method in class org.apache.spark.sql.functions
-
Trim the spaces from left end for the specified string value.
- LZ4CompressionCodec - Class in org.apache.spark.io
-
- LZ4CompressionCodec(SparkConf) - Constructor for class org.apache.spark.io.LZ4CompressionCodec
-
- LZFCompressionCodec - Class in org.apache.spark.io
-
- LZFCompressionCodec(SparkConf) - Constructor for class org.apache.spark.io.LZFCompressionCodec
-
- main(String[]) - Static method in class org.apache.spark.examples.streaming.JavaKinesisWordCountASL
-
- main(String[]) - Static method in class org.apache.spark.examples.streaming.KinesisWordCountASL
-
- main(String[]) - Static method in class org.apache.spark.examples.streaming.KinesisWordProducerASL
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.MFDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.SVMDataGenerator
-
- makeDriverRef(String, SparkConf, org.apache.spark.rpc.RpcEnv) - Static method in class org.apache.spark.util.RpcUtils
-
Retrieve an RpcEndpointRef that is located in the driver via its name.
- makeRDD(Seq<T>, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD.
- makeRDD(Seq<Tuple2<T, Seq<String>>>, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD, with one or more
location preferences (hostnames of Spark nodes) for each object.
- map(Function<T, R>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- map(Function1<Object, Object>) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Map the values of this matrix using a function.
- map(Function1<R, T>) - Method in class org.apache.spark.partial.PartialResult
-
Transform this PartialResult into a PartialResult of type T.
- map(Function1<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to all elements of this RDD.
- map(DataType, DataType) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type map.
- map(MapType) - Method in class org.apache.spark.sql.ColumnName
-
- map(Function1<Row, R>, ClassTag<R>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new RDD by applying a function to all rows of this DataFrame.
- map() - Method in class org.apache.spark.sql.types.Metadata
-
- map(Function<T, R>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream.
- map(Function1<T, U>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream.
- MapData - Class in org.apache.spark.sql.types
-
- MapData() - Constructor for class org.apache.spark.sql.types.MapData
-
- mapEdgePartitions(Function2<Object, EdgePartition<ED, VD>, EdgePartition<ED2, VD2>>, ClassTag<ED2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- mapEdges(Function1<Edge<ED>, ED2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.Graph
-
Transforms each edge attribute in the graph using the map function.
- mapEdges(Function2<Object, Iterator<Edge<ED>>, Iterator<ED2>>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.Graph
-
Transforms each edge attribute using the map function, passing it a whole partition at a
time.
- mapEdges(Function2<Object, Iterator<Edge<ED>>, Iterator<ED2>>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- mapId() - Method in class org.apache.spark.FetchFailed
-
- mapId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- mapId() - Method in class org.apache.spark.storage.ShuffleDataBlockId
-
- mapId() - Method in class org.apache.spark.storage.ShuffleIndexBlockId
-
- mapOutputTracker() - Method in class org.apache.spark.SparkEnv
-
- mapPartitions(FlatMapFunction<Iterator<T>, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(FlatMapFunction<Iterator<T>, U>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(Function1<Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD.
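Unlike map, which calls its function once per element, mapPartitions calls its function once per partition and hands it an iterator over that partition. The sketch below imitates this locally in plain Java, with a list of lists standing in for the partitions. It does not use Spark, and MapPartitionsSketch, mapPartitions and sum are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Illustration only: plain Java, no Spark dependency.
final class MapPartitionsSketch {
    // Applies f once per partition; f consumes the partition's iterator
    // and returns an iterator over that partition's results.
    static <T, U> List<List<U>> mapPartitions(List<List<T>> partitions,
                                              Function<Iterator<T>, Iterator<U>> f) {
        List<List<U>> out = new ArrayList<>();
        for (List<T> partition : partitions) {
            List<U> mapped = new ArrayList<>();
            f.apply(partition.iterator()).forEachRemaining(mapped::add);
            out.add(mapped);
        }
        return out;
    }

    // Example per-partition function: collapses a partition into its sum.
    static Iterator<Integer> sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return Arrays.asList(total).iterator();
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5));
        List<List<Integer>> sums = mapPartitions(partitions, MapPartitionsSketch::sum);
        System.out.println(sums); // prints [[6], [9]]
    }
}
```

Because the function sees a whole partition, per-partition setup such as opening a connection can be done once rather than once per element.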
- mapPartitions(Function1<Iterator<Row>, Iterator<R>>, ClassTag<R>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new RDD by applying a function to each partition of this DataFrame.
- mapPartitions(FlatMapFunction<Iterator<T>, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitions(Function1<Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitionsToDouble(DoubleFlatMapFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToDouble(DoubleFlatMapFunction<Iterator<T>>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitionsWithContext(Function2<TaskContext, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsWithIndex(Function2<Integer, Iterator<T>, Iterator<R>>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapPartitionsWithIndex(Function2<Object, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<R>>, boolean) - Method in class org.apache.spark.api.java.JavaHadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<R>>, boolean) - Method in class org.apache.spark.api.java.JavaNewHadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.HadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithSplit(Function2<Object, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapredInputFormat() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- mapreduceInputFormat() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- mapReduceTriplets(Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, Option<Tuple2<VertexRDD<?>, EdgeDirection>>, ClassTag<A>) - Method in class org.apache.spark.graphx.Graph
-
Aggregates values from the neighboring edges and vertices of each vertex.
- mapReduceTriplets(Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, Option<Tuple2<VertexRDD<?>, EdgeDirection>>, ClassTag<A>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- mapSideCombine() - Method in class org.apache.spark.ShuffleDependency
-
- mapToDouble(DoubleFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- mapToPair(PairFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- mapToPair(PairFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream.
- mapTriplets(Function1<EdgeTriplet<VD, ED>, ED2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.Graph
-
Transforms each edge attribute using the map function, passing it the adjacent vertex
attributes as well.
- mapTriplets(Function1<EdgeTriplet<VD, ED>, ED2>, TripletFields, ClassTag<ED2>) - Method in class org.apache.spark.graphx.Graph
-
Transforms each edge attribute using the map function, passing it the adjacent vertex
attributes as well.
- mapTriplets(Function2<Object, Iterator<EdgeTriplet<VD, ED>>, Iterator<ED2>>, TripletFields, ClassTag<ED2>) - Method in class org.apache.spark.graphx.Graph
-
Transforms each edge attribute a partition at a time using the map function, passing it the
adjacent vertex attributes as well.
- mapTriplets(Function2<Object, Iterator<EdgeTriplet<VD, ED>>, Iterator<ED2>>, TripletFields, ClassTag<ED2>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- MapType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type for Maps.
- MapType(DataType, DataType, boolean) - Constructor for class org.apache.spark.sql.types.MapType
-
- MapType() - Constructor for class org.apache.spark.sql.types.MapType
-
No-arg constructor for kryo.
- mapValues(Function<V, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a map function without changing the keys;
this also retains the original RDD's partitioning.
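Because mapValues never touches the keys, any key-based partitioning of the input remains valid for the output. The sketch below shows the key-preserving behaviour on a local list of key-value pairs in plain Java. It does not use Spark, and MapValuesSketch and its mapValues method are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustration only: plain Java, no Spark dependency.
final class MapValuesSketch {
    // Applies f to each value; keys and pair order are left unchanged.
    static <K, V, U> List<Map.Entry<K, U>> mapValues(List<Map.Entry<K, V>> pairs,
                                                     Function<V, U> f) {
        List<Map.Entry<K, U>> out = new ArrayList<>();
        for (Map.Entry<K, V> pair : pairs) {
            out.add(new SimpleEntry<>(pair.getKey(), f.apply(pair.getValue())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>("b", 2));
        pairs.add(new SimpleEntry<>("a", 3));
        List<Map.Entry<String, Integer>> scaled = mapValues(pairs, v -> v * 10);
        System.out.println(scaled); // prints [a=10, b=20, a=30]
    }
}
```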
- mapValues(Function1<Edge<ED>, ED2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.EdgeRDD
-
Map the values in an edge partitioning preserving the structure but changing the values.
- mapValues(Function1<Edge<ED>, ED2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- mapValues(Function1<VD, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- mapValues(Function2<Object, VD, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- mapValues(Function1<VD, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.VertexRDD
-
Maps each vertex attribute, preserving the index.
- mapValues(Function2<Object, VD, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.VertexRDD
-
Maps each vertex attribute, additionally supplying the vertex ID.
- mapValues(Function1<V, U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a map function without changing the keys;
this also retains the original RDD's partitioning.
- mapValues(Function<V, U>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a map function to the value of each key-value pair in
'this' DStream without changing the key.
- mapValues(Function1<V, U>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a map function to the value of each key-value pair in
'this' DStream without changing the key.
- mapVertices(Function2<Object, VD, VD2>, ClassTag<VD2>, Predef.$eq$colon$eq<VD, VD2>) - Method in class org.apache.spark.graphx.Graph
-
Transforms each vertex attribute in the graph using the map function.
- mapVertices(Function2<Object, VD, VD2>, ClassTag<VD2>, Predef.$eq$colon$eq<VD, VD2>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- mapWith(Function1<Object, A>, boolean, Function2<T, A, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Maps f over this RDD, where f takes an additional parameter of type A.
- mark(int) - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- markSupported() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- mask(Graph<VD2, ED2>, ClassTag<VD2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.Graph
-
Restricts the graph to only the vertices and edges that are also in other, but keeps the
attributes from this graph.
- mask(Graph<VD2, ED2>, ClassTag<VD2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- master() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- master() - Method in class org.apache.spark.SparkContext
-
- Matrices - Class in org.apache.spark.mllib.linalg
-
- Matrices() - Constructor for class org.apache.spark.mllib.linalg.Matrices
-
- Matrix - Interface in org.apache.spark.mllib.linalg
-
Trait for a local matrix.
- MatrixEntry - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents an entry in a distributed matrix.
- MatrixEntry(long, long, double) - Constructor for class org.apache.spark.mllib.linalg.distributed.MatrixEntry
-
- MatrixFactorizationModel - Class in org.apache.spark.mllib.recommendation
-
Model representing the result of matrix factorization.
- MatrixFactorizationModel(int, RDD<Tuple2<Object, double[]>>, RDD<Tuple2<Object, double[]>>) - Constructor for class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- max() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Returns the maximum element from this RDD as defined by
the default comparator (natural order).
- max(Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the maximum element from this RDD as defined by the specified
Comparator[T].
- max() - Method in class org.apache.spark.ml.attribute.NumericAttribute
-
- max() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Maximum value of each dimension.
- max() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Maximum value of each column.
- max(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the max of this RDD as defined by the implicit Ordering[T].
- max(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the maximum value of the expression in a group.
- max(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the maximum value of the column in a group.
- max(String...) - Method in class org.apache.spark.sql.GroupedData
-
Compute the max value for each numeric column for each group.
- max(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
-
Compute the max value for each numeric column for each group.
- max(Duration) - Method in class org.apache.spark.streaming.Duration
-
- max(Time) - Method in class org.apache.spark.streaming.Time
-
- max() - Method in class org.apache.spark.util.StatCounter
-
- MAX_LONG_DIGITS() - Static method in class org.apache.spark.sql.types.Decimal
-
Maximum number of decimal digits a Long can represent
- MAX_PRECISION() - Static method in class org.apache.spark.sql.types.DecimalType
-
- MAX_SCALE() - Static method in class org.apache.spark.sql.types.DecimalType
-
- maxBins() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxBufferSizeMb() - Method in class org.apache.spark.serializer.KryoSerializer
-
- maxDepth() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxIters() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- maxMem() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
-
- maxMem() - Method in class org.apache.spark.storage.StorageStatus
-
- maxMemory() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- maxMemoryInMB() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxNodesInLevel(int) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Return the maximum number of nodes which can be in the given level of the tree.
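The bound in the entry above can be sketched in plain Python. This is a minimal illustration that assumes a binary tree whose root sits at level 0, so each level can hold at most twice as many nodes as the one before it. The function name `max_nodes_in_level` is hypothetical and is not the Spark method itself.

```python
def max_nodes_in_level(level: int) -> int:
    """Upper bound on the node count at one level of a binary tree (root = level 0)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    # Each level doubles the capacity of the previous one: 1, 2, 4, 8, ...
    return 1 << level
```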
- maxVal() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- md5(Column) - Static method in class org.apache.spark.sql.functions
-
Calculates the MD5 digest of a binary column and returns the value
as a 32 character hex string.
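To show why the result is always 32 characters, here is a small standard-library sketch (plain Python, not the Spark SQL function): an MD5 digest is 128 bits, which renders as 32 hexadecimal characters. The helper name `md5_hex` is an assumption made for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a lowercase hex string."""
    # 128-bit digest -> 16 bytes -> 32 hex characters
    return hashlib.md5(data).hexdigest()
```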
- mean() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the mean of this RDD's elements.
- mean() - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- mean() - Method in class org.apache.spark.mllib.feature.StandardScalerModel
-
- mean() - Method in class org.apache.spark.mllib.random.ExponentialGenerator
-
- mean() - Method in class org.apache.spark.mllib.random.LogNormalGenerator
-
- mean() - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- mean() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Sample mean of each dimension.
- mean() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample mean vector.
- mean() - Method in class org.apache.spark.partial.BoundedDouble
-
- mean() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the mean of this RDD's elements.
- mean(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the average of the values in a group.
- mean(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the average of the values in a group.
- mean(String...) - Method in class org.apache.spark.sql.GroupedData
-
Compute the average value for each numeric column for each group.
- mean(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
-
Compute the average value for each numeric column for each group.
- mean() - Method in class org.apache.spark.util.StatCounter
-
- meanAbsoluteError() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
Returns the mean absolute error, which is a risk function corresponding to the
expected value of the absolute error loss or l1-norm loss.
- meanAbsoluteError() - Method in class org.apache.spark.mllib.evaluation.RegressionMetrics
-
Returns the mean absolute error, which is a risk function corresponding to the
expected value of the absolute error loss or l1-norm loss.
- meanApprox(long, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return the approximate mean of the elements in this RDD.
- meanApprox(long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the mean within a timeout.
- meanApprox(long, double) - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
:: Experimental ::
Approximate operation to return the mean within a timeout.
- meanAveragePrecision() - Method in class org.apache.spark.mllib.evaluation.RankingMetrics
-
Returns the mean average precision (MAP) of all the queries.
- means() - Method in class org.apache.spark.mllib.clustering.ExpectationSum
-
- meanSquaredError() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
Returns the mean squared error, which is a risk function corresponding to the
expected value of the squared error loss or quadratic loss.
- meanSquaredError() - Method in class org.apache.spark.mllib.evaluation.RegressionMetrics
-
Returns the mean squared error, which is a risk function corresponding to the
expected value of the squared error loss or quadratic loss.
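The two risk functions described in the meanAbsoluteError and meanSquaredError entries can be sketched as simple averages over (prediction, label) pairs. This is a plain-Python illustration under that assumed input shape; the function names are hypothetical and do not come from the Spark classes listed here.

```python
def mean_absolute_error(pairs):
    """Average of |prediction - label| over all pairs (l1-norm loss)."""
    return sum(abs(p - y) for p, y in pairs) / len(pairs)


def mean_squared_error(pairs):
    """Average of (prediction - label)^2 over all pairs (quadratic loss)."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)
```

Both functions assume a non-empty list of pairs; an empty input raises `ZeroDivisionError`.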
- MEMORY_AND_DISK - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_SER - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_SER() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_SER_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_SER_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_SER - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_SER() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_SER_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_SER_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- memoryBytesSpilled() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- memoryBytesSpilled() - Method in class org.apache.spark.status.api.v1.StageData
-
- memoryBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- memoryBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- MemoryEntry - Class in org.apache.spark.storage
-
- MemoryEntry(Object, long, boolean) - Constructor for class org.apache.spark.storage.MemoryEntry
-
- memoryRemaining() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
-
- memoryUsed() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- memoryUsed() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
-
- memoryUsed() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
-
- memoryUsed() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- memRemaining() - Method in class org.apache.spark.storage.StorageStatus
-
Return the memory remaining in this block manager.
- memSize() - Method in class org.apache.spark.storage.BlockStatus
-
- memSize() - Method in class org.apache.spark.storage.BlockUpdatedInfo
-
- memSize() - Method in class org.apache.spark.storage.RDDInfo
-
- memUsed() - Method in class org.apache.spark.storage.StorageStatus
-
Return the memory used by this block manager.
- memUsedByRdd(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the memory used by the given RDD in this block manager in O(1) time.
- merge(R) - Method in class org.apache.spark.Accumulable
-
Merge two accumulable objects together
- merge(LogisticAggregator) - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
Merge another LogisticAggregator, and update the loss and gradient
of the objective function.
- merge(VectorIndexer.CategoryStats) - Method in class org.apache.spark.ml.feature.VectorIndexer.CategoryStats
-
Merge with another instance, modifying this instance.
- merge(LeastSquaresAggregator) - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
Merge another LeastSquaresAggregator, and update the loss and gradient
of the objective function.
- merge(IDF.DocumentFrequencyAggregator) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
-
Merges another.
- merge(MultivariateOnlineSummarizer) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Merge another MultivariateOnlineSummarizer, and update the statistical summary.
- merge(MutableAggregationBuffer, Row) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Merges two aggregation buffers and stores the updated buffer values back to buffer1.
- merge(double) - Method in class org.apache.spark.util.StatCounter
-
Add a value into this StatCounter, updating the internal statistics.
- merge(TraversableOnce<Object>) - Method in class org.apache.spark.util.StatCounter
-
Add multiple values into this StatCounter, updating the internal statistics.
- merge(StatCounter) - Method in class org.apache.spark.util.StatCounter
-
Merge another StatCounter into this one, adding up the internal statistics.
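The three merge variants above (one value, many values, another counter) can be illustrated with a small running-statistics class. This is a sketch, not Spark's StatCounter: the class name `RunningStats`, its fields, and the choice of the standard pairwise update for count, mean, and sum of squared deviations are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    n: int = 0         # number of values seen
    mean: float = 0.0  # running mean
    m2: float = 0.0    # sum of squared deviations from the mean

    def add(self, x: float) -> "RunningStats":
        """Fold a single value into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Fold another summary into this one without revisiting the raw values."""
        if other.n == 0:
            return self
        if self.n == 0:
            self.n, self.mean, self.m2 = other.n, other.mean, other.m2
            return self
        total = self.n + other.n
        delta = other.mean - self.mean
        self.mean += delta * other.n / total
        self.m2 += other.m2 + delta * delta * self.n * other.n / total
        self.n = total
        return self

    def variance(self) -> float:
        """Population variance of everything merged so far."""
        return self.m2 / self.n if self.n else float("nan")
```

Because each summary carries only three numbers, partial results computed on separate partitions can be combined in any order.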
- mergeCombiners() - Method in class org.apache.spark.Aggregator
-
- mergeValue() - Method in class org.apache.spark.Aggregator
-
- message() - Method in class org.apache.spark.FetchFailed
-
- message() - Method in exception org.apache.spark.sql.AnalysisException
-
- Metadata - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
- Metadata() - Constructor for class org.apache.spark.sql.types.Metadata
-
No-arg constructor for kryo.
- metadata() - Method in class org.apache.spark.sql.types.StructField
-
- metadata() - Method in class org.apache.spark.streaming.scheduler.StreamInputInfo
-
- METADATA_KEY_DESCRIPTION() - Static method in class org.apache.spark.streaming.scheduler.StreamInputInfo
-
The key for description in StreamInputInfo.metadata.
- MetadataBuilder - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
- MetadataBuilder() - Constructor for class org.apache.spark.sql.types.MetadataBuilder
-
- metadataDescription() - Method in class org.apache.spark.streaming.scheduler.StreamInputInfo
-
- metadataHive() - Method in class org.apache.spark.sql.hive.HiveContext
-
The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
- method() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- MethodIdentifier<T> - Class in org.apache.spark.util
-
Helper class to identify a method.
- MethodIdentifier(Class<T>, String, String) - Constructor for class org.apache.spark.util.MethodIdentifier
-
- metricName() - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
param for metric name in evaluation
Default: areaUnderROC
- metricName() - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
param for metric name in evaluation (supports "f1" (default), "precision", "recall",
"weightedPrecision", "weightedRecall")
- metricName() - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
param for metric name in evaluation (supports "rmse" (default), "mse", "r2", and "mae")
- metrics() - Method in class org.apache.spark.ExceptionFailure
-
- metricsSystem() - Method in class org.apache.spark.SparkContext
-
- metricsSystem() - Method in class org.apache.spark.SparkEnv
-
- MFDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate RDD(s) containing data for Matrix Factorization.
- MFDataGenerator() - Constructor for class org.apache.spark.mllib.util.MFDataGenerator
-
- microF1Measure() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns micro-averaged label-based f1-measure
(equal to micro-averaged document-based f1-measure)
- microPrecision() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns micro-averaged label-based precision
(equal to micro-averaged document-based precision)
- microRecall() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns micro-averaged label-based recall
(equal to micro-averaged document-based recall)
- milliseconds() - Method in class org.apache.spark.streaming.Duration
-
- milliseconds(long) - Static method in class org.apache.spark.streaming.Durations
-
- Milliseconds - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing
a given number of milliseconds.
- Milliseconds() - Constructor for class org.apache.spark.streaming.Milliseconds
-
- milliseconds() - Method in class org.apache.spark.streaming.Time
-
- millisToString(long) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
Reformat a time interval in milliseconds to a prettier format for output
- min() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Returns the minimum element from this RDD as defined by
the default comparator (natural order).
- min(Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the minimum element from this RDD as defined by the specified
Comparator[T].
- min() - Method in class org.apache.spark.ml.attribute.NumericAttribute
-
- min() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Minimum value of each dimension.
- min() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Minimum value of each column.
- min(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the min of this RDD as defined by the implicit Ordering[T].
- min(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the minimum value of the expression in a group.
- min(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the minimum value of the column in a group.
- min(String...) - Method in class org.apache.spark.sql.GroupedData
-
Compute the min value for each numeric column for each group.
- min(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
-
Compute the min value for each numeric column for each group.
- min(Duration) - Method in class org.apache.spark.streaming.Duration
-
- min(Time) - Method in class org.apache.spark.streaming.Time
-
- min() - Method in class org.apache.spark.util.StatCounter
-
- minDocFreq() - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
-
- minDocFreq() - Method in class org.apache.spark.mllib.feature.IDF
-
- minInfoGain() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- minInstancesPerNode() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- MinMax() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- MinMaxScaler - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Rescale each feature individually to a common range [min, max] linearly using column summary
statistics, which is also known as min-max normalization or Rescaling.
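The linear rescaling described above can be sketched on a single column of values. This is plain Python under stated assumptions, not the Spark transformer: the function name `min_max_scale`, its parameters, and the handling of a constant column (mapped to the midpoint of the target range) are illustrative choices.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column has no spread, so map it to the midpoint
        midpoint = 0.5 * (new_min + new_max)
        return [midpoint for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]
```

The column minimum and maximum play the role of the "column summary statistics" mentioned in the entry.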
- MinMaxScaler(String) - Constructor for class org.apache.spark.ml.feature.MinMaxScaler
-
- MinMaxScaler() - Constructor for class org.apache.spark.ml.feature.MinMaxScaler
-
- MinMaxScalerModel - Class in org.apache.spark.ml.feature
-
- minTokenLength() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
Minimum token length, >= 0.
- minus(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- minus(VertexRDD<VD>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- minus(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.VertexRDD
-
For each VertexId present in both this and other, minus will act as a set difference
operation returning only those unique VertexId's present in this.
- minus(VertexRDD<VD>) - Method in class org.apache.spark.graphx.VertexRDD
-
For each VertexId present in both this and other, minus will act as a set difference
operation returning only those unique VertexId's present in this.
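The set-difference behaviour described for minus can be pictured with two dictionaries keyed by vertex ID. This is a minimal plain-Python sketch that assumes vertices are represented as `{vertex_id: attribute}` mappings. It does not use GraphX, and the function name is hypothetical.

```python
def minus(this, other):
    """Keep only the vertices of `this` whose IDs do not appear in `other`."""
    return {vid: attr for vid, attr in this.items() if vid not in other}
```

Only the keys of `other` matter here: its attributes are ignored, and vertices present only in `other` never appear in the result.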
- minus(Object) - Method in class org.apache.spark.sql.Column
-
Subtraction.
- minus(Duration) - Method in class org.apache.spark.streaming.Duration
-
- minus(Time) - Method in class org.apache.spark.streaming.Time
-
- minus(Duration) - Method in class org.apache.spark.streaming.Time
-
- minute(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the minutes as an integer from a given date/timestamp/string.
- minutes() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- minutes(long) - Static method in class org.apache.spark.streaming.Durations
-
- Minutes - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing
a given number of minutes.
- Minutes() - Constructor for class org.apache.spark.streaming.Minutes
-
- minVal() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- mkString() - Method in interface org.apache.spark.sql.Row
-
Displays all elements of this sequence in a string (without a separator).
- mkString(String) - Method in interface org.apache.spark.sql.Row
-
Displays all elements of this sequence in a string using a separator string.
- mkString(String, String, String) - Method in interface org.apache.spark.sql.Row
-
Displays all elements of this traversable or iterator in a string using
start, end, and separator strings.
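The three mkString variants differ only in which delimiters are supplied. A rough plain-Python equivalent, assuming the argument order (start, separator, end) and a hypothetical helper name `mk_string`, looks like this:

```python
def mk_string(elements, start="", sep="", end=""):
    """Join the string form of each element, wrapped in optional start and end markers."""
    return start + sep.join(str(e) for e in elements) + end
```

Calling it with no delimiters concatenates the elements directly, which mirrors the no-argument variant.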
- MLPairRDDFunctions<K,V> - Class in org.apache.spark.mllib.rdd
-
Machine learning specific Pair RDD functions.
- MLPairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Constructor for class org.apache.spark.mllib.rdd.MLPairRDDFunctions
-
- MLUtils - Class in org.apache.spark.mllib.util
-
Helper methods to load, save and pre-process data used in MLlib.
- MLUtils() - Constructor for class org.apache.spark.mllib.util.MLUtils
-
- mod(Object) - Method in class org.apache.spark.sql.Column
-
Modulo (a.k.a. remainder) expression.
- mode(SaveMode) - Method in class org.apache.spark.sql.DataFrameWriter
-
Specifies the behavior when data or table already exists.
- mode(String) - Method in class org.apache.spark.sql.DataFrameWriter
-
Specifies the behavior when data or table already exists.
- Model<M extends Model<M>> - Class in org.apache.spark.ml
-
- Model() - Constructor for class org.apache.spark.ml.Model
-
- model() - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
- model() - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
- model() - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
The model to be updated and used for prediction.
- model() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
- models() - Method in class org.apache.spark.ml.classification.OneVsRestModel
-
- modelType() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- modificationTime() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
-
- MODULE$ - Static variable in class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.graphx.PartitionStrategy.EdgePartition1D$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.graphx.PartitionStrategy.RandomVertexCut$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.ml.recommendation.ALS.Rating$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.SparkContext.IntAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.SparkContext.LongAccumulatorParam$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus$
-
Static reference to the singleton instance of this Scala object.
- MODULE$ - Static variable in class org.apache.spark.util.Vector.VectorAccumParam$
-
Static reference to the singleton instance of this Scala object.
- monotonicallyIncreasingId() - Static method in class org.apache.spark.sql.functions
-
A column expression that generates monotonically increasing 64-bit integers.
- month(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the month as an integer from a given date/timestamp/string.
- months_between(Column, Column) - Static method in class org.apache.spark.sql.functions
-
- MQTTUtils - Class in org.apache.spark.streaming.mqtt
-
- MQTTUtils() - Constructor for class org.apache.spark.streaming.mqtt.MQTTUtils
-
- mu() - Method in class org.apache.spark.mllib.stat.distribution.MultivariateGaussian
-
- MulticlassClassificationEvaluator - Class in org.apache.spark.ml.evaluation
-
:: Experimental ::
Evaluator for multiclass classification, which expects two input columns: score and label.
- MulticlassClassificationEvaluator(String) - Constructor for class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- MulticlassClassificationEvaluator() - Constructor for class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- MulticlassMetrics - Class in org.apache.spark.mllib.evaluation
-
:: Experimental ::
Evaluator for multiclass classification.
- MulticlassMetrics(RDD<Tuple2<Object, Object>>) - Constructor for class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
- MultilabelMetrics - Class in org.apache.spark.mllib.evaluation
-
Evaluator for multilabel classification.
- MultilabelMetrics(RDD<Tuple2<double[], double[]>>) - Constructor for class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
- multiLabelValidator(int) - Static method in class org.apache.spark.mllib.util.DataValidators
-
Function to check if labels used for k class multi-label classification are
in the range of {0, 1, ..., k - 1}.
- MultilayerPerceptronClassificationModel - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Classification model based on the Multilayer Perceptron.
- MultilayerPerceptronClassifier - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Classifier trainer based on the Multilayer Perceptron.
- MultilayerPerceptronClassifier(String) - Constructor for class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- MultilayerPerceptronClassifier() - Constructor for class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- Multinomial() - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
String name for multinomial model type.
- multiply(BlockMatrix) - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
- multiply(Matrix) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Multiply this matrix by a local matrix on the right.
- multiply(Matrix) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Multiply this matrix by a local matrix on the right.
- multiply(DenseMatrix) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Convenience method for `Matrix`-`DenseMatrix` multiplication.
- multiply(DenseVector) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Convenience method for `Matrix`-`DenseVector` multiplication.
- multiply(Vector) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Convenience method for `Matrix`-`Vector` multiplication.
- multiply(Object) - Method in class org.apache.spark.sql.Column
-
Multiplication of this expression and another expression.
- multiply(double) - Method in class org.apache.spark.util.Vector
-
- MultivariateGaussian - Class in org.apache.spark.mllib.stat.distribution
-
:: DeveloperApi ::
This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.
- MultivariateGaussian(Vector, Matrix) - Constructor for class org.apache.spark.mllib.stat.distribution.MultivariateGaussian
-
- MultivariateOnlineSummarizer - Class in org.apache.spark.mllib.stat
-
:: DeveloperApi ::
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for samples in sparse or dense vector
format in an online fashion.
- MultivariateOnlineSummarizer() - Constructor for class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
- MultivariateStatisticalSummary - Interface in org.apache.spark.mllib.stat
-
Trait for multivariate statistical summary of a data matrix.
- mustCheckpoint() - Method in class org.apache.spark.streaming.dstream.DStream
-
- MutableAggregationBuffer - Class in org.apache.spark.sql.expressions
-
:: Experimental ::
A Row representing a mutable aggregation buffer.
- MutableAggregationBuffer() - Constructor for class org.apache.spark.sql.expressions.MutableAggregationBuffer
-
- MutablePair<T1,T2> - Class in org.apache.spark.util
-
:: DeveloperApi ::
A tuple of 2 elements.
- MutablePair(T1, T2) - Constructor for class org.apache.spark.util.MutablePair
-
- MutablePair() - Constructor for class org.apache.spark.util.MutablePair
-
No-arg constructor for serialization
- myName() - Method in class org.apache.spark.util.InnerClosureFinder
-
- MySQLDialect - Class in org.apache.spark.sql.jdbc
-
:: DeveloperApi ::
Default MySQL dialect to read bit/bitsets correctly.
- MySQLDialect() - Constructor for class org.apache.spark.sql.jdbc.MySQLDialect
-
- p() - Method in class org.apache.spark.ml.feature.Normalizer
-
Normalization in L^p space.
- pageRank(double, double) - Method in class org.apache.spark.graphx.GraphOps
-
Run a dynamic version of PageRank returning a graph with vertex attributes containing the
PageRank and edge attributes containing the normalized edge weight.
- PageRank - Class in org.apache.spark.graphx.lib
-
PageRank algorithm implementation.
- PageRank() - Constructor for class org.apache.spark.graphx.lib.PageRank
-
- PairDStreamFunctions<K,V> - Class in org.apache.spark.streaming.dstream
-
Extra functions available on DStream of (key, value) pairs through an implicit conversion.
- PairDStreamFunctions(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Constructor for class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
- PairFlatMapFunction<T,K,V> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more key-value pair records from each input record.
- PairFunction<T,K,V> - Interface in org.apache.spark.api.java.function
-
A function that returns key-value pairs (Tuple2<K, V>), and can be used to
construct PairRDDs.
- PairRDDFunctions<K,V> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
- PairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Constructor for class org.apache.spark.rdd.PairRDDFunctions
-
- PairwiseRRDD<T> - Class in org.apache.spark.api.r
-
Form an RDD[(Int, Array[Byte])] from key-value pairs returned from R.
- PairwiseRRDD(RDD<T>, int, byte[], String, byte[], Object[], ClassTag<T>) - Constructor for class org.apache.spark.api.r.PairwiseRRDD
-
- parallelize(List<T>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelize(List<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelize(Seq<T>, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizeDoubles(List<Double>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizeDoubles(List<Double>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizePairs(List<Tuple2<K, V>>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizePairs(List<Tuple2<K, V>>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- Param<T> - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
A param with self-contained documentation and optionally default value.
- Param(String, String, String, Function1<T, Object>) - Constructor for class org.apache.spark.ml.param.Param
-
- Param(Identifiable, String, String, Function1<T, Object>) - Constructor for class org.apache.spark.ml.param.Param
-
- Param(String, String, String) - Constructor for class org.apache.spark.ml.param.Param
-
- Param(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.Param
-
- param() - Method in class org.apache.spark.ml.param.ParamPair
-
- ParamGridBuilder - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
Builder for a param grid used in grid search-based model selection.
- ParamGridBuilder() - Constructor for class org.apache.spark.ml.tuning.ParamGridBuilder
-
- ParamMap - Class in org.apache.spark.ml.param
-
:: Experimental ::
A param to value map.
- ParamMap() - Constructor for class org.apache.spark.ml.param.ParamMap
-
Creates an empty param map.
- paramMap() - Method in interface org.apache.spark.ml.param.Params
-
Internal param map for user-supplied values.
- ParamPair<T> - Class in org.apache.spark.ml.param
-
:: Experimental ::
A param and its value.
- ParamPair(Param<T>, T) - Constructor for class org.apache.spark.ml.param.ParamPair
-
- Params - Interface in org.apache.spark.ml.param
-
:: DeveloperApi ::
Trait for components that take parameters.
- params() - Method in interface org.apache.spark.ml.param.Params
-
Returns all params sorted by their names.
- ParamValidators - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Factory methods for common validation functions for Param.isValid
.
- ParamValidators() - Constructor for class org.apache.spark.ml.param.ParamValidators
-
- parent() - Method in class org.apache.spark.ml.Model
-
The parent estimator that produced this model.
- parent() - Method in class org.apache.spark.ml.param.Param
-
- parent(int, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Returns the jth parent RDD: e.g.
- parentIds() - Method in class org.apache.spark.scheduler.StageInfo
-
- parentIds() - Method in class org.apache.spark.storage.RDDInfo
-
- parentIndex(int) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Get the parent index of the given node, or 0 if it is the root.
- parquet(String...) - Method in class org.apache.spark.sql.DataFrameReader
-
Loads a Parquet file, returning the result as a
DataFrame
.
- parquet(Seq<String>) - Method in class org.apache.spark.sql.DataFrameReader
-
Loads a Parquet file, returning the result as a
DataFrame
.
- parquet(String) - Method in class org.apache.spark.sql.DataFrameWriter
-
Saves the content of the
DataFrame
in Parquet format at the specified path.
- parquetFile(String...) - Method in class org.apache.spark.sql.SQLContext
-
Deprecated.
As of 1.4.0, replaced by read().parquet()
.
- parquetFile(Seq<String>) - Method in class org.apache.spark.sql.SQLContext
-
- parse(String) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Parses a string resulting from
Vector.toString
into a
Vector
.
- parse(String) - Static method in class org.apache.spark.mllib.regression.LabeledPoint
-
Parses a string resulting from
LabeledPoint#toString
into a
LabeledPoint
.
- parseDataType(String) - Method in class org.apache.spark.sql.SQLContext
-
- parseIgnoreCase(Class<E>, String) - Static method in class org.apache.spark.util.EnumUtil
-
- parseSql(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
- parseSql(String) - Method in class org.apache.spark.sql.SQLContext
-
- PartialResult<R> - Class in org.apache.spark.partial
-
- PartialResult(R, boolean) - Constructor for class org.apache.spark.partial.PartialResult
-
- Partition - Interface in org.apache.spark
-
An identifier for a partition in an RDD.
- partition() - Method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
-
- partition() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
- partitionBy(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a copy of the RDD partitioned using the specified partitioner.
- partitionBy(PartitionStrategy) - Method in class org.apache.spark.graphx.Graph
-
Repartitions the edges in the graph according to partitionStrategy
.
- partitionBy(PartitionStrategy, int) - Method in class org.apache.spark.graphx.Graph
-
Repartitions the edges in the graph according to partitionStrategy
.
- partitionBy(PartitionStrategy) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- partitionBy(PartitionStrategy, int) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- partitionBy(Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return a copy of the RDD partitioned using the specified partitioner.
- partitionBy(String...) - Method in class org.apache.spark.sql.DataFrameWriter
-
Partitions the output by the given columns on the file system.
- partitionBy(Seq<String>) - Method in class org.apache.spark.sql.DataFrameWriter
-
Partitions the output by the given columns on the file system.
- partitionBy(String, String...) - Static method in class org.apache.spark.sql.expressions.Window
-
Creates a
WindowSpec
with the partitioning defined.
- partitionBy(Column...) - Static method in class org.apache.spark.sql.expressions.Window
-
Creates a
WindowSpec
with the partitioning defined.
- partitionBy(String, Seq<String>) - Static method in class org.apache.spark.sql.expressions.Window
-
Creates a
WindowSpec
with the partitioning defined.
- partitionBy(Seq<Column>) - Static method in class org.apache.spark.sql.expressions.Window
-
Creates a
WindowSpec
with the partitioning defined.
- partitionBy(String, String...) - Method in class org.apache.spark.sql.expressions.WindowSpec
-
- partitionBy(Column...) - Method in class org.apache.spark.sql.expressions.WindowSpec
-
- partitionBy(String, Seq<String>) - Method in class org.apache.spark.sql.expressions.WindowSpec
-
- partitionBy(Seq<Column>) - Method in class org.apache.spark.sql.expressions.WindowSpec
-
- PartitionCoalescer - Class in org.apache.spark.rdd
-
Coalesce the partitions of a parent RDD (prev
) into fewer partitions, so that each partition of
this RDD computes one or more of the parent ones.
- PartitionCoalescer(int, RDD<?>, double) - Constructor for class org.apache.spark.rdd.PartitionCoalescer
-
- PartitionCoalescer.LocationIterator - Class in org.apache.spark.rdd
-
- PartitionCoalescer.LocationIterator(RDD<?>) - Constructor for class org.apache.spark.rdd.PartitionCoalescer.LocationIterator
-
- partitionColumns() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
Partition columns.
- partitioner() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The partitioner of this RDD.
- partitioner() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
If partitionsRDD
already has a partitioner, use it.
- partitioner() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- Partitioner - Class in org.apache.spark
-
An object that defines how the elements in a key-value pair RDD are partitioned by key.
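As a rough illustration of what "partitioned by key" means, the standalone Java sketch below maps a key to one of numPartitions buckets with a non-negative modulo of its hash code. It does not use Spark. The names HashPartitionSketch and partitionFor are hypothetical, and sending null keys to partition 0 is an assumption of this sketch rather than documented behavior.

```java
// Hypothetical, Spark-free sketch of hash partitioning by key.
class HashPartitionSketch {
    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumption of this sketch: null keys go to partition 0.
        if (key == null) {
            return 0;
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("spark", 4));
        System.out.println(partitionFor(-7, 4));
    }
}
```

Equal keys always hash to the same index, which is what lets all values for a key end up in the same partition.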
- Partitioner() - Constructor for class org.apache.spark.Partitioner
-
- partitioner() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- partitioner() - Method in class org.apache.spark.rdd.RDD
-
Optionally overridden by subclasses to specify how they are partitioned.
- partitioner() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- partitioner() - Method in class org.apache.spark.ShuffleDependency
-
- PartitionGroup - Class in org.apache.spark.rdd
-
- PartitionGroup(Option<String>) - Constructor for class org.apache.spark.rdd.PartitionGroup
-
- partitionID() - Method in class org.apache.spark.TaskCommitDenied
-
- partitionId() - Method in class org.apache.spark.TaskContext
-
The ID of the RDD partition that is computed by this task.
- PartitionPruningRDD<T> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD used to prune RDD partitions so we can avoid launching tasks on
all partitions.
- PartitionPruningRDD(RDD<T>, Function1<Object, Object>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.PartitionPruningRDD
-
- partitions() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Set of partitions in this RDD.
- partitions() - Method in class org.apache.spark.rdd.RDD
-
Get the array of partitions of this RDD, taking into account whether the
RDD is checkpointed or not.
- partitions() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- partitionsRDD() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- partitionsRDD() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- PartitionStrategy - Interface in org.apache.spark.graphx
-
Represents the way edges are assigned to edge partitions based on their source and destination
vertex IDs.
- PartitionStrategy.CanonicalRandomVertexCut$ - Class in org.apache.spark.graphx
-
Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical
direction, resulting in a random vertex cut that colocates all edges between two vertices,
regardless of direction.
- PartitionStrategy.CanonicalRandomVertexCut$() - Constructor for class org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut$
-
- PartitionStrategy.EdgePartition1D$ - Class in org.apache.spark.graphx
-
Assigns edges to partitions using only the source vertex ID, colocating edges with the same
source.
- PartitionStrategy.EdgePartition1D$() - Constructor for class org.apache.spark.graphx.PartitionStrategy.EdgePartition1D$
-
- PartitionStrategy.EdgePartition2D$ - Class in org.apache.spark.graphx
-
Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix,
guaranteeing a 2 * sqrt(numParts)
bound on vertex replication.
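The replication bound can be seen in a simplified, Spark-free sketch: arrange the partitions in a square grid of side ceil(sqrt(numParts)), choose the column from the source ID and the row from the destination ID. A vertex then appears in at most one column (as a source) and one row (as a destination). The class EdgePartition2DSketch and its method are hypothetical, and Spark's own implementation may differ in details such as how IDs are mixed before the modulo.

```java
// Hypothetical, simplified sketch of 2D edge partitioning (not Spark's code).
class EdgePartition2DSketch {
    // Returns a partition index in [0, numParts) for the edge (src, dst).
    static int partitionFor(long src, long dst, int numParts) {
        if (numParts <= 0) {
            throw new IllegalArgumentException("numParts must be positive");
        }
        // Side length of the square grid that covers numParts cells.
        int side = (int) Math.ceil(Math.sqrt(numParts));
        // Column depends only on the source, row only on the destination.
        int col = (int) Math.floorMod(src, (long) side);
        int row = (int) Math.floorMod(dst, (long) side);
        // Fold the grid cell back into the valid partition range.
        return (col * side + row) % numParts;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(4L, 5L, 9));
    }
}
```

With numParts = 9 the grid is 3 x 3, so the edges of any single vertex touch at most 3 + 3 = 6 = 2 * sqrt(9) partitions.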
- PartitionStrategy.EdgePartition2D$() - Constructor for class org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
-
- PartitionStrategy.RandomVertexCut$ - Class in org.apache.spark.graphx
-
Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a
random vertex cut that colocates all same-direction edges between two vertices.
- PartitionStrategy.RandomVertexCut$() - Constructor for class org.apache.spark.graphx.PartitionStrategy.RandomVertexCut$
-
- path() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- path() - Method in class org.apache.spark.scheduler.SplitInfo
-
- path() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
-
- paths() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
Base paths of this relation.
- pattern() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
Regex pattern used to match delimiters if gaps
is true or tokens if gaps
is false.
- pc() - Method in class org.apache.spark.mllib.feature.PCAModel
-
- PCA - Class in org.apache.spark.ml.feature
-
:: Experimental ::
PCA trains a model to project vectors to a low-dimensional space using PCA.
- PCA(String) - Constructor for class org.apache.spark.ml.feature.PCA
-
- PCA() - Constructor for class org.apache.spark.ml.feature.PCA
-
- PCA - Class in org.apache.spark.mllib.feature
-
A feature transformer that projects vectors to a low-dimensional space using PCA.
- PCA(int) - Constructor for class org.apache.spark.mllib.feature.PCA
-
- PCAModel - Class in org.apache.spark.ml.feature
-
- PCAModel - Class in org.apache.spark.mllib.feature
-
Model fitted by
PCA
that can project vectors to a low-dimensional space using PCA.
- pdf(Vector) - Method in class org.apache.spark.mllib.stat.distribution.MultivariateGaussian
-
Returns density of this multivariate Gaussian at given point, x
- pendingStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- percentiles() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- percentilesHeader() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- percentRank() - Static method in class org.apache.spark.sql.functions
-
Window function: returns the relative rank (i.e.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.graphx.Graph
-
Caches the vertices and edges associated with this graph at the specified storage level,
ignoring any target storage levels previously set.
- persist(StorageLevel) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
Persists the edge partitions at the specified storage level, ignoring any existing target
storage level.
- persist(StorageLevel) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- persist(StorageLevel) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
Persists the vertex partitions at the specified storage level, ignoring any existing target
storage level.
- persist(StorageLevel) - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Persists the underlying RDD with the specified storage level.
- persist(StorageLevel) - Method in class org.apache.spark.rdd.HadoopRDD
-
- persist(StorageLevel) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- persist(StorageLevel) - Method in class org.apache.spark.rdd.RDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- persist() - Method in class org.apache.spark.sql.DataFrame
-
- persist(StorageLevel) - Method in class org.apache.spark.sql.DataFrame
-
- persist() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persist(StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist the RDDs of this DStream with the given storage level
- persist() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persist(StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist the RDDs of this DStream with the given storage level
- persist(StorageLevel) - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist the RDDs of this DStream with the given storage level
- persist() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persistentRdds() - Method in class org.apache.spark.SparkContext
-
- personalizedPageRank(long, double, double) - Method in class org.apache.spark.graphx.GraphOps
-
Run personalized PageRank for a given vertex, such that all random walks
are started relative to the source node.
- pi() - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- pi() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- pickBin(Partition) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
Takes a parent RDD partition and decides which of the partition groups to put it in.
Takes locality into account, but also uses power of 2 choices to load balance.
It strikes a balance between the two using the balanceSlack variable.
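The "power of 2 choices" idea mentioned here can be sketched on its own: for each item, sample two candidate bins at random and place the item in the less loaded one. The sketch below is Spark-free and deliberately ignores locality and balanceSlack. The names PowerOfTwoChoicesSketch, lessLoaded, and assign are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of power-of-two-choices load balancing (no locality).
class PowerOfTwoChoicesSketch {
    // Returns whichever candidate bin currently holds fewer items; ties go to a.
    static int lessLoaded(int[] loads, int a, int b) {
        return loads[a] <= loads[b] ? a : b;
    }

    // Places `items` items into `bins` bins, sampling two candidates per item.
    static int[] assign(int items, int bins, long seed) {
        Random rng = new Random(seed);
        int[] loads = new int[bins];
        for (int i = 0; i < items; i++) {
            int chosen = lessLoaded(loads, rng.nextInt(bins), rng.nextInt(bins));
            loads[chosen]++;
        }
        return loads;
    }

    public static void main(String[] args) {
        for (int load : assign(100, 8, 42L)) {
            System.out.println(load);
        }
    }
}
```

Comparing only two random candidates is cheap, yet it tends to keep bin sizes much more even than picking a single random bin.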
- pickRandomVertex() - Method in class org.apache.spark.graphx.GraphOps
-
Picks a random vertex from the graph and returns its ID.
- pipe(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(List<String>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(List<String>, Map<String, String>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(String) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- pipe(String, Map<String, String>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- pipe(Seq<String>, Map<String, String>, Function1<Function1<String, BoxedUnit>, BoxedUnit>, Function2<T, Function1<String, BoxedUnit>, BoxedUnit>, boolean) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- Pipeline - Class in org.apache.spark.ml
-
:: Experimental ::
A simple pipeline, which acts as an estimator.
- Pipeline(String) - Constructor for class org.apache.spark.ml.Pipeline
-
- Pipeline() - Constructor for class org.apache.spark.ml.Pipeline
-
- PipelineModel - Class in org.apache.spark.ml
-
:: Experimental ::
Represents a fitted pipeline.
- PipelineStage - Class in org.apache.spark.ml
-
- PipelineStage() - Constructor for class org.apache.spark.ml.PipelineStage
-
- planner() - Method in class org.apache.spark.sql.hive.HiveContext
-
- planner() - Method in class org.apache.spark.sql.SQLContext
-
- plus(Object) - Method in class org.apache.spark.sql.Column
-
Sum of this expression and another expression.
- plus(Duration) - Method in class org.apache.spark.streaming.Duration
-
- plus(Duration) - Method in class org.apache.spark.streaming.Time
-
- plusDot(Vector, Vector) - Method in class org.apache.spark.util.Vector
-
Returns (this + plus) dot other, without creating any intermediate storage.
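A minimal sketch of the computation, assuming plain double arrays in place of org.apache.spark.util.Vector: the sum and the dot product are fused into a single loop, so no temporary vector for (this + plus) is allocated. PlusDotSketch and its parameter names are hypothetical.

```java
// Hypothetical sketch: (self + plus) dot other in one pass over the arrays.
class PlusDotSketch {
    static double plusDot(double[] self, double[] plus, double[] other) {
        if (self.length != plus.length || self.length != other.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double acc = 0.0;
        for (int i = 0; i < self.length; i++) {
            // Add element-wise and multiply immediately; no temporary array.
            acc += (self[i] + plus[i]) * other[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(plusDot(new double[] {1, 2}, new double[] {3, 4}, new double[] {5, 6}));
    }
}
```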
- PMMLExportable - Interface in org.apache.spark.mllib.pmml
-
:: DeveloperApi ::
Export model to the PMML format.
Predictive Model Markup Language (PMML) is an XML-based file format
developed by the Data Mining Group (www.dmg.org).
- pmod(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the positive value of dividend mod divisor.
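For intuition, here is a Spark-free sketch of a positive modulo on plain ints, assuming a positive divisor. Java's % operator takes the sign of the dividend, so a negative remainder is shifted back into range. PmodSketch is a hypothetical name, and this is an illustration of the idea rather than the code behind the pmod column function.

```java
// Hypothetical sketch of a positive modulo, assuming divisor > 0.
class PmodSketch {
    static int pmod(int dividend, int divisor) {
        // In Java, -7 % 3 is -1 because the result follows the dividend's sign.
        int r = dividend % divisor;
        // Shift negative remainders into [0, divisor).
        return r < 0 ? (r + divisor) % divisor : r;
    }

    public static void main(String[] args) {
        System.out.println(pmod(-7, 3));
        System.out.println(pmod(7, 3));
    }
}
```

For a positive divisor this agrees with java.lang.Math.floorMod.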
- point() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- POINTS() - Static method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
- PoissonGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d.
- PoissonGenerator(double) - Constructor for class org.apache.spark.mllib.random.PoissonGenerator
-
- poissonJavaRDD(JavaSparkContext, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaRDD(JavaSparkContext, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaRDD(JavaSparkContext, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaVectorRDD(JavaSparkContext, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaVectorRDD(JavaSparkContext, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaVectorRDD(JavaSparkContext, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonRDD(SparkContext, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD comprised of i.i.d.
samples from the Poisson distribution with the input
mean.
- PoissonSampler<T> - Class in org.apache.spark.util.random
-
:: DeveloperApi ::
A sampler for sampling with replacement, based on values drawn from Poisson distribution.
- PoissonSampler(double, boolean, ClassTag<T>) - Constructor for class org.apache.spark.util.random.PoissonSampler
-
- PoissonSampler(double, ClassTag<T>) - Constructor for class org.apache.spark.util.random.PoissonSampler
-
- poissonVectorRDD(SparkContext, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD[Vector] with vectors containing i.i.d.
samples drawn from the
Poisson distribution with the input mean.
- PolynomialExpansion - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Perform feature expansion in a polynomial space.
- PolynomialExpansion(String) - Constructor for class org.apache.spark.ml.feature.PolynomialExpansion
-
- PolynomialExpansion() - Constructor for class org.apache.spark.ml.feature.PolynomialExpansion
-
- poolToActiveStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- port() - Method in class org.apache.spark.storage.BlockManagerId
-
- port() - Method in class org.apache.spark.streaming.kafka.Broker
-
Broker's port
- PortableDataStream - Class in org.apache.spark.input
-
A class that allows DataStreams to be serialized and moved around by not creating them
until they need to be read
- PortableDataStream(CombineFileSplit, TaskAttemptContext, Integer) - Constructor for class org.apache.spark.input.PortableDataStream
-
- PostgresDialect - Class in org.apache.spark.sql.jdbc
-
:: DeveloperApi ::
Default postgres dialect, mapping bit/cidr/inet on read and string/binary/boolean on write.
- PostgresDialect() - Constructor for class org.apache.spark.sql.jdbc.PostgresDialect
-
- pow(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(Column, String) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(String, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(String, String) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(Column, double) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(String, double) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(double, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- pow(double, String) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the first argument raised to the power of the second argument.
- PowerIterationClustering - Class in org.apache.spark.mllib.clustering
-
- PowerIterationClustering() - Constructor for class org.apache.spark.mllib.clustering.PowerIterationClustering
-
- PowerIterationClustering.Assignment - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
Cluster assignment.
- PowerIterationClustering.Assignment(long, int) - Constructor for class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
-
- PowerIterationClustering.Assignment$ - Class in org.apache.spark.mllib.clustering
-
- PowerIterationClustering.Assignment$() - Constructor for class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment$
-
- PowerIterationClusteringModel - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- PowerIterationClusteringModel(int, RDD<PowerIterationClustering.Assignment>) - Constructor for class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- pr() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Returns the precision-recall curve, which is a DataFrame containing
two fields, recall and precision, with (0.0, 1.0) prepended to it.
- pr() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the precision-recall curve, which is an RDD of (recall, precision),
NOT (precision, recall), with (0.0, 1.0) prepended to it.
- precision(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns precision for a given label (category)
- precision() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns precision
- precision() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns document-based precision averaged by the number of documents
- precision(double) - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns precision for a given label (category)
- precision() - Method in class org.apache.spark.sql.types.Decimal
-
- precision() - Method in class org.apache.spark.sql.types.DecimalType
-
- precision() - Method in class org.apache.spark.sql.types.PrecisionInfo
-
- precisionAt(int) - Method in class org.apache.spark.mllib.evaluation.RankingMetrics
-
Compute the average precision of all the queries, truncated at ranking position k.
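To make the metric concrete, the Spark-free sketch below computes precision at k for one query (the fraction of the first k predicted items that are relevant) and then averages it over queries. PrecisionAtKSketch and its method names are hypothetical, and dividing by k even when a query has fewer than k predictions is an assumption of this sketch.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of precision at k, averaged over queries.
class PrecisionAtKSketch {
    // Fraction of the first k predicted items that appear in the relevant set.
    static double precisionAt(List<Integer> predicted, Set<Integer> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int n = Math.min(k, predicted.size());
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (relevant.contains(predicted.get(i))) {
                hits++;
            }
        }
        // Assumption: divide by k even if fewer than k predictions exist.
        return (double) hits / k;
    }

    // Mean of precisionAt over all queries; both lists must be the same size.
    static double meanPrecisionAt(List<List<Integer>> predictions,
                                  List<Set<Integer>> truths, int k) {
        if (predictions.isEmpty() || predictions.size() != truths.size()) {
            throw new IllegalArgumentException("need one truth set per query");
        }
        double sum = 0.0;
        for (int q = 0; q < predictions.size(); q++) {
            sum += precisionAt(predictions.get(q), truths.get(q), k);
        }
        return sum / predictions.size();
    }
}
```

For example, with predictions [1, 2, 3, 4] and relevant items {1, 3, 5}, one of the first two predictions is relevant, so precision at 2 is 0.5.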
- precisionByThreshold() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Returns the (threshold, precision) curve as a DataFrame with two fields.
- precisionByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, precision) curve.
- precisionInfo() - Method in class org.apache.spark.sql.types.DecimalType
-
- PrecisionInfo - Class in org.apache.spark.sql.types
-
Precision parameters for a Decimal
- PrecisionInfo(int, int) - Constructor for class org.apache.spark.sql.types.PrecisionInfo
-
- predict(FeaturesType) - Method in class org.apache.spark.ml.classification.ClassificationModel
-
Predict label for the given features.
- predict(Vector) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- predict(Vector) - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- predict(Vector) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
Predict label for the given feature vector.
- predict(Vector) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
-
Predict label for the given features.
- predict(FeaturesType) - Method in class org.apache.spark.ml.PredictionModel
-
Predict label for the given features.
- predict(Vector) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
- predict(Vector) - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- predict(Vector) - Method in class org.apache.spark.ml.regression.LinearRegressionModel
-
- predict(Vector) - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- predict(RDD<Vector>) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for a single data point using the model trained.
- predict(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for examples stored in a JavaRDD.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- predict(Vector) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
Maps given points to their cluster indices.
- predict(Vector) - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
Maps given point to its cluster index.
- predict(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
Java-friendly version of predict()
- predict(Vector) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Returns the cluster index that a given point belongs to.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Maps given points to their cluster indices.
- predict(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Maps given points to their cluster indices.
- predict(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Predict the rating of one user for one product.
- predict(RDD<Tuple2<Object, Object>>) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Predict the rating of many users for many products.
- predict(JavaPairRDD<Integer, Integer>) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Java-friendly version of MatrixFactorizationModel.predict
.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict values for a single data point using the model trained.
- predict(RDD<Object>) - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
Predict labels for provided features.
- predict(JavaDoubleRDD) - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
Predict labels for provided features.
- predict(double) - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
Predict a single label.
- predict(RDD<Vector>) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for a single data point using the model trained.
- predict(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for examples stored in a JavaRDD.
- predict(Vector) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for a single data point using the model trained.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for the given data set using the model trained.
- predict(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for the given data set using the model trained.
- predict() - Method in class org.apache.spark.mllib.tree.model.Node
-
- predict(Vector) - Method in class org.apache.spark.mllib.tree.model.Node
-
Predicts the value if the node is not a leaf.
- Predict - Class in org.apache.spark.mllib.tree.model
-
Predicted value for a node
param: predict predicted value
param: prob probability of the label (classification only)
- Predict(double, double) - Constructor for class org.apache.spark.mllib.tree.model.Predict
-
- predict() - Method in class org.apache.spark.mllib.tree.model.Predict
-
- prediction() - Method in class org.apache.spark.ml.tree.InternalNode
-
- prediction() - Method in class org.apache.spark.ml.tree.LeafNode
-
- prediction() - Method in class org.apache.spark.ml.tree.Node
-
Prediction a leaf node makes, or which an internal node would make if it were a leaf node
- predictionCol() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
- PredictionModel<FeaturesType,M extends PredictionModel<FeaturesType,M>> - Class in org.apache.spark.ml
-
:: DeveloperApi ::
Abstraction for a model for prediction tasks (regression and classification).
- PredictionModel() - Constructor for class org.apache.spark.ml.PredictionModel
-
- predictions() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
- predictions() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
-
DataFrame output by the model's `transform` method.
- predictions() - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
Predictions associated with the boundaries at the same index, monotone because of isotonic
regression.
- predictions() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
- predictions() - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
- predictOn(DStream<Vector>) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Use the clustering model to make predictions on batches of data from a DStream.
- predictOn(JavaDStream<Vector>) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Java-friendly version of predictOn
.
- predictOn(DStream<Vector>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Use the model to make predictions on batches of data from a DStream
- predictOn(JavaDStream<Vector>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Java-friendly version of predictOn
.
- predictOnValues(DStream<Tuple2<K, Vector>>, ClassTag<K>) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Use the model to make predictions on the values of a DStream and carry over its keys.
- predictOnValues(JavaPairDStream<K, Vector>) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Java-friendly version of predictOnValues.
- predictOnValues(DStream<Tuple2<K, Vector>>, ClassTag<K>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Use the model to make predictions on the values of a DStream and carry over its keys.
- predictOnValues(JavaPairDStream<K, Vector>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Java-friendly version of predictOnValues.
- Predictor<FeaturesType,Learner extends Predictor<FeaturesType,Learner,M>,M extends PredictionModel<FeaturesType,M>> - Class in org.apache.spark.ml
-
:: DeveloperApi ::
Abstraction for prediction problems (regression and classification).
- Predictor() - Constructor for class org.apache.spark.ml.Predictor
-
- predictPoint(Vector, Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- predictPoint(Vector, Vector, double) - Method in class org.apache.spark.mllib.classification.SVMModel
-
- predictPoint(Vector, Vector, double) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict the result given a data point and the weights learned.
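As a rough illustration of what "predict the result given a data point and the weights learned" means for a linear model, the plain-Java sketch below computes the margin as a dot product plus intercept and, for a binary classifier, thresholds a logistic score. It does not call Spark; the class name, method names, and the `threshold` parameter are hypothetical, and the exact behavior of each Spark model subclass should be checked against its own documentation.

```java
// Minimal sketch, not Spark code: all names here are illustrative assumptions.
final class PredictPointSketch {
    // Linear prediction: weights . features + intercept.
    static double linearPredict(double[] features, double[] weights, double intercept) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("features and weights must have the same length");
        }
        double margin = intercept;
        for (int i = 0; i < features.length; i++) {
            margin += weights[i] * features[i];
        }
        return margin;
    }

    // Binary logistic prediction: squash the margin with a sigmoid, then threshold it.
    static double logisticPredict(double[] features, double[] weights,
                                  double intercept, double threshold) {
        double score = 1.0 / (1.0 + Math.exp(-linearPredict(features, weights, intercept)));
        return score > threshold ? 1.0 : 0.0;
    }
}
```

For example, `PredictPointSketch.linearPredict(new double[]{1, 2}, new double[]{0.5, -1}, 3)` evaluates 3 + 0.5 - 2.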
- predictPoint(Vector, Vector, double) - Method in class org.apache.spark.mllib.regression.LassoModel
-
- predictPoint(Vector, Vector, double) - Method in class org.apache.spark.mllib.regression.LinearRegressionModel
-
- predictPoint(Vector, Vector, double) - Method in class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- predictProbabilities(RDD<Vector>) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
Predict values for the given data set using the model trained.
- predictProbabilities(Vector) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
Predict posterior class probabilities for a single data point using the model trained.
- predictProbability(FeaturesType) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
Predict the probability of each class given the features.
- predictRaw(FeaturesType) - Method in class org.apache.spark.ml.classification.ClassificationModel
-
Raw prediction for each possible label.
- predictRaw(Vector) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- predictRaw(Vector) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- predictRaw(Vector) - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- predictRaw(Vector) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- predictSoft(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
Given the input vectors, return the membership value of each vector
to all mixture components.
- predictSoft(Vector) - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
Given the input vector, return the membership values to all mixture components.
- preferredLocation() - Method in class org.apache.spark.streaming.receiver.Receiver
-
Override this to specify a preferred location (hostname).
- preferredLocations(Partition) - Method in class org.apache.spark.rdd.RDD
-
Get the preferred locations of a partition, taking into account whether the
RDD is checkpointed.
- preferredNodeLocationData() - Method in class org.apache.spark.SparkContext
-
- PrefixSpan - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- PrefixSpan() - Constructor for class org.apache.spark.mllib.fpm.PrefixSpan
-
Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.
- PrefixSpan.FreqSequence<Item> - Class in org.apache.spark.mllib.fpm
-
Represents a frequent sequence.
- PrefixSpan.FreqSequence(Object[], long) - Constructor for class org.apache.spark.mllib.fpm.PrefixSpan.FreqSequence
-
- PrefixSpanModel<Item> - Class in org.apache.spark.mllib.fpm
-
Model fitted by PrefixSpan.
param: freqSequences frequent sequences
- PrefixSpanModel(RDD<PrefixSpan.FreqSequence<Item>>) - Constructor for class org.apache.spark.mllib.fpm.PrefixSpanModel
-
- prefLoc() - Method in class org.apache.spark.rdd.PartitionGroup
-
- pregel(A, int, EdgeDirection, Function3<Object, VD, A, VD>, Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, ClassTag<A>) - Method in class org.apache.spark.graphx.GraphOps
-
Execute a Pregel-like iterative vertex-parallel abstraction.
- Pregel - Class in org.apache.spark.graphx
-
Implements a Pregel-like bulk-synchronous message-passing API.
- Pregel() - Constructor for class org.apache.spark.graphx.Pregel
-
- prepareForExecution() - Method in class org.apache.spark.sql.SQLContext
-
- prepareJobForWrite(Job) - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- prettyJson() - Method in class org.apache.spark.sql.types.DataType
-
The pretty (i.e.
- prettyPrint() - Method in class org.apache.spark.streaming.Duration
-
- prev() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- prevHandler() - Method in class org.apache.spark.util.SignalLoggerHandler
-
- primitiveTypes() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- print() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Print the first ten elements of each RDD generated in this DStream.
- print(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Print the first num elements of each RDD generated in this DStream.
- print() - Method in class org.apache.spark.streaming.dstream.DStream
-
Print the first ten elements of each RDD generated in this DStream.
- print(int) - Method in class org.apache.spark.streaming.dstream.DStream
-
Print the first num elements of each RDD generated in this DStream.
- printSchema() - Method in class org.apache.spark.sql.DataFrame
-
Prints the schema to the console in a nice tree format.
- printStats() - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- printTreeString() - Method in class org.apache.spark.sql.types.StructType
-
- Private - Annotation Type in org.apache.spark.annotation
-
A class that is considered private to the internals of Spark; there is a high likelihood
that it will be changed in future versions of Spark.
- prob() - Method in class org.apache.spark.mllib.tree.model.Predict
-
- ProbabilisticClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
-
:: DeveloperApi ::
- ProbabilisticClassificationModel() - Constructor for class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
- ProbabilisticClassifier<FeaturesType,E extends ProbabilisticClassifier<FeaturesType,E,M>,M extends ProbabilisticClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
-
:: DeveloperApi ::
- ProbabilisticClassifier() - Constructor for class org.apache.spark.ml.classification.ProbabilisticClassifier
-
- probabilities() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- probability2prediction(Vector) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- probability2prediction(Vector) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
Given a vector of class conditional probabilities, select the predicted label.
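One common way to "select the predicted label" from a probability vector is to take the index of the largest entry. The plain-Java sketch below shows only that argmax rule; it is an assumption about the default behavior, it ignores any per-class thresholds a model may apply, and the class and method names are hypothetical.

```java
// Minimal sketch, not Spark code: picks the label index with the highest probability.
final class ArgmaxSketch {
    static double probabilityToPrediction(double[] probabilities) {
        if (probabilities.length == 0) {
            throw new IllegalArgumentException("probability vector must be non-empty");
        }
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            // Strict comparison keeps the first index when there is a tie.
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return (double) best; // labels are represented as doubles
    }
}
```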
- probabilityCol() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
- probabilityCol() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
-
Field in "predictions" which gives the calibrated probability of each sample as a vector.
- PROCESS_LOCAL() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- processingDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for all jobs of this batch to finish processing from the time they started
processing.
- processingEndTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- processingStartTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- product() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- productFeatures() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- progressListener() - Method in class org.apache.spark.streaming.StreamingContext
-
- properties() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- properties() - Method in class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- PrunedFilteredScan - Interface in org.apache.spark.sql.sources
-
::DeveloperApi::
A BaseRelation that can eliminate unneeded columns and filter using selected
predicates before producing an RDD containing all matching tuples as Row objects.
- PrunedScan - Interface in org.apache.spark.sql.sources
-
::DeveloperApi::
A BaseRelation that can eliminate unneeded columns before producing an RDD
containing all of its tuples as Row objects.
- pruneFilterProject(Seq<NamedExpression>, Seq<Expression>, Function1<Seq<Expression>, Seq<Expression>>, Function1<Seq<Attribute>, SparkPlan>) - Method in class org.apache.spark.sql.SQLContext.SparkPlanner
-
- Pseudorandom - Interface in org.apache.spark.util.random
-
:: DeveloperApi ::
A class with pseudorandom behavior.
- put(ParamPair<?>...) - Method in class org.apache.spark.ml.param.ParamMap
-
Puts a list of param pairs (overwrites if the input params exist).
- put(Param<T>, T) - Method in class org.apache.spark.ml.param.ParamMap
-
Puts a (param, value) pair (overwrites if the input param exists).
- put(Seq<ParamPair<?>>) - Method in class org.apache.spark.ml.param.ParamMap
-
Puts a list of param pairs (overwrites if the input params exist).
- putBoolean(String, boolean) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a Boolean.
- putBooleanArray(String, boolean[]) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a Boolean array.
- putDouble(String, double) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a Double.
- putDoubleArray(String, double[]) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a Double array.
- putLong(String, long) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a Long.
- putLongArray(String, long[]) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a Long array.
- putMetadata(String, Metadata) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
- putMetadataArray(String, Metadata[]) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
- putString(String, String) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a String.
- putStringArray(String, String[]) - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Puts a String array.
- pValue() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- pValue() - Method in class org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult
-
- pValue() - Method in interface org.apache.spark.mllib.stat.test.TestResult
-
The probability of obtaining a test statistic result at least as extreme as the one that was
actually observed, assuming that the null hypothesis is true.
- pyUDT() - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- pyUDT() - Method in class org.apache.spark.sql.types.UserDefinedType
-
Paired Python UDT class, if it exists.
- R() - Method in class org.apache.spark.mllib.linalg.QRDecomposition
-
- r2() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
Returns R^2^, the coefficient of determination.
- r2() - Method in class org.apache.spark.mllib.evaluation.RegressionMetrics
-
Returns R^2^, the unadjusted coefficient of determination.
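For reference, the unadjusted coefficient of determination is conventionally defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the label mean. The plain-Java sketch below computes that textbook definition; it is not Spark's implementation, and the class and method names are hypothetical.

```java
// Minimal sketch, not Spark code: textbook R^2 = 1 - SS_res / SS_tot.
final class R2Sketch {
    static double r2(double[] labels, double[] predictions) {
        if (labels.length == 0 || labels.length != predictions.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double mean = 0.0;
        for (double y : labels) {
            mean += y;
        }
        mean /= labels.length;

        double ssRes = 0.0; // sum of squared residuals
        double ssTot = 0.0; // total sum of squares around the mean
        for (int i = 0; i < labels.length; i++) {
            double residual = labels[i] - predictions[i];
            double deviation = labels[i] - mean;
            ssRes += residual * residual;
            ssTot += deviation * deviation;
        }
        // Note: if all labels are equal, ssTot is 0 and the result is not finite.
        return 1.0 - ssRes / ssTot;
    }
}
```

A perfect fit gives 1.0, and always predicting the label mean gives 0.0.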
- RACK_LOCAL() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- rand(int, int, Random) - Static method in class org.apache.spark.mllib.linalg.DenseMatrix
-
Generate a DenseMatrix consisting of i.i.d. uniform random numbers.
- rand(int, int, Random) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a DenseMatrix consisting of i.i.d. uniform random numbers.
- rand(long) - Static method in class org.apache.spark.sql.functions
-
Generate a random column with i.i.d.
- rand() - Static method in class org.apache.spark.sql.functions
-
Generate a random column with i.i.d.
- randn(int, int, Random) - Static method in class org.apache.spark.mllib.linalg.DenseMatrix
-
Generate a DenseMatrix consisting of i.i.d. Gaussian random numbers.
- randn(int, int, Random) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a DenseMatrix consisting of i.i.d. Gaussian random numbers.
- randn(long) - Static method in class org.apache.spark.sql.functions
-
Generate a column with i.i.d.
- randn() - Static method in class org.apache.spark.sql.functions
-
Generate a column with i.i.d.
- RANDOM() - Static method in class org.apache.spark.mllib.clustering.KMeans
-
- random(int, Random) - Static method in class org.apache.spark.util.Vector
-
Creates a Vector of given length containing random numbers between 0.0 and 1.0.
- RandomDataGenerator<T> - Interface in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Trait for random data generators that generate i.i.d.
- RandomForest - Class in org.apache.spark.mllib.tree
-
:: Experimental ::
A class that implements a Random Forest learning algorithm for classification and regression.
- RandomForest(Strategy, int, String, int) - Constructor for class org.apache.spark.mllib.tree.RandomForest
-
- RandomForestClassificationModel - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Random Forest model for classification.
- RandomForestClassifier - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Random Forest learning algorithm for classification.
- RandomForestClassifier(String) - Constructor for class org.apache.spark.ml.classification.RandomForestClassifier
-
- RandomForestClassifier() - Constructor for class org.apache.spark.ml.classification.RandomForestClassifier
-
- RandomForestModel - Class in org.apache.spark.mllib.tree.model
-
:: Experimental ::
Represents a random forest model.
- RandomForestModel(Enumeration.Value, DecisionTreeModel[]) - Constructor for class org.apache.spark.mllib.tree.model.RandomForestModel
-
- RandomForestRegressionModel - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Random Forest model for regression.
- RandomForestRegressor - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Random Forest learning algorithm for regression.
- RandomForestRegressor(String) - Constructor for class org.apache.spark.ml.regression.RandomForestRegressor
-
- RandomForestRegressor() - Constructor for class org.apache.spark.ml.regression.RandomForestRegressor
-
- randomRDD(SparkContext, RandomDataGenerator<T>, long, int, long, ClassTag<T>) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
:: DeveloperApi ::
Generates an RDD comprised of i.i.d. samples produced by the input RandomDataGenerator.
- RandomRDDs - Class in org.apache.spark.mllib.random
-
:: Experimental ::
Generator methods for creating RDDs comprised of i.i.d. samples from some distribution.
- RandomRDDs() - Constructor for class org.apache.spark.mllib.random.RandomRDDs
-
- RandomSampler<T,U> - Interface in org.apache.spark.util.random
-
:: DeveloperApi ::
A pseudorandom sampler.
- randomSplit(double[]) - Method in class org.apache.spark.api.java.JavaRDD
-
Randomly splits this RDD with the provided weights.
- randomSplit(double[], long) - Method in class org.apache.spark.api.java.JavaRDD
-
Randomly splits this RDD with the provided weights.
- randomSplit(double[], long) - Method in class org.apache.spark.rdd.RDD
-
Randomly splits this RDD with the provided weights.
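To make "splits with the provided weights" concrete, the single-machine sketch below normalizes the weights, draws one uniform number per element from a seeded generator, and places the element in the split whose cumulative-weight interval contains the draw. This is a simplified illustration in plain Java, not Spark's distributed implementation; `RandomSplitSketch` and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch, not Spark code: weighted random split of an in-memory list.
final class RandomSplitSketch {
    static <T> List<List<T>> randomSplit(List<T> data, double[] weights, long seed) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("at least one weight is required");
        }
        double sum = 0.0;
        for (double w : weights) {
            if (w < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            sum += w;
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }

        // Cumulative upper bounds of each split after normalizing weights to sum to 1.
        double[] upper = new double[weights.length];
        double acc = 0.0;
        for (int i = 0; i < weights.length; i++) {
            acc += weights[i] / sum;
            upper[i] = acc;
        }
        upper[weights.length - 1] = 1.0; // guard against floating-point rounding

        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            splits.add(new ArrayList<>());
        }

        Random rng = new Random(seed); // same seed gives the same split
        for (T item : data) {
            double x = rng.nextDouble(); // uniform in [0, 1)
            int k = 0;
            while (x >= upper[k]) {
                k++;
            }
            splits.get(k).add(item);
        }
        return splits;
    }
}
```

Split sizes only approximate the weights; for weights {0.7, 0.3} roughly 70% of the elements land in the first split, and every element lands in exactly one split.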
- randomSplit(double[], long) - Method in class org.apache.spark.sql.DataFrame
-
Randomly splits this DataFrame with the provided weights.
- randomSplit(double[]) - Method in class org.apache.spark.sql.DataFrame
-
Randomly splits this DataFrame with the provided weights.
- randomVectorRDD(SparkContext, RandomDataGenerator<Object>, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
:: DeveloperApi ::
Generates an RDD[Vector] with vectors containing i.i.d. samples produced by the input RandomDataGenerator.
- range(long, long, long, int) - Method in class org.apache.spark.SparkContext
-
Creates a new RDD[Long] containing elements from start to end (exclusive), increased by step every element.
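The element sequence described above (start inclusive, end exclusive, advancing by step) can be sketched locally as follows. This is plain Java rather than Spark; the handling of negative steps and the rejection of a zero step are assumptions of this sketch, overflow near `Long.MAX_VALUE` is not handled, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, not Spark code: the values a range from start to end (exclusive) contains.
final class RangeSketch {
    static List<Long> range(long start, long end, long step) {
        if (step == 0) {
            throw new IllegalArgumentException("step must be nonzero");
        }
        List<Long> values = new ArrayList<>();
        if (step > 0) {
            for (long v = start; v < end; v += step) {
                values.add(v);
            }
        } else {
            // Assumption: a negative step counts down towards end.
            for (long v = start; v > end; v += step) {
                values.add(v);
            }
        }
        return values;
    }
}
```

For example, `RangeSketch.range(0, 10, 3)` yields 0, 3, 6, 9; the end value 10 is never included.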
- range(long) - Method in class org.apache.spark.sql.SQLContext
-
- range(long, long) - Method in class org.apache.spark.sql.SQLContext
-
- range(long, long, long, int) - Method in class org.apache.spark.sql.SQLContext
-
- rangeBetween(long, long) - Method in class org.apache.spark.sql.expressions.WindowSpec
-
Defines the frame boundaries, from start (inclusive) to end (inclusive).
- RangeDependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
- RangeDependency(RDD<T>, int, int, int) - Constructor for class org.apache.spark.RangeDependency
-
- RangePartitioner<K,V> - Class in org.apache.spark
-
A Partitioner that partitions sortable records by range into roughly equal ranges.
- RangePartitioner(int, RDD<? extends Product2<K, V>>, boolean, Ordering<K>, ClassTag<K>) - Constructor for class org.apache.spark.RangePartitioner
-
- rank() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- rank() - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- rank() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- rank() - Static method in class org.apache.spark.sql.functions
-
Window function: returns the rank of rows within a window partition.
- RankingMetrics<T> - Class in org.apache.spark.mllib.evaluation
-
::Experimental::
Evaluator for ranking algorithms.
- RankingMetrics(RDD<Tuple2<Object, Object>>, ClassTag<T>) - Constructor for class org.apache.spark.mllib.evaluation.RankingMetrics
-
- rateController() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- rateController() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Asynchronously maintains & sends new rate limits to the receiver through the receiver tracker.
- rating() - Method in class org.apache.spark.ml.recommendation.ALS.Rating
-
- Rating - Class in org.apache.spark.mllib.recommendation
-
A more compact class to represent a rating than Tuple3[Int, Int, Double].
- Rating(int, int, double) - Constructor for class org.apache.spark.mllib.recommendation.Rating
-
- rating() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- raw2prediction(Vector) - Method in class org.apache.spark.ml.classification.ClassificationModel
-
Given a vector of raw predictions, select the predicted label.
- raw2prediction(Vector) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- raw2prediction(Vector) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
- raw2probability(Vector) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
Non-in-place version of raw2probabilityInPlace().
- raw2probabilityInPlace(Vector) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- raw2probabilityInPlace(Vector) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- raw2probabilityInPlace(Vector) - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- raw2probabilityInPlace(Vector) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
Estimate the probability of each class given the raw prediction,
doing the computation in-place.
- raw2probabilityInPlace(Vector) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- rawSocketStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port, where data is received
as serialized blocks (serialized using Spark's serializer) that can be directly
pushed into the block manager without deserializing them.
- rawSocketStream(String, int) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port, where data is received
as serialized blocks (serialized using Spark's serializer) that can be directly
pushed into the block manager without deserializing them.
- rawSocketStream(String, int, StorageLevel, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from network source hostname:port, where data is received
as serialized blocks (serialized using Spark's serializer) that can be directly
pushed into the block manager without deserializing them.
- rdd() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- rdd() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- rdd() - Method in class org.apache.spark.api.java.JavaRDD
-
- rdd() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- rdd() - Method in class org.apache.spark.Dependency
-
- rdd() - Method in class org.apache.spark.NarrowDependency
-
- RDD<T> - Class in org.apache.spark.rdd
-
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
- RDD(SparkContext, Seq<Dependency<?>>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.RDD
-
- RDD(RDD<?>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.RDD
-
Construct an RDD with just a one-to-one dependency on one parent.
- rdd() - Method in class org.apache.spark.ShuffleDependency
-
- rdd() - Method in class org.apache.spark.sql.DataFrame
-
- RDD() - Static method in class org.apache.spark.storage.BlockId
-
- RDD_SCOPE_KEY() - Static method in class org.apache.spark.SparkContext
-
- RDD_SCOPE_NO_OVERRIDE_KEY() - Static method in class org.apache.spark.SparkContext
-
- RDDBlockId - Class in org.apache.spark.storage
-
- RDDBlockId(int, int) - Constructor for class org.apache.spark.storage.RDDBlockId
-
- rddBlocks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- rddBlocks() - Method in class org.apache.spark.storage.StorageStatus
-
Return the RDD blocks stored in this block manager.
- rddBlocksById(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the blocks that belong to the given RDD stored in this block manager.
- RDDDataDistribution - Class in org.apache.spark.status.api.v1
-
- RDDFunctions<T> - Class in org.apache.spark.mllib.rdd
-
Machine learning specific RDD functions.
- RDDFunctions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.mllib.rdd.RDDFunctions
-
- rddId() - Method in class org.apache.spark.CleanCheckpoint
-
- rddId() - Method in class org.apache.spark.CleanRDD
-
- rddId() - Method in class org.apache.spark.scheduler.SparkListenerUnpersistRDD
-
- rddId() - Method in class org.apache.spark.storage.RDDBlockId
-
- RDDInfo - Class in org.apache.spark.storage
-
- RDDInfo(int, String, int, StorageLevel, Seq<Object>, Option<org.apache.spark.rdd.RDDOperationScope>) - Constructor for class org.apache.spark.storage.RDDInfo
-
- rddInfoList() - Method in class org.apache.spark.ui.storage.StorageListener
-
Filter RDD info to include only those with cached partitions.
- rddInfos() - Method in class org.apache.spark.scheduler.StageInfo
-
- RDDPartitionInfo - Class in org.apache.spark.status.api.v1
-
- rdds() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- rdds() - Method in class org.apache.spark.rdd.UnionRDD
-
- RDDStorageInfo - Class in org.apache.spark.status.api.v1
-
- rddStorageLevel(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the storage level, if any, used by the given RDD in this block manager.
- rddToAsyncRDDActions(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.rdd.RDD
-
- rddToAsyncRDDActions(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.SparkContext
-
- rddToOrderedRDDFunctions(RDD<Tuple2<K, V>>, Ordering<K>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.rdd.RDD
-
- rddToOrderedRDDFunctions(RDD<Tuple2<K, V>>, Ordering<K>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.SparkContext
-
- rddToPairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.rdd.RDD
-
- rddToPairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.SparkContext
-
- rddToSequenceFileRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, <any>, <any>) - Static method in class org.apache.spark.rdd.RDD
-
- rddToSequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Static method in class org.apache.spark.SparkContext
-
- read() - Method in class org.apache.spark.api.r.BaseRRDD
-
- read(Kryo, Input, Class<Iterable<?>>) - Method in class org.apache.spark.serializer.JavaIterableWrapperSerializer
-
- read() - Method in class org.apache.spark.sql.SQLContext
-
- read() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- read(byte[]) - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- read(byte[], int, int) - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- read(WriteAheadLogRecordHandle) - Method in class org.apache.spark.streaming.util.WriteAheadLog
-
Read a written record based on the given record handle.
- readAll() - Method in class org.apache.spark.streaming.util.WriteAheadLog
-
Read and return an iterator of all the records that have been written but not yet cleaned up.
- readBytes() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- readData(int) - Method in class org.apache.spark.api.r.BaseRRDD
-
- readData(int) - Method in class org.apache.spark.api.r.PairwiseRRDD
-
- readData(int) - Method in class org.apache.spark.api.r.RRDD
-
- readData(int) - Method in class org.apache.spark.api.r.StringRRDD
-
- readExternal(ObjectInput) - Method in class org.apache.spark.serializer.JavaSerializer
-
- readExternal(ObjectInput) - Method in class org.apache.spark.storage.BlockManagerId
-
- readExternal(ObjectInput) - Method in class org.apache.spark.storage.StorageLevel
-
- readExternal(ObjectInput) - Method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- readKey(ClassTag<T>) - Method in class org.apache.spark.serializer.DeserializationStream
-
Reads the object representing the key of a key-value pair.
- readObject(ClassTag<T>) - Method in class org.apache.spark.serializer.DeserializationStream
-
The most general-purpose method to read an object.
- readRecords() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- readValue(ClassTag<T>) - Method in class org.apache.spark.serializer.DeserializationStream
-
Reads the object representing the value of a key-value pair.
- ready(Duration, CanAwait) - Method in class org.apache.spark.ComplexFutureAction
-
- ready(Duration, CanAwait) - Method in interface org.apache.spark.FutureAction
-
Blocks until this action completes.
- ready(Duration, CanAwait) - Method in class org.apache.spark.SimpleFutureAction
-
- reason() - Method in class org.apache.spark.scheduler.SparkListenerExecutorRemoved
-
- reason() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- recall(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns recall for a given label (category)
- recall() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns recall (equal to precision for a multiclass classifier, because the sum of all
false positives equals the sum of all false negatives).
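The two recall entries for multiclass metrics can be illustrated with the standard definitions: per-label recall is TP / (TP + FN) for that label, and the overall figure reduces to the fraction of correctly classified points, which is why it coincides with overall precision. The plain-Java sketch below assumes labels and predictions are given as parallel arrays of doubles; it is not Spark code and its names are hypothetical.

```java
// Minimal sketch, not Spark code: per-label and overall recall from parallel arrays.
final class RecallSketch {
    // Recall for one label: true positives / (true positives + false negatives).
    static double recall(double[] predictions, double[] labels, double label) {
        long truePositives = 0;
        long falseNegatives = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == label) {
                if (predictions[i] == label) {
                    truePositives++;
                } else {
                    falseNegatives++;
                }
            }
        }
        long actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    // Overall recall: every misclassified point is one false negative (for its true label)
    // and one false positive (for the predicted label), so this equals overall precision.
    static double overallRecall(double[] predictions, double[] labels) {
        long correct = 0;
        for (int i = 0; i < labels.length; i++) {
            if (predictions[i] == labels[i]) {
                correct++;
            }
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }
}
```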
- recall() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns document-based recall averaged by the number of documents
- recall(double) - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns recall for a given label (category)
- recallByThreshold() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Returns a DataFrame with two fields (threshold, recall) describing the curve.
- recallByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, recall) curve.
- Receiver<T> - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
Abstract class of a receiver that can be run on worker nodes to receive external data.
- Receiver(StorageLevel) - Constructor for class org.apache.spark.streaming.receiver.Receiver
-
- ReceiverInfo - Class in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
Class having information about a receiver.
- ReceiverInfo(int, String, boolean, String, String, String, long) - Constructor for class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverError
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped
-
- receiverInputDStream() - Method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- receiverInputDStream() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- ReceiverInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
Abstract class for defining any InputDStream that has to start a receiver on worker nodes to receive external data.
- ReceiverInputDStream(StreamingContext, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- receiverStream(Receiver<T>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with an arbitrary user-implemented receiver.
- receiverStream(Receiver<T>, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with an arbitrary user-implemented receiver.
- recommendProducts(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Recommends products to a user.
- recommendProductsForUsers(int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Recommends topK products for all users.
- recommendUsers(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Recommends users to a product.
- recommendUsersForProducts(int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Recommends topK users for all products.
- recordJobProperties(int, Properties) - Method in class org.apache.spark.scheduler.JobLogger
-
Record job properties into job log file.
- RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES() - Static method in class org.apache.spark.rdd.HadoopRDD
-
Update the input bytes read metric each time this number of records has been read.
- RECORDS_BETWEEN_BYTES_WRITTEN_METRIC_UPDATES() - Static method in class org.apache.spark.rdd.PairRDDFunctions
-
- recordsRead() - Method in class org.apache.spark.status.api.v1.InputMetricDistributions
-
- recordsRead() - Method in class org.apache.spark.status.api.v1.InputMetrics
-
- recordsRead() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- recordsWritten() - Method in class org.apache.spark.status.api.v1.OutputMetricDistributions
-
- recordsWritten() - Method in class org.apache.spark.status.api.v1.OutputMetrics
-
- recordsWritten() - Method in class org.apache.spark.status.api.v1.ShuffleWriteMetrics
-
- recordTaskMetrics(int, String, TaskInfo, TaskMetrics) - Method in class org.apache.spark.scheduler.JobLogger
-
Record task metrics into job log files, including execution info and shuffle metrics
- reduce(Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Reduces the elements of this RDD using the specified commutative and associative binary
operator.
- reduce(Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Reduces the elements of this RDD using the specified commutative and
associative binary operator.
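The requirement that the operator be commutative and associative can be illustrated without Spark. The following is a minimal plain-Java sketch, not Spark code: the class `PartitionedReduceSketch` and the method `reducePartitioned` are hypothetical names, and nested lists stand in for partitions. It reduces each "partition" separately and then combines the partial results, so the grouping of elements depends on the layout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

// Hypothetical illustration only: nested lists stand in for RDD partitions.
final class PartitionedReduceSketch {

    // Reduce every partition on its own, then reduce the partial results.
    static <T> Optional<T> reducePartitioned(List<List<T>> partitions, BinaryOperator<T> op) {
        return partitions.stream()
                .map(partition -> partition.stream().reduce(op)) // one partial result per partition
                .filter(Optional::isPresent)                     // skip empty partitions
                .map(Optional::get)
                .reduce(op);                                     // merge the partial results
    }

    public static void main(String[] args) {
        // The same five elements, laid out in two different ways.
        List<List<Integer>> layoutA = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4, 5));
        List<List<Integer>> layoutB =
                Arrays.asList(Arrays.asList(5, 1, 3), Arrays.asList(4), Arrays.asList(2));

        // Addition is commutative and associative: the layout does not matter.
        System.out.println(reducePartitioned(layoutA, Integer::sum).get()); // prints 15
        System.out.println(reducePartitioned(layoutB, Integer::sum).get()); // prints 15

        // Subtraction is neither, so the result changes with the layout.
        BinaryOperator<Integer> minus = (x, y) -> x - y;
        System.out.println(reducePartitioned(layoutA, minus).get()); // prints 5
        System.out.println(reducePartitioned(layoutB, minus).get()); // prints -5
    }
}
```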
- reduce(Function2<T, T, T>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing each RDD
of this DStream.
- reduce(Function2<T, T, T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing each RDD
of this DStream.
- reduceByKey(Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
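As a rough picture of what "merge the values for each key" means, here is a plain-Java sketch that runs on a local list rather than on an RDD. The class `ReduceByKeySketch` and its method are hypothetical names for this illustration; partitioning and shuffling are not modelled.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration only: a local list of pairs stands in for a pair RDD.
final class ReduceByKeySketch {

    // Fold all values that share a key into one value using the reduce function.
    static <K, V> Map<K, V> reduceByKey(List<? extends Map.Entry<K, V>> pairs,
                                        BinaryOperator<V> func) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            // First value for a key is stored as-is; later ones are combined with func.
            merged.merge(pair.getKey(), pair.getValue(), func);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<String, Integer>("a", 1));
        pairs.add(new SimpleEntry<String, Integer>("b", 2));
        pairs.add(new SimpleEntry<String, Integer>("a", 3));
        System.out.println(reduceByKey(pairs, Integer::sum)); // prints {a=4, b=2}
    }
}
```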
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey to each RDD.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey to each RDD.
- reduceByKey(Function2<V, V, V>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey to each RDD.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey to each RDD.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey to each RDD.
- reduceByKey(Function2<V, V, V>, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey to each RDD.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Create a new DStream by applying reduceByKey over a sliding window on this DStream.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by reducing over a sliding window using incremental computation.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, int, Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying incremental reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, Partitioner, Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying incremental reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey over a sliding window on this DStream.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, int, Function1<Tuple2<K, V>, Object>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying incremental reduceByKey over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, Partitioner, Function1<Tuple2<K, V>, Object>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying incremental reduceByKey over a sliding window.
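The overloads that take two `Function2` arguments accept a reduce function and an inverse reduce function, which allows each window's value to be derived from the previous window's value instead of being recomputed from scratch. The plain-Java sketch below shows that idea for a single key. It is not Spark code: the class `IncrementalWindowSketch` is a hypothetical name, the window length is counted in batches, and the slide interval is assumed to be one batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical illustration only: one integer per batch, for a single key.
final class IncrementalWindowSketch {

    // windowLength is measured in batches; the window slides by one batch (assumption).
    static List<Integer> windowedReduce(List<Integer> batches, int windowLength,
                                        BinaryOperator<Integer> reduceFunc,
                                        BinaryOperator<Integer> invReduceFunc) {
        List<Integer> results = new ArrayList<>();
        Integer current = null;
        for (int i = 0; i < batches.size(); i++) {
            // 1. Reduce in the batch that just entered the window.
            current = (current == null) ? batches.get(i) : reduceFunc.apply(current, batches.get(i));
            // 2. "Inverse reduce" the batch that just left the window, if any.
            if (i >= windowLength) {
                current = invReduceFunc.apply(current, batches.get(i - windowLength));
            }
            results.add(current);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> batches = Arrays.asList(1, 2, 3, 4, 5);
        // Sum over a window of 3 batches; subtraction undoes addition.
        System.out.println(windowedReduce(batches, 3, Integer::sum, (a, b) -> a - b));
        // prints [1, 3, 6, 9, 12]
    }
}
```

The approach only works when the reduce function has an inverse, as addition does with subtraction; a function such as `max` has none.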
- reduceByKeyLocally(Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function, but return the results
immediately to the master as a Map.
- reduceByKeyLocally(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function, but return the results
immediately to the master as a Map.
- reduceByKeyToDriver(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for reduceByKeyLocally
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As this API is not Java compatible.
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Function2<T, T, T>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceId() - Method in class org.apache.spark.FetchFailed
-
- reduceId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- reduceId() - Method in class org.apache.spark.storage.ShuffleDataBlockId
-
- reduceId() - Method in class org.apache.spark.storage.ShuffleIndexBlockId
-
- refreshTable(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
Invalidate and refresh all the cached metadata of the given table.
- regexp_extract(Column, String, int) - Static method in class org.apache.spark.sql.functions
-
Extract a specific group (idx) identified by a Java regex from the specified string column.
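Group extraction by index can be sketched with `java.util.regex` on a single string. This is an illustration of the underlying regex mechanics, not of the Spark function itself: the class `RegexGroupSketch` is a hypothetical name, and returning an empty string when nothing matches is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: operates on one String, not on a Column.
final class RegexGroupSketch {

    // Return capture group idx of the first match; group 0 is the whole match.
    // The empty-string fallback for "no match" is an assumption of this sketch.
    static String extractGroup(String input, String regex, int idx) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            String group = matcher.group(idx);
            return group == null ? "" : group;
        }
        return "";
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)";
        System.out.println(extractGroup("100-200", regex, 1)); // prints 100
        System.out.println(extractGroup("100-200", regex, 2)); // prints 200
        System.out.println(extractGroup("100-200", regex, 0)); // prints 100-200
    }
}
```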
- regexp_replace(Column, String, String) - Static method in class org.apache.spark.sql.functions
-
Replace all substrings of the specified string value that match regexp with rep.
- RegexTokenizer - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A regex-based tokenizer that extracts tokens either by using the provided regex pattern to split
the text (default) or by repeatedly matching the regex (if gaps is false).
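The two modes differ in what the pattern describes: the separators between tokens, or the tokens themselves. A plain-Java sketch of that distinction follows. It is not the Spark class: `GapsTokenizerSketch` is a hypothetical name, and dropping empty pieces after a split is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: tokenizes one String with java.util.regex.
final class GapsTokenizerSketch {

    static List<String> tokenize(String text, String regex, boolean gaps) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (gaps) {
            // The regex describes the gaps: split the text on it.
            for (String piece : pattern.split(text)) {
                if (!piece.isEmpty()) { // dropping empty pieces is an assumption here
                    tokens.add(piece);
                }
            }
        } else {
            // The regex describes the tokens: collect every match.
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                tokens.add(matcher.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a, b,c", "[,\\s]+", true)); // prints [a, b, c]
        System.out.println(tokenize("a, b,c", "\\w+", false));   // prints [a, b, c]
    }
}
```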
- RegexTokenizer(String) - Constructor for class org.apache.spark.ml.feature.RegexTokenizer
-
- RegexTokenizer() - Constructor for class org.apache.spark.ml.feature.RegexTokenizer
-
- register(String, UserDefinedAggregateFunction) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined aggregate function (UDAF).
- register(String, Function0<RT>, TypeTags.TypeTag<RT>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 0 arguments as user-defined function (UDF).
- register(String, Function1<A1, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 1 argument as user-defined function (UDF).
- register(String, Function2<A1, A2, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 2 arguments as user-defined function (UDF).
- register(String, Function3<A1, A2, A3, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 3 arguments as user-defined function (UDF).
- register(String, Function4<A1, A2, A3, A4, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 4 arguments as user-defined function (UDF).
- register(String, Function5<A1, A2, A3, A4, A5, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 5 arguments as user-defined function (UDF).
- register(String, Function6<A1, A2, A3, A4, A5, A6, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 6 arguments as user-defined function (UDF).
- register(String, Function7<A1, A2, A3, A4, A5, A6, A7, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 7 arguments as user-defined function (UDF).
- register(String, Function8<A1, A2, A3, A4, A5, A6, A7, A8, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 8 arguments as user-defined function (UDF).
- register(String, Function9<A1, A2, A3, A4, A5, A6, A7, A8, A9, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 9 arguments as user-defined function (UDF).
- register(String, Function10<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 10 arguments as user-defined function (UDF).
- register(String, Function11<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 11 arguments as user-defined function (UDF).
- register(String, Function12<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 12 arguments as user-defined function (UDF).
- register(String, Function13<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 13 arguments as user-defined function (UDF).
- register(String, Function14<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 14 arguments as user-defined function (UDF).
- register(String, Function15<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 15 arguments as user-defined function (UDF).
- register(String, Function16<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 16 arguments as user-defined function (UDF).
- register(String, Function17<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>, TypeTags.TypeTag<A17>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 17 arguments as user-defined function (UDF).
- register(String, Function18<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>, TypeTags.TypeTag<A17>, TypeTags.TypeTag<A18>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 18 arguments as user-defined function (UDF).
- register(String, Function19<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>, TypeTags.TypeTag<A17>, TypeTags.TypeTag<A18>, TypeTags.TypeTag<A19>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 19 arguments as user-defined function (UDF).
- register(String, Function20<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>, TypeTags.TypeTag<A17>, TypeTags.TypeTag<A18>, TypeTags.TypeTag<A19>, TypeTags.TypeTag<A20>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 20 arguments as user-defined function (UDF).
- register(String, Function21<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>, TypeTags.TypeTag<A17>, TypeTags.TypeTag<A18>, TypeTags.TypeTag<A19>, TypeTags.TypeTag<A20>, TypeTags.TypeTag<A21>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 21 arguments as user-defined function (UDF).
- register(String, Function22<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>, TypeTags.TypeTag<A11>, TypeTags.TypeTag<A12>, TypeTags.TypeTag<A13>, TypeTags.TypeTag<A14>, TypeTags.TypeTag<A15>, TypeTags.TypeTag<A16>, TypeTags.TypeTag<A17>, TypeTags.TypeTag<A18>, TypeTags.TypeTag<A19>, TypeTags.TypeTag<A20>, TypeTags.TypeTag<A21>, TypeTags.TypeTag<A22>) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a Scala closure of 22 arguments as user-defined function (UDF).
- register(String, UDF1<?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 1 argument.
- register(String, UDF2<?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 2 arguments.
- register(String, UDF3<?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 3 arguments.
- register(String, UDF4<?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 4 arguments.
- register(String, UDF5<?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 5 arguments.
- register(String, UDF6<?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 6 arguments.
- register(String, UDF7<?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 7 arguments.
- register(String, UDF8<?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 8 arguments.
- register(String, UDF9<?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 9 arguments.
- register(String, UDF10<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 10 arguments.
- register(String, UDF11<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 11 arguments.
- register(String, UDF12<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 12 arguments.
- register(String, UDF13<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 13 arguments.
- register(String, UDF14<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 14 arguments.
- register(String, UDF15<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 15 arguments.
- register(String, UDF16<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 16 arguments.
- register(String, UDF17<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 17 arguments.
- register(String, UDF18<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 18 arguments.
- register(String, UDF19<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 19 arguments.
- register(String, UDF20<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 20 arguments.
- register(String, UDF21<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 21 arguments.
- register(String, UDF22<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType) - Method in class org.apache.spark.sql.UDFRegistration
-
Register a user-defined function with 22 arguments.
- registerAvroSchemas(Seq<Schema>) - Method in class org.apache.spark.SparkConf
-
Use Kryo serialization and register the given set of Avro schemas so that the generic
record serializer can decrease network IO
- registerClasses(Kryo) - Method in class org.apache.spark.graphx.GraphKryoRegistrator
-
- registerClasses(Kryo) - Method in interface org.apache.spark.serializer.KryoRegistrator
-
- registerDialect(JdbcDialect) - Static method in class org.apache.spark.sql.jdbc.JdbcDialects
-
Register a dialect for use on all new matching jdbc DataFrame.
- registerKryoClasses(SparkConf) - Static method in class org.apache.spark.graphx.GraphXUtils
-
Registers classes that GraphX uses with Kryo.
- registerKryoClasses(Class<?>[]) - Method in class org.apache.spark.SparkConf
-
Use Kryo serialization and register the given set of classes with Kryo.
- registerPython(String, UserDefinedPythonFunction) - Method in class org.apache.spark.sql.UDFRegistration
-
- registerTempTable(String) - Method in class org.apache.spark.sql.DataFrame
-
Registers this DataFrame as a temporary table using the given name.
- Regression() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- RegressionEvaluator - Class in org.apache.spark.ml.evaluation
-
:: Experimental ::
Evaluator for regression, which expects two input columns: prediction and label.
- RegressionEvaluator(String) - Constructor for class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- RegressionEvaluator() - Constructor for class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- RegressionMetrics - Class in org.apache.spark.mllib.evaluation
-
:: Experimental ::
Evaluator for regression.
- RegressionMetrics(RDD<Tuple2<Object, Object>>) - Constructor for class org.apache.spark.mllib.evaluation.RegressionMetrics
-
- RegressionModel<FeaturesType,M extends RegressionModel<FeaturesType,M>> - Class in org.apache.spark.ml.regression
-
:: DeveloperApi ::
- RegressionModel() - Constructor for class org.apache.spark.ml.regression.RegressionModel
-
- RegressionModel - Interface in org.apache.spark.mllib.regression
-
- reindex() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- reindex() - Method in class org.apache.spark.graphx.VertexRDD
-
Construct a new VertexRDD that is indexed by only the visible vertices.
- RelationProvider - Interface in org.apache.spark.sql.sources
-
:: DeveloperApi ::
Implemented by objects that produce relations for a specific kind of data source.
- relativeDirection(long) - Method in class org.apache.spark.graphx.Edge
-
Return the relative direction of the edge to the corresponding
vertex.
- remainder(Decimal) - Method in class org.apache.spark.sql.types.Decimal
-
- remember(Duration) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets each DStream in this context to remember the RDDs it generated in the last given duration.
- remember(Duration) - Method in class org.apache.spark.streaming.StreamingContext
-
Set each DStream in this context to remember the RDDs it generated in the last given duration.
- rememberDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- remoteBlocksFetched() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- remoteBlocksFetched() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- remoteBytesRead() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- remoteBytesRead() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- remove(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
-
Removes a key from this map and returns its previously associated value as an option.
- remove(String) - Method in class org.apache.spark.SparkConf
-
Remove a parameter from the configuration
- repartition(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartition(int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartition(int) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartitionAndSortWithinPartitions(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Repartition the RDD according to the given partitioner and, within each resulting partition,
sort records by their keys.
- repartitionAndSortWithinPartitions(Partitioner, Comparator<K>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Repartition the RDD according to the given partitioner and, within each resulting partition,
sort records by their keys.
- repartitionAndSortWithinPartitions(Partitioner) - Method in class org.apache.spark.rdd.OrderedRDDFunctions
-
Repartition the RDD according to the given partitioner and, within each resulting partition,
sort records by their keys.
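The resulting layout (records grouped into partitions, each partition ordered by key) can be pictured with a small plain-Java sketch. It is not Spark code and says nothing about how the work is performed or how efficient it is: `PartitionSortSketch` is a hypothetical name, and a hash-modulo rule stands in for the `Partitioner` argument.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: nested lists stand in for partitions.
final class PartitionSortSketch {

    static <K extends Comparable<K>, V> List<List<Map.Entry<K, V>>> repartitionAndSort(
            List<Map.Entry<K, V>> records, int numPartitions) {
        List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Assign each record to a partition; hash modulo is an assumed stand-in partitioner.
        for (Map.Entry<K, V> record : records) {
            int index = Math.floorMod(record.getKey().hashCode(), numPartitions);
            partitions.get(index).add(record);
        }
        // Sort the records inside every partition by key.
        for (List<Map.Entry<K, V>> partition : partitions) {
            partition.sort(Map.Entry.comparingByKey());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (int key : new int[] {5, 2, 4, 1, 3}) {
            records.add(new SimpleEntry<Integer, String>(key, "v" + key));
        }
        System.out.println(repartitionAndSort(records, 2));
        // prints [[2=v2, 4=v4], [1=v1, 3=v3, 5=v5]]
    }
}
```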
- repeat(Column, int) - Static method in class org.apache.spark.sql.functions
-
Repeats a string column n times, and returns it as a new string column.
- replace(String, Map<T, T>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Replaces values matching keys in replacement map with the corresponding values.
- replace(String[], Map<T, T>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Replaces values matching keys in replacement map with the corresponding values.
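Map-driven replacement can be sketched on a plain list: a value that appears as a key in the map is swapped for the mapped value, and any other value is kept. This is an illustration only, not the Spark method: `ReplaceValuesSketch` is a hypothetical name, and a `List` stands in for a single column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a List stands in for one DataFrame column.
final class ReplaceValuesSketch {

    static <T> List<T> replace(List<T> column, Map<T, T> replacement) {
        List<T> result = new ArrayList<>(column.size());
        for (T value : column) {
            // Values that are not keys of the map pass through unchanged.
            result.add(replacement.getOrDefault(value, value));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> replacement = new HashMap<>();
        replacement.put("UNKNOWN", "unnamed");
        List<String> names = Arrays.asList("UNKNOWN", "alice", "UNKNOWN");
        System.out.println(replace(names, replacement)); // prints [unnamed, alice, unnamed]
    }
}
```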
- replace(String, Map<T, T>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Replaces values matching keys in replacement map.
- replace(Seq<String>, Map<T, T>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Replaces values matching keys in replacement map.
- replicatedVertexView() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- replication() - Method in class org.apache.spark.storage.StorageLevel
-
- reportError(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Report exceptions in receiving data.
- requestExecutors(int) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Request an additional number of executors from the cluster manager.
- reset() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- resetIterator() - Method in class org.apache.spark.rdd.PartitionCoalescer.LocationIterator
-
- residuals() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
Residuals (label - predicted value)
- resolve(String) - Method in class org.apache.spark.sql.DataFrame
-
- restart(String) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- restart(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- restart(String, Throwable, int) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- Resubmitted - Class in org.apache.spark
-
:: DeveloperApi ::
A ShuffleMapTask that completed successfully earlier, but we
lost the executor before the stage completed.
- Resubmitted() - Constructor for class org.apache.spark.Resubmitted
-
- result(Duration, CanAwait) - Method in class org.apache.spark.ComplexFutureAction
-
- result(Duration, CanAwait) - Method in interface org.apache.spark.FutureAction
-
Awaits and returns the result (of type T) of this action.
- result(Duration, CanAwait) - Method in class org.apache.spark.SimpleFutureAction
-
- resultSerializationTime() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- resultSerializationTime() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- resultSetToObjectArray(ResultSet) - Static method in class org.apache.spark.rdd.JdbcRDD
-
- resultSize() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- resultSize() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- retainedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- retainedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- retryWaitMs(SparkConf) - Static method in class org.apache.spark.util.RpcUtils
-
Returns the configured number of milliseconds to wait on each retry.
- ReturnStatementFinder - Class in org.apache.spark.util
-
- ReturnStatementFinder() - Constructor for class org.apache.spark.util.ReturnStatementFinder
-
- reverse() - Method in class org.apache.spark.graphx.EdgeDirection
-
Reverse the direction of an edge.
- reverse() - Method in class org.apache.spark.graphx.EdgeRDD
-
Reverse all the edges in this RDD.
- reverse() - Method in class org.apache.spark.graphx.Graph
-
Reverses all edges in the graph.
- reverse() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- reverse() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- reverse(Column) - Static method in class org.apache.spark.sql.functions
-
Reverses the string column and returns it as a new string column.
- reverseRoutingTables() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- reverseRoutingTables() - Method in class org.apache.spark.graphx.VertexRDD
-
Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.
- ReviveOffers - Class in org.apache.spark.scheduler.local
-
- ReviveOffers() - Constructor for class org.apache.spark.scheduler.local.ReviveOffers
-
- RFormula - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Implements the transforms required for fitting a dataset against an R model formula.
- RFormula(String) - Constructor for class org.apache.spark.ml.feature.RFormula
-
- RFormula() - Constructor for class org.apache.spark.ml.feature.RFormula
-
- RFormulaModel - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A fitted RFormula.
- RidgeRegressionModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using RidgeRegression.
- RidgeRegressionModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- RidgeRegressionWithSGD - Class in org.apache.spark.mllib.regression
-
Train a regression model with L2-regularization using Stochastic Gradient Descent.
- RidgeRegressionWithSGD() - Constructor for class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Construct a RidgeRegression object with default parameters: {stepSize: 1.0, numIterations: 100,
regParam: 0.01, miniBatchFraction: 1.0}.
- right() - Method in class org.apache.spark.sql.sources.And
-
- right() - Method in class org.apache.spark.sql.sources.Or
-
- rightCategories() - Method in class org.apache.spark.ml.tree.CategoricalSplit
-
Get sorted categories which split to the right
- rightChild() - Method in class org.apache.spark.ml.tree.InternalNode
-
- rightChildIndex(int) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Return the index of the right child of this node.
- rightImpurity() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- rightNode() - Method in class org.apache.spark.mllib.tree.model.Node
-
- rightOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
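The right outer join semantics named in the entries above can be sketched with plain Java collections. This is a minimal illustration, not Spark code: the class `RightOuterJoinSketch` and its method are hypothetical, and the sketch assumes the usual definition, in which every pair from `other` is kept and a missing match on the left side is represented as an empty `Optional`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, for illustration only; it does not use any Spark API.
final class RightOuterJoinSketch {

    /** Joins two lists of key/value pairs, keeping every pair from {@code other}. */
    static <K, V, W> List<Map.Entry<K, Map.Entry<Optional<V>, W>>> rightOuterJoin(
            List<Map.Entry<K, V>> self, List<Map.Entry<K, W>> other) {
        List<Map.Entry<K, Map.Entry<Optional<V>, W>>> out = new ArrayList<>();
        for (Map.Entry<K, W> right : other) {
            boolean matched = false;
            for (Map.Entry<K, V> left : self) {
                if (left.getKey().equals(right.getKey())) {
                    // A matching left value exists: emit (k, (Some(v), w)).
                    Map.Entry<Optional<V>, W> pair =
                            new SimpleEntry<>(Optional.of(left.getValue()), right.getValue());
                    out.add(new SimpleEntry<>(right.getKey(), pair));
                    matched = true;
                }
            }
            if (!matched) {
                // No left value for this key: emit (k, (None, w)).
                Map.Entry<Optional<V>, W> pair =
                        new SimpleEntry<>(Optional.empty(), right.getValue());
                out.add(new SimpleEntry<>(right.getKey(), pair));
            }
        }
        return out;
    }
}
```

The nested loop is quadratic and only meant to make the semantics visible; a distributed join would group both sides by key first.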
- rightOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this
DStream and other DStream.
- rightOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this
DStream and other DStream.
- rightOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this
DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this
DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this
DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this
DStream and other DStream.
- rightPredict() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- rint(Column) - Static method in class org.apache.spark.sql.functions
-
Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.
- rint(String) - Static method in class org.apache.spark.sql.functions
-
Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.
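The wording of the two rint entries above matches the contract of `java.lang.Math.rint`. The sketch below shows that contract on plain doubles; it assumes (this is not stated in the index) that the column function behaves the same way per value. The class name `RintSketch` is hypothetical.

```java
// Hypothetical demo class; it only exercises java.lang.Math, not Spark.
final class RintSketch {

    /** Nearest mathematical integer as a double; exact ties go to the even neighbour. */
    static double rint(double x) {
        return Math.rint(x);
    }

    public static void main(String[] args) {
        System.out.println(rint(2.4));  // 2.0
        System.out.println(rint(2.5));  // 2.0 (tie, rounds to even)
        System.out.println(rint(3.5));  // 4.0 (tie, rounds to even)
        System.out.println(rint(-2.5)); // -2.0
    }
}
```

Note the difference from `Math.round`, which returns a `long` and rounds ties upward.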
- rlike(String) - Method in class org.apache.spark.sql.Column
-
SQL RLIKE expression (LIKE with Regex).
- RMATa() - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
- RMATb() - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
- RMATc() - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
- RMATd() - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
- rmatGraph(SparkContext, int, int) - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
A random graph generator using the R-MAT model, proposed in
"R-MAT: A Recursive Model for Graph Mining" by Chakrabarti et al.
- rnd() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- roc() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Returns the receiver operating characteristic (ROC) curve,
which is a DataFrame having two fields (FPR, TPR)
with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- roc() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the receiver operating characteristic (ROC) curve,
which is an RDD of (false positive rate, true positive rate)
with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- rollup(Column...) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional rollup for the current DataFrame using the specified columns,
so we can run aggregation on them.
- rollup(String, String...) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional rollup for the current DataFrame using the specified columns,
so we can run aggregation on them.
- rollup(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional rollup for the current DataFrame using the specified columns,
so we can run aggregation on them.
- rollup(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional rollup for the current DataFrame using the specified columns,
so we can run aggregation on them.
- rootMeanSquaredError() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
Returns the root mean squared error, which is defined as the square root of
the mean squared error.
- rootMeanSquaredError() - Method in class org.apache.spark.mllib.evaluation.RegressionMetrics
-
Returns the root mean squared error, which is defined as the square root of
the mean squared error.
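The definition in the two rootMeanSquaredError entries above (square root of the mean squared error) can be written out directly. This is a small stand-alone sketch on arrays; the class `RmseSketch` and its parameter names are hypothetical and not part of either listed class.

```java
// Hypothetical helper, for illustration only.
final class RmseSketch {

    /** Square root of the mean of squared differences between labels and predictions. */
    static double rootMeanSquaredError(double[] labels, double[] predictions) {
        if (labels.length != predictions.length || labels.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquared = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double residual = labels[i] - predictions[i];
            sumSquared += residual * residual;
        }
        double meanSquaredError = sumSquared / labels.length;
        return Math.sqrt(meanSquaredError);
    }
}
```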
- rootNode() - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- rootNode() - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
- round(Column) - Static method in class org.apache.spark.sql.functions
-
Returns the value of the column e rounded to 0 decimal places.
- round(Column, int) - Static method in class org.apache.spark.sql.functions
-
Round the value of e to scale decimal places if scale >= 0,
or at integral part when scale < 0.
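The effect of a positive, zero, or negative scale can be seen with `java.math.BigDecimal`. The sketch below is an illustration under one stated assumption: ties are rounded half-up, which the index entry itself does not specify. The class `RoundSketch` is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper; the HALF_UP tie-breaking rule is an assumption.
final class RoundSketch {

    /**
     * Rounds value to scale decimal places when scale >= 0, or to a multiple of
     * 10^(-scale) in the integral part when scale < 0.
     */
    static double round(double value, int scale) {
        return BigDecimal.valueOf(value)
                .setScale(scale, RoundingMode.HALF_UP)
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(round(1234.567, 2)); // 1234.57
        System.out.println(round(2.5, 0));      // 3.0
        System.out.println(round(1250.0, -2));  // 1300.0
    }
}
```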
- Row - Interface in org.apache.spark.sql
-
Represents one row of output from a relational operator.
- RowFactory - Class in org.apache.spark.sql
-
A factory class used to construct Row objects.
- RowFactory() - Constructor for class org.apache.spark.sql.RowFactory
-
- rowIndices() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- RowMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents a row-oriented distributed Matrix with no meaningful row indices.
- RowMatrix(RDD<Vector>, long, int) - Constructor for class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- RowMatrix(RDD<Vector>) - Constructor for class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Alternative constructor leaving matrix dimensions to be determined automatically.
- rowNumber() - Static method in class org.apache.spark.sql.functions
-
Window function: returns a sequential number starting at 1 within a window partition.
- rows() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- rows() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- rowsBetween(long, long) - Method in class org.apache.spark.sql.expressions.WindowSpec
-
Defines the frame boundaries, from start (inclusive) to end (inclusive).
- rowsPerBlock() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
- rpad(Column, int, String) - Static method in class org.apache.spark.sql.functions
-
Right-pads the string column with pad to a length of len.
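Right-padding to a fixed length can be sketched on a plain `String`. The class `RpadSketch` is hypothetical, and two behaviours here are assumptions that the index entry does not state: a string longer than `len` is truncated to `len`, and an empty `pad` leaves the string unpadded.

```java
// Hypothetical helper; truncation and empty-pad handling are assumptions.
final class RpadSketch {

    /** Appends pad repeatedly to the right of str until the result is exactly len characters. */
    static String rpad(String str, int len, String pad) {
        if (len <= str.length() || pad.isEmpty()) {
            // Nothing to pad: return at most len characters of the input.
            return str.substring(0, Math.max(0, Math.min(len, str.length())));
        }
        StringBuilder sb = new StringBuilder(str);
        while (sb.length() < len) {
            sb.append(pad);
        }
        sb.setLength(len); // drop any overshoot from a multi-character pad
        return sb.toString();
    }
}
```

For example, `rpad("hi", 5, "??")` yields `hi???`, and `rpad("hi", 1, "??")` yields `h` under the truncation assumption.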
- rpcEnv() - Method in class org.apache.spark.SparkEnv
-
- RpcUtils - Class in org.apache.spark.util
-
- RpcUtils() - Constructor for class org.apache.spark.util.RpcUtils
-
- RRDD<T> - Class in org.apache.spark.api.r
-
An RDD that stores serialized R objects as Array[Byte].
- RRDD(RDD<T>, byte[], String, String, byte[], Object[], ClassTag<T>) - Constructor for class org.apache.spark.api.r.RRDD
-
- rtrim(Column) - Static method in class org.apache.spark.sql.functions
-
Trims the spaces from the right end of the specified string value.
- run(Function0<T>, ExecutionContext) - Method in class org.apache.spark.ComplexFutureAction
-
Executes some action enclosed in the closure.
- run(Graph<VD, ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.ConnectedComponents
-
Compute the connected component membership of each vertex and return a graph with the vertex
value containing the lowest vertex id in the connected component containing that vertex.
- run(Graph<VD, ED>, int, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.LabelPropagation
-
Run static Label Propagation for detecting communities in networks.
- run(Graph<VD, ED>, int, double, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.PageRank
-
Run PageRank for a fixed number of iterations returning a graph
with vertex attributes containing the PageRank and edge
attributes the normalized edge weight.
- run(Graph<VD, ED>, Seq<Object>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.ShortestPaths
-
Computes shortest paths to the given set of landmark vertices.
- run(Graph<VD, ED>, int, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.StronglyConnectedComponents
-
Compute the strongly connected component (SCC) of each vertex and return a graph with the
vertex value containing the lowest vertex id in the SCC containing that vertex.
- run(RDD<Edge<Object>>, SVDPlusPlus.Conf) - Static method in class org.apache.spark.graphx.lib.SVDPlusPlus
-
Implement SVD++ based on "Factorization Meets the Neighborhood:
a Multifaceted Collaborative Filtering Model",
available at http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf.
- run(Graph<VD, ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.TriangleCount
-
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
- run(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Perform expectation maximization
- run(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Java-friendly version of run()
- run(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Train a K-means model on the given set of points; data should be cached for high
performance, because this is an iterative algorithm.
- run(RDD<Tuple2<Object, Vector>>) - Method in class org.apache.spark.mllib.clustering.LDA
-
Learn an LDA model using the given dataset.
- run(JavaPairRDD<Long, Vector>) - Method in class org.apache.spark.mllib.clustering.LDA
-
Java-friendly version of run()
- run(Graph<Object, Object>) - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering
-
Run the PIC algorithm on Graph.
- run(RDD<Tuple3<Object, Object, Object>>) - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering
-
Run the PIC algorithm.
- run(JavaRDD<Tuple3<Long, Long, Double>>) - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering
-
A Java-friendly version of PowerIterationClustering.run.
- run(RDD<FPGrowth.FreqItemset<Item>>, ClassTag<Item>) - Method in class org.apache.spark.mllib.fpm.AssociationRules
-
Computes the association rules with confidence above minConfidence.
- run(JavaRDD<FPGrowth.FreqItemset<Item>>) - Method in class org.apache.spark.mllib.fpm.AssociationRules
-
Java-friendly version of run.
- run(RDD<Object>, ClassTag<Item>) - Method in class org.apache.spark.mllib.fpm.FPGrowth
-
Computes an FP-Growth model that contains frequent itemsets.
- run(JavaRDD<Basket>) - Method in class org.apache.spark.mllib.fpm.FPGrowth
-
Java-friendly version of run.
- run(RDD<Object[]>, ClassTag<Item>) - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- run(JavaRDD<Sequence>) - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
A Java-friendly version of run() that reads sequences from a JavaRDD and returns
frequent sequences in a PrefixSpanModel.
- run(RDD<Rating>) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Run ALS with the configured parameters on an input RDD of (user, product, rating) triples.
- run(JavaRDD<Rating>) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Java-friendly version of ALS.run.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Run the algorithm with the configured parameters on an input
RDD of LabeledPoint entries.
- run(RDD<LabeledPoint>, Vector) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Run the algorithm with the configured parameters on an input RDD
of LabeledPoint entries starting from the initial weights provided.
- run(RDD<Tuple3<Object, Object, Object>>) - Method in class org.apache.spark.mllib.regression.IsotonicRegression
-
Run IsotonicRegression algorithm to obtain isotonic regression model.
- run(JavaRDD<Tuple3<Double, Double, Double>>) - Method in class org.apache.spark.mllib.regression.IsotonicRegression
-
Run pool adjacent violators algorithm to obtain isotonic regression model.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model over an RDD
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.GradientBoostedTrees
-
Method to train a gradient boosting model
- run(JavaRDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.GradientBoostedTrees
-
Java-friendly API for org.apache.spark.mllib.tree.GradientBoostedTrees.run.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.RandomForest
-
Method to train a decision tree model over an RDD
- run() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
Runs the packing algorithm and returns an array of PartitionGroups that if possible are
load balanced and grouped by locality
- run() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread
-
- run() - Method in class org.apache.spark.util.SparkShutdownHook
-
- runApproximateJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, <any>, long) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Run a job that can return approximate results.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, Function2<Object, U, BoxedUnit>, Function0<R>) - Method in class org.apache.spark.ComplexFutureAction
-
Runs a Spark job.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and pass the results to the given
handler function.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and return the results as an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on a given set of partitions of an RDD, but take a function of type
Iterator[T] => U instead of (TaskContext, Iterator[T]) => U.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, boolean, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and pass the results to the given
handler function.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, boolean, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and return the results as an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, boolean, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on a given set of partitions of an RDD, but take a function of type
Iterator[T] => U instead of (TaskContext, Iterator[T]) => U.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and return the results in an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and return the results in an array.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and pass the results to a handler function.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and pass the results to a handler function.
- runLBFGS(RDD<Tuple2<Object, Vector>>, Gradient, Updater, int, double, int, double, Vector) - Static method in class org.apache.spark.mllib.optimization.LBFGS
-
Run Limited-memory BFGS (L-BFGS) in parallel.
- runMiniBatchSGD(RDD<Tuple2<Object, Vector>>, Gradient, Updater, double, int, double, double, Vector, double) - Static method in class org.apache.spark.mllib.optimization.GradientDescent
-
Run stochastic gradient descent (SGD) in parallel using mini batches.
- runMiniBatchSGD(RDD<Tuple2<Object, Vector>>, Gradient, Updater, double, int, double, double, Vector) - Static method in class org.apache.spark.mllib.optimization.GradientDescent
-
Alias of runMiniBatchSGD with convergenceTol set to default value of 0.001.
- running() - Method in class org.apache.spark.scheduler.TaskInfo
-
- runningLocally() - Method in class org.apache.spark.TaskContext
-
- runSqlHive(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
- runSVDPlusPlus(RDD<Edge<Object>>, SVDPlusPlus.Conf) - Static method in class org.apache.spark.graphx.lib.SVDPlusPlus
-
This method is now replaced by the updated version of run() and returns exactly
the same result.
- RuntimePercentage - Class in org.apache.spark.scheduler
-
- RuntimePercentage(double, Option<Object>, double) - Constructor for class org.apache.spark.scheduler.RuntimePercentage
-
- runUntilConvergence(Graph<VD, ED>, double, double, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.PageRank
-
Run a dynamic version of PageRank returning a graph with vertex attributes containing the
PageRank and edge attributes containing the normalized edge weight.
- runUntilConvergenceWithOptions(Graph<VD, ED>, double, double, Option<Object>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.PageRank
-
Run a dynamic version of PageRank returning a graph with vertex attributes containing the
PageRank and edge attributes containing the normalized edge weight.
- runWithOptions(Graph<VD, ED>, int, double, Option<Object>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.lib.PageRank
-
Run PageRank for a fixed number of iterations returning a graph
with vertex attributes containing the PageRank and edge
attributes the normalized edge weight.
- runWithValidation(RDD<LabeledPoint>, RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.GradientBoostedTrees
-
Method to validate a gradient boosting model
- runWithValidation(JavaRDD<LabeledPoint>, JavaRDD<LabeledPoint>) - Method in class org.apache.spark.mllib.tree.GradientBoostedTrees
-
Java-friendly API for org.apache.spark.mllib.tree.GradientBoostedTrees.runWithValidation.
- s() - Method in class org.apache.spark.mllib.linalg.SingularValueDecomposition
-
- sample(boolean, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a sampled subset of this RDD.
- sample(boolean, Double, long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.rdd.RDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame by sampling a fraction of rows.
- sample(boolean, double) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame by sampling a fraction of rows, using a random seed.
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.BernoulliCellSampler
-
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.BernoulliSampler
-
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.PoissonSampler
-
- sample(Iterator<T>) - Method in interface org.apache.spark.util.random.RandomSampler
-
Take a random sample.
- sampleBy(String, Map<T, Object>, long) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Returns a stratified sample without replacement based on the fraction given on each stratum.
- sampleBy(String, Map<T, Double>, long) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Returns a stratified sample without replacement based on the fraction given on each stratum.
- sampleByKey(boolean, Map<K, Object>, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a subset of this RDD sampled by key (via stratified sampling).
- sampleByKey(boolean, Map<K, Object>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a subset of this RDD sampled by key (via stratified sampling).
- sampleByKey(boolean, Map<K, Object>, long) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return a subset of this RDD sampled by key (via stratified sampling).
- sampleByKeyExact(boolean, Map<K, Object>, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).
- sampleByKeyExact(boolean, Map<K, Object>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).
- sampleByKeyExact(boolean, Map<K, Object>, long) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).
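The exact per-stratum sample size quoted above, math.ceil(numItems * samplingRate), can be computed as follows. This sketch only derives the target sizes and does not perform any sampling; the class `ExactStratumSizes`, the use of `String` keys, and the choice to treat a key with no fraction as rate 0.0 are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; it computes target sizes only and uses no Spark API.
final class ExactStratumSizes {

    /** For each key, returns ceil(numItems * samplingRate); keys without a rate get 0. */
    static Map<String, Long> sizes(Map<String, Long> itemCounts, Map<String, Double> fractions) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Long> e : itemCounts.entrySet()) {
            double samplingRate = fractions.getOrDefault(e.getKey(), 0.0);
            out.put(e.getKey(), (long) Math.ceil(e.getValue() * samplingRate));
        }
        return out;
    }
}
```

With 10 items under key `a` at rate 0.25 and 3 items under key `b` at rate 0.5, the targets are 3 and 2, since the ceiling rounds 2.5 and 1.5 upward.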
- sampleStdev() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the sample standard deviation of this RDD's elements (which corrects for bias in
estimating the standard deviation by dividing by N-1 instead of N).
- sampleStdev() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the sample standard deviation of this RDD's elements (which corrects for bias in
estimating the standard deviation by dividing by N-1 instead of N).
- sampleStdev() - Method in class org.apache.spark.util.StatCounter
-
Return the sample standard deviation of the values, which corrects for bias in estimating the
variance by dividing by N-1 instead of N.
- sampleVariance() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the sample variance of this RDD's elements (which corrects for bias in
estimating the variance by dividing by N-1 instead of N).
- sampleVariance() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the sample variance of this RDD's elements (which corrects for bias in
estimating the variance by dividing by N-1 instead of N).
- sampleVariance() - Method in class org.apache.spark.util.StatCounter
-
Return the sample variance, which corrects for bias in estimating the variance by dividing
by N-1 instead of N.
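The N-1 correction mentioned in the sampleStdev and sampleVariance entries above can be made concrete with a two-pass computation over an array. The class `SampleStats` is hypothetical, and returning NaN for fewer than two values is an assumption of this sketch, not documented behaviour of the listed classes.

```java
// Hypothetical helper; it illustrates the N-1 divisor and uses no Spark API.
final class SampleStats {

    /** Sum of squared deviations from the mean, divided by N-1 rather than N. */
    static double sampleVariance(double[] values) {
        int n = values.length;
        if (n < 2) {
            return Double.NaN; // assumption: undefined for fewer than two values
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        return squaredDeviations / (n - 1);
    }

    /** Square root of the sample variance. */
    static double sampleStdev(double[] values) {
        return Math.sqrt(sampleVariance(values));
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the squared deviations sum to 32, so the sample variance is 32/7, whereas dividing by N would give 4.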
- save(SparkContext, String) - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.classification.SVMModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Save this model to the given path.
- save(SparkContext, String) - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Save this model to the given path.
- save(SparkContext, String) - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.regression.LassoModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.regression.LinearRegressionModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- save(SparkContext, String) - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- save(SparkContext, String) - Method in interface org.apache.spark.mllib.util.Saveable
-
Save this model to the given path.
- save(String) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().save(path).
- save(String, SaveMode) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().mode(mode).save(path).
- save(String, String) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().format(source).save(path).
- save(String, String, SaveMode) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().format(source).mode(mode).save(path).
- save(String, SaveMode, Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by
write().format(source).mode(mode).options(options).save(path).
- save(String, SaveMode, Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by
write().format(source).mode(mode).options(options).save(path).
- save(String) - Method in class org.apache.spark.sql.DataFrameWriter
-
Saves the content of the DataFrame at the specified path.
- save() - Method in class org.apache.spark.sql.DataFrameWriter
-
Saves the content of the DataFrame as the specified table.
- Saveable - Interface in org.apache.spark.mllib.util
-
:: DeveloperApi ::
- saveAsHadoopDataset(JobConf) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
that storage system.
- saveAsHadoopDataset(JobConf) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
that storage system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>, JobConf) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>, Class<? extends CompressionCodec>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system, compressing with the supplied codec.
- saveAsHadoopFile(String, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<? extends CompressionCodec>, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Class<? extends CompressionCodec>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf, Option<Class<? extends CompressionCodec>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class
supporting the key and value types K and V in this RDD.
- saveAsHadoopFiles(String, String) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<F>, JobConf) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, ClassTag<F>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsLibSVMFile(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Save labeled data in LIBSVM format.
- saveAsNewAPIHadoopDataset(Configuration) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported storage system, using
a Configuration object for that storage system.
- saveAsNewAPIHadoopDataset(Configuration) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported storage system with new Hadoop API, using a Hadoop
Configuration object for that storage system.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<F>, Configuration) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsNewAPIHadoopFile(String, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
- saveAsNewAPIHadoopFiles(String, String) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<F>, Configuration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, ClassTag<F>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this DStream as a Hadoop file.
- saveAsObjectFile(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a SequenceFile of serialized objects.
- saveAsObjectFile(String) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a SequenceFile of serialized objects.
- saveAsObjectFiles(String, String) - Method in class org.apache.spark.streaming.dstream.DStream
-
Save each RDD in this DStream as a SequenceFile of serialized objects.
- saveAsParquetFile(String) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().parquet().
- saveAsSequenceFile(String, Option<Class<? extends CompressionCodec>>) - Method in class org.apache.spark.rdd.SequenceFileRDDFunctions
-
Output the RDD as a Hadoop SequenceFile using the Writable types we infer from the RDD's key
and value types.
- saveAsTable(String) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().saveAsTable(tableName).
- saveAsTable(String, SaveMode) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().mode(mode).saveAsTable(tableName).
- saveAsTable(String, String) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().format(source).saveAsTable(tableName).
- saveAsTable(String, String, SaveMode) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().format(source).mode(mode).saveAsTable(tableName).
- saveAsTable(String, String, SaveMode, Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().format(source).mode(mode).options(options).saveAsTable(tableName).
- saveAsTable(String, String, SaveMode, Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().format(source).mode(mode).options(options).saveAsTable(tableName).
- saveAsTable(String) - Method in class org.apache.spark.sql.DataFrameWriter
-
Saves the content of the DataFrame as the specified table.
- saveAsTextFile(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a text file, using string representations of elements.
- saveAsTextFile(String, Class<? extends CompressionCodec>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a compressed text file, using string representations of elements.
- saveAsTextFile(String) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a text file, using string representations of elements.
- saveAsTextFile(String, Class<? extends CompressionCodec>) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a compressed text file, using string representations of elements.
- saveAsTextFiles(String, String) - Method in class org.apache.spark.streaming.dstream.DStream
-
Save each RDD in this DStream as a text file, using string representations of elements.
- saveLabeledData(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- SaveMode - Enum in org.apache.spark.sql
-
SaveMode is used to specify the expected behavior of saving a DataFrame to a data source.
- sc() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- sc() - Method in class org.apache.spark.sql.SQLContext.implicits$.StringToColumn
-
- sc() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 0.9.0, replaced by sparkContext
- sc() - Method in class org.apache.spark.streaming.StreamingContext
-
- scalaIntToJavaLong(DStream<Object>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- scalaToJavaLong(JavaPairDStream<K, Object>, ClassTag<K>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- scale() - Method in class org.apache.spark.mllib.random.GammaGenerator
-
- scale() - Method in class org.apache.spark.sql.types.Decimal
-
- scale() - Method in class org.apache.spark.sql.types.DecimalType
-
- scale() - Method in class org.apache.spark.sql.types.PrecisionInfo
-
- scalingVec() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
the vector to multiply with input vectors
- scalingVec() - Method in class org.apache.spark.mllib.feature.ElementwiseProduct
-
- scheduler() - Method in class org.apache.spark.streaming.StreamingContext
-
- schedulingDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for the first job of this batch to start processing from the time this batch
was submitted to the streaming scheduler.
- SchedulingMode - Class in org.apache.spark.scheduler
-
"FAIR" and "FIFO" determine which policy is used
to order tasks amongst a Schedulable's sub-queues.
"NONE" is used when a Schedulable has no sub-queues.
- SchedulingMode() - Constructor for class org.apache.spark.scheduler.SchedulingMode
-
- schedulingMode() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- schedulingPool() - Method in class org.apache.spark.status.api.v1.StageData
-
- schema() - Method in class org.apache.spark.sql.DataFrame
-
- schema(StructType) - Method in class org.apache.spark.sql.DataFrameReader
-
Specifies the input schema.
- schema() - Method in interface org.apache.spark.sql.Row
-
Schema for the row.
- schema() - Method in class org.apache.spark.sql.sources.BaseRelation
-
- schema() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
Schema of this relation.
- SchemaRelationProvider - Interface in org.apache.spark.sql.sources
-
:: DeveloperApi ::
Implemented by objects that produce relations for a specific kind of data source
with a given schema.
- scope() - Method in class org.apache.spark.rdd.RDD
-
The scope associated with the operation that created this RDD.
- scope() - Method in class org.apache.spark.storage.RDDInfo
-
- scoreAndLabels() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
- ScriptTransformationWriterThread - Class in org.apache.spark.sql.hive.execution
-
- ScriptTransformationWriterThread(Iterator<InternalRow>, Seq<DataType>, org.apache.spark.sql.catalyst.expressions.Projection, AbstractSerDe, ObjectInspector, HiveScriptIOSchema, OutputStream, Process, org.apache.spark.util.CircularBuffer, TaskContext, Configuration) - Constructor for class org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread
-
- second(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the seconds as an integer from a given date/timestamp/string.
- seconds() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- seconds(long) - Static method in class org.apache.spark.streaming.Durations
-
- Seconds - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing
a given number of seconds.
- Seconds() - Constructor for class org.apache.spark.streaming.Seconds
-
- securityManager() - Method in class org.apache.spark.SparkEnv
-
- select(Column...) - Method in class org.apache.spark.sql.DataFrame
-
Selects a set of column-based expressions.
- select(String, String...) - Method in class org.apache.spark.sql.DataFrame
-
Selects a set of columns.
- select(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Selects a set of column-based expressions.
- select(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Selects a set of columns.
- selectedFeatures() - Method in class org.apache.spark.mllib.feature.ChiSqSelectorModel
-
- selectExpr(String...) - Method in class org.apache.spark.sql.DataFrame
-
Selects a set of SQL expressions.
- selectExpr(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Selects a set of SQL expressions.
- sendToDst(A) - Method in class org.apache.spark.graphx.EdgeContext
-
Sends a message to the destination vertex.
- sendToDst(A) - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- sendToSrc(A) - Method in class org.apache.spark.graphx.EdgeContext
-
Sends a message to the source vertex.
- sendToSrc(A) - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- sequence() - Method in class org.apache.spark.mllib.fpm.PrefixSpan.FreqSequence
-
- sequenceFile(String, Class<K>, Class<V>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, Class<K>, Class<V>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get an RDD for a Hadoop SequenceFile.
- sequenceFile(String, Class<K>, Class<V>, int) - Method in class org.apache.spark.SparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, Class<K>, Class<V>) - Method in class org.apache.spark.SparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, int, ClassTag<K>, ClassTag<V>, Function0<WritableConverter<K>>, Function0<WritableConverter<V>>) - Method in class org.apache.spark.SparkContext
-
Version of sequenceFile() for types implicitly convertible to Writables through a
WritableConverter.
- SequenceFileRDDFunctions<K,V> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile,
through an implicit conversion.
- SequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Class<? extends Writable>, Class<? extends Writable>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Constructor for class org.apache.spark.rdd.SequenceFileRDDFunctions
-
- SequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Constructor for class org.apache.spark.rdd.SequenceFileRDDFunctions
-
- SerializableWritable<T extends org.apache.hadoop.io.Writable> - Class in org.apache.spark
-
- SerializableWritable(T) - Constructor for class org.apache.spark.SerializableWritable
-
- SerializationStream - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
A stream for writing serialized objects.
- SerializationStream() - Constructor for class org.apache.spark.serializer.SerializationStream
-
- serialize(Object) - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- serialize(T, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- serialize(T, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
-
- serialize(Object) - Method in class org.apache.spark.sql.types.UserDefinedType
-
Convert the user type to a SQL datum
- serializedData() - Method in class org.apache.spark.scheduler.local.StatusUpdate
-
- serializedPyClass() - Method in class org.apache.spark.sql.types.UserDefinedType
-
Serialized Python UDT class, if it exists.
- Serializer - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
A serializer.
- Serializer() - Constructor for class org.apache.spark.serializer.Serializer
-
- serializer() - Method in class org.apache.spark.ShuffleDependency
-
- serializer() - Method in class org.apache.spark.SparkEnv
-
- SerializerInstance - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
An instance of a serializer, for use by one thread at a time.
- SerializerInstance() - Constructor for class org.apache.spark.serializer.SerializerInstance
-
- serializeStream(OutputStream) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- serializeStream(OutputStream) - Method in class org.apache.spark.serializer.SerializerInstance
-
- sessionState() - Method in class org.apache.spark.sql.hive.HiveContext.SQLSession
-
SQLConf and HiveConf contracts:
- set(Edge<ED>) - Method in class org.apache.spark.graphx.EdgeTriplet
-
Set the edge properties of this triplet.
- set(long, long, int, int, VD, VD, ED) - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- set(Param<T>, T) - Method in interface org.apache.spark.ml.param.Params
-
Sets a parameter in the embedded param map.
- set(String, Object) - Method in interface org.apache.spark.ml.param.Params
-
Sets a parameter (by name) in the embedded param map.
- set(ParamPair<?>) - Method in interface org.apache.spark.ml.param.Params
-
Sets a parameter in the embedded param map.
- set(String, String) - Method in class org.apache.spark.SparkConf
-
Set a configuration variable.
- set(SparkEnv) - Static method in class org.apache.spark.SparkEnv
-
- set(long) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given Long.
- set(int) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given Int.
- set(long, int, int) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given unscaled Long, with a given precision and scale.
- set(BigDecimal, int, int) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given BigDecimal value, with a given precision and scale.
- set(BigDecimal) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given BigDecimal value, inheriting its precision and scale.
- set(Decimal) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given Decimal value.
- setAggregator(Aggregator<K, V, C>) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set aggregator for RDD's shuffle.
- setAlgo(String) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
Sets Algorithm using a String.
- setAll(Traversable<Tuple2<String, String>>) - Method in class org.apache.spark.SparkConf
-
Set multiple parameters together
- setAlpha(double) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setAlpha(Vector) - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for setDocConcentration()
- setAlpha(double) - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for setDocConcentration()
- setAlpha(double) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets the constant used in computing confidence in implicit ALS.
- setAppName(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set the application name.
- setAppName(String) - Method in class org.apache.spark.SparkConf
-
Set a name for your application.
- setAppResource(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set the main application resource.
- setBandwidth(double) - Method in class org.apache.spark.mllib.stat.KernelDensity
-
Sets the bandwidth (standard deviation) of the Gaussian kernel (default: 1.0).
- setBeta(double) - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for setTopicConcentration()
- setBlocks(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of blocks for both user blocks and product blocks to parallelize the computation
into; pass -1 for an auto-configured number of blocks.
- setBlockSize(int) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- setCacheNodeIds(boolean) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setCacheNodeIds(boolean) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setCacheNodeIds(boolean) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setCacheNodeIds(boolean) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setCacheNodeIds(boolean) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setCacheNodeIds(boolean) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setCallSite(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.setCallSite.
- setCallSite(String) - Method in class org.apache.spark.SparkContext
-
Set the thread-local property for overriding the call sites
of actions and RDDs.
- setCaseSensitive(boolean) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- setCategoricalFeaturesInfo(Map<Integer, Integer>) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
Sets categoricalFeaturesInfo using a Java Map.
- setCheckpointDir(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Set the directory under which RDDs are going to be checkpointed.
- setCheckpointDir(String) - Method in class org.apache.spark.SparkContext
-
Set the directory under which RDDs are going to be checkpointed.
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setCheckpointInterval(int) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setCheckpointInterval(int) - Method in class org.apache.spark.mllib.clustering.LDA
-
Period (in iterations) between checkpoints (default = 10).
- setCheckpointInterval(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set period (in iterations) between checkpoints (default = 10).
- setCheckpointInterval(int) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setClassifier(Classifier<?, ?, ?>) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- setConf(String, String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set a single configuration value for the application.
- setConf(String, String) - Method in class org.apache.spark.sql.hive.HiveContext
-
- setConf(Properties) - Method in class org.apache.spark.sql.SQLContext
-
Set Spark SQL configuration properties.
- setConf(String, String) - Method in class org.apache.spark.sql.SQLContext
-
Set the given Spark SQL configuration property.
- setConvergenceTol(double) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Set the largest change in log-likelihood at which convergence is
considered to have occurred.
- setConvergenceTol(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the convergence tolerance.
- setConvergenceTol(double) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the convergence tolerance of iterations for L-BFGS.
- setConvergenceTol(double) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the convergence tolerance.
- setDecayFactor(double) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Set the decay factor directly (for forgetful algorithms).
- setDefault(Param<T>, T) - Method in interface org.apache.spark.ml.param.Params
-
Sets a default value for a param.
- setDefault(Seq<ParamPair<?>>) - Method in interface org.apache.spark.ml.param.Params
-
Sets default values for a list of params.
- setDefaultClassLoader(ClassLoader) - Method in class org.apache.spark.serializer.Serializer
-
Sets a class loader for the serializer to use in deserialization.
- setDegree(int) - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- setDeployMode(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set the deploy mode for the application.
- setDocConcentration(Vector) - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- setDocConcentration(double) - Method in class org.apache.spark.mllib.clustering.LDA
-
Replicates a Double docConcentration to create a symmetric prior.
- setDropLast(boolean) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- setElasticNetParam(double) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
Set the ElasticNet mixing parameter.
- setElasticNetParam(double) - Method in class org.apache.spark.ml.regression.LinearRegression
-
Set the ElasticNet mixing parameter.
- setEpsilon(double) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the distance threshold within which we consider centers to have converged.
- setEstimator(Estimator<?>) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- setEstimator(Estimator<?>) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- setEstimatorParamMaps(ParamMap[]) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- setEstimatorParamMaps(ParamMap[]) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- setEvaluator(Evaluator) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- setEvaluator(Evaluator) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- setExecutorEnv(String, String) - Method in class org.apache.spark.SparkConf
-
Set an environment variable to be used when launching executors for this application.
- setExecutorEnv(Seq<Tuple2<String, String>>) - Method in class org.apache.spark.SparkConf
-
Set multiple environment variables to be used when launching executors.
- setExecutorEnv(Tuple2<String, String>[]) - Method in class org.apache.spark.SparkConf
-
Set multiple environment variables to be used when launching executors.
- setFeatureIndex(int) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- setFeatureIndex(int) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.feature.RFormula
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.PredictionModel
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.Predictor
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- setFeaturesCol(String) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- setFeatureSubsetStrategy(String) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setFeatureSubsetStrategy(String) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setFinalRDDStorageLevel(StorageLevel) - Method in class org.apache.spark.mllib.recommendation.ALS
-
:: DeveloperApi ::
Sets storage level for final RDDs (user/product used in MatrixFactorizationModel).
- setFitIntercept(boolean) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
Whether to fit an intercept term.
- setFitIntercept(boolean) - Method in class org.apache.spark.ml.regression.LinearRegression
-
Set whether to fit the intercept.
Default is true.
- setFormula(String) - Method in class org.apache.spark.ml.feature.RFormula
-
Sets the formula to use for this transformer.
- setGaps(boolean) - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- setGradient(Gradient) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the gradient function (of the loss function of one single data example)
to be used for SGD.
- setGradient(Gradient) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the gradient function (of the loss function of one single data example)
to be used for L-BFGS.
- setHalfLife(double, String) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Set the half life and time unit ("batches" or "points") for forgetful algorithms.
- setIfMissing(String, String) - Method in class org.apache.spark.SparkConf
-
Set a parameter if it isn't already configured
- setImplicitPrefs(boolean) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setImplicitPrefs(boolean) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets whether to use implicit preference.
- setImpurity(String) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setImpurity(String) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
The impurity setting is ignored for GBT models.
- setImpurity(String) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setImpurity(String) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setImpurity(String) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
The impurity setting is ignored for GBT models.
- setImpurity(String) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setImpurity(Impurity) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setIndices(int[]) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- setInitialCenters(Vector[], double[]) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Specify initial centers directly.
- setInitializationMode(String) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the initialization algorithm.
- setInitializationMode(String) - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering
-
Set the initialization mode.
- setInitializationSteps(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the number of steps for the k-means|| initialization mode.
- setInitialModel(GaussianMixtureModel) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Set the initial GMM starting point, bypassing the random initialization.
- setInitialModel(KMeansModel) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the initial starting point, bypassing the random initialization or k-means||.
The condition model.k == this.k must be met; failure results
in an IllegalArgumentException.
- setInitialWeights(Vector) - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
Set the initial weights.
- setInitialWeights(Vector) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the initial weights.
- setInitMode(String) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setInitSteps(int) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.Binarizer
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.HashingTF
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.IDF
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.IDFModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.IndexToString
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.PCA
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.PCAModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setInputCol(String) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
- setInputCol(String) - Method in class org.apache.spark.ml.UnaryTransformer
-
- setInputCols(String[]) - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- setIntercept(boolean) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Set if the algorithm should add an intercept.
- setIntermediateRDDStorageLevel(StorageLevel) - Method in class org.apache.spark.mllib.recommendation.ALS
-
:: DeveloperApi ::
Sets storage level for intermediate RDDs (user/product in/out links).
- setInverse(boolean) - Method in class org.apache.spark.ml.feature.DCT
-
- setIsotonic(boolean) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- setIsotonic(boolean) - Method in class org.apache.spark.mllib.regression.IsotonicRegression
-
Sets the isotonic parameter.
- setItemCol(String) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setItemCol(String) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- setIterations(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of iterations to run.
- setJars(Seq<String>) - Method in class org.apache.spark.SparkConf
-
Set JAR files to distribute to the cluster.
- setJars(String[]) - Method in class org.apache.spark.SparkConf
-
Set JAR files to distribute to the cluster.
- setJavaHome(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set a custom JAVA_HOME for launching the Spark application.
- setJobDescription(String) - Method in class org.apache.spark.SparkContext
-
Set a human readable description of the current job.
- setJobGroup(String, String, boolean) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setJobGroup(String, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setJobGroup(String, String, boolean) - Method in class org.apache.spark.SparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setK(int) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setK(int) - Method in class org.apache.spark.ml.feature.PCA
-
- setK(int) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Set the number of Gaussians in the mixture model.
- setK(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the number of clusters to create (k).
- setK(int) - Method in class org.apache.spark.mllib.clustering.LDA
-
Number of topics to infer.
- setK(int) - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering
-
- setK(int) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Set the number of clusters.
- setKappa(double) - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Learning rate: the exponential decay rate. It should be in the interval
(0.5, 1.0] to guarantee asymptotic convergence.
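To make the role of the decay rate concrete, here is a small sketch that assumes the step-size schedule used in online variational Bayes for LDA, rho_t = (tau0 + t)^(-kappa). The class and method names are illustrative only, and the range check mirrors the recommendation in the entry above rather than any validation Spark itself performs.

```java
// Illustrative helper; names are hypothetical and not part of Spark.
final class LearningRateSketch {
    // Step size at iteration t, assuming rho_t = (tau0 + t)^(-kappa).
    static double stepSize(double tau0, double kappa, long t) {
        return Math.pow(tau0 + t, -kappa);
    }

    // True when kappa lies in (0.5, 1.0], the range the entry recommends.
    static boolean inRecommendedRange(double kappa) {
        return kappa > 0.5 && kappa <= 1.0;
    }

    public static void main(String[] args) {
        // Print how the step size shrinks as iterations progress.
        for (long t = 0; t <= 1000; t += 250) {
            System.out.printf("t=%d rho=%.6f%n", t, stepSize(1024.0, 0.51, t));
        }
    }
}
```

A larger kappa makes the step size fall off faster, so later mini-batches change the model less; a larger tau0 damps the earliest iterations.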
- setKeyOrdering(Ordering<K>) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set key ordering for RDD's shuffle.
- setLabelCol(String) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- setLabelCol(String) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- setLabelCol(String) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- setLabelCol(String) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- setLabelCol(String) - Method in class org.apache.spark.ml.feature.RFormula
-
- setLabelCol(String) - Method in class org.apache.spark.ml.Predictor
-
- setLabelCol(String) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- setLabels(String[]) - Method in class org.apache.spark.ml.feature.IndexToString
-
Optional labels provided by the user; if not supplied, column
metadata is read for labels.
- setLambda(double) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Set the smoothing parameter.
- setLambda(double) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the regularization parameter, lambda.
- setLayers(int[]) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- setLearningRate(double) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets initial learning rate (default: 0.025).
- setLearningRate(double) - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- setLocalProperty(String, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
- setLocalProperty(String, String) - Method in class org.apache.spark.SparkContext
-
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
- setLogLevel(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Controls the log level.
- setLogLevel(String) - Method in class org.apache.spark.SparkContext
-
Controls the log level.
- setLoss(Loss) - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- setLossType(String) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setLossType(String) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMainClass(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Sets the application class name for Java/Scala applications.
- setMapSideCombine(boolean) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set mapSideCombine flag for RDD's shuffle.
- setMaster(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set the Spark master for the application.
- setMaster(String) - Method in class org.apache.spark.SparkConf
-
The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
- setMax(double) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- setMax(double) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- setMaxBins(int) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setMaxBins(int) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setMaxBins(int) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setMaxBins(int) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setMaxBins(int) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMaxBins(int) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setMaxBins(int) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setMaxCategories(int) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- setMaxDepth(int) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setMaxDepth(int) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setMaxDepth(int) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setMaxDepth(int) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setMaxDepth(int) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMaxDepth(int) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setMaxDepth(int) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setMaxIter(int) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setMaxIter(int) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
Set the maximum number of iterations.
- setMaxIter(int) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
Set the maximum number of iterations.
- setMaxIter(int) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setMaxIter(int) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setMaxIter(int) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setMaxIter(int) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMaxIter(int) - Method in class org.apache.spark.ml.regression.LinearRegression
-
Set the maximum number of iterations.
- setMaxIterations(int) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Set the maximum number of iterations to run.
- setMaxIterations(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the maximum number of iterations to run.
- setMaxIterations(int) - Method in class org.apache.spark.mllib.clustering.LDA
-
Maximum number of iterations for learning.
- setMaxIterations(int) - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering
-
Set the maximum number of iterations of the power iteration loop.
- setMaxLocalProjDBSize(long) - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Sets the maximum number of items (including delimiters used in the internal storage format)
allowed in a projected database before local processing (default: 32000000L).
- setMaxMemoryInMB(int) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setMaxMemoryInMB(int) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setMaxMemoryInMB(int) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setMaxMemoryInMB(int) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setMaxMemoryInMB(int) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMaxMemoryInMB(int) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setMaxMemoryInMB(int) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setMaxNumIterations(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
- setMaxPatternLength(int) - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Sets the maximal pattern length (default: 10).
- setMetricName(String) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- setMetricName(String) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- setMetricName(String) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- setMin(double) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- setMin(double) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- setMinConfidence(double) - Method in class org.apache.spark.mllib.fpm.AssociationRules
-
Sets the minimal confidence (default: 0.8).
- setMinCount(int) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setMinCount(int) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets minCount, the minimum number of times a token must appear to be included in the word2vec
model's vocabulary (default: 5).
- setMinDF(double) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- setMinDocFreq(int) - Method in class org.apache.spark.ml.feature.IDF
-
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
Set the fraction of each batch to use for updates.
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Mini-batch fraction in (0, 1], which sets the fraction of documents sampled and used in
each iteration.
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
:: Experimental ::
Set fraction of data to be used for each SGD iteration.
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the fraction of each batch to use for updates.
- setMinInfoGain(double) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setMinInfoGain(double) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setMinInfoGain(double) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setMinInfoGain(double) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setMinInfoGain(double) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMinInfoGain(double) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setMinInfoGain(double) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setMinInstancesPerNode(int) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setMinSupport(double) - Method in class org.apache.spark.mllib.fpm.FPGrowth
-
Sets the minimal support level (default: 0.3).
- setMinSupport(double) - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Sets the minimal support level (default: 0.1).
- setMinTF(double) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- setMinTF(double) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- setMinTokenLength(int) - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- setModelType(String) - Method in class org.apache.spark.ml.classification.NaiveBayes
-
Set the model type using a string (case-sensitive).
- setModelType(String) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Set the model type using a string (case-sensitive).
- setN(int) - Method in class org.apache.spark.ml.feature.NGram
-
- setName(String) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Assign a name to this RDD.
- setName(String) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Assign a name to this RDD.
- setName(String) - Method in class org.apache.spark.api.java.JavaRDD
-
Assign a name to this RDD.
- setName(String) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- setName(String) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- setName(String) - Method in class org.apache.spark.rdd.RDD
-
Assign a name to this RDD.
- setNames(String[]) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- setNonnegative(boolean) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setNonnegative(boolean) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set whether the least-squares problems solved at each iteration should have
nonnegativity constraints.
- setNumBlocks(int) - Method in class org.apache.spark.ml.recommendation.ALS
-
Sets both numUserBlocks and numItemBlocks to the specific value.
- setNumClasses(int) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
-
:: Experimental ::
Set the number of possible outcomes for a k-class classification problem in
multinomial logistic regression.
- setNumClasses(int) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setNumCorrections(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the number of corrections used in the LBFGS update.
- setNumFeatures(int) - Method in class org.apache.spark.ml.feature.HashingTF
-
- setNumFolds(int) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- setNumItemBlocks(int) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setNumIterations(int) - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
Set the number of iterations of gradient descent to run per update.
- setNumIterations(int) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets the number of iterations (default: 1), which should be smaller than or equal to the number of
partitions.
- setNumIterations(int) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the number of iterations for SGD.
- setNumIterations(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the maximal number of iterations for L-BFGS.
- setNumIterations(int) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the number of iterations of gradient descent to run per update.
- setNumIterations(int) - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- setNumPartitions(int) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setNumPartitions(int) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets number of partitions (default: 1).
- setNumPartitions(int) - Method in class org.apache.spark.mllib.fpm.FPGrowth
-
Sets the number of partitions used by parallel FP-growth (default: same as input data).
- setNumTrees(int) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setNumTrees(int) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setNumUserBlocks(int) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setOptimizeDocConcentration(boolean) - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Sets whether to optimize docConcentration parameter during training.
- setOptimizer(LDAOptimizer) - Method in class org.apache.spark.mllib.clustering.LDA
-
:: DeveloperApi ::
- setOptimizer(String) - Method in class org.apache.spark.mllib.clustering.LDA
-
Set the LDAOptimizer used to perform the actual calculation by algorithm name.
- setOrNull(long, int, int) - Method in class org.apache.spark.sql.types.Decimal
-
Set this Decimal to the given unscaled Long, with a given precision and scale,
and return it, or return null if it cannot be set due to overflow.
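The "set or return null on overflow" contract described above can be sketched with `java.math.BigDecimal`. This is an illustration under assumptions, not Spark's `Decimal` code: the class name is hypothetical, the method is static and returns a new value instead of mutating a receiver, and overflow is taken to mean that the value needs more digits than the requested precision.

```java
import java.math.BigDecimal;

// Hypothetical illustration; Spark's Decimal performs its own overflow check.
final class DecimalSetOrNullSketch {
    // Builds the value unscaled * 10^(-scale).
    // Returns null when the value needs more than `precision` digits.
    static BigDecimal setOrNull(long unscaled, int precision, int scale) {
        BigDecimal value = BigDecimal.valueOf(unscaled, scale);
        return value.precision() > precision ? null : value;
    }
}
```

Returning null instead of throwing lets a caller treat overflow as ordinary missing data, at the cost of requiring a null check at every call site.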
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.Binarizer
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.HashingTF
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.IDF
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.IDFModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.IndexToString
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.PCA
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.PCAModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setOutputCol(String) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
- setOutputCol(String) - Method in class org.apache.spark.ml.UnaryTransformer
-
- setP(double) - Method in class org.apache.spark.ml.feature.Normalizer
-
- setParent(Estimator<M>) - Method in class org.apache.spark.ml.Model
-
Sets the parent of this model (Java API).
- setPattern(String) - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.PredictionModel
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.Predictor
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- setPredictionCol(String) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- setProbabilityCol(String) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
- setProbabilityCol(String) - Method in class org.apache.spark.ml.classification.ProbabilisticClassifier
-
- setProductBlocks(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of product blocks to parallelize the computation.
- setPropertiesFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set a custom properties file with Spark configuration for the application.
- setQuantileCalculationStrategy(Enumeration.Value) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setRandomCenters(int, double, long) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Initialize random centers, requiring only the number of dimensions.
- setRank(int) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setRank(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the rank of the feature matrices computed (number of features).
- setRatingCol(String) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setRawPredictionCol(String) - Method in class org.apache.spark.ml.classification.ClassificationModel
-
- setRawPredictionCol(String) - Method in class org.apache.spark.ml.classification.Classifier
-
- setRawPredictionCol(String) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- setRegParam(double) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
Set the regularization parameter.
- setRegParam(double) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setRegParam(double) - Method in class org.apache.spark.ml.regression.LinearRegression
-
Set the regularization parameter.
- setRegParam(double) - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
Set the regularization parameter.
- setRegParam(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the regularization parameter.
- setRegParam(double) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the regularization parameter.
- setRest(long, int, VD, ED) - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- setRuns(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
:: Experimental ::
Set the number of runs of the algorithm to execute in parallel.
- setSample(RDD<Object>) - Method in class org.apache.spark.mllib.stat.KernelDensity
-
Sets the sample to use for density estimation.
- setSample(JavaRDD<Double>) - Method in class org.apache.spark.mllib.stat.KernelDensity
-
Sets the sample to use for density estimation (for Java users).
- setScalingVec(Vector) - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
- setScoreCol(String) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
Deprecated.
Use setRawPredictionCol() instead.
- setSeed(long) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setSeed(long) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
Set the seed for weights initialization.
- setSeed(long) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setSeed(long) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setSeed(long) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setSeed(long) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setSeed(long) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setSeed(long) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setSeed(long) - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Set the random seed.
- setSeed(long) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the random seed for cluster initialization.
- setSeed(long) - Method in class org.apache.spark.mllib.clustering.LDA
-
Random seed.
- setSeed(long) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets random seed (default: a random long integer).
- setSeed(long) - Method in class org.apache.spark.mllib.random.ExponentialGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.GammaGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.LogNormalGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.UniformGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets a random seed to have deterministic results.
- setSeed(long) - Method in class org.apache.spark.util.random.BernoulliCellSampler
-
- setSeed(long) - Method in class org.apache.spark.util.random.BernoulliSampler
-
- setSeed(long) - Method in class org.apache.spark.util.random.PoissonSampler
-
- setSeed(long) - Method in interface org.apache.spark.util.random.Pseudorandom
-
Set random seed.
- setSerializer(Serializer) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
- setSerializer(Serializer) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
- setSession(SQLContext.SQLSession) - Method in class org.apache.spark.sql.SQLContext
-
- setSmoothing(double) - Method in class org.apache.spark.ml.classification.NaiveBayes
-
Set the smoothing parameter.
- setSparkHome(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Set a custom Spark installation location for the application.
- setSparkHome(String) - Method in class org.apache.spark.SparkConf
-
Set the location where Spark is installed on worker nodes.
- setSplits(double[]) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- setSrcOnly(long, int, VD) - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- setStages(PipelineStage[]) - Method in class org.apache.spark.ml.Pipeline
-
- setStandardization(boolean) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
Whether to standardize the training features before fitting the model.
- setStandardization(boolean) - Method in class org.apache.spark.ml.regression.LinearRegression
-
Whether to standardize the training features before fitting the model.
- setStepSize(double) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setStepSize(double) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setStepSize(double) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setStepSize(double) - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
Set the step size for gradient descent.
- setStepSize(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the initial step size of SGD for the first step.
- setStepSize(double) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the step size for gradient descent.
- setStopWords(String[]) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- setSubsamplingRate(double) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- setSubsamplingRate(double) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- setSubsamplingRate(double) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- setSubsamplingRate(double) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- setSubsamplingRate(double) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setTaskContext(TaskContext) - Static method in class org.apache.spark.TaskContext
-
Set the thread local TaskContext.
- setTau0(double) - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
A (positive) learning parameter that downweights early iterations.
- setThreshold(double) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- setThreshold(double) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- setThreshold(double) - Method in class org.apache.spark.ml.feature.Binarizer
-
- setThreshold(double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Sets the threshold that separates positive predictions from negative predictions
in Binary Logistic Regression.
- setThreshold(double) - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Sets the threshold that separates positive predictions from negative predictions.
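As a rough picture of what the threshold in the two entries above does, the sketch below maps a raw score to a binary label. The class name is hypothetical, and the strict greater-than comparison and the 1.0 / 0.0 label encoding are assumptions made for illustration.

```java
// Hypothetical helper; not part of Spark's model classes.
final class ThresholdSketch {
    // Returns 1.0 (positive) when the score exceeds the threshold, else 0.0 (negative).
    // Assumption: a score exactly equal to the threshold counts as negative.
    static double predict(double score, double threshold) {
        return score > threshold ? 1.0 : 0.0;
    }
}
```

Raising the threshold trades recall for precision: fewer examples are labeled positive, but those that are tend to have higher scores.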
- setThresholds(double[]) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- setThresholds(double[]) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- setThresholds(double[]) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
- setThresholds(double[]) - Method in class org.apache.spark.ml.classification.ProbabilisticClassifier
-
- setTol(double) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
Set the convergence tolerance of iterations.
- setTol(double) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
Set the convergence tolerance of iterations.
- setTol(double) - Method in class org.apache.spark.ml.clustering.KMeans
-
- setTol(double) - Method in class org.apache.spark.ml.regression.LinearRegression
-
Set the convergence tolerance of iterations.
- setTopicConcentration(double) - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
- setTrainRatio(double) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- setTreeStrategy(Strategy) - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- setUpdater(Updater) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the updater function to actually perform a gradient step in a given direction.
- setUpdater(Updater) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the updater function to actually perform a gradient step in a given direction.
- setupGroups(int) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
Initializes targetLen partition groups and assigns a preferredLocation.
This uses the coupon collector estimate (2 * n log(n)) of how many preferredLocations it must
rotate through until it has seen most of the preferred locations.
- setUseNodeIdCache(boolean) - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- setUserBlocks(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of user blocks to parallelize the computation.
- setUserCol(String) - Method in class org.apache.spark.ml.recommendation.ALS
-
- setUserCol(String) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- setValidateData(boolean) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Set if the algorithm should validate data before training.
- setValidationTol(double) - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- setValue(R) - Method in class org.apache.spark.Accumulable
-
Set the accumulator's value; only allowed on the master.
- setVectorSize(int) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- setVectorSize(int) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets vector size (default: 100).
- setVerbose(boolean) - Method in class org.apache.spark.launcher.SparkLauncher
-
Enables verbose reporting for SparkSubmit.
- setVocabSize(int) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- setWeightCol(String) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- setWithMean(boolean) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- setWithMean(boolean) - Method in class org.apache.spark.mllib.feature.StandardScalerModel
-
- setWithStd(boolean) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- setWithStd(boolean) - Method in class org.apache.spark.mllib.feature.StandardScalerModel
-
- sha1(Column) - Static method in class org.apache.spark.sql.functions
-
Calculates the SHA-1 digest of a binary column and returns the value
as a 40 character hex string.
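As a rough illustration of why the result is 40 characters long: SHA-1 produces a 20-byte digest, and each byte is written as two hex digits. The sketch below uses only the JDK's `MessageDigest`; the class and method names are hypothetical, it is not Spark's implementation, and the lowercase output is a choice made for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Plain-JDK illustration only; hypothetical names, not Spark's implementation.
final class Sha1HexSketch {
    // Returns the SHA-1 digest of the input bytes as a lowercase hex string.
    static String sha1Hex(byte[] input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(input);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        String hex = sha1Hex("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex + " (" + hex.length() + " characters)");
    }
}
```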
- sha2(Column, int) - Static method in class org.apache.spark.sql.functions
-
Calculates the SHA-2 family of hash functions of a binary column and
returns the value as a hex string.
- shape() - Method in class org.apache.spark.mllib.random.GammaGenerator
-
- shiftLeft(Column, int) - Static method in class org.apache.spark.sql.functions
-
Shift the given value numBits left.
- shiftRight(Column, int) - Static method in class org.apache.spark.sql.functions
-
Shift the given value numBits right.
- shiftRightUnsigned(Column, int) - Static method in class org.apache.spark.sql.functions
-
Unsigned shift the given value numBits right.
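The difference between the signed and unsigned right shift only shows up for negative inputs. The sketch below uses Java's own `<<`, `>>`, and `>>>` operators on 32-bit ints, on the assumption that the three column functions follow the same semantics; the class name is hypothetical and no Spark API is called.

```java
// Plain-Java illustration of the three shift flavors on 32-bit ints.
// Hypothetical helper class; it does not call Spark.
final class ShiftSketch {
    // Shifts bits left, filling with zeros on the right.
    static int shiftLeft(int value, int numBits) {
        return value << numBits;
    }

    // Shifts bits right, copying the sign bit into the vacated positions.
    static int shiftRight(int value, int numBits) {
        return value >> numBits;
    }

    // Shifts bits right, filling the vacated positions with zeros.
    static int shiftRightUnsigned(int value, int numBits) {
        return value >>> numBits;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(21, 1));           // 42
        System.out.println(shiftRight(-42, 1));         // -21
        System.out.println(shiftRightUnsigned(-42, 1)); // 2147483627
    }
}
```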
- ShortDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
-
- ShortestPaths - Class in org.apache.spark.graphx.lib
-
Computes shortest paths to the given set of landmark vertices, returning a graph where each
vertex attribute is a map containing the shortest-path distance to each reachable landmark.
- ShortestPaths() - Constructor for class org.apache.spark.graphx.lib.ShortestPaths
-
- shortName() - Method in interface org.apache.spark.sql.sources.DataSourceRegister
-
The string that represents the format that this data source provider uses.
- ShortType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the ShortType object.
- ShortType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing Short values.
- shouldDistributeGaussians(int, int) - Static method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Heuristic to distribute the computation of the MultivariateGaussians, approximately when
d > 25 except for when k is very small.
- shouldGoLeft(Vector) - Method in interface org.apache.spark.ml.tree.Split
-
Return true (split to left) or false (split to right).
- shouldGoLeft(int, Split[]) - Method in interface org.apache.spark.ml.tree.Split
-
Return true (split to left) or false (split to right).
- shouldOwn(Param<?>) - Method in interface org.apache.spark.ml.param.Params
-
Validates that the input param belongs to this instance.
- show(int) - Method in class org.apache.spark.sql.DataFrame
-
- show() - Method in class org.apache.spark.sql.DataFrame
-
Displays the top 20 rows of DataFrame in a tabular form.
- show(boolean) - Method in class org.apache.spark.sql.DataFrame
-
Displays the top 20 rows of DataFrame in a tabular form.
- show(int, boolean) - Method in class org.apache.spark.sql.DataFrame
-
- showBytesDistribution(String, Function2<TaskInfo, TaskMetrics, Option<Object>>, Seq<Tuple2<TaskInfo, TaskMetrics>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showBytesDistribution(String, Option<org.apache.spark.util.Distribution>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showBytesDistribution(String, org.apache.spark.util.Distribution) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, org.apache.spark.util.Distribution, Function1<Object, String>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, Option<org.apache.spark.util.Distribution>, Function1<Object, String>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, Option<org.apache.spark.util.Distribution>, String) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showDistribution(String, String, Function2<TaskInfo, TaskMetrics, Option<Object>>, Seq<Tuple2<TaskInfo, TaskMetrics>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showMillisDistribution(String, Option<org.apache.spark.util.Distribution>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showMillisDistribution(String, Function2<TaskInfo, TaskMetrics, Option<Object>>, Seq<Tuple2<TaskInfo, TaskMetrics>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- showMillisDistribution(String, Function1<BatchInfo, Option<Object>>) - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- SHUFFLE() - Static method in class org.apache.spark.storage.BlockId
-
- SHUFFLE_DATA() - Static method in class org.apache.spark.storage.BlockId
-
- SHUFFLE_INDEX() - Static method in class org.apache.spark.storage.BlockId
-
- ShuffleBlockId - Class in org.apache.spark.storage
-
- ShuffleBlockId(int, int, int) - Constructor for class org.apache.spark.storage.ShuffleBlockId
-
- ShuffleDataBlockId - Class in org.apache.spark.storage
-
- ShuffleDataBlockId(int, int, int) - Constructor for class org.apache.spark.storage.ShuffleDataBlockId
-
- ShuffleDependency<K,V,C> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a dependency on the output of a shuffle stage.
- ShuffleDependency(RDD<? extends Product2<K, V>>, Partitioner, Option<Serializer>, Option<Ordering<K>>, Option<Aggregator<K, V, C>>, boolean) - Constructor for class org.apache.spark.ShuffleDependency
-
- ShuffledRDD<K,V,C> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
The resulting RDD from a shuffle (e.g.
- ShuffledRDD(RDD<? extends Product2<K, V>>, Partitioner) - Constructor for class org.apache.spark.rdd.ShuffledRDD
-
- shuffleHandle() - Method in class org.apache.spark.ShuffleDependency
-
- shuffleId() - Method in class org.apache.spark.CleanShuffle
-
- shuffleId() - Method in class org.apache.spark.FetchFailed
-
- shuffleId() - Method in class org.apache.spark.ShuffleDependency
-
- shuffleId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- shuffleId() - Method in class org.apache.spark.storage.ShuffleDataBlockId
-
- shuffleId() - Method in class org.apache.spark.storage.ShuffleIndexBlockId
-
- ShuffleIndexBlockId - Class in org.apache.spark.storage
-
- ShuffleIndexBlockId(int, int, int) - Constructor for class org.apache.spark.storage.ShuffleIndexBlockId
-
- shuffleManager() - Method in class org.apache.spark.SparkEnv
-
- shuffleMemoryManager() - Method in class org.apache.spark.SparkEnv
-
- shuffleRead() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- shuffleReadBytes() - Method in class org.apache.spark.status.api.v1.StageData
-
- ShuffleReadMetricDistributions - Class in org.apache.spark.status.api.v1
-
- ShuffleReadMetrics - Class in org.apache.spark.status.api.v1
-
- shuffleReadMetrics() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- shuffleReadMetrics() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- shuffleReadRecords() - Method in class org.apache.spark.status.api.v1.StageData
-
- shuffleWrite() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- shuffleWriteBytes() - Method in class org.apache.spark.status.api.v1.StageData
-
- ShuffleWriteMetricDistributions - Class in org.apache.spark.status.api.v1
-
- ShuffleWriteMetrics - Class in org.apache.spark.status.api.v1
-
- shuffleWriteMetrics() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- shuffleWriteMetrics() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- shuffleWriteRecords() - Method in class org.apache.spark.status.api.v1.StageData
-
- sigma() - Method in class org.apache.spark.mllib.stat.distribution.MultivariateGaussian
-
- sigmas() - Method in class org.apache.spark.mllib.clustering.ExpectationSum
-
- SignalLoggerHandler - Class in org.apache.spark.util
-
- SignalLoggerHandler(String, Logger) - Constructor for class org.apache.spark.util.SignalLoggerHandler
-
- signum(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the signum of the given value.
- signum(String) - Static method in class org.apache.spark.sql.functions
-
Computes the signum of the given column.
- SimpleFutureAction<T> - Class in org.apache.spark
-
A FutureAction holding the result of an action that triggers a single job.
- simpleString() - Method in class org.apache.spark.sql.hive.HiveContext.QueryExecution
-
- simpleString() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- simpleString() - Method in class org.apache.spark.sql.types.ArrayType
-
- simpleString() - Method in class org.apache.spark.sql.types.ByteType
-
- simpleString() - Method in class org.apache.spark.sql.types.DataType
-
Readable string representation for the type.
- simpleString() - Method in class org.apache.spark.sql.types.DecimalType
-
- simpleString() - Method in class org.apache.spark.sql.types.IntegerType
-
- simpleString() - Method in class org.apache.spark.sql.types.LongType
-
- simpleString() - Method in class org.apache.spark.sql.types.MapType
-
- simpleString() - Method in class org.apache.spark.sql.types.ShortType
-
- simpleString() - Method in class org.apache.spark.sql.types.StructType
-
- SimpleUpdater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
A simple updater for gradient descent *without* any regularization.
- SimpleUpdater() - Constructor for class org.apache.spark.mllib.optimization.SimpleUpdater
-
- sin(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the sine of the given value.
- sin(String) - Static method in class org.apache.spark.sql.functions
-
Computes the sine of the given column.
- SingularValueDecomposition<UType,VType> - Class in org.apache.spark.mllib.linalg
-
:: Experimental ::
Represents singular value decomposition (SVD) factors.
- SingularValueDecomposition(UType, Vector, VType) - Constructor for class org.apache.spark.mllib.linalg.SingularValueDecomposition
-
- sinh(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic sine of the given value.
- sinh(String) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic sine of the given column.
- size() - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Size of the attribute group.
- size() - Method in class org.apache.spark.ml.param.ParamMap
-
Number of param pairs in this map.
- size() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- size() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- size() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Size of the vector.
- size() - Method in class org.apache.spark.rdd.PartitionGroup
-
- size(Column) - Static method in class org.apache.spark.sql.functions
-
Returns length of array or map.
- size() - Method in interface org.apache.spark.sql.Row
-
Number of elements in the Row.
- size() - Method in class org.apache.spark.storage.MemoryEntry
-
- SizeEstimator - Class in org.apache.spark.util
-
:: DeveloperApi ::
Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in
memory-aware caches.
- SizeEstimator() - Constructor for class org.apache.spark.util.SizeEstimator
-
- sizeInBytes() - Method in class org.apache.spark.sql.sources.BaseRelation
-
Returns an estimated size of this relation in bytes.
- sizeInBytes() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- sketch(RDD<K>, int, ClassTag<K>) - Static method in class org.apache.spark.RangePartitioner
-
Sketches the input RDD via reservoir sampling on each partition.
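Reservoir sampling keeps a fixed-size uniform sample from a stream whose length is not known in advance. The sketch below shows the classic single-pass version on a plain Java iterator, which is roughly what would happen inside one partition; the class and method names are hypothetical and this is not the code RangePartitioner uses.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Single-pass reservoir sampling over an iterator (hypothetical helper, not Spark code).
final class ReservoirSketch {
    // Returns up to k items, each input item being kept with equal probability.
    static <T> List<T> sample(Iterator<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(Math.max(k, 0));
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k items always enter the reservoir.
                reservoir.add(item);
            } else {
                // Replacement phase: pick a slot uniformly in [0, seen);
                // the item survives only if that slot lies inside the reservoir.
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> partition = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            partition.add(i);
        }
        System.out.println(sample(partition.iterator(), 5, new Random(42L)));
    }
}
```

When the input holds fewer than k items, the sketch simply returns all of them in their original order.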
- skip(long) - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- skippedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- slack() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- slice(Time, Time) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return all the RDDs between 'fromDuration' and 'toDuration' (both included)
- slice(org.apache.spark.streaming.Interval) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return all the RDDs defined by the Interval object (both end times included)
- slice(Time, Time) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return all the RDDs between 'fromTime' and 'toTime' (both included)
- slideDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
Time interval after which the DStream generates an RDD
- slideDuration() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- sliding(int) - Method in class org.apache.spark.mllib.rdd.RDDFunctions
-
Returns an RDD formed by grouping items of its parent RDD into fixed-size blocks by passing a sliding
window over them.
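To make the windowing concrete, the sketch below applies the same idea to an in-memory list: every run of `windowSize` consecutive items becomes one block. The names are hypothetical, and returning an empty result when there are fewer items than the window size is an assumption of the sketch rather than a statement about the RDD method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sliding fixed-size windows over a local list (hypothetical helper, not Spark code).
final class SlidingSketch {
    static <T> List<List<T>> sliding(List<T> items, int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        // Each window starts one item after the previous one.
        for (int start = 0; start + windowSize <= items.size(); start++) {
            windows.add(new ArrayList<>(items.subList(start, start + windowSize)));
        }
        return windows;
    }

    public static void main(String[] args) {
        // Prints [[1, 2], [2, 3], [3, 4]]
        System.out.println(sliding(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```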
- SnappyCompressionCodec - Class in org.apache.spark.io
-
- SnappyCompressionCodec(SparkConf) - Constructor for class org.apache.spark.io.SnappyCompressionCodec
-
- SnappyOutputStreamWrapper - Class in org.apache.spark.io
-
Wrapper over SnappyOutputStream which guards against write-after-close and double-close issues.
- SnappyOutputStreamWrapper(SnappyOutputStream) - Constructor for class org.apache.spark.io.SnappyOutputStreamWrapper
-
- socketStream(String, int, Function<InputStream, Iterable<T>>, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port.
- socketStream(String, int, Function1<InputStream, Iterator<T>>, StorageLevel, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from TCP source hostname:port.
- socketTextStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port.
- socketTextStream(String, int) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from network source hostname:port.
- socketTextStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from TCP source hostname:port.
- Sort() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- sort(String, String...) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame sorted by the specified column, all in ascending order.
- sort(Column...) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame sorted by the given expressions.
- sort(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame sorted by the specified column, all in ascending order.
- sort(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame sorted by the given expressions.
- sort_array(Column) - Static method in class org.apache.spark.sql.functions
-
Sorts the input array for the given column in ascending order,
according to the natural ordering of the array elements.
- sort_array(Column, boolean) - Static method in class org.apache.spark.sql.functions
-
Sorts the input array for the given column in ascending / descending order,
according to the natural ordering of the array elements.
- sortBy(Function<T, S>, boolean, int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return this RDD sorted by the given key function.
- sortBy(Function1<T, K>, boolean, int, Ordering<K>, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return this RDD sorted by the given key function.
- sortByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements in
ascending order.
- sortByKey(boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(boolean, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(Comparator<K>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(Comparator<K>, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(Comparator<K>, boolean, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- sortByKey(boolean, int) - Method in class org.apache.spark.rdd.OrderedRDDFunctions
-
Sort the RDD by key, so that each partition contains a sorted range of the elements.
- soundex(Column) - Static method in class org.apache.spark.sql.functions
-
Return the soundex code for the specified expression.
- SPARK_JOB_DESCRIPTION() - Static method in class org.apache.spark.SparkContext
-
- SPARK_JOB_GROUP_ID() - Static method in class org.apache.spark.SparkContext
-
- SPARK_JOB_INTERRUPT_ON_CANCEL() - Static method in class org.apache.spark.SparkContext
-
- SPARK_MASTER - Static variable in class org.apache.spark.launcher.SparkLauncher
-
The Spark master.
- SparkConf - Class in org.apache.spark
-
Configuration for a Spark application.
- SparkConf(boolean) - Constructor for class org.apache.spark.SparkConf
-
- SparkConf() - Constructor for class org.apache.spark.SparkConf
-
Create a SparkConf that loads defaults from system properties and the classpath
- sparkContext() - Method in class org.apache.spark.rdd.RDD
-
The SparkContext that created this RDD.
- SparkContext - Class in org.apache.spark
-
Main entry point for Spark functionality.
- SparkContext(SparkConf) - Constructor for class org.apache.spark.SparkContext
-
- SparkContext() - Constructor for class org.apache.spark.SparkContext
-
Create a SparkContext that loads settings from system properties (for instance, when
launching with ./bin/spark-submit).
- SparkContext(SparkConf, Map<String, Set<SplitInfo>>) - Constructor for class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Alternative constructor for setting preferred locations where Spark will create executors.
- SparkContext(String, String, SparkConf) - Constructor for class org.apache.spark.SparkContext
-
Alternative constructor that allows setting common Spark properties directly
- SparkContext(String, String, String, Seq<String>, Map<String, String>, Map<String, Set<SplitInfo>>) - Constructor for class org.apache.spark.SparkContext
-
Alternative constructor that allows setting common Spark properties directly
- sparkContext() - Method in class org.apache.spark.sql.SQLContext
-
- sparkContext() - Method in class org.apache.spark.sql.SQLContext.SparkPlanner
-
- sparkContext() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
The underlying SparkContext
- sparkContext() - Method in class org.apache.spark.streaming.StreamingContext
-
Return the associated Spark context
- SparkContext.DoubleAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.DoubleAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
- SparkContext.FloatAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.FloatAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
- SparkContext.IntAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.IntAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.IntAccumulatorParam$
-
- SparkContext.LongAccumulatorParam$ - Class in org.apache.spark
-
- SparkContext.LongAccumulatorParam$() - Constructor for class org.apache.spark.SparkContext.LongAccumulatorParam$
-
- SparkEnv - Class in org.apache.spark
-
:: DeveloperApi ::
Holds all the runtime environment objects for a running Spark instance (either master or worker),
including the serializer, Akka actor system, block manager, map output tracker, etc.
- SparkEnv(String, org.apache.spark.rpc.RpcEnv, Serializer, Serializer, CacheManager, MapOutputTracker, ShuffleManager, org.apache.spark.broadcast.BroadcastManager, BlockTransferService, org.apache.spark.storage.BlockManager, SecurityManager, HttpFileServer, String, org.apache.spark.metrics.MetricsSystem, ShuffleMemoryManager, ExecutorMemoryManager, org.apache.spark.scheduler.OutputCommitCoordinator, SparkConf) - Constructor for class org.apache.spark.SparkEnv
-
- SparkException - Exception in org.apache.spark
-
- SparkException(String, Throwable) - Constructor for exception org.apache.spark.SparkException
-
- SparkException(String) - Constructor for exception org.apache.spark.SparkException
-
- SparkFiles - Class in org.apache.spark
-
Resolves paths to files added through SparkContext.addFile().
- SparkFiles() - Constructor for class org.apache.spark.SparkFiles
-
- sparkFilesDir() - Method in class org.apache.spark.SparkEnv
-
- SparkFirehoseListener - Class in org.apache.spark
-
Class that allows users to receive all SparkListener events.
- SparkFirehoseListener() - Constructor for class org.apache.spark.SparkFirehoseListener
-
- SparkFlumeEvent - Class in org.apache.spark.streaming.flume
-
A wrapper class for AvroFlumeEvents with a custom serialization format.
- SparkFlumeEvent() - Constructor for class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- SparkJobInfo - Interface in org.apache.spark
-
Exposes information about Spark Jobs.
- SparkJobInfoImpl - Class in org.apache.spark
-
- SparkJobInfoImpl(int, int[], JobExecutionStatus) - Constructor for class org.apache.spark.SparkJobInfoImpl
-
- SparkLauncher - Class in org.apache.spark.launcher
-
Launcher for Spark applications.
- SparkLauncher() - Constructor for class org.apache.spark.launcher.SparkLauncher
-
- SparkLauncher(Map<String, String>) - Constructor for class org.apache.spark.launcher.SparkLauncher
-
Creates a launcher that will set the given environment variables in the child.
- SparkListener - Interface in org.apache.spark.scheduler
-
:: DeveloperApi ::
Interface for listening to events from the Spark scheduler.
- SparkListenerApplicationEnd - Class in org.apache.spark.scheduler
-
- SparkListenerApplicationEnd(long) - Constructor for class org.apache.spark.scheduler.SparkListenerApplicationEnd
-
- SparkListenerApplicationStart - Class in org.apache.spark.scheduler
-
- SparkListenerApplicationStart(String, Option<String>, long, String, Option<String>, Option<Map<String, String>>) - Constructor for class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- SparkListenerBlockManagerAdded - Class in org.apache.spark.scheduler
-
- SparkListenerBlockManagerAdded(long, BlockManagerId, long) - Constructor for class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
-
- SparkListenerBlockManagerRemoved - Class in org.apache.spark.scheduler
-
- SparkListenerBlockManagerRemoved(long, BlockManagerId) - Constructor for class org.apache.spark.scheduler.SparkListenerBlockManagerRemoved
-
- SparkListenerBlockUpdated - Class in org.apache.spark.scheduler
-
- SparkListenerBlockUpdated(BlockUpdatedInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerBlockUpdated
-
- SparkListenerEnvironmentUpdate - Class in org.apache.spark.scheduler
-
- SparkListenerEnvironmentUpdate(Map<String, Seq<Tuple2<String, String>>>) - Constructor for class org.apache.spark.scheduler.SparkListenerEnvironmentUpdate
-
- SparkListenerEvent - Interface in org.apache.spark.scheduler
-
- SparkListenerExecutorAdded - Class in org.apache.spark.scheduler
-
- SparkListenerExecutorAdded(long, String, ExecutorInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerExecutorAdded
-
- SparkListenerExecutorMetricsUpdate - Class in org.apache.spark.scheduler
-
Periodic updates from executors.
- SparkListenerExecutorMetricsUpdate(String, Seq<Tuple4<Object, Object, Object, TaskMetrics>>) - Constructor for class org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate
-
- SparkListenerExecutorRemoved - Class in org.apache.spark.scheduler
-
- SparkListenerExecutorRemoved(long, String, String) - Constructor for class org.apache.spark.scheduler.SparkListenerExecutorRemoved
-
- SparkListenerJobEnd - Class in org.apache.spark.scheduler
-
- SparkListenerJobEnd(int, long, JobResult) - Constructor for class org.apache.spark.scheduler.SparkListenerJobEnd
-
- SparkListenerJobStart - Class in org.apache.spark.scheduler
-
- SparkListenerJobStart(int, long, Seq<StageInfo>, Properties) - Constructor for class org.apache.spark.scheduler.SparkListenerJobStart
-
- SparkListenerStageCompleted - Class in org.apache.spark.scheduler
-
- SparkListenerStageCompleted(StageInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerStageCompleted
-
- SparkListenerStageSubmitted - Class in org.apache.spark.scheduler
-
- SparkListenerStageSubmitted(StageInfo, Properties) - Constructor for class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- SparkListenerTaskEnd - Class in org.apache.spark.scheduler
-
- SparkListenerTaskEnd(int, int, String, TaskEndReason, TaskInfo, TaskMetrics) - Constructor for class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- SparkListenerTaskGettingResult - Class in org.apache.spark.scheduler
-
- SparkListenerTaskGettingResult(TaskInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerTaskGettingResult
-
- SparkListenerTaskStart - Class in org.apache.spark.scheduler
-
- SparkListenerTaskStart(int, int, TaskInfo) - Constructor for class org.apache.spark.scheduler.SparkListenerTaskStart
-
- SparkListenerUnpersistRDD - Class in org.apache.spark.scheduler
-
- SparkListenerUnpersistRDD(int) - Constructor for class org.apache.spark.scheduler.SparkListenerUnpersistRDD
-
- sparkPartitionId() - Static method in class org.apache.spark.sql.functions
-
Partition ID of the Spark task.
- sparkPlan() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- sparkProperties() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- SparkShutdownHook - Class in org.apache.spark.util
-
- SparkShutdownHook(int, Function0<BoxedUnit>) - Constructor for class org.apache.spark.util.SparkShutdownHook
-
- SparkStageInfo - Interface in org.apache.spark
-
Exposes information about Spark Stages.
- SparkStageInfoImpl - Class in org.apache.spark
-
- SparkStageInfoImpl(int, int, long, String, int, int, int, int) - Constructor for class org.apache.spark.SparkStageInfoImpl
-
- SparkStatusTracker - Class in org.apache.spark
-
Low-level status reporting APIs for monitoring job and stage progress.
- sparkUser() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- sparkUser() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- sparkUser() - Method in class org.apache.spark.SparkContext
-
- sparkUser() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
-
- sparse(int, int, int[], int[], double[]) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Creates a column-major sparse matrix in Compressed Sparse Column (CSC) format.
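In CSC format the non-zero values are stored column by column: `colPtrs[j]` to `colPtrs[j + 1]` delimits the slice of `rowIndices` and `values` that belongs to column `j`. The sketch below looks up single entries from those three arrays for a small 3 x 3 example; the class and method names are hypothetical and it does not use the Matrices class.

```java
// Entry lookup in Compressed Sparse Column (CSC) arrays.
// Hypothetical helper for illustration; it does not use Spark's Matrices class.
final class CscSketch {
    // Returns the entry at (row, col), or 0.0 if it is not stored.
    static double get(int[] colPtrs, int[] rowIndices, double[] values, int row, int col) {
        // Scan only the slice of stored entries that belongs to this column.
        for (int k = colPtrs[col]; k < colPtrs[col + 1]; k++) {
            if (rowIndices[k] == row) {
                return values[k];
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        // Encodes the 3 x 3 matrix
        //   1.0 0.0 4.0
        //   0.0 3.0 5.0
        //   2.0 0.0 6.0
        int[] colPtrs = {0, 2, 3, 6};
        int[] rowIndices = {0, 2, 1, 0, 1, 2};
        double[] values = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
        for (int i = 0; i < 3; i++) {
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < 3; j++) {
                line.append(get(colPtrs, rowIndices, values, i, j)).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```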
- sparse(int, int[], double[]) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a sparse vector providing its index array and value array.
- sparse(int, Seq<Tuple2<Object, Object>>) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a sparse vector using unordered (index, value) pairs.
- sparse(int, Iterable<Tuple2<Integer, Double>>) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a sparse vector using unordered (index, value) pairs in a Java friendly way.
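The two array-free overloads accept (index, value) pairs in any order, which implies the pairs are arranged into an index array and a value array behind the scenes. The sketch below shows one way that conversion could look in plain Java; the class, its `pair` helper, and the rejection of duplicate indices are all assumptions of the sketch, not the behavior of Vectors.sparse.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Builds parallel index/value arrays from unordered pairs.
// Hypothetical class for illustration; it is not Spark's SparseVector.
final class SparseVectorSketch {
    final int size;
    final int[] indices;
    final double[] values;

    private SparseVectorSketch(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    // Convenience factory for one (index, value) pair.
    static Map.Entry<Integer, Double> pair(int index, double value) {
        return new AbstractMap.SimpleEntry<>(index, value);
    }

    static SparseVectorSketch fromPairs(int size, List<Map.Entry<Integer, Double>> pairs) {
        // A TreeMap orders the pairs by index as they are inserted.
        TreeMap<Integer, Double> sorted = new TreeMap<>();
        for (Map.Entry<Integer, Double> p : pairs) {
            int index = p.getKey();
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index + " outside [0, " + size + ")");
            }
            if (sorted.put(index, p.getValue()) != null) {
                throw new IllegalArgumentException("duplicate index " + index);
            }
        }
        int[] idx = new int[sorted.size()];
        double[] vals = new double[sorted.size()];
        int k = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            idx[k] = e.getKey();
            vals[k] = e.getValue();
            k++;
        }
        return new SparseVectorSketch(size, idx, vals);
    }

    // Expands the sparse form into a dense array with explicit zeros.
    double[] toDense() {
        double[] dense = new double[size];
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] = values[k];
        }
        return dense;
    }

    public static void main(String[] args) {
        SparseVectorSketch v = fromPairs(5, Arrays.asList(pair(3, 2.0), pair(1, 1.0)));
        System.out.println(Arrays.toString(v.indices)); // [1, 3]
        System.out.println(Arrays.toString(v.values));  // [1.0, 2.0]
        System.out.println(Arrays.toString(v.toDense()));
    }
}
```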
- SparseMatrix - Class in org.apache.spark.mllib.linalg
-
Column-major sparse matrix.
- SparseMatrix(int, int, int[], int[], double[], boolean) - Constructor for class org.apache.spark.mllib.linalg.SparseMatrix
-
- SparseMatrix(int, int, int[], int[], double[]) - Constructor for class org.apache.spark.mllib.linalg.SparseMatrix
-
Column-major sparse matrix.
- SparseVector - Class in org.apache.spark.mllib.linalg
-
A sparse vector represented by an index array and a value array.
- SparseVector(int, int[], double[]) - Constructor for class org.apache.spark.mllib.linalg.SparseVector
-
- sparsity() - Method in class org.apache.spark.ml.attribute.NumericAttribute
-
- spdiag(Vector) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate a diagonal matrix in SparseMatrix format from the supplied values.
- SpecialLengths - Class in org.apache.spark.api.r
-
- SpecialLengths() - Constructor for class org.apache.spark.api.r.SpecialLengths
-
- speculative() - Method in class org.apache.spark.scheduler.TaskInfo
-
- speculative() - Method in class org.apache.spark.status.api.v1.TaskData
-
- speye(int) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a sparse Identity Matrix in Matrix format.
- speye(int) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate an Identity Matrix in SparseMatrix format.
- split() - Method in class org.apache.spark.ml.tree.InternalNode
-
- Split - Interface in org.apache.spark.ml.tree
-
:: DeveloperApi ::
Interface for a "Split," which specifies a test made at a decision tree node
to choose the left or right path.
- split() - Method in class org.apache.spark.mllib.tree.model.Node
-
- Split - Class in org.apache.spark.mllib.tree.model
-
:: DeveloperApi ::
Split applied to a feature
param: feature feature index
param: threshold Threshold for continuous feature.
- Split(int, double, Enumeration.Value, List<Object>) - Constructor for class org.apache.spark.mllib.tree.model.Split
-
- split(Column, String) - Static method in class org.apache.spark.sql.functions
-
Splits str around pattern (pattern is a regular expression).
- SPLIT_INFO_REFLECTIONS() - Static method in class org.apache.spark.rdd.HadoopRDD
-
- splitIndex() - Method in class org.apache.spark.storage.RDDBlockId
-
- SplitInfo - Class in org.apache.spark.scheduler
-
- SplitInfo(Class<?>, String, String, long, Object) - Constructor for class org.apache.spark.scheduler.SplitInfo
-
- splits() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- splits() - Method in class org.apache.spark.ml.feature.Bucketizer
-
Parameter for mapping continuous features into buckets.
- sprand(int, int, double, Random) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a SparseMatrix
consisting of i.i.d.
uniform random numbers.
- sprand(int, int, double, Random) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate a SparseMatrix
consisting of i.i.d.
uniform random numbers.
- sprandn(int, int, double, Random) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a SparseMatrix
consisting of i.i.d.
gaussian random numbers.
- sprandn(int, int, double, Random) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate a SparseMatrix
consisting of i.i.d.
gaussian random numbers.
- sqdist(Vector, Vector) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Returns the squared distance between two Vectors.
- sql(String) - Method in class org.apache.spark.sql.SQLContext
-
- sqlContext() - Method in class org.apache.spark.sql.DataFrame
-
- sqlContext() - Method in class org.apache.spark.sql.sources.BaseRelation
-
- SQLContext - Class in org.apache.spark.sql
-
The entry point for working with structured data (rows and columns) in Spark.
- SQLContext(SparkContext) - Constructor for class org.apache.spark.sql.SQLContext
-
- SQLContext(JavaSparkContext) - Constructor for class org.apache.spark.sql.SQLContext
-
- sqlContext() - Method in class org.apache.spark.sql.SQLContext.SparkPlanner
-
- SQLContext.implicits$ - Class in org.apache.spark.sql
-
:: Experimental ::
(Scala-specific) Implicit methods available in Scala for converting
common Scala objects into
DataFrames.
- SQLContext.implicits$() - Constructor for class org.apache.spark.sql.SQLContext.implicits$
-
- SQLContext.implicits$.StringToColumn - Class in org.apache.spark.sql
-
Converts $"col name" into a
Column.
- SQLContext.implicits$.StringToColumn(StringContext) - Constructor for class org.apache.spark.sql.SQLContext.implicits$.StringToColumn
-
- SQLContext.QueryExecution - Class in org.apache.spark.sql
-
- SQLContext.QueryExecution(LogicalPlan) - Constructor for class org.apache.spark.sql.SQLContext.QueryExecution
-
- SQLContext.SparkPlanner - Class in org.apache.spark.sql
-
- SQLContext.SparkPlanner() - Constructor for class org.apache.spark.sql.SQLContext.SparkPlanner
-
- SQLContext.SQLSession - Class in org.apache.spark.sql
-
- SQLContext.SQLSession() - Constructor for class org.apache.spark.sql.SQLContext.SQLSession
-
- sqlParser() - Method in class org.apache.spark.sql.SQLContext
-
- sqlType() - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- sqlType() - Method in class org.apache.spark.sql.types.UserDefinedType
-
Underlying storage type for this UDT
- SQLUserDefinedType - Annotation Type in org.apache.spark.sql.types
-
:: DeveloperApi ::
A user-defined type which can be automatically recognized by a SQLContext and registered.
- sqrt(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the square root of the specified float value.
- sqrt(String) - Static method in class org.apache.spark.sql.functions
-
Computes the square root of the specified float value.
- squaredDist(Vector) - Method in class org.apache.spark.util.Vector
-
- SquaredError - Class in org.apache.spark.mllib.tree.loss
-
:: DeveloperApi ::
Class for squared error loss calculation.
- SquaredError() - Constructor for class org.apache.spark.mllib.tree.loss.SquaredError
-
- SquaredL2Updater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Updater for L2 regularized problems.
- SquaredL2Updater() - Constructor for class org.apache.spark.mllib.optimization.SquaredL2Updater
-
- Src - Static variable in class org.apache.spark.graphx.TripletFields
-
Expose the source and edge fields but not the destination field.
- srcAttr() - Method in class org.apache.spark.graphx.EdgeContext
-
The vertex attribute of the edge's source vertex.
- srcAttr() - Method in class org.apache.spark.graphx.EdgeTriplet
-
The source vertex attribute
- srcAttr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- srcId() - Method in class org.apache.spark.graphx.Edge
-
- srcId() - Method in class org.apache.spark.graphx.EdgeContext
-
The vertex id of the edge's source vertex.
- srcId() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- srdd() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- ssc() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- ssc() - Method in class org.apache.spark.streaming.dstream.DStream
-
- stackTrace() - Method in class org.apache.spark.ExceptionFailure
-
- stage() - Method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
-
- stageAttemptId() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- stageAttemptId() - Method in class org.apache.spark.scheduler.SparkListenerTaskStart
-
- StageData - Class in org.apache.spark.status.api.v1
-
- stageFailed(String) - Method in class org.apache.spark.scheduler.StageInfo
-
- stageId() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- stageId() - Method in class org.apache.spark.scheduler.SparkListenerTaskStart
-
- stageId() - Method in class org.apache.spark.scheduler.StageInfo
-
- stageId() - Method in interface org.apache.spark.SparkStageInfo
-
- stageId() - Method in class org.apache.spark.SparkStageInfoImpl
-
- stageId() - Method in class org.apache.spark.status.api.v1.StageData
-
- stageId() - Method in class org.apache.spark.TaskContext
-
The ID of the stage that this task belongs to.
- stageIds() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- stageIds() - Method in interface org.apache.spark.SparkJobInfo
-
- stageIds() - Method in class org.apache.spark.SparkJobInfoImpl
-
- stageIds() - Method in class org.apache.spark.status.api.v1.JobData
-
- stageIdToActiveJobIds() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToData() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageIdToInfo() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stageInfo() - Method in class org.apache.spark.scheduler.SparkListenerStageCompleted
-
- stageInfo() - Method in class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- StageInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Stores information about a stage to pass from the scheduler to SparkListeners.
- StageInfo(int, int, String, int, Seq<RDDInfo>, Seq<Object>, String, Seq<Seq<TaskLocation>>) - Constructor for class org.apache.spark.scheduler.StageInfo
-
- stageInfos() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- stageLogInfo(int, String, boolean) - Method in class org.apache.spark.scheduler.JobLogger
-
Write info into log file
- stages() - Method in class org.apache.spark.ml.Pipeline
-
param for pipeline stages
- stages() - Method in class org.apache.spark.ml.PipelineModel
-
- StageStatus - Enum in org.apache.spark.status.api.v1
-
- StandardNormalGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d. samples from the standard normal distribution.
- StandardNormalGenerator() - Constructor for class org.apache.spark.mllib.random.StandardNormalGenerator
-
- StandardScaler - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Standardizes features by removing the mean and scaling to unit variance using column summary
statistics on the samples in the training set.
- StandardScaler(String) - Constructor for class org.apache.spark.ml.feature.StandardScaler
-
- StandardScaler() - Constructor for class org.apache.spark.ml.feature.StandardScaler
-
- StandardScaler - Class in org.apache.spark.mllib.feature
-
:: Experimental ::
Standardizes features by removing the mean and scaling to unit std using column summary
statistics on the samples in the training set.
- StandardScaler(boolean, boolean) - Constructor for class org.apache.spark.mllib.feature.StandardScaler
-
- StandardScaler() - Constructor for class org.apache.spark.mllib.feature.StandardScaler
-
- StandardScalerModel - Class in org.apache.spark.ml.feature
-
- StandardScalerModel - Class in org.apache.spark.mllib.feature
-
:: Experimental ::
Represents a StandardScaler model that can transform vectors.
- StandardScalerModel(Vector, Vector, boolean, boolean) - Constructor for class org.apache.spark.mllib.feature.StandardScalerModel
-
- StandardScalerModel(Vector, Vector) - Constructor for class org.apache.spark.mllib.feature.StandardScalerModel
-
- StandardScalerModel(Vector) - Constructor for class org.apache.spark.mllib.feature.StandardScalerModel
-
- starGraph(SparkContext, int) - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
Create a star graph with vertex 0 being the center.
- start() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Start the execution of the streams.
- start() - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- start() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
Method called to start receiving data.
- start() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- start() - Method in class org.apache.spark.streaming.StreamingContext
-
Start the execution of the streams.
- startIndexInLevel(int) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Return the index of the first node in the given level.
- startPosition() - Method in exception org.apache.spark.sql.AnalysisException
-
- startsWith(Column) - Method in class org.apache.spark.sql.Column
-
String starts with.
- startsWith(String) - Method in class org.apache.spark.sql.Column
-
String starts with another string literal.
- startTime() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- startTime() - Method in class org.apache.spark.SparkContext
-
- startTime() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
-
- startTime() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- stat() - Method in class org.apache.spark.sql.DataFrame
-
- StatCounter - Class in org.apache.spark.util
-
A class for tracking the statistics of a set of numbers (count, mean and variance) in a
numerically robust way.
- StatCounter(TraversableOnce<Object>) - Constructor for class org.apache.spark.util.StatCounter
-
- StatCounter() - Constructor for class org.apache.spark.util.StatCounter
-
Initialize the StatCounter with no values.
- state() - Method in class org.apache.spark.scheduler.local.StatusUpdate
-
- staticPageRank(int, double) - Method in class org.apache.spark.graphx.GraphOps
-
Run PageRank for a fixed number of iterations, returning a graph with vertex attributes
containing the PageRank and edge attributes containing the normalized edge weight.
- staticPersonalizedPageRank(long, int, double) - Method in class org.apache.spark.graphx.GraphOps
-
Run Personalized PageRank for a fixed number of iterations, with
all iterations originating at the source node,
returning a graph with vertex attributes
containing the PageRank and edge attributes containing the normalized edge weight.
- statistic() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- statistic() - Method in class org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult
-
- statistic() - Method in interface org.apache.spark.mllib.stat.test.TestResult
-
Test statistic.
- Statistics - Class in org.apache.spark.mllib.stat
-
- Statistics() - Constructor for class org.apache.spark.mllib.stat.Statistics
-
- Statistics - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
Statistics for querying the supervisor about state of workers.
- Statistics(int, int, int, String) - Constructor for class org.apache.spark.streaming.receiver.Statistics
-
- stats() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a
StatCounter
object that captures the mean, variance and
count of the RDD's elements in one operation.
- stats() - Method in class org.apache.spark.mllib.tree.model.Node
-
- stats() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Return a
StatCounter
object that captures the mean, variance and
count of the RDD's elements in one operation.
- StatsReportListener - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Simple SparkListener that logs a few summary statistics when each stage completes
- StatsReportListener() - Constructor for class org.apache.spark.scheduler.StatsReportListener
-
- StatsReportListener - Class in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
A simple StreamingListener that logs summary statistics across Spark Streaming batches
param: numBatchInfos Number of last batches to consider for generating statistics (default: 10)
- StatsReportListener(int) - Constructor for class org.apache.spark.streaming.scheduler.StatsReportListener
-
- status() - Method in class org.apache.spark.scheduler.TaskInfo
-
- status() - Method in interface org.apache.spark.SparkJobInfo
-
- status() - Method in class org.apache.spark.SparkJobInfoImpl
-
- status() - Method in class org.apache.spark.status.api.v1.JobData
-
- status() - Method in class org.apache.spark.status.api.v1.StageData
-
- statusTracker() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- statusTracker() - Method in class org.apache.spark.SparkContext
-
- StatusUpdate - Class in org.apache.spark.scheduler.local
-
- StatusUpdate(long, Enumeration.Value, ByteBuffer) - Constructor for class org.apache.spark.scheduler.local.StatusUpdate
-
- std() - Method in class org.apache.spark.ml.attribute.NumericAttribute
-
- std() - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- std() - Method in class org.apache.spark.mllib.feature.StandardScalerModel
-
- std() - Method in class org.apache.spark.mllib.random.LogNormalGenerator
-
- stdev() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the standard deviation of this RDD's elements.
- stdev() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the standard deviation of this RDD's elements.
- stdev() - Method in class org.apache.spark.util.StatCounter
-
Return the standard deviation of the values.
- stop() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Shut down the SparkContext.
- stop() - Method in interface org.apache.spark.broadcast.BroadcastFactory
-
- stop() - Method in class org.apache.spark.broadcast.HttpBroadcastFactory
-
- stop() - Method in class org.apache.spark.broadcast.TorrentBroadcastFactory
-
- stop() - Method in class org.apache.spark.SparkContext
-
- stop() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Stop the execution of the streams.
- stop(boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Stop the execution of the streams.
- stop(boolean, boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Stop the execution of the streams.
- stop() - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- stop() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
Method called to stop receiving data.
- stop() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- stop(String) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Stop the receiver completely.
- stop(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Stop the receiver completely due to an exception
- stop(boolean) - Method in class org.apache.spark.streaming.StreamingContext
-
Stop the execution of the streams immediately (does not wait for all received data
to be processed).
- stop(boolean, boolean) - Method in class org.apache.spark.streaming.StreamingContext
-
Stop the execution of the streams, with option of ensuring all received data
has been processed.
- StopCoordinator - Class in org.apache.spark.scheduler
-
- StopCoordinator() - Constructor for class org.apache.spark.scheduler.StopCoordinator
-
- StopExecutor - Class in org.apache.spark.scheduler.local
-
- StopExecutor() - Constructor for class org.apache.spark.scheduler.local.StopExecutor
-
- stopped() - Method in class org.apache.spark.SparkContext
-
- StopWords - Class in org.apache.spark.ml.feature
-
stop words list
- StopWords() - Constructor for class org.apache.spark.ml.feature.StopWords
-
- stopWords() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
the stop words set to be filtered out
Default: StopWords.English
- StopWordsRemover - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A feature transformer that filters out stop words from input.
- StopWordsRemover(String) - Constructor for class org.apache.spark.ml.feature.StopWordsRemover
-
- StopWordsRemover() - Constructor for class org.apache.spark.ml.feature.StopWordsRemover
-
- storageLevel() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
-
- storageLevel() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- storageLevel() - Method in class org.apache.spark.storage.BlockStatus
-
- storageLevel() - Method in class org.apache.spark.storage.BlockUpdatedInfo
-
- storageLevel() - Method in class org.apache.spark.storage.RDDInfo
-
- StorageLevel - Class in org.apache.spark.storage
-
:: DeveloperApi ::
Flags for controlling the storage of an RDD.
- StorageLevel() - Constructor for class org.apache.spark.storage.StorageLevel
-
- storageLevel() - Method in class org.apache.spark.streaming.dstream.DStream
-
- storageLevel() - Method in class org.apache.spark.streaming.receiver.Receiver
-
- storageLevelCache() - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- StorageLevels - Class in org.apache.spark.api.java
-
Expose some commonly useful storage level constants.
- StorageLevels() - Constructor for class org.apache.spark.api.java.StorageLevels
-
- StorageListener - Class in org.apache.spark.ui.storage
-
:: DeveloperApi ::
A SparkListener that prepares information to be displayed on the BlockManagerUI.
- StorageListener(StorageStatusListener) - Constructor for class org.apache.spark.ui.storage.StorageListener
-
- StorageStatus - Class in org.apache.spark.storage
-
:: DeveloperApi ::
Storage information for each BlockManager.
- StorageStatus(BlockManagerId, long) - Constructor for class org.apache.spark.storage.StorageStatus
-
- StorageStatus(BlockManagerId, long, Map<BlockId, BlockStatus>) - Constructor for class org.apache.spark.storage.StorageStatus
-
Create a storage status with an initial set of blocks, leaving the source unmodified.
- storageStatusList() - Method in class org.apache.spark.storage.StorageStatusListener
-
- storageStatusList() - Method in class org.apache.spark.ui.exec.ExecutorsListener
-
- storageStatusList() - Method in class org.apache.spark.ui.storage.StorageListener
-
- StorageStatusListener - Class in org.apache.spark.storage
-
:: DeveloperApi ::
A SparkListener that maintains executor storage status.
- StorageStatusListener() - Constructor for class org.apache.spark.storage.StorageStatusListener
-
- store(Iterator<T>) - Method in interface org.apache.spark.streaming.receiver.ActorHelper
-
Store an iterator of received data as a data block into Spark's memory.
- store(ByteBuffer) - Method in interface org.apache.spark.streaming.receiver.ActorHelper
-
Store the bytes of received data as a data block into Spark's memory.
- store(T) - Method in interface org.apache.spark.streaming.receiver.ActorHelper
-
Store a single item of received data to Spark's memory.
- store(T) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store a single item of received data to Spark's memory.
- store(ArrayBuffer<T>) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an ArrayBuffer of received data as a data block into Spark's memory.
- store(ArrayBuffer<T>, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an ArrayBuffer of received data as a data block into Spark's memory.
- store(Iterator<T>) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(Iterator<T>, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(Iterator<T>) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(Iterator<T>, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store an iterator of received data as a data block into Spark's memory.
- store(ByteBuffer) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store the bytes of received data as a data block into Spark's memory.
- store(ByteBuffer, Object) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Store the bytes of received data as a data block into Spark's memory.
- strategies() - Method in class org.apache.spark.sql.SQLContext.SparkPlanner
-
- Strategy - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Stores all the configuration options for tree construction
param: algo Learning goal.
- Strategy(Enumeration.Value, Impurity, int, int, int, Enumeration.Value, Map<Object, Object>, int, double, int, double, boolean, int) - Constructor for class org.apache.spark.mllib.tree.configuration.Strategy
-
- Strategy(Enumeration.Value, Impurity, int, int, int, Map<Integer, Integer>) - Constructor for class org.apache.spark.mllib.tree.configuration.Strategy
-
- STREAM() - Static method in class org.apache.spark.storage.BlockId
-
- StreamBlockId - Class in org.apache.spark.storage
-
- StreamBlockId(int, long) - Constructor for class org.apache.spark.storage.StreamBlockId
-
- streamId() - Method in class org.apache.spark.storage.StreamBlockId
-
- streamId() - Method in class org.apache.spark.streaming.receiver.Receiver
-
Get the unique identifier of the receiver input stream that this
receiver is associated with.
- streamId() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- streamIdToInputInfo() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- streamIdToNumRecords() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- StreamingContext - Class in org.apache.spark.streaming
-
Main entry point for Spark Streaming functionality.
- StreamingContext(SparkContext, Duration) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Create a StreamingContext using an existing SparkContext.
- StreamingContext(SparkConf, Duration) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Create a StreamingContext by providing the configuration necessary for a new SparkContext.
- StreamingContext(String, String, Duration, String, Seq<String>, Map<String, String>) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Create a StreamingContext by providing the details necessary for creating a new SparkContext.
- StreamingContext(String, Configuration) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Recreate a StreamingContext from a checkpoint file.
- StreamingContext(String) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Recreate a StreamingContext from a checkpoint file.
- StreamingContext(String, SparkContext) - Constructor for class org.apache.spark.streaming.StreamingContext
-
Recreate a StreamingContext from a checkpoint file using an existing SparkContext.
- StreamingContextState - Enum in org.apache.spark.streaming
-
:: DeveloperApi ::
Represents the state of a StreamingContext.
- StreamingKMeans - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- StreamingKMeans(int, double, String) - Constructor for class org.apache.spark.mllib.clustering.StreamingKMeans
-
- StreamingKMeans() - Constructor for class org.apache.spark.mllib.clustering.StreamingKMeans
-
- StreamingKMeansModel - Class in org.apache.spark.mllib.clustering
-
:: Experimental ::
- StreamingKMeansModel(Vector[], double[]) - Constructor for class org.apache.spark.mllib.clustering.StreamingKMeansModel
-
- StreamingLinearAlgorithm<M extends GeneralizedLinearModel,A extends GeneralizedLinearAlgorithm<M>> - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
StreamingLinearAlgorithm implements methods for continuously
training a generalized linear model on streaming data,
and using it for prediction on (possibly different) streaming data.
- StreamingLinearAlgorithm() - Constructor for class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
- StreamingLinearRegressionWithSGD - Class in org.apache.spark.mllib.regression
-
:: Experimental ::
Train or predict a linear regression model on streaming data.
- StreamingLinearRegressionWithSGD() - Constructor for class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Construct a StreamingLinearRegression object with default parameters:
{stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0}.
- StreamingListener - Interface in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
A listener interface for receiving information about an ongoing streaming
computation.
- StreamingListenerBatchCompleted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerBatchCompleted(BatchInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted
-
- StreamingListenerBatchStarted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerBatchStarted(BatchInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerBatchStarted
-
- StreamingListenerBatchSubmitted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerBatchSubmitted(BatchInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerBatchSubmitted
-
- StreamingListenerEvent - Interface in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
Base trait for events related to StreamingListener
- StreamingListenerReceiverError - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerReceiverError(ReceiverInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerReceiverError
-
- StreamingListenerReceiverStarted - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerReceiverStarted(ReceiverInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted
-
- StreamingListenerReceiverStopped - Class in org.apache.spark.streaming.scheduler
-
- StreamingListenerReceiverStopped(ReceiverInfo) - Constructor for class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped
-
- StreamingLogisticRegressionWithSGD - Class in org.apache.spark.mllib.classification
-
:: Experimental ::
Train or predict a logistic regression model on streaming data.
- StreamingLogisticRegressionWithSGD() - Constructor for class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
Construct a StreamingLogisticRegression object with default parameters:
{stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0, regParam: 0.0}.
- StreamInputInfo - Class in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
Track the information of input stream at specified batch time.
- StreamInputInfo(int, long, Map<String, Object>) - Constructor for class org.apache.spark.streaming.scheduler.StreamInputInfo
-
- string() - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField
of type string.
- StringArrayParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Array[String]] for Java.
- StringArrayParam(Params, String, String, Function1<String[], Object>) - Constructor for class org.apache.spark.ml.param.StringArrayParam
-
- StringArrayParam(Params, String, String) - Constructor for class org.apache.spark.ml.param.StringArrayParam
-
- StringContains - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true
iff the attribute evaluates to
a string that contains the string value
.
- StringContains(String, String) - Constructor for class org.apache.spark.sql.sources.StringContains
-
- StringEndsWith - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true
iff the attribute evaluates to
a string that ends with value
.
- StringEndsWith(String, String) - Constructor for class org.apache.spark.sql.sources.StringEndsWith
-
- StringIndexer - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A label indexer that maps a string column of labels to an ML column of label indices.
- StringIndexer(String) - Constructor for class org.apache.spark.ml.feature.StringIndexer
-
- StringIndexer() - Constructor for class org.apache.spark.ml.feature.StringIndexer
-
- StringIndexerModel - Class in org.apache.spark.ml.feature
-
- StringIndexerModel(String, String[]) - Constructor for class org.apache.spark.ml.feature.StringIndexerModel
-
- StringIndexerModel(String[]) - Constructor for class org.apache.spark.ml.feature.StringIndexerModel
-
- stringOrError(Function0<A>) - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- stringResult() - Method in class org.apache.spark.sql.hive.HiveContext.QueryExecution
-
Returns the result as a hive compatible sequence of strings.
- StringRRDD<T> - Class in org.apache.spark.api.r
-
An RDD that stores R objects as Array[String].
- StringRRDD(RDD<T>, byte[], String, byte[], Object[], ClassTag<T>) - Constructor for class org.apache.spark.api.r.StringRRDD
-
- StringStartsWith - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true
iff the attribute evaluates to
a string that starts with value
.
- StringStartsWith(String, String) - Constructor for class org.apache.spark.sql.sources.StringStartsWith
-
- stringToText(String) - Static method in class org.apache.spark.SparkContext
-
- StringType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the StringType object.
- StringType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing String
values.
- stringWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- stronglyConnectedComponents(int) - Method in class org.apache.spark.graphx.GraphOps
-
Compute the strongly connected component (SCC) of each vertex and return a graph with the
vertex value containing the lowest vertex id in the SCC containing that vertex.
- StronglyConnectedComponents - Class in org.apache.spark.graphx.lib
-
Strongly connected components algorithm implementation.
- StronglyConnectedComponents() - Constructor for class org.apache.spark.graphx.lib.StronglyConnectedComponents
-
- struct(Seq<StructField>) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField
of type struct.
- struct(StructType) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField
of type struct.
- struct(Column...) - Static method in class org.apache.spark.sql.functions
-
Creates a new struct column.
- struct(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Creates a new struct column.
- struct(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Creates a new struct column that composes multiple input columns.
- StructField - Class in org.apache.spark.sql.types
-
A field inside a StructType.
- StructField(String, DataType, boolean, Metadata) - Constructor for class org.apache.spark.sql.types.StructField
-
- StructField() - Constructor for class org.apache.spark.sql.types.StructField
-
No-arg constructor for kryo.
- StructType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
A `StructType` object can be constructed by
- StructType(StructField[]) - Constructor for class org.apache.spark.sql.types.StructType
-
- StructType() - Constructor for class org.apache.spark.sql.types.StructType
-
No-arg constructor for kryo.
- subgraph(Function1<EdgeTriplet<VD, ED>, Object>, Function2<Object, VD, Object>) - Method in class org.apache.spark.graphx.Graph
-
Restricts the graph to only the vertices and edges satisfying the predicates.
- subgraph(Function1<EdgeTriplet<VD, ED>, Object>, Function2<Object, VD, Object>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- submissionTime() - Method in class org.apache.spark.scheduler.StageInfo
-
When this stage was submitted from the DAGScheduler to a TaskScheduler.
- submissionTime() - Method in interface org.apache.spark.SparkStageInfo
-
- submissionTime() - Method in class org.apache.spark.SparkStageInfoImpl
-
- submissionTime() - Method in class org.apache.spark.status.api.v1.JobData
-
- submissionTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- submitJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, Function2<Object, U, BoxedUnit>, Function0<R>) - Method in class org.apache.spark.SparkContext
-
:: Experimental ::
Submit a job for execution and return a FutureJob holding the result.
- subsamplingRate() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- subsetAccuracy() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns subset accuracy (for equal sets of labels).
- substitutor() - Method in class org.apache.spark.sql.hive.HiveContext
-
- substr(Column, Column) - Method in class org.apache.spark.sql.Column
-
An expression that returns a substring.
- substr(int, int) - Method in class org.apache.spark.sql.Column
-
An expression that returns a substring.
- substring(Column, int, int) - Static method in class org.apache.spark.sql.functions
-
Returns the substring that starts at `pos` and is of length `len` when `str` is String type, or the slice of the byte array that starts at `pos` (in bytes) and is of length `len` when `str` is Binary type.
- substring_index(Column, String, int) - Static method in class org.apache.spark.sql.functions
-
Returns the substring from string str before count occurrences of the delimiter delim.
- subtract(JavaDoubleRDD) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaDoubleRDD, int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaDoubleRDD, Partitioner) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaPairRDD<K, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaPairRDD<K, V>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaPairRDD<K, V>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaRDD<T>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaRDD<T>, int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(JavaRDD<T>, Partitioner) - Method in class org.apache.spark.api.java.JavaRDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(RDD<T>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(RDD<T>, int) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(RDD<T>, Partitioner, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD with the elements from `this` that are not in `other`.
- subtract(Vector) - Method in class org.apache.spark.util.Vector
-
- subtractByKey(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(RDD<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(RDD<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- subtractByKey(RDD<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return an RDD with the pairs from `this` whose keys are not in `other`.
- succeededTasks() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- Success - Class in org.apache.spark
-
:: DeveloperApi ::
Task succeeded.
- Success() - Constructor for class org.apache.spark.Success
-
- successful() - Method in class org.apache.spark.scheduler.TaskInfo
-
- sum() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Add up the elements in this RDD.
- sum() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Add up the elements in this RDD.
- sum(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the sum of all values in the expression.
- sum(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the sum of all values in the given column.
- sum(String...) - Method in class org.apache.spark.sql.GroupedData
-
Compute the sum of each numeric column for each group.
- sum(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
-
Compute the sum of each numeric column for each group.
- sum() - Method in class org.apache.spark.util.StatCounter
-
- sum() - Method in class org.apache.spark.util.Vector
-
- sumApprox(long, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the sum within a timeout.
- sumApprox(long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the sum within a timeout.
- sumApprox(long, double) - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
:: Experimental ::
Approximate operation to return the sum within a timeout.
- sumDistinct(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the sum of distinct values in the expression.
- sumDistinct(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the sum of distinct values in the expression.
- summary() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
Gets summary of model on training set.
- summary() - Method in class org.apache.spark.ml.regression.LinearRegressionModel
-
Gets summary (e.g.
- supportedFeatureSubsetStrategies() - Static method in class org.apache.spark.ml.classification.RandomForestClassifier
-
Accessor for supported featureSubsetStrategy settings: auto, all, onethird, sqrt, log2
- supportedFeatureSubsetStrategies() - Static method in class org.apache.spark.ml.regression.RandomForestRegressor
-
Accessor for supported featureSubsetStrategy settings: auto, all, onethird, sqrt, log2
- supportedFeatureSubsetStrategies() - Static method in class org.apache.spark.mllib.tree.RandomForest
-
List of supported feature subset sampling strategies.
- supportedImpurities() - Static method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
Accessor for supported impurities: entropy, gini
- supportedImpurities() - Static method in class org.apache.spark.ml.classification.RandomForestClassifier
-
Accessor for supported impurity settings: entropy, gini
- supportedImpurities() - Static method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
Accessor for supported impurities: variance
- supportedImpurities() - Static method in class org.apache.spark.ml.regression.RandomForestRegressor
-
Accessor for supported impurity settings: variance
- supportedLossTypes() - Static method in class org.apache.spark.ml.classification.GBTClassifier
-
Accessor for supported loss settings: logistic
- supportedLossTypes() - Static method in class org.apache.spark.ml.regression.GBTRegressor
-
Accessor for supported loss settings: squared (L2), absolute (L1)
- supportedModelTypes() - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
- supportsRelocationOfSerializedObjects() - Method in class org.apache.spark.serializer.KryoSerializer
-
- SVDPlusPlus - Class in org.apache.spark.graphx.lib
-
Implementation of SVD++ algorithm.
- SVDPlusPlus() - Constructor for class org.apache.spark.graphx.lib.SVDPlusPlus
-
- SVDPlusPlus.Conf - Class in org.apache.spark.graphx.lib
-
Configuration parameters for SVDPlusPlus.
- SVDPlusPlus.Conf(int, int, double, double, double, double, double, double) - Constructor for class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- SVMDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate sample data used for SVM.
- SVMDataGenerator() - Constructor for class org.apache.spark.mllib.util.SVMDataGenerator
-
- SVMModel - Class in org.apache.spark.mllib.classification
-
Model for Support Vector Machines (SVMs).
- SVMModel(Vector, double) - Constructor for class org.apache.spark.mllib.classification.SVMModel
-
- SVMWithSGD - Class in org.apache.spark.mllib.classification
-
Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
- SVMWithSGD() - Constructor for class org.apache.spark.mllib.classification.SVMWithSGD
-
Construct an SVM object with default parameters: {stepSize: 1.0, numIterations: 100,
regParam: 0.01, miniBatchFraction: 1.0}.
- SYSTEM_DEFAULT() - Static method in class org.apache.spark.sql.types.DecimalType
-
- systemProperties() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- t() - Method in class org.apache.spark.SerializableWritable
-
- table(String) - Method in class org.apache.spark.sql.DataFrameReader
-
- table(String) - Method in class org.apache.spark.sql.SQLContext
-
- tableNames() - Method in class org.apache.spark.sql.SQLContext
-
- tableNames(String) - Method in class org.apache.spark.sql.SQLContext
-
- tables() - Method in class org.apache.spark.sql.SQLContext
-
- tables(String) - Method in class org.apache.spark.sql.SQLContext
-
- TableScan - Interface in org.apache.spark.sql.sources
-
:: DeveloperApi ::
A BaseRelation that can produce all of its tuples as an RDD of Row objects.
- tachyonFolderName() - Method in class org.apache.spark.SparkContext
-
- tag() - Method in class org.apache.spark.sql.types.BinaryType
-
- tag() - Method in class org.apache.spark.sql.types.BooleanType
-
- tag() - Method in class org.apache.spark.sql.types.ByteType
-
- tag() - Method in class org.apache.spark.sql.types.DateType
-
- tag() - Method in class org.apache.spark.sql.types.DecimalType
-
- tag() - Method in class org.apache.spark.sql.types.DoubleType
-
- tag() - Method in class org.apache.spark.sql.types.FloatType
-
- tag() - Method in class org.apache.spark.sql.types.IntegerType
-
- tag() - Method in class org.apache.spark.sql.types.LongType
-
- tag() - Method in class org.apache.spark.sql.types.ShortType
-
- tag() - Method in class org.apache.spark.sql.types.StringType
-
- tag() - Method in class org.apache.spark.sql.types.TimestampType
-
- take(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Take the first num elements of the RDD.
- take(int) - Method in class org.apache.spark.rdd.RDD
-
Take the first num elements of the RDD.
- take(int) - Method in class org.apache.spark.sql.DataFrame
-
- takeAsync(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of the `take` action, which returns a future for retrieving the first `num` elements of this RDD.
- takeAsync(int) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for retrieving the first num elements of the RDD.
- takeOrdered(int, Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the first k (smallest) elements from this RDD as defined by
the specified Comparator[T] and maintains the order.
- takeOrdered(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the first k (smallest) elements from this RDD using the
natural ordering for T while maintaining the order.
- takeOrdered(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the first k (smallest) elements from this RDD as defined by the specified
implicit Ordering[T] and maintains the ordering.
- takeSample(boolean, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- takeSample(boolean, int, long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- takeSample(boolean, int, long) - Method in class org.apache.spark.rdd.RDD
-
Return a fixed-size sampled subset of this RDD in an array
- tallSkinnyQR(boolean) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- tan(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the tangent of the given value.
- tan(String) - Static method in class org.apache.spark.sql.functions
-
Computes the tangent of the given column.
- tanh(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic tangent of the given value.
- tanh(String) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic tangent of the given column.
- targetStorageLevel() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- targetStorageLevel() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- task() - Method in class org.apache.spark.CleanupTaskWeakReference
-
- taskAttemptId() - Method in class org.apache.spark.TaskContext
-
An ID that is unique to this task attempt (within the same SparkContext, no two task attempts
will share the same attempt ID).
- TaskCommitDenied - Class in org.apache.spark
-
:: DeveloperApi ::
Task requested the driver to commit, but was denied.
- TaskCommitDenied(int, int, int) - Constructor for class org.apache.spark.TaskCommitDenied
-
- TaskCompletionListener - Interface in org.apache.spark.util
-
:: DeveloperApi ::
- TaskContext - Class in org.apache.spark
-
Contextual information about a task which can be read or mutated during
execution.
- TaskContext() - Constructor for class org.apache.spark.TaskContext
-
- TaskData - Class in org.apache.spark.status.api.v1
-
- TaskEndReason - Interface in org.apache.spark
-
:: DeveloperApi ::
Various possible reasons why a task ended.
- TaskFailedReason - Interface in org.apache.spark
-
:: DeveloperApi ::
Various possible reasons why a task failed.
- taskId() - Method in class org.apache.spark.scheduler.local.KillTask
-
- taskId() - Method in class org.apache.spark.scheduler.local.StatusUpdate
-
- taskId() - Method in class org.apache.spark.scheduler.TaskInfo
-
- taskId() - Method in class org.apache.spark.status.api.v1.TaskData
-
- taskId() - Method in class org.apache.spark.storage.TaskResultBlockId
-
- taskInfo() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- taskInfo() - Method in class org.apache.spark.scheduler.SparkListenerTaskGettingResult
-
- taskInfo() - Method in class org.apache.spark.scheduler.SparkListenerTaskStart
-
- TaskInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Information about a running task attempt inside a TaskSet.
- TaskInfo(long, int, int, long, String, String, Enumeration.Value, boolean) - Constructor for class org.apache.spark.scheduler.TaskInfo
-
- TaskKilled - Class in org.apache.spark
-
:: DeveloperApi ::
Task was killed intentionally and needs to be rescheduled.
- TaskKilled() - Constructor for class org.apache.spark.TaskKilled
-
- TaskKilledException - Exception in org.apache.spark
-
:: DeveloperApi ::
Exception thrown when a task is explicitly killed (i.e., task failure is expected).
- TaskKilledException() - Constructor for exception org.apache.spark.TaskKilledException
-
- taskLocality() - Method in class org.apache.spark.scheduler.TaskInfo
-
- TaskLocality - Class in org.apache.spark.scheduler
-
- TaskLocality() - Constructor for class org.apache.spark.scheduler.TaskLocality
-
- taskLocality() - Method in class org.apache.spark.status.api.v1.TaskData
-
- taskLocalityPreferences() - Method in class org.apache.spark.scheduler.StageInfo
-
- TaskMetricDistributions - Class in org.apache.spark.status.api.v1
-
- taskMetrics() - Method in class org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate
-
- taskMetrics() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- taskMetrics() - Method in class org.apache.spark.status.api.v1.TaskData
-
- TaskMetrics - Class in org.apache.spark.status.api.v1
-
- taskMetrics() - Method in class org.apache.spark.TaskContext
-
:: DeveloperApi ::
- TASKRESULT() - Static method in class org.apache.spark.storage.BlockId
-
- TaskResultBlockId - Class in org.apache.spark.storage
-
- TaskResultBlockId(long) - Constructor for class org.apache.spark.storage.TaskResultBlockId
-
- TaskResultLost - Class in org.apache.spark
-
:: DeveloperApi ::
The task finished successfully, but the result was lost from the executor's block manager before
it was fetched.
- TaskResultLost() - Constructor for class org.apache.spark.TaskResultLost
-
- tasks() - Method in class org.apache.spark.status.api.v1.StageData
-
- TaskSorting - Enum in org.apache.spark.status.api.v1
-
- taskTime() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- taskType() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- TEST() - Static method in class org.apache.spark.storage.BlockId
-
- TestResult<DF> - Interface in org.apache.spark.mllib.stat.test
-
:: Experimental ::
Trait for hypothesis test results.
- textFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
- textFile(String, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
- textFile(String, int) - Method in class org.apache.spark.SparkContext
-
Read a text file from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI, and return it as an RDD of Strings.
- textFileStream(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them as text files (using key as LongWritable, value
as Text and input format as TextInputFormat).
- textFileStream(String) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them as text files (using key as LongWritable, value
as Text and input format as TextInputFormat).
- theta() - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- theta() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- threshold() - Method in class org.apache.spark.ml.feature.Binarizer
-
Param for threshold used to binarize continuous features.
- threshold() - Method in class org.apache.spark.ml.tree.ContinuousSplit
-
- threshold() - Method in class org.apache.spark.mllib.tree.model.Split
-
- thresholds() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns thresholds in descending order.
- throwBalls() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerApplicationEnd
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerRemoved
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerExecutorAdded
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerExecutorRemoved
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerJobEnd
-
- time() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- Time - Class in org.apache.spark.streaming
-
This is a simple class that represents an absolute instant of time.
- Time(long) - Constructor for class org.apache.spark.streaming.Time
-
- times(int) - Method in class org.apache.spark.streaming.Duration
-
- timestamp() - Method in class org.apache.spark.sql.ColumnName
-
Creates a new `StructField` of type timestamp.
- TimestampType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the TimestampType object.
- TimestampType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing `java.sql.Timestamp` values.
- TimeTrackingOutputStream - Class in org.apache.spark.storage
-
Intercepts write calls and tracks total time spent writing in order to update shuffle write
metrics.
- TimeTrackingOutputStream(ShuffleWriteMetrics, OutputStream) - Constructor for class org.apache.spark.storage.TimeTrackingOutputStream
-
- timeUnit() - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
- TIMING_DATA() - Static method in class org.apache.spark.api.r.SpecialLengths
-
- tlSession() - Method in class org.apache.spark.sql.SQLContext
-
- to(Time, Duration) - Method in class org.apache.spark.streaming.Time
-
- to_date(Column) - Static method in class org.apache.spark.sql.functions
-
Converts the column into DateType.
- to_utc_timestamp(Column, String) - Static method in class org.apache.spark.sql.functions
-
Assumes given timestamp is in given timezone and converts to UTC.
- toArray() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- toArray() - Method in class org.apache.spark.input.PortableDataStream
-
Read the file as a byte array
- toArray() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- toArray() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Converts to a dense array in column-major order.
- toArray() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- toArray() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Converts the instance to a double array.
- toArray() - Method in class org.apache.spark.rdd.RDD
-
Return an array that contains all of the elements in this RDD.
- toArray(DataType, ClassTag<T>) - Method in class org.apache.spark.sql.types.ArrayData
-
- toAttributes() - Method in class org.apache.spark.sql.types.StructType
-
- toBigDecimal() - Method in class org.apache.spark.sql.types.Decimal
-
- toBlockMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Converts to BlockMatrix.
- toBlockMatrix(int, int) - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Converts to BlockMatrix.
- toBlockMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Converts to BlockMatrix.
- toBlockMatrix(int, int) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Converts to BlockMatrix.
- toBooleanArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toBreeze() - Method in interface org.apache.spark.mllib.linalg.distributed.DistributedMatrix
-
Collects data and assembles a local dense breeze matrix (for test only).
- toBreeze() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Converts to a breeze matrix.
- toBreeze() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Converts the instance to a breeze vector.
- toByte() - Method in class org.apache.spark.sql.types.Decimal
-
- toByteArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toCoordinateMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Converts to CoordinateMatrix.
- toCoordinateMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- toDebugString() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
A description of this RDD and its recursive dependencies for debugging.
- toDebugString() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Print the full model to a string.
- toDebugString() - Method in class org.apache.spark.rdd.RDD
-
A description of this RDD and its recursive dependencies for debugging.
- toDebugString() - Method in class org.apache.spark.SparkConf
-
Return a string listing all keys and values, one per line.
- toDebugString() - Method in class org.apache.spark.sql.types.Decimal
-
- toDegrees(Column) - Static method in class org.apache.spark.sql.functions
-
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
- toDegrees(String) - Static method in class org.apache.spark.sql.functions
-
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
- toDense() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate a `DenseMatrix` from the given `SparseMatrix`.
- toDense() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Converts this vector to a dense vector.
- toDF(String...) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new `DataFrame` with columns renamed.
- toDF() - Method in class org.apache.spark.sql.DataFrame
-
Returns the object itself.
- toDF(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new `DataFrame` with columns renamed.
- toDouble() - Method in class org.apache.spark.sql.types.Decimal
-
- toDoubleArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toEdgeTriplet() - Method in class org.apache.spark.graphx.EdgeContext
-
Converts the edge and vertex properties into an `EdgeTriplet` for convenience.
- toErrorString() - Method in class org.apache.spark.ExceptionFailure
-
- toErrorString() - Method in class org.apache.spark.ExecutorLostFailure
-
- toErrorString() - Method in class org.apache.spark.FetchFailed
-
- toErrorString() - Static method in class org.apache.spark.Resubmitted
-
- toErrorString() - Method in class org.apache.spark.TaskCommitDenied
-
- toErrorString() - Method in interface org.apache.spark.TaskFailedReason
-
Error message displayed in the web UI.
- toErrorString() - Static method in class org.apache.spark.TaskKilled
-
- toErrorString() - Static method in class org.apache.spark.TaskResultLost
-
- toErrorString() - Static method in class org.apache.spark.UnknownReason
-
- toFloat() - Method in class org.apache.spark.sql.types.Decimal
-
- toFloatArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toFormattedString() - Method in class org.apache.spark.streaming.Duration
-
- toHiveString(Tuple2<Object, DataType>) - Static method in class org.apache.spark.sql.hive.HiveContext
-
- toHiveStructString(Tuple2<Object, DataType>) - Static method in class org.apache.spark.sql.hive.HiveContext
-
Hive outputs fields of structs slightly differently than top level attributes.
- toIndexedRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Converts to IndexedRowMatrix.
- toIndexedRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Converts to IndexedRowMatrix.
- toInt() - Method in class org.apache.spark.sql.types.Decimal
-
- toInt() - Method in class org.apache.spark.storage.StorageLevel
-
- toIntArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toJavaBigDecimal() - Method in class org.apache.spark.sql.types.Decimal
-
- toJavaDStream() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Convert to a JavaDStream
- toJavaRDD() - Method in class org.apache.spark.rdd.RDD
-
- toJavaRDD() - Method in class org.apache.spark.sql.DataFrame
-
- toJSON() - Method in class org.apache.spark.sql.DataFrame
-
Returns the content of the `DataFrame` as an RDD of JSON strings.
- Tokenizer - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A tokenizer that converts the input string to lowercase and then splits it by white spaces.
- Tokenizer(String) - Constructor for class org.apache.spark.ml.feature.Tokenizer
-
- Tokenizer() - Constructor for class org.apache.spark.ml.feature.Tokenizer
-
- toLocal() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Convert model to a local model.
- toLocalIterator() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an iterator that contains all of the elements in this RDD.
- toLocalIterator() - Method in class org.apache.spark.rdd.RDD
-
Return an iterator that contains all of the elements in this RDD.
- toLocalMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Collect the distributed matrix on the driver as a `DenseMatrix`.
- toLong() - Method in class org.apache.spark.sql.types.Decimal
-
- toLongArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toMetadata(Metadata) - Method in class org.apache.spark.ml.attribute.Attribute
-
Converts to ML metadata with some existing metadata.
- toMetadata() - Method in class org.apache.spark.ml.attribute.Attribute
-
Converts to ML metadata
- toMetadata(Metadata) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Converts to ML metadata with some existing metadata.
- toMetadata() - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Converts to ML metadata
- toOld() - Method in interface org.apache.spark.ml.tree.Split
-
Convert to old Split format
- top(int, Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the top k (largest) elements from this RDD as defined by
the specified Comparator[T].
- top(int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the top k (largest) elements from this RDD using the
natural ordering for T.
- top(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
- toPairDStreamFunctions(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.streaming.dstream.DStream
-
- toPairDStreamFunctions(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.streaming.StreamingContext
-
Deprecated.
As of 1.3.0, replaced by implicit functions in the DStream companion object.
This is kept here only for backward compatibility.
- topByKey(int, Ordering<V>) - Method in class org.apache.spark.mllib.rdd.MLPairRDDFunctions
-
Returns the top k (largest) elements for each key from this RDD as defined by the specified
implicit Ordering[T].
- topDocumentsPerTopic(int) - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Return the top documents for each topic
- topic() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
- topicAndPartition() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
Kafka TopicAndPartition object, for convenience
- topicAssignments() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Return the top topic for each (doc, term) pair.
- topicConcentration() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- topicConcentration() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
- topicConcentration() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
- topicConcentration() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- topicDistributions() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
For each document in the training set, return the distribution over topics for that document
("theta_doc").
- topicDistributions(RDD<Tuple2<Object, Vector>>) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
Predicts the topic mixture distribution for each document (often called "theta" in the
literature).
- topicDistributions(JavaPairRDD<Long, Vector>) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
Java-friendly version of topicDistributions
- topics() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- topicsMatrix() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
Inferred topics, where each topic is represented by a distribution over terms.
- topicsMatrix() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Inferred topics, where each topic is represented by a distribution over terms.
- topicsMatrix() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- toPMML(StreamResult) - Method in interface org.apache.spark.mllib.pmml.PMMLExportable
-
Export the model to the stream result in PMML format
- toPMML(String) - Method in interface org.apache.spark.mllib.pmml.PMMLExportable
-
:: Experimental ::
Export the model to a local file in PMML format
- toPMML(SparkContext, String) - Method in interface org.apache.spark.mllib.pmml.PMMLExportable
-
:: Experimental ::
Export the model to a directory on a distributed file system in PMML format
- toPMML(OutputStream) - Method in interface org.apache.spark.mllib.pmml.PMMLExportable
-
:: Experimental ::
Export the model to the OutputStream in PMML format
- toPMML() - Method in interface org.apache.spark.mllib.pmml.PMMLExportable
-
:: Experimental ::
Export the model to a String in PMML format
- topNode() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- topTopicsPerDocument(int) - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
For each document, return the top k weighted topics for that document and their weights.
- toRadians(Column) - Static method in class org.apache.spark.sql.functions
-
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
- toRadians(String) - Static method in class org.apache.spark.sql.functions
-
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
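The conversion described by both `toRadians` entries amounts to multiplying by pi / 180. The sketch below shows that arithmetic on a plain `double`. It does not use the Spark `Column` API, and the class name `ToRadiansSketch` is an assumption made for this example.

```java
// Hypothetical helper, not part of Spark: the degrees-to-radians arithmetic on a scalar.
public class ToRadiansSketch {
    // radians = degrees * pi / 180; the result is approximate because of floating point.
    public static double toRadians(double degrees) {
        return degrees * Math.PI / 180.0;
    }

    public static void main(String[] args) {
        System.out.println(toRadians(90.0));
        // java.lang.Math offers the same conversion for scalars.
        System.out.println(Math.toRadians(90.0));
    }
}
```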
- toRDD(JavaDoubleRDD) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
-
- toRDD(JavaPairRDD<K, V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
- toRDD(JavaRDD<T>) - Static method in class org.apache.spark.api.java.JavaRDD
-
- toRdd() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- toRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Converts to RowMatrix, dropping row indices after grouping by row index.
- toRowMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Drops row indices and converts this matrix to a RowMatrix.
- TorrentBroadcastFactory - Class in org.apache.spark.broadcast
-
A Broadcast implementation that uses a BitTorrent-like
protocol to do a distributed transfer of the broadcasted data to the executors.
- TorrentBroadcastFactory() - Constructor for class org.apache.spark.broadcast.TorrentBroadcastFactory
-
- toScalaMap(ArrayBasedMapData) - Static method in class org.apache.spark.sql.types.ArrayBasedMapData
-
- toSchemaRDD() - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.3.0, replaced by toDF().
- toSeq() - Method in class org.apache.spark.ml.param.ParamMap
-
Converts this param map to a sequence of param pairs.
- toSeq() - Method in interface org.apache.spark.sql.Row
-
Return a Scala Seq representing the row.
- toShort() - Method in class org.apache.spark.sql.types.Decimal
-
- toShortArray() - Method in class org.apache.spark.sql.types.ArrayData
-
- toSparkContext(JavaSparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
-
- toSparse() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
Generate a SparseMatrix from the given DenseMatrix.
- toSparse() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- toSparse() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- toSparse() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Converts this vector to a sparse vector with all explicit zeros removed.
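To make "all explicit zeros removed" concrete, the following plain-Java sketch converts a dense `double[]` into parallel index and value arrays that hold only the nonzero entries. The `ToSparseSketch` class and its nested `Sparse` type are hypothetical stand-ins for this example, not the MLlib `SparseVector` class.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: dense-to-sparse conversion on arrays.
public class ToSparseSketch {
    // Minimal stand-in for a sparse vector: logical size plus parallel arrays.
    public static final class Sparse {
        public final int size;
        public final int[] indices;
        public final double[] values;

        Sparse(int size, int[] indices, double[] values) {
            this.size = size;
            this.indices = indices;
            this.values = values;
        }
    }

    public static Sparse toSparse(double[] dense) {
        // First pass: count nonzero entries so the arrays can be sized exactly.
        int nnz = 0;
        for (double v : dense) {
            if (v != 0.0) {
                nnz++;
            }
        }
        int[] indices = new int[nnz];
        double[] values = new double[nnz];
        // Second pass: copy each nonzero entry together with its position.
        int k = 0;
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                indices[k] = i;
                values[k] = dense[i];
                k++;
            }
        }
        return new Sparse(dense.length, indices, values);
    }

    public static void main(String[] args) {
        Sparse s = toSparse(new double[] {1.0, 0.0, 0.0, 2.5});
        System.out.println(s.size + " " + Arrays.toString(s.indices) + " " + Arrays.toString(s.values));
    }
}
```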
- toSplitInfo(Class<?>, String, InputSplit) - Static method in class org.apache.spark.scheduler.SplitInfo
-
- toSplitInfo(Class<?>, String, InputSplit) - Static method in class org.apache.spark.scheduler.SplitInfo
-
- toString() - Method in class org.apache.spark.Accumulable
-
- toString() - Method in class org.apache.spark.api.java.JavaRDD
-
- toString() - Method in class org.apache.spark.broadcast.Broadcast
-
- toString() - Method in class org.apache.spark.graphx.EdgeDirection
-
- toString() - Method in class org.apache.spark.graphx.EdgeTriplet
-
- toString() - Method in class org.apache.spark.ml.attribute.Attribute
-
- toString() - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- toString() - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- toString() - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- toString() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- toString() - Method in class org.apache.spark.ml.feature.RFormula
-
- toString() - Method in class org.apache.spark.ml.feature.RFormulaModel
-
- toString() - Method in class org.apache.spark.ml.param.Param
-
- toString() - Method in class org.apache.spark.ml.param.ParamMap
-
- toString() - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
- toString() - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- toString() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- toString() - Method in class org.apache.spark.ml.tree.InternalNode
-
- toString() - Method in class org.apache.spark.ml.tree.LeafNode
-
- toString() - Method in interface org.apache.spark.ml.util.Identifiable
-
- toString() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- toString() - Method in class org.apache.spark.mllib.classification.SVMModel
-
- toString() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- toString() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
A human-readable representation of the matrix.
- toString(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
A human-readable representation of the matrix, limited to a maximum number of lines and a maximum width.
- toString() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- toString() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Print a summary of the model.
- toString() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- toString() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- toString() - Method in class org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult
-
- toString() - Method in interface org.apache.spark.mllib.stat.test.TestResult
-
String explaining the hypothesis test result.
- toString() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Print a summary of the model.
- toString() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- toString() - Method in class org.apache.spark.mllib.tree.model.Node
-
- toString() - Method in class org.apache.spark.mllib.tree.model.Predict
-
- toString() - Method in class org.apache.spark.mllib.tree.model.Split
-
- toString() - Method in class org.apache.spark.partial.BoundedDouble
-
- toString() - Method in class org.apache.spark.partial.PartialResult
-
- toString() - Method in class org.apache.spark.rdd.RDD
-
- toString() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- toString() - Method in class org.apache.spark.scheduler.SplitInfo
-
- toString() - Method in class org.apache.spark.SerializableWritable
-
- toString() - Method in class org.apache.spark.sql.Column
-
- toString() - Method in class org.apache.spark.sql.DataFrame
-
- toString() - Method in interface org.apache.spark.sql.Row
-
- toString() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- toString() - Method in class org.apache.spark.sql.SQLContext.QueryExecution
-
- toString() - Method in class org.apache.spark.sql.types.ArrayBasedMapData
-
- toString() - Method in class org.apache.spark.sql.types.Decimal
-
- toString() - Method in class org.apache.spark.sql.types.DecimalType
-
- toString() - Method in class org.apache.spark.sql.types.GenericArrayData
-
- toString() - Method in class org.apache.spark.sql.types.Metadata
-
- toString() - Method in class org.apache.spark.sql.types.StructField
-
- toString() - Method in class org.apache.spark.storage.BlockId
-
- toString() - Method in class org.apache.spark.storage.BlockManagerId
-
- toString() - Method in class org.apache.spark.storage.RDDInfo
-
- toString() - Method in class org.apache.spark.storage.StorageLevel
-
- toString() - Method in class org.apache.spark.streaming.Duration
-
- toString() - Method in class org.apache.spark.streaming.kafka.Broker
-
- toString() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
- toString() - Method in class org.apache.spark.streaming.Time
-
- toString() - Method in class org.apache.spark.util.MutablePair
-
- toString() - Method in class org.apache.spark.util.StatCounter
-
- toString() - Method in class org.apache.spark.util.Vector
-
- toStructField(Metadata) - Method in class org.apache.spark.ml.attribute.Attribute
-
Converts to a StructField with some existing metadata.
- toStructField() - Method in class org.apache.spark.ml.attribute.Attribute
-
Converts to a StructField.
- toStructField(Metadata) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Converts to a StructField with some existing metadata.
- toStructField() - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Converts to a StructField.
- totalBlocksFetched() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- totalBlocksFetched() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- totalCores() - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
-
- totalDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for all the jobs of this batch to finish processing from the time they
were submitted.
- totalDuration() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- totalInputBytes() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- totalIterations() - Method in interface org.apache.spark.ml.classification.LogisticRegressionTrainingSummary
-
Number of training iterations until termination
- totalIterations() - Method in class org.apache.spark.ml.regression.LinearRegressionTrainingSummary
-
Number of training iterations until termination
- totalShuffleRead() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- totalShuffleWrite() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- totalTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- toTuple() - Method in class org.apache.spark.graphx.EdgeTriplet
-
- toUnscaledLong() - Method in class org.apache.spark.sql.types.Decimal
-
- train(DataFrame) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- train(DataFrame) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- train(DataFrame) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- train(DataFrame) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
Train a model using the given dataset and parameters.
- train(DataFrame) - Method in class org.apache.spark.ml.classification.NaiveBayes
-
- train(DataFrame) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- train(DataFrame) - Method in class org.apache.spark.ml.Predictor
-
Train a model using the given dataset and parameters.
- train(RDD<ALS.Rating<ID>>, int, int, int, int, double, boolean, double, boolean, StorageLevel, StorageLevel, int, long, ClassTag<ID>, Ordering<ID>) - Static method in class org.apache.spark.ml.recommendation.ALS
-
:: DeveloperApi ::
Implementation of the ALS algorithm.
- train(DataFrame) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- train(DataFrame) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- train(DataFrame) - Method in class org.apache.spark.ml.regression.LinearRegression
-
- train(DataFrame) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- train(RDD<LabeledPoint>, int, double, double, Vector) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Train a logistic regression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
- train(RDD<LabeledPoint>, double) - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
- train(RDD<LabeledPoint>, double, String) - Static method in class org.apache.spark.mllib.classification.NaiveBayes
-
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
- train(RDD<LabeledPoint>, int, double, double, double, Vector) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.classification.SVMWithSGD
-
Train an SVM model given an RDD of (label, features) pairs.
- train(RDD<Vector>, int, int, int, String, long) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using the given set of parameters.
- train(RDD<Vector>, int, int, int, String) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using the given set of parameters.
- train(RDD<Vector>, int, int) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using the specified parameters and default values for the unspecified ones.
- train(RDD<Vector>, int, int, int) - Static method in class org.apache.spark.mllib.clustering.KMeans
-
Trains a k-means model using the specified parameters and default values for the unspecified ones.
- train(RDD<Rating>, int, int, double, int, long) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<Rating>, int, int, double, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<Rating>, int, int, double) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<Rating>, int, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of ratings given by users to some products,
in the form of (userID, productID, rating) pairs.
- train(RDD<LabeledPoint>, int, double, double, double, Vector) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.regression.LassoWithSGD
-
Train a Lasso model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, Vector) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Train a LinearRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double, Vector) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double, double) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int, double, double) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, int) - Static method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Train a RidgeRegression model given an RDD of (label, features) pairs.
- train(RDD<LabeledPoint>, Strategy) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model.
- train(RDD<LabeledPoint>, Enumeration.Value, Impurity, int) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model.
- train(RDD<LabeledPoint>, Enumeration.Value, Impurity, int, int) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model.
- train(RDD<LabeledPoint>, Enumeration.Value, Impurity, int, int, int, Enumeration.Value, Map<Object, Object>) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model.
- train(RDD<LabeledPoint>, BoostingStrategy) - Static method in class org.apache.spark.mllib.tree.GradientBoostedTrees
-
Method to train a gradient boosting model.
- train(JavaRDD<LabeledPoint>, BoostingStrategy) - Static method in class org.apache.spark.mllib.tree.GradientBoostedTrees
-
Java-friendly API for GradientBoostedTrees$.train(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, org.apache.spark.mllib.tree.configuration.BoostingStrategy)
- trainClassifier(RDD<LabeledPoint>, int, Map<Object, Object>, String, int, int) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model for binary or multiclass classification.
- trainClassifier(JavaRDD<LabeledPoint>, int, Map<Integer, Integer>, String, int, int) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Java-friendly API for DecisionTree$.trainClassifier(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, int, scala.collection.immutable.Map<java.lang.Object, java.lang.Object>, java.lang.String, int, int)
- trainClassifier(RDD<LabeledPoint>, Strategy, int, String, int) - Static method in class org.apache.spark.mllib.tree.RandomForest
-
Method to train a random forest model for binary or multiclass classification.
- trainClassifier(RDD<LabeledPoint>, int, Map<Object, Object>, int, String, String, int, int, int) - Static method in class org.apache.spark.mllib.tree.RandomForest
-
Method to train a random forest model for binary or multiclass classification.
- trainClassifier(JavaRDD<LabeledPoint>, int, Map<Integer, Integer>, int, String, String, int, int, int) - Static method in class org.apache.spark.mllib.tree.RandomForest
-
Java-friendly API for RandomForest$.trainClassifier(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, org.apache.spark.mllib.tree.configuration.Strategy, int, java.lang.String, int)
- trainImplicit(RDD<Rating>, int, int, double, int, double, long) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users
to some products, in the form of (userID, productID, preference) pairs.
- trainImplicit(RDD<Rating>, int, int, double, int, double) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users
to some products, in the form of (userID, productID, preference) pairs.
- trainImplicit(RDD<Rating>, int, int, double, double) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users to
some products, in the form of (userID, productID, preference) pairs.
- trainImplicit(RDD<Rating>, int, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
Train a matrix factorization model given an RDD of 'implicit preferences' given by users
to some products, in the form of (userID, productID, preference) pairs.
- trainOn(DStream<Vector>) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Update the clustering model by training on batches of data from a DStream.
- trainOn(JavaDStream<Vector>) - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
Java-friendly version of trainOn.
- trainOn(DStream<LabeledPoint>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Update the model by training on batches of data from a DStream.
- trainOn(JavaDStream<LabeledPoint>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Java-friendly version of trainOn.
- trainRegressor(RDD<LabeledPoint>, Map<Object, Object>, String, int, int) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Method to train a decision tree model for regression.
- trainRegressor(JavaRDD<LabeledPoint>, Map<Integer, Integer>, String, int, int) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Java-friendly API for DecisionTree$.trainRegressor(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, scala.collection.immutable.Map<java.lang.Object, java.lang.Object>, java.lang.String, int, int)
- trainRegressor(RDD<LabeledPoint>, Strategy, int, String, int) - Static method in class org.apache.spark.mllib.tree.RandomForest
-
Method to train a random forest model for regression.
- trainRegressor(RDD<LabeledPoint>, Map<Object, Object>, int, String, String, int, int, int) - Static method in class org.apache.spark.mllib.tree.RandomForest
-
Method to train a random forest model for regression.
- trainRegressor(JavaRDD<LabeledPoint>, Map<Integer, Integer>, int, String, String, int, int, int) - Static method in class org.apache.spark.mllib.tree.RandomForest
-
Java-friendly API for RandomForest$.trainRegressor(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, org.apache.spark.mllib.tree.configuration.Strategy, int, java.lang.String, int)
- TrainValidationSplit - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
Validation for hyper-parameter tuning.
- TrainValidationSplit(String) - Constructor for class org.apache.spark.ml.tuning.TrainValidationSplit
-
- TrainValidationSplit() - Constructor for class org.apache.spark.ml.tuning.TrainValidationSplit
-
- TrainValidationSplitModel - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
Model from train validation split.
- transform(DataFrame) - Method in class org.apache.spark.ml.classification.ClassificationModel
-
Transforms dataset by reading from featuresCol, and appending new columns as specified by
parameters:
- predicted labels as predictionCol of type Double
- raw predictions (confidences) as rawPredictionCol of type Vector.
- transform(DataFrame) - Method in class org.apache.spark.ml.classification.OneVsRestModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.classification.ProbabilisticClassificationModel
-
Transforms dataset by reading from featuresCol, and appending new columns as specified by
parameters:
- predicted labels as predictionCol of type Double
- raw predictions (confidences) as rawPredictionCol of type Vector
- probability of each class as probabilityCol of type Vector.
- transform(DataFrame) - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.Binarizer
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.ColumnPruner
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.HashingTF
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.IDFModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.IndexToString
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.PCAModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.RFormulaModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- transform(DataFrame) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Transform a sentence column to a vector column to represent the whole sentence.
- transform(DataFrame) - Method in class org.apache.spark.ml.PipelineModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.PredictionModel
-
Transforms dataset by reading from featuresCol, calling predict(), and storing
the predictions as a new column predictionCol.
- transform(DataFrame) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- transform(DataFrame, ParamPair<?>, ParamPair<?>...) - Method in class org.apache.spark.ml.Transformer
-
Transforms the dataset with optional parameters
- transform(DataFrame, ParamPair<?>, Seq<ParamPair<?>>) - Method in class org.apache.spark.ml.Transformer
-
Transforms the dataset with optional parameters
- transform(DataFrame, ParamMap) - Method in class org.apache.spark.ml.Transformer
-
Transforms the dataset with provided parameter map as additional parameters.
- transform(DataFrame) - Method in class org.apache.spark.ml.Transformer
-
Transforms the input dataset.
- transform(DataFrame) - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
-
- transform(DataFrame) - Method in class org.apache.spark.ml.UnaryTransformer
-
- transform(Vector) - Method in class org.apache.spark.mllib.feature.ChiSqSelectorModel
-
Applies transformation on a vector.
- transform(Vector) - Method in class org.apache.spark.mllib.feature.ElementwiseProduct
-
Does the Hadamard product transformation.
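The Hadamard product is the element-wise product of two vectors of equal length. A minimal plain-Java sketch of that operation follows. The class name `HadamardSketch` and the use of raw `double[]` arrays instead of MLlib `Vector` objects are assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: element-wise product of two arrays.
public class HadamardSketch {
    // out[i] = v[i] * scaling[i]; both inputs must have the same length.
    public static double[] hadamard(double[] v, double[] scaling) {
        if (v.length != scaling.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * scaling[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] result = hadamard(new double[] {1.0, 2.0, 3.0}, new double[] {0.0, 1.0, 2.0});
        System.out.println(Arrays.toString(result)); // prints [0.0, 2.0, 6.0]
    }
}
```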
- transform(Iterable<Object>) - Method in class org.apache.spark.mllib.feature.HashingTF
-
Transforms the input document into a sparse term frequency vector.
- transform(Iterable<?>) - Method in class org.apache.spark.mllib.feature.HashingTF
-
Transforms the input document into a sparse term frequency vector (Java version).
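The hashing trick behind a sparse term frequency vector can be sketched in plain Java: each term is mapped to a bucket index by hashing, and the count in that bucket is incremented. The sketch assumes Java's `hashCode()` and `Math.floorMod` for the index computation. The hash function MLlib actually uses may differ, and `HashingTfSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: term frequencies via the hashing trick.
public class HashingTfSketch {
    // Maps each term to a bucket in [0, numFeatures) and counts occurrences per bucket.
    // Only nonzero buckets are stored, which makes the result sparse.
    public static Map<Integer, Double> transform(List<String> document, int numFeatures) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (String term : document) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            int index = Math.floorMod(term.hashCode(), numFeatures);
            counts.merge(index, 1.0, Double::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97 and "b".hashCode() is 98, so with 16 buckets they map to 1 and 2.
        System.out.println(transform(Arrays.asList("a", "b", "a"), 16)); // prints {1=2.0, 2=1.0}
    }
}
```

Distinct terms can collide in the same bucket, so a larger `numFeatures` reduces collisions at the cost of a longer vector.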
- transform(RDD<D>) - Method in class org.apache.spark.mllib.feature.HashingTF
-
Transforms the input document to term frequency vectors.
- transform(JavaRDD<D>) - Method in class org.apache.spark.mllib.feature.HashingTF
-
Transforms the input document to term frequency vectors (Java version).
- transform(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDFModel
-
Transforms term frequency (TF) vectors to TF-IDF vectors.
- transform(Vector) - Method in class org.apache.spark.mllib.feature.IDFModel
-
Transforms a term frequency (TF) vector to a TF-IDF vector
- transform(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDFModel
-
Transforms term frequency (TF) vectors to TF-IDF vectors (Java version).
- transform(Vector) - Method in class org.apache.spark.mllib.feature.Normalizer
-
Applies unit length normalization on a vector.
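Unit length normalization divides each component by the vector's norm so that the result has norm 1. The plain-Java sketch below assumes the Euclidean (L2) norm and returns a zero vector unchanged. Both choices, and the class name `UnitNormSketch`, are assumptions made for this example rather than details stated in this index.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: L2 normalization of an array.
public class UnitNormSketch {
    public static double[] normalize(double[] v) {
        // Euclidean norm: square root of the sum of squared components.
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return v.clone(); // nothing to scale; avoid dividing by zero
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {3.0, 4.0})));
    }
}
```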
- transform(Vector) - Method in class org.apache.spark.mllib.feature.PCAModel
-
Transform a vector by computed Principal Components.
- transform(Vector) - Method in class org.apache.spark.mllib.feature.StandardScalerModel
-
Applies standardization transformation on a vector.
- transform(Vector) - Method in interface org.apache.spark.mllib.feature.VectorTransformer
-
Applies transformation on a vector.
- transform(RDD<Vector>) - Method in interface org.apache.spark.mllib.feature.VectorTransformer
-
Applies transformation on an RDD[Vector].
- transform(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.feature.VectorTransformer
-
Applies transformation on a JavaRDD[Vector].
- transform(String) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
Transforms a word to its vector representation.
- transform(Function<R, JavaRDD<U>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(Function2<R, Time, JavaRDD<U>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(List<JavaDStream<?>>, Function2<List<JavaRDD<?>>, Time, JavaRDD<T>>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create a new DStream in which each RDD is generated by applying a function on RDDs of
the DStreams.
- transform(Function1<RDD<T>, RDD<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(Function2<RDD<T>, Time, RDD<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transform(Seq<DStream<?>>, Function2<Seq<RDD<?>>, Time, RDD<T>>, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create a new DStream in which each RDD is generated by applying a function on RDDs of
the DStreams.
- Transformer - Class in org.apache.spark.ml
-
:: DeveloperApi ::
Abstract class for transformers that transform one dataset into another.
- Transformer() - Constructor for class org.apache.spark.ml.Transformer
-
- transformImpl(DataFrame) - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- transformImpl(DataFrame) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- transformImpl(DataFrame) - Method in class org.apache.spark.ml.PredictionModel
-
- transformImpl(DataFrame) - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- transformImpl(DataFrame) - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.classification.OneVsRestModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.clustering.KMeans
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.Binarizer
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.ColumnPruner
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.HashingTF
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.IDF
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.IDFModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.IndexToString
-
Transform the schema for the inverse transformation.
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.PCA
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.PCAModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.RFormula
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.RFormulaModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.Pipeline
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.PipelineModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.PipelineStage
-
:: DeveloperApi ::
- transformSchema(StructType, boolean) - Method in class org.apache.spark.ml.PipelineStage
-
:: DeveloperApi ::
- transformSchema(StructType) - Method in class org.apache.spark.ml.PredictionModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.Predictor
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.recommendation.ALS
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
-
- transformSchema(StructType) - Method in class org.apache.spark.ml.UnaryTransformer
-
- transformToPair(Function<R, JavaPairRDD<K2, V2>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transformToPair(Function2<R, Time, JavaPairRDD<K2, V2>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
- transformToPair(List<JavaDStream<?>>, Function2<List<JavaRDD<?>>, Time, JavaPairRDD<K, V>>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create a new DStream in which each RDD is generated by applying a function on RDDs of
the DStreams.
- transformWith(JavaDStream<U>, Function3<R, JavaRDD<U>, Time, JavaRDD<W>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWith(JavaPairDStream<K2, V2>, Function3<R, JavaPairRDD<K2, V2>, Time, JavaRDD<W>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWith(DStream<U>, Function2<RDD<T>, RDD<U>, RDD<V>>, ClassTag<U>, ClassTag<V>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWith(DStream<U>, Function3<RDD<T>, RDD<U>, Time, RDD<V>>, ClassTag<U>, ClassTag<V>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWithToPair(JavaDStream<U>, Function3<R, JavaRDD<U>, Time, JavaPairRDD<K2, V2>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- transformWithToPair(JavaPairDStream<K2, V2>, Function3<R, JavaPairRDD<K2, V2>, Time, JavaPairRDD<K3, V3>>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream and 'other' DStream.
- translate(Column, String, String) - Static method in class org.apache.spark.sql.functions
-
Translates any character in src that appears in matchingString to the corresponding character in replaceString.
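A small sketch of positional character translation on plain strings may make the three arguments clearer. The parameter names `matching` and `replace` and the rule that a matched character with no counterpart is dropped are assumptions of this example, not a statement about Spark's implementation.

```java
// Hypothetical sketch: each character of src found in `matching` is replaced
// by the character at the same position in `replace`.
class TranslateSketch {
    static String translate(String src, String matching, String replace) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            int pos = matching.indexOf(c);
            if (pos < 0) {
                out.append(c); // not listed in `matching`: keep unchanged
            } else if (pos < replace.length()) {
                out.append(replace.charAt(pos)); // positional replacement
            }
            // otherwise the character has no counterpart and is dropped (assumption)
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("hello", "el", "ip")); // prints hippo
    }
}
```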
- transpose() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
- transpose() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Transposes this BlockMatrix.
- transpose() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Transposes this CoordinateMatrix.
- transpose() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Transpose the Matrix.
- transpose() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- treeAggregate(U, Function2<U, T, U>, Function2<U, U, U>, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregates the elements of this RDD in a multi-level tree pattern.
- treeAggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- treeAggregate(U, Function2<U, T, U>, Function2<U, U, U>, int, ClassTag<U>) - Method in class org.apache.spark.mllib.rdd.RDDFunctions
-
- treeAggregate(U, Function2<U, T, U>, Function2<U, U, U>, int, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Aggregates the elements of this RDD in a multi-level tree pattern.
- treeReduce(Function2<T, T, T>, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Reduces the elements of this RDD in a multi-level tree pattern.
- treeReduce(Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- treeReduce(Function2<T, T, T>, int) - Method in class org.apache.spark.mllib.rdd.RDDFunctions
-
- treeReduce(Function2<T, T, T>, int) - Method in class org.apache.spark.rdd.RDD
-
Reduces the elements of this RDD in a multi-level tree pattern.
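The "multi-level tree pattern" used by treeAggregate and treeReduce can be pictured as merging partial results in rounds instead of sending them all to one place at once. The sketch below works on an in-memory list of per-partition results with a fixed fan-in; `TreeReduceSketch` and the `fanIn` parameter are hypothetical and do not reproduce how Spark derives its levels from the depth argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sketch of a level-by-level (tree) reduction over partial results.
class TreeReduceSketch {
    static <T> T treeReduce(List<T> partials, BinaryOperator<T> f, int fanIn) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("nothing to reduce");
        }
        if (fanIn < 2) {
            throw new IllegalArgumentException("fanIn must be at least 2");
        }
        List<T> level = new ArrayList<>(partials);
        while (level.size() > 1) {
            List<T> next = new ArrayList<>();
            // Merge each group of up to fanIn neighbours into one value.
            for (int start = 0; start < level.size(); start += fanIn) {
                int end = Math.min(start + fanIn, level.size());
                T acc = level.get(start);
                for (int i = start + 1; i < end; i++) {
                    acc = f.apply(acc, level.get(i));
                }
                next.add(acc);
            }
            level = next; // one tree level finished
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        List<Integer> partials = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(treeReduce(partials, (a, b) -> a + b, 2)); // prints 36
    }
}
```

With eight partial results and a fan-in of 2, the reduction takes three rounds (8 to 4 to 2 to 1) rather than one step that combines all eight values.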
- trees() - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- trees() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- trees() - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- trees() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- trees() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- trees() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- treeStrategy() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- treeString() - Method in class org.apache.spark.sql.types.StructType
-
- treeWeights() - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- treeWeights() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- treeWeights() - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- treeWeights() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- treeWeights() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- triangleCount() - Method in class org.apache.spark.graphx.GraphOps
-
Compute the number of triangles passing through each vertex.
- TriangleCount - Class in org.apache.spark.graphx.lib
-
Compute the number of triangles passing through each vertex.
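To make "triangles passing through each vertex" concrete, the sketch below counts them on a tiny undirected graph by checking which neighbours of a vertex are also neighbours of each other. `TriangleCountSketch`, the integer vertex ids and the edge-array input are assumptions of this example; it is a single-machine illustration, not the GraphX algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-machine sketch of per-vertex triangle counting.
class TriangleCountSketch {
    // edges are undirected pairs {u, v} over vertices 0..n-1
    static int[] triangleCounts(int n, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new HashSet<>());
        }
        for (int[] e : edges) {
            if (e[0] == e[1]) {
                continue; // ignore self-loops
            }
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        int[] counts = new int[n];
        for (int v = 0; v < n; v++) {
            int twice = 0;
            for (int u : adj.get(v)) {
                for (int w : adj.get(u)) {
                    if (w != v && adj.get(v).contains(w)) {
                        twice++; // v, u and w are pairwise connected
                    }
                }
            }
            counts[v] = twice / 2; // each triangle is found once via u and once via w
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {2, 3}};
        System.out.println(Arrays.toString(triangleCounts(4, edges))); // prints [1, 1, 1, 0]
    }
}
```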
- TriangleCount() - Constructor for class org.apache.spark.graphx.lib.TriangleCount
-
- trim(Column) - Static method in class org.apache.spark.sql.functions
-
Trim the spaces from both ends for the specified string column.
- TripletFields - Class in org.apache.spark.graphx
-
Represents a subset of the fields of an EdgeTriplet or EdgeContext.
- TripletFields() - Constructor for class org.apache.spark.graphx.TripletFields
-
Constructs a default TripletFields in which all fields are included.
- TripletFields(boolean, boolean, boolean) - Constructor for class org.apache.spark.graphx.TripletFields
-
- triplets() - Method in class org.apache.spark.graphx.Graph
-
An RDD containing the edge triplets, which are edges along with the vertex data associated with
the adjacent vertices.
- triplets() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
Return an RDD that brings edges together with their source and destination vertices.
- truePositiveRate(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the true positive rate for a given label (category).
- trunc(Column, String) - Static method in class org.apache.spark.sql.functions
-
Returns date truncated to the unit specified by the format.
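Date truncation can be sketched with `java.time`. The accepted unit strings (`year`, `yyyy`, `yy`, `month`, `mon`, `mm`) and the null result for anything else are assumptions of this example, and `TruncSketch` is a hypothetical class that operates on a `LocalDate` rather than a column.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical sketch of truncating a date to the start of its year or month.
class TruncSketch {
    static LocalDate trunc(LocalDate date, String format) {
        switch (format.toLowerCase(Locale.ROOT)) {
            case "year":
            case "yyyy":
            case "yy":
                return date.withDayOfYear(1);  // first day of the year
            case "month":
            case "mon":
            case "mm":
                return date.withDayOfMonth(1); // first day of the month
            default:
                return null; // assumption: unknown units yield null
        }
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2015, 7, 27);
        System.out.println(trunc(d, "month")); // prints 2015-07-01
        System.out.println(trunc(d, "year"));  // prints 2015-01-01
    }
}
```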
- TwitterUtils - Class in org.apache.spark.streaming.twitter
-
- TwitterUtils() - Constructor for class org.apache.spark.streaming.twitter.TwitterUtils
-
- typeName() - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- typeName() - Method in class org.apache.spark.sql.types.DataType
-
Name of the type used in JSON serialization.
- typeName() - Method in class org.apache.spark.sql.types.DecimalType
-
- U() - Method in class org.apache.spark.mllib.linalg.SingularValueDecomposition
-
- udf(Function0<RT>, TypeTags.TypeTag<RT>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 0 arguments.
- udf(Function1<A1, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 1 argument.
- udf(Function2<A1, A2, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 2 arguments.
- udf(Function3<A1, A2, A3, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 3 arguments.
- udf(Function4<A1, A2, A3, A4, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 4 arguments.
- udf(Function5<A1, A2, A3, A4, A5, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 5 arguments.
- udf(Function6<A1, A2, A3, A4, A5, A6, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 6 arguments.
- udf(Function7<A1, A2, A3, A4, A5, A6, A7, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 7 arguments.
- udf(Function8<A1, A2, A3, A4, A5, A6, A7, A8, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 8 arguments.
- udf(Function9<A1, A2, A3, A4, A5, A6, A7, A8, A9, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 9 arguments.
- udf(Function10<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, RT>, TypeTags.TypeTag<RT>, TypeTags.TypeTag<A1>, TypeTags.TypeTag<A2>, TypeTags.TypeTag<A3>, TypeTags.TypeTag<A4>, TypeTags.TypeTag<A5>, TypeTags.TypeTag<A6>, TypeTags.TypeTag<A7>, TypeTags.TypeTag<A8>, TypeTags.TypeTag<A9>, TypeTags.TypeTag<A10>) - Static method in class org.apache.spark.sql.functions
-
Defines a user-defined function (UDF) of 10 arguments.
- udf() - Method in class org.apache.spark.sql.SQLContext
-
A collection of methods for registering user-defined functions (UDF).
- UDF1<T1,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 1 argument.
- UDF10<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 10 arguments.
- UDF11<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 11 arguments.
- UDF12<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 12 arguments.
- UDF13<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 13 arguments.
- UDF14<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 14 arguments.
- UDF15<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 15 arguments.
- UDF16<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 16 arguments.
- UDF17<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 17 arguments.
- UDF18<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 18 arguments.
- UDF19<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 19 arguments.
- UDF2<T1,T2,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 2 arguments.
- UDF20<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 20 arguments.
- UDF21<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 21 arguments.
- UDF22<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 22 arguments.
- UDF3<T1,T2,T3,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 3 arguments.
- UDF4<T1,T2,T3,T4,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 4 arguments.
- UDF5<T1,T2,T3,T4,T5,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 5 arguments.
- UDF6<T1,T2,T3,T4,T5,T6,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 6 arguments.
- UDF7<T1,T2,T3,T4,T5,T6,T7,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 7 arguments.
- UDF8<T1,T2,T3,T4,T5,T6,T7,T8,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 8 arguments.
- UDF9<T1,T2,T3,T4,T5,T6,T7,T8,T9,R> - Interface in org.apache.spark.sql.api.java
-
A Spark SQL UDF that has 9 arguments.
- UDFRegistration - Class in org.apache.spark.sql
-
Functions for registering user-defined functions.
- uid() - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- uid() - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- uid() - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- uid() - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- uid() - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- uid() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- uid() - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
-
- uid() - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- uid() - Method in class org.apache.spark.ml.classification.NaiveBayes
-
- uid() - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- uid() - Method in class org.apache.spark.ml.classification.OneVsRest
-
- uid() - Method in class org.apache.spark.ml.classification.OneVsRestModel
-
- uid() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- uid() - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- uid() - Method in class org.apache.spark.ml.clustering.KMeans
-
- uid() - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- uid() - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- uid() - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- uid() - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- uid() - Method in class org.apache.spark.ml.feature.Binarizer
-
- uid() - Method in class org.apache.spark.ml.feature.Bucketizer
-
- uid() - Method in class org.apache.spark.ml.feature.ColumnPruner
-
- uid() - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- uid() - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- uid() - Method in class org.apache.spark.ml.feature.DCT
-
- uid() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
- uid() - Method in class org.apache.spark.ml.feature.HashingTF
-
- uid() - Method in class org.apache.spark.ml.feature.IDF
-
- uid() - Method in class org.apache.spark.ml.feature.IDFModel
-
- uid() - Method in class org.apache.spark.ml.feature.IndexToString
-
- uid() - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- uid() - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- uid() - Method in class org.apache.spark.ml.feature.NGram
-
- uid() - Method in class org.apache.spark.ml.feature.Normalizer
-
- uid() - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- uid() - Method in class org.apache.spark.ml.feature.PCA
-
- uid() - Method in class org.apache.spark.ml.feature.PCAModel
-
- uid() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- uid() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- uid() - Method in class org.apache.spark.ml.feature.RFormula
-
- uid() - Method in class org.apache.spark.ml.feature.RFormulaModel
-
- uid() - Method in class org.apache.spark.ml.feature.StandardScaler
-
- uid() - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- uid() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- uid() - Method in class org.apache.spark.ml.feature.StringIndexer
-
- uid() - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- uid() - Method in class org.apache.spark.ml.feature.Tokenizer
-
- uid() - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- uid() - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- uid() - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- uid() - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- uid() - Method in class org.apache.spark.ml.feature.Word2Vec
-
- uid() - Method in class org.apache.spark.ml.feature.Word2VecModel
-
- uid() - Method in class org.apache.spark.ml.Pipeline
-
- uid() - Method in class org.apache.spark.ml.PipelineModel
-
- uid() - Method in class org.apache.spark.ml.recommendation.ALS
-
- uid() - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- uid() - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
- uid() - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- uid() - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- uid() - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- uid() - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- uid() - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- uid() - Method in class org.apache.spark.ml.regression.LinearRegression
-
- uid() - Method in class org.apache.spark.ml.regression.LinearRegressionModel
-
- uid() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- uid() - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- uid() - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- uid() - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- uid() - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- uid() - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
-
- uid() - Method in interface org.apache.spark.ml.util.Identifiable
-
An immutable unique ID for the object and its derivatives.
- uiTab() - Method in class org.apache.spark.streaming.StreamingContext
-
- unapply(EdgeContext<VD, ED, A>) - Static method in class org.apache.spark.graphx.EdgeContext
-
Extractor mainly used for Graph#aggregateMessages*.
- unapply(DenseVector) - Static method in class org.apache.spark.mllib.linalg.DenseVector
-
Extracts the value array from a dense vector.
- unapply(SparseVector) - Static method in class org.apache.spark.mllib.linalg.SparseVector
-
- unapply(Column) - Static method in class org.apache.spark.sql.Column
-
- unapply(DataType) - Static method in class org.apache.spark.sql.types.DecimalType
-
- unapply(Expression) - Static method in class org.apache.spark.sql.types.DecimalType
-
- unapply(Expression) - Static method in class org.apache.spark.sql.types.NumericType
-
Enables matching against NumericType for expressions.
- unapply(Broker) - Static method in class org.apache.spark.streaming.kafka.Broker
-
- UnaryTransformer<IN,OUT,T extends UnaryTransformer<IN,OUT,T>> - Class in org.apache.spark.ml
-
:: DeveloperApi ::
Abstract class for transformers that take one input column, apply transformation, and output the
result as a new column.
- UnaryTransformer() - Constructor for class org.apache.spark.ml.UnaryTransformer
-
- unbase64(Column) - Static method in class org.apache.spark.sql.functions
-
Decodes a BASE64 encoded string column and returns it as a binary column.
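For reference, decoding a BASE64 string to bytes looks like this with the JDK's own `java.util.Base64`. The sketch works on a single string rather than a column, and `Unbase64Sketch` is a hypothetical name; the decoder Spark uses internally may differ in how it treats malformed input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: decode a BASE64 string into raw bytes with the JDK decoder.
class Unbase64Sketch {
    static byte[] unbase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] raw = unbase64("U3Bhcms=");
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // prints Spark
    }
}
```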
- unbroadcast(long, boolean, boolean) - Method in interface org.apache.spark.broadcast.BroadcastFactory
-
- unbroadcast(long, boolean, boolean) - Method in class org.apache.spark.broadcast.HttpBroadcastFactory
-
Remove all persisted state associated with the HTTP broadcast with the given ID.
- unbroadcast(long, boolean, boolean) - Method in class org.apache.spark.broadcast.TorrentBroadcastFactory
-
Remove all persisted state associated with the torrent broadcast with the given ID.
- uncacheTable(String) - Method in class org.apache.spark.sql.SQLContext
-
Removes the specified table from the in-memory cache.
- underlyingSplit() - Method in class org.apache.spark.scheduler.SplitInfo
-
- unhex(Column) - Static method in class org.apache.spark.sql.functions
-
Inverse of hex.
- UniformGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d. samples from U[0.0, 1.0].
- UniformGenerator() - Constructor for class org.apache.spark.mllib.random.UniformGenerator
-
- uniformJavaRDD(JavaSparkContext, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- uniformJavaRDD(JavaSparkContext, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- uniformJavaRDD(JavaSparkContext, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- uniformJavaVectorRDD(JavaSparkContext, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- uniformJavaVectorRDD(JavaSparkContext, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- uniformJavaVectorRDD(JavaSparkContext, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- uniformRDD(SparkContext, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).
- uniformVectorRDD(SparkContext, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the uniform distribution on U(0.0, 1.0).
- union(JavaDoubleRDD) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return the union of this RDD and another one.
- union(JavaPairRDD<K, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the union of this RDD and another one.
- union(JavaRDD<T>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return the union of this RDD and another one.
- union(JavaRDD<T>, List<JavaRDD<T>>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Build the union of two or more RDDs.
- union(JavaPairRDD<K, V>, List<JavaPairRDD<K, V>>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Build the union of two or more RDDs.
- union(JavaDoubleRDD, List<JavaDoubleRDD>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Build the union of two or more RDDs.
- union(RDD<T>) - Method in class org.apache.spark.rdd.RDD
-
Return the union of this RDD and another one.
- union(Seq<RDD<T>>, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Build the union of a list of RDDs.
- union(RDD<T>, Seq<RDD<T>>, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Build the union of a list of RDDs passed as variable-length arguments.
- union(JavaDStream<T>) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream by unifying data of another DStream with this DStream.
- union(JavaPairDStream<K, V>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by unifying data of another DStream with this DStream.
- union(JavaDStream<T>, List<JavaDStream<T>>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create a unified DStream from multiple DStreams of the same type and same slide duration.
- union(JavaPairDStream<K, V>, List<JavaPairDStream<K, V>>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create a unified DStream from multiple DStreams of the same type and same slide duration.
- union(DStream<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by unifying data of another DStream with this DStream.
- union(Seq<DStream<T>>, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create a unified DStream from multiple DStreams of the same type and same slide duration.
- unionAll(DataFrame) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame containing the union of rows in this frame and another frame.
- UnionRDD<T> - Class in org.apache.spark.rdd
-
- UnionRDD(SparkContext, Seq<RDD<T>>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.UnionRDD
-
- uniqueId() - Method in class org.apache.spark.storage.StreamBlockId
-
- unix_timestamp() - Static method in class org.apache.spark.sql.functions
-
Gets current Unix timestamp in seconds.
- unix_timestamp(Column) - Static method in class org.apache.spark.sql.functions
-
Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds),
using the default timezone and the default locale; returns null on failure.
- unix_timestamp(Column, String) - Static method in class org.apache.spark.sql.functions
-
Converts a time string with the given pattern
(see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
to a Unix timestamp (in seconds); returns null on failure.
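The behaviour described for the unix_timestamp variants (parse with a SimpleDateFormat pattern, yield seconds, yield null when parsing fails) can be modelled in plain Java. This is a minimal sketch, not Spark's implementation: the class UnixTimestampSketch and the helper toUnixSeconds are hypothetical, and the time zone is passed explicitly so the result is reproducible, whereas the entries above refer to the default timezone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical model of the documented behaviour; not part of Spark.
class UnixTimestampSketch {
    // Parses text with the given SimpleDateFormat pattern and returns
    // seconds since the epoch, or null if the text does not match.
    static Long toUnixSeconds(String text, String pattern, TimeZone zone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(zone); // assumption: explicit zone instead of the default
        try {
            return format.parse(text).getTime() / 1000L;
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // One day after the epoch, in the default yyyy-MM-dd HH:mm:ss layout.
        System.out.println(toUnixSeconds("1970-01-02 00:00:00", "yyyy-MM-dd HH:mm:ss", utc)); // 86400
        // The same instant written with a custom pattern.
        System.out.println(toUnixSeconds("02/01/1970", "dd/MM/yyyy", utc)); // 86400
        // Unparseable input gives null rather than an exception.
        System.out.println(toUnixSeconds("not a date", "yyyy-MM-dd HH:mm:ss", utc)); // null
    }
}
```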
- UnknownReason - Class in org.apache.spark
-
:: DeveloperApi ::
We don't know why the task ended -- for example, because of a ClassNotFound exception when
deserializing the task result.
- UnknownReason() - Constructor for class org.apache.spark.UnknownReason
-
- Unlimited() - Static method in class org.apache.spark.sql.types.DecimalType
-
- unpersist() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist(boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist(boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist() - Method in class org.apache.spark.api.java.JavaRDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist(boolean) - Method in class org.apache.spark.api.java.JavaRDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist() - Method in class org.apache.spark.broadcast.Broadcast
-
Asynchronously delete cached copies of this broadcast on the executors.
- unpersist(boolean) - Method in class org.apache.spark.broadcast.Broadcast
-
Delete cached copies of this broadcast on the executors.
- unpersist(boolean) - Method in class org.apache.spark.graphx.Graph
-
Uncaches both vertices and edges of this graph.
- unpersist(boolean) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- unpersist(boolean) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- unpersist(boolean) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- unpersist() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Unpersist intermediate RDDs used in the computation.
- unpersist(boolean) - Method in class org.apache.spark.rdd.RDD
-
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
- unpersist(boolean) - Method in class org.apache.spark.sql.DataFrame
-
- unpersist() - Method in class org.apache.spark.sql.DataFrame
-
- unpersistVertices(boolean) - Method in class org.apache.spark.graphx.Graph
-
Uncaches only the vertices of this graph, leaving the edges alone.
- unpersistVertices(boolean) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- unregisterDialect(JdbcDialect) - Static method in class org.apache.spark.sql.jdbc.JdbcDialects
-
Unregister a dialect.
- Unresolved() - Static method in class org.apache.spark.ml.attribute.AttributeType
-
Unresolved type.
- UnresolvedAttribute - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
An unresolved attribute.
- UnresolvedAttribute() - Constructor for class org.apache.spark.ml.attribute.UnresolvedAttribute
-
- unsafeEnabled() - Method in class org.apache.spark.sql.SQLContext.SparkPlanner
-
- unset() - Static method in class org.apache.spark.TaskContext
-
Unset the thread local TaskContext.
- until(Time, Duration) - Method in class org.apache.spark.streaming.Time
-
- untilOffset() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
- update(RDD<Vector>, double, String) - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
-
Perform a k-means update on a batch of data.
- update(int, int, double) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Update element at (i, j).
- update(Function1<Object, Object>) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Update all the values of this matrix using the function f.
- update() - Method in class org.apache.spark.scheduler.AccumulableInfo
-
- update(int, Object) - Method in class org.apache.spark.sql.expressions.MutableAggregationBuffer
-
Update the ith value of this buffer.
- update(MutableAggregationBuffer, Row) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Updates the given aggregation buffer (buffer) with new input data from input.
- update() - Method in class org.apache.spark.status.api.v1.AccumulableInfo
-
- update(T1, T2) - Method in class org.apache.spark.util.MutablePair
-
Updates this pair with new values and returns itself.
- updateAggregateMetrics(UIData.StageUIData, String, TaskMetrics, Option<TaskMetrics>) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
Upon receiving new metrics for a task, updates the per-stage and per-executor-per-stage
aggregate metrics by calculating deltas between the currently recorded metrics and the new
metrics.
- updatePredictionError(RDD<LabeledPoint>, RDD<Tuple2<Object, Object>>, double, DecisionTreeModel, Loss) - Static method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
Update a zipped predictionError RDD
(as obtained with computeInitialPredictionAndError).
- Updater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to perform steps (weight update) using Gradient Descent methods.
- Updater() - Constructor for class org.apache.spark.mllib.optimization.Updater
-
- updateStateByKey(Function2<List<V>, Optional<S>, Optional<S>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of each key.
- updateStateByKey(Function2<List<V>, Optional<S>, Optional<S>>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of each key.
- updateStateByKey(Function2<List<V>, Optional<S>, Optional<S>>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of the key.
- updateStateByKey(Function2<List<V>, Optional<S>, Optional<S>>, Partitioner, JavaPairRDD<K, S>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of the key.
- updateStateByKey(Function2<Seq<V>, Option<S>, Option<S>>, ClassTag<S>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of each key.
- updateStateByKey(Function2<Seq<V>, Option<S>, Option<S>>, int, ClassTag<S>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of each key.
- updateStateByKey(Function2<Seq<V>, Option<S>, Option<S>>, Partitioner, ClassTag<S>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of the key.
- updateStateByKey(Function1<Iterator<Tuple3<K, Seq<V>, Option<S>>>, Iterator<Tuple2<K, S>>>, Partitioner, boolean, ClassTag<S>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of each key.
- updateStateByKey(Function2<Seq<V>, Option<S>, Option<S>>, Partitioner, RDD<Tuple2<K, S>>, ClassTag<S>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of the key.
- updateStateByKey(Function1<Iterator<Tuple3<K, Seq<V>, Option<S>>>, Iterator<Tuple2<K, S>>>, Partitioner, boolean, RDD<Tuple2<K, S>>, ClassTag<S>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new "state" DStream where the state for each key is updated by applying
the given function on the previous state of the key and the new values of each key.
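The updateStateByKey entries above all describe the same per-key rule: the update function receives the new values for a key together with that key's previous state and returns the new state. The sketch below models one batch step with plain Java maps. It is not Spark code: StateUpdateSketch, step and runningSum are hypothetical names, and java.util.Optional is used only to keep the example self-contained, so it may not be the Optional type named in the Java signatures above. Dropping a key when the function returns an empty result, and calling the function for keys with no new values, follow the behaviour described in the Spark Streaming documentation as I understand it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical model of one stateful batch step; not part of Spark.
class StateUpdateSketch {
    // Applies updateFunc to every key seen so far or in this batch.
    // Keys whose function result is empty are dropped from the state.
    static <K, V, S> Map<K, S> step(
            Map<K, S> previous,
            Map<K, List<V>> batch,
            BiFunction<List<V>, Optional<S>, Optional<S>> updateFunc) {
        Set<K> keys = new HashSet<>(previous.keySet());
        keys.addAll(batch.keySet());
        Map<K, S> next = new HashMap<>();
        for (K key : keys) {
            // A key with no new data still gets a call, with an empty list.
            List<V> values = batch.getOrDefault(key, Collections.<V>emptyList());
            Optional<S> old = Optional.ofNullable(previous.get(key));
            updateFunc.apply(values, old).ifPresent(s -> next.put(key, s));
        }
        return next;
    }

    // Example update function: a running sum per key.
    static Optional<Integer> runningSum(List<Integer> values, Optional<Integer> state) {
        int sum = state.orElse(0);
        for (int v : values) {
            sum += v;
        }
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();

        Map<String, List<Integer>> batch1 = new HashMap<>();
        batch1.put("a", Arrays.asList(1, 2));
        batch1.put("b", Arrays.asList(5));
        state = step(state, batch1, StateUpdateSketch::runningSum);
        System.out.println(new TreeMap<>(state)); // {a=3, b=5}

        Map<String, List<Integer>> batch2 = new HashMap<>();
        batch2.put("a", Arrays.asList(10));
        state = step(state, batch2, StateUpdateSketch::runningSum);
        System.out.println(new TreeMap<>(state)); // {a=13, b=5}
    }
}
```

The overloads that take an int or a Partitioner differ only in how the resulting state is partitioned, which this single-machine model does not represent.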
- upper(Column) - Static method in class org.apache.spark.sql.functions
-
Converts a string column to upper case.
- useDisk() - Method in class org.apache.spark.storage.StorageLevel
-
- useDst - Variable in class org.apache.spark.graphx.TripletFields
-
Indicates whether the destination vertex attribute is included.
- useEdge - Variable in class org.apache.spark.graphx.TripletFields
-
Indicates whether the edge attribute is included.
- useMemory() - Method in class org.apache.spark.storage.StorageLevel
-
- useNodeIdCache() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- useOffHeap() - Method in class org.apache.spark.storage.StorageLevel
-
- user() - Method in class org.apache.spark.ml.recommendation.ALS.Rating
-
- user() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- user() - Method in class org.apache.spark.scheduler.JobLogger
-
- USER_DEFAULT() - Static method in class org.apache.spark.sql.types.DecimalType
-
- userClass() - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- userClass() - Method in class org.apache.spark.sql.types.UserDefinedType
-
Class object for the UserType.
- UserDefinedAggregateFunction - Class in org.apache.spark.sql.expressions
-
:: Experimental ::
The base class for implementing user-defined aggregate functions (UDAF).
- UserDefinedAggregateFunction() - Constructor for class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
- UserDefinedFunction - Class in org.apache.spark.sql
-
A user-defined function.
- UserDefinedFunction(Object, DataType, Seq<DataType>) - Constructor for class org.apache.spark.sql.UserDefinedFunction
-
- userDefinedPartitionColumns() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
Optional user defined partition columns.
- UserDefinedType<UserType> - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type for User Defined Types (UDTs).
- UserDefinedType() - Constructor for class org.apache.spark.sql.types.UserDefinedType
-
- userFactors() - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- userFeatures() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- useSrc - Variable in class org.apache.spark.graphx.TripletFields
-
Indicates whether the source vertex attribute is included.