spark.kmeans {SparkR} | R Documentation
Fits a k-means clustering model against a Spark DataFrame, similarly to R's kmeans(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.
spark.kmeans(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.kmeans(data, formula, k = 2, maxIter = 20, initMode = c("k-means||", "random"))

## S4 method for signature 'KMeansModel'
summary(object)

## S4 method for signature 'KMeansModel'
predict(object, newData)

## S4 method for signature 'KMeansModel,character'
write.ml(object, path, overwrite = FALSE)
data | a SparkDataFrame for training.
formula | a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of formula is empty in spark.kmeans.
... | additional argument(s) passed to the method.
k | number of centers.
maxIter | maximum number of iterations.
initMode | the initialization algorithm chosen to fit the model.
object | a fitted k-means model.
newData | a SparkDataFrame for testing.
path | the directory where the model is saved.
overwrite | whether to overwrite the output path if it already exists. The default is FALSE, in which case an exception is thrown if the output path exists.
spark.kmeans returns a fitted k-means model.
summary returns summary information of the fitted model, which is a list. The list includes the model's k (number of cluster centers), coefficients (model cluster centers), size (number of data points in each cluster), and cluster (cluster centers of the transformed data).
predict returns the predicted values based on a k-means model.
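Since spark.kmeans is described as similar to R's kmeans(), a local base R fit may help when reading the summary fields. The sketch below runs without Spark and uses stats::kmeans on the built-in iris data; the choice of columns, the seed, and nstart = 5 are illustrative assumptions, and the correspondence to the SparkR summary list (centers to coefficients, size to size, number of rows of centers to k) is only approximate.

```r
# Base R k-means on two sepal measurements of the built-in iris data.
# This is a local analogue for comparison, not the SparkR implementation.
set.seed(42)  # initial centers are chosen at random, so fix the seed
features <- iris[, c("Sepal.Length", "Sepal.Width")]

# centers plays the role of k, iter.max the role of maxIter
fit <- kmeans(features, centers = 4, iter.max = 20, nstart = 5)

k <- nrow(fit$centers)  # number of cluster centers
fit$centers             # one row of coordinates per cluster center
fit$size                # number of data points in each cluster
head(fit$cluster)       # cluster index assigned to each input row
```

In base R the fitted object is a list, so the fields are read with `$`; the list returned by summary on a SparkR KMeansModel can be inspected the same way.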
spark.kmeans since 2.0.0
summary(KMeansModel) since 2.0.0
predict(KMeansModel) since 2.0.0
write.ml(KMeansModel, character) since 2.0.0
## Not run:
##D sparkR.session()
##D data(iris)
##D df <- createDataFrame(iris)
##D # the response side of the formula is left empty, as noted for the formula argument
##D model <- spark.kmeans(df, ~ Sepal_Length + Sepal_Width, k = 4, initMode = "random")
##D summary(model)
##D
##D # fitted values on training data
##D fitted <- predict(model, df)
##D head(select(fitted, "Sepal_Length", "prediction"))
##D
##D # save fitted model to input path
##D path <- "path/to/model"
##D write.ml(model, path)
##D
##D # can also read back the saved model and print
##D savedModel <- read.ml(path)
##D summary(savedModel)
## End(Not run)