spark.gaussianMixture {SparkR}    R Documentation

Multivariate Gaussian Mixture Model (GMM)

Description

Fits a multivariate Gaussian mixture model against a SparkDataFrame, similar to mvnormalmixEM() in the R package mixtools. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.gaussianMixture(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.gaussianMixture(data, formula,
  k = 2, maxIter = 100, tol = 0.01)

## S4 method for signature 'GaussianMixtureModel'
summary(object)

## S4 method for signature 'GaussianMixtureModel'
predict(object, newData)

## S4 method for signature 'GaussianMixtureModel,character'
write.ml(object, path,
  overwrite = FALSE)

Arguments

data

a SparkDataFrame for training.

formula

a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the formula has no response variable in spark.gaussianMixture, so its left-hand side is empty (for example, ~ V1 + V2).

...

additional arguments passed to the method.

k

number of independent Gaussians in the mixture model.

maxIter

maximum number of iterations.

tol

the convergence tolerance.

object

a fitted Gaussian mixture model.

newData

a SparkDataFrame for testing.

path

the directory where the model is saved.

overwrite

whether to overwrite the output path if it already exists. The default is FALSE, which throws an exception if the output path exists.
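The one-sided formula described for the formula argument can be inspected locally before it is passed to spark.gaussianMixture. The sketch below uses base R only and shows how base R reads such a formula; SparkR parses formulas on its own, so treat this as an illustration of the operators rather than a description of SparkR's internals. The data frame toy and its column V3 are hypothetical.

```r
# Base R only: inspects formulas locally; no Spark session is needed.
f <- ~ V1 + V2                 # one-sided: nothing to the left of '~'
n_parts <- length(f)           # 2 for a one-sided formula, 3 for 'y ~ x'
features <- all.vars(f)        # variable names used as features

# Hypothetical local data frame, used only to expand '.' below.
toy <- data.frame(V1 = 1, V2 = 2, V3 = 3)
# '.' stands for every column; '- V3' removes V3 from that set.
expanded <- attr(terms(~ . - V3, data = toy), "term.labels")

print(n_parts)
print(features)
print(expanded)
```

In this sketch, ~ . - V3 and ~ V1 + V2 select the same two feature columns.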

Value

spark.gaussianMixture returns a fitted multivariate Gaussian mixture model.

summary returns a summary of the fitted model as a list. The list includes the model's lambda (lambda), mu (mu), sigma (sigma), loglik (loglik), and posterior (posterior).

predict returns a SparkDataFrame containing predicted labels in a column named "prediction".
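To make the summary entries more concrete, the sketch below assumes the usual mixture-model reading of the names (lambda as mixing weights, mu as component means, sigma as component covariance matrices, posterior as per-component membership probabilities) and computes a posterior for one point by hand in base R. All parameter values are made up, dmvn and posterior_for are hypothetical helpers, and this is not SparkR's implementation.

```r
# Base R only. All parameter values below are made up for illustration.
lambda <- c(0.4, 0.6)              # assumed mixing weights, summing to 1
mu     <- list(c(0, 0), c(3, 4))   # one mean vector per component
sigma  <- list(diag(2), diag(2))   # one covariance matrix per component

# Hypothetical helper: multivariate normal density at point x.
dmvn <- function(x, mu, sigma) {
  d <- length(mu)
  diff <- x - mu
  q <- as.numeric(t(diff) %*% solve(sigma) %*% diff)
  exp(-0.5 * q) / sqrt((2 * pi)^d * det(sigma))
}

# Hypothetical helper: posterior probability of each component for x.
posterior_for <- function(x) {
  w <- mapply(function(l, m, s) l * dmvn(x, m, s), lambda, mu, sigma)
  w / sum(w)
}

x <- c(2.5, 3.5)
post <- posterior_for(x)
# Most probable component; 1-based in this sketch, and the labels that
# SparkR reports in the "prediction" column may be numbered differently.
label <- which.max(post)

print(post)
print(label)
```

The point lies close to the second mean, so the second component receives nearly all of the posterior mass.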

Note

spark.gaussianMixture since 2.1.0

summary(GaussianMixtureModel) since 2.1.0

predict(GaussianMixtureModel) since 2.1.0

write.ml(GaussianMixtureModel, character) since 2.1.0

See Also

mixtools: https://cran.r-project.org/package=mixtools

predict, read.ml, write.ml

Examples

## Not run: 
##D sparkR.session()
##D library(mvtnorm)
##D set.seed(100)
##D a <- rmvnorm(4, c(0, 0))
##D b <- rmvnorm(6, c(3, 4))
##D data <- rbind(a, b)
##D df <- createDataFrame(as.data.frame(data))
##D model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
##D summary(model)
##D 
##D # fitted values on training data
##D fitted <- predict(model, df)
##D head(select(fitted, "V1", "prediction"))
##D 
##D # save fitted model to input path
##D path <- "path/to/model"
##D write.ml(model, path)
##D 
##D # can also read back the saved model and print
##D savedModel <- read.ml(path)
##D summary(savedModel)
## End(Not run)

[Package SparkR version 2.2.3 Index]