pyspark.mllib.clustering.GaussianMixture
Learning algorithm for Gaussian Mixtures using the expectation-maximization algorithm.
New in version 1.3.0.
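To make the expectation-maximization loop concrete, here is a minimal pure-Python sketch for one-dimensional data. It is an illustration only, not Spark's implementation: the names `normal_pdf`, `fit_gmm_1d`, `convergence_tol` and `max_iterations` are hypothetical, the data is not distributed, and the means are initialized deterministically from quantiles of the data instead of from a random seed.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(points, k, convergence_tol=1e-3, max_iterations=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustration only)."""
    n = len(points)
    ordered = sorted(points)
    # Assumption of this sketch: start the means at evenly spaced quantiles,
    # with a shared variance and uniform weights.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(points) / n
    overall_var = sum((x - overall_mean) ** 2 for x in points) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    prev_ll = None
    for _ in range(max_iterations):
        # E-step: responsibility of each component for each point,
        # accumulating the log-likelihood along the way.
        resp = []
        ll = 0.0
        for x in points:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([j / total for j in joint])

        # Stop once the log-likelihood changes by less than the tolerance.
        if prev_ll is not None and abs(ll - prev_ll) < convergence_tol:
            break
        prev_ll = ll

        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / n
            means[c] = sum(r[c] * x for r, x in zip(resp, points)) / mass
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, points)) / mass
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing

    return weights, means, variances
```

Calling `fit_gmm_1d(points, 2)` on data drawn from two well-separated groups returns one weight, mean and variance per component. The two stopping conditions play the same roles as the convergence tolerance and iteration cap described for `train`.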
Methods
train(rdd, k[, convergenceTol, …])
Train a Gaussian Mixture clustering model.
Methods Documentation
Parameters
rdd (pyspark.RDD): Training points as an RDD of pyspark.mllib.linalg.Vector or convertible sequence types.
k (int): Number of independent Gaussians in the mixture model.
convergenceTol (float, optional): Maximum change in log-likelihood at which convergence is considered to have occurred. (default: 1e-3)
maxIterations (int, optional): Maximum number of iterations allowed. (default: 100)
seed (int, optional): Random seed for the initial Gaussian distribution. Set to None to generate a seed based on system time. (default: None)
initialModel (GaussianMixtureModel, optional): Initial GMM starting point, bypassing the random initialization. (default: None)
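A call to `train` might look like the sketch below. It assumes a local Spark installation; the sample data, the helper names `sample_points` and `main`, the application name and the chosen seed are all made up for illustration. The keyword names `convergenceTol`, `maxIterations` and `seed` and the model attributes `weights` and `gaussians` follow the PySpark API.

```python
def sample_points():
    """Hypothetical toy data: two well-separated groups of 2-D points."""
    return [
        [0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
        [9.9, 10.1], [10.2, 9.8], [10.0, 10.0], [9.8, 10.2],
    ]


def main():
    # Imported here so the data helper above can be loaded without Spark.
    from pyspark import SparkContext
    from pyspark.mllib.clustering import GaussianMixture

    sc = SparkContext("local[2]", "gmm-train-sketch")
    try:
        # Plain Python lists are among the convertible sequence types.
        rdd = sc.parallelize(sample_points())
        model = GaussianMixture.train(
            rdd, 2, convergenceTol=1e-3, maxIterations=100, seed=42
        )
        # Each component has a mixing weight and a Gaussian with mean `mu`.
        for weight, gaussian in zip(model.weights, model.gaussians):
            print("weight:", weight, "mean:", gaussian.mu)
    finally:
        sc.stop()


if __name__ == "__main__":
    main()
```

Fixing `seed` makes the random initialization repeatable between runs. Leaving it as None gives a different starting point each time.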