ALS
class pyspark.mllib.recommendation.ALS
Alternating Least Squares matrix factorization.
New in version 0.9.0.
Methods
train(ratings, rank[, iterations, lambda_, …]): Train a matrix factorization model given an RDD of ratings by users for a subset of products.
trainImplicit(ratings, rank[, iterations, …]): Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products.
Methods Documentation
classmethod train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)
Train a matrix factorization model given an RDD of ratings by users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.
New in version 0.9.0.
- Parameters
- ratings : pyspark.RDD
RDD of Rating objects or (userID, productID, rating) tuples.
- rank : int
Number of features to use (also referred to as the number of latent factors).
- iterations : int, optional
Number of iterations of ALS. (default: 5)
- lambda_ : float, optional
Regularization parameter. (default: 0.01)
- blocks : int, optional
Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)
- nonnegative : bool, optional
A value of True will solve least-squares with nonnegativity constraints. (default: False)
- seed : int, optional
Random seed for the initial matrix factorization model. A value of None will use system time as the seed. (default: None)
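The alternating updates described for train can be illustrated without Spark. The following is a minimal pure-Python sketch, restricted to rank 1 so that each least-squares solve reduces to a scalar division. It is not Spark's distributed implementation: the function name als_rank1, its n_users and n_items arguments, and the plain L2 regularization are assumptions made for illustration only.

```python
def als_rank1(ratings, n_users, n_items, iterations=5, lambda_=0.01):
    """Rank-1 ALS over (user, item, rating) triples (illustrative sketch).

    Only observed cells contribute to the loss; unobserved cells are ignored.
    """
    u = [1.0] * n_users  # one latent factor per user
    v = [1.0] * n_items  # one latent factor per item
    for _ in range(iterations):
        # Hold item factors fixed and solve each user's regularized
        # least-squares problem in closed form.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j, r) in ratings if a == i)
            den = lambda_ + sum(v[j] ** 2 for (a, j, r) in ratings if a == i)
            u[i] = num / den
        # Hold user factors fixed and solve for each item factor.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b, r) in ratings if b == j)
            den = lambda_ + sum(u[i] ** 2 for (i, b, r) in ratings if b == j)
            v[j] = num / den
    return u, v


# Hypothetical toy data: three observed cells of a 2 x 2 ratings matrix.
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0)]
u, v = als_rank1(ratings, n_users=2, n_items=2, iterations=50)
prediction = u[1] * v[1]  # estimate for the unobserved (user 1, item 1) cell
```

Assuming sc is an existing SparkContext, the corresponding library call would be along the lines of ALS.train(sc.parallelize(ratings), rank=1, iterations=50, lambda_=0.01), which returns a MatrixFactorizationModel.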
classmethod trainImplicit(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None)
Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.
New in version 0.9.0.
- Parameters
- ratings : pyspark.RDD
RDD of Rating objects or (userID, productID, rating) tuples.
- rank : int
Number of features to use (also referred to as the number of latent factors).
- iterations : int, optional
Number of iterations of ALS. (default: 5)
- lambda_ : float, optional
Regularization parameter. (default: 0.01)
- blocks : int, optional
Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)
- alpha : float, optional
A constant used in computing confidence. (default: 0.01)
- nonnegative : bool, optional
A value of True will solve least-squares with nonnegativity constraints. (default: False)
- seed : int, optional
Random seed for the initial matrix factorization model. A value of None will use system time as the seed. (default: None)
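The role of alpha can be made concrete with a small sketch. It assumes the formulation commonly used for implicit-feedback ALS, in which each raw value r is split into a binary preference and a confidence weight of 1 + alpha * |r|. The helper name implicit_targets and the sample data are hypothetical, and the code is not taken from Spark's source.

```python
def implicit_targets(rating, alpha=0.01):
    """Return (preference, confidence) for one raw implicit-feedback value."""
    # Preference: did the user interact with the product at all?
    preference = 1.0 if rating > 0 else 0.0
    # Confidence: grows linearly with the strength of the observation
    # (assumed form, see the note above).
    confidence = 1.0 + alpha * abs(rating)
    return preference, confidence


# Hypothetical (userID, productID, count) observations, e.g. play counts.
observations = [(0, 0, 3.0), (0, 1, 0.0), (1, 0, 12.0)]
targets = [(user, product) + implicit_targets(count, alpha=0.5)
           for user, product, count in observations]
```

Under this reading, a larger alpha makes frequently observed interactions weigh more heavily in the fit. Assuming sc is an existing SparkContext, the corresponding library call would be along the lines of ALS.trainImplicit(sc.parallelize(observations), rank=1, alpha=0.5).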