Training¶
The most versatile class is TrainSession. MacauSession and BPMFSession provide a simpler interface.
TrainSession¶
-
class
smurff.
TrainSession
(priors=[u'normal', u'normal'], num_latent=NUM_LATENT_DEFAULT_VALUE, num_threads=NUM_THREADS_DEFAULT_VALUE, burnin=BURNIN_DEFAULT_VALUE, nsamples=NSAMPLES_DEFAULT_VALUE, seed=RANDOM_SEED_DEFAULT_VALUE, threshold=None, verbose=1, save_prefix=None, save_extension=None, save_freq=None, checkpoint_freq=None, csv_status=None)¶ Class for doing a training run in smurff
A simple use case could be:
>>> import smurff
>>> session = smurff.TrainSession(burnin=5, nsamples=5)
>>> session.addTrainAndTest(Ydense)  # Ydense: your dense train matrix
>>> session.run()
-
priors
¶ The type of prior to use for each dimension
Type: list, where element is one of { “normal”, “normalone”, “macau”, “macauone”, “spikeandslab” }
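Since priors holds one entry per dimension, a quick sanity check before building a session can catch typos early. The sketch below is a plain-Python illustration, not part of smurff; the helper name `check_priors` and its `num_dims` argument are hypothetical, and the set of names is copied from the list above.

```python
# Prior names as listed in the documentation above.
KNOWN_PRIORS = {"normal", "normalone", "macau", "macauone", "spikeandslab"}


def check_priors(priors, num_dims):
    """Hypothetical helper: validate a priors list for a train tensor
    with num_dims dimensions (2 for a matrix)."""
    if len(priors) != num_dims:
        raise ValueError("expected %d priors, got %d" % (num_dims, len(priors)))
    unknown = [p for p in priors if p not in KNOWN_PRIORS]
    if unknown:
        raise ValueError("unknown prior(s): %s" % ", ".join(unknown))
    return list(priors)


# A matrix (two dimensions) with side info on the rows only.
print(check_priors(["macau", "normal"], num_dims=2))
```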
-
num_latent
¶ Number of latent dimensions in the model
Type: int
-
burnin
¶ Number of burnin samples to discard
Type: int
-
nsamples
¶ Number of samples to keep
Type: int
-
num_threads
¶ Number of OpenMP threads to use for model building
Type: int
-
verbose
¶ Verbosity level for C++ library
Type: {0, 1, 2}
-
seed
¶ Random seed to use for sampling
Type: float
-
save_prefix
¶ Path where to store the samples. The path includes the directory name, as well as the initial part of the file names.
Type: path
-
save_freq
¶ - N>0: save every Nth sample
- N==0: never save a sample
- N==-1: save only the last sample
Type: int
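The three save_freq cases can be read as a rule over sample numbers. The sketch below is a plain-Python illustration of that rule, not smurff's implementation. The helper name `saved_samples` is hypothetical, and it assumes that samples are numbered from 1 and that "every Nth" means sample numbers divisible by N.

```python
def saved_samples(nsamples, save_freq):
    """Hypothetical helper: return the 1-based sample numbers that would
    be saved under the documented save_freq rule.

    Numbering samples from 1 is an assumption of this sketch.
    """
    if save_freq > 0:
        # N > 0: save every Nth sample
        return [i for i in range(1, nsamples + 1) if i % save_freq == 0]
    if save_freq == -1:
        # N == -1: save only the last sample
        return [nsamples] if nsamples > 0 else []
    if save_freq == 0:
        # N == 0: never save a sample
        return []
    raise ValueError("save_freq must be > 0, 0 or -1")


print(saved_samples(10, 3))
print(saved_samples(10, -1))
print(saved_samples(10, 0))
```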
-
save_extension
¶ - .csv: save in textual csv file format
- .ddm: save in binary file format
Type: { “.csv”, “.ddm” }
-
checkpoint_freq
¶ Save the state of the session every N seconds.
Type: int
-
csv_status
¶ Stores a limited set of parameters, indicative of training progress, in this file. See StatusItem.
Type: filepath
-
addData
(self, pos, Y, is_scarce=False, noise=PyNoiseConfig())¶ Stacks more matrices/tensors next to the main train matrix.
- pos : shape
- Block position of the data with respect to train. The train matrix/tensor has implicit block position (0, 0).
- Y : numpy.ndarray, scipy.sparse matrix or SparseTensor
- Data matrix/tensor to add
- is_scarce : bool
- When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.
- noise : PyNoiseConfig
- Noise model to use for Y
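The effect of is_scarce on a sparse input can be pictured with a toy lookup. This is only an illustration of the semantics described above, using a plain dict in place of a real sparse matrix; the helper `cell_value` and the example data are made up, and `None` stands in for "unknown".

```python
def cell_value(entries, index, is_scarce):
    """Hypothetical helper: look up one cell of a sparse matrix stored
    as {(row, col): value}. Returns None for an unknown cell."""
    if index in entries:
        return entries[index]
    # Missing cell: unknown when is_scarce is True, zero otherwise.
    return None if is_scarce else 0.0


entries = {(0, 0): 4.0, (1, 2): 1.5}  # made-up example data

print(cell_value(entries, (0, 1), is_scarce=True))   # missing, treated as unknown
print(cell_value(entries, (0, 1), is_scarce=False))  # missing, treated as zero
```

In practice this choice matters for the model: unknown cells do not contribute to training, whereas zeros are treated as observed values.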
-
addPropagatedPosterior
(self, mode, mu, Lambda)¶ Adds mu and Lambda from propagated posterior
- mode : int
- dimension to add side info (rows = 0, cols = 1)
- mu : numpy.ndarray matrix
- Mean matrix. mu should have as many rows as num_latent and as many columns as the size of dimension mode in train.
- Lambda : numpy.ndarray matrix
- Covariance matrix. Lambda should be shaped K x K x N, where K == num_latent and N == the size of dimension mode in train.
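The shape requirements for mu and Lambda are easy to get wrong, so it can help to check them before calling addPropagatedPosterior. The following is a minimal sketch in plain Python; `check_posterior_shapes` is a hypothetical helper, not a smurff function, and it works on shape tuples only.

```python
def check_posterior_shapes(mu_shape, lambda_shape, num_latent, mode_size):
    """Hypothetical helper: raise ValueError unless the shapes follow
    the documented layout (mu: K x N, Lambda: K x K x N)."""
    expected_mu = (num_latent, mode_size)
    expected_lambda = (num_latent, num_latent, mode_size)
    if tuple(mu_shape) != expected_mu:
        raise ValueError("mu has shape %r, expected %r" % (tuple(mu_shape), expected_mu))
    if tuple(lambda_shape) != expected_lambda:
        raise ValueError(
            "Lambda has shape %r, expected %r" % (tuple(lambda_shape), expected_lambda)
        )
    return True


# Example: num_latent = 16 and a train matrix with 100 rows (mode = 0).
print(check_posterior_shapes((16, 100), (16, 16, 100), num_latent=16, mode_size=100))
```

With NumPy arrays, the shape tuples would come from `mu.shape` and `Lambda.shape`.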
-
addSideInfo
(self, mode, Y, noise=PyNoiseConfig(), tol=1e-6, direct=False)¶ Adds fully known side info, for use with the macau or macauone prior
- mode : int
- dimension to add side info (rows = 0, cols = 1)
- Y : numpy.ndarray, scipy.sparse matrix
- Side info matrix/tensor. Y should have as many rows as there are elements in the dimension selected using mode. Columns in Y are features for each element.
- noise : PyNoiseConfig
- Noise model to use for Y
- direct : boolean
- When True, uses a direct inversion method.
- When False, uses a CG solver.
The direct method is only feasible for a small (< 100K) number of features.
- tol : float
- Tolerance for the CG solver.
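Two of the rules above lend themselves to a pre-flight check: the row count of the side info must match the chosen dimension of train, and the direct method only makes sense for a small number of features. The sketch below is an illustration under those stated rules; `side_info_options` is a hypothetical helper, and using the 100K figure as a hard cutoff is an assumption of this example.

```python
# Taken from the "< 100K features" guidance above; treating it as a
# hard cutoff is an assumption of this sketch.
DIRECT_FEATURE_LIMIT = 100000


def side_info_options(side_info_shape, train_shape, mode):
    """Hypothetical helper: check the side info row count against the
    train shape and suggest a value for the direct flag."""
    rows, features = side_info_shape
    if rows != train_shape[mode]:
        raise ValueError(
            "side info has %d rows, but dimension %d of train has %d elements"
            % (rows, mode, train_shape[mode])
        )
    return {"mode": mode, "direct": features < DIRECT_FEATURE_LIMIT}


# 100 x 50 train matrix, row side info with 2000 features.
print(side_info_options((100, 2000), (100, 50), mode=0))
```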
-
addTrainAndTest
(self, Y, Ytest=None, noise=PyNoiseConfig(), is_scarce=True)¶ Adds a train and optionally a test matrix as input data to this TrainSession
Parameters: - Y – Train matrix/tensor
- Ytest (scipy.sparse matrix or SparseTensor) – Test matrix/tensor. Mainly used for calculating RMSE.
- noise – Noise model to use for Y
- is_scarce (bool) – When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.
-
getConfig
(self)¶ Get this TrainSession’s configuration in ini-file format
-
getRmseAvg
(self)¶ Average RMSE across all samples for the test matrix
-
getStatus
(self)¶ Returns a StatusItem with the current state of the session.
-
getTestPredictions
(self)¶ Get predictions for test matrix.
Returns: list of Prediction
Return type: list
-
init
(self)¶ Initializes the TrainSession after all data has been added.
You need to call this method before calling step(), unless you call run().
Returns: StatusItem of the session.
-
makePredictSession
(self)¶ Makes a PredictSession based on the model that was built in this TrainSession.
-
run
(self)¶ Equivalent to:
self.init()
while self.step():
    pass
-
step
(self)¶ Does one sampling or burnin iteration.
Returns: - StatusItem of the session, when a step was executed.
- None, after the last iteration, when no step was executed.
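Calling init() and step() by hand, instead of run(), makes it possible to inspect the status after every iteration. The sketch below shows that loop against a stand-in object so that it runs without smurff installed. `FakeSession` is hypothetical and only mimics the documented protocol: step() returns a status until the last iteration has run, then None.

```python
class FakeSession:
    """Hypothetical stand-in that mimics the init()/step() protocol.
    It is not a smurff class and does no sampling."""

    def __init__(self, burnin, nsamples):
        self.total = burnin + nsamples
        self.done = 0

    def init(self):
        self.done = 0
        return "initialized"  # the real method returns a StatusItem

    def step(self):
        if self.done >= self.total:
            return None  # no step was executed
        self.done += 1
        return "iteration %d" % self.done  # the real method returns a StatusItem


session = FakeSession(burnin=2, nsamples=3)
session.init()

history = []
while True:
    status = session.step()
    if status is None:
        break
    history.append(status)  # inspect or log the status here

print(len(history))
```

With a real TrainSession the loop body would read fields such as rmse_avg from the returned StatusItem.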
-
MacauSession¶
-
class
smurff.
MacauSession
(Ytrain, Ytest=None, side_info=None, univariate=False, direct=False, **args)¶ A train session specialized for use with the Macau algorithm
-
Ytrain
¶ - Train matrix/tensor
- Ytest : scipy.sparse matrix or SparseTensor
- Test matrix/tensor. Mainly used for calculating RMSE.
- side_info : list of numpy.ndarray, scipy.sparse matrix or None
- Side info matrix/tensor for each dimension. If there is no side info for a certain mode, pass None. Each side info should have as many rows as there are elements in the corresponding dimension of Ytrain.
- direct : bool
- Use Cholesky instead of CG solver
- univariate : bool
- Use univariate or multivariate sampling.
- **args:
- Extra arguments are passed to the TrainSession
Type: numpy.ndarray, scipy.sparse matrix or SparseTensor
-
BPMFSession¶
-
class
smurff.
BPMFSession
(Ytrain, Ytest=None, univariate=False, **args)¶ A train session specialized for use with the BPMF algorithm
-
Ytrain
¶ - Train matrix/tensor
- Ytest : scipy.sparse matrix or SparseTensor
- Test matrix/tensor. Mainly used for calculating RMSE.
- univariate : bool
- Use univariate or multivariate sampling.
- **args:
- Extra arguments are passed to the TrainSession
Type: numpy.ndarray, scipy.sparse matrix or SparseTensor
-
StatusItem¶
-
class
smurff.
StatusItem
(phase, iter, phase_iter, model_norms, rmse_avg, rmse_1sample, train_rmse, auc_1sample, auc_avg, elapsed_iter, nnz_per_sec, samples_per_sec)¶ A short set of parameters indicative of training progress.
-
phase
¶ Type: { “Burnin”, “Sampling” }
-
iter
¶ Current iteration in current phase
Type: int
-
phase_iter
¶ Number of iterations in this phase
Type: int
-
model_norms
¶ Norm of each U/V matrix
Type: list of float
-
rmse_avg
¶ Average RMSE for test matrix across all samples
Type: float
-
rmse_1sample
¶ RMSE for test matrix of last sample
Type: float
-
train_rmse
¶ RMSE for train matrix of last sample
Type: float
-
auc_1sample
¶ ROC AUC of the test matrix of the last sample. Only available if you provided a threshold.
Type: float
-
auc_avg
¶ Average ROC AUC of the test matrix across all samples. Only available if you provided a threshold.
Type: float
-
elapsed_iter
¶ Number of seconds the last sampling iteration took
Type: float
-
nnz_per_sec
¶ Compute performance indicator; number of non-zero elements in train processed per second
Type: float
-
samples_per_sec
¶ Compute performance indicator; number of rows and columns in U/V processed per second
Type: float
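To show how these fields might be used together, the sketch below formats a one-line progress report. `FakeStatus` is a hypothetical namedtuple with the same field names as StatusItem, used so the example runs without smurff; the values are made up, and `None` for the AUC fields is an assumption for the case where no threshold was given.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as StatusItem.
FakeStatus = namedtuple(
    "FakeStatus",
    [
        "phase", "iter", "phase_iter", "model_norms",
        "rmse_avg", "rmse_1sample", "train_rmse",
        "auc_1sample", "auc_avg",
        "elapsed_iter", "nnz_per_sec", "samples_per_sec",
    ],
)


def progress_line(status):
    """Format the phase, iteration counters and RMSE values on one line."""
    return "%s %d/%d: RMSE avg %.4f, last sample %.4f, train %.4f (%.2fs)" % (
        status.phase, status.iter, status.phase_iter,
        status.rmse_avg, status.rmse_1sample, status.train_rmse,
        status.elapsed_iter,
    )


# Made-up values for illustration only.
status = FakeStatus(
    phase="Sampling", iter=3, phase_iter=5, model_norms=[1.2, 0.9],
    rmse_avg=0.8123, rmse_1sample=0.8301, train_rmse=0.75,
    auc_1sample=None, auc_avg=None,
    elapsed_iter=0.42, nnz_per_sec=1.0e6, samples_per_sec=2.0e4,
)
print(progress_line(status))
```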
-