Training
The most versatile class is TrainSession. MacauSession and BPMFSession provide a simpler interface.
TrainSession
-
class smurff.TrainSession(priors=['normal', 'normal'], num_latent=None, num_threads=None, burnin=None, nsamples=None, seed=None, threshold=None, verbose=None, save_name=None, save_freq=None, checkpoint_freq=None)
Class for doing a training run in smurff.
A simple use case, where Ydense is a dense train matrix, could be:
>>> trainSession = smurff.TrainSession(burnin=5, nsamples=5)
>>> trainSession.setTrain(Ydense)
>>> trainSession.run()
-
priors
The type of prior to use for each dimension.
Type: list, where each element is one of { “normal”, “normalone”, “macau”, “macauone”, “spikeandslab” }
-
num_latent
Number of latent dimensions in the model.
Type: int
-
burnin
Number of burnin samples to discard.
Type: int
-
nsamples
Number of samples to keep.
Type: int
-
num_threads
Number of OpenMP threads to use for model building.
Type: int
-
verbose
Verbosity level for the C++ library.
Type: {0, 1, 2}
-
seed
Random seed to use for sampling.
Type: float
-
save_name
HDF5 filename to store the samples.
Type: path
-
save_freq
- N>0: save every Nth sample
- N==0: never save a sample
- N==-1: save only the last sample
Type: int
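The save_freq rules can be made concrete with a small helper that lists which kept samples would be written to save_name. This is a minimal sketch, not smurff code: saved_samples is a hypothetical name, and numbering the kept samples from 1 is an assumption.

```python
from typing import List


def saved_samples(nsamples: int, save_freq: int) -> List[int]:
    """Hypothetical helper: 1-based indices of kept samples that get saved.

    Mirrors the documented save_freq rules only; the 1-based numbering
    is an assumption, not taken from smurff.
    """
    if save_freq > 0:
        # N > 0: save every Nth sample
        return [i for i in range(1, nsamples + 1) if i % save_freq == 0]
    if save_freq == 0:
        # N == 0: never save a sample
        return []
    if save_freq == -1:
        # N == -1: save only the last sample
        return [nsamples] if nsamples > 0 else []
    raise ValueError("save_freq must be > 0, 0 or -1")


print(saved_samples(10, 3))   # [3, 6, 9]
print(saved_samples(10, 0))   # []
print(saved_samples(10, -1))  # [10]
```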
-
checkpoint_freq
Save the state of the trainSession every N seconds.
Type: int
-
addData(pos, Y, noise=<smurff.helper.FixedNoise object>, is_scarce=False)
Stacks more matrices/tensors next to the main train matrix.
- pos : shape
- Block position of the data with respect to train. The train matrix/tensor has implicit block position (0, 0).
- Y : numpy.ndarray, scipy.sparse matrix or SparseTensor
- Data matrix/tensor to add.
- is_scarce : bool
- When Y is sparse and is_scarce is True, the missing values are considered unknown. When Y is sparse and is_scarce is False, the missing values are considered zero. When Y is dense, this parameter is ignored.
- noise : NoiseConfig
- Noise model to use for Y.
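Block positions can be pictured by laying all blocks out as one large array. The sketch below uses numpy.block purely to visualize that layout; it makes no claim about how smurff stores the data, and the sizes and values are made up. A block in the same block row as train, such as (0, 1), shares train's rows, and a block in the same block column, such as (1, 0), shares its columns.

```python
import numpy as np

# Illustration of the block layout only; not how smurff stores the data.
train = np.ones((3, 4))        # train has implicit block position (0, 0)
right = 2 * np.ones((3, 2))    # block (0, 1): same rows as train
below = 3 * np.ones((5, 4))    # block (1, 0): same columns as train

row_stack = np.block([[train, right]])    # blocks (0, 0) and (0, 1)
col_stack = np.block([[train], [below]])  # blocks (0, 0) and (1, 0)

print(row_stack.shape)  # (3, 6)
print(col_stack.shape)  # (8, 4)
```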
-
addPropagatedPosterior(mode, mu, Lambda)
Adds mu and Lambda from a propagated posterior.
- mode : int
- Dimension to add the propagated posterior to (rows = 0, cols = 1).
- mu : numpy.ndarray matrix
- Mean matrix. mu should have as many rows as num_latent and as many columns as the size of dimension mode in train.
- Lambda : numpy.ndarray
- Covariance matrix. Lambda should be shaped K x K x N, where K == num_latent and N is the size of dimension mode in train.
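The shape requirements for mu and Lambda are easy to get wrong, so here is a sketch that builds arrays of the documented shapes with numpy. The values (zero means, identity matrices) are placeholders chosen only to show the layout, and K and N are example sizes.

```python
import numpy as np

K = 4   # num_latent
N = 10  # size of dimension `mode` in the train matrix

# mu: one K-dimensional mean vector per element, stored as columns -> K x N
mu = np.zeros((K, N))

# Lambda: one K x K matrix per element, stacked along the last axis -> K x K x N
Lambda = np.stack([np.eye(K) for _ in range(N)], axis=2)

print(mu.shape)               # (4, 10)
print(Lambda.shape)           # (4, 4, 10)
print(Lambda[:, :, 3].shape)  # (4, 4): the matrix for element 3
```

For a TrainSession with num_latent 4 and a train matrix with 10 rows, these shapes match what the description above asks for when mode is 0.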
-
addSideInfo(mode, Y, noise=<smurff.helper.SampledNoise object>, direct=True)
Adds fully known side info, for use with the macau or macauone prior.
- mode : int
- Dimension to add side info to (rows = 0, cols = 1).
- Y : numpy.ndarray or scipy.sparse matrix
- Side info matrix/tensor. Y should have as many rows as there are elements in the dimension selected by mode. Columns in Y are features for each element.
- noise : NoiseConfig
- Noise model to use for Y.
- direct : bool
- When True, uses a direct inversion method.
- When False, uses a CG solver.
The direct method is only feasible for a small (< 100K) number of features.
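The direct flag chooses between two ways of solving a symmetric positive-definite linear system. The sketch below contrasts a Cholesky-based direct solve with a textbook conjugate gradient (CG) loop on a small system of the form F^T F + I, assuming, as the feature limit above suggests, that the system size equals the number of features. It illustrates the two techniques only and is not smurff's solver; solve_direct and solve_cg are hypothetical names.

```python
import numpy as np


def solve_direct(A, b):
    # Direct method: factor A = L L^T (Cholesky), then two triangular solves.
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, b)
    return np.linalg.solve(L.T, y)


def solve_cg(A, b, tol=1e-10, max_iter=None):
    # Iterative method: textbook conjugate gradient for SPD A.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter or 10 * len(b)):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


rng = np.random.default_rng(0)
F = rng.standard_normal((50, 5))  # 50 elements, 5 features
A = F.T @ F + np.eye(5)           # 5 x 5, symmetric positive-definite
b = rng.standard_normal(5)

print(np.allclose(solve_direct(A, b), solve_cg(A, b)))  # True
```

The Cholesky factorization costs roughly cubic time in the number of features, while each CG iteration only needs a matrix-vector product, which is one reason an iterative solver is preferred once the feature count grows.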
-
init()
Initializes the TrainSession after all data has been added.
You need to call this method before calling step(), unless you call run().
Returns: the StatusItem of the trainSession.
Return type: StatusItem
-
makePredictSession()
Makes a PredictSession based on the model that was built in this TrainSession.
-
run()
Equivalent to:
    self.init()
    while self.step():
        pass
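The init/step/run contract can be exercised without smurff by a stand-in object. MockSession and Status below are hypothetical classes that only mimic the documented behaviour: step() returns a status while iterations remain and None afterwards, and run() is the documented loop. Whether iter counts from 0 or 1 in smurff is not stated here; the mock counts from 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    # Hypothetical stand-in for smurff.StatusItem with two of its fields.
    phase: str
    iter: int


class MockSession:
    """Hypothetical stand-in that mimics the init/step/run contract."""

    def __init__(self, burnin, nsamples):
        self.burnin = burnin
        self.nsamples = nsamples
        self.history = []

    def init(self):
        # step() before init() fails with AttributeError, since _done is unset.
        self._done = 0
        return Status("Burnin", 0)

    def step(self) -> Optional[Status]:
        if self._done >= self.burnin + self.nsamples:
            return None  # after the last iteration: no step executed
        if self._done < self.burnin:
            status = Status("Burnin", self._done + 1)
        else:
            status = Status("Sampling", self._done - self.burnin + 1)
        self._done += 1
        self.history.append(status)
        return status

    def run(self):
        # Same loop as the documented equivalent of run()
        self.init()
        while self.step():
            pass


session = MockSession(burnin=5, nsamples=5)
session.run()
print(len(session.history))  # 10
print(session.history[-1])   # Status(phase='Sampling', iter=5)
```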
-
setTrain(Y, noise=<smurff.helper.FixedNoise object>, is_scarce=True)
Adds a train and optionally a test matrix as input data to this TrainSession.
Parameters:
- Y – Train matrix/tensor.
- noise – Noise model to use for Y.
- is_scarce (bool) – When Y is sparse and is_scarce is True, the missing values are considered unknown. When Y is sparse and is_scarce is False, the missing values are considered zero. When Y is dense, this parameter is ignored.
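The is_scarce rule decides which cells of a sparse matrix count as observations. The sketch below represents a sparse matrix as a dict of stored (row, col) entries and returns the cells that would be treated as observed under each setting. observed_cells is a hypothetical helper written for illustration; it does not reproduce smurff internals.

```python
def observed_cells(shape, stored, is_scarce):
    """Hypothetical helper: {(i, j): value} for every cell treated as observed.

    `stored` holds the explicitly stored entries of a sparse matrix.
    """
    if is_scarce:
        # Missing cells are unknown: only the stored entries are observations.
        return dict(stored)
    nrows, ncols = shape
    # Missing cells are zero: every cell is an observation.
    return {(i, j): stored.get((i, j), 0.0)
            for i in range(nrows) for j in range(ncols)}


stored = {(0, 1): 4.0, (2, 0): 1.5}
print(len(observed_cells((3, 3), stored, is_scarce=True)))   # 2
print(len(observed_cells((3, 3), stored, is_scarce=False)))  # 9
```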
-
step()
Does one sampling or burnin iteration.
Returns:
- When a step was executed: the StatusItem of the trainSession.
- After the last iteration, when no step was executed: None.
-
MacauSession
-
class smurff.MacauSession(Ytrain, is_scarce=True, Ytest=None, side_info=None, univariate=False, direct=True, *args, **kwargs)
A TrainSession specialized for use with the Macau algorithm.
-
Ytrain
Train matrix/tensor.
Type: numpy.ndarray, scipy.sparse matrix or SparseTensor
- Ytest : scipy.sparse matrix or SparseTensor
- Test matrix/tensor. Mainly used for calculating RMSE.
- side_info : list of numpy.ndarray, scipy.sparse matrix or None
- Side info matrix/tensor for each dimension. If there is no side info for a certain mode, pass None. Each side info should have as many rows as there are elements in the corresponding dimension of Ytrain.
- direct : bool
- When True, use a Cholesky solver instead of the CG solver.
- univariate : bool
- When True, use univariate sampling; when False, use multivariate sampling.
- *args, **kwargs
- Extra arguments are passed to the TrainSession.
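The side_info argument needs one entry per dimension of Ytrain, with None where a mode has no side info. The sketch below checks those two documented rules before a session is created; check_side_info is a hypothetical helper, and the shapes and feature values are made up.

```python
import numpy as np


def check_side_info(train_shape, side_info):
    """Hypothetical helper: validate side_info against the shape of Ytrain."""
    if len(side_info) != len(train_shape):
        raise ValueError("need one side_info entry per dimension (None if absent)")
    for mode, (n, info) in enumerate(zip(train_shape, side_info)):
        # Works for numpy arrays and scipy.sparse matrices alike (.shape[0]).
        if info is not None and info.shape[0] != n:
            raise ValueError(
                f"side info for mode {mode} has {info.shape[0]} rows, expected {n}")


train_shape = (100, 20)            # Ytrain: 100 rows, 20 columns
row_features = np.ones((100, 8))   # 8 features for each row element
side_info = [row_features, None]   # no side info for the columns

check_side_info(train_shape, side_info)  # passes silently
```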
-
BPMFSession
-
class smurff.BPMFSession(Ytrain, is_scarce=True, Ytest=None, univariate=False, *args, **kwargs)
A TrainSession specialized for use with the BPMF algorithm.
-
Ytrain
Train matrix/tensor.
Type: numpy.ndarray, scipy.sparse matrix or SparseTensor
- Ytest : scipy.sparse matrix or SparseTensor
- Test matrix/tensor. Mainly used for calculating RMSE.
- univariate : bool
- When True, use univariate sampling; when False, use multivariate sampling.
- *args, **kwargs
- Extra arguments are passed to the TrainSession.
-
StatusItem
-
class smurff.StatusItem
Short set of parameters indicative of the training progress.
-
auc_1sample
ROC AUC of the test matrix for the last sample. Only available if you provided a threshold.
-
auc_avg
Average ROC AUC of the test matrix across all samples. Only available if you provided a threshold.
-
elapsed_iter
Number of seconds the last sampling iteration took.
-
iter
Current iteration in the current phase.
-
nnz_per_sec
Compute performance indicator: number of non-zero elements in train processed per second.
-
phase
Current phase, one of { “Burnin”, “Sampling” }.
-
rmse_1sample
RMSE of the test matrix for the last sample.
-
rmse_avg
Average RMSE of the test matrix across all samples.
-
samples_per_sec
Compute performance indicator: number of rows and columns in U/V processed per second.
-
train_rmse
RMSE of the train matrix for the last sample.
-
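The RMSE and ROC AUC fields report standard metrics on the test matrix. The sketch below computes both for one sample's predictions on a tiny made-up test set. How smurff binarizes test values with the threshold (here assumed to be value > threshold) and exactly how it averages across samples are not stated in this section, so treat rmse and roc_auc as hypothetical illustrations of the metrics, not smurff's implementation.

```python
import math


def rmse(pred, actual):
    # Root-mean-square error over the test entries.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def roc_auc(scores, labels):
    # Probability that a random positive scores above a random negative
    # (ties count 1/2); this equals the area under the ROC curve.
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


actual = [1.0, 3.0, 5.0, 2.0]   # test values
pred = [1.5, 2.5, 4.0, 2.0]     # predictions from one sample
threshold = 2.5                 # assumed rule: value > threshold is positive
labels = [a > threshold for a in actual]

print(rmse(pred, actual))       # about 0.612
print(roc_auc(pred, labels))    # 1.0
```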