Training

The most versatile class is TrainSession. MacauSession and BPMFSession provide a simpler interface.

TrainSession

class smurff.TrainSession(priors=['normal', 'normal'], num_latent=None, num_threads=None, burnin=None, nsamples=None, seed=None, threshold=None, verbose=None, save_name=None, save_freq=None, checkpoint_freq=None)

Class for doing a training run in smurff

A simple use case could be:

>>> trainSession = smurff.TrainSession(burnin = 5, nsamples = 5)
>>> trainSession.setTrain(Ydense)
>>> trainSession.run()
priors

The type of prior to use for each dimension

Type:list, where each element is one of { “normal”, “normalone”, “macau”, “macauone”, “spikeandslab” }
num_latent

Number of latent dimensions in the model

Type:int
burnin

Number of burnin samples to discard

Type:int
nsamples

Number of samples to keep

Type:int
num_threads

Number of OpenMP threads to use for model building

Type:int
verbose

Verbosity level for C++ library

Type:{0, 1, 2}
seed

Random seed to use for sampling

Type:float
save_name

HDF5 filename to store the samples.

Type:path
save_freq
  • N>0: save every Nth sample
  • N==0: never save a sample
  • N==-1: save only the last sample
Type:int
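
The three save_freq cases can be sketched in plain Python. The helper below, should_save, is hypothetical and not part of smurff. It assumes samples are counted from 1 and that "every Nth sample" means the multiples of N; the library's own counting may differ.

```python
def should_save(save_freq, sample_index, nsamples):
    """Hypothetical helper: would sample `sample_index` (1-based) be saved?"""
    if save_freq > 0:
        # N > 0: save every Nth sample
        return sample_index % save_freq == 0
    if save_freq == -1:
        # N == -1: save only the last sample
        return sample_index == nsamples
    # N == 0: never save a sample
    return False


nsamples = 10
saved_every_5 = [i for i in range(1, nsamples + 1) if should_save(5, i, nsamples)]
saved_last = [i for i in range(1, nsamples + 1) if should_save(-1, i, nsamples)]
saved_never = [i for i in range(1, nsamples + 1) if should_save(0, i, nsamples)]
```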
checkpoint_freq

Save the state of the trainSession every N seconds.

Type:int
addData(pos, Y, noise=<smurff.helper.FixedNoise object>, is_scarce=False)

Stacks more matrices/tensors next to the main train matrix.

pos : shape
Block position of the data with respect to train. The train matrix/tensor has implicit block position (0, 0).
Y : :class: numpy.ndarray, scipy.sparse matrix or :class: SparseTensor
Data matrix/tensor to add
is_scarce : bool
When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.
noise : :class: NoiseConfig
Noise model to use for Y
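
One way to picture block positions is as a grid of blocks with train in the top-left corner. The NumPy sketch below only illustrates the shape constraints this implies; it assumes pos is ordered (block row, block column), which the text above does not state, and it does not call smurff or reflect how the library stores the blocks.

```python
import numpy as np

# The train matrix has implicit block position (0, 0).
train = np.arange(12, dtype=float).reshape(4, 3)

# Assumed pos = (0, 1): a block to the right of train, so it shares train's 4 rows.
right = np.ones((4, 2))

# Assumed pos = (1, 0): a block below train, so it shares train's 3 columns.
below = np.zeros((5, 3))

# Concatenation is used here purely to visualise the layout.
row_of_blocks = np.hstack([train, right])
column_of_blocks = np.vstack([train, below])
```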
addPropagatedPosterior(mode, mu, Lambda)

Adds mu and Lambda from propagated posterior

mode : int
dimension to add the propagated posterior to (rows = 0, cols = 1)
mu : :class: numpy.ndarray matrix
Mean matrix. mu should have as many rows as num_latent and as many columns as the size of dimension mode in train.
Lambda : :class: numpy.ndarray matrix
Covariance matrix. Lambda should be shaped K x K x N, where K == num_latent and N == the size of dimension mode in train.
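
As a sketch of the expected shapes, the NumPy snippet below builds placeholder arrays for mu and Lambda. The zero mean, the identity matrices, and the sizes chosen for K and N are illustrative assumptions, not values suggested by smurff.

```python
import numpy as np

num_latent = 4   # K
n_items = 6      # N: size of dimension `mode` in train (assumed for illustration)

# mu: one column of K latent means per item in the chosen dimension.
mu = np.zeros((num_latent, n_items))

# Lambda: one K x K matrix per item, stacked along the third axis.
Lambda = np.repeat(np.eye(num_latent)[:, :, np.newaxis], n_items, axis=2)
```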
addSideInfo(mode, Y, noise=<smurff.helper.SampledNoise object>, direct=True)

Adds fully known side info, for use with the macau or macauone prior

mode : int
dimension to add side info (rows = 0, cols = 1)
Y : :class: numpy.ndarray, scipy.sparse matrix
Side info matrix/tensor. Y should have as many rows as you have elements in the dimension selected using mode. Columns in Y are features for each element.
noise : :class: NoiseConfig
Noise model to use for Y
direct : boolean
  • When True, uses a direct inversion method.
  • When False, uses a CG solver

The direct method is only feasible for a small (< 100K) number of features.
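
The shape requirement and the rule of thumb for direct can be checked before any data is handed to the session. This is a minimal sketch with made-up sizes and random features. It does not call smurff, and treating 100,000 features as a hard cut-off is an assumption based on the guideline above.

```python
import numpy as np

train_shape = (50, 20)   # assumed train matrix: 50 rows, 20 columns
mode = 0                 # attach side info to the rows
n_features = 8

# One row of features per element in the selected dimension.
rng = np.random.default_rng(0)
side_info = rng.normal(size=(train_shape[mode], n_features))

rows_match = side_info.shape[0] == train_shape[mode]

# Prefer the direct method only while the feature count stays small.
direct = side_info.shape[1] < 100_000
```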

init()

Initializes the TrainSession after all data has been added.

You need to call this method before calling step(), unless you call run().

Returns:StatusItem of the trainSession.
makePredictSession()

Makes a PredictSession based on the model that was built in this TrainSession.

run()

Equivalent to:

self.init()
while self.step():
    pass
setTrain(Y, noise=<smurff.helper.FixedNoise object>, is_scarce=True)

Adds a train and optionally a test matrix as input data to this TrainSession

Parameters:
  • Y – Train matrix/tensor
  • noise – Noise model to use for Y
  • is_scarce (bool) – When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.
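
The difference between the two is_scarce settings comes down to which cells count as observations. The NumPy sketch below illustrates this on a small matrix given as (row, column, value) triplets. It is an illustration of the semantics described above and does not call smurff.

```python
import numpy as np

# A 3 x 3 matrix stored sparsely: only three cells are listed.
rows = np.array([0, 1, 2])
cols = np.array([0, 2, 1])
vals = np.array([5.0, 3.0, 1.0])
shape = (3, 3)

# is_scarce=True view: only the listed cells are observations,
# everything else is unknown.
observed = np.zeros(shape, dtype=bool)
observed[rows, cols] = True
n_observations_scarce = int(observed.sum())

# is_scarce=False view: every cell is an observation,
# and the unlisted cells have the value zero.
dense = np.zeros(shape)
dense[rows, cols] = vals
n_observations_full = dense.size
```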
step()

Does one sampling or burnin iteration.

Returns:
  • StatusItem of the trainSession, when a step was executed
  • None, after the last iteration, when no step was executed
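
Calling init() and step() by hand instead of run() makes it possible to keep every returned status. The sketch below shows that loop. FakeSession is a hypothetical stand-in that only mimics the init()/step() contract described above, so the example runs without smurff; a real TrainSession would return StatusItem objects rather than strings.

```python
class FakeSession:
    """Hypothetical stand-in for a TrainSession; not part of smurff."""

    def __init__(self, n_iterations):
        self.remaining = n_iterations

    def init(self):
        return "initialized"

    def step(self):
        # Returns None once all iterations are done, like step() above.
        if self.remaining == 0:
            return None
        self.remaining -= 1
        return "iteration done"


def run_collecting(session):
    """Same control flow as run(), but keeps each returned status."""
    statuses = [session.init()]
    while True:
        status = session.step()
        if status is None:
            break
        statuses.append(status)
    return statuses


statuses = run_collecting(FakeSession(3))
```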

MacauSession

class smurff.MacauSession(Ytrain, is_scarce=True, Ytest=None, side_info=None, univariate=False, direct=True, *args, **kwargs)

A TrainSession specialized for use with the Macau algorithm

Ytrain : :class: numpy.ndarray, scipy.sparse matrix or :class: SparseTensor
Train matrix/tensor
Ytest : scipy.sparse matrix or :class: SparseTensor
Test matrix/tensor. Mainly used for calculating RMSE.
side_info : list of :class: numpy.ndarray, scipy.sparse matrix or None
Side info matrix/tensor for each dimension. If there is no side info for a certain mode, pass None. Each side info should have as many rows as you have elements in the corresponding dimension of Ytrain.
direct : bool
Use Cholesky instead of CG solver
univariate : bool
Use univariate sampling when True, multivariate sampling when False.
*args, **kwargs:
Extra arguments are passed to the TrainSession

BPMFSession

class smurff.BPMFSession(Ytrain, is_scarce=True, Ytest=None, univariate=False, *args, **kwargs)

A TrainSession specialized for use with the BPMF algorithm

Ytrain : :class: numpy.ndarray, scipy.sparse matrix or :class: SparseTensor
Train matrix/tensor
Ytest : scipy.sparse matrix or :class: SparseTensor
Test matrix/tensor. Mainly used for calculating RMSE.
univariate : bool
Use univariate sampling when True, multivariate sampling when False.
*args, **kwargs:
Extra arguments are passed to the TrainSession

StatusItem

class smurff.StatusItem

Short set of parameters indicative of the training progress.

auc_1sample

ROC AUC of the test matrix of the last sample. Only available if you provided a threshold.

auc_avg

Average ROC AUC of the test matrix across all samples. Only available if you provided a threshold.

elapsed_iter

Number of seconds the last sampling iteration took

iter

Current iteration in current phase

nnz_per_sec

Compute performance indicator; number of non-zero elements in train processed per second

phase

{ “Burnin”, “Sampling” }

rmse_1sample

RMSE for test matrix of last sample

rmse_avg

Average RMSE for test matrix across all samples

samples_per_sec

Compute performance indicator; number of rows and columns in U/V processed per second

train_rmse

RMSE for train matrix of last sample
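
For reference, the RMSE reported in these fields is the root mean squared error between predicted and actual values. The helper below is a minimal NumPy sketch of that standard formula for a single set of predictions. It is not smurff's implementation, and it makes no claim about how the library combines samples for rmse_avg.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean squared error between two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
off_by_one = rmse([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```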