Training ¶

The most versatile class is TrainSession. MacauSession and BPMFSession provide a simpler interface.

TrainSession ¶

class smurff.TrainSession(priors=['normal', 'normal'], num_latent=None, num_threads=None, burnin=None, nsamples=None, seed=None, threshold=None, verbose=None, save_name=None, save_freq=None, checkpoint_freq=None)¶

Class for doing a training run in smurff

A simple use case could be:

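>>> import numpy as np
>>> import smurff
>>> Ydense = np.random.rand(10, 20)  # example dense train matrix
>>> trainSession = smurff.TrainSession(burnin = 5, nsamples = 5)
>>> trainSession.setTrain(Ydense)
>>> trainSession.run()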

priors¶

The type of prior to use for each dimension

Type:	list, where element is one of { “normal”, “normalone”, “macau”, “macauone”, “spikeandslab” }

num_latent¶

Number of latent dimensions in the model

Type:	int

burnin¶

Number of burnin samples to discard

Type:	int

nsamples¶

Number of samples to keep

Type:	int

num_threads¶

Number of OpenMP threads to use for model building

Type:	int

verbose¶

Verbosity level for C++ library

Type:	{0, 1, 2}

seed¶

Random seed to use for sampling

Type:	float

save_name¶

HDF5 filename to store the samples.

Type:	path

save_freq¶

N>0: save every Nth sample
N==0: never save a sample
N==-1: save only the last sample

Type:	int
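The three save_freq cases can be sketched with a small stand-alone function. This is only an illustration, not smurff code: the helper name `samples_to_save` is hypothetical, and it assumes that samples are counted from 1 and that "every Nth sample" means the sample numbers divisible by N.

```python
def samples_to_save(save_freq, nsamples):
    """Return the 1-based sample numbers that would be saved.

    Hypothetical helper; assumes samples are numbered 1..nsamples.
    """
    if save_freq > 0:
        # N > 0: keep every Nth sample
        return [i for i in range(1, nsamples + 1) if i % save_freq == 0]
    if save_freq == 0:
        # N == 0: never save a sample
        return []
    if save_freq == -1:
        # N == -1: keep only the last sample
        return [nsamples] if nsamples > 0 else []
    # Other negative values are not described in this section
    raise ValueError("save_freq must be > 0, 0 or -1")


print(samples_to_save(2, 5))
print(samples_to_save(0, 5))
print(samples_to_save(-1, 5))
```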

checkpoint_freq¶

Save the state of the trainSession every N seconds.

Type:	int

addData(pos, Y, noise=<smurff.helper.FixedNoise object>, is_scarce=False)¶

Stacks more matrices/tensors next to the main train matrix.

pos : shape: Block position of the data with respect to train. The train matrix/tensor has implicit block position (0, 0).
Y : :class: numpy.ndarray, scipy.sparse matrix or :class: SparseTensor: Data matrix/tensor to add
is_scarce : bool: When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.
noise : :class: NoiseConfig: Noise model to use for Y
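As a rough picture of block positions, the sketch below lays out two NumPy arrays the way a block at (0, 1) would sit relative to train. It assumes that a block in the same block row as train shares its row dimension; the array names and sizes are made up for illustration, and the library does not necessarily concatenate the data like this internally.

```python
import numpy as np

# Main train matrix: implicit block position (0, 0)
train = np.arange(12, dtype=float).reshape(3, 4)

# Hypothetical extra block at position (0, 1): same block row as train,
# so it has the same number of rows (3) but its own number of columns.
extra = np.ones((3, 2))

# Conceptual layout of the two blocks side by side
layout = np.hstack([train, extra])
print(layout.shape)

# With a session object this would be expressed roughly as:
#   session.setTrain(train)
#   session.addData((0, 1), extra)
```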

addPropagatedPosterior(mode, mu, Lambda)¶

Adds mu and Lambda from propagated posterior

mode : int: dimension to add side info (rows = 0, cols = 1)
mu : :class: numpy.ndarray matrix: mean matrix. mu should have as many rows as num_latent and as many columns as the size of dimension mode in train.
Lambda : :class: numpy.ndarray matrix: co-variance matrix. Lambda should be shaped K x K x N, where K == num_latent and N == the size of dimension mode in train.
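The expected shapes can be checked with plain NumPy before calling the method. The values below (zero means, identity matrices, K = 4, N = 10) are placeholders chosen for illustration, not recommended settings.

```python
import numpy as np

num_latent = 4   # K, must match the num_latent of the session
n_entities = 10  # N, size of the chosen mode in train (example value)

# mu: one column of K means per entity -> shape (K, N)
mu = np.zeros((num_latent, n_entities))

# Lambda: one K x K matrix per entity, stacked along the last axis -> (K, K, N)
Lambda = np.repeat(np.eye(num_latent)[:, :, np.newaxis], n_entities, axis=2)

print(mu.shape, Lambda.shape)

# With a session object, mode 0 (rows) would then be set roughly as:
#   session.addPropagatedPosterior(0, mu, Lambda)
```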

addSideInfo(mode, Y, noise=<smurff.helper.SampledNoise object>, direct=True)¶

Adds fully known side info, for use with the macau or macauone prior

mode : int

dimension to add side info (rows = 0, cols = 1)

Y : :class: numpy.ndarray, scipy.sparse matrix

Side info matrix/tensor. Y should have as many rows as you have elements in the dimension selected using mode. Columns in Y are features for each element.

noise : :class: NoiseConfig

Noise model to use for Y

direct : boolean

When True, uses a direct inversion method.
When False, uses a CG solver

The direct method is only feasible for a small (< 100K) number of features.
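The trade-off between the two settings can be illustrated on a generic symmetric positive definite system. The sketch below is not the library's solver: it assumes a ridge-style system built from a feature matrix F (an assumption made for illustration) and compares a Cholesky-based direct solve, which factorizes a features x features matrix, with a plain conjugate gradient loop that only needs matrix-vector products.

```python
import numpy as np


def solve_direct(A, b):
    """Solve A x = b via Cholesky factorization A = L L^T."""
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, b)       # solve L y = b
    return np.linalg.solve(L.T, y)  # solve L^T x = y


def solve_cg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with the conjugate gradient method (A must be SPD)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


rng = np.random.default_rng(0)
F = rng.standard_normal((50, 8))  # 50 elements, 8 features (example sizes)
A = F.T @ F + np.eye(8)           # features x features, SPD
b = rng.standard_normal(8)

x_direct = solve_direct(A, b)
x_cg = solve_cg(A, b)
print(np.allclose(x_direct, x_cg, atol=1e-6))
```

The direct route has to store and factorize the full features x features matrix, which is why it stops being practical as the number of features grows.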

init()¶

Initializes the TrainSession after all data has been added.

You need to call this method before calling step(), unless you call run().

Returns:	`StatusItem` of the trainSession.

makePredictSession()¶: Makes a PredictSession based on the model that was built in this TrainSession.

run()¶

Equivalent to:

self.init()
while self.step():
    pass

setTrain(Y, noise=<smurff.helper.FixedNoise object>, is_scarce=True)¶

Adds a train and optionally a test matrix as input data to this TrainSession

Parameters:
Y – Train matrix/tensor
noise – Noise model to use for Y
is_scarce (bool) – When Y is sparse and is_scarce is True, the missing values are considered as unknown. When Y is sparse and is_scarce is False, the missing values are considered as zero. When Y is dense, this parameter is ignored.
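The effect of is_scarce on a sparse input can be pictured as a choice of which cells count as observations. The helper `observed_cells` below is hypothetical and works on plain coordinate arrays rather than on smurff or scipy.sparse objects.

```python
import numpy as np

# A sparse 2 x 3 matrix in coordinate form: only two cells are stored
rows = np.array([0, 1])
cols = np.array([2, 0])
vals = np.array([5.0, 3.0])
shape = (2, 3)


def observed_cells(rows, cols, vals, shape, is_scarce):
    """Return (values, mask); mask marks the cells treated as observed.

    Hypothetical helper for illustration only.
    """
    values = np.zeros(shape)
    values[rows, cols] = vals
    mask = np.zeros(shape, dtype=bool)
    if is_scarce:
        # Missing cells are unknown: only the stored cells are observations
        mask[rows, cols] = True
    else:
        # Missing cells are zeros: every cell is an observation
        mask[:] = True
    return values, mask


_, scarce_mask = observed_cells(rows, cols, vals, shape, is_scarce=True)
_, full_mask = observed_cells(rows, cols, vals, shape, is_scarce=False)
print(int(scarce_mask.sum()), int(full_mask.sum()))
```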

step()¶

Does one sampling or burnin iteration.

Returns:	`StatusItem` of the trainSession when a step was executed; None after the last iteration, when no step was executed.
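Calling init() and step() by hand is useful when you want to inspect the status after every iteration. The loop below shows the calling pattern against a stand-in class: `ToySession` is hypothetical and only mimics the contract described above (a status per executed step, None afterwards). The dictionary keys and the iteration numbering are assumptions, not the fields of the real StatusItem.

```python
class ToySession:
    """Hypothetical stand-in for the init()/step() contract; not smurff code."""

    def __init__(self, burnin, nsamples):
        self.burnin = burnin
        self.nsamples = nsamples
        self.done = None  # number of executed steps, set by init()

    def init(self):
        self.done = 0
        return {"phase": "Burnin", "iter": 0}

    def step(self):
        if self.done is None:
            raise RuntimeError("call init() before step()")
        if self.done >= self.burnin + self.nsamples:
            return None  # no step was executed
        self.done += 1
        if self.done <= self.burnin:
            return {"phase": "Burnin", "iter": self.done}
        return {"phase": "Sampling", "iter": self.done - self.burnin}


session = ToySession(burnin=2, nsamples=3)
session.init()

history = []
while True:
    status = session.step()
    if status is None:
        break
    history.append((status["phase"], status["iter"]))

print(history)
```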

MacauSession ¶

class smurff.MacauSession(Ytrain, is_scarce=True, Ytest=None, side_info=None, univariate=False, direct=True, *args, **kwargs)¶

A TrainSession specialized for use with the Macau algorithm

Ytrain : :class: numpy.ndarray, scipy.sparse matrix or :class: SparseTensor: Train matrix/tensor
Ytest : scipy.sparse matrix or :class: SparseTensor: Test matrix/tensor. Mainly used for calculating RMSE.
side_info : list of :class: numpy.ndarray, scipy.sparse matrix or None: Side info matrix/tensor for each dimension. If there is no side info for a certain mode, pass None. Each side info should have as many rows as you have elements in the corresponding dimension of Ytrain.
direct : bool: Use Cholesky instead of CG solver
univariate : bool: Use univariate instead of multivariate sampling.
*args, **kwargs : Extra arguments are passed to the TrainSession

BPMFSession ¶

class smurff.BPMFSession(Ytrain, is_scarce=True, Ytest=None, univariate=False, *args, **kwargs)¶

A TrainSession specialized for use with the BPMF algorithm

Ytrain : :class: numpy.ndarray, scipy.sparse matrix or :class: SparseTensor: Train matrix/tensor
Ytest : scipy.sparse matrix or :class: SparseTensor: Test matrix/tensor. Mainly used for calculating RMSE.
univariate : bool: Use univariate instead of multivariate sampling.
*args, **kwargs : Extra arguments are passed to the TrainSession

StatusItem ¶

class smurff.StatusItem¶

Short set of parameters indicative for the training progress.

auc_1sample¶: ROC AUC of the test matrix of the last sample. Only available if you provided a threshold.

auc_avg¶: Average ROC AUC of the test matrix across all samples. Only available if you provided a threshold.

elapsed_iter¶: Number of seconds the last sampling iteration took

iter¶: Current iteration in current phase

nnz_per_sec¶: Compute performance indicator; number of non-zero elements in train processed per second

phase¶: { “Burnin”, “Sampling” }

rmse_1sample¶: RMSE for test matrix of last sample

rmse_avg¶: Average RMSE for test matrix across all samples

samples_per_sec¶: Compute performance indicator; number of rows and columns in U/V processed per second

train_rmse¶: RMSE for train matrix of last sample
