Training ¶

The most versatile class is TrainSession. MacauSession and BPMFSession provide a simpler interface.

TrainSession ¶

class smurff.TrainSession(priors=[u'normal', u'normal'], num_latent=NUM_LATENT_DEFAULT_VALUE, num_threads=NUM_THREADS_DEFAULT_VALUE, burnin=BURNIN_DEFAULT_VALUE, nsamples=NSAMPLES_DEFAULT_VALUE, seed=RANDOM_SEED_DEFAULT_VALUE, threshold=None, verbose=1, save_prefix=None, save_extension=None, save_freq=None, checkpoint_freq=None, csv_status=None)¶

Class for doing a training run in smurff

A simple use case could be:

>>> session = smurff.TrainSession(burnin = 5, nsamples = 5)
>>> session.addTrainAndTest(Ydense)
>>> session.run()

priors¶

The type of prior to use for each dimension

Type:	list, where element is one of { “normal”, “normalone”, “macau”, “macauone”, “spikeandslab” }

num_latent¶

Number of latent dimensions in the model

Type:	int

burnin¶

Number of burnin samples to discard

Type:	int

nsamples¶

Number of samples to keep

Type:	int

num_threads¶

Number of OpenMP threads to use for model building

Type:	int

verbose¶

Verbosity level for C++ library

Type:	{0, 1, 2}

seed¶

Random seed to use for sampling

Type:	float

save_prefix¶

Path where the samples are stored. The path includes the directory name, as well as the initial part of the file names.

Type:	path

save_freq¶

N>0: save every Nth sample
N==0: never save a sample
N==-1: save only the last sample

Type:	int
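
The save_freq rule above can be sketched in plain Python. This is only an illustration of the documented rule, not the library's code: should_save is a hypothetical helper, and the 1-based sample numbering and the exact samples picked for N>0 are assumptions.

```python
def should_save(save_freq, sample_idx, nsamples):
    """Return True if sample `sample_idx` (1-based, an assumption) is saved."""
    if save_freq > 0:
        # N>0: save every Nth sample
        return sample_idx % save_freq == 0
    if save_freq == -1:
        # N==-1: save only the last sample
        return sample_idx == nsamples
    # N==0: never save a sample
    return False


# Which of 10 samples would be saved for a few settings of save_freq
every_third = [i for i in range(1, 11) if should_save(3, i, 10)]
last_only = [i for i in range(1, 11) if should_save(-1, i, 10)]
never = [i for i in range(1, 11) if should_save(0, i, 10)]
```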

save_extension¶

.csv: save in textual csv file format
.ddm: save in binary file format

Type:	{ “.csv”, “.ddm” }

checkpoint_freq¶

Save the state of the session every N seconds.

Type:	int

csv_status¶

File in which to store a limited set of parameters indicative of training progress. See StatusItem.

Type:	filepath

addData(self, pos, Y, is_scarce=False, noise=PyNoiseConfig())¶

Stacks more matrices/tensors next to the main train matrix.

pos : shape: Block position of the data with respect to train. The train matrix/tensor has implicit block position (0, 0).
Y : numpy.ndarray, scipy.sparse matrix or SparseTensor: Data matrix/tensor to add
is_scarce : bool: When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.
noise : PyNoiseConfig: Noise model to use for Y
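
The effect of is_scarce on a sparse Y can be pictured with a toy example. The sketch below does not use smurff or scipy: the sparse matrix is a plain dict of stored values, and observed_cells is a hypothetical helper that lists the cells that would count as known under each setting.

```python
def observed_cells(shape, entries, is_scarce):
    """Return {(row, col): value} for the cells treated as known.

    `entries` is a toy sparse matrix: a dict holding only the stored values.
    """
    if is_scarce:
        # Missing cells are unknown, so only stored values are observations.
        return dict(entries)
    # Missing cells are zero, so every cell of the matrix is an observation.
    nrows, ncols = shape
    return {(i, j): entries.get((i, j), 0.0)
            for i in range(nrows) for j in range(ncols)}


entries = {(0, 1): 4.0, (1, 0): 2.5}
scarce = observed_cells((2, 2), entries, is_scarce=True)
full = observed_cells((2, 2), entries, is_scarce=False)
```

With is_scarce=True the 2 x 2 example has two observations; with is_scarce=False it has four, two of which are zeros.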

addPropagatedPosterior(self, mode, mu, Lambda)¶

Adds mu and Lambda from propagated posterior

mode : int: dimension to add side info (rows = 0, cols = 1)
mu : numpy.ndarray matrix: mean matrix. mu should have as many rows as num_latent and as many columns as the size of dimension mode in train.
Lambda : numpy.ndarray matrix: covariance matrix. Lambda should be shaped like K x K x N, where K == num_latent and N == the size of dimension mode in train.
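
The shape requirements for mu and Lambda can be written down as a small check. expected_posterior_shapes is a hypothetical helper, not part of smurff; it only restates the rule above for a given train shape, mode and num_latent.

```python
def expected_posterior_shapes(train_shape, mode, num_latent):
    """Return (mu_shape, Lambda_shape) as described for dimension `mode`."""
    n = train_shape[mode]  # size of the selected dimension in train
    k = num_latent
    # mu: num_latent rows, one column per element of the dimension
    # Lambda: one K x K matrix per element of the dimension
    return (k, n), (k, k, n)


# Example: a 100 x 20 train matrix, posterior for the columns (mode 1)
mu_shape, lambda_shape = expected_posterior_shapes((100, 20), mode=1, num_latent=16)
```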

addSideInfo(self, mode, Y, noise=PyNoiseConfig(), tol=1e-6, direct=False)¶

Adds fully known side info, for use with the macau or macauone prior

mode : int

dimension to add side info (rows = 0, cols = 1)

Y : :class: numpy.ndarray, scipy.sparse matrix

Side info matrix/tensor. Y should have as many rows as there are elements in the dimension selected using mode. Columns in Y are features for each element.

noise : :class: PyNoiseConfig

Noise model to use for Y

direct : boolean

When True, uses a direct inversion method.
When False, uses a CG solver

The direct method is only feasible for a small (< 100K) number of features.

tol : float

Tolerance for the CG solver.
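
The two constraints on side info, matching row count and a feature count small enough for the direct method, can be sketched as a pre-check. check_side_info and DIRECT_FEATURE_LIMIT are hypothetical names introduced here; the limit simply reuses the "< 100K" figure quoted above as a rule of thumb.

```python
DIRECT_FEATURE_LIMIT = 100_000  # the "< 100K" figure quoted above


def check_side_info(train_shape, mode, side_shape):
    """Validate a side info shape; return True if direct=True looks feasible."""
    nrows, nfeatures = side_shape
    if nrows != train_shape[mode]:
        raise ValueError(
            "side info has %d rows, but dimension %d of train has %d elements"
            % (nrows, mode, train_shape[mode]))
    # Few features: direct inversion is an option. Many: prefer the CG solver.
    return nfeatures < DIRECT_FEATURE_LIMIT


use_direct_small = check_side_info((100, 20), 0, (100, 50))
use_direct_large = check_side_info((100, 20), 1, (20, 200_000))
```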

addTrainAndTest(self, Y, Ytest=None, noise=PyNoiseConfig(), is_scarce=True)¶

Adds a train and optionally a test matrix as input data to this TrainSession

Parameters:

Y – Train matrix/tensor
Ytest (scipy.sparse matrix or SparseTensor) – Test matrix/tensor. Mainly used for calculating RMSE.
noise – Noise model to use for Y
is_scarce (bool) – When Y is sparse, and is_scarce is True the missing values are considered as unknown. When Y is sparse, and is_scarce is False the missing values are considered as zero. When Y is dense, this parameter is ignored.

getConfig(self)¶: Get this TrainSession’s configuration in ini-file format

getRmseAvg(self)¶: Average RMSE across all samples for the test matrix

getStatus(self)¶: Returns StatusItem with current state of the session

getTestPredictions(self)¶

Get predictions for test matrix.

Returns:	list of `Prediction`
Return type:	list

init(self)¶

Initializes the TrainSession after all data has been added.

You need to call this method before calling step(), unless you call run().

Returns:	`StatusItem` of the session.

makePredictSession(self)¶: Makes a PredictSession based on the model that was built in this TrainSession.

run(self)¶

Equivalent to:

self.init()
while self.step():
    pass

step(self)¶

Does one sampling or burnin iteration.

Returns:	The `StatusItem` of the session when a step was executed; None after the last iteration, when no step was executed.
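
Calling init() and step() by hand, instead of run(), lets you inspect the status after every iteration. The sketch below shows only the loop pattern. ToySession is a stand-in written for this example, not smurff.TrainSession, and it returns a plain dict where the real session returns a StatusItem.

```python
class ToySession:
    """Stand-in with the init()/step() protocol; not smurff.TrainSession."""

    def __init__(self, burnin, nsamples):
        self.burnin = burnin
        self.total = burnin + nsamples
        self.done = 0

    def init(self):
        self.done = 0
        return {"phase": "Burnin", "n": 0}

    def step(self):
        if self.done >= self.total:
            return None  # after the last iteration: no step was executed
        self.done += 1
        phase = "Burnin" if self.done <= self.burnin else "Sampling"
        return {"phase": phase, "n": self.done}


session = ToySession(burnin=2, nsamples=3)
session.init()
phases = []
while True:
    status = session.step()
    if status is None:
        break
    # With a real session, this is where you would log or plot the status.
    phases.append(status["phase"])
```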

MacauSession ¶

class smurff.MacauSession(Ytrain, Ytest=None, side_info=None, univariate=False, direct=False, **args)¶

A train session specialized for use with the Macau algorithm

Ytrain¶

Train matrix/tensor

Type:	numpy.ndarray, scipy.sparse matrix or SparseTensor

Ytest : scipy.sparse matrix or SparseTensor: Test matrix/tensor. Mainly used for calculating RMSE.
side_info : list of numpy.ndarray, scipy.sparse matrix or None: Side info matrix/tensor for each dimension. If there is no side info for a certain mode, pass None. Each side info should have as many rows as there are elements in the corresponding dimension of Ytrain.
direct : bool: Use Cholesky instead of CG solver
univariate : bool: Use univariate instead of multivariate sampling.
**args: Extra arguments are passed to the TrainSession

BPMFSession ¶

class smurff.BPMFSession(Ytrain, Ytest=None, univariate=False, **args)¶

A train session specialized for use with the BPMF algorithm

Ytrain¶

Train matrix/tensor

Type:	numpy.ndarray, scipy.sparse matrix or SparseTensor

Ytest : scipy.sparse matrix or SparseTensor: Test matrix/tensor. Mainly used for calculating RMSE.
univariate : bool: Use univariate instead of multivariate sampling.
**args: Extra arguments are passed to the TrainSession

StatusItem ¶

class smurff.StatusItem(phase, iter, phase_iter, model_norms, rmse_avg, rmse_1sample, train_rmse, auc_1sample, auc_avg, elapsed_iter, nnz_per_sec, samples_per_sec)¶

Short set of parameters indicative of the training progress.

phase¶

Type:	{ “Burnin”, “Sampling” }

iter¶

Current iteration in current phase

Type:	int

phase_iter¶

Number of iterations in this phase

Type:	int

model_norms¶

Norm of each U/V matrix

Type:	list of float

rmse_avg¶

Average RMSE for test matrix across all samples

Type:	float

rmse_1sample¶

RMSE for test matrix of last sample

Type:	float

train_rmse¶

RMSE for train matrix of last sample

Type:	float

auc_1sample¶

ROC AUC of the test matrix of the last sample. Only available if you provided a threshold.

Type:	float

auc_avg¶

Average ROC AUC of the test matrix across all samples. Only available if you provided a threshold.

Type:	float

elapsed_iter¶

Number of seconds the last sampling iteration took

Type:	float

nnz_per_sec¶

Compute performance indicator; number of non-zero elements in train processed per second

Type:	float

samples_per_sec¶

Compute performance indicator; number of rows and columns in U/V processed per second

Type:	float
