Note
This page was generated from notebooks/inference_with_smurff.ipynb.
Inference with SMURFF¶
This notebook continues from the first example. After running another training session in SMURFF, we take a closer look at how to use SMURFF to make predictions.
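To make predictions we recall that the value of a tensor model is given by a tensor contraction of all latent matrices. Specifically, the prediction for the element \(\hat{Y}_{ijk}\) of a rank-3 tensor is given by
\[\hat{Y}_{ijk} = \sum_{u=1}^{K} U^{(1)}_{u,i} \, U^{(2)}_{u,j} \, U^{(3)}_{u,k}\]
where \(K\) is the number of latent dimensions and \(U^{(m)}\) is the latent matrix of mode \(m\).
Since a matrix is a rank-2 tensor, the prediction for a matrix is given by:
\[\hat{Y}_{ij} = \sum_{u=1}^{K} U^{(1)}_{u,i} \, U^{(2)}_{u,j}\]
These inner products are computed by SMURFF automatically, as we will see below.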
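The tensor contraction described above can be checked numerically with plain numpy. This is a minimal sketch, not SMURFF code: the latent matrices below are random stand-ins, and their layout (latent dimension first, so shape `(K, size of the mode)`) is an assumption chosen to match how the latents are indexed in the `einsum` cell later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of latent dimensions (hypothetical value)

# Random stand-ins for the latent matrices of a 5 x 6 x 7 tensor model.
U1 = rng.normal(size=(K, 5))
U2 = rng.normal(size=(K, 6))
U3 = rng.normal(size=(K, 7))

# Rank-3 model: contract all three latent matrices over the latent index u.
Yhat_tensor = np.einsum("ui,uj,uk->ijk", U1, U2, U3)

# Rank-2 model: the same contraction with only two latent matrices.
Yhat_matrix = np.einsum("ui,uj->ij", U1, U2)

# A single element, written out as the explicit sum over u.
i, j, k = 2, 3, 4
single = sum(U1[u, i] * U2[u, j] * U3[u, k] for u in range(K))

print(Yhat_tensor.shape, Yhat_matrix.shape)
print(np.isclose(Yhat_tensor[i, j, k], single))
```

For the matrix case the contraction is simply `U1.T @ U2`, which is a convenient sanity check.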
Saving models¶
We run a Macau training session using side information (ecfp) from the chembl dataset. With save_freq = 1 every sample is saved, so that we can load the model afterwards. This run takes a few minutes.
[ ]:
import smurff
import os
ic50_train, ic50_test, ecfp = smurff.load_chembl()
os.makedirs("ic50-macau", exist_ok=True)
session = smurff.MacauSession(
    Ytrain=ic50_train,
    Ytest=ic50_test,
    side_info=[ecfp, None],
    num_latent=16,
    burnin=200,
    nsamples=10,
    save_freq=1,
    save_prefix="ic50-macau",
    verbose=1,
)
predictions = session.run()
Saved files¶
The saved files are indexed in a root ini-file; in this case the root ini-file is ic50-macau/root.ini. This file lists everything that was saved for this training run. For example:
[options]
options = ic50-save-options.ini
[steps]
sample_step_10 = sample-10-step.ini
sample_step_20 = sample-20-step.ini
sample_step_30 = sample-30-step.ini
sample_step_40 = sample-40-step.ini
Each step ini-file contains the matrices saved in the step:
[models]
num_models = 2
model_0 = sample-50-U0-latents.ddm
model_1 = sample-50-U1-latents.ddm
[predictions]
pred = sample-50-predictions.csv
pred_state = sample-50-predictions-state.ini
[priors]
num_priors = 2
prior_0 = sample-50-F0-link.ddm
prior_1 = sample-50-F1-link.ddm
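To get a feel for this layout, the index can be inspected with Python's standard `configparser`. This is only a sketch: it parses an inline copy of the example listing above rather than a real file, and whether the files actually written by SMURFF parse cleanly with `configparser` is an assumption (for instance, any keys outside a section would need extra handling).

```python
import configparser

# Inline copy of (part of) the example root ini-file shown above.
root_ini_text = """
[options]
options = ic50-save-options.ini
[steps]
sample_step_10 = sample-10-step.ini
sample_step_20 = sample-20-step.ini
"""

root = configparser.ConfigParser()
root.read_string(root_ini_text)

# Map each saved step to the ini-file that describes it.
step_files = dict(root["steps"])
for name, path in step_files.items():
    print(name, "->", path)
```

In practice there is no need to parse these files by hand; the PredictSession described in the following sections loads them for us.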
Making predictions from a TrainSession¶
The easiest way to make predictions is from an existing TrainSession:
[ ]:
predictor = session.makePredictSession()
print(predictor)
Once we have a PredictSession, there are several ways to make predictions:
- From a sparse matrix
- For all possible elements in the matrix (the complete \(U \times V\))
- For a single point in the matrix
- Using only side-information
Predict all elements¶
We can make predictions for all rows \(\times\) columns in our matrix:
[ ]:
p = predictor.predict_all()
print(p.shape) # p is a numpy array of size: (num samples) x (num rows) x (num columns)
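Since the first axis of this array indexes the samples, a point estimate and an uncertainty estimate for every cell can be obtained by reducing over that axis. The sketch below uses a small random array as a stand-in for the output of `predict_all()`; only the axis layout described in the comment above is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for predict_all(): (num samples) x (num rows) x (num columns).
p = rng.normal(size=(10, 4, 3))

# Average over the sample axis to get one prediction per matrix cell.
p_mean = p.mean(axis=0)

# The spread across samples gives a rough per-cell uncertainty.
p_std = p.std(axis=0)

print(p_mean.shape, p_std.shape)
```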
Predict element in a sparse matrix¶
We can make predictions for a sparse matrix, for example our ic50_test matrix:
[ ]:
p = predictor.predict_some(ic50_test)
print(len(p),"predictions") # p is a list of Predictions
print("predictions 1:", p[0])
Predict just one element¶
Or just one element. Let’s predict the first element of our ic50_test matrix:
[ ]:
from scipy.sparse import find
(i,j,v) = find(ic50_test)
p = predictor.predict_one((i[0],j[0]),v[0])
print(p)
And plot the histogram of predictions for this element.
[ ]:
%matplotlib inline
import matplotlib.pyplot as plt
# Plot a histogram of the samples.
plt.subplot(111)
plt.hist(p.pred_all, bins=10, density=True, label='histogram of predictions')
plt.plot(p.val, 1., 'ro', markersize =5, label = 'actual value')
plt.legend()
plt.title('Histogram of ' + str(len(p.pred_all)) + ' predictions')
plt.show()
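The same per-sample predictions can also be summarised numerically. The following sketch computes the mean and a central 95% interval with numpy; the values in `pred_all` and `actual` are made up for illustration and merely stand in for `p.pred_all` and `p.val` from the cell above.

```python
import numpy as np

# Hypothetical per-sample predictions for one matrix element.
pred_all = np.array([6.1, 5.8, 6.4, 6.0, 5.9, 6.3, 6.2, 5.7, 6.1, 6.0])
actual = 6.2  # hypothetical measured value

# Point estimate: the average over all samples.
mean = pred_all.mean()

# Central 95% interval of the sampled predictions.
lower, upper = np.percentile(pred_all, [2.5, 97.5])

# Does the measured value fall inside the interval?
inside = bool(lower <= actual <= upper)
print(f"mean={mean:.2f}, interval=[{lower:.2f}, {upper:.2f}], inside={inside}")
```

With only a handful of samples such an interval is coarse; more samples give a smoother picture of the predictive distribution.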
Make predictions using side information¶
We can make predictions for rows/columns not in our train matrix, using only side info:
[ ]:
import numpy as np
from scipy.sparse import find
(i,j,v) = find(ic50_test)
row_side_info = ecfp.tocsr().getrow(i[0])
p = predictor.predict_one((row_side_info,j[0]),v[0])
print(p)
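As a rough illustration of why this works (not SMURFF's internal code): in a Macau-style model the latent vector of a row is tied to its side information through a link matrix, so a latent vector for an unseen row can be estimated from its features alone and then contracted with the column latents. All names and shapes below (`beta`, `mu`, `V`, `x_new`) are hypothetical and the values are random.

```python
import numpy as np

rng = np.random.default_rng(0)
K, nfeat, ncols = 4, 20, 6

beta = rng.normal(size=(nfeat, K))  # hypothetical link matrix: features -> latents
mu = rng.normal(size=K)             # hypothetical mean of the row latents
V = rng.normal(size=(K, ncols))     # stand-in for the column latent matrix

# Binary, fingerprint-like feature vector of a row that was not in the train matrix.
x_new = (rng.random(nfeat) < 0.2).astype(float)

# Expected latent vector of the new row, derived from side information only.
u_new = mu + x_new @ beta

# Predictions for the new row against every column.
yhat_row = u_new @ V
print(yhat_row.shape)
```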
Accessing the saved model itself¶
The latent matrices for all samples are stored in the PredictSession as numpy arrays:
[ ]:
# print the U matrices for all samples
for i, s in enumerate(predictor.samples):
    print("sample", i, ":", [(m, u.shape) for m, u in enumerate(s.latents)])
This allows us to compute predictions for arbitrary slices of the matrix or tensor using numpy.einsum:
[ ]:
sample1 = predictor.samples[0]
(U1, U2) = sample1.latents
## predict the slice Y[7, : ] from sample 1
Yhat_7x = np.einsum(U1[:,7], [0], U2, [0, 2])
## predict the slice Y[:, 0:10] from sample 1
Yhat_x10 = np.einsum(U1, [0, 1], U2[:,0:10], [0, 2])
The first example gives a vector (one row of the matrix) as a result; the second gives a matrix (rank-2 tensor). Both use a single sample only. It is advised to make predictions on all samples and average the predictions.
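Averaging over samples can be sketched as follows. The list `samples` holds random stand-ins for the per-sample latent pairs, so the snippet runs without a trained model; with a real PredictSession one would presumably loop over `predictor.samples` and unpack each sample's `latents` in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
K, nrows, ncols, nsamples = 4, 8, 12, 5

# Stand-ins for the (U1, U2) latent pair of every saved sample.
samples = [
    (rng.normal(size=(K, nrows)), rng.normal(size=(K, ncols)))
    for _ in range(nsamples)
]

# Predict the slice Y[7, :] separately for each sample ...
per_sample = np.stack(
    [np.einsum(U1[:, 7], [0], U2, [0, 2]) for U1, U2 in samples]
)

# ... then average the predictions (not the latent matrices) over the samples.
Yhat_7x_avg = per_sample.mean(axis=0)
Yhat_7x_std = per_sample.std(axis=0)
print(per_sample.shape, Yhat_7x_avg.shape)
```

Note that the average is taken over the predictions. Averaging the latent matrices first and multiplying afterwards would in general give a different result, because the prediction is a product of the latents.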
Making predictions from saved run¶
One can also make a PredictSession from a saved root ini-file:
[ ]:
import smurff
predictor = smurff.PredictSession("ic50-macau/save-root.ini")
print(predictor)