Preprocessing and Input Files
This section covers sparse tensors, splitting data into train and test sets, scaling and centering, and the example ChEMBL dataset.
Sparse Tensors
class smurff.SparseTensor(data, shape=None)
Wrapper around a pandas DataFrame to represent a sparse tensor.
The DataFrame should have N index columns (int type) and 1 value column (float type), where N is the dimensionality of the tensor.
You can also specify the shape of the tensor. If you don't, it is detected automatically.
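As an illustration of the expected DataFrame layout, here is a minimal sketch for a 3-way tensor. The column names (`i`, `j`, `k`, `value`) are hypothetical, and the shape rule shown (largest index per dimension plus one) is an assumption about how automatic detection might work, not a statement of what smurff does internally.

```python
import numpy as np
import pandas as pd

# Three observed cells of a 3-way tensor: N = 3 integer index columns
# followed by one float value column. Column names are hypothetical.
df = pd.DataFrame({
    "i": np.array([0, 1, 1], dtype=np.int64),
    "j": np.array([2, 0, 1], dtype=np.int64),
    "k": np.array([3, 3, 0], dtype=np.int64),
    "value": np.array([0.5, 1.25, -2.0], dtype=np.float64),
})

index_cols = ["i", "j", "k"]

# One plausible way to detect a shape: largest index per dimension, plus one.
# This is an assumption for illustration, not necessarily smurff's rule.
inferred_shape = tuple(int(df[c].max()) + 1 for c in index_cols)

# With smurff installed, the wrapper would then be built as:
#   tensor = smurff.SparseTensor(df)                    # shape detected
#   tensor = smurff.SparseTensor(df, shape=(2, 3, 4))   # shape given explicitly
```

Passing the shape explicitly is the safer choice when trailing rows or columns of the tensor may have no observed cells, since those cannot be recovered from the indices alone.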
Split Train and Test
smurff.make_train_test(Y, ntest)
Splits a sparse matrix Y into a train and a test matrix.
Parameters:
- Y (scipy sparse matrix: coo_matrix, csr_matrix or csc_matrix) – Matrix to split
- ntest (float < 1.0 or integer) – size of the test set:
  - if float, the ratio of test cells
  - if integer, the number of test cells
Returns:
- Ytrain (coo_matrix) – train part
- Ytest (coo_matrix) – test part
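To make the semantics of `ntest` concrete, the following is a rough sketch of a cell-wise split written with SciPy only. The helper `split_train_test` and its `seed` parameter are hypothetical and are not smurff's implementation. In particular, it assumes that a float `ntest` is a ratio of the stored (non-zero) cells.

```python
import numpy as np
import scipy.sparse as sp

def split_train_test(Y, ntest, seed=None):
    """Hypothetical sketch of a cell-wise train/test split (not smurff's code)."""
    Y = sp.coo_matrix(Y)
    rng = np.random.default_rng(seed)
    # Assumption: a float is a ratio of the stored cells, an integer is a count.
    if isinstance(ntest, float):
        ntest = int(round(ntest * Y.nnz))
    # Mark ntest randomly chosen stored cells as test cells.
    test_mask = np.zeros(Y.nnz, dtype=bool)
    test_mask[rng.choice(Y.nnz, size=ntest, replace=False)] = True

    def pick(mask):
        # Build a matrix of the original shape holding only the selected cells.
        return sp.coo_matrix(
            (Y.data[mask], (Y.row[mask], Y.col[mask])), shape=Y.shape
        )

    return pick(~test_mask), pick(test_mask)

# 20 x 10 matrix with 100 stored cells; hold out 20% of them.
Y = sp.random(20, 10, density=0.5, format="csr", random_state=0)
Ytrain, Ytest = split_train_test(Y, 0.2, seed=0)
```

Both parts keep the shape of `Y`, and every stored cell ends up in exactly one of them, so `Ytrain + Ytest` reproduces the original matrix.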
Scaling and Centering
smurff.center.center_and_scale(m, mode, with_mean=True, with_std=True)
Center the matrix m on its mean and/or scale it by its standard deviation.
Parameters:
- m ({array-like, sparse matrix}) – The data to center and scale.
- mode ({"rows", "cols", "global"}) – how the mean and standard deviation are computed:
  - "rows": center/scale each row independently
  - "cols": center/scale each column independently
  - "global": center/scale using the global mean and/or standard deviation
- with_mean (boolean, True by default) – If True, center the data before scaling.
- with_std (boolean, True by default) – If True, scale the data to unit variance (or equivalently, unit standard deviation).
Returns:
- m (array-like) – Transformed array.
- mean (array-like or double or None) – Computed mean depending on mode
- std (array-like or double or None) – Computed standard deviation depending on mode
Notes
Also supports scaling of sparse matrices. This makes sense only when the matrix is scarce, i.e. when the zero-elements represent unknown values.
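The three modes can be illustrated with a dense-only NumPy sketch. The function `center_and_scale_dense` below is hypothetical and is not smurff's implementation: it ignores sparse input, and the exact shapes of the returned `mean` and `std` are an assumption.

```python
import numpy as np

def center_and_scale_dense(m, mode, with_mean=True, with_std=True):
    """Hypothetical dense-only sketch of row/column/global centering and scaling."""
    m = np.asarray(m, dtype=float)
    # Axis along which the statistics are computed for each mode.
    axis = {"rows": 1, "cols": 0, "global": None}[mode]
    keep = axis is not None  # keep dims so the result broadcasts against m
    mean = m.mean(axis=axis, keepdims=keep) if with_mean else None
    if with_mean:
        m = m - mean
    std = m.std(axis=axis, keepdims=keep) if with_std else None
    if with_std:
        m = m / std  # note: a constant row/column has std 0 and is not handled
    return m, mean, std

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
Xc, mean, std = center_and_scale_dense(X, "rows")
```

With `mode="rows"`, each row of `Xc` has zero mean and unit standard deviation, and `mean` and `std` hold one value per row. With `mode="global"`, a single scalar mean and standard deviation are used for the whole matrix.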
Example ChEMBL dataset
smurff.load_chembl()
Downloads a small subset of the ChEMBL dataset.
Returns:
- ic50_train (sparse matrix) – sparse train matrix
- ic50_test (sparse matrix) – sparse test matrix
- feat (sparse matrix) – sparse row features