Example 2: tensor factorization

The examples/example2.py file contains a simple script performing tensor factorization, namely inference over multiple NXTfusion.NXTfusion.Relation objects defined between two NXTfusion.NXTfusion.Entity objects.
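
As in example1, the script relies on a few aliases and constants. The following minimal preamble is a sketch: the module paths are taken from the fully qualified names used on this page, while the exact values of IGNORE_INDEX and DEVICE are placeholders (see examples/example2.py for the actual ones).

import numpy as np
import torch as t

from NXTfusion import NXTfusion as NX
from NXTfusion import DataMatrix as DM
from NXTfusion import NXLosses as L
from NXTfusion.NXmultiRelSide import NNwrapper

IGNORE_INDEX = -1  # sentinel marking unobserved entries (assumed value)
DEVICE = "cpu"     # or "cuda" if a GPU is available (assumed)

The buildPytorchFeats helper and the example2Model class used further below are defined or imported in examples/example2.py itself.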

We start by defining the same entities used in examples/example1.py.

protEnt = NX.Entity("proteins", list(range(0,100)), np.int16)
drugEnt = NX.Entity("compounds", list(range(0,1000)), np.int16)

Then we create three random matrices that will define the three different relations between protEnt and drugEnt, and we wrap each of them in the NXTfusion.DataMatrix.DataMatrix format, which allows optimized mini-batching during training.

protDrugMat1 = np.random.rand(100, 1000)
protDrugMat2 = np.random.rand(100, 1000)
protDrugMat3 = np.random.rand(100, 1000)
protDrugMat1 = DM.DataMatrix("protDrugMatrix1", protEnt, drugEnt, protDrugMat1)
protDrugMat2 = DM.DataMatrix("protDrugMatrix2", protEnt, drugEnt, protDrugMat2)
protDrugMat3 = DM.DataMatrix("protDrugMatrix3", protEnt, drugEnt, protDrugMat3)

Since we have three relations, and they might correspond to different prediction tasks (e.g. regression or classification), we define one loss function for each NXTfusion.NXTfusion.Relation. As an example, here we use three different regression losses provided by PyTorch.

We encapsulate each of them in the NXTfusion.NXLosses.LossWrapper class: this makes the losses ignore cells marked with the ignore_index value, thus allowing fast (batched) inference over sparsely observed matrices (matrices/Relations with missing values); see the sketch after the following code block.

protDrugLoss1 = L.LossWrapper(t.nn.MSELoss(), type="regression", ignore_index = IGNORE_INDEX)
protDrugLoss2 = L.LossWrapper(t.nn.L1Loss(), type="regression", ignore_index = IGNORE_INDEX)
protDrugLoss3 = L.LossWrapper(t.nn.SmoothL1Loss(), type="regression", ignore_index = IGNORE_INDEX)
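
To see what ignore_index buys us, consider a sparsely observed matrix: cells that were never measured can be set to IGNORE_INDEX before wrapping the matrix in a DataMatrix, and the wrapped losses will simply skip those cells during training. A minimal sketch, assuming the IGNORE_INDEX sentinel from the preamble (this snippet is not part of the original script):

# sketch: mark ~30% of the cells as unobserved
sparseProtDrugMat = np.random.rand(100, 1000)
missingMask = np.random.rand(100, 1000) < 0.3
sparseProtDrugMat[missingMask] = IGNORE_INDEX  # LossWrapper will skip these cells
sparseProtDrugDM = DM.DataMatrix("protDrugMatrixSparse", protEnt, drugEnt, sparseProtDrugMat)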

We then build the ER graph using the NXTfusion API. We first define the NXTfusion.NXTfusion.MetaRelation that will contain all the relations between the protEnt and drugEnt entities, and we add the NXTfusion.NXTfusion.Relation objects to it one by one. Finally, we instantiate the NXTfusion.NXTfusion.ERgraph object, which will contain the MetaRelation.

protDrugRel = NX.MetaRelation("prot-drug", protEnt, drugEnt, None, None)
protDrugRel.append(NX.Relation("drugInteraction1", protEnt, drugEnt, protDrugMat1, "regression", protDrugLoss1, relationWeight=1))
protDrugRel.append(NX.Relation("drugInteraction2", protEnt, drugEnt, protDrugMat2, "regression", protDrugLoss2, relationWeight=1))
protDrugRel.append(NX.Relation("drugInteraction3", protEnt, drugEnt, protDrugMat3, "regression", protDrugLoss3, relationWeight=1))
ERgraph = NX.ERgraph([protDrugRel])
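
The relationWeight argument presumably scales the contribution of each Relation to the joint training objective (all three relations are weighted equally here). If, for instance, the third matrix were noisier, it could be down-weighted by replacing the last append above with the following line (an illustrative variant, not part of the original script):

protDrugRel.append(NX.Relation("drugInteraction3", protEnt, drugEnt, protDrugMat3, "regression", protDrugLoss3, relationWeight=0.5))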

We perform training as usual: we define a t.nn.Module suitable for the target ERgraph and encapsulate it in the NNwrapper. We can then use the .fit() and .predict() methods to train and test the model.

model = example2Model(ERgraph, "mod2")
wrapper = NNwrapper(model, dev = DEVICE, ignore_index = IGNORE_INDEX)
wrapper.fit(ERgraph, epochs=5)

Since the ERgraph contains multiple relations, we can predict each of them separately. The following code shows how. First, we build the X values for the NXTfusion.NXTfusion.Relation we want to predict, and then we pass to the .predict() method the names of the target MetaRelation and of the NXTfusion.NXTfusion.Relation within it. The NXTfusion.NXmultiRelSide.NNwrapper.predict() method returns the predictions for the specified relation, or raises an error if it is not present.

X, Y, corresp = buildPytorchFeats(protDrugMat2)
Yp = wrapper.predict(ERgraph, X, "prot-drug", "drugInteraction2", None, None)
print("Final MSE: ", (np.sum((np.array(Yp) - np.array(Y))**2))/float(len(Yp)))

X, Y, corresp = buildPytorchFeats(protDrugMat3)
Yp = wrapper.predict(ERgraph, X, "prot-drug", "drugInteraction3", None, None)
print("Final MSE: ", (np.sum((np.array(Yp) - np.array(Y))**2))/float(len(Yp)))