Quickstart

The examples folder contains scripts that show, step by step, how the NXTfusion library can be used on both synthetic and real data.

Example1: single (nonlinear) matrix factorization

The file examples/example1.py contains the simplest example of how NXTfusion can be used. We use numpy to randomly generate a real-valued matrix of shape (100, 1000) and assume it represents the affinity between proteins (represented by the protein entity protEnt) and compounds/drugs (represented by drugEnt).

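import numpy as np
from NXTfusion import NXTfusion as NX  # module path inferred from the NXTfusion.NXTfusion.* references on this page
# Random affinities in [0, 1) between 100 proteins and 1000 compounds
protDrugMat = np.random.rand(100, 1000)
protEnt = NX.Entity("proteins", list(range(0,100)), np.int16)
drugEnt = NX.Entity("compounds", list(range(0,1000)), np.int16)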

We then wrap the numpy.ndarray matrix in a NXTfusion.DataMatrix.DataMatrix object, which stores the matrix/relation data in a form suitable for minibatching in a Neural Network (NN). As the module details show, NXTfusion.DataMatrix.DataMatrix has several constructors; here, the one that processes a numpy.ndarray matrix is called automatically.

protDrugMat = DM.DataMatrix("protDrugMatrix", protEnt, drugEnt, protDrugMat)
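
To give an intuition for what "suitable for minibatching" can mean, the sketch below flattens a dense matrix into (row, column, value) triples and iterates over them in shuffled batches. It uses only numpy and is not the NXTfusion.DataMatrix.DataMatrix implementation; the helper names to_triples and iter_minibatches are hypothetical.

```python
import numpy as np

def to_triples(mat):
    """Flatten a dense 2D matrix into parallel arrays of row ids, column ids and values."""
    rows, cols = np.indices(mat.shape)
    return rows.ravel(), cols.ravel(), mat.ravel()

def iter_minibatches(rows, cols, vals, batch_size, rng):
    """Yield shuffled (rows, cols, vals) batches that cover every cell exactly once."""
    order = rng.permutation(len(vals))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield rows[idx], cols[idx], vals[idx]

rng = np.random.default_rng(0)
mat = rng.random((100, 1000))          # same shape as the matrix in this example
rows, cols, vals = to_triples(mat)
batches = list(iter_minibatches(rows, cols, vals, batch_size=4096, rng=rng))
```

Each batch carries the identifiers of the two entities involved in a cell together with the observed value, which is the information a network needs to look up two embeddings and compare its output with the target.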

Next, we define a loss function suitable for this relation. Since we generated real values, the task of factorizing this relation is a regression.

protDrugLoss = L.LossWrapper(t.nn.MSELoss(), type="regression", ignore_index = IGNORE_INDEX)

The ignore_index tells the NN which values should be ignored when computing the loss. This makes it possible to train on partially observed (sparse) matrices.
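
The effect of an ignore value on a regression loss can be sketched in plain numpy as follows. This is only an illustration of the idea, not the LossWrapper code: the sentinel value -999.0 and the function name masked_mse are assumptions made for this example.

```python
import numpy as np

IGNORE_INDEX = -999.0  # hypothetical sentinel marking unobserved cells

def masked_mse(pred, target, ignore_index=IGNORE_INDEX):
    """Mean squared error over the cells whose target is not the sentinel."""
    mask = target != ignore_index
    if not mask.any():
        return 0.0  # nothing observed, nothing to penalise
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))

# One of the four cells is unobserved and must not contribute to the loss.
target = np.array([[1.0, IGNORE_INDEX], [3.0, 4.0]])
pred = np.array([[1.0, 100.0], [2.0, 4.0]])
loss = masked_mse(pred, target)
```

Here the prediction of 100.0 for the unobserved cell has no influence: the loss is averaged over the three observed cells only.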

After that, we just need to build the Entity-Relation graph (ERgraph) we have in mind using the APIs provided by NXTfusion. To do so, we first define a NXTfusion.NXTfusion.MetaRelation “prot-drug” that will contain all the relations between those entities.

We then append the actual NXTfusion.NXTfusion.Relation object (represented by the protDrugMat object) to this NXTfusion.NXTfusion.MetaRelation with the NXTfusion.NXTfusion.MetaRelation.append() method. In the classic Matrix Factorization setting, only one matrix is considered, meaning that there will be only one relation between the two entities.

protDrugRel = NX.MetaRelation("prot-drug", protEnt, drugEnt, None, None)
protDrugRel.append(NX.Relation("drugInteraction", protEnt, drugEnt, protDrugMat, "regression", protDrugLoss, relationWeight=1))
ERgraph = NX.ERgraph([protDrugRel])

In this case the NXTfusion.NXTfusion.ERgraph is thus formed by a single NXTfusion.NXTfusion.MetaRelation containing only one NXTfusion.NXTfusion.Relation, and it is created as shown above.
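
The nesting described here (a graph holding meta-relations, each holding one or more relations) can be pictured with a few plain-Python stand-ins. The classes SketchRelation and SketchMetaRelation and the function find_relation are hypothetical and only mimic the shape of the structure; they are not the NXTfusion classes.

```python
from dataclasses import dataclass, field

@dataclass
class SketchRelation:
    name: str
    task: str             # e.g. "regression"
    weight: float = 1.0

@dataclass
class SketchMetaRelation:
    name: str
    domain: str           # name of the first entity
    codomain: str         # name of the second entity
    relations: list = field(default_factory=list)

    def append(self, relation):
        self.relations.append(relation)

meta = SketchMetaRelation("prot-drug", "proteins", "compounds")
meta.append(SketchRelation("drugInteraction", "regression"))
graph = [meta]  # in this sketch the graph is just a list of meta-relations

def find_relation(graph, meta_name, rel_name):
    """Look up one relation by the name of its meta-relation and its own name."""
    for m in graph:
        if m.name == meta_name:
            for r in m.relations:
                if r.name == rel_name:
                    return r
    raise KeyError((meta_name, rel_name))

found = find_relation(graph, "prot-drug", "drugInteraction")
```

With a single relation the two-level lookup looks redundant, but the same addressing scheme still works when several meta-relations, each with several relations, share the same entities.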

The next step is to define a NN model able to perform inference over this simple graph. We provide such a pytorch NN as example1Model. We pass this model to the NXTfusion.NXmultiRelSide.NNwrapper object, which mediates the interaction between the NN object and the NXTfusion.NXTfusion.ERgraph in a way that is transparent to the user.

model = example1Model(ERgraph, "mod1")
wrapper = NNwrapper(model, dev = DEVICE, ignore_index = IGNORE_INDEX)
wrapper.fit(ERgraph, epochs=50)

The NNwrapper has the scikit-learn-inspired NXTfusion.NXmultiRelSide.NNwrapper.fit() and NXTfusion.NXmultiRelSide.NNwrapper.predict() methods, which are the only ones the user needs to interact with. The NXTfusion.NXmultiRelSide.NNwrapper.fit() method trains the example1Model NN to factorize the NXTfusion.NXTfusion.ERgraph.
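
What "training a model to factorize a matrix" means can be illustrated without the library. The sketch below is a deliberately simplified stand-in: it performs plain linear matrix factorization with full-batch gradient descent in numpy, whereas example1Model is a (nonlinear) pytorch network. The matrix sizes, embedding dimension, learning rate and number of epochs are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prot, n_drug, dim = 20, 30, 5

# Synthetic low-rank target, so that an exact factorization exists.
target = rng.normal(size=(n_prot, dim)) @ rng.normal(size=(dim, n_drug))

# One embedding vector per protein and per compound, randomly initialised.
prot_emb = 0.1 * rng.normal(size=(n_prot, dim))
drug_emb = 0.1 * rng.normal(size=(n_drug, dim))

def mse(prot_emb, drug_emb, target):
    """Mean squared error between the reconstructed and the target matrix."""
    return float(np.mean((prot_emb @ drug_emb.T - target) ** 2))

initial_loss = mse(prot_emb, drug_emb, target)

lr = 0.001
for epoch in range(2000):
    err = prot_emb @ drug_emb.T - target   # residual for every cell
    grad_prot = err @ drug_emb             # gradient of 0.5 * sum(err ** 2) w.r.t. prot_emb
    grad_drug = err.T @ prot_emb           # gradient of 0.5 * sum(err ** 2) w.r.t. drug_emb
    prot_emb -= lr * grad_prot
    drug_emb -= lr * grad_drug

final_loss = mse(prot_emb, drug_emb, target)
```

After training, the product of the two embedding tables approximates the target matrix, so final_loss is lower than initial_loss. A neural model replaces the dot product with a learned nonlinear function of the two embeddings, but the training objective has the same form.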

To obtain predictions from the trained model, we use the NXTfusion.NXmultiRelSide.NNwrapper.predict() method. To tell the NXTfusion.NXmultiRelSide.NNwrapper which cells of the matrix/Relation we are interested in, we need to build a special “input vector” X. Here we want to predict the entire matrix, to check that training converged, so we use the buildPytorchFeats function to convert the whole matrix into a format that NXTfusion.NXmultiRelSide.NNwrapper.predict() understands.

X, Y, corresp = buildPytorchFeats(protDrugMat)
Yp = wrapper.predict(ERgraph, X, "prot-drug", "drugInteraction", None, None)

We then call predict() to obtain the model’s predictions for the requested positions (X) of the NXTfusion.NXTfusion.Relation “drugInteraction” within the NXTfusion.NXTfusion.MetaRelation “prot-drug” in the NXTfusion.NXTfusion.ERgraph. Specifying which NXTfusion.NXTfusion.Relation and NXTfusion.NXTfusion.MetaRelation should be predicted seems unnecessary here, where only one NXTfusion.NXTfusion.Relation exists, but it becomes important when you want to predict a specific relation in a larger ER graph.
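
The idea of requesting predictions for specific positions, and of mapping a flat list of predictions back onto the matrix, can be sketched in numpy as follows. The format actually produced by buildPytorchFeats is not reproduced here; all_cell_pairs and to_matrix are hypothetical helpers, and the predictions are stand-in values rather than model output.

```python
import numpy as np

def all_cell_pairs(n_rows, n_cols):
    """Return an (n_rows * n_cols, 2) array listing every (row, column) index pair."""
    rows, cols = np.indices((n_rows, n_cols))
    return np.stack([rows.ravel(), cols.ravel()], axis=1)

def to_matrix(pairs, preds, shape):
    """Scatter one prediction per (row, column) pair into a dense matrix.

    Cells that were not requested stay NaN.
    """
    out = np.full(shape, np.nan)
    out[pairs[:, 0], pairs[:, 1]] = preds
    return out

pairs = all_cell_pairs(3, 4)
fake_preds = np.arange(len(pairs), dtype=float)   # stand-in for model output
pred_matrix = to_matrix(pairs, fake_preds, (3, 4))

# Requesting only a subset of positions leaves the remaining cells empty.
partial = to_matrix(pairs[:2], fake_preds[:2], (3, 4))
```

Keeping the list of requested pairs alongside the predictions is what allows a flat output vector to be compared cell by cell with the original matrix.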