NXTfusion.DataMatrix module

class NXTfusion.DataMatrix.DataMatrix

Bases: object

The input data should be a dict of the form {(ent1, ent2): value}, with one entry for each observed element of the matrix.

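Internally, the DataMatrix object stores the data in the following format: featsHT = {domain1Name_numeric: [numpy16_domain2Names_numeric, numpyX_labels]}. That is, each numeric index of an ent1 object maps to a numpy array with the numeric indices of the ent2 objects it is related to and a numpy array with the corresponding values.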
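The featsHT layout can be illustrated with plain NumPy. The sketch below assumes that the ent1 and ent2 objects have already been mapped to numeric indices and that each ent1 index maps to [array of ent2 indices, array of values]. The helper to_feats_ht and its parameters are hypothetical names, not part of the NXTfusion API.

```python
import numpy as np
from collections import defaultdict


def to_feats_ht(data, ent2_index_dtype=np.int16, label_dtype=np.float32):
    """Group {(i, j): value} entries by the ent1 index i.

    Hypothetical helper: returns {i: [ent2 indices, values]}, mirroring the
    featsHT layout described above (the exact semantics are an assumption).
    """
    rows = defaultdict(lambda: ([], []))
    for (i, j), value in data.items():
        cols, vals = rows[i]
        cols.append(j)      # numeric index of the ent2 object
        vals.append(value)  # observed value (label)
    return {
        i: [np.array(cols, dtype=ent2_index_dtype), np.array(vals, dtype=label_dtype)]
        for i, (cols, vals) in rows.items()
    }


# Three observed entries of a 2 x 3 relation, already mapped to numeric indices.
data = {(0, 0): 1.0, (0, 2): 0.5, (1, 1): -2.0}
feats_ht = to_feats_ht(data)
print(feats_ht[0][0].tolist())  # [0, 2]
print(feats_ht[0][1].tolist())  # [1.0, 0.5]
```

Grouping by the ent1 index makes it cheap to fetch all observed entries of one row, which is the access pattern this layout appears to be designed for.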

__init__(self, name: str, ent1: Entity, ent2: Entity, data: dict, dtype: numpy.dtype)

One of the alternative constructors for the DataMatrix class.

Parameters
  • name (str) – Name of the data matrix

  • ent1 (Entity) – Entity object representing the objects on dimension 0

  • ent2 (Entity) – Entity object representing the objects on dimension 1

  • data (dict) – Hash table containing the observed (sparse) elements of the matrix describing the relation between ent1 and ent2, in the format {(ent1, ent2): value}.

  • dtype (numpy.dtype) – The smallest type that can store the elements of the matrix (e.g. np.int16, whose non-negative range can index up to 2^15 = 32768 unique objects in the entity)
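Assuming dtype is used for the numeric object indices (as the np.int16 example suggests), the smallest suitable signed integer type could be chosen as in the following sketch. Indices run from 0 to n - 1, so only the non-negative part of a signed type's range is usable: np.int16 covers indices up to 32767. The helper smallest_index_dtype is hypothetical and not part of NXTfusion.

```python
import numpy as np


def smallest_index_dtype(n_objects):
    """Return the smallest signed integer type able to index n_objects items.

    Hypothetical helper: indices go from 0 to n_objects - 1, so only the
    non-negative half of each signed type's range is usable.
    """
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        if n_objects - 1 <= np.iinfo(candidate).max:
            return candidate
    raise ValueError("too many objects for a 64-bit index")


print(np.dtype(smallest_index_dtype(100)).name)    # int8
print(np.dtype(smallest_index_dtype(20000)).name)  # int16
print(np.dtype(smallest_index_dtype(40000)).name)  # int32
```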

__init__(self, name: str, ent1: Entity, ent2: Entity, data: numpy.ndarray, dtype: numpy.dtype)

One of the alternative constructors for the DataMatrix class.

Parameters
  • name (str) – Name of the data matrix

  • ent1 (Entity) – Entity object representing the objects on dimension 0

  • ent2 (Entity) – Entity object representing the objects on dimension 1

  • data (numpy.ndarray) – Dense numpy matrix describing the relation between ent1 and ent2.

  • dtype (numpy.dtype) – The smallest possible type that could be used to store the elements of the matrix (e.g. np.int16)

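The dense and sparse constructors describe the same relation in two formats. A minimal sketch of converting a dense matrix into the {(ent1, ent2): value} dict format, assuming (for illustration only) that NaN marks unobserved entries and that rows and columns are identified by their numeric indices. dense_to_dict is a hypothetical helper, not an NXTfusion function.

```python
import numpy as np


def dense_to_dict(matrix):
    """Convert a dense 2-D array into the {(row, col): value} format.

    Assumption for illustration: NaN marks an unobserved entry.
    """
    rows, cols = np.nonzero(~np.isnan(matrix))  # indices of observed entries
    return {(int(i), int(j)): float(matrix[i, j]) for i, j in zip(rows, cols)}


m = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, np.nan]])
print(dense_to_dict(m))  # {(0, 0): 1.0, (0, 2): 3.0, (1, 1): 5.0}
```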

__init__(self, name: str, data: numpy.ndarray, dtype: numpy.dtype)

Simplified constructor for the DataMatrix class. Entities are inferred from the dimensionality of the np.ndarray.

Parameters
  • name (str) – Name of the data matrix

  • data (numpy.ndarray) – Dense numpy matrix describing the relation; the two entities are inferred from its shape.

  • dtype (numpy.dtype) – The smallest type that can store the elements of the matrix (e.g. np.int16, whose non-negative range can index up to 2^15 = 32768 unique objects in the entity)

__init__(self, path: str)

Constructor that reads the DataMatrix from a previously serialized DataMatrix object.

Parameters
  • path (str) – Path of the serialized DataMatrix

size()

Method that returns the size of the relation (number of elements in the matrix).

Returns

Size of the relation in the DataMatrix object

standardize()

Method that standardizes the matrix with the formula x’ = (x - mu)/s, where mu is the mean and s is the standard deviation.

Return type

None
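The standardization formula x’ = (x - mu)/s can be sketched on the observed values of a sparse relation as follows. Computing mu and s over the observed values only, and using the population standard deviation (NumPy's default ddof=0), are assumptions; NXTfusion's actual implementation may differ. standardize_values is a hypothetical helper.

```python
import numpy as np


def standardize_values(data):
    """Apply x' = (x - mu) / s to the values of a {(i, j): value} dict.

    mu and s are the mean and population standard deviation of the observed
    values (an assumption); this is a sketch, not NXTfusion's implementation.
    """
    values = np.array(list(data.values()), dtype=np.float64)
    mu, s = values.mean(), values.std()
    if s == 0:
        raise ValueError("all values are identical, so s = 0")
    return {key: float((value - mu) / s) for key, value in data.items()}


data = {(0, 0): 2.0, (0, 1): 4.0, (1, 0): 6.0}
print(standardize_values(data))
```

After the transformation the observed values have mean 0 and standard deviation 1; the guard on s avoids a division by zero for constant matrices.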

toHashTable() → dict

Method that returns a hash table (dict) containing the DataMatrix data.

Return type

dict
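Assuming toHashTable() returns the same {(ent1, ent2): value} layout accepted by the dict constructor, converting the internal featsHT layout back into a flat dict could look like the following sketch. feats_ht_to_dict is a hypothetical helper, not an NXTfusion function, and the sample featsHT content is made up.

```python
import numpy as np


def feats_ht_to_dict(feats_ht):
    """Flatten {i: [ent2 indices, values]} back into {(i, j): value}.

    Hypothetical helper illustrating the inverse of the featsHT grouping.
    """
    table = {}
    for i, (cols, vals) in feats_ht.items():
        for j, value in zip(cols.tolist(), vals.tolist()):
            table[(i, j)] = value
    return table


feats_ht = {
    0: [np.array([0, 2], dtype=np.int16), np.array([1.0, 0.5])],
    1: [np.array([1], dtype=np.int16), np.array([-2.0])],
}
print(feats_ht_to_dict(feats_ht))  # {(0, 0): 1.0, (0, 2): 0.5, (1, 1): -2.0}
```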

class NXTfusion.DataMatrix.SideInfo

Bases: object

Class that encapsulates raw side information so that it can be processed efficiently by NXTfusion. Use this class to wrap side information vectors in the same way that DataMatrix wraps matrices (relations).

__init__(self, name: str, ent1: Entity, ent2: Entity, data: dict)

One of the alternative constructors for the SideInfo class.

Parameters
  • name (str) – Name of the SideInfo object

  • ent1 (Entity) – Entity object representing the objects on dimension 0

  • data (dict) – Dict containing ent1 objects as keys and feature vectors (side information) as values.
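The dict and numpy.ndarray constructors carry the same information: a dict of feature vectors can be stacked into an array of shape (ent1 objects, feature length). In this sketch the row order is fixed by an explicit objects list; side_info_dict_to_array, the object names, and the float32 dtype are illustrative assumptions, not NXTfusion behaviour.

```python
import numpy as np


def side_info_dict_to_array(data, objects):
    """Stack {object: feature_vector} into shape (len(objects), n_features).

    `objects` fixes the row order (hypothetical parameter; NXTfusion's own
    ordering of ent1 objects may differ).
    """
    return np.vstack([np.asarray(data[obj], dtype=np.float32) for obj in objects])


data = {"P1": [0.1, 0.2, 0.3], "P2": [1.0, 0.0, 0.5]}
X = side_info_dict_to_array(data, objects=["P1", "P2"])
print(X.shape)  # (2, 3)
```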

__init__(self, name: str, ent1: Entity, ent2: Entity, data: numpy.ndarray)

One of the alternative constructors for the SideInfo class.

Parameters
  • name (str) – Name of the SideInfo object

  • ent1 (Entity) – Entity object representing the objects on dimension 0

  • data (numpy.ndarray) – Numpy array that contains the side information, with shape (number of ent1 objects, feature length), like a scikit-learn feature matrix.

__init__(self, name: str, ent1: Entity, ent2: Entity, data: scipy.sparse.coo_matrix)

One of the alternative constructors for the SideInfo class.

Parameters
  • name (str) – Name of the SideInfo object

  • ent1 (Entity) – Entity object representing the objects on dimension 0

  • data (scipy.sparse.coo_matrix) – Scipy coo_matrix that contains the side information, with shape (number of ent1 objects, feature length), like a scikit-learn feature matrix. The matrix can be sparse, but sparsity is currently NOT supported during mini-batching.
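Because sparsity is not supported during mini-batching, one workaround is to keep the side information sparse in memory and densify only the rows of each mini-batch. The following SciPy sketch shows the idea; it is not NXTfusion's internal code, and the shapes, values and batch indices are made up for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Side information for 4 ent1 objects with 5 features, stored sparsely.
rows = np.array([0, 1, 3])
cols = np.array([2, 0, 4])
vals = np.array([1.0, 0.5, 2.0])
side_info = coo_matrix((vals, (rows, cols)), shape=(4, 5))

# COO matrices do not support row indexing, so convert to CSR first,
# then densify only the rows needed for one mini-batch.
batch_rows = [0, 3]
batch = side_info.tocsr()[batch_rows].toarray()
print(batch.shape)  # (2, 5)
```

Converting to CSR once and reusing it across batches avoids repeating the conversion for every mini-batch.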

__init__(self, path: str)

Constructor that reads a SideInfo object previously serialized with SideInfo.dump().

Parameters
  • path (str) – Path to the serialized SideInfo object

dump(path=None)

Method that serializes the SideInfo object and stores it at the selected path.

Parameters
  • path (str) – Destination path for the serialized file

Return type

None

normalize()

Method that standardizes the matrix with the formula x’ = (x - mu)/s, where mu is the mean and s is the standard deviation.

Return type

None