Operations
Basic operations for this environment.
enter_table_data
enter_table_data(*, table_name: str, data: LongStr)
Enter table data as CSV. The first row should contain column names.
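Since the first row is treated as column names, the parsing behaves like Python's `csv.DictReader`. A minimal sketch of that interpretation (`parse_table_data` is a hypothetical helper, not part of this API):

```python
import csv
import io

def parse_table_data(text: str) -> list[dict[str, str]]:
    """Parse CSV text whose first row holds the column names."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

# Each subsequent row becomes a dict keyed by the header row.
rows = parse_table_data("name,age\nAda,36\nBob,29\n")
```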
join_tables
join_tables(bundle_a: Bundle, bundle_b: Bundle, *, table_a: TableName, table_b: TableName, join_type: JoinType = inner, on_column: str = '', left_on: str = '', right_on: str = '', suffixes: str = '_a,_b')
Join/merge dataframes from two bundles.
Parameters:

- `table_a`: Table name from bundle A.
- `table_b`: Table name from bundle B.
- `join_type`: Type of join: "inner", "outer", "left", "right", or "cross".
- `on_column`: Column name to join on (same name in both tables).
- `left_on`: Column name in the left table (when column names differ).
- `right_on`: Column name in the right table (when column names differ).
- `suffixes`: Suffixes for overlapping columns (comma-separated, e.g. "_a,_b").
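The join semantics mirror a standard relational merge. A minimal pure-Python sketch of the inner-join case with suffixes applied to overlapping non-key columns (`inner_join` is a hypothetical illustration, not this operation's implementation):

```python
def inner_join(rows_a, rows_b, on_column, suffixes=("_a", "_b")):
    """Inner-join two lists of dict rows on a shared key column.

    Overlapping non-key columns get the given suffixes, matching the
    behaviour described for the `suffixes` parameter.
    """
    sa, sb = suffixes
    # Index the right-hand rows by join key for O(1) lookups.
    index_b = {}
    for row in rows_b:
        index_b.setdefault(row[on_column], []).append(row)
    joined = []
    for a in rows_a:
        for b in index_b.get(a[on_column], []):
            merged = {on_column: a[on_column]}
            overlap = (set(a) & set(b)) - {on_column}
            for k, v in a.items():
                if k != on_column:
                    merged[k + sa if k in overlap else k] = v
            for k, v in b.items():
                if k != on_column:
                    merged[k + sb if k in overlap else k] = v
            joined.append(merged)
    return joined
```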
organize
organize(bundles: list[Bundle], *, relations: str = '', merge_mode: BundleMergeMode = must_be_unique)
Merge multiple inputs and construct graphs from the tables.
To create a graph, import tables for edges and nodes, and combine them in this operation.
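The node/edge combination can be pictured as building an adjacency list from the two imported tables; a minimal sketch under that assumption (`build_graph` is a hypothetical helper, not this operation's implementation):

```python
def build_graph(nodes, edges):
    """Combine a node table and an edge table into an adjacency list.

    `nodes` is a list of node ids; `edges` is a list of
    (source, target) pairs referencing those ids.
    """
    adjacency = {node: [] for node in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency
```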
Operations for reading and writing files.
export_to_file
export_to_file(bundle: Bundle, *, table_name: str, filename: PathStr, file_format: FileFormat = csv)
Exports a DataFrame to a file.
| PARAMETER | DESCRIPTION |
|---|---|
| `bundle` | The bundle containing the DataFrame to export. TYPE: `Bundle` |
| `table_name` | The name of the DataFrame in the bundle to export. TYPE: `str` |
| `filename` | The name of the file to export to. TYPE: `PathStr` |
| `file_format` | The format of the file to export to. Defaults to CSV. TYPE: `FileFormat` |
import_csv
Imports a CSV file.
import_file
import_file(*, file_path: PathStr, table_name: str, file_format: FileFormat = csv, **kwargs) -> Bundle
Read the contents of a file into a Bundle.
| PARAMETER | DESCRIPTION |
|---|---|
| `file_path` | Path to the file to import. TYPE: `PathStr` |
| `table_name` | Name to use for identifying the table in the bundle. TYPE: `str` |
| `file_format` | Format of the file. Has to be one of the values in the `FileFormat` enum. TYPE: `FileFormat` |

| RETURNS | DESCRIPTION |
|---|---|
| `Bundle` | Bundle with a single table with the contents of the file. |
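Exporting a table and importing it again should round-trip its contents. A stdlib sketch of that round trip with the default CSV format (`export_table` and `import_table` are hypothetical stand-ins for the operations above):

```python
import csv
import tempfile
from pathlib import Path

def export_table(rows: list[dict], filename: str) -> None:
    """Write a list of dict rows to CSV (the default file format)."""
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

def import_table(filename: str) -> list[dict]:
    """Read a CSV file back into a list of dict rows."""
    with open(filename, newline="") as f:
        return list(csv.DictReader(f))

path = str(Path(tempfile.mkdtemp()) / "table.csv")
export_table([{"id": "1", "name": "Ada"}], path)
rows = import_table(path)
```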
Operations for graphs.
sample_graph
Takes a (preferably connected) subgraph.
Operations for machine learning.
define_model
Trains the selected model on the selected dataset. Most training parameters are set in the model definition.
model_inference
model_inference(bundle: Bundle, *, model_name: PyTorchModelName = 'model', input_mapping: ModelInferenceInputMapping | None, output_mapping: ModelOutputMapping | None, batch_size: int = 1)
Executes a trained model.
train_model
train_model(bundle: Bundle, *, model_name: PyTorchModelName = 'model', input_mapping: ModelTrainingInputMapping | None, epochs: int = 1, batch_size: int = 1)
Trains the selected model on the selected dataset. Training parameters specific to the model are set in the model definition, while parameters specific to the hardware environment and dataset are set here.
PyKEEN graph embedding operations.
PyKEENModelName
module-attribute
PyKEENModelName = Annotated[str, {'format': 'dropdown', 'metadata_query': "[].other.*[] | [?type == 'pykeen-model'].key"}]
A type annotation to be used for parameters of an operation. PyKEENModelName is rendered as a dropdown in the frontend, listing the PyKEEN models in the Bundle. The model name is passed to the operation as a string.
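The `Annotated` metadata shown above can be recovered at runtime with `typing.get_type_hints`. A minimal sketch using a simplified annotation (`ModelName` and `run` are illustrative only, not part of this API):

```python
from typing import Annotated, get_args, get_type_hints

# Hypothetical annotation mirroring the pattern above: a plain string
# that the frontend should render as a dropdown.
ModelName = Annotated[str, {"format": "dropdown"}]

def run(model_name: ModelName) -> str:
    return model_name

# include_extras=True preserves the Annotated wrapper so the
# frontend-facing metadata can be inspected.
hints = get_type_hints(run, include_extras=True)
base, metadata = get_args(hints["model_name"])
```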
PyKEENModelWrapper
Wrapper to add a metadata method to PyKEEN models for dropdown queries, and to enable caching of models.
def_pykeen_with_attributes
def_pykeen_with_attributes(dataset: Bundle, *, interaction_name: PyKEENModel1D = TransE, combination_name: PyKEENCombinations = ConcatProjection, embedding_dim: int, loss_function: str, random_seed: int, save_as: str, **kwargs) -> Bundle
Defines a PyKEEN model capable of using numeric literals as node attributes.
define_pykeen_model
define_pykeen_model(bundle: Bundle, *, model: PyKEENModelMoreD = MuRE, edge_data_table: TableName = 'edges', embedding_dim: int = 50, loss_function: PyKEENSupportedLosses = NSSALoss, seed: int = 42, save_as: str = 'PyKEENmodel')
Defines a PyKEEN model based on the selected model type.
evaluate
evaluate(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', evaluator_type: EvaluatorTypes = RankBasedEvaluator, eval_table: TableName = 'edges_test', additional_true_triples_table: TableName = 'edges_train', metrics_str: str = 'ALL', batch_size: int = 32)
Evaluates the given model on the test set using the specified evaluator type.

Parameters:

- `evaluator_type`: The type of evaluator to use. Note: classification-based evaluation methods can be extremely slow.
- `metrics_str`: Comma-separated list of metrics, or "ALL" if all metrics are needed.
factory_to_df
Convert a TriplesFactory to a DataFrame with labeled columns.
full_predict
full_predict(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', k: int | None = None, inductive_setting: bool = False)
Warning: This prediction can be a very expensive operation!
| PARAMETER | DESCRIPTION |
|---|---|
| `k` | Pass "" to keep all scores. TYPE: `int \| None` |
get_inductive_model
get_inductive_model(bundle: Bundle, *, triples_table: TableName, inference_table: TableName, interaction: PyKEENModel1D = DistMult, embedding_dim: int = 200, loss_function: str, num_tokens: int = 2, aggregation: PyTorchAggregationFunctions = MLP, use_GNN: bool = False, seed: int = 42, save_as: str = 'InductiveModel')
Defines an InductiveNodePiece model (with an optional GNN message passing layer) for inductive link prediction tasks.
| PARAMETER | DESCRIPTION |
|---|---|
| `triples_table` | The transductive edges of the graph. TYPE: `TableName` |
| `inference_table` | The inductive edges of the graph. TYPE: `TableName` |
| `interaction` | Type of interaction the model will use for link prediction scoring. TYPE: `PyKEENModel1D` |
| `num_tokens` | Number of hash tokens for each node representation, usually the 66th percentile of the number of unique incident relations per node. TYPE: `int` |
| `aggregation` | Aggregation of multiple token representations into a single entity representation. Pick a top-level torch function, or use 'mlp' for a built-in two-layer MLP aggregator. TYPE: `PyTorchAggregationFunctions` |
import_inductive_dataset
Imports an inductive dataset from the PyKEEN library.
import_pykeen_dataset_path
Imports a dataset from the PyKEEN library.
inductively_split_dataset
inductively_split_dataset(bundle: Bundle, *, dataset_table: TableName, entity_ratio: float = 0.5, training_ratio: float = 0.8, testing_ratio: float = 0.1, validation_ratio: float = 0.1, seed: int = 42)
Splits the incoming data into four subsets: a transductive training set on which training is run, an inductive inference set on which inference is done during training, and inference testing and validation sets that can be used to evaluate model performance.
| PARAMETER | DESCRIPTION |
|---|---|
| `entity_ratio` | The fraction of the entities in the dataset that should be in the transductive training graph. TYPE: `float` |
| `training_ratio` | When semi-inductive, this is the entity ratio; when fully inductive, this is the inference training split. TYPE: `float` |
| `testing_ratio` | When semi-inductive, this is the entity ratio; when fully inductive, this is the inference testing split. TYPE: `float` |
| `validation_ratio` | When semi-inductive, this is the entity ratio; when fully inductive, this is the inference validation split. TYPE: `float` |
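The ratio-based splitting can be sketched as a seeded shuffle followed by slicing. `split_triples` below is a hypothetical simplification that covers only the training/testing/validation ratios and ignores the entity-level transductive/inductive partitioning:

```python
import random

def split_triples(triples, training_ratio=0.8, testing_ratio=0.1,
                  validation_ratio=0.1, seed=42):
    """Shuffle triples deterministically and split them by the given ratios."""
    assert abs(training_ratio + testing_ratio + validation_ratio - 1.0) < 1e-9
    shuffled = list(triples)
    # A fixed seed makes the split reproducible, as with the seed parameter.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * training_ratio)
    n_test = int(n * testing_ratio)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```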
prepare_triples
prepare_triples(triples_df: DataFrame, entity_to_id: Optional[Mapping[str, int]] = None, relation_to_id: Optional[Mapping[str, int]] = None, inv_triples: bool = False, numeric_literals: Optional[DataFrame] = None) -> TriplesFactory | TriplesNumericLiteralsFactory
Prepare triples for PyKEEN from a DataFrame.
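The label-to-id mapping that a `TriplesFactory` builds can be sketched in plain Python (`build_id_mappings` is an illustrative helper, not PyKEEN code):

```python
def build_id_mappings(triples):
    """Build label-to-id maps for entities and relations from
    (head, relation, tail) rows, then map the triples to id form."""
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    entity_to_id = {e: i for i, e in enumerate(entities)}
    relation_to_id = {r: i for i, r in enumerate(relations)}
    mapped = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in triples
    ]
    return entity_to_id, relation_to_id, mapped
```

Passing precomputed `entity_to_id` / `relation_to_id` mappings, as the signature above allows, keeps ids consistent across multiple factories.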
req_inverse_triples
Check if the model requires inverse triples.
target_predict
target_predict(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', head: str, relation: str, tail: str, inductive_setting: bool = False)
Leave the target prediction field empty.
Visualizations.
binned_graph_visualization
binned_graph_visualization(b: Bundle, *, x_property: NodePropertyName, y_property: NodePropertyName, x_bins=5, y_bins=5, show_loops: bool = False)
Nodes binned together by x and y are aggregated into one node. Edges between bins are aggregated into one edge.
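The binning-and-aggregation step can be sketched as bucketing node coordinates and counting edges between buckets. `bin_graph` is a hypothetical illustration; it assumes the x and y properties are already normalized to [0, 1):

```python
def bin_graph(nodes, edges, x_bins=2, y_bins=2, show_loops=False):
    """Aggregate nodes into (x_bin, y_bin) buckets and count edges
    between buckets. `nodes` maps node id -> (x, y) in [0, 1)."""
    def bucket(x, y):
        return (int(x * x_bins), int(y * y_bins))

    node_bin = {n: bucket(x, y) for n, (x, y) in nodes.items()}
    edge_counts = {}
    for src, dst in edges:
        a, b = node_bin[src], node_bin[dst]
        # Edges inside one bucket become loops; drop them unless requested.
        if a == b and not show_loops:
            continue
        edge_counts[(a, b)] = edge_counts.get((a, b), 0) + 1
    return node_bin, edge_counts
```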