Operations

Basic operations for this environment.

enter_table_data ¤

enter_table_data(*, table_name: str, data: LongStr)

Enter table data as CSV. The first row should contain column names.
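For illustration, a hypothetical payload in the expected shape (data and column names are made up), parsed here with Python's standard `csv` module to show that the first row supplies the column names:

```python
import csv
import io

# Hypothetical CSV payload; the first row supplies the column names.
payload = """name,age,city
Alice,34,Berlin
Bob,28,Paris
"""

rows = list(csv.reader(io.StringIO(payload)))
header, records = rows[0], rows[1:]
print(header)   # column names taken from the first row
print(records)  # remaining rows become the table data
```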

join_tables ¤

join_tables(bundle_a: Bundle, bundle_b: Bundle, *, table_a: TableName, table_b: TableName, join_type: JoinType = inner, on_column: str = '', left_on: str = '', right_on: str = '', suffixes: str = '_a,_b')

Join/merge dataframes from two bundles.

Parameters:

- table_a: Table name from bundle A
- table_b: Table name from bundle B
- join_type: Type of join: "inner", "outer", "left", "right", or "cross"
- on_column: Column name to join on (same name in both tables)
- left_on: Column name in the left table (when column names differ)
- right_on: Column name in the right table (when column names differ)
- suffixes: Suffixes for overlapping columns (comma-separated, e.g. "_a,_b")
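These parameters mirror the semantics of `pandas.merge`; a sketch with hypothetical tables (whether the operation uses pandas internally is an assumption):

```python
import pandas as pd

# Hypothetical tables standing in for the two bundles.
orders = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
users = pd.DataFrame({"user_id": [1, 2, 4], "value": ["a", "b", "c"]})

# Differing key names: use left_on/right_on instead of on_column.
# Overlapping non-key columns get the suffixes, here "_a" and "_b".
joined = orders.merge(
    users, left_on="id", right_on="user_id", how="inner", suffixes=("_a", "_b")
)
print(list(joined.columns))  # ['id', 'value_a', 'user_id', 'value_b']
print(len(joined))           # 2 rows: only ids 1 and 2 appear in both tables
```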

organize ¤

organize(bundles: list[Bundle], *, relations: str = '', merge_mode: BundleMergeMode = must_be_unique)

Merge multiple inputs and construct graphs from the tables.

To create a graph, import tables for edges and nodes, and combine them in this operation.

Operations for reading and writing files.

export_to_file ¤

export_to_file(bundle: Bundle, *, table_name: str, filename: PathStr, file_format: FileFormat = csv)

Exports a DataFrame to a file.

PARAMETER DESCRIPTION
bundle

The bundle containing the DataFrame to export.

TYPE: Bundle

table_name

The name of the DataFrame in the bundle to export.

TYPE: str

filename

The name of the file to export to.

TYPE: PathStr

file_format

The format of the file to export to. Defaults to CSV.

TYPE: FileFormat DEFAULT: csv

import_csv ¤

import_csv(*, filename: PathStr, columns: str = '<from file>', separator: str = '<auto>')

Imports a CSV file.
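Separator auto-detection can be sketched with Python's standard `csv.Sniffer`; whether `import_csv` uses it internally is an assumption:

```python
import csv
import io

# Hypothetical sample with a non-default separator.
sample = "a;b;c\n1;2;3\n"

# csv.Sniffer inspects the text and guesses the delimiter.
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)  # ';'

rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```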

import_file ¤

import_file(*, file_path: PathStr, table_name: str, file_format: FileFormat = csv, **kwargs) -> Bundle

Read the contents of a file into a Bundle.

PARAMETER DESCRIPTION
file_path

Path to the file to import.

TYPE: PathStr

table_name

Name to use for identifying the table in the bundle.

TYPE: str

file_format

Format of the file. Has to be one of the values in the FileFormat enum.

TYPE: FileFormat DEFAULT: csv

RETURNS DESCRIPTION
Bundle

Bundle with a single table with the contents of the file.

TYPE: Bundle

import_graphml ¤

import_graphml(*, filename: PathStr)

Imports a GraphML file.

import_parquet ¤

import_parquet(*, filename: PathStr)

Imports a Parquet file.

Operations for graphs.

sample_graph ¤

sample_graph(graph: Graph, *, nodes: int = 100)

Samples a (preferably connected) subgraph with the given number of nodes.

Operations for machine learning.

define_model ¤

define_model(bundle: Bundle, *, model_workspace: str, save_as: str = 'model')

Defines a model based on the selected model workspace. Most training parameters are set in the model definition.

model_inference ¤

model_inference(bundle: Bundle, *, model_name: PyTorchModelName = 'model', input_mapping: ModelInferenceInputMapping | None, output_mapping: ModelOutputMapping | None, batch_size: int = 1)

Executes a trained model.

train_model ¤

train_model(bundle: Bundle, *, model_name: PyTorchModelName = 'model', input_mapping: ModelTrainingInputMapping | None, epochs: int = 1, batch_size: int = 1)

Trains the selected model on the selected dataset. Training parameters specific to the model are set in the model definition, while parameters specific to the hardware environment and dataset are set here.

train_test_split ¤

train_test_split(bundle: Bundle, *, table_name: TableName, test_ratio: float = 0.1, seed=1234)

Splits a dataframe in the bundle into separate "_train" and "_test" dataframes.
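A minimal sketch of what a seeded train/test split does (hypothetical data; that the operation samples rows at random like this is an assumption):

```python
import pandas as pd

# Hypothetical dataframe; test_ratio and seed mirror the parameters above.
df = pd.DataFrame({"x": range(10)})
test_ratio, seed = 0.2, 1234

# Sample the test rows with a fixed seed, keep the rest for training.
test = df.sample(frac=test_ratio, random_state=seed)
train = df.drop(test.index)
print(len(train), len(test))  # 8 2
```

Reusing the same seed reproduces the same split.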

train_test_val_split ¤

train_test_val_split(bundle: Bundle, *, table_name: TableName, test_ratio: float = 0.1, val_ratio: float = 0.1, seed=1234)

Splits a dataframe in the bundle into separate "_train", "_test" and "_val" dataframes.

PyKEEN graph embedding operations.

PyKEENModelName module-attribute ¤

PyKEENModelName = Annotated[str, {'format': 'dropdown', 'metadata_query': "[].other.*[] | [?type == 'pykeen-model'].key"}]

A type annotation to be used for parameters of an operation. PyKEENModelName is rendered as a dropdown in the frontend, listing the PyKEEN models in the Bundle. The model name is passed to the operation as a string.

PyKEENModelWrapper ¤

Wrapper that adds a metadata method to PyKEEN models for dropdown queries and enables caching of the model.

def_pykeen_with_attributes ¤

def_pykeen_with_attributes(dataset: Bundle, *, interaction_name: PyKEENModel1D = TransE, combination_name: PyKEENCombinations = ConcatProjection, embedding_dim: int, loss_function: str, random_seed: int, save_as: str, **kwargs) -> Bundle

Defines a PyKEEN model capable of using numeric literals as node attributes.

define_pykeen_model ¤

define_pykeen_model(bundle: Bundle, *, model: PyKEENModelMoreD = MuRE, edge_data_table: TableName = 'edges', embedding_dim: int = 50, loss_function: PyKEENSupportedLosses = NSSALoss, seed: int = 42, save_as: str = 'PyKEENmodel')

Defines a PyKEEN model based on the selected model type.

evaluate ¤

evaluate(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', evaluator_type: EvaluatorTypes = RankBasedEvaluator, eval_table: TableName = 'edges_test', additional_true_triples_table: TableName = 'edges_train', metrics_str: str = 'ALL', batch_size: int = 32)

Evaluates the given model on the test set using the specified evaluator type.

Parameters:

- evaluator_type: The type of evaluator to use. Note: when using classification-based methods, evaluation may be extremely slow.
- metrics_str: Comma-separated list of metrics, or "ALL" if all metrics are needed.

factory_to_df ¤

factory_to_df(factory: CoreTriplesFactory) -> DataFrame

Convert a TriplesFactory to a DataFrame with labeled columns.

full_predict ¤

full_predict(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', k: int | None = None, inductive_setting: bool = False)

Warning: This prediction can be a very expensive operation!

PARAMETER DESCRIPTION
k

Pass "" to keep all scores

TYPE: int | None DEFAULT: None

get_inductive_model ¤

get_inductive_model(bundle: Bundle, *, triples_table: TableName, inference_table: TableName, interaction: PyKEENModel1D = DistMult, embedding_dim: int = 200, loss_function: str, num_tokens: int = 2, aggregation: PyTorchAggregationFunctions = MLP, use_GNN: bool = False, seed: int = 42, save_as: str = 'InductiveModel')

Defines an InductiveNodePiece model (with an optional GNN message passing layer) for inductive link prediction tasks.

PARAMETER DESCRIPTION
triples_table

The transductive edges of the graph.

TYPE: TableName

inference_table

The inductive edges of the graph.

TYPE: TableName

interaction

Type of interaction the model will use for link prediction scoring.

TYPE: PyKEENModel1D DEFAULT: DistMult

num_tokens

Number of hash tokens for each node representation, usually the 66th percentile of the number of unique incident relations per node.

TYPE: int DEFAULT: 2

aggregation

Aggregation of multiple token representations into a single entity representation. Pick a top-level torch function, or use 'mlp' for a built-in two-layer MLP aggregator.

TYPE: PyTorchAggregationFunctions DEFAULT: MLP

import_inductive_dataset ¤

import_inductive_dataset(*, dataset: InductiveDataset = ILPC2022Small)

Imports an inductive dataset from the PyKEEN library.

import_pykeen_dataset_path ¤

import_pykeen_dataset_path(self, *, dataset: PyKEENDataset = Nations) -> Bundle

Imports a dataset from the PyKEEN library.

inductively_split_dataset ¤

inductively_split_dataset(bundle: Bundle, *, dataset_table: TableName, entity_ratio: float = 0.5, training_ratio: float = 0.8, testing_ratio: float = 0.1, validation_ratio: float = 0.1, seed: int = 42)

Splits the incoming data into four subsets: a transductive training set on which training is run, an inductive inference set on which inference is done during training, and inference testing and validation sets that can be used to evaluate model performance.

PARAMETER DESCRIPTION
entity_ratio

The fraction of the dataset's entities that should be in the transductive training graph. If 0, a semi-inductive split is applied; otherwise a fully-inductive split is applied.

TYPE: float DEFAULT: 0.5

training_ratio

In a semi-inductive split this is the entity ratio; in a fully-inductive split this is the inference training split.

TYPE: float DEFAULT: 0.8

testing_ratio

In a semi-inductive split this is the entity ratio; in a fully-inductive split this is the inference testing split.

TYPE: float DEFAULT: 0.1

validation_ratio

In a semi-inductive split this is the entity ratio; in a fully-inductive split this is the inference validation split.

TYPE: float DEFAULT: 0.1

prepare_triples ¤

prepare_triples(triples_df: DataFrame, entity_to_id: Optional[Mapping[str, int]] = None, relation_to_id: Optional[Mapping[str, int]] = None, inv_triples: bool = False, numeric_literals: Optional[DataFrame] = None) -> TriplesFactory | TriplesNumericLiteralsFactory

Prepare triples for PyKEEN from a DataFrame.

req_inverse_triples ¤

req_inverse_triples(model: Model | PyKEENModel1D | PyKEENModelMoreD) -> bool

Check if the model requires inverse triples.

target_predict ¤

target_predict(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', head: str, relation: str, tail: str, inductive_setting: bool = False)

Predicts the missing element of a triple. Leave the field of the prediction target (head, relation, or tail) empty.

SQL and Cypher.

cypher ¤

cypher(bundle: Bundle, *, query: LongStr, save_as: str = 'results')

Run a Cypher query on the graph in the bundle. Save the results as a new DataFrame.

sql ¤

sql(bundle: Bundle, *, query: LongStr, save_as: str = 'results')

Run a SQL query on the DataFrames in the bundle. Save the results as a new DataFrame.
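The kind of query string passed to the `query` parameter can be sketched against an in-memory SQLite database (a stand-in backend; the table and column names are hypothetical):

```python
import sqlite3

import pandas as pd

# Hypothetical dataframe; SQLite stands in for the bundle's SQL engine.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [3, 1, 2]})
con = sqlite3.connect(":memory:")
df.to_sql("scores", con, index=False)

# A query of this shape selects and orders rows from a bundle table.
results = pd.read_sql(
    "SELECT name FROM scores WHERE score > 1 ORDER BY score DESC", con
)
print(results["name"].tolist())  # ['a', 'c']
```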

Visualizations.

binned_graph_visualization ¤

binned_graph_visualization(b: Bundle, *, x_property: NodePropertyName, y_property: NodePropertyName, x_bins=5, y_bins=5, show_loops: bool = False)

Nodes binned together by x and y are aggregated into one node. Edges between bins are aggregated into one edge.
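The binning idea can be sketched in plain Python (the property values, value range, and clamping behavior here are hypothetical):

```python
# Hypothetical node property values in [0, 1], binned into a 5x5 grid.
x_bins = y_bins = 5

def bin_index(value, lo, hi, bins):
    # Scale into [0, bins) and clamp so value == hi lands in the last bin.
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)

nodes = [(0.1, 0.9), (0.12, 0.88), (0.7, 0.2)]
cells = {(bin_index(x, 0, 1, x_bins), bin_index(y, 0, 1, y_bins)) for x, y in nodes}
print(len(cells))  # 2: the first two nodes share a cell and are aggregated
```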

Automatically wraps all NetworkX functions as LynxKite operations.