Operations

Basic operations for this environment.

enter_table_data ¤

enter_table_data(*, table_name: str, data: LongStr)

Enter table data as CSV. The first row should contain column names.
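For illustration, a hypothetical payload in the expected shape (data and column names are made up), parsed here with Python's standard `csv` module to show that the first row supplies the column names:

```python
import csv
import io

# Hypothetical CSV payload; the first row supplies the column names.
payload = """name,age,city
Alice,34,Berlin
Bob,28,Paris
"""

rows = list(csv.reader(io.StringIO(payload)))
header, records = rows[0], rows[1:]
print(header)   # column names taken from the first row
print(records)  # remaining rows become the table data
```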

join_tables ¤

join_tables(bundle_a: Bundle, bundle_b: Bundle, *, table_a: TableName, table_b: TableName, join_type: JoinType = inner, on_column: str = '', left_on: str = '', right_on: str = '', suffixes: str = '_a,_b')

Join/merge dataframes from two bundles.

Parameters:

- table_a: Table name from bundle A
- table_b: Table name from bundle B
- join_type: Type of join: "inner", "outer", "left", "right", or "cross"
- on_column: Column name to join on (same name in both tables)
- left_on: Column name in the left table (when column names differ)
- right_on: Column name in the right table (when column names differ)
- suffixes: Suffixes for overlapping columns (comma-separated, e.g. "_a,_b")
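These parameters mirror the semantics of `pandas.merge`; a sketch with hypothetical tables (whether the operation uses pandas internally is an assumption):

```python
import pandas as pd

# Hypothetical tables standing in for the two bundles.
orders = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
users = pd.DataFrame({"user_id": [1, 2, 4], "value": ["a", "b", "c"]})

# Differing key names: use left_on/right_on instead of on_column.
# Overlapping non-key columns get the suffixes, here "_a" and "_b".
joined = orders.merge(
    users, left_on="id", right_on="user_id", how="inner", suffixes=("_a", "_b")
)
print(list(joined.columns))  # ['id', 'value_a', 'user_id', 'value_b']
print(len(joined))           # 2 rows: only ids 1 and 2 appear in both tables
```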

organize ¤

organize(bundles: list[Bundle], *, relations: str = '', merge_mode: BundleMergeMode = must_be_unique)

Merge multiple inputs and construct graphs from the tables.

To create a graph, import tables for edges and nodes, and combine them in this operation.

Operations for reading and writing files.

export_to_file ¤

export_to_file(bundle: Bundle, *, table_name: str, filename: PathStr, file_format: FileFormat = csv)

Exports a DataFrame to a file.

PARAMETER DESCRIPTION
bundle

The bundle containing the DataFrame to export.

TYPE: Bundle

table_name

The name of the DataFrame in the bundle to export.

TYPE: str

filename

The name of the file to export to.

TYPE: PathStr

file_format

The format of the file to export to. Defaults to CSV.

TYPE: FileFormat DEFAULT: csv

import_csv ¤

import_csv(*, filename: PathStr, columns: str = '<from file>', separator: str = '<auto>')

Imports a CSV file.
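Separator auto-detection can be sketched with Python's standard `csv.Sniffer`; whether `import_csv` uses it internally is an assumption:

```python
import csv
import io

# Hypothetical sample with a non-default separator.
sample = "a;b;c\n1;2;3\n"

# csv.Sniffer inspects the text and guesses the delimiter.
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)  # ';'

rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```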

import_file ¤

import_file(*, file_path: PathStr, table_name: str, file_format: FileFormat = csv, **kwargs) -> Bundle

Read the contents of a file into a Bundle.

PARAMETER DESCRIPTION
file_path

Path to the file to import.

TYPE: PathStr

table_name

Name to use for identifying the table in the bundle.

TYPE: str

file_format

Format of the file. Has to be one of the values in the FileFormat enum.

TYPE: FileFormat DEFAULT: csv

RETURNS DESCRIPTION
Bundle

Bundle with a single table with the contents of the file.

TYPE: Bundle

import_graphml ¤

import_graphml(*, filename: PathStr)

Imports a GraphML file.

import_parquet ¤

import_parquet(*, filename: PathStr)

Imports a Parquet file.

Operations for graphs.

sample_graph ¤

sample_graph(graph: Graph, *, nodes: int = 100)

Samples a (preferably connected) subgraph with the given number of nodes.

Operations for machine learning.

define_model ¤

define_model(bundle: Bundle, *, model_workspace: str, save_as: str = 'model')

Defines a model based on the selected model workspace. Most training parameters are set in the model definition.

model_inference ¤

model_inference(bundle: Bundle, *, model_name: PyTorchModelName = 'model', input_mapping: ModelInferenceInputMapping | None, output_mapping: ModelOutputMapping | None, batch_size: int = 1)

Executes a trained model.

train_model ¤

train_model(bundle: Bundle, *, model_name: PyTorchModelName = 'model', input_mapping: ModelTrainingInputMapping | None, epochs: int = 1, batch_size: int = 1)

Trains the selected model on the selected dataset. Training parameters specific to the model are set in the model definition, while parameters specific to the hardware environment and dataset are set here.

train_test_split ¤

train_test_split(bundle: Bundle, *, table_name: TableName, test_ratio: float = 0.1, seed=1234)

Splits a dataframe in the bundle into separate "_train" and "_test" dataframes.
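A minimal sketch of what a seeded train/test split does (hypothetical data; that the operation samples rows at random like this is an assumption):

```python
import pandas as pd

# Hypothetical dataframe; test_ratio and seed mirror the parameters above.
df = pd.DataFrame({"x": range(10)})
test_ratio, seed = 0.2, 1234

# Sample the test rows with a fixed seed, keep the rest for training.
test = df.sample(frac=test_ratio, random_state=seed)
train = df.drop(test.index)
print(len(train), len(test))  # 8 2
```

Reusing the same seed reproduces the same split.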

train_test_val_split ¤

train_test_val_split(bundle: Bundle, *, table_name: TableName, test_ratio: float = 0.1, val_ratio: float = 0.1, seed=1234)

Splits a dataframe in the bundle into separate "_train", "_test" and "_val" dataframes.

PyKEEN graph embedding operations.

PyKEENModelName module-attribute ¤

PyKEENModelName = Annotated[str, {'format': 'dropdown', 'metadata_query': "[].other.*[] | [?type == 'pykeen-model'].key"}]

A type annotation to be used for parameters of an operation. PyKEENModelName is rendered as a dropdown in the frontend, listing the PyKEEN models in the Bundle. The model name is passed to the operation as a string.

PyKEENModelWrapper ¤

Wrapper that adds a metadata method to PyKEEN models for dropdown queries and enables caching of the model.

def_pykeen_with_attributes ¤

def_pykeen_with_attributes(dataset: Bundle, *, interaction_name: PyKEENModel1D = TransE, combination_name: PyKEENCombinations = ConcatProjection, embedding_dim: int, loss_function: str, random_seed: int, save_as: str, **kwargs) -> Bundle

Defines a PyKEEN model capable of using numeric literals as node attributes.

define_pykeen_model ¤

define_pykeen_model(bundle: Bundle, *, model: PyKEENModelMoreD = MuRE, edge_data_table: TableName = 'edges', embedding_dim: int = 50, loss_function: PyKEENSupportedLosses = NSSALoss, seed: int = 42, save_as: str = 'PyKEENmodel')

Defines a PyKEEN model based on the selected model type.

evaluate ¤

evaluate(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', evaluator_type: EvaluatorTypes = RankBasedEvaluator, eval_table: TableName = 'edges_test', additional_true_triples_table: TableName = 'edges_train', metrics_str: str = 'ALL', batch_size: int = 32)

Evaluates the given model on the test set using the specified evaluator type.

Parameters:

- evaluator_type: The type of evaluator to use. Note: when using classification-based methods, evaluation may be extremely slow.
- metrics_str: Comma-separated list of metrics, or "ALL" if all metrics are needed.

factory_to_df ¤

factory_to_df(factory: CoreTriplesFactory) -> DataFrame

Convert a TriplesFactory to a DataFrame with labeled columns.

full_predict ¤

full_predict(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', k: int | None = None, inductive_setting: bool = False)

Warning: This prediction can be a very expensive operation!

PARAMETER DESCRIPTION
k

Pass "" to keep all scores

TYPE: int | None DEFAULT: None

get_inductive_model ¤

get_inductive_model(bundle: Bundle, *, triples_table: TableName, inference_table: TableName, interaction: PyKEENModel1D = DistMult, embedding_dim: int = 200, loss_function: str, num_tokens: int = 2, aggregation: PyTorchAggregationFunctions = MLP, use_GNN: bool = False, seed: int = 42, save_as: str = 'InductiveModel')

Defines an InductiveNodePiece model (with an optional GNN message passing layer) for inductive link prediction tasks.

PARAMETER DESCRIPTION
triples_table

The transductive edges of the graph.

TYPE: TableName

inference_table

The inductive edges of the graph.

TYPE: TableName

interaction

Type of interaction the model will use for link prediction scoring.

TYPE: PyKEENModel1D DEFAULT: DistMult

num_tokens

Number of hash tokens for each node representation, usually the 66th percentile of the number of unique incident relations per node.

TYPE: int DEFAULT: 2

aggregation

Aggregation of multiple token representations into a single entity representation. Pick a top-level torch function, or use 'mlp' for a built-in two-layer MLP aggregator.

TYPE: PyTorchAggregationFunctions DEFAULT: MLP

import_inductive_dataset ¤

import_inductive_dataset(*, dataset: InductiveDataset = ILPC2022Small)

Imports an inductive dataset from the PyKEEN library.

import_pykeen_dataset_path ¤

import_pykeen_dataset_path(self, *, dataset: PyKEENDataset = Nations) -> Bundle

Imports a dataset from the PyKEEN library.

inductively_split_dataset ¤

inductively_split_dataset(bundle: Bundle, *, dataset_table: TableName, entity_ratio: float = 0.5, training_ratio: float = 0.8, testing_ratio: float = 0.1, validation_ratio: float = 0.1, seed: int = 42)

Splits the incoming data into four subsets: a transductive training set on which training is run, an inductive inference set on which inference is done during training, and inference testing and validation sets that can be used to evaluate model performance.

PARAMETER DESCRIPTION
entity_ratio

The fraction of the dataset's entities that should be in the transductive training graph. If 0, a semi-inductive split is applied; otherwise a fully-inductive split is applied.

TYPE: float DEFAULT: 0.5

training_ratio

In a semi-inductive split this is the entity ratio; in a fully-inductive split this is the inference training split.

TYPE: float DEFAULT: 0.8

testing_ratio

In a semi-inductive split this is the entity ratio; in a fully-inductive split this is the inference testing split.

TYPE: float DEFAULT: 0.1

validation_ratio

In a semi-inductive split this is the entity ratio; in a fully-inductive split this is the inference validation split.

TYPE: float DEFAULT: 0.1

prepare_triples ¤

prepare_triples(triples_df: DataFrame, entity_to_id: Optional[Mapping[str, int]] = None, relation_to_id: Optional[Mapping[str, int]] = None, inv_triples: bool = False, numeric_literals: Optional[DataFrame] = None) -> TriplesFactory | TriplesNumericLiteralsFactory

Prepare triples for PyKEEN from a DataFrame.

req_inverse_triples ¤

req_inverse_triples(model: Model | PyKEENModel1D | PyKEENModelMoreD) -> bool

Check if the model requires inverse triples.

target_predict ¤

target_predict(bundle: Bundle, *, model_name: PyKEENModelName = 'PyKEENmodel', head: str, relation: str, tail: str, inductive_setting: bool = False)

Predicts the missing element of a triple. Leave the field of the prediction target (head, relation, or tail) empty.

SQL and Cypher.

cypher ¤

cypher(bundle: Bundle, *, query: LongStr, save_as: str = 'results')

Run a Cypher query on the graph in the bundle. Save the results as a new DataFrame.

sql ¤

sql(bundle: Bundle, *, query: LongStr, save_as: str = 'results')

Run a SQL query on the DataFrames in the bundle. Save the results as a new DataFrame.
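The kind of query string passed to the `query` parameter can be sketched against an in-memory SQLite database (a stand-in backend; the table and column names are hypothetical):

```python
import sqlite3

import pandas as pd

# Hypothetical dataframe; SQLite stands in for the bundle's SQL engine.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [3, 1, 2]})
con = sqlite3.connect(":memory:")
df.to_sql("scores", con, index=False)

# A query of this shape selects and orders rows from a bundle table.
results = pd.read_sql(
    "SELECT name FROM scores WHERE score > 1 ORDER BY score DESC", con
)
print(results["name"].tolist())  # ['a', 'c']
```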

Visualizations.

binned_graph_visualization ¤

binned_graph_visualization(b: Bundle, *, x_property: NodePropertyName, y_property: NodePropertyName, x_bins=5, y_bins=5, show_loops: bool = False)

Nodes binned together by x and y are aggregated into one node. Edges between bins are aggregated into one edge.
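The binning idea can be sketched in plain Python (the property values, value range, and clamping behavior here are hypothetical):

```python
# Hypothetical node property values in [0, 1], binned into a 5x5 grid.
x_bins = y_bins = 5

def bin_index(value, lo, hi, bins):
    # Scale into [0, bins) and clamp so value == hi lands in the last bin.
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)

nodes = [(0.1, 0.9), (0.12, 0.88), (0.7, 0.2)]
cells = {(bin_index(x, 0, 1, x_bins), bin_index(y, 0, 1, y_bins)) for x, y in nodes}
print(len(cells))  # 2: the first two nodes share a cell and are aggregated
```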

Automatically wraps all NetworkX functions as LynxKite operations.