Operations
Python documentation for the operations in LynxKite. This document has been automatically generated.
- lynx.operations.addConstantEdgeAttribute(name, value, type)
Adds an attribute with a fixed value to every edge.
- Parameters:
name – The new attribute will be created under this name.
value – The attribute value. Should be a number if Type is set to number.
type – The operation can create either number (numeric) or String typed attributes.
- lynx.operations.addConstantVertexAttribute(name, value, type)
Adds an attribute with a fixed value to every vertex.
- Parameters:
name – The new attribute will be created under this name.
value – The attribute value. Should be a number if Type is set to number.
type – The operation can create either number or String typed attributes.
- lynx.operations.addPopularityXSimilarityOptimizedEdges(externaldegree, internaldegree, exponent, seed)
Experimental Feature
Creates a graph with a given number of vertices and given average degrees. The edges will follow a power-law (also known as scale-free) distribution and have high clustering. Vertices get two attributes called “radial” and “angular” that can later be used for edge strength evaluation or link prediction. The algorithm is based on Popularity versus Similarity in Growing Networks and Network Mapping by Replaying Hyperbolic Growth.
The edges are generated by simulating hyperbolic growth. Vertices are added one by one, and at the time of each addition new edges are created in two ways. First, the new vertex creates edges from itself to older vertices (the “external” edges). Then some new edges are added between older vertices (the “internal” edges). This way the average number of edges added per vertex will be slightly more than externalDegree + internalDegree.
- Parameters:
externaldegree – The number of edges a vertex creates from itself upon addition to the growing graph.
internaldegree – The average number of edges created between older vertices whenever a new vertex is added to the growing graph.
exponent – The exponent of the power-law degree distribution. Values must lie between 0.5 and 1, endpoints excluded.
seed – The random seed.
- lynx.operations.addRandomEdgeAttribute()
- lynx.operations.addRandomVertexAttribute()
- lynx.operations.addRankAttribute(rankattr, keyattr, order)
Creates a new vertex attribute that is the rank of the vertex when ordered by the key attribute. Rank 0 will be the vertex with the highest or lowest key attribute value (depending on the direction of the ordering). String attributes will be ranked alphabetically.
This operation makes it easy to find the top (or bottom) N vertices by an attribute. First, create the ranking attribute. Then filter by this attribute.
- Parameters:
rankattr – The new attribute will be created under this name.
keyattr – The attribute to rank by.
order – With ascending ordering rank 0 belongs to the vertex with the minimal key attribute value or the vertex that is at the beginning of the alphabet. With descending ordering rank 0 belongs to the vertex with the maximal key attribute value or the vertex that is at the end of the alphabet.
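The ranking semantics described above can be sketched in plain Python. This is an illustration only, not LynxKite's implementation: the function name add_rank_attribute, the list-of-dicts vertex representation, and the choice to leave vertices without a key value unranked are all assumptions of the sketch.

```python
def add_rank_attribute(vertices, keyattr, rankattr, ascending=True):
    """Return copies of the vertex dicts with a 0-based rank attribute added.

    Vertices where the key attribute is missing get no rank
    (an assumption of this sketch).
    """
    defined = [v for v in vertices if v.get(keyattr) is not None]
    ordered = sorted(defined, key=lambda v: v[keyattr], reverse=not ascending)
    ranks = {id(v): rank for rank, v in enumerate(ordered)}
    return [
        {**v, rankattr: ranks[id(v)]} if id(v) in ranks else dict(v)
        for v in vertices
    ]


# Hypothetical data for illustration.
people = [
    {"name": "Adam", "age": 20.3},
    {"name": "Eve", "age": 18.2},
    {"name": "Bob", "age": 50.3},
]
ranked = add_rank_attribute(people, "age", "age_rank", ascending=False)

# Top 2 vertices by age: create the rank, then filter on it.
top2 = [v["name"] for v in ranked if v["age_rank"] < 2]
print(top2)
```

With descending order, rank 0 goes to the vertex with the maximal key value, so filtering for rank below N keeps the top N vertices.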
- lynx.operations.addReversedEdges(distattr)
For every A→B edge adds a new B→A edge, copying over the attributes of the original. Thus this operation will double the number of edges in the graph.
Using this operation you end up with a graph with symmetric edges: if A→B exists then B→A also exists. This is the closest you can get to an “undirected” graph.
Optionally, a new edge attribute (a ‘distinguishing attribute’) will be created so that you can tell the original edges from the new edges after the operation. Edges where this attribute is 0 are original edges; edges where this attribute is 1 are new edges.
- Parameters:
distattr – The name of the distinguishing edge attribute; leave it empty if the attribute should not be created.
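As a rough illustration of the behavior described above, the following plain-Python sketch doubles an edge list and optionally marks the new edges. The function name add_reversed_edges and the dict-based edge representation are assumptions made for the example, not part of the LynxKite API.

```python
def add_reversed_edges(edges, distattr=""):
    """edges: list of dicts with 'src', 'dst' and any other attributes.

    For every A->B edge a B->A copy is added. If distattr is non-empty,
    original edges get 0 and reversed edges get 1 under that name.
    """
    result = []
    for edge in edges:
        original = dict(edge)
        reverse = {**edge, "src": edge["dst"], "dst": edge["src"]}
        if distattr:
            original[distattr] = 0
            reverse[distattr] = 1
        result.extend([original, reverse])
    return result


# Hypothetical single-edge graph.
edges = [{"src": "A", "dst": "B", "weight": 2.0}]
doubled = add_reversed_edges(edges, distattr="is_reversed")
print(doubled)
```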
- lynx.operations.aggregateEdgeAttributeGlobally(prefix)
Aggregates edge attributes across the entire graph into one graph attribute for each attribute. For example you could use it to calculate the average call duration across an entire call dataset.
- Parameters:
prefix – Save the aggregated values with this prefix.
- lynx.operations.aggregateEdgeAttributeToVertices(prefix, direction)
Aggregates an attribute on all the edges going in or out of vertices. For example it can calculate the average duration of calls for each person in a call dataset.
- Parameters:
prefix – Save the aggregated attributes with this prefix.
direction –
incoming edges: Aggregate across the edges coming in to each vertex.
outgoing edges: Aggregate across the edges going out of each vertex.
all edges: Aggregate across all the edges going in or out of each vertex.
- lynx.operations.aggregateFromSegmentation(prefix)
Aggregates vertex attributes across all the segments that a vertex in the base graph belongs to. For example, it can calculate the average size of cliques a person belongs to.
- Parameters:
prefix – Save the aggregated attributes with this prefix.
- lynx.operations.aggregateOnNeighbors(prefix, direction)
Aggregates across the vertices that are connected to each vertex. You can use the Aggregate on parameter to define how exactly this aggregation will take place: choosing one of the ‘edges’ settings can result in a neighboring vertex being taken into account several times (depending on the number of edges between the vertex and its neighboring vertex); whereas choosing one of the ‘neighbors’ settings will result in each neighboring vertex being taken into account once.
For example, it can calculate the average age of the friends of each person.
- Parameters:
prefix – Save the aggregated attributes with this prefix.
direction –
incoming edges: Aggregate across the edges coming in to each vertex.
outgoing edges: Aggregate across the edges going out of each vertex.
all edges: Aggregate across all the edges going in or out of each vertex.
symmetric edges: Aggregate across the ‘symmetric’ edges for each vertex: this means that if you have n edges going from A to B and k edges going from B to A, then min(n,k) edges will be taken into account for both A and B.
in-neighbors: For each vertex A, aggregate across those vertices that have an outgoing edge to A.
out-neighbors: For each vertex A, aggregate across those vertices that have an incoming edge from A.
all neighbors: For each vertex A, aggregate across those vertices that either have an outgoing edge to or an incoming edge from A.
symmetric neighbors: For each vertex A, aggregate across those vertices that have both an outgoing edge to and an incoming edge from A.
- lynx.operations.aggregateToSegmentation()
Aggregates vertex attributes across all the vertices that belong to a segment. For example, it can calculate the average age of each clique.
- lynx.operations.aggregateVertexAttributeGlobally(prefix)
Aggregates vertex attributes across the entire graph into one graph attribute for each attribute. For example you could use it to calculate the average age across an entire dataset of people.
- Parameters:
prefix – Save the aggregated values with this prefix.
- lynx.operations.anchor(description, parameters)
This special box represents the workspace itself. There is always exactly one instance of it. It allows you to control workspace-wide settings as parameters on this box. It can also serve to anchor your workspace with a high-level description.
- Parameters:
description – An overall description of the purpose of this workspace.
parameters – Workspaces containing output boxes can be used as custom boxes in other workspaces. Here you can define what parameters the custom box created from this workspace shall have. Parameters can also be used as workspace-wide constants. For example if you want to import accounts-2017.csv and transactions-2017.csv, you could create a date parameter with default value set to 2017 and import the files as accounts-$date.csv and transactions-$date.csv. (Make sure to mark these parametric file names as parametric.) This makes it easy to change the date for all imported files at once later.
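The idea of a workspace-wide constant can be mimicked with Python's standard string.Template, which happens to use the same $name placeholder style as the file names in the example. This only illustrates the substitution idea. How LynxKite itself evaluates parametric values is not shown here, and resolve_file_name and workspace_parameters are hypothetical names.

```python
from string import Template


def resolve_file_name(pattern, parameters):
    """Fill $name placeholders in pattern from the parameters dict."""
    return Template(pattern).substitute(parameters)


# Hypothetical workspace parameter with its default value.
workspace_parameters = {"date": "2017"}

file_names = [
    resolve_file_name(pattern, workspace_parameters)
    for pattern in ("accounts-$date.csv", "transactions-$date.csv")
]
print(file_names)
```

Changing the single date value changes every derived file name at once, which is the point of defining it as a parameter.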
- lynx.operations.approximateClusteringCoefficient(name, bits)
Scalable algorithm to calculate the approximate local clustering coefficient attribute for every vertex. It quantifies how close the vertex’s neighbors are to being a clique. In practice a high (close to 1.0) clustering coefficient means that the neighbors of a vertex are highly interconnected, while 0.0 means there are no edges between the neighbors of the vertex.
- Parameters:
name – The new attribute will be created under this name.
bits – This algorithm is an approximation. This parameter sets the trade-off between the quality of the approximation and the memory and time consumption of the algorithm.
- lynx.operations.approximateEmbeddedness(name, bits)
Scalable algorithm to calculate the approximate overlap size of vertex neighborhoods along the edges. If an A→B edge has an embeddedness of N, it means A and B have N common neighbors. The approximate embeddedness is undefined for loop edges.
- Parameters:
name – The new attribute will be created under this name.
bits – This algorithm is an approximation. This parameter sets the trade-off between the quality of the approximation and the memory and time consumption of the algorithm.
- lynx.operations.bundleVertexAttributesIntoAVector(output, elements)
Bundles the chosen number and Vector[number] attributes into one Vector attribute. By default, LynxKite puts the numeric attributes after each other in alphabetical order and then concatenates the Vector attributes to the resulting Vector in alphabetical order as well. The resulting attribute is undefined where any of the input attributes is undefined.
For example, if you bundle the age, favorite_day and income attributes into a Vector attribute called everything, you end up with the following attributes.
- Parameters:
output – The new attribute will be created under this name.
elements – The attributes you would like to bundle into a Vector.
- lynx.operations.checkCliques(selected, bothdir)
Validates that the segments of the segmentation are in fact cliques.
Creates a new invalid_cliques graph attribute, which lists non-clique segment IDs up to a certain number.
- Parameters:
selected – The validation can be restricted to a subset of the segments, resulting in quicker operation.
bothdir – Whether edges have to exist in both directions between all members of a clique.
- lynx.operations.classifyWithModel(name, model)
Creates classifications from a model and vertex attributes of the graph. For the classifications with nominal outputs, an additional probability is created to represent the corresponding outcome probability.
- Parameters:
name – The new attribute of the classification will be created under this name.
model – The model used for the classifications and a mapping from vertex attributes to the model’s features. Every feature of the model needs to be mapped to a vertex attribute.
- lynx.operations.coloring(name)
Finds a coloring of the vertices of the graph with no two neighbors with the same color. The colors are represented by numbers. Tries to find a coloring with few colors.
Vertex coloring is used in scheduling problems to distribute resources among parties which simultaneously and asynchronously request them. See https://en.wikipedia.org/wiki/Graph_coloring.
- Parameters:
name – The new attribute will be created under this name.
- lynx.operations.combineSegmentations(name, segmentations)
Creates a new segmentation from the selected existing segmentations. Each new segment corresponds to one original segment from each of the original segmentations, and the new segment is the intersection of all the corresponding segments. We keep non-empty resulting segments only. Edges between segmentations are discarded.
If you have segmentations A and B with two segments each, such as:
A
- Parameters:
name – The new segmentation will be saved under this name.
segmentations – The segmentations to combine. Select two or more.
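The intersection rule described above can be sketched as follows. This is a minimal plain-Python illustration under assumed data structures (each segmentation is a dict from segment ID to a set of base vertices). The function name combine_segmentations and the sample segments are hypothetical and do not reproduce the example from the original documentation.

```python
from itertools import product


def combine_segmentations(*segmentations):
    """Intersect one segment from each segmentation; keep non-empty results.

    Each segmentation is a dict: segment ID -> set of base graph vertices.
    The result is keyed by the tuple of original segment IDs.
    """
    combined = {}
    for combo in product(*(s.items() for s in segmentations)):
        ids = tuple(segment_id for segment_id, _ in combo)
        members = set.intersection(*(set(m) for _, m in combo))
        if members:
            combined[ids] = members
    return combined


# Hypothetical segmentations with two segments each.
a = {"a1": {1, 2, 3}, "a2": {4, 5}}
b = {"b1": {1, 2}, "b2": {3, 4, 5, 6}}
combined = combine_segmentations(a, b)
print(combined)
```

The pair (a2, b1) has no common vertex, so it produces no segment in the result.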
- lynx.operations.comment(comment)
Adds a comment to the workspace. As with any box, you can freely place your comment anywhere on the workspace. Adding comments does not have any effect on the computation but can potentially make your workflow easier to understand for others – or even for your future self.
Markdown can be used to present formatted text or embed links and images.
- Parameters:
comment – Markdown text to be displayed in the workspace.
- lynx.operations.compareSegmentationEdges(golden, test)
Compares the edge sets of two segmentations and computes precision and recall. In order to make this work, the edges of both segmentation graphs should be matchable against each other. Therefore, this operation only allows comparing segmentations which were created using the Use base graph as segmentation operation from the same graph. (More precisely, a one-to-one correspondence is needed between the vertices of both segmentations and the base graph.)
You can use this operation for example to evaluate different colocation results against a reference result.
- Parameters:
golden – Segmentation containing the golden edges.
test – Segmentation containing the test edges.
- lynx.operations.computeAssortativity(name, attr)
Assortativity is the correlation in the values of an attribute along the edges of the graph. A high assortativity means connected vertices often have similar attribute values.
Uses the NetworKit implementation.
- Parameters:
name – The new graph attribute will be created under this name.
attr – The attribute in which you are interested in correlations along the edges.
- lynx.operations.computeCentrality(name, algorithm, direction, weight, samples, maxdiameter, bits)
Calculates a centrality metric for every vertex. Higher centrality means that the vertex is more embedded in the graph. Multiple different centrality measures have been defined in the literature. You can choose the specific centrality measure as a parameter to this operation.
- Parameters:
name – The new attribute will be created under this name.
algorithm –
Average distance (or closeness centrality) of the vertex A is the sum of the lengths of the shortest paths to A divided by the size of its coreachable set.
Betweenness centrality: see https://en.wikipedia.org/wiki/Betweenness_centrality
direction –
incoming edges: Calculating paths from vertices.
outgoing edges: Calculating paths to vertices.
all edges: Calculating paths in both directions, effectively on an undirected graph.
weight – Some of the centrality algorithms can take the selected edge weights into account.
samples – Some of the estimation methods are based on picking a sample of vertices. This parameter controls the size of this sample. A bigger sample leads to a more accurate estimate and a longer computation time.
maxdiameter – Some algorithms (harmonic, Lin, and average distance) work by counting the shortest paths up to a certain length in each iteration. This parameter sets the maximal length to check, so it has a strong influence over the run time of the operation. A setting lower than the actual diameter of the graph can theoretically introduce unbounded error to the results. In typical small world graphs this effect may be acceptable, however.
bits – Some centrality algorithms (harmonic, Lin, and average distance) are approximations. This parameter sets the trade-off between the quality of the approximation and the memory and time consumption of the algorithm. In most cases the default value is good enough. On very large graphs it may help to use a lower number in order to speed up the algorithm or meet memory constraints.
- lynx.operations.computeClusteringCoefficient(name)
Calculates the local clustering coefficient attribute for every vertex. It quantifies how close the vertex’s neighbors are to being a clique. In practice a high (close to 1.0) clustering coefficient means that the neighbors of a vertex are highly interconnected, while 0.0 means there are no edges between the neighbors of the vertex.
- Parameters:
name – The new attribute will be created under this name.
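To make the definition concrete, here is a small plain-Python sketch of the local clustering coefficient. It treats the graph as undirected and simple, and returns 0.0 for vertices with fewer than two neighbors. Both are assumptions of the sketch, since the text above does not say how LynxKite handles edge directions or such vertices. The function name local_clustering is hypothetical.

```python
from itertools import combinations


def local_clustering(neighbors, v):
    """neighbors: dict vertex -> set of adjacent vertices (undirected).

    Returns the fraction of neighbor pairs of v that are connected.
    """
    nbrs = neighbors[v] - {v}
    if len(nbrs) < 2:
        return 0.0  # assumption: no neighbor pairs means coefficient 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    possible = len(nbrs) * (len(nbrs) - 1) / 2
    return links / possible


# Hypothetical graph: a triangle A-B-C plus a pendant vertex D attached to A.
neighbors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficients = {v: local_clustering(neighbors, v) for v in neighbors}
print(coefficients)
```

Vertex B has a coefficient of 1.0 because its two neighbors are connected, while A has only one connected pair out of three.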
- lynx.operations.computeCoverageOfSegmentation(name, weight)
Computes a scalar for a non-overlapping segmentation. Coverage is the fraction of edges that connect vertices within the same segment.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
weight – An edge attribute can be used to weight the edges in the coverage computation.
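The coverage definition above amounts to a ratio of two sums. The sketch below shows it in plain Python. It is not the NetworKit implementation, and the function name coverage, the edge list, and the segment_of mapping are assumptions made for the example.

```python
def coverage(edges, segment_of, weight=None):
    """Fraction of (weighted) edges whose endpoints share a segment.

    edges: list of (src, dst); segment_of: dict vertex -> segment ID;
    weight: optional list of edge weights aligned with edges.
    """
    total = 0.0
    internal = 0.0
    for i, (src, dst) in enumerate(edges):
        w = 1.0 if weight is None else weight[i]
        total += w
        if segment_of[src] == segment_of[dst]:
            internal += w
    return internal / total if total else 0.0


# Hypothetical 4-cycle split into two segments.
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
segment_of = {1: "x", 2: "x", 3: "y", 4: "y"}
unweighted = coverage(edges, segment_of)
weighted = coverage(edges, segment_of, weight=[3.0, 1.0, 0.0, 0.0])
print(unweighted, weighted)
```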
- lynx.operations.computeDegree(name, direction)
For every vertex, this operation calculates either the number of edges it is connected to or the number of neighboring vertices it is connected to. You can use the Count parameter to control this calculation: choosing one of the ‘edges’ settings can result in a neighboring vertex being counted several times (depending on the number of edges between the vertex and the neighboring vertex); whereas choosing one of the ‘neighbors’ settings will result in each neighboring vertex counted once.
- Parameters:
name – The new attribute will be created under this name.
direction –
incoming edges: Count the edges coming in to each vertex.
outgoing edges: Count the edges going out of each vertex.
all edges: Count all the edges going in or out of each vertex.
symmetric edges: Count the ‘symmetric’ edges for each vertex: this means that if you have n edges going from A to B and k edges going from B to A, then min(n,k) edges will be taken into account for both A and B.
in-neighbors: For each vertex A, count those vertices that have an outgoing edge to A.
out-neighbors: For each vertex A, count those vertices that have an incoming edge from A.
all neighbors: For each vertex A, count those vertices that either have an outgoing edge to or an incoming edge from A.
symmetric neighbors: For each vertex A, count those vertices that have both an outgoing edge to and an incoming edge from A.
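The difference between the ‘edges’ and ‘neighbors’ settings is easiest to see on a graph with parallel edges. The following plain-Python sketch implements the eight settings as described above. The function name degree and the edge-list representation are assumptions of the example, and loop edges are not given any special treatment.

```python
from collections import Counter


def degree(edges, vertex, mode):
    """edges: list of (src, dst) pairs; parallel edges are allowed."""
    out_counts = Counter(dst for src, dst in edges if src == vertex)
    in_counts = Counter(src for src, dst in edges if dst == vertex)
    if mode == "incoming edges":
        return sum(in_counts.values())
    if mode == "outgoing edges":
        return sum(out_counts.values())
    if mode == "all edges":
        return sum(in_counts.values()) + sum(out_counts.values())
    if mode == "symmetric edges":
        # min(n, k) for n edges vertex->other and k edges other->vertex.
        return sum(min(n, in_counts[other]) for other, n in out_counts.items())
    if mode == "in-neighbors":
        return len(in_counts)
    if mode == "out-neighbors":
        return len(out_counts)
    if mode == "all neighbors":
        return len(set(in_counts) | set(out_counts))
    if mode == "symmetric neighbors":
        return len(set(in_counts) & set(out_counts))
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical graph: two parallel A->B edges, one B->A edge, one C->A edge.
edges = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A")]
modes = [
    "incoming edges", "outgoing edges", "all edges", "symmetric edges",
    "in-neighbors", "out-neighbors", "all neighbors", "symmetric neighbors",
]
degrees_of_a = {mode: degree(edges, "A", mode) for mode in modes}
print(degrees_of_a)
```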
- lynx.operations.computeDiameter(name, max_error)
The diameter of a graph is the maximal shortest-distance path length between two vertices. All vertex pairs are at most this far from each other.
Uses the NetworKit implementation.
- Parameters:
name – The new graph attribute will be created under this name.
max_error – Set to 0 to get the exact diameter. This may require a lot of computation, however. Set to a value greater than 0 to use a faster computation that gives lower and upper bounds on the diameter. With 0.1 maximum relative error, for example, the upper bound will be no more than 10% greater than the true diameter.
- lynx.operations.computeDispersion(name)
Calculates the extent to which two people’s mutual friends are not themselves well-connected. The dispersion attribute for an A→B edge is the number of pairs of nodes that are both connected to A and B but are not directly connected to each other.
Dispersion ignores edge directions.
It is a useful signal for identifying romantic partnerships – connections with high dispersion – according to Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook.
A normalized dispersion metric is also generated by this operation. This is normalized against the embeddedness of the edge with the formula recommended in the cited article: disp(u,v)^0.61 / (emb(u,v) + 5). It does not necessarily fall in the (0,1) range.
- Parameters:
name – The new edge attribute will be created under this name.
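The dispersion count and its normalized variant can be sketched in plain Python, following the wording above: pairs of common neighbors that are not directly connected to each other. This is an illustration, not LynxKite's implementation. The function names and the undirected neighbor-set representation are assumptions of the sketch.

```python
from itertools import combinations


def common_neighbors(neighbors, u, v):
    """Vertices connected to both u and v (edge directions ignored)."""
    return (neighbors[u] & neighbors[v]) - {u, v}


def dispersion(neighbors, u, v):
    """Number of common-neighbor pairs that are not directly connected."""
    common = common_neighbors(neighbors, u, v)
    return sum(1 for s, t in combinations(common, 2) if t not in neighbors[s])


def normalized_dispersion(neighbors, u, v):
    """disp(u,v)^0.61 / (emb(u,v) + 5), with emb the number of common neighbors."""
    embeddedness = len(common_neighbors(neighbors, u, v))
    return dispersion(neighbors, u, v) ** 0.61 / (embeddedness + 5)


# Hypothetical graph: u and v share neighbors x, y, z; only x and y are linked.
neighbors = {
    "u": {"v", "x", "y", "z"},
    "v": {"u", "x", "y", "z"},
    "x": {"u", "v", "y"},
    "y": {"u", "v", "x"},
    "z": {"u", "v"},
}
disp = dispersion(neighbors, "u", "v")
norm = normalized_dispersion(neighbors, "u", "v")
print(disp, norm)
```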
- lynx.operations.computeDistanceViaShortestPath(name, edge_distance, starting_distance, iterations)
Calculates the length of the shortest path from a given set of vertices for every vertex. To use this operation, a set of starting vertices v_i has to be specified, each with a starting distance sd(v_i). Edges represent a unit distance by default, but this can be overridden using an attribute. This operation will compute for each vertex v_i the smallest distance from a starting vertex, also counting the starting distance of the starting vertex: d(v_i) = min_j (sd(v_j) + D(v_j, v_i)), where v_j ranges over the starting vertices and D(v_j, v_i) is the length of the shortest path from v_j to v_i.
- Parameters:
name – The new attribute will be created under this name.
edge_distance –
The attribute containing the distances corresponding to edges. (Cost in the above example.)
Negative values are allowed but there must be no loops where the sum of distances is negative.
starting_distance – A numeric attribute that specifies the initial distances of the vertices that we consider already reachable before starting this operation. (In the above example, specify this for the elements of the starting set, and leave this undefined for the rest of the vertices.)
iterations – The maximum number of edges considered for a shortest-distance path.
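One way to picture how the parameters interact is a Bellman-Ford style relaxation that runs for at most the given number of rounds, so that only paths of up to that many edges are considered. The sketch below is a plain-Python illustration under that assumption and is not LynxKite's implementation. The function name and data layout are hypothetical.

```python
import math


def distances_via_shortest_path(vertices, edges, starting_distance,
                                iterations, edge_distance=None):
    """edges: list of (src, dst); edge_distance: optional per-edge costs.

    starting_distance: dict with the initial distance of each starting
    vertex. Other vertices start as unreachable (infinite distance).
    """
    dist = {v: starting_distance.get(v, math.inf) for v in vertices}
    for _ in range(iterations):
        # Relax every edge against the distances of the previous round, so
        # round k only uses paths with at most k edges.
        updated = dict(dist)
        for i, (src, dst) in enumerate(edges):
            cost = 1.0 if edge_distance is None else edge_distance[i]
            if dist[src] + cost < updated[dst]:
                updated[dst] = dist[src] + cost
        if updated == dist:
            break
        dist = updated
    return dist


# Hypothetical chain A -> B -> C -> D with unit edge distances.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
limited = distances_via_shortest_path(vertices, edges, {"A": 0.0}, iterations=2)
two_starts = distances_via_shortest_path(
    vertices, edges, {"A": 0.0, "C": 0.5}, iterations=10)
print(limited, two_starts)
```

With two iterations, D stays unreachable because it is three edges away from the only starting vertex. With a second starting vertex C at distance 0.5, D is reached through C.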
- lynx.operations.computeEdgeCutOfSegmentation(name, weight)
Computes a scalar for a non-overlapping segmentation. Edge cut is the total weight of the edges going between different segments.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
weight – An edge attribute can be used as edge weight.
- lynx.operations.computeEffectiveDiameter(name, ratio, algorithm, bits, approximations)
The effective diameter is a distance within which a given portion of vertices can be found from each other.
For example, there are at most six degrees of separation between most people on Earth. There may be hermits and lost tribes that would push the true diameter above 6, but they are a minority. If we ignore 1% of the population and find that the remaining 99% have a true diameter of 6, we can say that the graph has an effective diameter of 6.
Uses the exact and estimated NetworKit implementations.
- Parameters:
name – The new graph attribute will be created under this name.
ratio – The fraction of the vertices to keep.
algorithm – Whether to compute the effective diameter exactly (slower) or approximately (faster).
bits – Used when estimating the effective diameter. See http://www.cs.cmu.edu/~christos/PUBLICATIONS/kdd02-anf.pdf
approximations – Used when estimating the effective diameter. See http://www.cs.cmu.edu/~christos/PUBLICATIONS/kdd02-anf.pdf
- lynx.operations.computeEmbeddedness(name)
Edge embeddedness is the overlap size of vertex neighborhoods along the edges. If an A→B edge has an embeddedness of N, it means A and B have N common neighbors.
- Parameters:
name – The new attribute will be created under this name.
- lynx.operations.computeHubDominance(name)
Computes the hub dominance metric for each segment in a segmentation. The hub dominance of a segment is the highest internal degree in the segment divided by the highest possible internal degree. (The segment size minus one.)
If a segment has a vertex that is connected to all other vertices in that segment then its hub dominance will be 1. This metric is useful for comparing the structures that make up the different segments in a segmentation.
For further analysis and theory see Characterizing the community structure of complex networks by Lancichinetti et al.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
- lynx.operations.computeHyperbolicEdgeProbability(radial, angular)
Adds the edge attribute hyperbolic edge probability, based on the hyperbolic distances between vertices. This indicates how likely that edge would be to exist if the input graph was popularity x similarity-grown. On a general level it is a metric of edge strength. Probabilities are guaranteed to be between 0 and 1.
- Parameters:
radial – The vertex attribute to be used as radial coordinates. Should not contain negative values.
angular – The vertex attribute to be used as angular coordinates. Values should be 0 - 2 * Pi.
- lynx.operations.computeInPython(code, inputs, outputs)
Executes custom Python code to define new vertex, edge, or graph attributes.
The following example computes two new vertex attributes (with_title and age_squared), two new edge attributes (score and names), and two new graph attributes (hello and average_age). (You can try it on the example graph, which has the attributes used in this code.)
vs['with_title']
- Parameters:
code – The Python code you want to run. See the operation description for details.
inputs – A comma-separated list of attributes that your code wants to use. For example, vs.my_attribute, vs.another_attribute, graph_attributes.last_one.
outputs – A comma-separated list of attributes that your code generates. These must be annotated with the type of the attribute. For example, vs.my_new_attribute: str, vs.another_new_attribute: float, graph_attributes.also_new: str.
- lynx.operations.computeInputs()
Triggers the computations for all entities associated with its input.
For table inputs, it computes the table.
For graph inputs, it computes the vertices and edges, all attributes, and the same transitively for all segments plus the segmentation links.
- lynx.operations.computeModularityOfSegmentation(name, weight)
Computes a scalar for a non-overlapping segmentation. If the vertices were connected randomly while preserving the degrees, a certain fraction of all edges would fall within each segment. We subtract this from the observed fraction of edges that fall within the segments. Modularity is the total observed difference.
A modularity of 0 means the relationship between internal edges and external edges is consistent with randomly selected edges or segments. A positive modularity means more internal edges than would be expected by chance. A negative modularity means fewer internal edges than would be expected by chance.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
weight – An edge attribute can be used to weight the edges instead of just looking at edge counts.
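The description above corresponds to the usual modularity formula for unweighted, undirected graphs: for each segment, the observed fraction of internal edges minus the squared fraction of edge endpoints that fall in the segment. The plain-Python sketch below assumes that formula and ignores edge weights. It is not the NetworKit implementation, and the function name modularity is hypothetical.

```python
def modularity(edges, segment_of):
    """edges: list of undirected (src, dst) pairs; segment_of: vertex -> ID."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # segment -> number of edges inside it
    degree_sum = {}  # segment -> total degree of its vertices
    for src, dst in edges:
        s, d = segment_of[src], segment_of[dst]
        degree_sum[s] = degree_sum.get(s, 0) + 1
        degree_sum[d] = degree_sum.get(d, 0) + 1
        if s == d:
            internal[s] = internal.get(s, 0) + 1
    # Observed internal fraction minus the fraction expected by chance.
    return sum(
        internal.get(c, 0) / m - (deg / (2 * m)) ** 2
        for c, deg in degree_sum.items()
    )


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
by_triangle = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_segment = {v: "all" for v in range(1, 7)}
q_triangles = modularity(edges, by_triangle)
q_single = modularity(edges, one_segment)
print(q_triangles, q_single)
```

Splitting the graph along the bridge gives a positive modularity, while putting every vertex in one segment gives 0.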
- lynx.operations.computePagerank(name, weights, iterations, damping, direction)
Calculates PageRank for every vertex. PageRank is calculated by simulating random walks on the graph. The PageRank of a vertex reflects the likelihood that the walk leads to that vertex.
Let’s imagine a social graph with information flowing along the edges. In this case high PageRank means that the vertex is more likely to be the target of the information.
Similarly, it may be useful to identify information sources in the reversed graph. Simply reverse the edges before running the operation to calculate the reverse PageRank.
- Parameters:
name – The new attribute will be created under this name.
weights – The edge weights. Edges with greater weight correspond to higher probabilities in the theoretical random walk.
iterations – PageRank is an iterative algorithm. More iterations take more time but can lead to more precise results. As a rule of thumb set the number of iterations to the diameter of the graph, or to the median shortest path.
damping – The probability of continuing the random walk at each step. Higher damping factors lead to longer random walks.
direction –
incoming edges: Simulate random walk in the reverse edge direction. Finds the most influential sources.
outgoing edges: Simulate random walk in the original edge direction. Finds the most popular destinations.
all edges: Simulate random walk in both directions.
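The roles of the iterations and damping parameters can be seen in a textbook power-iteration sketch of PageRank. This is an unweighted illustration that follows outgoing edges only. How LynxKite normalizes the result, applies edge weights, or treats vertices without outgoing edges is not stated above, so those choices (ranks summing to 1, dangling rank spread evenly) are assumptions of the sketch.

```python
def pagerank(vertices, edges, iterations=5, damping=0.85):
    """Power iteration over outgoing edges; returned ranks sum to 1."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        # Assumption: rank held by vertices without outgoing edges is
        # redistributed evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        rank = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return rank


# Hypothetical graph: B and C point to A, and A points back to B.
vertices = ["A", "B", "C"]
edges = [("B", "A"), ("C", "A"), ("A", "B")]
rank = pagerank(vertices, edges, iterations=10, damping=0.85)
print(rank)
```

Vertex C has no incoming edges, so it only ever receives the (1 - damping) / n share that models restarting the walk at a random vertex.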
- lynx.operations.computeSegmentConductance(name, weight)
Computes the conductance of each segment in a non-overlapping segmentation. The conductance of a segment is the number of edges going between the segment and the rest of the graph divided by the sum of the degrees in the segment or in the rest of the graph (whichever is smaller).
A high conductance value indicates a segment that is strongly connected to the rest of the graph. A value over 0.5 means more edges going out of the segment than edges inside it.
See Experiments on Density-Constrained Graph Clustering by Görke et al. for details and analysis.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
weight – The definition can be rephrased to apply to weighted graphs. In this case the total weight of the cut is compared to the weighted degrees.
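A minimal unweighted sketch of the conductance definition above, written in plain Python. It is not the NetworKit implementation. The function name conductance and the edge-list representation are assumptions, and the result for a segment with no edge endpoints on one side is set to 0.0 by convention in this sketch.

```python
def conductance(edges, segment):
    """edges: list of undirected (src, dst) pairs; segment: iterable of vertices."""
    members = set(segment)
    cut = 0
    inside_volume = 0   # sum of degrees inside the segment
    outside_volume = 0  # sum of degrees in the rest of the graph
    for src, dst in edges:
        for endpoint in (src, dst):
            if endpoint in members:
                inside_volume += 1
            else:
                outside_volume += 1
        if (src in members) != (dst in members):
            cut += 1
    smaller = min(inside_volume, outside_volume)
    return cut / smaller if smaller else 0.0


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
triangle_conductance = conductance(edges, {1, 2, 3})
pair_conductance = conductance(edges, {3, 4})
print(triangle_conductance, pair_conductance)
```

The triangle is weakly attached to the rest of the graph (one cut edge against a degree sum of 7), while the segment made of the two bridge endpoints has four of its six edge endpoints leading outside.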
- lynx.operations.computeSegmentDensity(name)
Computes the density of each segment in a non-overlapping segmentation. The density of a segment is the number of internal edges divided by the number of possible internal edges.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
- lynx.operations.computeSegmentExpansion(name, weight)
Computes the expansion of each segment in a non-overlapping segmentation. The expansion of a segment is the number of edges going between the segment and the rest of the graph divided by the number of vertices in the segment or in the rest of the graph (whichever is smaller).
A high expansion value indicates a segment that is strongly connected to the rest of the graph. A value over 1 means the vertices in this segment have more than one external neighbor on average.
See Experiments on Density-Constrained Graph Clustering by Görke et al. for details and analysis.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
weight – The definition can be rephrased to apply to weighted graphs. In this case the total weight of the cut is compared to the weighted degrees.
- lynx.operations.computeSegmentFragmentation(name)
Computes the fragmentation of each segment in a non-overlapping segmentation. The fragmentation of a segment is one minus the ratio of the size of its largest component and the whole segment.
A segment that is entirely connected will have a fragmentation of zero. If the fragmentation approaches one, it will be made up of smaller and smaller components.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
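The fragmentation definition above can be sketched by finding the largest connected component inside a segment. This plain-Python illustration is not the NetworKit implementation. The function name fragmentation is hypothetical, and only edges with both endpoints inside the segment are considered.

```python
def fragmentation(segment, edges):
    """One minus (largest component size / segment size)."""
    members = set(segment)
    if not members:
        return 0.0
    adjacent = {v: set() for v in members}
    for src, dst in edges:
        if src in members and dst in members:
            adjacent[src].add(dst)
            adjacent[dst].add(src)
    largest = 0
    seen = set()
    for start in members:
        if start in seen:
            continue
        # Depth-first search to measure the component containing start.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adjacent[v] - seen:
                seen.add(w)
                stack.append(w)
        largest = max(largest, size)
    return 1 - largest / len(members)


# Hypothetical data: vertex 4 is in the segment but only linked to outsider 5.
edges = [(1, 2), (2, 3), (4, 5)]
split_segment = fragmentation({1, 2, 3, 4}, edges)
connected_segment = fragmentation({1, 2, 3}, edges)
print(split_segment, connected_segment)
```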
- lynx.operations.computeSegmentStability(name)
Computes the stability of each segment in a non-overlapping segmentation. A vertex is considered stable if it has more neighbors inside the segment than outside. The stability of a segment is the fraction of its vertices that are stable.
A high stability value (close to 1) indicates a segment where vertices are more connected internally than externally. A stability lower than 0.5 means that the majority of neighbors are external for more than half of the vertices.
Uses the NetworKit implementation.
- Parameters:
name – This box creates a new vertex attribute on the segmentation by this name.
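The stability rule above (a vertex is stable if it has more neighbors inside the segment than outside) can be sketched as follows. This is a plain-Python illustration, not the NetworKit implementation. The function name stability and the undirected neighbor-set representation are assumptions of the example.

```python
def stability(segment, neighbors):
    """Fraction of segment vertices with more neighbors inside than outside.

    neighbors: dict vertex -> set of adjacent vertices (undirected).
    """
    members = set(segment)
    if not members:
        return 0.0
    stable = 0
    for v in members:
        others = neighbors[v] - {v}
        inside = len(others & members)
        outside = len(others - members)
        if inside > outside:
            stable += 1
    return stable / len(members)


# Hypothetical graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
neighbors = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
triangle_stability = stability({1, 2, 3}, neighbors)
bridge_stability = stability({3, 4}, neighbors)
print(triangle_stability, bridge_stability)
```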
- lynx.operations.connectVerticesOnAttribute(fromattr, toattr)
Creates edges between vertices that are equal in a chosen attribute. If the source attribute of A equals the destination attribute of B, an A→B edge will be generated.
The two attributes must be of the same data type.
For example, if you connect nodes based on the “name” attribute, then everyone called “John Smith” will be connected to all the other “John Smiths”.
- Parameters:
fromattr – An A→B edge is generated when this attribute on A matches the destination attribute on B.
toattr – An A→B edge is generated when the source attribute on A matches this attribute on B.
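The matching rule can be sketched in plain Python by indexing vertices on the destination attribute and looking up each vertex's source attribute in that index. The function name connect_on_attribute and the dict-based vertex representation are hypothetical. Skipping the edge from a vertex to itself is an assumption of this sketch, suggested by the "all the other" wording above but not confirmed by it.

```python
def connect_on_attribute(vertices, fromattr, toattr):
    """vertices: dict vertex ID -> dict of attributes.

    Returns (a, b) pairs where fromattr on a equals toattr on b.
    """
    by_target_value = {}
    for vertex_id, attrs in vertices.items():
        if toattr in attrs:
            by_target_value.setdefault(attrs[toattr], []).append(vertex_id)
    edges = []
    for a, attrs in vertices.items():
        if fromattr not in attrs:
            continue  # undefined attribute: no edges from this vertex
        for b in by_target_value.get(attrs[fromattr], []):
            if a != b:  # assumption: no edge from a vertex to itself
                edges.append((a, b))
    return edges


# Hypothetical vertices.
vertices = {
    1: {"name": "John Smith"},
    2: {"name": "John Smith"},
    3: {"name": "Jane Doe"},
}
edges = connect_on_attribute(vertices, "name", "name")
print(edges)
```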
- lynx.operations.convertEdgeAttributeToNumber(attr)
Converts the selected String typed edge attributes to the number type.
The attributes will be converted in-place. If you want to keep the original String attribute as well, make a copy first!
- Parameters:
attr – The attributes to be converted.
- lynx.operations.convertEdgeAttributeToString(attr)
Converts the selected edge attributes to String type.
The attributes will be converted in-place. If you want to keep the original attributes as well, make a copy first!
- Parameters:
attr – The attributes to be converted.
- lynx.operations.convertVertexAttributeToNumber(attr)
Converts the selected String typed vertex attributes to the number type.
The attributes will be converted in-place. If you want to keep the original String attribute as well, make a copy first!
- Parameters:
attr – The attributes to be converted.
- lynx.operations.convertVertexAttributeToString(attr)
Converts the selected vertex attributes to String type.
The attributes will be converted in-place. If you want to keep the original attributes as well, make a copy first!
- Parameters:
attr – The attributes to be converted.
- lynx.operations.copyEdgeAttribute(name, destination)
Creates a copy of an edge attribute.
- Parameters:
name –
destination –
- lynx.operations.copyEdgesToBaseGraph()
Copies the edges from a segmentation to the base graph. The copy is performed along the links between the segmentation and the base graph. If two segments are connected with some edges, then each edge will be copied to each pair of members of the segments.
This operation has a potential to create trillions of edges or more. The number of edges created is the sum, over all edges in the segmentation, of the source segment size multiplied by the destination segment size. It is recommended to drop very large segments before running this computation.
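Before running the operation it can help to estimate the output size. The sketch below applies the counting rule from the paragraph above in plain Python; `estimate_copied_edges` and its inputs are hypothetical names used only for this illustration.

```python
def estimate_copied_edges(segment_sizes, segment_edges):
    """Sum of (source segment size * destination segment size) per edge."""
    return sum(segment_sizes[src] * segment_sizes[dst]
               for src, dst in segment_edges)


segment_sizes = {"x": 1000, "y": 2000, "z": 3}
# Parallel edges between the same two segments are counted separately.
segment_edges = [("x", "y"), ("y", "z"), ("x", "y")]
estimated = estimate_copied_edges(segment_sizes, segment_edges)
```

Even this small example yields about four million edges, which shows why dropping very large segments first is recommended.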
- lynx.operations.copyEdgesToSegmentation()
Copies the edges from the base graph to the segmentation. The copy is performed along the links between the base graph and the segmentation. If a base vertex belongs to no segments, its edges will not be found in the result. If a base vertex belongs to multiple segments, its edges will have multiple copies in the result.
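The two rules above (no copies for vertices without segments, multiple copies for vertices in several segments) can be sketched like this. The function and variable names are assumptions for illustration, and the sketch says nothing about how LynxKite performs the copy internally.

```python
def copy_edges_to_segmentation(edges, segments_of):
    """Copy each base edge to every (source segment, destination segment) pair.

    edges:       list of (src, dst) base-graph edges
    segments_of: {vertex: list of segments the vertex belongs to}
    """
    result = []
    for src, dst in edges:
        for s in segments_of.get(src, []):
            for d in segments_of.get(dst, []):
                result.append((s, d))
    return result


edges = [("a", "b"), ("b", "c")]
segments_of = {"a": ["s1", "s2"], "b": ["s1"]}  # "c" belongs to no segment
copied = copy_edges_to_segmentation(edges, segments_of)
```

The a→b edge appears twice because a belongs to two segments, and the b→c edge disappears because c belongs to none.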
- lynx.operations.copyGraphAttribute(name, destination)
Creates a copy of a graph attribute.
- Parameters:
name –
destination –
- lynx.operations.copyGraphAttributeFromOtherGraph(sourceproject, sourcescalarname, destscalarname)
This operation can take a graph attribute from another graph and copy it to the current graph.
It can be useful if we trained a machine learning model in one graph, and would like to apply this model in another graph for predicting undefined attribute values.
- Parameters:
sourceproject – The name of the other graph from where we want to copy a graph attribute.
sourcescalarname – The name of the graph attribute in the other graph. If it is a simple string, then the graph attribute with that name has to be in the root of the other graph. If it is a .-separated string, then it means a graph attribute in a segmentation of the other graph. The syntax for this case is: seg_1.seg_2…..seg_n.graph_attribute.
destscalarname – This will be the name of the copied graph attribute in this graph.
- lynx.operations.copySegmentation(name, destination)
Creates a copy of a segmentation.
- Parameters:
name –
destination –
- lynx.operations.copyVertexAttribute(name, destination)
Creates a copy of a vertex attribute.
- Parameters:
name –
destination –
- lynx.operations.copyVertexAttributesFromSegmentation(prefix)
Copies all vertex attributes from the segmentation to the parent.
This operation is only available when each vertex belongs to just one segment. (As in the case of connected components, for example.)
- Parameters:
prefix – A prefix for the new attribute names. Leave empty for no prefix.
- lynx.operations.copyVertexAttributesToSegmentation(prefix)
Copies all vertex attributes from the parent to the segmentation.
This operation is available only when each segment contains just one vertex.
- Parameters:
prefix – A prefix for the new attribute names. Leave empty for no prefix.
- lynx.operations.correlateTwoAttributes(attra, attrb)
Calculates the Pearson correlation coefficient of two attributes. Only vertices where both attributes are defined are considered.
Note that correlation is undefined if at least one of the attributes is a constant.
- Parameters:
attra – The correlation of these two attributes will be calculated.
attrb – The correlation of these two attributes will be calculated.
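As a reference for what is being computed, here is a small standard-library sketch of the Pearson correlation restricted to vertices where both attributes are defined. The helper name `pearson` and the dictionary inputs are assumptions for this example; returning `None` for the undefined (constant attribute) case is a choice of this sketch.

```python
import math


def pearson(attr_a, attr_b):
    """Pearson correlation over the vertices present in both dictionaries."""
    common = [v for v in attr_a if v in attr_b]
    n = len(common)
    if n == 0:
        return None
    xs = [attr_a[v] for v in common]
    ys = [attr_b[v] for v in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined when either attribute is constant
    return cov / math.sqrt(var_x * var_y)


age = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
income = {1: 1.0, 2: 2.0, 3: 3.0}  # vertex 4 has no income, so it is ignored
r = pearson(age, income)
constant = pearson(age, {1: 5.0, 2: 5.0})
```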
- lynx.operations.createAGraphWithCertainDegrees(size, degrees, algorithm, seed)
Creates a graph in which the distribution of vertex degrees is as specified.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
degrees – The algorithm will try to ensure that an equal number of vertices will have each of the listed degrees. For example, generating 30 vertices with a degree list of “1, 1, 5” will result in 20 vertices having degree 1 and 10 vertices having degree 5.
algorithm – The algorithm to use. Chung–Lu: An extension of the Erdős–Rényi random graph model with edge probabilities dependent on vertex “weights”. See Efficient Generation of Networks with Given Expected Degrees.
seed – The random seed.
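The degree list example above (30 vertices with the list “1, 1, 5”) can be reproduced with a short sketch. Cycling through the list is just one assignment that gives each listed degree an equal share; how the operation actually assigns target degrees is not specified here, and `target_degrees` is a hypothetical helper.

```python
from collections import Counter
from itertools import cycle, islice


def target_degrees(size, degrees):
    """Assign a target degree to each vertex by cycling through the list."""
    wanted = [int(d) for d in degrees.split(",")]
    return list(islice(cycle(wanted), size))


counts = Counter(target_degrees(30, "1, 1, 5"))
```

Because the value 1 is listed twice, it receives two thirds of the vertices: 20 vertices get degree 1 and 10 vertices get degree 5.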
- lynx.operations.createBarabSiAlbertGraph(size, attachments_per_vertex, connected_at_start, seed)
Creates a random graph using the Barabási–Albert model (https://en.wikipedia.org/wiki/Barab%C3%A1si%E2%80%93Albert_model). The vertices are created one by one and connected to a set number of randomly chosen previously created vertices. This ensures a skewed degree distribution with “older” vertices tending to have a higher degree.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
attachments_per_vertex – As each vertex is added, it will be connected to this many existing vertices.
connected_at_start – This many vertices will be connected in a circle at the start of the algorithm.
seed – The random seed.
- lynx.operations.createClusteredRandomGraph(size, clusters, probability_in, probability_out, seed)
Creates a random graph with a given number of clusters. It randomly places each vertex into one of the clusters then adds an edge for each vertex pair with the given probabilities.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
clusters – The created graph will have this many clusters. Each vertex will be randomly placed into one of the clusters with equal probability.
probability_in – The probability for adding an edge between two vertices if they are in the same cluster.
probability_out – The probability for adding an edge between two vertices if they are in different clusters.
seed – The random seed.
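The generation process described above can be sketched with the standard library alone. This is an illustrative re-implementation under the stated description, not the NetworKit code, and it will not reproduce the same graph for the same seed; `clustered_random_graph` is a hypothetical name.

```python
import random
from itertools import combinations


def clustered_random_graph(size, clusters, probability_in, probability_out, seed):
    """Return (cluster_of, edges) for a clustered random graph."""
    rng = random.Random(seed)
    # Each vertex lands in one of the clusters with equal probability.
    cluster_of = [rng.randrange(clusters) for _ in range(size)]
    edges = []
    for u, v in combinations(range(size), 2):
        same = cluster_of[u] == cluster_of[v]
        p = probability_in if same else probability_out
        if rng.random() < p:
            edges.append((u, v))
    return cluster_of, edges


# Extreme settings: every in-cluster pair is connected, no cross-cluster pair is.
cluster_of, edges = clustered_random_graph(20, 3, 1.0, 0.0, seed=7)
```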
- lynx.operations.createDorogovtsevMendesRandomGraph(size, seed)
Creates a planar random graph with a power-law distribution. Starts with a triangle and in each step adds a new node that is connected to the two endpoints of a randomly selected edge.
See Modern architecture of random graphs: Constructions and correlations by Dorogovtsev et al.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
seed – The random seed.
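The growth rule (start with a triangle, then attach each new node to both endpoints of a randomly selected edge) is short enough to sketch directly. The code below is an illustration of that rule only, not the NetworKit implementation, and `dorogovtsev_mendes_edges` is a hypothetical name.

```python
import random


def dorogovtsev_mendes_edges(size, seed):
    """Grow a graph from a triangle by attaching each new node to a random edge."""
    if size < 3:
        raise ValueError("the model starts from a triangle, so size must be >= 3")
    rng = random.Random(seed)
    edges = [(0, 1), (1, 2), (0, 2)]
    for new in range(3, size):
        u, v = rng.choice(edges)  # pick an existing edge uniformly at random
        edges.append((u, new))
        edges.append((v, new))
    return edges


edges = dorogovtsev_mendes_edges(10, seed=1)
```

Each added node contributes exactly two edges, so a graph with n vertices has 2n - 3 edges regardless of the seed.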
- lynx.operations.createEdgesFromCoOccurrence()
Connects vertices in the base graph if they co-occur in any segments. Multiple co-occurrences will result in multiple parallel edges. Loop edges are generated for each segment that a vertex belongs to. The attributes of the segment are copied to the edges created from it.
This operation has a potential to create trillions of edges or more. The number of edges created is the sum of squares of the segment sizes. It is recommended to drop very large segments before running this computation.
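The following sketch enumerates the edges the description implies, so the sum-of-squares count and the loop edges can be checked on a small input. It is an illustration only; `co_occurrence_edges` and the data layout are assumptions, and each edge simply carries the segment id in place of the copied segment attributes.

```python
from itertools import product


def co_occurrence_edges(segments):
    """Return (src, dst, segment) for every ordered member pair of each segment."""
    edges = []
    for seg, members in segments.items():
        # product(..., repeat=2) includes (v, v), which gives the loop edges.
        edges.extend((a, b, seg) for a, b in product(members, repeat=2))
    return edges


segments = {"s1": ["a", "b", "c"], "s2": ["a", "b"]}
edges = co_occurrence_edges(segments)
loops = [e for e in edges if e[0] == e[1]]
a_to_b = [e for e in edges if e[0] == "a" and e[1] == "b"]
```

With segment sizes 3 and 2 the result has 3² + 2² = 13 edges, including five loop edges and two parallel a→b edges.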
- lynx.operations.createEdgesFromSetOverlaps(minoverlap)
Connects segments with large enough overlaps.
- Parameters:
minoverlap – Two segments will be connected if they have at least this many members in common.
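A minimal sketch of the overlap rule follows. It reports each qualifying pair of segments once together with the overlap size; whether the operation creates one edge or one in each direction, and which attributes it attaches, is not stated here. The name `overlap_edges` is hypothetical.

```python
from itertools import combinations


def overlap_edges(segments, minoverlap):
    """Return (segment_a, segment_b, overlap) for pairs sharing enough members."""
    edges = []
    for (a, members_a), (b, members_b) in combinations(segments.items(), 2):
        overlap = len(set(members_a) & set(members_b))
        if overlap >= minoverlap:
            edges.append((a, b, overlap))
    return edges


segments = {"s1": [1, 2, 3], "s2": [2, 3, 4], "s3": [4, 5]}
strict = overlap_edges(segments, minoverlap=2)
loose = overlap_edges(segments, minoverlap=1)
```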
- lynx.operations.createErdSRNyiGraph(size, probability, seed)
Creates a random graph using the Erdős–Rényi model (https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model). In this model each pair of vertices is connected independently with the same probability. It creates a very uniform graph with no tendency to skewed degree distributions or clustering.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
probability – Each pair of vertices is connected with this probability.
seed – The random seed.
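For readers unfamiliar with the model, this is a standard-library sketch of independent pairwise sampling. It illustrates the model itself rather than the NetworKit implementation, so the same seed will not yield the same graph as the operation; `erdos_renyi_edges` is a hypothetical name.

```python
import random
from itertools import combinations


def erdos_renyi_edges(size, probability, seed):
    """Connect each unordered vertex pair independently with the given probability."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(size), 2)
            if rng.random() < probability]


full = erdos_renyi_edges(5, 1.0, seed=0)    # every pair is connected
empty = erdos_renyi_edges(5, 0.0, seed=0)   # no pair is connected
sample = erdos_renyi_edges(50, 0.1, seed=42)
```

A fixed seed makes the sketch reproducible, which mirrors the purpose of the seed parameter.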
- lynx.operations.createExampleGraph()
Creates a small test graph with 4 people and 4 edges between them.
- lynx.operations.createGraphInPython(code, outputs)
Executes custom Python code to define a graph. Ideal for creating complex graphs programmatically and for loading datasets in non-standard formats.
The following example creates a small graph with some attributes.
[source,python]
vs
- Parameters:
code – The Python code you want to run. See the operation description for details.
outputs – A comma-separated list of attributes that your code generates. These must be annotated with the type of the attribute. For example, vs.my_new_attribute: str, vs.another_new_attribute: float, graph_attributes.also_new: str.
- lynx.operations.createHyperbolicRandomGraph(size, avg_degree, exponent, temperature, seed)
Creates a random graph based on randomly placed points on the hyperbolic plane. The points corresponding to vertices are placed on a disk. If two points are closer than a threshold (by the hyperbolic distance metric), an edge will be created between those two vertices.
The motivation for this is to reflect popularity (how close the point is to the center) and interest (in which direction the point lies). This leads to realistic clustering properties in the generated random graph.
The radius of the disk and the neighborhood radius can be chosen to ensure a desired average and power-law exponent for the degree distribution.
Instead of a strict neighborhood radius, within which edges are always created and outside of which they never are, we can also consider probabilistic edge generation. In this case the shorter the distance between two points, the more likely that an edge should be generated.
The temperature parameter is defined in a way that makes the strict neighborhood radius case an edge case (T = 0).
- Parameters:
size – The created graph will have this many vertices.
avg_degree – The expected value of the degree distribution.
exponent – The exponent of the degree distribution.
temperature – When zero, vertices are connected if they lie within a fixed threshold on the hyperbolic disk. Larger values add randomness while trying to preserve the degree distribution.
seed – The random seed.
- lynx.operations.createLfrRandomGraph(size, avg_degree, max_degree, degree_exponent, min_community, max_community, community_exponent, avg_mixing, seed)
LFR stands for Lancichinetti, Fortunato, and Radicchi, the authors of Benchmark graphs for testing community detection algorithms and Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities upon which this generator is based.
The LFR random graph features overlapping communities. Each vertex is randomized into multiple communities while ensuring a desired power-law community size distribution. Then edges within communities are generated to match the desired power-law vertex degree distribution. Finally edges are swapped around to create cross-community connections.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
avg_degree – The expected value of the desired vertex degree distribution.
max_degree – The maximum of the desired vertex degree distribution.
degree_exponent – The power-law exponent of the desired vertex degree distribution. A higher number means a more skewed distribution.
min_community – The minimum of the desired community size distribution.
max_community – The maximum of the desired community size distribution.
community_exponent – The power-law exponent of the desired community size distribution. A higher number means a more skewed distribution.
avg_mixing – The fraction of each vertex's neighbors that should, on average, belong to other communities.
seed – The random seed.
- lynx.operations.createMocnikRandomGraph(size, dimension, density, seed)
Creates a random graph as described in Modelling Spatial Structures (https://www.mocnik-science.net/publications/2015c%20-%20Franz-Benjamin%20Mocnik%20-%20Modelling%20Spatial%20Structures.pdf) by Mocnik et al. The model is based on randomly placing the vertices in Euclidean space and generating edges with a higher probability for pairs of vertices that are closer together.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
dimension – The vertices are placed randomly in a space with this many dimensions.
density – The desired ratio of edges to nodes.
seed – The random seed.
- lynx.operations.createP2pRandomGraph(size, dense_areas, max_degree, neighborhood_radius, seed)
Creates a random graph using the model described in A distributed diffusive heuristic for clustering a virtual P2P supercomputer by Gehweiler et al.
The vertices are randomly placed in a 2-dimensional unit square with a torus topology. Vertices within a set radius are connected when permitted by the maximum degree constraint.
Some dense circular areas within the unit square are picked at the beginning and these are populated first. Any remaining vertices are then placed uniformly. This leads to a clustering effect that models the internal networks of companies and institutions as observed in real peer-to-peer network topologies.
Uses the NetworKit implementation.
- Parameters:
size – The created graph will have this many vertices.
dense_areas – How many dense areas to pick. These will vary in size and will be populated first.
max_degree – Each vertex will be connected to at most this many neighbors.
neighborhood_radius – The model works by placing points on the unit square. Points within this radius will be connected to each other.
seed – The random seed.
- lynx.operations.createRandomEdges(degree, seed)
Creates edges randomly, so that each vertex will have a degree uniformly chosen between 0 and 2 * the provided parameter.
For example, you can create a random graph by first applying the Create vertices operation and then creating the random edges.
- Parameters:
degree – The degree of a vertex will be chosen uniformly between 0 and 2 * this number. This results in generating number of vertices * average degree edges.
seed – The random seed.
- lynx.operations.createScaleFreeRandomEdges(iterations, periterationmultiplier)
Creates edges randomly so that the resulting graph is scale-free.
This is an iterative algorithm. We start with one edge per vertex and in each iteration the number of edges gets approximately multiplied by Per iteration edge number multiplier.
- Parameters:
iterations – Each iteration increases the number of edges by the specified multiplier. A higher number of iterations will result in a more scale-free degree distribution, but also slower performance.
periterationmultiplier – Each iteration increases the number of edges by the specified multiplier. The edge count starts from the number of vertices, so with N iterations and m as the multiplier you will have approximately m^N edges per vertex by the end.
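The growth in edge count can be estimated ahead of time. The sketch below takes the description literally (one edge per vertex at the start, multiplied once per iteration); since the multiplication is only approximate in the real algorithm, treat the result as a rough estimate. `approx_edge_count` is a hypothetical helper.

```python
def approx_edge_count(num_vertices, iterations, multiplier):
    """Estimate the final edge count: one edge per vertex, multiplied each iteration."""
    edges = float(num_vertices)
    for _ in range(iterations):
        edges *= multiplier
    return edges


estimate = approx_edge_count(num_vertices=1000, iterations=10, multiplier=1.3)
```

With 1,000 vertices, 10 iterations and a multiplier of 1.3, the estimate is roughly 13,800 edges, or about 13.8 edges per vertex.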
- lynx.operations.createVertices(size)
Creates a new vertex set with no edges. Two attributes are generated: id and ordinal. id is the internal vertex ID, while ordinal is an index for the vertex: it goes from zero to the vertex set size.
- Parameters:
size – The number of vertices to create.
- lynx.operations.customPlot(plot_code)
Creates a plot from the input table. The plot can be defined using the Vegas plotting API in Scala. This API makes it easy to define Vega-Lite plots in code.
Your code has to evaluate to a vegas.Vegas object. For your convenience vegas._ is already imported. An example of a simple plot would be:
```
Vegas()
  .withData(table)
  .encodeX("name", Nom)
  .encodeY("age", Quant)
  .encodeColor("gender", Nom)
  .mark(Bar)
```
Vegas() is the entry point to the plotting API. You can provide a title if you like: Vegas("My Favorite Plot").
LynxKite fetches a sample of up to 10,000 rows from your table for the purpose of the plot. This data is made available in the table variable (as Seq[Map[String, Any]]). .withData(table) binds this data to the plot. You can transform the data before plotting if necessary:
```
val doubled
```
- Parameters:
plot_code – Scala code for defining the plot.
- lynx.operations.defineSegmentationLinksFromMatchingAttributes(base_id_attr, seg_id_attr)
Connect vertices in the base graph with segments based on matching attributes.
This operation can be used (among other things) to create connections between two graphs once one has been imported as a segmentation of the other. (See Use other graph as segmentation.)
- Parameters:
base_id_attr – A vertex will be connected to a segment if the selected vertex attribute of the vertex matches the selected vertex attribute of the segment.
seg_id_attr – A vertex will be connected to a segment if the selected vertex attribute of the vertex matches the selected vertex attribute of the segment.
- lynx.operations.deriveColumn(name, value)
Derives a new column on a table input via an SQL expression. Outputs a table.
- Parameters:
name – The name of the new column.
value – The SQL expression to define the new column.
- lynx.operations.deriveEdgeAttribute(output, defined_attrs, expr, persist)
Generates a new attribute based on existing attributes. The value expression can be an arbitrary Scala expression, and it can refer to existing attributes on the edge as if they were local variables. It can also refer to attributes of the source and destination vertex of the edge using the format src$attribute and dst$attribute.
For example you can write weight * scala.math.abs(src$age - dst$age) to generate a new attribute that is the weighted age difference of the two endpoints of the edge.
You can also refer to graph attributes in the Scala expression. For example, assuming that you have a graph attribute age_average, you can use the expression if (src$age < age_average / 2 && dst$age > age_average * 2) 1.0 else 0.0 to identify connections between relatively young and relatively old people.
Back quotes can be used to refer to attribute names that are not valid Scala identifiers.
The Scala expression can return any of the following types: String; Double, which will be presented as number; Int, which will be automatically converted to Double; Long, which will be automatically converted to Double; or Vectors or Sets combined from the above.
In case you do not want to define the output for every input, you can return an Option created from the above types. E.g. if (income > 1000) Some(age) else None.
- Parameters:
output – The new attribute will be created under this name.
defined_attrs –
true: The new attribute will only be defined on edges for which all the attributes used in the expression are defined.
false: The new attribute is defined on all edges. In this case the Scala expression does not pass the attributes using their original types, but wraps them into Options. E.g. if you have an attribute income: Double you would see it as income: Option[Double].
expr – The Scala expression. You can enter multiple lines in the editor.
persist – If enabled, the output attribute will be saved to disk once it is calculated. If disabled, the attribute will be re-computed each time its output is used. Persistence can improve performance at the cost of disk space.
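The effect of the defined_attrs setting can be illustrated with a Python analogy. The operation itself takes a Scala expression; the sketch below only mimics the two modes, using None in place of an undefined attribute (the role Option plays in Scala). The `derive` helper and the sample data are assumptions for this illustration.

```python
def derive(rows, used_attrs, expr, defined_attrs=True):
    """Apply expr to each row, mimicking the two defined_attrs modes."""
    out = {}
    for rid, attrs in rows.items():
        if defined_attrs:
            # Strict mode: skip rows where any used attribute is undefined.
            if all(attrs.get(a) is not None for a in used_attrs):
                out[rid] = expr(attrs)
        else:
            # Lenient mode: expr sees None for undefined attributes and may
            # itself return None to leave the output undefined.
            value = expr(attrs)
            if value is not None:
                out[rid] = value
    return out


edges = {
    "e1": {"weight": 2.0, "src$age": 20.0, "dst$age": 50.0},
    "e2": {"weight": 1.0, "src$age": None, "dst$age": 30.0},
}
used = ["weight", "src$age", "dst$age"]

strict = derive(edges, used,
                lambda e: e["weight"] * abs(e["src$age"] - e["dst$age"]))
lenient = derive(edges, used,
                 lambda e: e["weight"] if e["src$age"] is None
                 else e["weight"] * abs(e["src$age"] - e["dst$age"]),
                 defined_attrs=False)
```

In strict mode the edge with the missing source age gets no output value, while in lenient mode the expression decides how to handle the missing value.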
- lynx.operations.deriveGraphAttribute(output, expr)
Generates a new graph attribute based on existing graph attributes. The value expression can be an arbitrary Scala expression, and it can refer to existing graph attributes as if they were local variables.
For example you could derive a new graph attribute as something_sum / something_count to get the average of something.
- Parameters:
output – The new graph attribute will be created under this name.
expr – The Scala expression. You can enter multiple lines in the editor.
- lynx.operations.deriveVertexAttribute(output, defined_attrs, expr, persist)
Generates a new attribute based on existing vertex attributes. The value expression can be an arbitrary Scala expression, and it can refer to existing attributes as if they were local variables.
For example you can write age * 2 to generate a new attribute that is the double of the age attribute. Or you can write `if (gender
- Parameters:
output – The new attribute will be created under this name.
defined_attrs –
true: The new attribute will only be defined on vertices for which all the attributes used in the expression are defined.
false: The new attribute is defined on all vertices. In this case the Scala expression does not pass the attributes using their original types, but wraps them into Options. E.g. if you have an attribute income: Double you would see it as income: Option[Double].
expr – The Scala expression. You can enter multiple lines in the editor.
persist – If enabled, the output attribute will be saved to disk once it is calculated. If disabled, the attribute will be re-computed each time its output is used. Persistence can improve performance at the cost of disk space.
- lynx.operations.discardEdgeAttributes(name)
Throws away edge attributes.
- Parameters:
name – The attributes to discard.
- lynx.operations.discardEdges()
Throws away all edges. This implies discarding all edge attributes too.
- lynx.operations.discardGraphAttributes(name)
Throws away graph attributes.
- Parameters:
name – The graph attributes to discard.
- lynx.operations.discardLoopEdges()
Discards edges that connect a vertex to itself.
- lynx.operations.discardSegmentation(name)
Throws away a segmentation.
- Parameters:
name – The segmentation to discard.
- lynx.operations.discardVertexAttributes(name)
Throws away vertex attributes.
- Parameters:
name – The vertex attributes to discard.
- lynx.operations.embedVertices(save_as, iterations, dimensions, walks_per_node, walk_length, context_size)
Creates a vertex embedding using the PyTorch Geometric implementation of the node2vec algorithm.
- Parameters:
save_as – The new attribute will be created under this name.
iterations – Number of training iterations.
dimensions – The size of each embedding vector.
walks_per_node – Number of random walks collected for each vertex.
walk_length – Length of the random walks collected for each vertex.
context_size – The random walks will be cut with a rolling window of this size. This allows reusing the same walk for multiple vertices.
- lynx.operations.exportEdgeAttributesToNeo4j(url, username, password, version, labels, keys)
Exports edge attributes from a graph in LynxKite to a corresponding graph in Neo4j.
The relationships in Neo4j are identified by a key property (or properties). You must have a corresponding edge attribute in LynxKite by the same name. This will be used to find the right relationship to update in Neo4j.
The properties of the Neo4j relationships will be updated with the exported edge attributes using a Cypher query like this:
UNWIND $events as event MATCH ()-[r:TYPE {key: event.`key`}]-() SET r +
- Parameters:
url – The Neo4j connection string of the form bolt://localhost:7687.
username – Username for the connection.
password – Password for the connection. It will be saved in the workspace and visible to anyone with access to the workspace.
version – LynxKite only re-computes outputs if parameters or inputs have changed. This is true for exports too. If you want to repeat a previous export, you can increase this export repetition ID parameter.
labels – Makes it possible to restrict the export to one relationship type in Neo4j. This is useful to make sure no other relationship type is accidentally affected. The format is as in Cypher: :TYPE. Leave empty to allow updating any relationship.
keys – Select the attribute (or attributes) to identify the Neo4j relationships by. The attribute name must match the property name in Neo4j.
- lynx.operations.exportGraphToNeo4j(url, username, password, version, node_labels, relationship_type)
Exports a graph from LynxKite to Neo4j. The whole graph will be copied to Neo4j with all attributes. No existing data is modified in Neo4j.
A !LynxKite export timestamp property is added to each new node and relationship in Neo4j. This helps clean up the export if needed.
The Cypher query to export nodes is, depending on whether an attribute specifies the node labels:
UNWIND $events AS event // Without node labels: CREATE (n) SET n +
- Parameters:
url – The Neo4j connection string of the form bolt://localhost:7687.
username – Username for the connection.
password – Password for the connection. It will be saved in the workspace and visible to anyone with access to the workspace.
version – LynxKite only re-computes outputs if parameters or inputs have changed. This is true for exports too. If you want to repeat a previous export, you can increase this export repetition ID parameter.
node_labels – A string vertex attribute that is a comma-separated list of labels to apply to the newly created nodes. Optional. You must have Neo4j APOC installed on the Neo4j instance to use this.
relationship_type – A string edge attribute that specifies the relationship type for each newly created relationship. Optional. You must have Neo4j APOC installed on the Neo4j instance to use this.
- lynx.operations.exportToAvro(path, version, for_download)
Apache AVRO is a row-oriented remote procedure call and data serialization framework.
- Parameters:
path – The distributed file-system path of the output file. It defaults to <auto>, in which case the path is auto generated from the parameters and the type of export (e.g. Export to CSV). This means that the same export operation with the same parameters always generates the same path.
version – The version number of the result of the export operation. It is a non-negative integer. LynxKite treats export operations like other operations: it remembers the result (which in this case is the knowledge that the export was successfully done) and won’t repeat the calculation. However, there might be a need to export an already exported table with the same set of parameters (e.g. the exported file is lost). In this case you need to change the version number, so that the parameters are not the same as in the previous export.
for_download – Set this to “true” if the purpose of this export is file download: in this case LynxKite will repartition the data into one single file, which will be downloaded. The default “no” will result in no such repartition: this performs much better when other, partition-aware tools are used to import the exported data.
- lynx.operations.exportToCsv(path, delimiter, quote, quote_all, header, escape, null_value, date_format, timestamp_format, drop_leading_white_space, drop_trailing_white_space, version, for_download)
CSV stands for comma-separated values. It is a common human-readable file format where each record is on a separate line and fields of the record are simply separated with a comma or other delimiter. CSV does not store data types, so all fields become strings when importing from this format.
- Parameters:
path – The distributed file-system path of the output file. It defaults to <auto>, in which case the path is auto generated from the parameters and the type of export (e.g. Export to CSV). This means that the same export operation with the same parameters always generates the same path.
delimiter – The delimiter separating the fields in each line.
quote – The character used for quoting strings that contain the delimiter. If the string also contains the quote character, it will be escaped with a backslash (\).
quote_all – Quotes all string values if set. Only quotes in the necessary cases otherwise.
header – Whether or not to include the header in the CSV file. If the data is exported as multiple CSV files the header will be included in each of them. When such a data set is directly downloaded, the header will appear multiple times in the resulting file.
escape – The character used for escaping quotes inside an already quoted value.
null_value – The string representation of a null value. This is how nulls are going to be written in the CSV file.
date_format – The string that indicates a date format. Custom date formats follow the formats at java.text.SimpleDateFormat.
timestamp_format – The string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat.
drop_leading_white_space – A flag indicating whether or not leading whitespaces from values being written should be skipped.
drop_trailing_white_space – A flag indicating whether or not trailing whitespaces from values being written should be skipped.
version – The version number of the result of the export operation. It is a non-negative integer. LynxKite treats export operations like other operations: it remembers the result (which in this case is the knowledge that the export was successfully done) and won’t repeat the calculation. However, there might be a need to export an already exported table with the same set of parameters (e.g. the exported file is lost). In this case you need to change the version number, so that the parameters are not the same as in the previous export.
for_download – Set this to “true” if the purpose of this export is file download: in this case LynxKite will repartition the data into one single file, which will be downloaded. The default “no” will result in no such repartition: this performs much better when other, partition-aware tools are used to import the exported data.
- lynx.operations.exportToDelta(path, version, for_download)
Export data to a Delta table.
- Parameters:
path – The distributed file-system path of the output file. It defaults to <auto>, in which case the path is auto generated from the parameters and the type of export (e.g. Export to CSV). This means that the same export operation with the same parameters always generates the same path.
version – The version number of the result of the export operation. It is a non-negative integer. LynxKite treats export operations like other operations: it remembers the result (which in this case is the knowledge that the export was successfully done) and won’t repeat the calculation. However, there might be a need to export an already exported table with the same set of parameters (e.g. the exported file is lost). In this case you need to change the version number, so that the parameters are not the same as in the previous export.
for_download – Set this to “true” if the purpose of this export is file download: in this case LynxKite will repartition the data into one single file, which will be downloaded. The default “no” will result in no such repartition: this performs much better when other, partition-aware tools are used to import the exported data.
- lynx.operations.exportToHive(table, mode, partition_by)
Export a table directly to Apache Hive.
- Parameters:
table – The name of the database table to export to.
mode – Describes whether LynxKite should expect a table to already exist and how to handle this case. “The table must not exist” means the table will be created and it is an error if it already exists. “Drop the table if it already exists” means the table will be deleted and re-created if it already exists. Use this mode with great care. This mode cannot be used if you specify any fields to partition by, the reason being that the underlying Spark library will delete all other partitions in the table in this case. “Insert into an existing table” requires the table to already exist and it will add the exported data at the end of the existing table.
partition_by – The list of column names (if any) which you wish the table to be partitioned by. This cannot be used in conjunction with the “Drop the table if it already exists” mode.
- lynx.operations.exportToJdbc(url, table, mode)
JDBC is used to connect to relational databases such as MySQL. See the JDBC details section for setup steps required for connecting to a database.
- Parameters:
url – The connection URL for the database. This typically includes the username and password. The exact syntax entirely depends on the database type. Please consult the documentation of the database.
table – The name of the database table to export to.
mode – Describes whether LynxKite should expect a table to already exist and how to handle this case.
“The table must not exist” means the table will be created and it is an error if it already exists.
“Drop the table if it already exists” means the table will be deleted and re-created if it already exists. Use this mode with great care.
“Insert into an existing table” requires the table to already exist and it will add the exported data at the end of the existing table.
- lynx.operations.exportToJson(path, version, for_download)
JSON is a rich human-readable data format. It produces larger files than CSV but can represent data types. Each line of the file stores one record encoded as a JSON object.
- Parameters:
path – The distributed file-system path of the output file. It defaults to <auto>, in which case the path is auto generated from the parameters and the type of export (e.g. Export to CSV). This means that the same export operation with the same parameters always generates the same path.
version – The version number of the result of the export operation. It is a non-negative integer. LynxKite treats export operations like other operations: it remembers the result (which in this case is the knowledge that the export was successfully done) and won’t repeat the calculation. However, there might be a need to export an already exported table with the same set of parameters (e.g. because the exported file is lost). In this case you need to change the version number, so that the parameters are no longer the same as in the previous export.
for_download – Set this to “true” if the purpose of this export is file download: in this case LynxKite will repartition the data into one single file, which will be downloaded. The default “no” will result in no such repartition: this performs much better when other, partition-aware tools are used to import the exported data.
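The role of the version parameter can be pictured with a small memoization sketch. This is not LynxKite code: export_table, _completed and the writer callback are hypothetical names invented for the illustration, and the only point is that an unchanged parameter set is not re-run while a changed version is.

```python
# Hypothetical sketch of parameter-keyed export caching; not LynxKite code.
_completed = {}  # maps a parameter tuple to the path that was written


def export_table(path, version, for_download, writer):
    """Run `writer` only if this exact parameter combination is new."""
    key = (path, version, for_download)
    if key not in _completed:
        writer(path)            # the expensive export would happen here
        _completed[key] = path  # remember that the export succeeded
    return _completed[key]


calls = []
export_table("out.json", 0, False, calls.append)
export_table("out.json", 0, False, calls.append)  # same parameters: skipped
export_table("out.json", 1, False, calls.append)  # new version: runs again
print(len(calls))  # 2
```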
- lynx.operations.exportToOrc(path, version, for_download)
Apache ORC is a columnar data storage format.
- Parameters:
path – The distributed file-system path of the output file. It defaults to <auto>, in which case the path is auto generated from the parameters and the type of export (e.g. Export to CSV). This means that the same export operation with the same parameters always generates the same path.
version – The version number of the result of the export operation. It is a non-negative integer. LynxKite treats export operations like other operations: it remembers the result (which in this case is the knowledge that the export was successfully done) and won’t repeat the calculation. However, there might be a need to export an already exported table with the same set of parameters (e.g. because the exported file is lost). In this case you need to change the version number, so that the parameters are no longer the same as in the previous export.
for_download – Set this to “true” if the purpose of this export is file download: in this case LynxKite will repartition the data into one single file, which will be downloaded. The default “no” will result in no such repartition: this performs much better when other, partition-aware tools are used to import the exported data.
- lynx.operations.exportToParquet(path, version, for_download)
Apache Parquet is a columnar data storage format.
- Parameters:
path – The distributed file-system path of the output file. It defaults to <auto>, in which case the path is auto generated from the parameters and the type of export (e.g. Export to CSV). This means that the same export operation with the same parameters always generates the same path.
version – The version number of the result of the export operation. It is a non-negative integer. LynxKite treats export operations like other operations: it remembers the result (which in this case is the knowledge that the export was successfully done) and won’t repeat the calculation. However, there might be a need to export an already exported table with the same set of parameters (e.g. because the exported file is lost). In this case you need to change the version number, so that the parameters are no longer the same as in the previous export.
for_download – Set this to “true” if the purpose of this export is file download: in this case LynxKite will repartition the data into one single file, which will be downloaded. The default “no” will result in no such repartition: this performs much better when other, partition-aware tools are used to import the exported data.
- lynx.operations.exportVertexAttributesToNeo4j(url, username, password, version, labels, keys)
Exports vertex attributes from a graph in LynxKite to a corresponding graph in Neo4j.
The nodes in Neo4j are identified by a key property (or properties). You must have a corresponding vertex attribute in LynxKite by the same name. This will be used to find the right nodes to update in Neo4j.
The properties of the Neo4j nodes will be updated with the exported vertex attributes using a Cypher query like this:
UNWIND $events as event MATCH (n:Label1:Label2 {key: event.`key`}) SET n +
- Parameters:
url – The Neo4j connection string of the form bolt://localhost:7687.
username – Username for the connection.
password – Password for the connection. It will be saved in the workspace and visible to anyone with access to the workspace.
version – LynxKite only re-computes outputs if parameters or inputs have changed. This is true for exports too. If you want to repeat a previous export, you can increase this export repetition ID parameter.
labels – Makes it possible to restrict the export to one label (or combination of labels) in Neo4j. This is useful to make sure no other node type is accidentally affected. The format is as in Cypher: :Label1:Label2. Leave empty to allow updating any node.
keys – Select the attribute (or attributes) to identify the Neo4j nodes by. The attribute name must match the property name in Neo4j.
- lynx.operations.exposeInternalEdgeId(name)
Exposes the internal edge ID as an attribute. Useful if you want to identify edges, for example in an exported dataset.
- Parameters:
name – The ID attribute will be saved under this name.
- lynx.operations.exposeInternalVertexId(name)
Exposes the internal vertex ID as an attribute. This attribute is automatically generated by operations that generate new vertex sets. (In most cases this is already available as attribute ‘id’.) But you can regenerate it with this operation if necessary.
- Parameters:
name – The ID attribute will be saved under this name.
- lynx.operations.externalComputation1()
- lynx.operations.externalComputation10()
- lynx.operations.externalComputation2()
- lynx.operations.externalComputation3()
- lynx.operations.externalComputation4()
- lynx.operations.externalComputation5()
- lynx.operations.externalComputation6()
- lynx.operations.externalComputation7()
- lynx.operations.externalComputation8()
- lynx.operations.externalComputation9()
- lynx.operations.fillEdgeAttributesWithConstantDefaultValues(title)
An attribute may not be defined on every edge. This operation sets a default value for the edges where it was not defined.
- Parameters:
title – The given value will be set for edges where the attribute is not defined. No change for attributes for which the default value is left empty. The default value must be numeric for number attributes.
- lynx.operations.fillVertexAttributesWithConstantDefaultValues(title)
An attribute may not be defined on every vertex. This operation sets a default value for the vertices where it was not defined.
- Parameters:
title – The given value will be set for vertices where the attribute is not defined. No change for attributes for which the default value is left empty. The default value must be numeric for number attributes.
- lynx.operations.filterByAttributes(ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9)
Keeps only vertices and edges that match the specified filters.
You can specify filters for multiple attributes at the same time, in which case you will be left with vertices/edges that match all of your filters.
Regardless of the exact filter, whenever you specify a filter for an attribute you always restrict to those edges/vertices where the attribute is defined. E.g. if you have a filter requiring age > 10, then you will only keep vertices where the age attribute is defined and its value is more than ten.
The filtering syntax depends on the type of the attribute in most cases.
Match all filter: For every attribute type * matches all defined values. This is useful for discarding vertices/edges where a specific attribute is undefined.
Comma separated list: This filter is a comma-separated list of values you want to match. It can be used for String and number types. For example medium,high would be a String filter to match these two values only, e.g., it would exclude low values. Another example is 19,20,30.
Comparison filters: These filters are available for String and number types. You can specify bounds, with the <, >, `<
- Parameters:
ref1 – For every attribute type * matches all defined values. This is useful for discarding vertices/edges where a specific attribute is undefined.
ref2 – This filter is a comma-separated list of values you want to match. It can be used for String and number types. For example medium,high would be a String filter to match these two values only, e.g., it would exclude low values. Another example is 19,20,30.
ref3 – These filters are available for String and number types. You can specify bounds, with the <, >, `<
ref4 – For String and number types you can specify intervals with brackets. The parenthesis (( )) denotes an exclusive boundary and the square bracket (`
ref5 – For String attributes, regex filters can also be applied. The following tips and examples can be useful: * regex(xyz) for finding strings that contain xyz. * regex(^Abc) for strings that start with Abc. * regex(Abc$) for strings that end with Abc. * regex((.)) for strings with double letters, like abbc. * regex(d) or `regex(
ref6 – For the `Vector
ref7 – These filters can be used for attributes whose type is Vector. The filter all(…) will match the Vector only when the internal filter matches all elements of the Vector. You can also use forall and Ɐ as synonyms. For example all(<0) for a `Vector
ref8 – Any filter can be prefixed with ! to negate it. For example !medium will exclude medium values. Another typical use case for this is specifying ! (a single exclamation mark character) as the filter for a String attribute. This is interpreted as non-empty, so it will restrict to those vertices/edges where the String attribute is defined and its value is not the empty string. Remember, all filters work on defined values only, so !* will not match any vertices/edges.
ref9 – If you need a string filter that contains a character with a special meaning (e.g., >), use double quotes around the string. E.g., `>”
- lynx.operations.filterWithSql(vertex_filter, edge_filter, filter)
Filters a graph or table with SQL expressions.
This has the same effect as using a SQL box with “select * from vertices where <FILTER>” and “select * from edge_attributes where <FILTER>” and then recombining the tables into a graph. But it is more efficient.
When used with a table input it is identical to a SQL box with “select * from input where <FILTER>”. But it saves a bit of typing.
- Parameters:
vertex_filter – Filter the vertices with this SQL expression when the input is a graph. For example you could write “age > 30 and income < age * 2000”.
edge_filter – Filter the edges with this SQL expression when the input is a graph. For example you could write “duration > count * 10 or kind like '%*message*%'”.
filter – Filter with this SQL expression when the input is a table. For example you could write “age > 30 and income < age * 2000”.
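To see what the table form of the filter amounts to, the following sketch runs the equivalent query on a tiny made-up table. It uses Python's built-in sqlite3 module rather than the SQL engine inside LynxKite, so it only illustrates the shape of the query; the rows and the name column are invented for the example.

```python
import sqlite3

# Illustrative table named "input"; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute('create table "input" (name text, age integer, income integer)')
conn.executemany(
    'insert into "input" values (?, ?, ?)',
    [("Adam", 20, 1000), ("Eve", 35, 50000), ("Bob", 50, 200000)],
)

# The filter expression is placed after WHERE, as in the SQL box equivalent.
filter_expr = "age > 30 and income < age * 2000"
rows = conn.execute(f'select * from "input" where {filter_expr}').fetchall()
print(rows)  # [('Eve', 35, 50000)]
```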
- lynx.operations.findCommunitiesWithLabelPropagation(name, weight, variant)
Uses the label propagation algorithm to identify communities in the graph. The communities are represented as a segmentation on the graph.
Label propagation starts with assigning a unique label to each vertex. Then each vertex takes on the most common label in their neighborhood. This step is repeated until the labels stabilize.
Uses the NetworKit implementations of PLP and LPDegreeOrdered.
- Parameters:
name – The name of the newly created segmentation.
weight – The neighboring labels are weighted with the edge weight. A bigger weight results in that neighbor having a bigger influence in the label update step.
variant –
The results of label propagation depend greatly on the order of the updates. The available options are:
* classic: An efficient method that uses an arbitrary ordering and parallel updates.
* degree-ordered: A more predictable method that performs the updates in increasing order of degree.
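The label update loop described for this operation can be sketched in a few lines of plain Python. This is a simplified, sequential and unweighted variant written for illustration, not the NetworKit PLP or LPDegreeOrdered implementation; the function name, the adjacency-dict input and the tie-breaking rule (keep the current label, otherwise take the smallest) are assumptions of the sketch.

```python
from collections import Counter


def label_propagation(adj, degree_ordered=False, max_rounds=100):
    """Return a vertex -> label dict for an undirected adjacency dict."""
    labels = {v: v for v in adj}  # start with a unique label per vertex
    order = sorted(adj)
    if degree_ordered:
        order.sort(key=lambda v: len(adj[v]))  # increasing order of degree
    for _ in range(max_rounds):
        changed = False
        for v in order:
            if not adj[v]:
                continue  # isolated vertices keep their own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = {lab for lab, c in counts.items() if c == top}
            # Keep the current label on ties, otherwise take the smallest.
            new = labels[v] if labels[v] in best else min(best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break  # labels have stabilized
    return labels


# Two separate triangles end up with one label each.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
result = label_propagation(graph)
print(len(set(result.values())))  # 2
```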
- lynx.operations.findCommunitiesWithTheLouvainMethod(name, weight, resolution)
Uses the Louvain method to identify communities in the graph. The communities are represented as a segmentation on the graph.
The Louvain method is a greedy optimization toward maximal modularity. High modularity means many edges within communities and few edges between communities. Specifically we compare the edge counts to what we would expect if the clusters were chosen at random.
Uses the NetworKit implementation.
- Parameters:
name – The name of the newly created segmentation.
weight – Edges can be weighted to contribute more or less to modularity.
resolution – A lower resolution will result in bigger communities. This is also known as the 𝛾 parameter: the expected edge probabilities in the modularity calculation are multiplied by this number. For details of the physical basis of this parameter see Statistical Mechanics of Community Detection by Joerg Reichardt and Stefan Bornholdt.
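The effect of the resolution parameter can be checked by computing modularity directly. The sketch below evaluates the common formulation Q = Σ_c [ L_c / m − 𝛾 (d_c / 2m)² ] for an unweighted, undirected graph, where L_c is the number of edges inside community c and d_c is its total degree. It only scores a given partition; it is not the Louvain optimization itself, and the function name and input format are assumptions of the example.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Modularity of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # total degree of each community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {v: "a" for v in range(6)}

# At resolution 1 the two-community split scores higher;
# at a low resolution the single big community wins.
print(modularity(edges, split) > modularity(edges, merged))            # True
print(modularity(edges, merged, 0.1) > modularity(edges, split, 0.1))  # True
```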
- lynx.operations.findConnectedComponents(name, directions)
Creates a segment for every connected component of the graph.
Connected components are maximal vertex sets where a path exists between each pair of vertices.
- Parameters:
name – The new segmentation will be saved under this name.
directions –
Ignore directions: The algorithm adds reversed edges before calculating the components.
Require both directions: The algorithm discards non-symmetric edges before calculating the components.
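The difference between the two direction modes can be sketched with a small union-find routine. The function name and the lowercase mode strings are placeholders chosen for the example, and the sketch says nothing about how LynxKite computes components at scale.

```python
def connected_components(vertices, edges, directions="ignore directions"):
    """Map each vertex to a component representative using union-find."""
    if directions == "require both directions":
        edge_set = set(edges)
        # Keep an edge only if its reverse is also present.
        edges = [(u, v) for u, v in edges if (v, u) in edge_set]
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in vertices}


vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 1), (2, 3)]
loose = connected_components(vertices, edges)
strict = connected_components(vertices, edges, "require both directions")
print(len(set(loose.values())), len(set(strict.values())))  # 2 3
```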
- lynx.operations.findInfocomCommunities(cliques_name, communities_name, bothdir, min, adjacency_threshold)
Creates a segmentation of overlapping communities.
The algorithm finds maximal cliques then merges them to communities. Two cliques are merged if they sufficiently overlap. More details can be found in https://papers.ssrn.com/sol3/papers.cfm?abstract_id
- Parameters:
cliques_name – A new segmentation with the maximal cliques will be saved under this name.
communities_name – The new segmentation with the infocom communities will be saved under this name.
bothdir – Whether edges have to exist in both directions between all members of a clique.
min – Cliques smaller than this will not be collected. This improves the performance of the algorithm, and small cliques are often not a good indicator anyway.
adjacency_threshold – Clique overlap is a measure of the overlap between two cliques relative to their sizes. It is normalized to
- lynx.operations.findKCoreDecomposition(name)
If we deleted all parts of a graph outside of the k-core, all vertices would still have a degree of at least k. More visually, the 0-core is the whole graph. If we discard the isolated vertices we get the 1-core. If we repeatedly discard all degree-1 vertices, we get the 2-core. And so on.
Read more on Wikipedia.
This operation outputs the number of the highest core that each vertex belongs to as a vertex attribute.
- Parameters:
name – The new attribute will be created under this name.
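The repeated discarding described above translates into a short peeling loop. The following is a minimal sketch on an undirected adjacency dict, with a hypothetical function name; it is meant to make the definition concrete rather than to mirror the LynxKite implementation.

```python
def core_numbers(adj):
    """Return, for each vertex, the highest k whose k-core contains it."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off every vertex whose remaining degree is at most k.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # everything left has degree above k: move to the next core
            continue
        for v in peel:
            core[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
    return core


# A triangle (0, 1, 2), a pendant vertex 3 and an isolated vertex 4.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}
print(core_numbers(graph) == {0: 2, 1: 2, 2: 2, 3: 1, 4: 0})  # True
```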
- lynx.operations.findMaximalCliques(name, bothdir, min)
Creates a segmentation of vertices based on the maximal cliques they are members of. A maximal clique is a maximal set of vertices where there is an edge between every pair of vertices. Since one vertex can be part of multiple maximal cliques this segmentation might be overlapping.
- Parameters:
name – The new segmentation will be saved under this name.
bothdir – Whether edges have to exist in both directions between all members of a clique.
min – Cliques smaller than this will not be collected. This improves the performance of the algorithm, and small cliques are often not a good indicator anyway.
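For a concrete picture of maximal cliques and the minimum size cut-off, here is a sketch using the basic Bron–Kerbosch recursion on an undirected graph. The choice of Bron–Kerbosch is an assumption of the example (the text above does not name the algorithm LynxKite uses), as are the function name and the min_size argument; edge directions are ignored.

```python
def maximal_cliques(adj, min_size=1):
    """Yield maximal cliques (as sorted lists) with at least min_size members.

    `adj` maps each vertex to the set of its neighbors.
    """
    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                yield sorted(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later cliques at this level

    yield from expand(set(), set(adj), set())


# Vertex 2 belongs to two maximal cliques, so the result overlaps.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(maximal_cliques(graph)))       # [[0, 1, 2], [2, 3]]
print(list(maximal_cliques(graph, 3)))      # [[0, 1, 2]]
```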
- lynx.operations.findModularClustering(name, weights, max_iterations, min_increment_per_iteration)
Tries to find a partitioning of the vertices with high modularity.
Edges that go between vertices in the same segment increase modularity, while edges that go from one segment to the other decrease modularity. The algorithm iteratively merges and splits segments and moves vertices between segments until it cannot find changes that would significantly improve the modularity score.
- Parameters:
name – The new segmentation will be saved under this name.
weights – The attribute to use as edge weights.
max_iterations – After this number of iterations we stop regardless of modularity increment. Use -1 for unlimited.
min_increment_per_iteration – If the average modularity increment in the last few iterations goes below this then we stop the algorithm and settle with the clustering found.
- lynx.operations.findOptimalSpanningTree(name, weight, optimize, seed)
Finds the minimum (or maximum) spanning tree (https://en.wikipedia.org/wiki/Minimum_spanning_tree) in a graph. The edges marked by the emitted edge attribute (in_tree by default) form a tree for each component in the graph. This tree will have the lowest (or highest) possible total edge weight.
Uses the NetworKit implementation.
- Parameters:
name – The new edge attribute will be created under this name. Its value will be 1 for the edges that make up the tree and undefined for the edges that are not part of the tree.
weight – Choose a numerical attribute that represents the cost or value of the edges. With unit weights the result is just a random tree for each component.
optimize – Whether to find the tree with the lowest or highest possible total edge weight.
seed – When multiple trees have the optimal weight, one is chosen at random.
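How the marked edges form one tree per component can be illustrated with Kruskal's algorithm. This is a sketch under stated assumptions: the operation itself uses NetworKit, whereas the code below is a plain Python stand-in, breaks ties by input order instead of a random seed, and uses made-up names (spanning_forest, and the strings "minimal" and "maximal").

```python
def spanning_forest(vertices, edges, optimize="minimal"):
    """Kruskal's algorithm; edges are (u, v, weight) tuples.

    Returns the set of indices of the edges chosen for the forest.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    order = sorted(
        range(len(edges)),
        key=lambda i: edges[i][2],
        reverse=(optimize == "maximal"),
    )
    in_tree = set()
    for i in order:
        u, v, _ = edges[i]
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different trees
            parent[ru] = rv
            in_tree.add(i)
    return in_tree


vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]
print(sorted(spanning_forest(vertices, edges)))             # [0, 1]
print(sorted(spanning_forest(vertices, edges, "maximal")))  # [1, 2]
```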
- lynx.operations.findSteinerTree(ename, vname, pname, rname, edge_costs, root_costs, gain)
Given a directed graph in which each vertex has two associated quantities, the “gain” and the “root cost”, and each edge has an associated quantity, the “cost”, this operation will yield a forest (a set of trees) that is a subgraph of the given graph. Furthermore, in this subgraph, the sum of the gains minus the sum of the (edge and root) costs approximates the maximal possible value.
Finding this optimal subgraph is called the Prize-collecting Steiner Tree Problem.
The operation will result in four outputs: (1) A new edge attribute, which will specify which edges are part of the optimal solution. Its value will be 1.0 for edges that are part of the optimal forest and not defined otherwise; (2) A new vertex attribute, which will specify which vertices are part of the optimal solution. Its value will be 1.0 for vertices that are part of the optimal forest and not defined otherwise. (3) A new graph attribute that contains the net gain, that is, the total sum of the gains minus the total sum of the (edge and root) costs; and (4) A new vertex attribute that will specify the root vertices in the optimal solution: it will be 1.0 for the root vertices and not defined otherwise.
- Parameters:
ename – The new edge attribute will be created under this name, to pinpoint the edges in the solution.
vname – The new vertex attribute will be created under this name, to pinpoint the vertices in the solution.
pname – The profit will be reported under this name.
rname – The new vertex attribute will be created under this name, to pinpoint the tree roots in the optimal solution.
edge_costs – The edge attribute specified here determines the cost for including the given edge in the solution. Negative and undefined values are treated as 0.
root_costs – The vertex attribute specified here determines the cost for allowing the given vertex to be a starting point (the root) of a tree in the solution forest. Negative or undefined values mean that the vertex cannot be used as a root point.
gain – This vertex attribute specifies the reward (gain) for including the given vertex in the solution. Negative or undefined values are treated as 0.
- lynx.operations.findTriangles(name, bothdir)
Creates a segment for every triangle in the graph. A triangle is defined as 3 pairwise connected vertices, regardless of the direction and number of edges between them. This means that triangles with one or more multiple edges are still only counted once, and the operation does not differentiate between directed and undirected triangles. Since one vertex can be part of multiple triangles this segmentation might be overlapping.
- Parameters:
name – The new segmentation will be saved under this name.
bothdir – Whether edges have to exist in both directions between all members of a triangle. If the direction of the edges is not important, set this to false. This will allow placing two vertices into the same triangle even if they are only connected in one direction.
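The triangle definition above (three pairwise connected vertices, counted once regardless of parallel edges) can be made concrete with a small enumeration sketch. The function name and the edge-list input are assumptions of the example, and the quadratic pair loop is only suitable for toy graphs.

```python
from itertools import combinations


def find_triangles(edges, bothdir=False):
    """Return sorted vertex triples that are pairwise connected."""
    directed = set(edges)  # parallel edges collapse into one
    nbrs = {}
    for u, v in directed:
        if u == v:
            continue  # loops cannot be part of a triangle
        if bothdir and (v, u) not in directed:
            continue  # require the edge in both directions
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            for w in nbrs[u] & nbrs[v]:
                triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)


one_way = [(1, 2), (2, 3), (3, 1), (1, 2), (3, 4)]
print(find_triangles(one_way))                # [(1, 2, 3)]
print(find_triangles(one_way, bothdir=True))  # []
```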
- lynx.operations.fingerprintBasedOnAttributes(leftname, rightname, weights, mo, ms, extra)
In a graph that has two different String identifier attributes (e.g. Facebook ID and MSISDN) this operation will match the vertices that only have the first attribute defined with the vertices that only have the second attribute defined. For the well-matched vertices the new attributes will be added. (For example if a vertex only had an MSISDN and we found a matching Facebook ID, this will be saved as the Facebook ID of the vertex.)
The matched vertices will not be automatically merged, but this can easily be performed with the <<Merge vertices by attribute>> operation on either of the two identifier attributes.
- Parameters:
leftname – Two identifying attributes have to be selected.
rightname – Two identifying attributes have to be selected.
weights – What number edge attribute to use as edge weight. The edge weights are also considered when calculating the similarity between two vertices.
mo – The number of common neighbors two vertices must have to be considered for matching. It must be at least 1. (If two vertices have no common neighbors their similarity would be zero anyway.)
ms – The similarity threshold below which two vertices will not be considered a match even if there are no better matches for them. Similarity is normalized to
extra – You can use this box to further tweak how the fingerprinting operation works. Consult with a Lynx expert if you think you need this.
- lynx.operations.graphRejoin(attrs, segs, edge)
This operation allows the user to join (i.e., carry over) attributes from one graph to another one. This is only allowed when the target of the join (where the attributes are taken to) and the source (where the attributes are taken from) are compatible. Compatibility in this context means that the source and the target have a “common ancestor”, which makes it possible to perform the join. Suppose, for example, that operation <<take-edges-as-vertices>> has been applied, and then some new vertex attributes have been computed on the resulting graph. These new vertex attributes can now be joined back to the original graph (that was the input for <<take-edges-as-vertices>>), because there is a correspondence between the edges of the original graph and the vertices that contain the newly computed vertex attributes.
Conversely, the edges and the vertices of a graph will not be compatible (even if the number of edges is the same as the number of vertices), because no such correspondence can be established between the edges and the vertices in this case.
Additionally, it is possible to join segmentations from another graph. This operation has an additional requirement (besides compatibility), namely, that both the target of the join (the left side) and the source be vertices (and not edges).
Please bear in mind that both attributes and segmentations will overwrite the original attributes and segmentations on the right side in case there is a name collision.
When vertex attributes are joined, it is also possible to copy over the edges from the source graph (provided that the source graph has edges). In this case, the original edges in the target graph are dropped, and the source edges (along with their attributes) will take their place.
- Parameters:
attrs – Attributes that should be joined to the graph. They overwrite attributes in the target graph which have identical names.
segs – Segmentations to join to the graph. They overwrite segmentations in the target side of the graph which have identical names.
edge – When set, the edges of the source graph (and their attributes) will replace the edges of the target graph.
- lynx.operations.graphUnion()
The resulting graph is just a disconnected graph containing the vertices and edges of the two originating graphs. All vertex and edge attributes are preserved. If an attribute exists in both graphs, it must have the same data type in both.
The resulting graph will have as many vertices as the sum of the vertex counts in the two source graphs. The same with the edges.
Segmentations are discarded.
- lynx.operations.graphVisualization()
Creates a visualization from the input graph. You can use the box parameter popup to define the parameters and layout of the visualization. See <<graph-visualizations>> for more details.
- lynx.operations.growSegmentation(direction)
Grows the segmentation along edges of the parent graph.
This operation modifies this segmentation by growing each segment with the neighbors of its elements. For example if vertex A is a member of segment X and edge A→B exists in the original graph then B also becomes the member of X (depending on the value of the direction parameter).
This operation can be used together with <<Use base graph as segmentation>> to create a segmentation of neighborhoods.
- Parameters:
direction – Adds the neighbors to the segments using this direction.
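The A→B example above can be replayed with a small sketch. The function name and the three direction strings used here are placeholders for illustration; the actual option names offered by the direction parameter are not listed in this section.

```python
def grow_segmentation(segments, edges, direction="outgoing edges"):
    """Add the neighbors of each segment's members to that segment."""
    grown = {}
    for name, members in segments.items():
        new_members = set(members)
        for src, dst in edges:
            if direction in ("outgoing edges", "all edges") and src in members:
                new_members.add(dst)  # follow the edge forward
            if direction in ("incoming edges", "all edges") and dst in members:
                new_members.add(src)  # follow the edge backward
        grown[name] = new_members
    return grown


segments = {"X": {"A"}}
edges = [("A", "B"), ("C", "A")]
print(sorted(grow_segmentation(segments, edges)["X"]))               # ['A', 'B']
print(sorted(grow_segmentation(segments, edges, "all edges")["X"]))  # ['A', 'B', 'C']
```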
- lynx.operations.hashVertexAttribute(attr, salt)
Uses the SHA-256 algorithm to hash an attribute: all values of the attribute get replaced by a seemingly random value. The same original values get replaced by the same new value and different original values get (almost certainly) replaced by different new values.
Treat the salt like a password for the data. Choose a long string that the recipient of the data has no chance of guessing. (Do not use the name of a person or project.)
The salt must begin with the prefix SECRET( and end with ), for example SECRET(qCXoC7l0VYiN8Qp). This is important, because LynxKite will replace such strings with three asterisks when writing log files. Thus, the salt cannot appear in log files. Caveat: Please note that the salt must still be saved to disk as part of the workspace; only the log files are filtered this way.
To illustrate the mechanics of irreversible hashing and the importance of a good salt string, consider the following example. We have a data set of phone calls and we have hashed the phone numbers. Arthur gets access to the hashed data and learns or guesses the salt. Arthur can now apply the same hashing to the phone number of Guinevere as was used on the original data set and look her up in the graph. He can also apply hashing to the phone numbers of all the knights of the round table and see which knight Guinevere has been making calls to.
- Parameters:
attr – The attribute(s) which will be hashed.
salt – The value of the salt.
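The lookup attack described above is easy to reproduce with Python's hashlib. How LynxKite combines the value with the salt is not specified here, so the simple concatenation below is an assumption made for the illustration, as are the function name and the phone numbers.

```python
import hashlib


def hash_value(value, salt):
    """Hex SHA-256 digest of the value combined with the salt.

    Concatenating value and salt is an assumption for illustration only.
    """
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


salt = "SECRET(qCXoC7l0VYiN8Qp)"
hashed_calls = {hash_value("555-0100", salt), hash_value("555-0199", salt)}

# Anyone who knows the salt can re-hash a known number and look it up.
print(hash_value("555-0100", salt) in hashed_calls)           # True
# Without the right salt the same number cannot be found.
print(hash_value("555-0100", "wrong guess") in hashed_calls)  # False
```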
- lynx.operations.importAvro(filename)
Apache AVRO is a row-oriented remote procedure call and data serialization framework.
- Parameters:
filename – The distributed file-system path of the file. See <<prefixed-paths>> for more details on specifying paths.
- lynx.operations.importCsv(filename, columns, delimiter, quote, escape, null_value, date_format, timestamp_format, ignore_leading_white_space, ignore_trailing_white_space, comment, error_handling, infer)
CSV stands for comma-separated values. It is a common human-readable file format where each record is on a separate line and fields of the record are simply separated with a comma or other delimiter. CSV does not store data types, so all fields become strings when importing from this format.
- Parameters:
filename – Upload a file by clicking the +++<label class
columns – The names of all the columns in the file, as a comma-separated list. If empty, the column names will be read from the file. (Use this if the file has a header.)
delimiter – The delimiter separating the fields in each line.
quote – The character used for escaping quoted values where the delimiter can be part of the value.
escape – The character used for escaping quotes inside an already quoted value.
null_value – The string representation of a null value in the CSV file. For example if set to undefined, every undefined value in the CSV file will be converted to a Scala null. By default this is an empty string, so empty strings are converted to nulls upon import.
date_format –
The string that indicates a date format. Custom date formats follow the formats at java.text.SimpleDateFormat.
timestamp_format –
The string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat.
ignore_leading_white_space – A flag indicating whether or not leading whitespace in values being read should be skipped.
ignore_trailing_white_space – A flag indicating whether or not trailing whitespace in values being read should be skipped.
comment – Every line beginning with this character is skipped, if set. For example if the comment character is # the following line is ignored in the CSV file: # This is a comment.
error_handling – What should happen if a line has more or fewer fields than the number of columns?
“Fail on any malformed line” will cause the import to fail if there is such a line.
“Ignore malformed lines” will simply omit such lines from the table. In this case an erroneously defined column list can result in an empty table.
“Salvage malformed lines: truncate or fill with nulls” will still import the problematic lines, dropping some data or inserting undefined values.
infer – Automatically detects data types in the CSV. For example a column full of numbers will become a Double. If disabled, all columns are imported as Strings.
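The interplay of the comment, null_value and error_handling options can be tried out with Python's csv module. This sketch only mimics the described behavior on a string; it does not use the reader that LynxKite relies on, and read_csv together with the short mode names "fail", "ignore" and "salvage" are stand-ins invented for the example.

```python
import csv


def read_csv(text, columns, delimiter=",", quote='"', comment="#",
             null_value="", error_handling="fail"):
    """Parse CSV text into a list of dicts, mimicking a few import options."""
    lines = [ln for ln in text.splitlines()
             if not (comment and ln.startswith(comment))]
    rows = []
    for fields in csv.reader(lines, delimiter=delimiter, quotechar=quote):
        if len(fields) != len(columns):
            if error_handling == "fail":
                raise ValueError(f"malformed line: {fields}")
            if error_handling == "ignore":
                continue
            # "salvage": truncate extra fields or pad with nulls
            fields = (fields + [None] * len(columns))[:len(columns)]
        rows.append({c: (None if f == null_value else f)
                     for c, f in zip(columns, fields)})
    return rows


text = "# This is a comment\nAdam,20\nEve\nBob,50,extra\nIsolde,\n"
cols = ["name", "age"]
print(len(read_csv(text, cols, error_handling="ignore")))   # 2
print(len(read_csv(text, cols, error_handling="salvage")))  # 4
```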
- lynx.operations.importDelta(filename, version_as_of)
Import a Delta Table.
- Parameters:
filename – The distributed file-system path of the file. See <<prefixed-paths>> for more details on specifying paths.
version_as_of – Version of the Delta table to be imported. The empty string corresponds to the latest version.
- lynx.operations.importFromHive(table_name)
Import an Apache Hive table directly to LynxKite.
- Parameters:
table_name – The name of the Hive table to import.
- lynx.operations.importFromNeo4j(url, username, password, vertex_query, edge_query)
Import a graph from the Neo4j graph database.
Neo4j does not have a strict schema. Different nodes may have different attributes. In LynxKite the list of vertex attributes is defined for the whole graph. But each vertex may leave any attribute undefined.
If you import Neo4j nodes that have different attributes, such as movies that have a title and actors that have a name, the resulting graph will have both title and name attributes. title will only be defined on movies, name will only be defined on actors.
The same happens with edges.
If multiple node types have attributes of the same name, those attributes need to have the same type. If this is not the case, you can narrow down the query by node label.
- Parameters:
url – The connection URI for Neo4j.
username – The username to use for the connection.
password – The password to use for the connection.
vertex_query – The Cypher query to run on Neo4j to get the vertices. This query must return a node named node. The default query imports all the nodes from Neo4j. Leave empty to not import vertex attributes.
edge_query – The Cypher query to run on Neo4j to get the edges. This query must return a relationship named rel. The default query imports all the relationships from Neo4j. Leave empty to not import edges.
- lynx.operations.importJdbc(jdbc_url, jdbc_table, key_column, num_partitions, partition_predicates)
JDBC is used to connect to relational databases such as MySQL. See the section on JDBC details for the setup steps required for connecting to a database.
- Parameters:
jdbc_url – The connection URL for the database. This typically includes the username and password. The exact syntax entirely depends on the database type. Please consult the documentation of the database.
jdbc_table – The name of the database table to import. All identifiers have to be properly quoted according to the SQL syntax of the source database. The following formats may work depending on the type of the source database:
TABLE_NAME
SCHEMA_NAME.TABLE_NAME
(SELECT * FROM TABLE_NAME WHERE <filter condition>) TABLE_ALIAS
In the last example the filtering query runs on the source database, before the import. It can dramatically reduce the network traffic needed for the import operation and makes it possible to use data source specific SQL dialects.
key_column – This column is used to partition the SQL query. The range from min(key) to max(key) will be split into a sub-range for each Spark worker, so they can each query a part of the data in parallel. Pick a column that is uniformly distributed. Numerical identifiers will give the best performance. String (VARCHAR) columns are also supported but only work well if they mostly contain letters of the English alphabet and numbers. If the partitioning column is left empty, only a fraction of the cluster resources will be used. The column name has to be properly quoted according to the SQL syntax of the source database.
num_partitions – LynxKite will perform this many SQL queries in parallel to get the data. Leave at zero to let LynxKite automatically decide. Set a specific value if the database cannot support that many queries.
partition_predicates – This advanced option provides even greater control over the partitioning. It is an alternative option to specifying the key column. Here you can specify a comma-separated list of WHERE clauses, which will be used as the partitions. + For example you could provide `AGE < 30, AGE >
- lynx.operations.importJson(filename)
JSON is a rich human-readable data format. JSON files are larger than CSV files but can represent data types. Each line of the file in this format stores one record encoded as a JSON object.
- Parameters:
filename – Upload a file by clicking the +++<label class
- lynx.operations.importOrc(filename)
Apache ORC is a columnar data storage format.
- Parameters:
filename – The distributed file-system path of the file. See the section on prefixed paths for more details on specifying paths.
- lynx.operations.importParquet(filename)
Apache Parquet is a columnar data storage format.
- Parameters:
filename – The distributed file-system path of the file. See the section on prefixed paths for more details on specifying paths.
- lynx.operations.importSnapshot(path)
Makes a previously saved snapshot accessible from the workspace.
- Parameters:
path – The full path to the snapshot in LynxKite’s virtual filesystem.
- lynx.operations.importUnionOfTableSnapshots(paths)
Makes the union of a list of previously saved table snapshots accessible from the workspace as a single table.
The union works as the UNION ALL command in SQL and does not remove duplicates.
- Parameters:
paths –
The comma-separated list of full paths to the snapshots in LynxKite’s virtual filesystem.
Each path has to refer to a table snapshot.
The tables have to have the same schema.
The output table will union the input tables in the same order as defined here.
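The UNION ALL behavior described above can be sketched in plain Python. The table representation (a dict with "columns" and "rows") and the function name union_all are assumptions made for this illustration only; they are not part of the LynxKite API.

```python
def union_all(tables):
    """Concatenate table rows in input order, keeping duplicate rows."""
    schemas = {tuple(t["columns"]) for t in tables}
    if len(schemas) > 1:
        raise ValueError("all tables must have the same schema")
    rows = []
    for t in tables:
        rows.extend(t["rows"])  # order of the inputs is preserved
    return rows


first = {"columns": ["id", "name"], "rows": [(1, "x"), (2, "y")]}
second = {"columns": ["id", "name"], "rows": [(2, "y"), (3, "z")]}
combined = union_all([first, second])
```

The row (2, "y") appears twice in the result, because duplicates are not removed.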
- lynx.operations.importWellKnownGraphDataset(name)
Gives easy access to graph datasets commonly used for benchmarks.
See the PyTorch Geometric documentation for details about the specific datasets.
- Parameters:
name – Which dataset to import.
- lynx.operations.input(name)
This special box represents an input that comes from outside of this workspace. This box will not have a valid output on its own. When this workspace is used as a custom box in another workspace, the custom box will have one input for each input box. When the inputs are connected, those input states will appear on the outputs of the input boxes.
Input boxes without a name are ignored. Each input box must have a different name.
See the section on custom boxes on how to use this box.
- Parameters:
name – The name of the input, when the workspace is used as a custom box.
- lynx.operations.linkBaseGraphAndSegmentationByFingerprint(mo, ms, extra)
Finds the best matching between a base graph and a segmentation. It considers a base vertex A and a segment B a good “match” if the neighborhood of A (including A) is very connected to the neighborhood of B (including B) according to the current connections between the graph and the segmentation.
The result of this operation is a new edge set between the base graph and the segmentation, that is a one-to-one matching.
- Parameters:
mo – The number of common neighbors two vertices must have to be considered for matching. It must be at least 1. (If two vertices have no common neighbors their similarity would be zero anyway.)
ms – The similarity threshold below which two vertices will not be considered a match even if there are no better matches for them. Similarity is normalized to
extra – You can use this box to further tweak how the fingerprinting operation works. Consult with a Lynx expert if you think you need this.
- lynx.operations.lookupRegion(position, shapefile, attribute, ignoreUnsupportedShapes, output)
For every vertex, looks up the features in a Shapefile that match the position vertex attribute and returns a specified attribute.
The lookup depends on the coordinate reference system of the feature. The input position must use the same coordinate reference system as the one specified in the Shapefile.
If there are no matching features the output is omitted.
If the specified attribute does not exist for any matching feature the output is omitted.
If there are multiple suitable features the algorithm picks the first one.
Shapefiles can be obtained from various sources, like OpenStreetMap.
- Parameters:
position – The (latitude, longitude) location tuple.
shapefile – The Shapefile used for the lookup. The list is created from the files in the KITE_META/resources/shapefiles directory. A Shapefile consists of a .shp, .shx and .dbf file of the same name.
attribute – The attribute in the Shapefile used for the output.
ignoreUnsupportedShapes – If set true, silently ignores unknown shape types potentially contained by the Shapefile. Otherwise throws an error.
output – The name of the new vertex attribute.
- lynx.operations.makeAllSegmentsEmpy()
Throws away all segmentation links.
- lynx.operations.mapHyperbolicCoordinates(seed)
Experimental Feature
Map an undirected graph to a hyperbolic surface. Vertices get two attributes called “radial” and “angular” that can be used for edge strength evaluation or link prediction. The algorithm is based on Network Mapping by Replaying Hyperbolic Growth.
The coordinates are generated by simulating hyperbolic growth. The algorithm’s results are most useful when the graph to be mapped follows a power-law degree distribution and has high clustering.
- Parameters:
seed – The random seed.
- lynx.operations.mergeParallelEdges()
Multiple edges going from A to B will be merged into a single edge. The edges going from A to B are not merged with edges going from B to A.
Edge attributes can be aggregated across the merged edges.
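The merging rule can be sketched in plain Python as follows. The edge representation (src, dst, weight) and the function name merge_parallel_edges are hypothetical; the aggregation defaults to a sum only for the sake of the example.

```python
from collections import defaultdict


def merge_parallel_edges(edges, aggregate=sum):
    """Merge edges that share both source and destination.

    `edges` is a list of (src, dst, weight) tuples.
    """
    groups = defaultdict(list)
    for src, dst, weight in edges:
        # Direction matters: (A, B) and (B, A) are different keys.
        groups[(src, dst)].append(weight)
    return [(src, dst, aggregate(ws)) for (src, dst), ws in groups.items()]


merged = merge_parallel_edges([("A", "B", 1), ("A", "B", 2), ("B", "A", 5)])
```

The two A→B edges collapse into one edge with the aggregated weight, while the B→A edge is left alone.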
- lynx.operations.mergeParallelEdgesByAttribute(key)
Multiple edges going from A to B that share the same value of the given edge attribute will be merged into a single edge. The edges going from A to B are not merged with edges going from B to A.
- Parameters:
key –
The edge attribute on which the merging will be based.
- lynx.operations.mergeParallelSegmentationLinks()
Multiple segmentation links going from base vertex A to segmentation vertex B will be merged into a single link.
After performing a Merge vertices by attribute operation, there might be multiple parallel links going between some of the base graph and segmentation vertices. This can cause unexpected behavior when aggregating to or from the segmentation. This operation addresses this behavior by merging parallel segmentation links.
- lynx.operations.mergeTwoEdgeAttributes(name, attr1, attr2)
An attribute may not be defined on every edge. This operation uses the secondary attribute to fill in the values where the primary attribute is undefined. If both are undefined on an edge then the result is undefined too.
- Parameters:
name – The new attribute will be created under this name.
attr1 – If this attribute is defined on an edge, then its value will be copied to the output attribute.
attr2 – If the primary attribute is not defined on an edge but the secondary attribute is, then the secondary attribute’s value will be copied to the output attribute.
- lynx.operations.mergeTwoVertexAttributes(name, attr1, attr2)
An attribute may not be defined on every vertex. This operation uses the secondary attribute to fill in the values where the primary attribute is undefined. If both are undefined on a vertex then the result is undefined too.
- Parameters:
name – The new attribute will be created under this name.
attr1 – If this attribute is defined on a vertex, then its value will be copied to the output attribute.
attr2 – If the primary attribute is not defined on a vertex but the secondary attribute is, then the secondary attribute’s value will be copied to the output attribute.
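The fallback rule (the same for edges and vertices) can be sketched in plain Python. Attributes are modeled here as dicts from an ID to a value, with a missing key standing for "undefined"; this representation and the name merge_attributes are assumptions made for the example.

```python
def merge_attributes(primary, secondary):
    """Use the primary value where defined, else the secondary value."""
    merged = dict(secondary)
    merged.update(primary)  # primary wins wherever it is defined
    return merged


primary = {"v1": 10}
secondary = {"v1": 99, "v2": 20}
result = merge_attributes(primary, secondary)
```

Here v1 keeps the primary value, v2 falls back to the secondary value, and any ID missing from both inputs stays undefined.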
- lynx.operations.mergeVerticesByAttribute(key)
Merges each set of vertices that are equal by the chosen attribute. Vertices where the chosen attribute is not defined are discarded. Aggregations can be specified for how to handle the rest of the attributes, which may be different among the merged vertices. Any edge that connected two vertices that are merged will become a loop.
Merge vertices by attribute might create parallel links between the base graph and its segmentations. If it is important that there are no such parallel links (e.g. when performing aggregations to and from segmentations), make sure to run the Merge parallel segmentation links operation on the segmentations in question.
- Parameters:
key – If a set of vertices have the same value for the selected attribute, they will all be merged into a single vertex.
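A minimal plain-Python sketch of the merging rule follows. The data layout and the function name are hypothetical, attribute aggregation is left out, and dropping edges that touch a discarded vertex is an assumption of this sketch rather than something the description states.

```python
def merge_vertices_by_attribute(vertices, edges, key):
    """Merge vertices that share the same value of `key`.

    `vertices` maps a vertex ID to its attribute dict;
    `edges` is a list of (src, dst) pairs.
    """
    # Vertices where the key attribute is undefined are discarded.
    group_of = {v: attrs[key] for v, attrs in vertices.items() if key in attrs}
    merged_vertices = sorted(set(group_of.values()))
    merged_edges = [
        (group_of[s], group_of[d])
        for s, d in edges
        if s in group_of and d in group_of  # assumption: drop dangling edges
    ]
    return merged_vertices, merged_edges


vertices = {1: {"city": "Rome"}, 2: {"city": "Rome"}, 3: {"city": "Oslo"}, 4: {}}
edges = [(1, 2), (2, 3), (3, 4)]
merged_vertices, merged_edges = merge_vertices_by_attribute(vertices, edges, "city")
```

The edge between the two Rome vertices becomes a loop on the merged Rome vertex.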
- lynx.operations.oneHotEncodeAttribute(output, catAttr, categories)
Encodes a categorical String attribute into a one-hot Vector[number]. For example, if you apply it to the name attribute of the example graph with categories Adam,Eve,Isolated Joe,Sue, you end up with
- Parameters:
output – The new attribute will be created under this name.
catAttr – The attribute you would like to turn into a one-hot Vector.
categories – Possible categories separated by commas.
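One-hot encoding in general can be sketched in plain Python as below. This illustrates the technique only; the function name is hypothetical, the exact output LynxKite produces is not reproduced here, and returning an all-zero vector for a value outside the category list is an assumption.

```python
def one_hot(value, categories):
    """Encode `value` against a comma-separated list of categories."""
    names = categories.split(",")
    # Assumption: a value outside the list yields an all-zero vector.
    return [1.0 if value == name else 0.0 for name in names]


encoded = one_hot("Eve", "Adam,Eve,Isolated Joe,Sue")
```

Each vector has one position per category, and at most one position is set to 1.0.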
- lynx.operations.output(name)
This special box represents an output that goes outside of this workspace. When this workspace is used as a custom box in another workspace, the custom box will have one output for each output box.
Output boxes without a name are ignored. Each output box must have a different name.
See the section on custom boxes on how to use this box.
- Parameters:
name – The name of the output, when the workspace is used as a custom box.
- lynx.operations.placeVerticesWithEdgeLengths(name, dimensions, length, algorithm, pivots, radius, tolerance)
These methods create a graph layout as a new Vector[number] vertex attribute where the edges have the given lengths, or as close to those as possible. Uses the NetworKit implementations for PivotMDS and MaxentStress.
- Parameters:
name – The position attribute will be saved under this name.
dimensions – The dimensions of the space where the vertices are placed. The created Vectors will be this long.
length – This edge attribute can specify the length that each edge should be.
algorithm – The algorithms offered are:
Pivot MDS picks a number of pivot vertices (spread out as much as possible) and finds a solution that puts all other vertices the right distance from the pivots through an iterative matrix eigendecomposition method. See Eigensolver Methods for Progressive Multidimensional Scaling of Large Data by Ulrik Brandes and Christian Pich for the detailed definition and analysis.
Maxent-Stress is recommended when there are many different ways to satisfy the edge length constraints (such as in graphs with low degrees or in high-dimensional spaces). It picks from the large solution space by maximizing the solution’s entropy. Cannot handle disconnected graphs. See A Maxent-Stress Model for Graph Layout by Gansner et al. for the detailed definition and analysis.
pivots – The number of pivots to choose for Pivot MDS. More pivots result in a more accurate layout and a longer computation time.
radius – Maxent-Stress applies the stress model between vertices within this many hops from each other.
tolerance – Maxent-Stress uses an algebraic solver to optimize the vertex positions. This parameter allows tuning the solver to provide faster but less accurate solutions.
- lynx.operations.predictEdgesWithHyperbolicPositions(size, externaldegree, internaldegree, exponent, radial, angular)
Creates additional edges in a graph based on hyperbolic distances between vertices.
2 * size edges will be added because the new edges are undirected. Vertices must have two number vertex attributes to be used as radial and angular coordinates.
The algorithm is based on Popularity versus Similarity in Growing Networks and Network Mapping by Replaying Hyperbolic Growth.
- Parameters:
size – The number of edges to generate. The total number will be 2 * size because every edge is added in two directions.
externaldegree – The number of edges a vertex creates from itself upon addition to the growth simulation graph.
internaldegree – The average number of edges created between older vertices whenever a new vertex is added to the growth simulation graph.
exponent – The exponent of the power-law degree distribution. Values can be 0.5 - 1, endpoints excluded.
radial – The vertex attribute to be used as radial coordinates. Should not contain negative values.
angular – The vertex attribute to be used as angular coordinates. Values should be 0 - 2 * Pi.
- lynx.operations.predictVertexAttribute(label, features, method)
If an attribute is defined for some vertices but not for others, machine learning can be used to fill in the blanks. A model is built from the vertices where the attribute is defined and the model predictions are generated for all the vertices.
The prediction is created in a new attribute named after the predicted attribute, such as age_prediction.
This operation only supports number-typed attributes. You can come up with ways to map other types to numbers to include them in the prediction. For example mapping gender to 0.0 and 1.0 makes sense.
- Parameters:
label – The partially defined attribute that you want to predict.
features – The attributes that will be used as the input of the predictions. Predictions will be generated for vertices where all of the predictors are defined.
method –
Linear regression with no regularization.
Ridge regression (also known as Tikhonov regularization) with L2-regularization.
Lasso with L1-regularization.
Logistic regression for binary classification. (The predicted attribute must be 0 or 1.)
Naive Bayes classifier with multinomial event model.
Decision tree with maximum depth 5 and 32 bins for all features.
Random forest of 20 trees of depth 5 with 32 bins. One third of features are considered for splits at each node.
Gradient-boosted trees produce ensembles of decision trees with depth 5 and 32 bins.
- lynx.operations.predictWithGcn(save_as, features, label, model)
Uses a trained Graph Convolutional Network to make predictions.
- Parameters:
save_as – The prediction will be saved as an attribute under this name.
features – Vector attribute containing the features to be used as inputs for the algorithm.
label – The attribute we want to predict. (This is used if the model was trained to use the target labels as additional inputs.)
model – The model to use for the prediction.
- lynx.operations.predictWithModel(name, model)
Creates predictions from a model and vertex attributes of the graph.
- Parameters:
name – The new attribute of the predictions will be created under this name.
model – The model used for the predictions and a mapping from vertex attributes to the model’s features. + Every feature of the model needs to be mapped to a vertex attribute.
- lynx.operations.pullSegmentationOneLevelUp()
Creates a copy of a segmentation in the parent of its parent segmentation. In the created segmentation, the set of segments will be the same as in the original. A vertex will be made member of a segment if it was transitively member of the corresponding segment in the original segmentation. The attributes and sub-segmentations of the segmentation are also copied.
- lynx.operations.reduceAttributeDimensions(save_as, vector, dimensions, method, perplexity)
Transforms (embeds) a Vector attribute to a lower-dimensional space. This is great for laying out graphs for visualizations based on vertex attributes rather than graph structure.
- Parameters:
save_as – The new attribute will be created under this name.
vector – The high-dimensional vertex attribute that we want to embed.
dimensions – Number of dimensions in the output vector.
method – The dimensionality reduction method to use. Principal component analysis or t-SNE. (Implementations provided by scikit-learn.)
perplexity – Size of the vertex neighborhood to consider for t-SNE.
- lynx.operations.renameEdgeAttributes(title)
Changes the name of edge attributes.
- Parameters:
title – If the new name is empty, the attribute will be discarded.
- lynx.operations.renameGraphAttributes(title)
Changes the name of graph attributes.
- Parameters:
title – If the new name is empty, the attribute will be discarded.
- lynx.operations.renameSegmentation(before, after)
Changes the name of a segmentation.
This operation is more easily accessed from the segmentation’s dropdown menu in the graph state view.
- Parameters:
before – The segmentation to rename.
after – The new name.
- lynx.operations.renameVertexAttributes(title)
Changes the name of vertex attributes.
- Parameters:
title – If the new name is empty, the attribute will be discarded.
- lynx.operations.replaceEdgesWithTriadicClosure()
For every A→B→C triplet, creates an A→C edge. The original edges are discarded. The new A→C edge gets the attributes of the original A→B and B→C edges with prefixes “ab_” and “bc_”.
Be aware that in dense graphs a very large number of new edges can be generated.
Possible use case: we are looking for connections between vertices, such as the same subscriber using multiple devices. We have an edge metric that we think is a good indicator, or we have a model that gives predictions for edges. If we want to calculate this metric and pick the edges with high values, it is possible that the edge that would be the winner does not exist. Often we think that a transitive closure would add the missing edge. For example, I don’t call my second phone, but I call a lot of the same people from the two phones.
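The A→B→C rule and the attribute prefixes can be sketched in plain Python. The edge representation (src, dst, attribute dict) and the function name triadic_closure are assumptions made for this illustration.

```python
def triadic_closure(edges):
    """Return one A->C edge for every A->B->C pair of edges.

    `edges` is a list of (src, dst, attrs) tuples; the originals are not kept.
    """
    out_edges = {}
    for src, dst, attrs in edges:
        out_edges.setdefault(src, []).append((dst, attrs))
    result = []
    for a, b, ab_attrs in edges:
        for c, bc_attrs in out_edges.get(b, []):
            # Carry over both sets of attributes with the two prefixes.
            attrs = {f"ab_{k}": v for k, v in ab_attrs.items()}
            attrs.update({f"bc_{k}": v for k, v in bc_attrs.items()})
            result.append((a, c, attrs))
    return result


closed = triadic_closure([("A", "B", {"w": 1}), ("B", "C", {"w": 2})])
```

The nested loop also shows why dense graphs blow up: a vertex with many incoming and many outgoing edges contributes one new edge per incoming/outgoing pair.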
- lynx.operations.replaceWithEdgeGraph()
Creates the edge graph (or line graph), where each vertex corresponds to an edge in the current graph. The vertices will be connected, if one corresponding edge is the continuation of the other.
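The construction can be sketched in plain Python. In this hypothetical helper, each vertex of the result is the index of an edge in the input list, and two such vertices are connected when the first edge ends where the second one starts.

```python
def edge_graph(edges):
    """Build the edge graph (line graph) of a directed graph.

    `edges` is a list of (src, dst) pairs. The result is a list of
    (i, j) pairs meaning that edge j continues edge i.
    """
    return [
        (i, j)
        for i, (_, dst) in enumerate(edges)
        for j, (src, _) in enumerate(edges)
        if dst == src
    ]


line_graph = edge_graph([("A", "B"), ("B", "C"), ("C", "A")])
```

For a directed triangle the edge graph is again a directed triangle.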
- lynx.operations.reverseEdgeDirection()
Replaces every A→B edge with its reverse edge (B→A).
Attributes are preserved. Running this operation twice gets back the original graph.
- lynx.operations.sampleEdgesFromCoOccurrence(probability, seed)
Connects vertices in the parent graph with a given probability if they co-occur in any segments. Multiple co-occurrences will have the same chance of being selected as single ones. Loop edges are also included with the same probability.
- Parameters:
probability – The probability of choosing a vertex pair. The expected number of created edges will be probability * number of edges without parallel edges.
seed – The random seed.
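A plain-Python sketch of the sampling rule follows. It is an illustration under stated assumptions, not LynxKite's implementation: segments are given as lists of member vertices, and the function name is hypothetical.

```python
import random
from itertools import product


def sample_cooccurrence_edges(segments, probability, seed):
    """Sample edges between vertices that share at least one segment."""
    rng = random.Random(seed)
    pairs = set()
    for members in segments:
        # A set keeps each pair once, however many segments contain it.
        # product(..., repeat=2) also yields loop pairs such as (v, v).
        pairs.update(product(members, repeat=2))
    return [pair for pair in sorted(pairs) if rng.random() < probability]


segments = [["a", "b"], ["b", "c"], ["a", "b"]]
all_pairs = sample_cooccurrence_edges(segments, 1.0, seed=0)
```

With probability 1.0 every co-occurring pair is kept, so the result lists the candidate pairs; a and c never share a segment and are never connected.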
- lynx.operations.sampleGraphByRandomWalks(startpoints, walksfromonepoint, walkabortionprobability, vertexattrname, edgeattrname, seed)
This operation performs a random walk on the graph, which can be used as a small smart sample to test your model on. The walk starts from a randomly selected vertex. At every step it either aborts the current walk (with probability Walk abortion probability) and jumps back to the start point, or moves to a randomly selected neighbor of the current vertex (following edge directions). After Number of walks from each start point restarts it selects a new start vertex. After Number of start points new start points have been selected, it stops. The performance of this algorithm according to different metrics can be found in the following publication: https://cs.stanford.edu/people/jure/pubs/sampling-kdd06.pdf.
The output of the operation is a vertex and an edge attribute which describes which was the first step that ended at the given vertex / traversed the given edge. The attributes are not defined on vertices that were never reached or edges that were never traversed.
Use the Filter by attributes box to discard the part of the graph outside of the sample. Applying the * filter for first_reached will discard the vertices where the attribute is undefined.
If the resulting sample is still too large, it can be quickly reduced by keeping only the low index nodes and edges. Obtaining a sample with exactly n vertices is also possible with the following procedure.
1. Run this operation. Let us denote the computed vertex attribute by first_reached and the edge attribute by first_traversed.
2. Rank the vertices by first_reached.
3. Filter the vertices by the rank attribute to keep only the vertex of rank n.
4. Aggregate first_reached to a graph attribute on the filtered graph (use either average, first, max, min, or most_common - there is only one vertex in the filtered graph).
5. Filter the vertices and edges of the original graph and keep the ones whose first_reached or first_traversed values are smaller than or equal to the value of the derived graph attribute.
- Parameters:
startpoints – The number of times a new start point is selected.
walksfromonepoint – The number of times the random walk restarts from the same start point before selecting a new start point.
walkabortionprobability – The probability of aborting a walk instead of moving along an edge. Therefore the length of the parts of the walk between two abortions follows a geometric distribution with parameter Walk abortion probability.
vertexattrname – The name of the attribute which shows which step reached the given vertex first. It is not defined on vertices that were never reached.
edgeattrname – The name of the attribute which shows which step traversed the given edge first. It is not defined on edges that were never traversed.
seed – The random seed.
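The walk described for this operation can be sketched in plain Python. This is a simplified illustration, not LynxKite's implementation: the graph is a dict from each vertex to its list of out-neighbors, the function and argument names are hypothetical, and aborting the walk at a vertex with no out-neighbors is an assumption.

```python
import random


def random_walk_sample(out_neighbors, start_points, walks_per_start, abort_prob, seed):
    """Record the first step at which each vertex and edge is visited."""
    rng = random.Random(seed)
    vertices = sorted(out_neighbors)
    first_reached, first_traversed = {}, {}
    step = 0
    for _ in range(start_points):
        start = rng.choice(vertices)
        for _ in range(walks_per_start):
            current = start
            first_reached.setdefault(current, step)
            # Continue with probability 1 - abort_prob; a dead end also
            # aborts the walk (assumption of this sketch).
            while rng.random() >= abort_prob and out_neighbors[current]:
                nxt = rng.choice(out_neighbors[current])
                step += 1
                first_traversed.setdefault((current, nxt), step)
                first_reached.setdefault(nxt, step)
                current = nxt
    return first_reached, first_traversed


graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
reached, traversed = random_walk_sample(graph, 2, 3, 0.5, seed=42)
```

Vertices and edges that never appear in the two dicts correspond to the undefined attribute values. Keeping only entries below a threshold gives the smaller samples described above.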
- lynx.operations.saveToSnapshot(path)
Saves the input to a snapshot. The location of the snapshot has to be specified as a full path.
- Parameters:
path – The full path of the target snapshot in the LynxKite directory system.
- lynx.operations.scoreEdgesWithTheForestFireModel(name, spread_prob, burn_ratio, seed)
Produces an edge attribute that reflects the importance of each edge in the spread of information or other communicable effects across the network.
A simple summary of the algorithm would be:
1. Pick a random vertex. The fire starts here.
2. With probability p jump to step 4.
3. Set a neighbor on fire and mark the edge as burnt. Jump to step 2.
4. This vertex has burnt out. Pick another vertex that is on fire and jump to step 2.
These steps are repeated until the total number of edges burnt reaches the desired multiple of the total edge count. The score for each edge is proportional to the number of simulations in which it was burnt. It is normalized to have a maximum of 1.
The forest fire model was introduced in Graph Evolution: Densification and Shrinking Diameters (http://www.cs.cmu.edu/~jure/pubs/powergrowth-tkdd.pdf) by Leskovec et al.
Uses the NetworKit implementation.
- Parameters:
name – The new edge attribute will be created under this name.
spread_prob – The probability that a vertex on fire will light another neighbor on fire. This would be 1 − p in the simple summary in the operation’s description.
burn_ratio – The simulations are repeated until the total number of edges burnt reaches the total number of edges in the graph multiplied by this factor. Increase to make sure all edges receive a non-zero score. This will also increase the run time.
seed – The seed used for picking where the fires start, which way they spread, and when they stop spreading. Due to parallelization the algorithm may give different results even with the same seed.
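The simple summary of the algorithm can be sketched in plain Python. This is a simplified, single-threaded illustration and not the NetworKit implementation: the graph is an undirected adjacency dict without self-loops, the function name is hypothetical, and details such as which burning vertex is processed next are assumptions.

```python
import random
from collections import Counter


def forest_fire_scores(neighbors, spread_prob, burn_ratio, seed):
    """Score edges by how often they burn in repeated fire simulations."""
    if not 0 < spread_prob <= 1:
        raise ValueError("spread_prob must be in (0, 1]")
    rng = random.Random(seed)
    vertices = sorted(neighbors)
    edges = {frozenset((u, v)) for u in neighbors for v in neighbors[u]}
    target = burn_ratio * len(edges)
    counts = Counter()
    total_burnt = 0
    while total_burnt < target:
        start = rng.choice(vertices)  # the fire starts here
        on_fire, visited, burnt = [start], {start}, set()
        while on_fire:
            current = on_fire.pop()
            candidates = sorted(set(neighbors[current]) - visited)
            # Keep lighting neighbors with probability spread_prob (1 - p).
            while candidates and rng.random() < spread_prob:
                nxt = candidates.pop(rng.randrange(len(candidates)))
                visited.add(nxt)
                on_fire.append(nxt)
                burnt.add(frozenset((current, nxt)))
        counts.update(burnt)  # one count per simulation per burnt edge
        total_burnt += len(burnt)
    top = max(counts.values(), default=0)
    # Normalize so that the most frequently burnt edge scores 1.
    return {e: (counts[e] / top if top else 0.0) for e in edges}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = forest_fire_scores(path, 0.7, 5.0, seed=1)
```

A larger burn_ratio runs more simulations, which is why it makes a zero score less likely at the cost of run time.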
- lynx.operations.segmentByDoubleAttribute(name, attr, interval_size, overlap)
Segments the vertices by a number vertex attribute.
The domain of the attribute is split into intervals of the given size and every vertex that belongs to a given interval will belong to one segment. Empty segments are not created.
- Parameters:
name – The new segmentation will be saved under this name.
attr – The number attribute to segment by.
interval_size – The attribute’s domain will be split into intervals of this size. The splitting always starts at zero.
overlap – If you enable overlapping intervals, then each interval will have a 50% overlap with both the previous and the next interval. As a result each vertex will belong to two segments, guaranteeing that any vertices with an attribute value difference less than half the interval size will share at least one segment.
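The interval arithmetic can be sketched in plain Python. The function name is hypothetical, and the half-open interval convention [start, end) and the exact placement of the overlapping intervals are assumptions of this sketch.

```python
import math


def segments_for_value(x, interval_size, overlap=False):
    """Return the (start, end) intervals that contain the value x."""
    if not overlap:
        k = math.floor(x / interval_size)
        return [(k * interval_size, (k + 1) * interval_size)]
    # With 50% overlap a new interval starts every half interval size,
    # so each value falls into exactly two consecutive intervals.
    half = interval_size / 2
    k = math.floor(x / half)
    return [
        ((k - 1) * half, (k - 1) * half + interval_size),
        (k * half, k * half + interval_size),
    ]


plain = segments_for_value(25, 10)
overlapping = segments_for_value(25, 10, overlap=True)
```

With overlap enabled, the values 24 and 26 (less than half an interval apart) both fall into the interval from 20 to 30, although they sit on different sides of the boundary at 25.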
- lynx.operations.segmentByEventSequence(name, time_attr, location, algorithm, sequence_length, time_window_step, time_window_length)
Treats vertices as people attending events, and segments them by attendance of sequences of events. There are several algorithms for generating event sequences; see the Algorithm parameter.
This operation runs on a segmentation which contains events as vertices, and it is a segmentation over a graph containing people as vertices.
- Parameters:
name – The new segmentation will be saved under this name.
time_attr – The number attribute corresponding to the time of events.
location – A segmentation over events or an attribute corresponding to the location of events.
algorithm –
Take continuous event sequences:
Merges subsequent events of the same location, and then takes all the continuous event sequences of length Sequence length, with maximal timespan of Time window length. For each of these sequences, a segment is created for each time bucket the starting event falls into. Time buckets are defined by Time window step and bucketing starts from 0.0 time.
Allow gaps in event sequences:
Takes all event sequences that are no longer than Time window length and then creates a segment for each subsequence with Sequence length.
sequence_length – Number of events in each segment.
time_window_step – Bucket size used for discretizing events.
time_window_length – Maximum time difference between first and last event in a segment.
- lynx.operations.segmentByGeographicalProximity(name, position, shapefile, distance, ignoreUnsupportedShapes)
Creates a segmentation from the features in a Shapefile. A vertex is connected to a segment if the position vertex attribute is within a specified distance from the segment’s geometry attribute. Feature attributes from the Shapefile become segmentation attributes.
The lookup depends on the coordinate reference system and distance metric of the feature. All inputs must use the same coordinate reference system and distance metric.
This algorithm creates an overlapping segmentation since one vertex can be sufficiently close to multiple GEO segments.
Shapefiles can be obtained from various sources, like OpenStreetMap.
- Parameters:
name – The name of the new geographical segmentation.
position – The (latitude, longitude) location tuple.
shapefile –
The Shapefile used for the lookup. The list is created from the files in the KITE_META/resources/shapefiles directory. A Shapefile consists of a .shp, .shx and .dbf file of the same name.
distance – Vertices are connected to geographical segments if within this distance. The distance has to use the same metric and coordinate reference system as the features within the Shapefile.
ignoreUnsupportedShapes – If set true, silently ignores unknown shape types potentially contained by the Shapefile. Otherwise throws an error.
- lynx.operations.segmentByInterval(name, begin_attr, end_attr, interval_size, overlap)
Segments the vertices by a pair of number vertex attributes representing intervals.
The domain of the attributes is split into intervals of the given size. Each of these intervals will represent a segment. Each vertex will belong to each segment whose interval intersects with the interval of the vertex. Empty segments are not created.
- Parameters:
name – The new segmentation will be saved under this name.
begin_attr – The number attribute corresponding to the beginning of intervals to segment by.
end_attr – The number attribute corresponding to the end of intervals to segment by.
interval_size – The attribute’s domain will be split into intervals of this size. The splitting always starts at zero.
overlap – If you enable overlapping intervals, then each interval will have a 50% overlap with both the previous and the next interval.
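For the non-overlapping case, the intersection rule can be sketched in plain Python. The function name is hypothetical and the half-open interval convention [start, end) is an assumption of this sketch.

```python
import math


def segments_for_interval(begin, end, interval_size):
    """Return the (start, end) segments intersecting [begin, end]."""
    first = math.floor(begin / interval_size)
    last = math.floor(end / interval_size)
    return [
        (k * interval_size, (k + 1) * interval_size)
        for k in range(first, last + 1)
    ]


covered = segments_for_interval(12, 31, 10)
```

A vertex whose interval runs from 12 to 31 belongs to three segments when the interval size is 10.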
- lynx.operations.segmentByStringAttribute(name, attr)
Segments the vertices by a String vertex attribute.
Every vertex with the same attribute value will belong to one segment.
- Parameters:
name – The new segmentation will be saved under this name.
attr – The String attribute to segment by.
- lynx.operations.segmentByVectorAttribute(name, attr)
Segments the vertices by a vector vertex attribute.
Segments are created from the values in all of the vector attributes. A vertex is connected to every segment corresponding to the elements in the vector.
- Parameters:
name – The new segmentation will be saved under this name.
attr – The vector attribute to segment by.
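The mapping from vector elements to segments can be sketched in plain Python. The attribute is modeled as a dict from vertex ID to a list of values; this layout and the function name are assumptions made for the example.

```python
from collections import defaultdict


def segment_by_vector(attr):
    """Create one segment per distinct element across all vectors."""
    segments = defaultdict(set)
    for vertex, values in attr.items():
        for value in values:
            segments[value].add(vertex)  # vertex joins every element's segment
    return dict(segments)


segments = segment_by_vector({"v1": [1, 2], "v2": [2]})
```

A vertex with several elements in its vector ends up in several segments, so the segmentation can overlap.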
- lynx.operations.setEdgeAttributeIcons(title)
Associates icons with edge attributes. It has no effect beyond highlighting something on the user interface.
The icons are a subset of the Unicode characters in the “emoji” range, as provided by the Google Noto Font.
- Parameters:
title – Leave empty to remove the icon for the corresponding attribute or add one of the supported icon names, such as snowman_without_snow.
- lynx.operations.setGraphAttributeIcon(name, icon)
Associates an icon with a graph attribute. It has no effect beyond highlighting something on the user interface.
The icons are a subset of the Unicode characters in the “emoji” range, as provided by the Google Noto Font.
- Parameters:
name – The graph attribute to highlight.
icon – One of the supported icon names, such as snowman_without_snow. Leave empty to remove the icon.
- lynx.operations.setSegmentationIcon(name, icon)
Associates an icon with a segmentation. It has no effect beyond highlighting something on the user interface.
The icons are a subset of the Unicode characters in the “emoji” range, as provided by the Google Noto Font.
This operation is more easily accessed from the segmentation’s dropdown menu in the graph state view.
- Parameters:
name – The segmentation to highlight.
icon – One of the supported icon names, such as snowman_without_snow. Leave empty to remove the icon.
- lynx.operations.setVertexAttributeIcons(title)
Associates icons with vertex attributes. It has no effect beyond highlighting something on the user interface.
The icons are a subset of the Unicode characters in the “emoji” range, as provided by the Google Noto Font.
- Parameters:
title – Leave empty to remove the icon for the corresponding attribute or add one of the supported icon names, such as snowman_without_snow.
- lynx.operations.snowballSample(ratio, radius, attrname, seed)
This operation creates a small smart sample of a graph. First, a subset of the original vertices is chosen for start points; the ratio of the size of this subset to the size of the original vertex set is the first parameter for the operation. Then a certain neighborhood of each start point is added to the sample; the radius of this neighborhood is controlled by another parameter. The result of the operation is a subgraph of the original graph consisting of the vertices of the sample and the edges between them. This operation also creates a new attribute which shows how far the sample vertices are from the closest start point. (One vertex can be in more than one neighborhood.) This attribute can be used to decide whether a sample vertex is near to a start point or not.
For example, you can create a random sample of the graph to test your model on a smaller data set.
- Parameters:
ratio – The (approximate) fraction of vertices to use as starting points.
radius – Limits the size of the neighborhoods of the start points.
attrname – The name of the attribute which shows how far the sample vertices are from the closest start point.
seed – The random seed.
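The sampling procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the LynxKite implementation: the function names are hypothetical, each vertex is picked as a start point independently with probability `ratio`, and neighborhoods ignore edge direction.

```python
import random
from collections import deque

def choose_start_points(vertices, ratio, seed):
    """Pick each vertex as a start point with probability `ratio`."""
    rng = random.Random(seed)
    return [v for v in vertices if rng.random() < ratio]

def snowball(edges, starts, radius):
    """Return (distance of each sampled vertex, edges between sampled vertices)."""
    # Assumption: neighborhoods are grown without regard to edge direction.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    distance = {s: 0 for s in starts}
    queue = deque(starts)
    while queue:
        v = queue.popleft()
        if distance[v] == radius:
            continue  # do not grow the neighborhood past the radius
        for n in neighbors.get(v, ()):
            if n not in distance:
                # Multi-source BFS: the first visit gives the closest start point.
                distance[n] = distance[v] + 1
                queue.append(n)
    sampled = [(s, d) for s, d in edges if s in distance and d in distance]
    return distance, sampled

path = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(snowball(path, starts=[1], radius=2))
```

The returned distances play the role of the attribute named by attrname.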
- lynx.operations.splitEdges(rep, idx)
Split (multiply) edges in a graph. A numeric edge attribute controls how many copies of the edge should exist after the operation. If this attribute is 1, the edge will be kept as it is. If this attribute is zero, the edge will be discarded entirely. Higher values (e.g., 2) will result in more identical copies of the given edge.
After the operation, all previous edge attributes will be preserved; in particular, copies of one edge will have the same values for the previous edge attributes. A new edge attribute (the so-called index attribute) will also be created so that you can differentiate between copies of the same edge. If a given edge was multiplied n times, the n new edges will have n different index attribute values running from 0 to n-1.
- Parameters:
rep – A numeric edge attribute that specifies how many copies of the edge should exist after the operation. (The value is rounded to the nearest integer, so 1.8 will mean 2 copies.)
idx – The name of the attribute that will contain unique identifiers for the otherwise identical copies of the edge.
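A minimal sketch of the copy-and-index behavior, in plain Python rather than LynxKite code. The function name and the sample edges are hypothetical, and rounding halves upward is an assumption, since the text only specifies rounding to the nearest integer.

```python
import math

def split_edges(edges, rep, idx="index"):
    """Copy each edge according to its `rep` attribute and number the copies."""
    result = []
    for edge in edges:
        # Assumption: exact halves round up; negative counts are treated as zero.
        copies = max(0, math.floor(edge[rep] + 0.5))
        for i in range(copies):
            copy = dict(edge)  # all previous attributes are preserved
            copy[idx] = i      # index values run from 0 to copies - 1
            result.append(copy)
    return result

edges = [
    {"src": "a", "dst": "b", "count": 1.8},  # rounds to 2 copies
    {"src": "b", "dst": "c", "count": 0},    # discarded
    {"src": "c", "dst": "a", "count": 1},    # kept as it is
]
for e in split_edges(edges, rep="count"):
    print(e)
```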
- lynx.operations.splitToTrainAndTestSet(source, test_set_ratio, seed)
Based on the source attribute, two new attributes are created, source_train and source_test. The attribute is partitioned, so every instance is copied to either the training or the test set.
- Parameters:
source – The attribute you want to create train and test sets from.
test_set_ratio – A test set is a random sample of the vertices. This parameter gives the size of the test set as a fraction of the total vertex count.
seed – Random seed.
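The partitioning can be sketched as follows. This is plain Python with hypothetical names, and it assumes each vertex is assigned to the test set independently with probability test_set_ratio, so the test set size is only approximately the requested fraction.

```python
import random

def split_train_test(values, test_set_ratio, seed):
    """Partition an attribute (vertex id -> value) into train and test parts."""
    rng = random.Random(seed)
    train, test = {}, {}
    for vertex, value in sorted(values.items()):
        # Every defined value lands in exactly one of the two new attributes.
        target = test if rng.random() < test_set_ratio else train
        target[vertex] = value
    return train, test

age = {v: 20.0 + v for v in range(10)}
age_train, age_test = split_train_test(age, test_set_ratio=0.3, seed=42)
print(len(age_train), len(age_test))
```

Reusing the same seed reproduces the same split.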
- lynx.operations.splitVertices(rep, idx)
Split (multiply) vertices in a graph. A numeric vertex attribute controls how many copies of the vertex should exist after the operation. If this attribute is 1, the vertex will be kept as it is. If this attribute is zero, the vertex will be discarded entirely. Higher values (e.g., 2) will result in more identical copies of the given vertex. All edges coming from and going to this vertex are multiplied (or discarded) appropriately.
After the operation, all previous vertex and edge attributes will be preserved; in particular, copies of one vertex will have the same values for the previous vertex attributes. A new vertex attribute (the so-called index attribute) will also be created so that you can differentiate between copies of the same vertex. If a given vertex was multiplied n times, the n new vertices will have n different index attribute values running from 0 to n-1.
This operation assigns new vertex ids to the vertices; these will be accessible via a new vertex attribute.
- Parameters:
rep – A numeric vertex attribute that specifies how many copies of the vertex should exist after the operation. (The value is rounded to the nearest integer, so 1.8 will mean 2 copies.)
idx – The name of the attribute that will contain unique identifiers for the otherwise identical copies of the vertex.
- lynx.operations.sql1()
Executes an SQL query on a single input, which can be either a graph or a table. Outputs a table. If the input is a table, it is available in the query as input. For example:
` select * from input `
If the input is a graph, its internal tables are available directly.
- lynx.operations.sql10()
Executes an SQL query on its ten inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four, five, six, seven, eight, nine, ten. For example:
` select * from one union select * from two union select * from three union select * from four union select * from five union select * from six union select * from seven union select * from eight union select * from nine union select * from ten `
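To see how the numbered input names combine in such a query, here is a small sketch that uses SQLite as a stand-in SQL engine. It does not call LynxKite; the helper `union_query` and the one-column tables are hypothetical, and only three of the ten inputs are populated to keep the example short.

```python
import sqlite3

INPUT_NAMES = ["one", "two", "three", "four", "five",
               "six", "seven", "eight", "nine", "ten"]

def union_query(n):
    """Build the 'select * from one union ...' query for the first n inputs."""
    return " union ".join(f"select * from {name}" for name in INPUT_NAMES[:n])

# SQLite stands in for the SQL engine; each input is a one-column table.
conn = sqlite3.connect(":memory:")
for i, name in enumerate(INPUT_NAMES[:3]):
    conn.execute(f"create table {name} (id integer)")
    conn.execute(f"insert into {name} values (?), (?)", (i, i + 1))

# UNION removes rows that appear in more than one input.
rows = sorted(conn.execute(union_query(3)).fetchall())
print(rows)
```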
- lynx.operations.sql2()
Executes an SQL query on its two inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one and two. For example:
``` select one.*, two.* from one join two on one.id
- lynx.operations.sql3()
Executes an SQL query on its three inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three. For example:
``` select one.*, two.*, three.* from one join two join three on one.id
- lynx.operations.sql4()
Executes an SQL query on its four inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four. For example:
` select * from one union select * from two union select * from three union select * from four `
- lynx.operations.sql5()
Executes an SQL query on its five inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four, five. For example:
` select * from one union select * from two union select * from three union select * from four union select * from five `
- lynx.operations.sql6()
Executes an SQL query on its six inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four, five, six. For example:
` select * from one union select * from two union select * from three union select * from four union select * from five union select * from six `
- lynx.operations.sql7()
Executes an SQL query on its seven inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four, five, six, seven. For example:
` select * from one union select * from two union select * from three union select * from four union select * from five union select * from six union select * from seven `
- lynx.operations.sql8()
Executes an SQL query on its eight inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four, five, six, seven, eight. For example:
` select * from one union select * from two union select * from three union select * from four union select * from five union select * from six union select * from seven union select * from eight `
- lynx.operations.sql9()
Executes an SQL query on its nine inputs, which can be either graphs or tables. Outputs a table. The inputs are available in the query as one, two, three, four, five, six, seven, eight, nine. For example:
` select * from one union select * from two union select * from three union select * from four union select * from five union select * from six union select * from seven union select * from eight union select * from nine `
- lynx.operations.takeEdgesAsVertices()
Takes a graph and creates a new one where the vertices correspond to the original graph’s edges. All edge attributes in the original graph are converted to vertex attributes in the new graph with the edge_ prefix. All vertex attributes are converted to two vertex attributes with src_ and dst_ prefixes. Segmentations of the original graph are lost.
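The attribute renaming can be illustrated with a short plain-Python sketch. The data layout (dictionaries for vertices and edges) and the function name are hypothetical; only the edge_, src_ and dst_ prefixes come from the description above.

```python
def take_edges_as_vertices(vertex_attrs, edges):
    """Turn each edge into a vertex carrying edge_, src_ and dst_ attributes."""
    new_vertices = []
    for edge in edges:
        vertex = {}
        for name, value in edge["attrs"].items():
            vertex["edge_" + name] = value
        # Each original vertex attribute appears twice: once per endpoint.
        for name, value in vertex_attrs[edge["src"]].items():
            vertex["src_" + name] = value
        for name, value in vertex_attrs[edge["dst"]].items():
            vertex["dst_" + name] = value
        new_vertices.append(vertex)
    return new_vertices

people = {"ann": {"age": 31}, "bob": {"age": 45}}
calls = [{"src": "ann", "dst": "bob", "attrs": {"secs": 120}}]
print(take_edges_as_vertices(people, calls))
```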
- lynx.operations.takeSegmentationAsBaseGraph()
Takes a segmentation of a graph and returns the segmentation as a base graph itself.
- lynx.operations.takeSegmentationLinksAsBaseGraph()
Replaces the current graph with the links from its base graph to the selected segmentation, represented as vertices. The vertices will have base_ and segment_ prefixed attributes generated for the attributes on the base graph and the segmentation respectively.
- lynx.operations.trainADecisionTreeClassificationModel(name, label, features, impurity, maxbins, maxdepth, mininfogain, minInstancesPerNode, seed)
Trains a decision tree classifier model using the graph’s vertex attributes. The algorithm recursively partitions the feature space into two parts. The tree predicts the same label for each bottommost (leaf) partition. Each binary partitioning is chosen from a set of possible splits in order to maximize the information gain at the corresponding tree node. For calculating the information gain the impurity of the nodes is used (read more about impurity at the description of the impurity parameter): the information gain is the difference between the parent node impurity and the weighted sum of the two child node impurities. More information about the parameters.
- Parameters:
name – The model will be stored as a graph attribute using this name.
label – The vertex attribute the model is trained to predict.
features – The attributes the model learns to use for making predictions.
impurity – Node impurity is a measure of homogeneity of the labels at the node and is used for calculating the information gain. There are two impurity measures provided.
Gini: Let S denote the set of training examples in this node. Gini impurity is the probability that a randomly chosen element of S gets an incorrect label if it is labeled randomly according to the distribution of labels in S.
Entropy: Let S denote the set of training examples in this node, and let f_i be the ratio of the i-th label in S. The entropy of the node is the sum of the -f_i log(f_i) values.
maxbins – Number of bins used when discretizing continuous features.
maxdepth – Maximum depth of the tree.
mininfogain – Minimum information gain for a split to be considered as a tree node.
minInstancesPerNode – The minimum number of training instances each child node must have after a split. A split that would leave either child with fewer instances is not made.
seed – We maximize the information gain only among a subset of the possible splits. This random seed is used for selecting the set of splits we consider at a node.
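The two impurity measures and the resulting information gain can be written out in a few lines. This is a sketch for illustration only, with hypothetical function names; the base-2 logarithm in the entropy is an assumption, as the text does not name a base.

```python
import math
from collections import Counter

def label_ratios(labels):
    """Ratio f_i of each distinct label among the examples."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]

def gini(labels):
    # Probability of mislabeling: sum of f_i * (1 - f_i) = 1 - sum of f_i^2.
    return 1.0 - sum(f * f for f in label_ratios(labels))

def entropy(labels):
    # Assumption: logarithm base 2.
    return -sum(f * math.log2(f) for f in label_ratios(labels))

def information_gain(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurities of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

labels = ["yes", "yes", "no", "no"]
print(gini(labels), entropy(labels))
print(information_gain(labels, ["yes", "yes"], ["no", "no"]))
```

A split whose gain falls below mininfogain would not be made.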
- lynx.operations.trainADecisionTreeRegressionModel(name, label, features, maxbins, maxdepth, mininfogain, minInstancesPerNode, seed)
Trains a decision tree regression model using the graph’s vertex attributes. The algorithm recursively partitions the feature space into two parts. The tree predicts the same label for each bottommost (leaf) partition. Each binary partitioning is chosen from a set of possible splits in order to maximize the information gain at the corresponding tree node. For calculating the information gain the variance of the nodes is used: the information gain is the difference between the parent node variance and the weighted sum of the two child node variances. More information about the parameters.
Note: Once the tree is trained there is only a finite number of possible predictions. Because of this, the regression model might seem like a classification. The main difference is that these buckets (“classes”) are invented by the algorithm during the training in order to minimize the variance.
- Parameters:
name – The model will be stored as a graph attribute using this name.
label – The vertex attribute the model is trained to predict.
features – The attributes the model learns to use for making predictions.
maxbins – Number of bins used when discretizing continuous features.
maxdepth – Maximum depth of the tree.
mininfogain – Minimum information gain for a split to be considered as a tree node.
minInstancesPerNode – The minimum number of training instances each child node must have after a split. A split that would leave either child with fewer instances is not made.
seed – We maximize the information gain only among a subset of the possible splits. This random seed is used for selecting the set of splits we consider at a node.
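For the regression case, the variance-based gain and a search for the best split on a single feature might look like the sketch below. It is a simplification with hypothetical names: candidate thresholds are the midpoints between consecutive feature values, whereas the operation discretizes features into bins and considers only a subset of the splits.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_gain(parent, left, right):
    """Parent variance minus the size-weighted variances of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted

def best_split(xs, ys):
    """Return (gain, threshold) of the best split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        gain = variance_gain(ys, left, right)
        if gain > best[0]:
            best = (gain, threshold)
    return best

print(best_split([1, 2, 10, 11], [1.0, 1.0, 5.0, 5.0]))
```

Each leaf would then predict the mean label of its partition, which is why the number of possible predictions is finite.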
- lynx.operations.trainAGcnClassifier(save_as, iterations, features, label, forget, batch_size, learning_rate, hidden_size, num_conv_layers, conv_op, seed)
Trains a Graph Convolutional Network using Pytorch Geometric. Applicable for classification problems.
- Parameters:
save_as – The resulting model will be saved as a graph attribute using this name.
iterations – Number of training iterations.
features – Vector attribute containing the features to be used as inputs for the training algorithm.
label – The attribute we want to predict.
forget – Set true to allow a vertex to see the labels of its neighbors and use them for predicting its own label.
batch_size – In each iteration of the training, we compute the error only on a subset of the vertices. Batch size specifies the size of this subset.
learning_rate – Value of the learning rate.
hidden_size – Size of the hidden layers.
num_conv_layers – Number of convolution layers.
conv_op – The type of graph convolution to use. GCNConv or GatedGraphConv.
seed – Random seed for initializing network weights and choosing training batches.
- lynx.operations.trainAGcnRegressor(save_as, iterations, features, label, forget, batch_size, learning_rate, hidden_size, num_conv_layers, conv_op, seed)
Trains a Graph Convolutional Network using Pytorch Geometric. Applicable for regression problems.
- Parameters:
save_as – The resulting model will be saved as a graph attribute using this name.
iterations – Number of training iterations.
features – Vector attribute containing the features to be used as inputs for the training algorithm.
label – The attribute we want to predict.
forget – Set true to allow a vertex to see the labels of its neighbors and use them for predicting its own label.
batch_size – In each iteration of the training, we compute the error only on a subset of the vertices. Batch size specifies the size of this subset.
learning_rate – Value of the learning rate.
hidden_size – Size of the hidden layers.
num_conv_layers – Number of convolution layers.
conv_op – The type of graph convolution to use. GCNConv or GatedGraphConv.
seed – Random seed for initializing network weights and choosing training batches.
- lynx.operations.trainAKmeansClusteringModel(name, features, k, max_iter, seed)
Trains a k-means clustering model using the graph’s vertex attributes. The algorithm stops when the maximum number of iterations is reached or when no cluster center moved in the last iteration.
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
For best results it may be necessary to scale the features before training the model.
- Parameters:
name – The model will be stored as a graph attribute using this name.
features – Attributes to be used as inputs for the training algorithm. The trained model will have a list of features with the same names and semantics.
k – The number of clusters to be created.
max_iter – The maximum number of iterations (>
seed – The random seed.
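The stopping rule can be seen in a compact version of the standard k-means iteration (Lloyd's algorithm). This is a plain-Python sketch, not the implementation behind the operation; the function name is hypothetical and the initial centers are assumed to be k randomly chosen input points.

```python
import random

def kmeans(points, k, max_iter, seed):
    """Lloyd's algorithm on tuples of floats; returns the cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the point to the center with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centers[i]  # keep the center of an empty cluster
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # no center moved in this iteration
        centers = new_centers
    return centers

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(sorted(kmeans(points, k=2, max_iter=20, seed=0)))
```

Because distances are computed on raw values, a feature with a much larger range dominates the assignment step, which is the reason for the scaling advice above.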
- lynx.operations.trainALogisticRegressionModel(name, label, features, max_iter)
Trains a logistic regression model using the graph’s vertex attributes. The algorithm converges when the maximum number of iterations is reached or no coefficient has changed in the last iteration. The threshold of the model is chosen to maximize the F-score.
Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function.
The current implementation of logistic regression only supports binary classes.
- Parameters:
name – The model will be stored as a graph attribute using this name.
label – The vertex attribute that the model is trained to classify. The attribute should be a binary label of either 0.0 or 1.0.
features – Attributes to be used as inputs for the training algorithm.
max_iter – The maximum number of iterations (>
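Choosing the threshold that maximizes the F-score can be sketched as a scan over candidate thresholds. The code below is an illustration with hypothetical names; it assumes the F-score is the F1 score and that the candidates are the predicted probabilities themselves.

```python
def f_score(labels, predictions):
    """F1 score for binary labels encoded as 0.0 and 1.0."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1.0 and p == 1.0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0.0 and p == 1.0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1.0 and p == 0.0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(labels, probabilities):
    """Return (f_score, threshold) for the best-scoring candidate threshold."""
    best = (0.0, 1.0)
    for t in sorted(set(probabilities)):
        # A vertex is classified as 1.0 when its probability reaches the threshold.
        predictions = [1.0 if p >= t else 0.0 for p in probabilities]
        score = f_score(labels, predictions)
        if score > best[0]:
            best = (score, t)
    return best

labels = [0.0, 0.0, 1.0, 1.0]
probabilities = [0.1, 0.4, 0.35, 0.8]
print(best_threshold(labels, probabilities))
```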
- lynx.operations.trainLinearRegressionModel(name, label, features, method)
Trains a linear regression model using the graph’s vertex attributes.
- Parameters:
name – The model will be stored as a graph attribute using this name.
label – The vertex attribute for which the model is trained.
features – Attributes to be used as inputs for the training algorithm. The trained model will have a list of features with the same names and semantics.
method – The algorithm used to train the linear regression model.
- lynx.operations.transform()
Transforms all columns of a table input via SQL expressions. Outputs a table.
An input parameter is generated for every table column. The parameters are SQL expressions interpreted on the input table. The default value leaves the column alone.
- lynx.operations.useBaseGraphAsSegmentation(name)
Creates a new segmentation which is a copy of the base graph. Also creates segmentation links between the original vertices and their corresponding vertices in the segmentation.
For example, let’s say we have a social network and we want to make a segmentation containing a selected group of people and the segmentation links should represent the original connections between the members of this selected group and other people.
We can do this by first using this operation to copy the base graph to a segmentation, then using the Grow segmentation operation to add the necessary segmentation links. Finally, using the Filter by attributes operation, we can ensure that the segmentation contains only members of the selected group.
- Parameters:
name – The name assigned to the new segmentation. It defaults to the graph’s name.
- lynx.operations.useMetagraphAsGraph(timestamp)
Loads the relationships between LynxKite entities such as attributes and operations as a graph. This complex graph can be useful for debugging or demonstration purposes. Because it exposes data about all graphs, it is only accessible to administrator users.
- Parameters:
timestamp – This number will be used to identify the current state of the metagraph. If you edit the history and leave the timestamp unchanged, you will get the same metagraph as before. If you change the timestamp, you will get the latest version of the metagraph.
- lynx.operations.useOtherGraphAsSegmentation()
Copies another graph into a new segmentation for this one. There will be no connections between the segments and the base vertices. You can import/create those via other operations. (See Use table as segmentation links and Define segmentation links from matching attributes.)
It is possible to import the graph itself as a segmentation. But even in this special case, there will be no connections between the segments and the base vertices. Another operation, Use base graph as segmentation, can be used if edges are desired.
- lynx.operations.useTableAsEdgeAttributes(id_attr, id_column, prefix, unique_keys, if_exists)
Imports edge attributes for existing edges from a table. This is useful when you already have edges and just want to import one or more attributes.
There are two different use cases for this operation:
Import using unique edge attribute values. For example, if the edges represent relationships between people (identified by src and dst IDs), we can import the number of total calls between each pair of people. In this case the operation fails for duplicate attribute values, i.e. parallel edges.
Import using a normal edge attribute. For example, if each edge represents a call and the location of the person making the call is an edge attribute (cell tower ID), we can import latitudes and longitudes for those towers. Here the tower IDs still have to be unique in the lookup table.
- Parameters:
id_attr – The edge attribute which is used to join with the table’s ID column.
id_column – The ID column name in the table. This should be a String column that uses the values of the chosen edge attribute as IDs.
prefix – Prepend this prefix string to the new edge attribute names. This can be used to avoid accidentally overwriting existing attributes.
unique_keys – Assert that the edge attribute values have to be unique if set true. The values of the matching ID column in the table have to be unique in both cases.
if_exists – If the attribute from the table clashes with an existing attribute of the graph, you can select how to handle this:
Merge, prefer the table’s version: Where the table defines new values, those will be used. Elsewhere the existing values are kept.
Merge, prefer the graph’s version: Where the edge attribute is already defined, it is left unchanged. Elsewhere the value from the table is used.
Merge, report error on conflict: An assertion is made to ensure that the values in the table are identical to the values in the graph on edges where both are defined.
Keep the graph’s version: The data in the table is ignored.
Use the table’s version: The attribute is deleted from the graph and replaced with the attribute imported from the table.
Disallow this: A name conflict is treated as an error.
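The six ways of handling a name clash can be summarized in a short sketch. It is plain Python with a hypothetical function name; an attribute is modeled as a dictionary from edge ID to value, where a missing key means the attribute is undefined on that edge. The mode strings are copied from the descriptions above and are not necessarily the exact values the parameter accepts.

```python
def resolve_clash(graph, table, if_exists):
    """Combine an existing attribute with a same-named one from the table."""
    if if_exists == "Merge, prefer the table's version":
        return {**graph, **table}
    if if_exists == "Merge, prefer the graph's version":
        return {**table, **graph}
    if if_exists == "Merge, report error on conflict":
        for key in graph.keys() & table.keys():
            if graph[key] != table[key]:
                raise ValueError(f"conflicting values for {key}")
        return {**graph, **table}
    if if_exists == "Keep the graph's version":
        return dict(graph)
    if if_exists == "Use the table's version":
        return dict(table)
    if if_exists == "Disallow this":
        raise ValueError("an attribute with this name already exists")
    raise ValueError(f"unknown mode: {if_exists}")

graph_values = {1: "a", 2: "b"}
table_values = {2: "B", 3: "c"}
print(resolve_clash(graph_values, table_values, "Merge, prefer the table's version"))
print(resolve_clash(graph_values, table_values, "Merge, prefer the graph's version"))
```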
- lynx.operations.useTableAsEdges(attr, src, dst)
Imports edges from a table. Your vertices must have an identifying attribute, by which the edges can be attached to them.
- Parameters:
attr – The vertex attribute holding the IDs that are used in the table when defining the edges.
src – The table column that specifies the source of the edge.
dst – The table column that specifies the destination of the edge.
- lynx.operations.useTableAsGraph(src, dst)
Imports edges from a table. Each line in the table represents one edge. Each column in the table will be accessible as an edge attribute.
Vertices will be generated for the endpoints of the edges with two vertex attributes:
stringId will contain the ID string that was used in the table.
id will contain the internal vertex ID.
This is useful when your table contains edges (e.g., calls) and there is no separate table for vertices. This operation makes it possible to load edges and use them as a graph. Note that this graph will never have zero-degree vertices.
- Parameters:
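src – The table column that specifies the source of the edge.
dst – The table column that specifies the destination of the edge.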
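How vertices arise from the edge endpoints can be sketched in plain Python. The function name and the sample rows are hypothetical, and numbering the internal IDs sequentially is an assumption made only for the illustration.

```python
def table_to_graph(rows, src, dst):
    """Build (vertices, edges) from table rows given as dictionaries."""
    ids = {}
    vertices, edges = [], []

    def vertex_id(string_id):
        if string_id not in ids:
            ids[string_id] = len(ids)  # assumption: sequential internal ids
            vertices.append({"id": ids[string_id], "stringId": string_id})
        return ids[string_id]

    for row in rows:
        edges.append({
            "src": vertex_id(row[src]),
            "dst": vertex_id(row[dst]),
            "attrs": dict(row),  # every column becomes an edge attribute
        })
    return vertices, edges

calls = [
    {"caller": "ann", "callee": "bob", "secs": 30},
    {"caller": "bob", "callee": "cat", "secs": 5},
]
vertices, edges = table_to_graph(calls, src="caller", dst="callee")
print(vertices)
```

Since a vertex exists only because some row mentions it, no vertex can have degree zero.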
- lynx.operations.useTableAsSegmentation(name, base_id_attr, base_id_column, seg_id_column)
Imports a segmentation from a table. The table must have a column identifying an existing vertex by a String attribute and another column that specifies the segment it belongs to. Each vertex may belong to any number of segments.
The rest of the columns in the table are ignored.
- Parameters:
name – The imported segmentation will be created under this name.
base_id_attr – The String vertex attribute that identifies the base vertices.
base_id_column – The table column that identifies vertices.
seg_id_column – The table column that identifies segments.
- lynx.operations.useTableAsSegmentationLinks(base_id_attr, base_id_column, seg_id_attr, seg_id_column)
Imports the connections between the main graph and this segmentation from a table. Each row in the table represents a connection between one base vertex and one segment.
- Parameters:
base_id_attr – The String vertex attribute that can be joined to the identifying column in the table.
base_id_column – The table column that can be joined to the identifying attribute on the base graph.
seg_id_attr – The String vertex attribute of the segmentation that can be joined to the identifying column in the table.
seg_id_column – The table column that can be joined to the identifying attribute on the segmentation.
- lynx.operations.useTableAsVertexAttributes(id_attr, id_column, prefix, unique_keys, if_exists)
Imports vertex attributes for existing vertices from a table. This is useful when you already have vertices and just want to import one or more attributes.
There are two different use cases for this operation:
Import using unique vertex attribute values. For example, if the vertices represent people, this attribute can be a personal ID. In this case the operation fails in case of duplicate attribute values (either among vertices or in the table).
Import using a normal vertex attribute. For example, this can be a city of residence (vertices are people) and we can import census data for those cities for each person. Here the operation allows duplications of cities among vertices (but not in the lookup table).
- Parameters:
id_attr – The String vertex attribute which is used to join with the table’s ID column.
id_column – The ID column name in the table. This should be a String column that uses the values of the chosen vertex attribute as IDs.
prefix – Prepend this prefix string to the new vertex attribute names. This can be used to avoid accidentally overwriting existing attributes.
unique_keys – Assert that the vertex attribute values have to be unique if set true. The values of the matching ID column in the table have to be unique in both cases.
if_exists – If the attribute from the table clashes with an existing attribute of the graph, you can select how to handle this:
Merge, prefer the table’s version: Where the table defines new values, those will be used. Elsewhere the existing values are kept.
Merge, prefer the graph’s version: Where the vertex attribute is already defined, it is left unchanged. Elsewhere the value from the table is used.
Merge, report error on conflict: An assertion is made to ensure that the values in the table are identical to the values in the graph on vertices where both are defined.
Keep the graph’s version: The data in the table is ignored.
Use the table’s version: The attribute is deleted from the graph and replaced with the attribute imported from the table.
Disallow this: A name conflict is treated as an error.
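The join and the two uniqueness checks can be sketched as follows. This is an illustration in plain Python with hypothetical names; leaving the ID column itself out of the imported attributes is an assumption.

```python
def import_vertex_attributes(vertices, table, id_attr, id_column,
                             prefix="", unique_keys=True):
    """Join table rows onto vertices via id_attr == id_column."""
    lookup = {}
    for row in table:
        key = row[id_column]
        if key in lookup:
            # The table side must be unique in both use cases.
            raise ValueError(f"duplicate key in table: {key}")
        lookup[key] = row
    if unique_keys:
        keys = [v[id_attr] for v in vertices if id_attr in v]
        if len(keys) != len(set(keys)):
            raise ValueError("duplicate attribute values among vertices")
    result = []
    for v in vertices:
        merged = dict(v)
        row = lookup.get(v.get(id_attr))
        if row is not None:
            for column, value in row.items():
                if column != id_column:  # assumption: the ID column is skipped
                    merged[prefix + column] = value
        result.append(merged)
    return result

people = [{"name": "ann", "city": "Oslo"}, {"name": "bob", "city": "Oslo"}]
census = [{"city": "Oslo", "population": 700000}]
print(import_vertex_attributes(people, census, "city", "city",
                               prefix="census_", unique_keys=False))
```

With unique_keys set to true, the same call would fail because two people share a city.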
- lynx.operations.useTableAsVertices()
Imports vertices (no edges) from a table. Each column in the table will be accessible as a vertex attribute.
- lynx.operations.weightedAggregateEdgeAttributeGlobally(prefix, weight)
Aggregates edge attributes across the entire graph into one graph attribute for each attribute. For example you could use it to calculate the total income as the sum of call durations weighted by the rates across an entire call dataset.
- Parameters:
prefix – Save the aggregated values with this prefix.
weight – The number attribute to use as weight.
- lynx.operations.weightedAggregateEdgeAttributeToVertices(prefix, weight, direction)
Aggregates an attribute on all the edges going in or out of vertices. For example it can calculate the average cost per second of calls for each person.
- Parameters:
prefix – Save the aggregated attributes with this prefix.
weight – The number attribute to use as weight.
direction –
incoming edges: Aggregate across the edges coming in to each vertex.
outgoing edges: Aggregate across the edges going out of each vertex.
all edges: Aggregate across all the edges going in or out of each vertex.
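As one concrete case, a weighted average per vertex under the three direction settings might be computed as in this sketch. It is plain Python with hypothetical names and data, and it covers only the weighted average, not the other aggregators the operation may offer.

```python
from collections import defaultdict

def weighted_average_to_vertices(edges, attr, weight, direction):
    """Weighted average of an edge attribute for each vertex."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for e in edges:
        targets = []
        if direction in ("incoming edges", "all edges"):
            targets.append(e["dst"])  # the edge comes in to its destination
        if direction in ("outgoing edges", "all edges"):
            targets.append(e["src"])  # the edge goes out of its source
        for v in targets:
            totals[v] += e[attr] * e[weight]
            weights[v] += e[weight]
    return {v: totals[v] / weights[v] for v in totals if weights[v] != 0}

calls = [
    {"src": "ann", "dst": "bob", "cost_per_sec": 2.0, "secs": 10.0},
    {"src": "ann", "dst": "cat", "cost_per_sec": 4.0, "secs": 30.0},
]
print(weighted_average_to_vertices(calls, "cost_per_sec", "secs", "outgoing edges"))
```

Here the average cost per second of ann's outgoing calls is weighted by call duration, so the longer call counts for more.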
- lynx.operations.weightedAggregateFromSegmentation(prefix, weight)
Aggregates vertex attributes across all the segments that a vertex in the base graph belongs to. For example, it can calculate an average over the cliques a person belongs to, weighted by the size of the cliques.
- Parameters:
prefix – Save the aggregated attributes with this prefix.
weight – The number attribute to use as weight.
- lynx.operations.weightedAggregateOnNeighbors(prefix, weight, direction)
Aggregates across the vertices that are connected to each vertex. You can use the direction parameter to define how exactly this aggregation will take place: choosing one of the ‘edges’ settings can result in a neighboring vertex being taken into account several times (depending on the number of edges between the vertex and its neighboring vertex); whereas choosing one of the ‘neighbors’ settings will result in each neighboring vertex being taken into account once.
For example, it can calculate the average age per kilogram of the friends of each person.
- Parameters:
prefix – Save the aggregated attributes with this prefix.
weight – The number attribute to use as weight.
direction –
incoming edges: Aggregate across the edges coming in to each vertex.
outgoing edges: Aggregate across the edges going out of each vertex.
all edges: Aggregate across all the edges going in or out of each vertex.
symmetric edges: Aggregate across the ‘symmetric’ edges for each vertex: this means that if you have n edges going from A to B and k edges going from B to A, then min(n,k) edges will be taken into account for both A and B.
in-neighbors: For each vertex A, aggregate across those vertices that have an outgoing edge to A.
out-neighbors: For each vertex A, aggregate across those vertices that have an incoming edge from A.
all neighbors: For each vertex A, aggregate across those vertices that either have an outgoing edge to or an incoming edge from A.
symmetric neighbors: For each vertex A, aggregate across those vertices that have both an outgoing edge to and an incoming edge from A.
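The difference between the ‘edges’ and ‘neighbors’ settings is easiest to see by counting how many times each neighbor is taken into account. The sketch below is plain Python with a hypothetical function name; it returns the neighbors of one vertex with their multiplicities.

```python
from collections import Counter

def neighbor_multiplicities(edges, vertex, direction):
    """Count how often each neighbor of `vertex` is taken into account."""
    incoming = Counter(s for s, d in edges if d == vertex)
    outgoing = Counter(d for s, d in edges if s == vertex)
    if direction == "incoming edges":
        return incoming
    if direction == "outgoing edges":
        return outgoing
    if direction == "all edges":
        return incoming + outgoing
    if direction == "symmetric edges":
        return incoming & outgoing  # Counter intersection keeps min(n, k)
    if direction == "in-neighbors":
        return Counter(set(incoming))
    if direction == "out-neighbors":
        return Counter(set(outgoing))
    if direction == "all neighbors":
        return Counter(set(incoming) | set(outgoing))
    if direction == "symmetric neighbors":
        return Counter(set(incoming) & set(outgoing))
    raise ValueError(f"unknown direction: {direction}")

# Three edges from A to B, two from B to A, one from C to A.
edges = [("A", "B")] * 3 + [("B", "A")] * 2 + [("C", "A")]
print(neighbor_multiplicities(edges, "A", "symmetric edges"))
print(neighbor_multiplicities(edges, "A", "all neighbors"))
```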
- lynx.operations.weightedAggregateToSegmentation(weight)
Aggregates vertex attributes across all the vertices that belong to a segment. For example, it can calculate the average age per kilogram of each clique.
- Parameters:
weight – The number attribute to use as weight.
- lynx.operations.weightedAggregateVertexAttributeGlobally(prefix, weight)
Aggregates vertex attributes across the entire graph into one graph attribute for each attribute. For example you could use it to calculate the average age across an entire dataset of people weighted by their PageRank.
- Parameters:
prefix – Save the aggregated values with this prefix.
weight – The number attribute to use as weight.