LynxKite features

LynxKite is a powerful open-source analytics tool for very large graphs and other datasets. It scales to billions of edges thanks to the underlying Apache Spark cluster computing engine. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

Hundreds of scalable graph operations: graph metrics like PageRank, embeddedness, and centrality; machine learning methods including GCNs; graph segmentations like modular clustering; and transformation tools like aggregations on neighborhoods.
The two main data types are graphs and relational tables. Switch back and forth between the two as needed to describe complex logical flows. Run SQL on both.
A friendly web UI for building powerful pipelines of operation boxes. Define your own custom boxes to structure your logic.
Tight integration with Python lets you implement custom transformations or create whole workflows through a simple API.
Integrates with the Hadoop ecosystem. Import from and export to CSV, JSON, Parquet, ORC, JDBC, Hive, or Neo4j.
Fully documented.
Proven in production on large clusters and real datasets.
Fully configurable graph visualizations and statistical plots. Experimental 3D and ray-traced graph renderings.
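
To make the "run SQL on both" idea concrete, here is a minimal sketch of how a graph can be read as two relational tables and queried with SQL. It uses Python's built-in sqlite3 module rather than LynxKite itself; the table names (vertices, edges), the column names, and the helper degree_by_sql are assumptions made for illustration and do not reflect LynxKite's API or schema.

```python
import sqlite3


def degree_by_sql(vertices, edges):
    """Return {vertex_id: degree}, treating each (src, dst) row as undirected.

    Hypothetical helper for illustration only; not part of LynxKite.
    """
    conn = sqlite3.connect(":memory:")
    # A graph expressed as two tables: one row per vertex, one row per edge.
    conn.execute("CREATE TABLE vertices (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO vertices VALUES (?, ?)", vertices)
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    # LEFT JOIN keeps isolated vertices; COUNT(e.src) is 0 for them.
    rows = conn.execute(
        """
        SELECT v.id, COUNT(e.src) AS degree
        FROM vertices AS v
        LEFT JOIN edges AS e ON v.id = e.src OR v.id = e.dst
        GROUP BY v.id
        ORDER BY v.id
        """
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    vs = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    es = [(1, 2), (1, 3), (2, 3)]
    print(degree_by_sql(vs, es))
```

With this join condition a self-loop contributes one to its vertex's degree, not two; adjust the query if the other convention is needed.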

All of these features are included in our open-source (AGPL) release. We also offer an enterprise version with the following additions:

Collaboration features for multiple users on a shared LynxKite instance.
OAuth and LDAP integration.
Fine-grained access control.
Support and professional services.


LynxKite is under active development. Check out our Roadmap to see what we have planned for future releases.

Algorithms in LynxKite

Graph structure

Edge graph
Random graphs (Barabási–Albert, Dorogovtsev–Mendes, Erdős–Rényi, Mocnik, Chung–Lu, Havel–Hakimi, etc.)
Random walks
Snowball sampling

Centrality & other metrics

Assortativity
Betweenness centrality
Closeness centrality
Core decomposition
Degree
Dispersion of connections
Edge embeddedness
Effective diameter
Eigenvector centrality
Harmonic centrality
Katz centrality
K-path centrality
Laplacian centrality
Lin centrality
Local clustering coefficient
PageRank
Sfigality
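
As an illustration of what one of these metrics computes, the following is a small single-machine sketch of PageRank by power iteration. It is not LynxKite's distributed implementation; the function name, the adjacency-dictionary input format, and the defaults (damping 0.85, 50 iterations) are assumptions chosen for the example.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a small directed graph.

    out_edges maps every vertex to the list of its successors; vertices
    without out-edges must still appear as keys with an empty list.
    Illustrative sketch only.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(iterations):
        # Rank held by dangling vertices is spread evenly over all vertices.
        dangling = sum(rank[v] for v, succ in out_edges.items() if not succ)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in out_edges}
        # Every vertex passes its damped rank to its successors in equal shares.
        for v, succ in out_edges.items():
            for w in succ:
                new_rank[w] += damping * rank[v] / len(succ)
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(pagerank({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

The ranks always sum to 1, and on a directed cycle every vertex ends up with the same rank.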

Community detection

Colocation
Community processing
Information communities
Label propagation
Louvain method
Maximal cliques
Modular clustering
Triangles
(Strongly) connected components

Community metrics

Conductance
Coverage
Edge cut
Expansion
Fragmentation
Hub dominance
Intrapartition density
Modularity
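
Two of the community metrics can be sketched in a few lines for an unweighted, undirected graph given as an edge list. The definitions used here are the common textbook ones (modularity as the sum over communities of the internal edge fraction minus the squared degree fraction; conductance as cut size divided by the smaller side's volume). The exact variants LynxKite computes may differ, and the function names and input formats are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition; community maps vertex -> community label.

    Assumes at least one edge and no self-loops. Illustrative sketch only.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's vertices
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


def conductance(edges, members):
    """Cut size divided by the smaller of the two sides' volumes.

    Assumes both sides of the cut have at least one edge endpoint.
    """
    members = set(members)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for x in (u, v):
            if x in members:
                vol_in += 1
            else:
                vol_out += 1
        if (u in members) != (v in members):
            cut += 1
    return cut / min(vol_in, vol_out)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    es = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(modularity(es, part), conductance(es, {0, 1, 2}))
```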

Dynamic processes

Graph diffusion operators
Graph diffusion through communities
Triadic closure
Viral modelling

Machine learning

Decision trees
Graph convolutional networks (GCN)
k-means clustering
Linear regression
Logistic regression
node2vec
Pearson correlation coefficient

Optimization

Edge prediction via hyperbolic mapping
Maximal spanning tree
Maxent–stress layout
Neighborhood fingerprinting
Pivot MDS layout
Shortest path distance from a set
Steiner tree
Vertex coloring
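
For intuition, "shortest path distance from a set" on an unweighted graph can be computed with a breadth-first search started from all source vertices at once. The sketch below is a single-machine illustration, not LynxKite's implementation; the function name and the adjacency-dictionary input are assumptions, and a weighted graph would need a different technique such as Dijkstra's algorithm.

```python
from collections import deque


def distance_from_set(adjacency, sources):
    """Hop distance from each reachable vertex to the nearest source.

    adjacency maps a vertex to an iterable of its neighbors. Vertices that
    cannot be reached from any source are absent from the result.
    Illustrative sketch only.
    """
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adjacency.get(u, ()):
            if w not in dist:
                # First visit in BFS order gives the shortest hop count.
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 9: []}
    print(distance_from_set(path, [0, 4]))
```

Seeding the queue with every source is equivalent to adding one virtual vertex connected to all of them, so a single pass suffices however many sources there are.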