LynxKite is a powerful open-source analytics tool for very large graphs and other datasets.
It scales to billions of edges thanks to the underlying Apache Spark cluster computing engine.
It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.
- Hundreds of scalable graph operations, including graph metrics like PageRank, embeddedness,
and centrality, machine learning methods including GCNs, graph segmentations like modular
clustering, and various transformation tools like aggregations on neighborhoods.
- The two main data types are graphs and relational tables. Switch back and forth between the two
as needed to describe complex logical flows. Run SQL on both.
- A friendly web UI for building powerful pipelines of operation boxes. Define your own custom boxes
to structure your logic.
- Tight integration with Python lets you implement custom transformations or create whole
workflows through a simple API.
- Integrates with the Hadoop ecosystem. Import and export from CSV, JSON, Parquet, ORC, JDBC, Hive,
- Fully documented.
- Proven in production on large clusters and real datasets.
- Fully configurable graph visualizations and statistical plots. Experimental 3D and ray-traced
All of these features are included in our open-source (AGPL) release.
We also offer an enterprise version with the following additions:
- Collaboration features for multiple users on a shared LynxKite instance.
- OAuth and LDAP integration.
- Fine-grained access control.
- Support and professional services.
LynxKite is under active development.
Check out our Roadmap to see what we have planned for future releases.
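To make the graph metrics mentioned in the feature list more concrete, here is a minimal sketch of PageRank by power iteration in plain Python. It is only an illustration of what such a metric computes on a tiny edge list. It does not use LynxKite's API and is not LynxKite's Spark-based implementation; the function name `pagerank`, its parameters, and the sample edges are assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict mapping each vertex to its PageRank score.

    Illustrative single-machine sketch (hypothetical helper, not LynxKite code).
    `edges` is a list of directed (source, destination) pairs.
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        # Every other vertex splits its rank equally among its out-neighbors.
        for src in vertices:
            targets = out_neighbors[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Made-up sample graph: a three-vertex cycle plus one extra vertex pointing in.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
scores = pagerank(edges)
for vertex, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{vertex}: {score:.3f}")
```

The scores sum to 1, and the vertex with no incoming edges ends up with the lowest score. A distributed engine would express the same iteration as joins and aggregations over an edge table rather than Python dictionaries.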
Algorithms in LynxKite
- Edge graph
- Random graphs (Barabási–Albert, Dorogovtsev–Mendes, Erdős–Rényi, Mocnik, Chung–Lu, Havel–Hakimi, etc.)
- Random walks
- Snowball sampling
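As an illustration of one of the random graph models listed above, the following is a minimal sketch of the textbook Erdős–Rényi G(n, p) model in plain Python: every unordered pair of vertices becomes an edge independently with probability p. The function name `erdos_renyi` and its parameters are assumptions for this example, and the sketch says nothing about how LynxKite generates such graphs at scale.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge list of an undirected G(n, p) random graph.

    Illustrative sketch (hypothetical helper, not LynxKite code).
    Vertices are the integers 0..n-1; each edge is a pair (u, v) with u < v.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    return [
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


edges = erdos_renyi(n=10, p=0.3, seed=42)
print(len(edges), "edges out of", 10 * 9 // 2, "possible pairs")
```

Enumerating all pairs costs O(n²) time, which is fine for a demonstration but not for large graphs.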
Centrality & other metrics
- Betweenness centrality
- Closeness centrality
- Core decomposition
- Dispersion of connections
- Edge embeddedness
- Effective diameter
- Eigenvector centrality
- Harmonic centrality
- Katz centrality
- K-path centrality
- Laplacian centrality
- Lin centrality
- Local clustering coefficient
- Community processing
- Information communities
- Label propagation
- Louvain method
- Maximal cliques
- Modular clustering
- (Strongly) connected components
- Edge cut
- Hub dominance
- Intrapartition density
- Graph diffusion operators
- Graph diffusion through communities
- Triadic closure
- Viral modelling
- Decision trees
- Graph convolutional networks (GCN)
- k-means clustering
- Linear regression
- Logistic regression
- Pearson correlation coefficient
- Edge prediction via hyperbolic mapping
- Maximal spanning tree
- Maxent–stress layout
- Neighborhood fingerprinting
- Pivot MDS layout
- Shortest path distance from a set
- Steiner tree
- Vertex coloring
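
As a small illustration of one entry in the list above, "shortest path distance from a set" can be sketched for the unweighted case as a breadth-first search started from all source vertices at once. This is a plain Python sketch under stated assumptions: the function name `distance_from_set`, the adjacency-dict input, and the sample graph are made up for this example, edge weights are ignored, and it is not LynxKite's implementation.

```python
from collections import deque


def distance_from_set(adjacency, sources):
    """Return hop distances from the nearest source vertex.

    Illustrative sketch (hypothetical helper, not LynxKite code).
    `adjacency` maps each vertex to a list of its neighbors.
    Vertices that cannot be reached from any source are left out.
    """
    distance = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in distance:
                # First visit is along a shortest path because BFS
                # explores vertices in order of increasing distance.
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance


# Made-up sample graph: a path a-b-c-d-e and an isolated vertex f.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": [],
}
distances = distance_from_set(adjacency, sources=["a", "e"])
print(distances)
```

Seeding the queue with every source gives each vertex its distance to the closest source in a single pass, instead of running one search per source and taking the minimum.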