Apache Spark is great for distributed computations on large datasets. It is not a great fit for quick computations on small and medium-size datasets. We are addressing that with Sphynx, our next-generation backend. Sphynx is a high-performance graph analytics execution engine written in Go. Starting from version 3.0, LynxKite can choose to execute operations either on Apache Spark or on Sphynx.
The Sphynx implementations have a number of advantages:
Over the quarter we will add Sphynx implementations for more and more operations. We have seen 100× speedups in laboratory benchmarks. We cannot add Sphynx implementations for every operation, but our goal is to cover enough of them that the user experience is vastly improved.
Status: (2020-01-07) Infrastructure changes are 90% done. We are creating Sphynx implementations of critical operations.
New users cannot build complex workspaces in LynxKite, yet that is where LynxKite’s power shines. We are building “wizards” to put the power of complex workspaces at the fingertips of new users.
Advanced users can turn their workspaces into easy-to-use wizards. LynxKite will also come bundled with a selection of useful wizards.
Status: (2020-01-07) Infrastructure is done. We are building wizards.
A common data science project is supervised learning: figuring out the attributes of people or other entities based on a smaller number of known examples. This gets more complicated on graphs, because we cannot consider the vertices of the graph individually. We have to work with the graph as a whole to make the most of the connections between vertices.
LynxKite already has a number of tools for solving such node attribute prediction problems. We want to make these tools easier to use by offering a fully automated workflow. Just take a CSV, choose what you want to predict, and everything else happens automatically.
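To illustrate why the connections matter for this kind of prediction, here is a minimal sketch of a neighbor-vote baseline in Python. It is not LynxKite's method, only an assumed toy approach: each vertex with an unknown attribute takes the most common value among its directly connected, labeled neighbors. The function name `predict_from_neighbors` and the sample data are hypothetical.

```python
from collections import Counter


def predict_from_neighbors(edges, known_labels):
    """Give each unlabeled vertex the most common label among its labeled neighbors.

    edges: iterable of (u, v) pairs, treated as undirected.
    known_labels: dict mapping some vertices to their known attribute value.
    Vertices with no labeled neighbor are left out of the result.
    """
    # Build an undirected adjacency map.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    predictions = {}
    for vertex, adjacent in neighbors.items():
        if vertex in known_labels:
            continue  # Nothing to predict for known examples.
        votes = Counter(known_labels[n] for n in adjacent if n in known_labels)
        if votes:
            predictions[vertex] = votes.most_common(1)[0][0]
    return predictions


# Hypothetical example: three known labels, two vertices to predict.
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "dave"), ("dave", "erin")]
known = {"alice": "A", "bob": "A", "erin": "B"}
print(predict_from_neighbors(edges, known))
```

A single pass like this only uses immediate neighbors. Methods that work with the graph as a whole propagate information further, which is the reason the problem is harder than ordinary supervised learning on independent rows.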
Status: (2020-01-16) Work started.
Graphs often feature in combinatorial optimization problems. The Prize-Collecting Steiner Tree Problem is of practical importance for finding minimum-cost fiber network layouts. We are adding it to LynxKite.
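As a rough sketch of what the problem optimizes, in its common formulation every edge has a cost and every vertex has a prize, and a solution is a tree that minimizes the cost of its edges plus the prizes of the vertices it leaves out. The Python snippet below only evaluates that objective for a candidate solution; it is not a solver and not LynxKite's implementation. The function name `pcst_objective` and the sample numbers are hypothetical. Reading edges as fiber segments and prizes as the value of connecting a site is an illustrative assumption.

```python
def pcst_objective(edge_costs, prizes, tree_vertices, tree_edges):
    """Objective value of a candidate solution: edge costs paid plus prizes forgone.

    edge_costs: dict mapping (u, v) tuples to the cost of that edge.
    prizes: dict mapping each vertex to its prize.
    tree_vertices: set of vertices included in the candidate tree.
    tree_edges: list of (u, v) tuples, each a key of edge_costs.
    This function does not check that the candidate is actually a tree.
    """
    paid = sum(edge_costs[edge] for edge in tree_edges)
    forgone = sum(prize for vertex, prize in prizes.items()
                  if vertex not in tree_vertices)
    return paid + forgone


# Hypothetical instance: three sites and three possible fiber segments.
edge_costs = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 9}
prizes = {"a": 10, "b": 1, "c": 8}

# Connect everything through b: pay 2 + 3, forgo nothing.
connect_all = pcst_objective(edge_costs, prizes, {"a", "b", "c"},
                             [("a", "b"), ("b", "c")])
# Serve only a: pay nothing, forgo the prizes of b and c.
only_a = pcst_objective(edge_costs, prizes, {"a"}, [])
print(connect_all, only_a)
```

In this toy instance connecting all three sites through `b` is cheaper than serving `a` alone, because the forgone prizes outweigh the cost of the two short segments.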
Status: (2020-01-16) Done. To be released in LynxKite 3.1.
From the start, we have invested in neural networks that work with graphs. We believe they are key to finding patterns in densely connected datasets. Our immediate next step is to add two important graph neural network (GNN) applications to LynxKite: node embedding and missing attribute prediction. Expect to see more graph AI capabilities coming down the road.
Status: (2020-01-16) Lots of research over 2019. Prototype for executing PyTorch in LynxKite is working.
LynxKite already has two systems for indicating progress, but neither gives you a clear indication of how much work has been completed and how much longer you have to wait. We want to address this.
Status: (2020-01-07) Not started.
LynxKite is not a graph database. As a batch analytical tool it complements graph databases nicely. A graph database can answer interactive queries and handle updates. LynxKite can train models on the whole graph or compute metrics for every vertex.
We plan to add import/export functionality for popular graph databases. We are also considering deeper integrations and partnerships.
Status: (2020-01-07) Import from Neo4j is done. Not started on others yet.
LynxKite already has a solid UI for exploring graphs. We plan to look into how to make it even better.
Status: (2020-01-07) Not started.