While 2022 brought much misfortune and tragedy around the world, it was kind to our small project. Our engineering team has doubled in size. We have not just worked off technical debt, but turned it into its opposite. (“Technical credit”?) And we have added some stunning new features in the 5.x versions.
Let’s take a closer look!
Lynx has always taken maintenance work seriously. But taking something seriously is not always enough. The best time to make large systemic improvements is when these improvements become the keys to reaching a goal more easily. Then the time investment for maintenance pays off immediately, it’s clear what we need, and everyone is motivated.
This happened several times this year:
We have switched to a faster, more standardized storage format for graph data in #237. Back in 2014 we chose SequenceFiles for data storage. I can’t recall why we didn’t opt for Parquet instead. Perhaps it wasn’t fully supported in Spark 0.9. Perhaps we didn’t realize its advantages yet. (Parquet 1.0 was released just a year earlier.) But better late than never: we’re on Parquet now.
We replaced the system for launching LynxKite. (#269)
LynxKite is now a single JAR that you run with spark-submit.
LynxKite uses Apache Spark for distributed computing and the Play Framework for providing a web interface. Both of these have a system for launching your application for you. We had to pick one to launch LynxKite and replicate what the other launcher does in our own code. We picked Play in 2014. We understood Spark’s startup better and Play’s generated startup script was flexible. Our modifications to the script grew into an elaborate system of its own. (call_spark_submit.sh) That is gone now. We flipped the startup – we trust spark-submit to start LynxKite and we set up Play ourselves. This lets LynxKite behave like any Spark application. It easily runs in highly customized Spark environments now.
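To make "behaves like any Spark application" a bit more concrete, here is a minimal sketch of assembling such a launch command. The JAR file name, the master URL, and the memory setting are assumptions chosen for illustration, not LynxKite's documented invocation. Only the general spark-submit shape (options first, then the application JAR) is taken as given.

```python
import shlex


def spark_submit_command(jar, master="local[*]", conf=None):
    """Build the argument list for launching an application JAR with spark-submit."""
    args = ["spark-submit", "--master", master]
    # Each Spark property is passed as its own "--conf key=value" pair.
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    # The application JAR comes after all spark-submit options.
    args.append(jar)
    return args


command = spark_submit_command(
    "lynxkite.jar",  # hypothetical file name, use the JAR you actually have
    master="local[*]",  # assumption: a local run; a cluster would use its own master
    conf={"spark.executor.memory": "4g"},  # assumption: example setting only
)

# Print a shell-safe version of the command for copy-pasting.
print(shlex.join(command))
```

Because the launcher is plain spark-submit, anything a customized Spark environment expects to pass on the command line fits into the same list without special handling on our side.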
We’ve switched our build and CI to Earthly. #296 The Makefile is not deleted yet. But Earthly can manage not just the build steps, but the environment and the Docker images too. We share our build cache in the office and with GitHub Actions. The whole setup is simpler and more reliable. (This builds on our work from last year to use Conda for managing all dependencies.)
We’re switching UI testing automation from Protractor to Playwright. #307 LynxKite is an AngularJS application. (Another choice that made sense in 2014.) Protractor is a dedicated test framework for AngularJS. It’s so deprecated at this point that it’s hard to even run it, never mind reliably or with a current browser. I’m happy to report that Playwright is faster. It has already caught bugs that Protractor’s auto-waiting kept hidden. And the migration is not even hard.
We’ve revamped demo.lynxkite.com. We can manage and update it more easily now and it has that nice new software smell.
An alternating focus between maintenance and features (a “tick-tock” cycle) is common in software development. You might think 2022 was a “tick” year for LynxKite. But it wasn’t! Check out our new features:
All of these big additions were motivated by the needs of users. In 2023 we will continue to listen to your feedback and focus on what makes the most difference for users.
That sounds a bit like “we have no plans”, doesn’t it? Let me list the areas that are main strengths of LynxKite and what sort of improvements we are considering in each area:
Integrations. Many tools you find in LynxKite are also available elsewhere. But LynxKite brings everything together into one big pot. We have long wanted to add integration with Gremlin graph tools like JanusGraph or Azure Cosmos DB. We’re also looking at compatibility with some tools from the KNIME ecosystem.
Performance. GPU algorithms are already fantastically fast. But the data is currently moved between the GPU and the CPU for each algorithm. Instead, we could keep the data on the GPU and only remove it when necessary.
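The difference between the two approaches is easy to see in a toy model. The sketch below does not touch a real GPU and is not LynxKite code. It only counts how many copies each strategy would make, and the class, the function names, and the three-step pipeline are all made up for illustration.

```python
class TransferCounter:
    """Toy stand-in for a GPU that counts copies instead of moving real data."""

    def __init__(self):
        self.to_gpu = 0
        self.to_cpu = 0
        self.resident = set()

    def upload(self, name):
        # Copy to the GPU only if the data is not already there.
        if name not in self.resident:
            self.resident.add(name)
            self.to_gpu += 1

    def download(self, name):
        self.to_cpu += 1

    def evict(self, name):
        self.resident.discard(name)


def copy_for_each_algorithm(graph, algorithms):
    """Current behavior as described: move the data in and out around every step."""
    gpu = TransferCounter()
    for _ in algorithms:
        gpu.upload(graph)
        # ... the algorithm would run on the GPU here ...
        gpu.download(graph)
        gpu.evict(graph)
    return gpu.to_gpu, gpu.to_cpu


def keep_on_gpu(graph, algorithms):
    """Proposed behavior: leave the data on the GPU until the pipeline is done."""
    gpu = TransferCounter()
    for _ in algorithms:
        gpu.upload(graph)  # only the first call actually copies
        # ... the algorithm would run on the GPU here ...
    gpu.download(graph)
    gpu.evict(graph)
    return gpu.to_gpu, gpu.to_cpu


# Hypothetical three-step pipeline on one graph.
pipeline = ["pagerank", "connected_components", "clustering_coefficient"]
print("copy each time:", copy_for_each_algorithm("graph", pipeline))
print("keep on GPU:   ", keep_on_gpu("graph", pipeline))
```

In this model the number of copies grows with the length of the pipeline in the first strategy and stays constant in the second. How much that matters in practice depends on graph size and on how often results are needed back on the CPU.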
Interactive exploration. We are considering many improvements to our visualization engine. Streamlined controls, embedding in Jupyter Notebooks, and better support for heterogeneous graphs (such as knowledge graphs) are all in the cards.
We can’t wait for your feedback in 2023. Let’s toast to another successful year for LynxKite!