What is a QLattice?
by: Kevin Broløs & Chris Cave
(Feyn version 3.0 or newer)
The QLattice
The QLattice is a supervised machine learning tool for symbolic regression. It is a technology developed by Abzu, inspired by Richard Feynman's path integral formulation. That's why we've named our Python library Feyn, and the Q in QLattice stands for Quantum.
It composes functions together to build mathematical models between the inputs and output in your dataset. The functions vary from elementary ones such as addition, multiplication and squaring, to more complex ones such as natural logarithm, exponential and tanh.
Overall, symbolic regression approaches tend to perform well while still generalising to unseen data, which sets them apart from other popular models such as random forests. Our own benchmarks show similar results.
Feyn
Feyn is the Python module for using the QLattice and for training models that have been sampled from the QLattice.
When sampling models, you define criteria in Feyn that these models must meet. Some examples include: whether it is a classification or regression problem, which features you want to learn about, which functions you want to include, how complex the models may be, and other such constraints.
The fitting process
A typical process looks like this:
- You sample a few thousand models at a time from a QLattice.
- You fit them all using a version of backpropagation, and evaluate them on some criteria (such as a choice of loss functions and information criteria).
- You discard the worst models.
- You update the QLattice with the structures of the best models.
- You start over from the first step, adding a new batch of samples that compete with the models you kept from the previous loop.
You can think of a QLattice as a probability distribution from which models are sampled. Initially, this distribution is uniform, and it is tuned after each update call. Going through this process helps the QLattice converge and shapes the distribution towards better solutions.
Every step in this process happens locally on your machine.
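The loop above can be sketched in a few lines of plain Python. This is a toy illustration only, not Feyn's implementation: the three candidate structures, the helper names (`sample_models`, `fit`, `run_toy_search`) and the weight update rule are all assumptions made for the example, and a closed-form least-squares fit stands in for the backpropagation step.

```python
import math
import random

# Hypothetical candidate structures, each with one trainable weight w: y = w * shape(x).
STRUCTURES = {
    "linear": lambda x: x,
    "squared": lambda x: x * x,
    "tanh": math.tanh,
}

# Toy dataset generated from y = 3 * x^2.
DATA = [(i / 2, 3.0 * (i / 2) * (i / 2)) for i in range(-4, 5)]


def sample_models(weights, n, rng):
    """Draw n structure names in proportion to their current weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


def fit(name, data):
    """Fit the single weight by least squares (a stand-in for backpropagation)."""
    shape = STRUCTURES[name]
    feats = [shape(x) for x, _ in data]
    targets = [y for _, y in data]
    denom = sum(f * f for f in feats)
    w = sum(f * y for f, y in zip(feats, targets)) / denom if denom else 0.0
    loss = sum((w * f - y) ** 2 for f, y in zip(feats, targets)) / len(data)
    return (loss, name, w)


def run_toy_search(data, epochs=10, n_samples=20, keep=5, seed=0):
    rng = random.Random(seed)
    weights = {name: 1.0 for name in STRUCTURES}  # the distribution starts uniform
    kept = []
    for _ in range(epochs):
        # Sample new candidates and let them compete with the survivors.
        kept += [fit(name, data) for name in sample_models(weights, n_samples, rng)]
        # Discard the worst models (lowest loss first).
        kept = sorted(kept)[:keep]
        # Reinforce the structures of the best models.
        for _loss, name, _w in kept:
            weights[name] += 1.0
    return kept[0], weights


if __name__ == "__main__":
    best, weights = run_toy_search(DATA)
    print("best model:", best)
    print("structure weights:", weights)
```

After a few rounds the sampling weights shift towards the structure that explains the data, which is the sense in which the distribution is shaped towards better solutions.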
Why not just brute-force?
The space of all possible models is potentially infinite, which makes brute-forcing the solution intractable for all but the simplest datasets. This is why you update the QLattice with the best model structures so far. You can also narrow the search space by being specific about which relationships to investigate, and by restricting the types of models the QLattice will produce.
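To get a feel for how quickly the space grows, here is a rough counting sketch. It assumes models are expression trees built from features, one-input functions and two-input functions, and it counts `a + b` and `b + a` separately, so it is an upper bound rather than an exact count. The split into 7 one-input and 3 two-input functions is an assumption based on the interaction names listed later on this page; `count_models` is a hypothetical helper.

```python
def count_models(n_features, n_unary, n_binary, max_depth):
    """Count expression trees of depth <= max_depth.

    A tree is either a bare feature, a one-input function applied to a
    shallower tree, or a two-input function applied to two shallower trees.
    """
    count = n_features  # depth 0: just a feature
    for _ in range(max_depth):
        count = n_features + n_unary * count + n_binary * count * count
    return count


if __name__ == "__main__":
    for depth in range(5):
        print(depth, count_models(n_features=5, n_unary=7, n_binary=3, max_depth=depth))
```

With only five features the count already runs into the billions at depth 3, and that is before any weights are fitted.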
What about privacy?
Every step of the process when using a QLattice with Feyn happens locally on your machine. You can even run this without an internet connection.
In particular, this means that none of your data is exchanged at any point, and it never leaves your machine.
Understanding the models
The resulting models are represented by unidirectional, acyclic graphs that cleanly visualize what happens in the mathematical equation for everyone to understand. On top of this, we have a suite of plots and tools to help you dig deeper into the models you get, and help you understand not only the relationships better, but also the tradeoffs, biases and support levels present in your model.
This makes the QLattice especially useful when you want insights and intend to investigate relationships between your features.
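As a small illustration of how a graph maps to an equation, the sketch below stores a model as nested tuples and renders it as a formula string. The tuple layout, the `TEMPLATES` table and `to_equation` are assumptions for this example, not how Feyn represents models internally, and a tree is used as the simplest case of an acyclic graph.

```python
# Hypothetical string templates for a handful of interactions.
TEMPLATES = {
    "add": "({} + {})",
    "multiply": "({} * {})",
    "squared": "{}**2",
    "tanh": "tanh({})",
}


def to_equation(node):
    """Render a nested-tuple model as a formula string."""
    if isinstance(node, str):  # a leaf is an input feature name
        return node
    name, *children = node
    return TEMPLATES[name].format(*(to_equation(child) for child in children))


# Inputs x1, x2 and x3 flow through multiply, squared and add into tanh.
MODEL = ("tanh", ("add", ("multiply", "x1", "x2"), ("squared", "x3")))

if __name__ == "__main__":
    print(to_equation(MODEL))
```

Reading the graph from the inputs to the output gives the same expression as reading the formula from the inside out.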
The QLattice in a nutshell
The QLattice is an environment to simulate discrete paths from multiple inputs to an output. It does this in a finite multi-dimensional lattice space. This is where the inspiration from Feynman's path integral comes in.
The QLattice simulates inputs taking a path through the lattice space before emerging as an output. If you do this until a solid path has been shaped, you'll eventually converge to the path most likely to explain the problem you're trying to model. Along the path, we randomly sample from a selection of interactions: functions that transform the inputs to a new output.
Interactions
Interactions are the basic computation units of each model. They take in data, transform it, and then emit it to be used in the next interaction. Here are the current possible interactions:
Name | Function |
---|---|
Addition | |
Multiply | |
Squared | |
Linear | |
Tanh | |
Single-legged Gaussian | |
Double-legged Gaussian | |
Exponential | |
Logarithmic | |
Inverse | |
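To make the idea of composing interactions concrete, here is a minimal sketch with each interaction written as a plain Python function. The formulas are assumptions: the forms chosen for Linear and the two Gaussians in particular are illustrative guesses, and the dictionary keys are hypothetical labels rather than Feyn's own function names.

```python
import math

# Assumed functional forms, for illustration only.
INTERACTIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "squared": lambda a: a * a,
    "linear": lambda a, w=1.0, bias=0.0: w * a + bias,
    "tanh": math.tanh,
    "gaussian1": lambda a: math.exp(-a * a),                # single-legged (one input)
    "gaussian2": lambda a, b: math.exp(-(a * a + b * b)),   # double-legged (two inputs)
    "exp": math.exp,
    "log": math.log,
    "inverse": lambda a: 1.0 / a,
}


def predict(x1, x2):
    """A small model: each interaction's output feeds the next one."""
    squared = INTERACTIONS["squared"](x2)
    summed = INTERACTIONS["add"](x1, squared)
    return INTERACTIONS["exp"](summed)


if __name__ == "__main__":
    print(predict(1.0, 2.0))
```

Here `predict` corresponds to the formula exp(x1 + x2^2): two inputs pass through three interactions before emerging as one output.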
We determine the interactions based on probabilities, guided by repeated reinforcement of the best solutions provided by the QLattice as you fit the hundreds of thousands of models that are discovered. During repeated reinforcement, islands will form in the QLattice space, each with its own independent evolution. This narrows the search space and gives way to many separate evolutionary spaces. A benefit of this process is that the user helps decide which models are useful and which paths will be reinforced. The user also decides how to constrain the decision space, which gives them full control over the shapes the models will take.
Altogether, this approach has some benefits compared to a neural network, such as:
- there are far fewer nodes and connections.
- there are functions you wouldn't normally see in a neural network.
- the models are more inspectable, simpler and less prone to overfitting.
- the models are mathematical formulas, allowing you to reason about the consequences of your hypothesis.
- the models that have been tried are diverse and you can trust that nothing has been overlooked during training.
If there's a signal, the QLattice will find it, so you can tell whether your problem is best solved with a complex non-linear mathematical equation or a simple linear model.