Deploy a model for inference
by: Miquel Triana
Feyn version: 2.1+
Last updated: 28/10/2021
On many occasions, one might want to use a newly trained model to make predictions on incoming data. This is called deploying a model for inference.
Deployment can be a complicated task when using black box algorithms, such as random forests or neural networks. This is not the case for feyn, as its models are simple mathematical expressions that can be exported and evaluated without passing around large files or importing libraries.
In this tutorial we will show you how to output a model so it can be used to make predictions somewhere else. We will use the models obtained in the Titanic survival tutorial.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sympy.printing.printer import Printer
import feyn
Parse data and train model
We start by loading and parsing the data. Have a look at the Titanic survival tutorial if you are interested in the details.
df = pd.read_csv('../data/deploy_inference.csv')
# Impute missing data, drop irrelevant columns
age_dist = df[(df.pclass == 3) & (df.embarked == 'C') & (df.sex == 'male') &
              (df.sibsp == 0) & (df.parch == 0) & (df.survived == 0)].age.dropna()
mean_age = np.mean(age_dist)
std_age = np.std(age_dist)
age_guess = np.random.normal(mean_age, std_age, size=2)
df_mod = df.drop(['boat', 'body', 'home.dest', 'name', 'ticket', 'cabin'], axis=1)
df_mod.loc[df[df.age.isna()].index, 'age'] = age_guess
# Split data
output = 'survived'
train, test = train_test_split(df_mod, test_size=0.4, random_state=42, stratify=df_mod[output])
# Define categorical inputs
stypes = {}
for col in train.columns:
    if train[col].dtype == 'O':
        stypes[col] = 'c'
stypes['pclass'] = 'c'
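The imputation step above draws a fixed number of samples (size=2), which only works when exactly two ages are missing. A minimal sketch of a more general variant follows, assuming a toy DataFrame with made-up values and a seeded generator (both are illustrative assumptions, not part of the original tutorial). For brevity it fits the distribution to all observed ages rather than to the passenger subgroup used above.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the Titanic frame (hypothetical values).
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, 28.0, np.nan, 41.0]})

# Seeded generator so that the imputed values are reproducible.
rng = np.random.default_rng(42)

observed = df["age"].dropna()
missing = df["age"].isna()

# Draw exactly one value per missing entry from a normal distribution
# fitted to the observed ages (ddof=0 matches the default of np.std).
age_guess = rng.normal(observed.mean(), observed.std(ddof=0),
                       size=int(missing.sum()))

df_mod = df.copy()
df_mod.loc[missing, "age"] = age_guess
```

Sizing the sample from the number of missing entries keeps the assignment valid if the input file changes.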
Once the data is parsed, we can easily train a model with the auto_run method:
# Train models
ql = feyn.connect_qlattice()
models = ql.auto_run(train, output, kind='classification', stypes=stypes)
# Select the best performing model
best_model = models[0]
Extract model
The method sympify will output a sympy expression. To be able to evaluate it, we will output the categorical features in terms of their categories and weights (see how feyn handles categories for more details). Similarly, we will output the full sigmoid function if the model is a classifier.
sympy_model = best_model.sympify(symbolic_cat=False, symbolic_lr=True)
The sympy expression can be converted into a string with the method doprint from the class Printer of sympy. We replace the expression "exp" by "np.exp", a numpy function that can be evaluated.
printer = Printer()
string_model = printer.doprint(sympy_model).replace("exp", "np.exp")
'1/(0.0963819*np.exp(0.413291*sibsp - 1.72977*(0.0409075*age + 0.918737)*(0.346173*pclass_1 + 0.00905154*pclass_2 - 0.287936*pclass_3 + 0.359734*sex_female - 0.377445*sex_male - 0.666242)) + 1)'
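A plain string replacement of "exp" is enough for this model, but it would also rewrite any feature whose name contains "exp". The sketch below shows one possible safeguard using a regular expression; the helper name qualify_exp and the feature name expenses are hypothetical, and models that use other functions would need the same treatment for each of them.

```python
import re

def qualify_exp(expression: str, prefix: str = "np.") -> str:
    """Prefix calls to exp( with `prefix`, leaving longer names untouched."""
    # (?<![\w.]) ensures "exp" is not the tail of another name
    # and is not already qualified (for example "np.exp").
    return re.sub(r"(?<![\w.])exp\(", prefix + "exp(", expression)

# Hypothetical expression with a feature called "expenses".
raw = "1/(exp(-0.5*expenses) + 1)"

naive = raw.replace("exp", "np.exp")   # also rewrites the feature name
safe = qualify_exp(raw)                # only rewrites the function call

print(naive)
print(safe)
```

Because already qualified calls are skipped, applying the helper twice leaves the string unchanged.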
Inference: evaluate expression
The expression contained in string_model can be evaluated inside a function to make predictions without the need of importing feyn and loading any object. To create this function we can simply copy and paste the expression inside the definition of model_inference:
def model_inference(sibsp, age, pclass_1, pclass_2, pclass_3, sex_female, sex_male):
    return 1/(0.0963819*np.exp(0.413291*sibsp -
                               1.72977*(0.0409075*age + 0.918737)*
                               (0.346173*pclass_1 +
                                0.00905154*pclass_2 -
                                0.287936*pclass_3 +
                                0.359734*sex_female -
                                0.377445*sex_male - 0.666242)) + 1)
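When predictions are needed for one record at a time, the same expression can also be evaluated with the standard library only. The following is a sketch under that assumption: the function name predict_one and the dictionary layout of a passenger are hypothetical, while the coefficients are copied from the expression above.

```python
import math

def predict_one(passenger: dict) -> float:
    """Survival probability for a single passenger given as a plain dict."""
    sibsp = passenger["sibsp"]
    age = passenger["age"]

    # One-hot encode the categorical inputs by direct comparison.
    pclass_1, pclass_2, pclass_3 = (float(passenger["pclass"] == k) for k in (1, 2, 3))
    sex_female = float(passenger["sex"] == "female")
    sex_male = float(passenger["sex"] == "male")

    return 1/(0.0963819*math.exp(0.413291*sibsp -
                                 1.72977*(0.0409075*age + 0.918737)*
                                 (0.346173*pclass_1 +
                                  0.00905154*pclass_2 -
                                  0.287936*pclass_3 +
                                  0.359734*sex_female -
                                  0.377445*sex_male - 0.666242)) + 1)

p_female_first = predict_one({"sibsp": 0, "age": 30, "pclass": 1, "sex": "female"})
p_male_third = predict_one({"sibsp": 0, "age": 30, "pclass": 3, "sex": "male"})
```

This variant trades vectorized evaluation for having no third-party dependency, which can suit small services that score single records.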
The model is expressed in terms of the one-hot-encoded features, which can be obtained easily with the pandas function get_dummies:
# Get numeric features as numpy arrays
sibsp = test.sibsp.values
age = test.age.values
# One-hot-encoding of categorical features
pclass_1 = pd.get_dummies(test.pclass)[1].values
pclass_2 = pd.get_dummies(test.pclass)[2].values
pclass_3 = pd.get_dummies(test.pclass)[3].values
sex_female = pd.get_dummies(test.sex)["female"].values
sex_male = pd.get_dummies(test.sex)["male"].values
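Indexing the result of get_dummies by category assumes that every category appears in the incoming data. A small batch without any first-class passengers, for instance, has no column 1. One possible alternative, sketched here with a hypothetical helper one_hot and made-up data, is to compare against each category the model expects.

```python
import pandas as pd

def one_hot(series: pd.Series, category):
    """Return a 0/1 float array marking where `series` equals `category`."""
    return (series == category).astype(float).to_numpy()

# Hypothetical incoming batch with no first-class passengers.
batch = pd.DataFrame({"pclass": [3, 3, 2],
                      "sex": ["male", "female", "male"]})

pclass_1 = one_hot(batch["pclass"], 1)       # all zeros, no KeyError
pclass_3 = one_hot(batch["pclass"], 3)
sex_female = one_hot(batch["sex"], "female")
```

The categories are then fixed by the exported expression rather than by whatever values happen to be present in a given batch.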
We can check that indeed the model we extracted gives the same results as the predict method of the feyn model (up to the specified numeric precision of the coefficients):
best_model.predict(test)[0:20]-model_inference(sibsp, age, pclass_1, pclass_2, pclass_3, sex_female, sex_male)[0:20]
array([ 1.81708470e-07, -1.55812481e-06, -9.27053956e-07, -6.67120051e-07,
-1.40847184e-06, -4.75640350e-07, -6.45967956e-07, -1.60310114e-06,
-4.71003742e-08, -9.27053956e-07, -1.53719299e-06, -7.19278846e-07,
-3.07361949e-07, -9.71066287e-07, -7.23793551e-07, 6.77314471e-08,
-1.91267928e-07, -9.28127093e-07, -1.96795965e-07, -8.04840904e-07])
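The differences above are of order 1e-6, as expected for coefficients exported with six significant digits. In an automated check, the comparison could be expressed as a tolerance test. The sketch below uses the first four differences printed above; the tolerance of 1e-5 is an assumption chosen for illustration.

```python
import numpy as np

# First four differences reported above.
diff = np.array([1.81708470e-07, -1.55812481e-06,
                 -9.27053956e-07, -6.67120051e-07])

# Largest absolute deviation between the two sets of predictions.
max_abs = float(np.max(np.abs(diff)))

# True when every difference is within the chosen absolute tolerance.
agrees = bool(np.allclose(diff, 0.0, atol=1e-5))
```

With the full arrays available, the same call can compare the two prediction vectors directly, as in np.allclose(a, b, atol=1e-5).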