feyn.reference · Feyn Documentation

This module contains reference models that can be used for comparison with feyn models.

class BaseReferenceModel

def __init__(
    /,
    *args,
    **kwargs
) -> BaseReferenceModel

Base class for reference models

method BaseReferenceModel.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.
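
To illustrate what the returned per-sample losses look like, here is a minimal pure-Python sketch. The helper absolute_error_sketch is a hypothetical stand-in written for this example, not feyn's implementation; it assumes the loss is the element-wise |y_true - y_pred|.

```python
def absolute_error_sketch(y_true, y_pred):
    """Per-sample absolute error (hypothetical helper, not part of feyn)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


losses = absolute_error_sketch([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print(losses)  # [0.5, 0.0, 2.0]
```

Note that the result has one loss per sample; it is not aggregated into a single number.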

method BaseReferenceModel.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
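
The formula above can be sketched in a few lines of pure Python. The function accuracy_sketch is hypothetical, and the example assumes that predicted probabilities are turned into class labels with a cut-off of 0.5 (scores at or above the cut-off count as positive); feyn's exact rounding rule may differ.

```python
def accuracy_sketch(y_true, y_prob, threshold=0.5):
    """Fraction of correct predictions (hypothetical helper, not part of feyn)."""
    correct = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold  # assumption: >= marks the positive class
        if predicted == bool(truth):
            correct += 1
    return correct / len(y_true)


acc = accuracy_sketch([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7])
print(acc)  # 0.75 (three of four predictions are correct)
```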

method BaseReferenceModel.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions using the optimal threshold.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on data with a different class composition than the data it is evaluated on.

This function first computes the threshold separating the true and false classes that maximises the accuracy. It then returns this threshold along with the accuracy obtained using it.

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
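
One simple way to find such a threshold is to try every observed score as a cut-off and keep the best one. The sketch below does exactly that; best_accuracy_threshold is a hypothetical helper, and the search strategy is an assumption for illustration, not necessarily the one feyn uses.

```python
def best_accuracy_threshold(y_true, y_prob):
    """Return (threshold, accuracy) maximising accuracy (hypothetical helper)."""
    best_threshold, best_accuracy = 0.5, -1.0
    # Assumption: only the observed scores are tried as candidate thresholds.
    for threshold in sorted(set(y_prob)):
        correct = sum(
            (prob >= threshold) == bool(truth)
            for truth, prob in zip(y_true, y_prob)
        )
        accuracy = correct / len(y_true)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.8]
result = best_accuracy_threshold(y_true, y_prob)
print(result)  # (0.3, 1.0)
```

With the default cut-off of 0.5 the same data would give an accuracy of only 0.6, because two positive samples score below 0.5.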

method BaseReferenceModel.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.
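
For reference, the standard per-sample binary cross entropy is -(y*log(p) + (1-y)*log(1-p)). The sketch below implements that textbook formula; the helper name and the clipping constant eps are assumptions made for this example and are not taken from feyn.

```python
import math


def binary_cross_entropy_sketch(y_true, y_prob, eps=1e-12):
    """Per-sample binary cross entropy (hypothetical helper, not part of feyn)."""
    losses = []
    for truth, prob in zip(y_true, y_prob):
        # Clip to avoid log(0); the value of eps is an assumption.
        prob = min(max(prob, eps), 1.0 - eps)
        losses.append(-(truth * math.log(prob) + (1 - truth) * math.log(1.0 - prob)))
    return losses


bce = binary_cross_entropy_sketch([1, 0, 1], [0.5, 0.5, 0.9])
print(bce)
```

A prediction of 0.5 costs log(2) regardless of the true class, while a confident correct prediction such as 0.9 for a positive sample costs much less.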

method BaseReferenceModel.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method BaseReferenceModel.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
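
The two aggregate errors, MAE and MSE, differ only in how each residual is penalised before averaging. The following sketch uses hypothetical helper names and plain Python lists to show the arithmetic; it is not feyn's code.

```python
def mae_sketch(y_true, y_pred):
    """Mean absolute error (hypothetical helper)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_sketch(y_true, y_pred):
    """Mean squared error (hypothetical helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 3.0, 2.0]
mae_value = mae_sketch(y_true, y_pred)  # (0 + 1 + 2) / 3
mse_value = mse_sketch(y_true, y_pred)  # (0 + 1 + 4) / 3
print(mae_value, mse_value)
```

Because the errors are squared, MSE weights the single large miss (2.0) more heavily than MAE does.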

method BaseReferenceModel.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
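
The numbers shown in a confusion matrix can be computed by hand as below. This is a sketch under stated assumptions: confusion_counts is a hypothetical helper, and scores at or above the threshold are treated as positive predictions, which may differ from feyn's boundary handling.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives (hypothetical helper)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold  # assumption: >= marks the positive class
        if predicted and truth:
            counts["tp"] += 1
        elif predicted and not truth:
            counts["fp"] += 1
        elif not predicted and truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


counts = confusion_counts([1, 0, 1, 0, 1], [0.9, 0.6, 0.4, 0.1, 0.8])
print(counts)  # {'tp': 2, 'fp': 1, 'tn': 1, 'fn': 1}
```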

method BaseReferenceModel.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
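
The point marked by the threshold argument corresponds to one (precision, recall) pair. The sketch below shows how such a pair can be computed; precision_recall_at is a hypothetical helper, and returning 0.0 when a denominator is zero is a convention chosen for this example only.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall at one threshold (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and not t)
    fn = sum(1 for t, p in zip(y_true, y_prob) if p < threshold and t)
    # Assumption: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2]
low = precision_recall_at(y_true, y_prob, 0.25)
high = precision_recall_at(y_true, y_prob, 0.65)
print(low)   # (0.75, 1.0)
print(high)  # precision 1.0, recall 2/3
```

Lowering the threshold raises recall at the cost of precision, which is the trade-off the curve visualises.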

method BaseReferenceModel.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError -- if y_true is not bool-like (boolean or 0/1).
    ValueError -- if y_pred is not bool-like (boolean or 0/1).
    ValueError -- if y_pred and y_true are not the same size.
    ValueError -- if fewer than two labels are supplied for the legend.

method BaseReferenceModel.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Actuals vs Prediction"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.
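
The line of best fit mentioned above can be obtained with ordinary least squares. The sketch below is an assumption about the kind of fit involved (a straight line y_pred = slope * y_true + intercept) and uses a hypothetical helper name; it is not taken from feyn's plotting code.

```python
def best_fit_line(y_true, y_pred):
    """Least-squares slope and intercept of y_pred against y_true (hypothetical helper)."""
    n = len(y_true)
    mean_x = sum(y_true) / n
    mean_y = sum(y_pred) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_true, y_pred))
    sxx = sum((x - mean_x) ** 2 for x in y_true)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = best_fit_line([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(slope, intercept)  # 1.0 0.5
```

Here the slope matches the line of equality, but the intercept of 0.5 reveals a constant over-prediction.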

method BaseReferenceModel.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Residuals plot"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method BaseReferenceModel.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
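
Each point on a ROC curve is a (false positive rate, true positive rate) pair for one threshold. A minimal sketch of that computation follows; roc_point is a hypothetical helper and assumes the data contains at least one positive and one negative sample.

```python
def roc_point(y_true, y_prob, threshold):
    """Return (false positive rate, true positive rate) at one threshold (hypothetical helper)."""
    positives = sum(1 for t in y_true if t)
    negatives = len(y_true) - positives
    tp = sum(1 for t, p in zip(y_true, y_prob) if t and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if not t and p >= threshold)
    return fp / negatives, tp / positives


point = roc_point([1, 1, 0, 0], [0.8, 0.4, 0.6, 0.1], threshold=0.5)
print(point)  # (0.5, 0.5)
```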

method BaseReferenceModel.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment, default 'squared_error'
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError -- if by is not in data.
    ValueError -- if columns needed for the model are not present in the data.
    ValueError -- if fewer than two labels are supplied for the legend.
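
The quantity behind this plot is the mean loss within each segment. The sketch below computes it for a categorical column using squared error; the helper name and the sample values are made up for illustration, and the automatic binning of numerical columns is not shown.

```python
def mean_loss_by_segment(segments, y_true, y_pred):
    """Mean squared error per segment value (hypothetical helper, not part of feyn)."""
    totals, sizes = {}, {}
    for segment, truth, pred in zip(segments, y_true, y_pred):
        totals[segment] = totals.get(segment, 0.0) + (truth - pred) ** 2
        sizes[segment] = sizes.get(segment, 0) + 1
    return {segment: totals[segment] / sizes[segment] for segment in totals}


smoker = ["yes", "no", "yes", "no"]
heartrate_true = [80.0, 60.0, 90.0, 70.0]
heartrate_pred = [70.0, 62.0, 85.0, 70.0]
segment_loss = mean_loss_by_segment(smoker, heartrate_true, heartrate_pred)
print(segment_loss)  # {'yes': 62.5, 'no': 2.0}
```

In this made-up data the model is far less accurate for smokers, which is the kind of difference the plot is meant to expose.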

method BaseReferenceModel.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method BaseReferenceModel.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as
1 - rss/tss

where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
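
The definition 1 - rss/tss and its three characteristic cases (0, 1, and below 0) can be checked with a short sketch. r2_sketch is a hypothetical helper written for this example, not feyn's implementation.

```python
def r2_sketch(y_true, y_pred):
    """r2 = 1 - rss/tss (hypothetical helper)."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]
mean_model = r2_sketch(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
perfect = r2_sketch(y_true, y_true)
reversed_model = r2_sketch(y_true, [4.0, 3.0, 2.0, 1.0])
print(mean_model, perfect, reversed_model)  # 0.0 1.0 -3.0
```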

method BaseReferenceModel.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
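
RMSE is the square root of the mean squared error, which puts the error back in the units of the target. A minimal sketch with a hypothetical helper name:

```python
import math


def rmse_sketch(y_true, y_pred):
    """Root mean squared error (hypothetical helper, not part of feyn)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


rmse_value = rmse_sketch([2.0, 4.0], [5.0, 8.0])  # residuals are -3 and -4
print(rmse_value)  # sqrt((9 + 16) / 2) = sqrt(12.5)
```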

method BaseReferenceModel.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve shows how the true positive rate and the false positive rate of a binary classifier change as the decision threshold varies.

The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
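
The probabilistic reading of the AUC given above can be computed directly by comparing every positive sample with every negative one. The sketch below does this in pure Python; auc_by_pairs is a hypothetical helper, and counting ties as half a win is a common convention assumed here. It is quadratic in the number of samples, so it is meant for illustration only.

```python
def auc_by_pairs(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (hypothetical helper)."""
    positives = [p for t, p in zip(y_true, y_prob) if t]
    negatives = [p for t, p in zip(y_true, y_prob) if not t]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # assumption: ties count as half
    return wins / (len(positives) * len(negatives))


auc = auc_by_pairs([1, 1, 0, 0], [0.8, 0.4, 0.6, 0.1])
print(auc)  # 0.75 (three of the four pairs are ranked correctly)
```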

method BaseReferenceModel.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

class ConstantModel

def __init__(
    output_name: str,
    const: float
) -> ConstantModel

Reference model that always predicts the constant value const for the output output_name.
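
To show the idea of a constant baseline without requiring feyn, here is a small stand-in class. ConstantBaseline is hypothetical and only mirrors the constructor arguments listed above; it assumes the model predicts const for every row and that the data is a dict mapping column names to lists.

```python
class ConstantBaseline:
    """Hypothetical stand-in for a constant reference model (not feyn's class)."""

    def __init__(self, output_name, const):
        self.output_name = output_name
        self.const = const

    def predict(self, data):
        # Assumption: one prediction per row, all equal to const.
        n_rows = len(data[self.output_name])
        return [self.const] * n_rows

    def mse(self, data):
        y_true = data[self.output_name]
        y_pred = self.predict(data)
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


data = {"x": [1, 2, 3], "y": [2.0, 4.0, 6.0]}
baseline = ConstantBaseline("y", const=4.0)  # 4.0 is the mean of y
predictions = baseline.predict(data)
baseline_mse = baseline.mse(data)
print(predictions, baseline_mse)
```

Any candidate model should reach a lower error than such a baseline to be considered useful.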

method ConstantModel.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions using the optimal threshold.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on data with a different class composition than the data it is evaluated on.

This function first computes the threshold separating the true and false classes that maximises the accuracy. It then returns this threshold along with the accuracy obtained using it.

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method ConstantModel.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method ConstantModel.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method ConstantModel.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError -- if y_true is not bool-like (boolean or 0/1).
    ValueError -- if y_pred is not bool-like (boolean or 0/1).
    ValueError -- if y_pred and y_true are not the same size.
    ValueError -- if fewer than two labels are supplied for the legend.

method ConstantModel.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Actuals vs Prediction"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Residuals plot"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method ConstantModel.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment, default 'squared_error'
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError -- if by is not in data.
    ValueError -- if columns needed for the model are not present in the data.
    ValueError -- if fewer than two labels are supplied for the legend.

method ConstantModel.predict

def predict(
    self,
    data: Iterable
)

Get predictions for a given dataset.

Arguments:
    data {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method ConstantModel.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as
1 - rss/tss

where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method ConstantModel.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve shows how the true positive rate and the false positive rate of a binary classifier change as the decision threshold varies.

The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method ConstantModel.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.
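
The method returns one loss per sample rather than a single number. The sketch below shows that shape in plain Python; it is an illustration under that assumption and not feyn's code, and the helper name squared_error_by_hand is hypothetical.

```python
def squared_error_by_hand(y_true, y_pred):
    """Return the squared error of every sample as a list of floats."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


losses = squared_error_by_hand([1.0, 2.0, 3.0], [1.5, 2.0, 1.0])
# Averaging the per-sample losses gives the mean squared error.
mean_loss = sum(losses) / len(losses)
```

Keeping the per-sample values makes it possible to inspect which samples contribute most to the mean.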

class GradientBoostingClassifier

def __init__(
    data: pandas.core.frame.DataFrame,
    output_name: str,
    **kwargs
) -> GradientBoostingClassifier

Base class for reference models

method GradientBoostingClassifier.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful for evaluating classification models. It is the fraction of the predictions that are correct. Formally, it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
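
The fraction above can be written out directly. This is a minimal sketch in plain Python and not feyn's implementation. It assumes that predicted probabilities are turned into classes with a threshold of 0.5 and that a score equal to the threshold counts as positive; the helper name accuracy_by_hand is hypothetical.

```python
def accuracy_by_hand(y_true, y_prob, threshold=0.5):
    """(number of correct predictions) / (total number of predictions)."""
    predicted = [p >= threshold for p in y_prob]
    correct = sum(1 for t, c in zip(y_true, predicted) if bool(t) == c)
    return correct / len(y_true)


# Three of the five samples end up on the correct side of 0.5.
acc = accuracy_by_hand([1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.7])
```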

method GradientBoostingClassifier.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions with the optimal threshold.

The accuracy score is useful for evaluating classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on a population whose composition differs from that of the data it is evaluated on.

This function first computes the threshold separating the true class from the false class that maximizes the accuracy. It then returns this threshold along with the accuracy that is obtained using it.

Arguments:
    data {DataFrame} -- Dataset on which to evaluate the accuracy and the accuracy threshold.

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
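
One simple way to find such a threshold is to try every distinct predicted score as a candidate. The sketch below does this in plain Python. It is an assumption about how the search could work, not a description of feyn's algorithm, and the helper names accuracy_at and best_threshold are hypothetical.

```python
def accuracy_at(y_true, y_prob, threshold):
    """Fraction of samples classified correctly at the given threshold."""
    hits = sum(1 for t, p in zip(y_true, y_prob) if bool(t) == (p >= threshold))
    return hits / len(y_true)


def best_threshold(y_true, y_prob):
    """Try every distinct score as a threshold and keep the most accurate one."""
    candidates = sorted(set(y_prob))
    # max() keeps the first maximal candidate, so ties go to the lowest threshold.
    best = max(candidates, key=lambda c: accuracy_at(y_true, y_prob, c))
    return best, accuracy_at(y_true, y_prob, best)


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.9]
threshold, accuracy = best_threshold(y_true, y_prob)
default_accuracy = accuracy_at(y_true, y_prob, 0.5)
```

In this toy data set the scores of the positive class are low, so a threshold well below 0.5 separates the classes better than the default.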

method GradientBoostingClassifier.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.
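
For reference, the usual per-sample definition of binary cross entropy is sketched below in plain Python. This is not feyn's code. The clipping constant eps is an assumption added to avoid taking the logarithm of zero, and the helper name bce_by_hand is hypothetical.

```python
import math


def bce_by_hand(y_true, y_prob, eps=1e-12):
    """Return -(t*log(p) + (1-t)*log(1-p)) for every sample."""
    losses = []
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return losses


# The same confident score of 0.9 is cheap when the label is 1
# and expensive when the label is 0.
losses = bce_by_hand([1, 0], [0.9, 0.9])
```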

method GradientBoostingClassifier.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
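
The numbers shown in a confusion matrix are counts of (actual class, predicted class) pairs. The sketch below computes them in plain Python. It illustrates the concept rather than feyn's plotting code, and it assumes that a score equal to the threshold counts as a positive prediction; the helper name confusion_counts is hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count samples per (actual, predicted) pair of booleans."""
    counts = Counter()
    for t, p in zip(y_true, y_prob):
        counts[(bool(t), p >= threshold)] += 1
    return counts


counts = confusion_counts([1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.7])
true_positives = counts[(True, True)]
false_positives = counts[(False, True)]
```

Raising or lowering the threshold moves samples between the predicted columns without changing the row totals.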

method GradientBoostingClassifier.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
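
The point that the threshold argument marks on the curve is a single (recall, precision) pair. A minimal sketch of how such a pair can be computed is given below in plain Python. It is not feyn's implementation; the helper name precision_recall_at and the values returned for empty denominators are assumptions.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if not t and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t and p < threshold)
    # Conventions for empty denominators are assumptions of this sketch.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = precision_recall_at(
    [1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.7], 0.5
)
```

Sweeping the threshold from high to low and collecting these pairs traces out the whole curve.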

method GradientBoostingClassifier.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- location of the legend, in matplotlib style. If None, the legend is hidden (default: {'upper center'})
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError: if y_true is not bool-like (boolean or 0/1).
    ValueError: if y_pred is not bool-like (boolean or 0/1).
    ValueError: if y_pred and y_true are not same size.
    ValueError: If fewer than two labels are supplied for the legend.
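
The data behind such a plot is a pair of histograms, one per class. The sketch below bins predicted probabilities in plain Python. It assumes equal-width bins over the range 0 to 1, which may differ from what feyn does, and the helper name bin_scores is hypothetical.

```python
def bin_scores(y_true, y_prob, nbins=10):
    """Return per-bin counts for the positive and the negative class."""
    pos = [0] * nbins
    neg = [0] * nbins
    for t, p in zip(y_true, y_prob):
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(p * nbins), nbins - 1)
        (pos if t else neg)[index] += 1
    return pos, neg


pos, neg = bin_scores([1, 0, 1, 0, 1], [0.95, 0.15, 0.45, 0.55, 1.0])
```

A well-separated classifier puts most of the positive counts in the high bins and most of the negative counts in the low bins.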

method GradientBoostingClassifier.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Actuals vs Prediction"})
    ax {AxesSubplot} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Residuals plot"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
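
The point marked by the threshold argument is one (false positive rate, true positive rate) pair. The sketch below computes such a pair in plain Python. It is an illustration and not feyn's code; the helper name roc_point is hypothetical, and the sketch assumes that both classes occur in the data.

```python
def roc_point(y_true, y_prob, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if t and predicted:
            tp += 1
        elif t:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    # Assumes both classes occur in y_true, so neither denominator is zero.
    return fp / (fp + tn), tp / (tp + fn)


fpr, tpr = roc_point([1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.7], 0.5)
```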

method GradientBoostingClassifier.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment.
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- location of the legend, in matplotlib style. If None, the legend is hidden (default: {'lower right'})
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError: if by is not in data.
    ValueError: If columns needed for the model are not present in the data.
    ValueError: If fewer than two labels are supplied for the legend.

method GradientBoostingClassifier.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method GradientBoostingClassifier.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as
1 - rss/tss

where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than those of such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true values.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method GradientBoostingClassifier.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the discriminative ability of a binary classifier as its decision threshold is varied.

The area under the curve (AUC) is the probability that the classifier will
assign a higher score to a randomly chosen positive instance than to a
randomly chosen negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
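
The probabilistic reading of the AUC can be verified on small data sets by comparing every positive sample with every negative one. The sketch below does this in plain Python. It takes quadratic time and is meant as an illustration, not as feyn's implementation; the helper name auc_by_pairs is hypothetical, and counting ties as one half is the usual convention rather than something stated in this reference.

```python
def auc_by_pairs(y_true, scores):
    """Fraction of (positive, negative) pairs that are ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t]
    negatives = [s for t, s in zip(y_true, scores) if not t]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
auc = auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```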

method GradientBoostingClassifier.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

class LinearRegression

def __init__(
    data: pandas.core.frame.DataFrame,
    output_name: str,
    **kwargs
) -> LinearRegression

Base class for reference models

method LinearRegression.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method LinearRegression.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful for evaluating classification models. It is the fraction of the predictions that are correct. Formally, it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method LinearRegression.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions with the optimal threshold.

The accuracy score is useful for evaluating classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on a population whose composition differs from that of the data it is evaluated on.

This function first computes the threshold separating the true class from the false class that maximizes the accuracy. It then returns this threshold along with the accuracy that is obtained using it.

Arguments:
    data {DataFrame} -- Dataset on which to evaluate the accuracy and the accuracy threshold.

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method LinearRegression.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method LinearRegression.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
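
For a regression model, the mean absolute error is the average size of the prediction errors, ignoring their sign. A minimal sketch in plain Python follows. It is not feyn's implementation; the helper name mae_by_hand and the sample values are hypothetical.

```python
def mae_by_hand(y_true, y_pred):
    """Mean of the absolute differences between truth and prediction."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


# Absolute errors are 1, 0 and 2, so the mean is 1.
mae = mae_by_hand([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Unlike the squared error, the absolute error does not give extra weight to large deviations.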

method LinearRegression.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method LinearRegression.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method LinearRegression.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method LinearRegression.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- location of the legend, in matplotlib style. If None, the legend is hidden (default: {'upper center'})
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError: if y_true is not bool-like (boolean or 0/1).
    ValueError: if y_pred is not bool-like (boolean or 0/1).
    ValueError: if y_pred and y_true are not same size.
    ValueError: If fewer than two labels are supplied for the legend.

method LinearRegression.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Actuals vs Prediction"})
    ax {AxesSubplot} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.
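
The line of best fit mentioned above can be obtained with ordinary least squares. The sketch below does this in plain Python, with y_true on the x-axis and y_pred on the y-axis. Whether feyn uses exactly this method is an assumption; the helper name best_fit_line and the sample values are hypothetical.

```python
def best_fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]  # every prediction is 0.5 too high
slope, intercept = best_fit_line(y_true, y_pred)
```

A slope near 1 together with an intercept near 0 means the fitted line is close to y=x. In this toy example the slope is 1 but the intercept reveals a constant offset.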

method LinearRegression.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Residuals plot"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.
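
To make the sign convention concrete, the sketch below builds the (predicted value, residual) points that such a plot would show. It uses plain Python and hypothetical sample values, and it is not taken from feyn's plotting code.

```python
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Residual = y_true - y_pred: positive when the model predicts too low,
# negative when it predicts too high.
residuals = [t - p for t, p in zip(y_true, y_pred)]

# One point per sample: predicted value on the x-axis, residual on the y-axis.
points = list(zip(y_pred, residuals))
```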

method LinearRegression.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method LinearRegression.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment.
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- location of the legend, in matplotlib style. If None, the legend is hidden (default: {'lower right'})
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError: if by is not in data.
    ValueError: If columns needed for the model are not present in the data.
    ValueError: If fewer than two labels are supplied for the legend.
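
The two quantities named in the default legend, the number of samples in each bin and the mean loss for each bin, can be sketched for a categorical column as follows. This is plain Python using the squared error, not feyn's implementation; the helper name mean_loss_by_segment and the sample values are hypothetical, and the automatic binning of numerical columns is not covered.

```python
from collections import defaultdict


def mean_loss_by_segment(segments, y_true, y_pred):
    """Map each segment value to (sample count, mean squared error)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s, t, p in zip(segments, y_true, y_pred):
        totals[s] += (t - p) ** 2
        counts[s] += 1
    return {s: (counts[s], totals[s] / counts[s]) for s in totals}


result = mean_loss_by_segment(
    ["yes", "no", "yes", "no"],   # values of the column passed as `by`
    [80.0, 60.0, 90.0, 65.0],     # true output values
    [82.0, 60.0, 86.0, 66.0],     # model predictions
)
```

In this toy example the mean loss is much higher for the "yes" segment, which is the kind of difference the plot is designed to reveal.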

method LinearRegression.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method LinearRegression.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as
1 - rss/tss

where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than those of such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true values.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
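
The definition 1 - rss/tss can be checked by hand with a few lines of plain Python. This is a sketch of the formula only, under the assumption that the inputs are equal-length sequences of numbers; the standalone function `r2_score` below is illustrative and is not the method documented here.

```python
def r2_score(y_true, y_pred):
    """Return 1 - rss/tss for two equal-length sequences of numbers."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score(y_true, y_true)              # predictions equal the truth
mean_model = r2_score(y_true, [2.5] * 4)        # always predict the mean
worse = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean model

print(perfect, mean_model, worse)
```

The three cases correspond to the three regimes described above: a score of 1, a score of 0, and a negative score.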

method LinearRegression.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
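
As a point of reference, the root mean squared error is the square root of the mean of the squared differences between true and predicted values. The sketch below assumes plain Python sequences; the function name and sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared difference between truth and prediction."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Every prediction is off by exactly 2, so the RMSE is 2.
value = rmse([1.0, 2.0, 3.0, 4.0], [3.0, 0.0, 5.0, 2.0])
print(value)
```

Because the errors are squared before averaging, the RMSE is expressed in the same unit as the output but weighs large errors more heavily than small ones.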

method LinearRegression.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the performance of a binary classifier as its decision threshold varies.

The area under the curve (AUC) is the probability that said classifier will
assign a higher score to a random positive instance than to a random
negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
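
The probabilistic reading of the AUC can be made concrete by counting pairs. The following sketch compares every positive sample with every negative sample and counts how often the positive one scores higher. Counting ties as half a win is a common convention and an assumption here; the function name and data are illustrative, and this brute-force approach is not necessarily how the library computes the score.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win (assumed convention)
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

Here three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.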

method LinearRegression.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.
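
The shape of the result, one loss value per row, can be illustrated without the library. In this sketch the data set is a dict mapping column names to lists, the output column is assumed to be called `"y"`, and `predict` is a stand-in for a fitted model; all of these names are hypothetical.

```python
def squared_error(y_true, y_pred):
    """Per-sample squared error: one loss value per row."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


# Hypothetical data set: input column "x" and output column "y".
data = {"x": [1.0, 2.0, 3.0], "y": [3.0, 5.0, 7.0]}


def predict(data):
    # Stand-in for model.predict: a fixed linear rule, purely illustrative.
    return [2.0 * x + 0.5 for x in data["x"]]


losses = squared_error(data["y"], predict(data))
print(losses)
```

Averaging these per-sample values gives the mean squared error.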

class LogisticRegressionClassifier

def __init__(
    data: pandas.core.frame.DataFrame,
    output_name: str,
    **kwargs
) -> LogisticRegressionClassifier

Base class for reference models

method LogisticRegressionClassifier.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method LogisticRegressionClassifier.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
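
The formula can be written out directly. The sketch below assumes that the model outputs probabilities which are turned into class predictions with a threshold of 0.5, and that a probability equal to the threshold counts as positive; both the helper name and that tie-breaking choice are assumptions.

```python
def accuracy_score(y_true, y_prob, threshold=0.5):
    """(number of correct predictions) / (total number of predictions)."""
    predicted = [p >= threshold for p in y_prob]
    correct = sum(1 for t, c in zip(y_true, predicted) if bool(t) == c)
    return correct / len(y_true)


y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.7]

accuracy = accuracy_score(y_true, y_prob)
print(accuracy)
```

Three of the five samples are classified correctly, giving an accuracy of 0.6.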

method LogisticRegressionClassifier.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions with the optimal threshold.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption does not hold when the model was trained on a population with a different composition than the one it is applied to.

This function first computes the threshold separating the true and false classes that maximizes the accuracy. It then returns this threshold along with the accuracy that is obtained using it.

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold on.

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
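
One simple way to find such a threshold is to try every distinct predicted probability as a candidate and keep the one with the highest accuracy. The sketch below takes that brute-force approach. It is an illustration under stated assumptions (helper names, sample data, and the `>=` comparison are hypothetical) and may differ from the search the library performs.

```python
def accuracy_at(y_true, y_prob, threshold):
    """Accuracy when probabilities >= threshold are classified as true."""
    correct = sum(1 for t, p in zip(y_true, y_prob) if bool(t) == (p >= threshold))
    return correct / len(y_true)


def accuracy_threshold(y_true, y_prob):
    """Return (best threshold, accuracy at that threshold)."""
    candidates = sorted(set(y_prob))
    best = max(candidates, key=lambda th: accuracy_at(y_true, y_prob, th))
    return best, accuracy_at(y_true, y_prob, best)


# Scores are systematically low, as can happen when the training
# population had a different class balance than this data set.
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45]

default_accuracy = accuracy_at(y_true, y_prob, 0.5)
threshold, accuracy = accuracy_threshold(y_true, y_prob)
print(default_accuracy, threshold, accuracy)
```

With the default threshold of 0.5 every sample is classified as false and the accuracy is 0.5, while a threshold of 0.20 separates the two classes perfectly.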

method LogisticRegressionClassifier.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.
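
For reference, the binary cross entropy of a single sample with true label t and predicted probability p is -(t*log(p) + (1-t)*log(1-p)). The sketch below computes it per sample in plain Python. Clipping the probabilities with `eps` to avoid log(0) is an assumption of this sketch, not a documented behavior of the library.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Per-sample binary cross entropy for labels in {0, 1}."""
    losses = []
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 (assumed safeguard)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return losses


bce = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.5])
print(bce)
```

Confident and correct predictions (the first two samples) give a small loss, while an uninformative prediction of 0.5 gives log(2).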

method LogisticRegressionClassifier.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method LogisticRegressionClassifier.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
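
The mean absolute error and the mean squared error respond differently to outliers, which a small hand-computed comparison makes visible. The helper names and numbers below are illustrative assumptions, written against plain Python sequences.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 2.0, 8.0]  # the last prediction is an outlier

mae_value = mae(y_true, y_pred)
mse_value = mse(y_true, y_pred)
print(mae_value, mse_value)
```

The single error of 4 contributes 4 to the sum of absolute errors but 16 to the sum of squared errors, so it dominates the MSE far more than the MAE.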

method LogisticRegressionClassifier.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
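
The four cells of a binary confusion matrix follow from comparing thresholded predictions with the true labels. The sketch below counts them in plain Python. The helper name, the dict layout, and the choice that a probability equal to the threshold counts as True are assumptions made for illustration.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and t:
            counts["tp"] += 1
        elif predicted and not t:
            counts["fp"] += 1
        elif not predicted and t:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


counts = confusion_counts([1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.7])
print(counts)
```

Raising or lowering `threshold` moves samples between the predicted-True and predicted-False cells, which is why the parameter changes the plotted matrix.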

method LogisticRegressionClassifier.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when  is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
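
Each point on a precision-recall curve corresponds to one threshold. The sketch below computes the precision and recall at a given threshold. The helper name and data are hypothetical, and returning a precision of 1.0 when nothing is predicted positive is a convention assumed here.

```python
def pr_point(y_true, y_prob, threshold):
    """Return (precision, recall) when probabilities >= threshold are positive."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if not t and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 1.0  # assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]

low = pr_point(y_true, y_prob, 0.25)   # permissive threshold
high = pr_point(y_true, y_prob, 0.7)   # strict threshold
print(low, high)
```

The permissive threshold finds every positive at the cost of one false alarm, while the strict threshold makes no false alarms but misses one positive; the curve traces this trade-off.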

method LogisticRegressionClassifier.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError -- if y_true is not bool-like (boolean or 0/1).
    ValueError -- if y_pred is not bool-like (boolean or 0/1).
    ValueError -- if y_pred and y_true are not same size.
    ValueError -- if fewer than two labels are supplied for the legend.
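
The data behind such a histogram is a pair of bin counts, one for each class. The sketch below bins predicted probabilities into `nbins` equal-width bins on [0, 1], separately for the positive and the negative class. The helper name and sample values are assumptions, and the library's exact binning may differ.

```python
def probability_histogram(y_true, y_prob, nbins=10):
    """Return (positive counts, negative counts) per equal-width bin on [0, 1]."""
    pos = [0] * nbins
    neg = [0] * nbins
    for t, p in zip(y_true, y_prob):
        index = min(int(p * nbins), nbins - 1)  # p == 1.0 falls in the last bin
        if t:
            pos[index] += 1
        else:
            neg[index] += 1
    return pos, neg


pos, neg = probability_histogram([1, 1, 0, 0, 1], [0.95, 0.7, 0.1, 0.3, 0.45], nbins=5)
print(pos, neg)
```

For a well-separated classifier the negative counts pile up in the low bins and the positive counts in the high bins, as in this small example.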

method LogisticRegressionClassifier.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
The line of equality y=x is drawn on top of the plot.
The closer the scattered values are to this line, the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Actuals vs Prediction"})
    ax {AxesSubplot} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method LogisticRegressionClassifier.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Residuals plot"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.
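
The points in a residuals plot pair each prediction with its residual, y_true - y_pred. A minimal sketch, with a hypothetical helper name and sample values:

```python
def residual_points(y_true, y_pred):
    """Return (prediction, residual) pairs, with residual = y_true - y_pred."""
    return [(p, t - p) for t, p in zip(y_true, y_pred)]


points = residual_points([3.0, 5.0, 7.0], [2.5, 5.5, 7.0])
print(points)
```

Residuals scattered evenly around zero suggest an unbiased model, whereas a visible trend against the predictions points to systematic error.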

method LogisticRegressionClassifier.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when  is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
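
The point that `threshold` marks on the ROC curve is the pair (false positive rate, true positive rate) at that threshold. The sketch below computes it in plain Python. The helper name, the sample data, and treating a probability equal to the threshold as positive are assumptions.

```python
def roc_point(y_true, y_prob, threshold):
    """Return (false positive rate, true positive rate) at the given threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if not t and p >= threshold)
    positives = sum(1 for t in y_true if t)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]

# Sweeping the threshold from strict to permissive traces the curve.
curve = [roc_point(y_true, y_prob, th) for th in (0.7, 0.5, 0.25)]
print(curve)
```

As the threshold is lowered, both rates can only increase, so the curve runs from the lower left towards the upper right.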

method LogisticRegressionClassifier.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment.
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError -- if by is not in data.
    ValueError -- if columns needed for the model are not present in the data.
    ValueError -- if fewer than two labels are supplied for the legend.

method LogisticRegressionClassifier.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method LogisticRegressionClassifier.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as
1 - rss/tss

where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method LogisticRegressionClassifier.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method LogisticRegressionClassifier.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the performance of a binary classifier as its decision threshold varies.

The area under the curve (AUC) is the probability that said classifier will
assign a higher score to a random positive instance than to a random
negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method LogisticRegressionClassifier.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method LogisticRegressionClassifier.summary

def summary(
    self,
    ax=None
)

class RandomForestClassifier

def __init__(
    data: pandas.core.frame.DataFrame,
    output_name: str,
    **kwargs
) -> RandomForestClassifier

Base class for reference models

method RandomForestClassifier.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions with the optimal threshold.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption does not hold when the model was trained on a population with a different composition than the one it is applied to.

This function first computes the threshold separating the true and false classes that maximizes the accuracy. It then returns this threshold along with the accuracy that is obtained using it.

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold on.

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method RandomForestClassifier.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method RandomForestClassifier.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when  is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method RandomForestClassifier.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError -- if y_true is not bool-like (boolean or 0/1).
    ValueError -- if y_pred is not bool-like (boolean or 0/1).
    ValueError -- if y_pred and y_true are not same size.
    ValueError -- if fewer than two labels are supplied for the legend.

method RandomForestClassifier.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
The line of equality y=x is drawn on top of the plot.
The closer the scattered values are to this line, the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Actuals vs Prediction"})
    ax {AxesSubplot} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- Title of the plot (default: {"Residuals plot"})
    ax {Axes} -- matplotlib axes object to draw to (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when  is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method RandomForestClassifier.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment (default: 'squared_error')
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError -- if `by` is not in data.
    ValueError -- if columns needed for the model are not present in the data.
    ValueError -- if fewer than two labels are supplied for the legend.

method RandomForestClassifier.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method RandomForestClassifier.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set

The r2 score for a regression model is defined as
1 - rss/tss

Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
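
The three cases described above can be checked with a small worked example. This is a minimal sketch of the formula 1 - rss/tss on plain Python lists; `r2_from_values` is a hypothetical helper for illustration, not part of feyn.

```python
def r2_from_values(y_true, y_pred):
    """Hypothetical helper: return 1 - rss/tss for two equal-length sequences."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the model's predictions.
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared errors of always predicting the mean.
    tss = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_from_values(y_true, y_true)                   # 1.0
mean_model = r2_from_values(y_true, [2.5, 2.5, 2.5, 2.5])  # 0.0
worse = r2_from_values(y_true, [4.0, 3.0, 2.0, 1.0])       # -3.0
```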

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.
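
As a reminder of the definition, the RMSE is the square root of the mean of the squared differences between true and predicted values. A minimal sketch on plain Python lists follows; `rmse_from_values` is a hypothetical helper for illustration, not part of feyn.

```python
import math


def rmse_from_values(y_true, y_pred):
    """Hypothetical helper: root mean squared error of two sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / len(squared_errors)
    # Taking the root brings the error back to the units of the target.
    return math.sqrt(mse)


# Squared errors are 1, 0 and 4, so the result is sqrt(5/3).
rmse = rmse_from_values([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```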

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method RandomForestClassifier.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the discriminative ability of a binary classifier as its decision threshold varies.

The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
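
The probabilistic reading of the AUC can be illustrated directly by comparing every positive sample with every negative sample. This is a brute-force sketch for small inputs, assuming labels are 0/1 and that ties count as half a win (a common convention); `pairwise_auc` is a hypothetical helper, not how feyn computes the score.

```python
def pairwise_auc(y_true, scores):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: ties count as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Three of the four positive/negative pairs are ordered correctly.
auc = pairwise_auc(y_true, scores)
```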

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method RandomForestClassifier.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

class SKLearnClassifier

def __init__(
    sklearn_classifier: type,
    data: pandas.core.frame.DataFrame,
    output_name: str,
    stypes: Optional[Dict[str, str]] = None,
    **kwargs
) -> SKLearnClassifier

Base class for reference models

method SKLearnClassifier.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)
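
The definition can be sketched on plain Python lists as follows. The sketch assumes that a predicted probability at or above 0.5 counts as the positive class; the exact comparison feyn uses is not stated here, and `accuracy_from_values` is a hypothetical helper.

```python
def accuracy_from_values(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: fraction of thresholded predictions that are correct."""
    # Assumption: probabilities at or above the threshold are read as class 1.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Three of the five thresholded predictions match the true labels.
acc = accuracy_from_values([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6])
```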

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions with optimal threshold

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this is not the case when a model was trained on a population with a different composition than the one it is applied to.

This function first computes the threshold separating true from false classes that optimises the accuracy. It then returns this threshold along with the accuracy that is obtained using it.
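
One simple way to find such a threshold is to try every distinct predicted probability as a candidate and keep the one with the highest accuracy. This is a sketch of that idea only; the search strategy feyn actually uses is not documented here, and `best_accuracy_threshold` is a hypothetical helper.

```python
def best_accuracy_threshold(y_true, y_prob):
    """Hypothetical helper: return (threshold, accuracy) maximising accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted(set(y_prob)):
        # Assumption: probabilities at or above the candidate are read as class 1.
        y_pred = [1 if p >= candidate else 0 for p in y_prob]
        accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.4]
threshold, accuracy = best_accuracy_threshold(y_true, y_prob)
```

With these made-up scores a threshold of 0.5 would classify every sample as negative (accuracy 0.4), while the threshold found by the search separates the classes perfectly.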

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnClassifier.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)
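
For reference, the per-sample binary cross entropy is -(y*log(p) + (1-y)*log(1-p)). The sketch below applies that formula to plain Python lists; the clipping constant `eps` is an assumption added to avoid log(0), and `binary_cross_entropy_values` is a hypothetical helper rather than the feyn implementation.

```python
import math


def binary_cross_entropy_values(y_true, y_prob, eps=1e-12):
    """Hypothetical helper: per-sample binary cross entropy as a list of floats."""
    losses = []
    for t, p in zip(y_true, y_prob):
        # Assumption: clip so that predictions of exactly 0 or 1 stay finite.
        p = min(max(p, eps), 1 - eps)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return losses


# Confident correct predictions get a small loss; p = 0.5 costs log(2).
losses = binary_cross_entropy_values([1, 0, 1], [0.9, 0.1, 0.5])
```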

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.
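
The counts behind a binary confusion matrix can be sketched as below. The layout (rows are actual classes, columns are predicted classes) and the use of `>=` at the threshold are assumptions for illustration; `confusion_counts` is a hypothetical helper, not part of feyn.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: return [[TN, FP], [FN, TP]] for 0/1 labels."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        # Row index is the actual class, column index is the predicted class.
        matrix[t][predicted] += 1
    return matrix


matrix = confusion_counts([1, 0, 1, 1, 0, 0], [0.8, 0.3, 0.4, 0.9, 0.7, 0.1])
```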

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnClassifier.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.
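
Each point on a precision-recall curve corresponds to one threshold. The sketch below computes that point for a given threshold, assuming 0/1 labels and `>=` at the threshold; `precision_recall_at` is a hypothetical helper, and its handling of empty denominators is an arbitrary choice for illustration.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Hypothetical helper: return (precision, recall) at one threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_prob) if p < threshold and t == 1)
    # Assumption: report precision 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.8, 0.3, 0.4, 0.9, 0.7, 0.1]
precision, recall = precision_recall_at(y_true, y_prob, 0.5)
# Lowering the threshold trades precision for recall.
low_precision, low_recall = precision_recall_at(y_true, y_prob, 0.35)
```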

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnClassifier.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
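
The data behind such a plot is two histograms over the same bins, one per true class. A minimal sketch, assuming equal-width bins on [0, 1] and 0/1 labels; `class_histograms` is a hypothetical helper and the binning feyn uses may differ.

```python
def class_histograms(y_true, y_prob, nbins=10):
    """Hypothetical helper: per-class bin counts of predicted probabilities."""
    positive = [0] * nbins
    negative = [0] * nbins
    for t, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(p * nbins), nbins - 1)
        if t == 1:
            positive[index] += 1
        else:
            negative[index] += 1
    return positive, negative


positive, negative = class_histograms(
    [1, 0, 1, 1, 0, 0], [0.8, 0.3, 0.4, 0.9, 0.7, 0.1], nbins=5
)
```

A well-separated classifier concentrates the positive counts in the upper bins and the negative counts in the lower bins.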

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError -- if y_true is not bool-like (boolean or 0/1).
    ValueError -- if y_pred is not bool-like (boolean or 0/1).
    ValueError -- if y_pred and y_true are not the same size.
    ValueError -- if fewer than two labels are supplied for the legend.

method SKLearnClassifier.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- (default: {"Actuals vs Prediction"})
    ax {AxesSubplot} -- (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- (default: {"Residuals plot"})
    ax {Axes} -- (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.
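
Each point on a ROC curve is the pair (false positive rate, true positive rate) at one threshold. The sketch below computes that pair, assuming 0/1 labels, `>=` at the threshold, and that both classes are present in the data; `roc_point` is a hypothetical helper, not part of feyn.

```python
def roc_point(y_true, y_prob, threshold):
    """Hypothetical helper: return (fpr, tpr) at one threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if t == 1 and predicted_positive:
            tp += 1
        elif t == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Assumes at least one positive and one negative sample.
    return fp / (fp + tn), tp / (tp + fn)


y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.8, 0.3, 0.4, 0.9, 0.7, 0.1]
fpr, tpr = roc_point(y_true, y_prob, 0.5)
```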

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnClassifier.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment (default: 'squared_error')
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError -- if `by` is not in data.
    ValueError -- if columns needed for the model are not present in the data.
    ValueError -- if fewer than two labels are supplied for the legend.

method SKLearnClassifier.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method SKLearnClassifier.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set

The r2 score for a regression model is defined as
1 - rss/tss

Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnClassifier.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the discriminative ability of a binary classifier as its decision threshold varies.

The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnClassifier.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

class SKLearnRegressor

def __init__(
    sklearn_regressor: type,
    data: pandas.core.frame.DataFrame,
    output_name: str,
    stypes: Optional[Dict[str, str]] = None,
    **kwargs
) -> SKLearnRegressor

Base class for reference models

method SKLearnRegressor.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the accuracy score of predictions with optimal threshold

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this is not the case when a model was trained on a population with a different composition than the one it is applied to.

This function first computes the threshold separating true from false classes that optimises the accuracy. It then returns this threshold along with the accuracy that is obtained using it.

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnRegressor.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean absolute error on a data set.
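
The mean absolute error is the average of the absolute differences between true and predicted values. A minimal sketch on plain Python lists; `mae_from_values` is a hypothetical helper for illustration, not part of feyn.

```python
def mae_from_values(y_true, y_pred):
    """Hypothetical helper: mean absolute error of two sequences."""
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Absolute errors are 1, 0 and 2, so the mean is 1.0.
mae = mae_from_values([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Unlike the squared-error metrics, each sample contributes in proportion to its error, so single large misses weigh less heavily.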

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnRegressor.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnRegressor.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)

Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError -- if y_true is not bool-like (boolean or 0/1).
    ValueError -- if y_pred is not bool-like (boolean or 0/1).
    ValueError -- if y_pred and y_true are not the same size.
    ValueError -- if fewer than two labels are supplied for the legend.

method SKLearnRegressor.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.
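
How far the line of best fit is from y=x can be read from its slope and intercept. The sketch below fits the line by ordinary least squares, with y_true on the x-axis and y_pred on the y-axis; `best_fit_line` is a hypothetical helper, and the fitting method feyn uses is assumed rather than documented here.

```python
def best_fit_line(y_true, y_pred):
    """Hypothetical helper: least-squares (slope, intercept) of y_pred on y_true."""
    n = len(y_true)
    mean_x = sum(y_true) / n
    mean_y = sum(y_pred) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_true, y_pred))
    sxx = sum((x - mean_x) ** 2 for x in y_true)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 3.0]
slope, intercept = best_fit_line(y_true, y_pred)
```

A perfect model gives slope 1 and intercept 0. In this made-up data the slope is 0.5, meaning the predictions span only half the range of the true values.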

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- (default: {"Actuals vs Prediction"})
    ax {AxesSubplot} -- (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)

This plots the predicted values against the residuals (y_true - y_pred).
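
The points in such a plot pair each prediction with its residual. A minimal sketch on plain Python lists; `residual_points` is a hypothetical helper for illustration, not part of feyn.

```python
def residual_points(y_true, y_pred):
    """Hypothetical helper: (predicted, residual) pairs, residual = y_true - y_pred."""
    return [(p, t - p) for t, p in zip(y_true, y_pred)]


points = residual_points([3.0, 5.0, 7.0], [2.0, 5.5, 9.0])
```

A positive residual means the model predicted too low, a negative one too high. Residuals that drift with the predicted value, as in this made-up data, suggest a systematic bias.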

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- (default: {"Residuals plot"})
    ax {Axes} -- (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None

Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnRegressor.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None

Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
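
The numbers behind such a plot can be sketched in plain Python for a categorical `by` column. The rows below are invented for illustration, and the grouping logic is an assumption about what is displayed (a sample count and a mean loss per segment, matching the default legend), not the library's code.

```python
from collections import defaultdict

# Hypothetical rows: (smoker flag, true heart rate, predicted heart rate).
rows = [
    ("yes", 80.0, 78.0),
    ("yes", 90.0, 94.0),
    ("no", 60.0, 61.0),
    ("no", 65.0, 64.0),
    ("no", 70.0, 70.0),
]

# Collect the squared error of every sample under its segment.
losses_by_segment = defaultdict(list)
for segment, y_true, y_pred in rows:
    losses_by_segment[segment].append((y_true - y_pred) ** 2)

# Per segment: number of samples (the bar) and mean loss (the loss line).
summary = {
    segment: (len(losses), sum(losses) / len(losses))
    for segment, losses in losses_by_segment.items()
}
print(summary)
```

In this made-up data the mean squared error is 10.0 for smokers and about 0.67 for non-smokers, which is the kind of gap the plot is meant to reveal.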

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment.
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (matplotlib style) to use for the legend. If None, the legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError -- if `by` is not in data.
    ValueError -- if columns needed for the model are not present in the data.
    ValueError -- if fewer than two labels are supplied for the legend.

method SKLearnRegressor.predict

def predict(
    self,
    X: Iterable
)

Get predictions for a given dataset.

Arguments:
    X {Iterable} -- Data to predict for.

Returns:
    Iterable -- The predictions for the data.

method SKLearnRegressor.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as
1 - rss/tss

Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than those of such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
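
The three cases above can be checked with a short sketch of the formula. The helper `r2` and the sample values are illustrative assumptions, not the library's implementation.

```python
def r2(y_true, y_pred):
    """r2 score computed as 1 - rss/tss."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares of the predictions.
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: the rss of a model that always predicts the mean.
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])     # 1.0: every prediction is exact
mean_model = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # 0.0: always predicts the mean
worse = r2(y_true, [4.0, 3.0, 2.0, 1.0])       # -3.0: worse than the mean model
print(perfect, mean_model, worse)
```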

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's root mean squared error on a data set.
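
As a minimal sketch of the quantity being computed (the square root of the mean squared difference between expected and predicted values), with a hypothetical helper and made-up numbers:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Three exact predictions and one that is off by 4.
value = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0])
print(value)  # 2.0: squared errors 0, 0, 0, 16 have mean 4
```

Because the errors are squared before averaging, a single large miss dominates the result, and the score is in the same units as the output.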

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method SKLearnRegressor.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)

Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the ability of a binary classifier to separate the two classes as its decision threshold is varied.

The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
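
This probabilistic reading can be sketched directly by comparing every positive sample with every negative one. The helper `pairwise_auc` and the data are hypothetical, and counting ties as half a win is an assumed (though common) convention; this is not how the library computes the score.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # assumed convention: a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted scores.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

auc = pairwise_auc(y_true, y_score)
print(auc)  # 7 of the 9 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for illustrating the definition on small data.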

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method SKLearnRegressor.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)

Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    nd.array -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.