Gradient checks
It is best practice to do gradient checks before and after gradient-based optimization.
Find suitable tolerances to use during optimization. Importantly, check your gradients with the same settings you will use later during optimization.
At the optimum the values should be close to 0, except for parameters with active bounds.
Gradient checks can help you identify inconsistencies and errors, especially when using custom gradient calculations or objectives.
Here we show how to use the gradient check methods implemented in pyPESTO, using the finite differences (FD) method as a comparison. There is a trade-off between the quality of the approximation and numerical noise, so it is recommended to try different FD step sizes.
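To illustrate the idea independently of pyPESTO, the following sketch compares an analytic gradient with a central finite-difference approximation; the toy objective and step size are chosen purely for illustration:

```python
import numpy as np


def f(x):
    # toy objective: sum of squares
    return np.sum(x**2)


def grad_f(x):
    # analytic gradient of f
    return 2 * x


def central_fd(f, x, eps=1e-5):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g


x = np.array([1.0, -2.0, 0.5])
# for a well-behaved objective, the two gradients should agree closely
assert np.allclose(grad_f(x), central_fd(f, x), atol=1e-6)
```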
[1]:
import benchmark_models_petab as models
import numpy as np
import pandas as pd
import seaborn as sns

import pypesto.optimize as optimize
import pypesto.petab

np.random.seed(2)
Set up an example problem
[2]:
%%capture
model_name = "Boehm_JProteomeRes2014"
petab_problem = models.get_problem(model_name)
importer = pypesto.petab.PetabImporter(petab_problem)
pypesto_problem = importer.create_problem(verbose=False)
Compiling amici model to folder /home/docs/checkouts/readthedocs.org/user_builds/pypesto/checkouts/1652/doc/example/amici_models/1.0.0/Boehm_JProteomeRes2014.
[3]:
startpoints = pypesto_problem.get_startpoints(n_starts=4)
Gradient check before optimization
Perform a gradient check at the location of one of the random parameter vectors. check_grad compares the gradient obtained by the finite differences (FD) method with the objective gradient. You can modify the FD step size via the eps argument.
[4]:
pypesto_problem.objective.check_grad(
x=startpoints[0],
eps=1e-5, # default
verbosity=0,
)
[4]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err |
|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 2.220912e+09 | 2.220538e+09 | 2.220430e+09 | 2.220484e+09 | 1.089355e+05 | 4.279957e+05 | 1.927489e-04 |
| k_exp_hetero | 7.022117e+01 | -3.842651e+06 | -1.103638e+06 | -2.473145e+06 | 2.739014e+06 | 2.473215e+06 | 1.000028e+00 |
| k_exp_homo | 8.402213e+04 | -1.905737e+06 | 1.543799e+06 | -1.809692e+05 | 3.449536e+06 | 2.649914e+05 | 1.464290e+00 |
| k_imp_hetero | 2.825850e+07 | 3.301833e+07 | 2.931040e+07 | 3.116437e+07 | 3.707935e+06 | 2.905869e+06 | 9.324331e-02 |
| k_imp_homo | 5.956185e+03 | 3.761719e+05 | -1.238330e+06 | -4.310791e+05 | 1.614502e+06 | 4.370353e+05 | 1.013817e+00 |
| k_phos | -2.433153e+09 | -2.430767e+09 | -2.432919e+09 | -2.431843e+09 | 2.152539e+06 | 1.309842e+06 | 5.386211e-04 |
| sd_pSTAT5A_rel | -5.862715e+12 | -5.862577e+12 | -5.862854e+12 | -5.862715e+12 | 2.771602e+08 | 2.296471e+03 | 3.917077e-10 |
| sd_pSTAT5B_rel | -4.759122e+10 | -4.758654e+10 | -4.759590e+10 | -4.759122e+10 | 9.363916e+06 | 5.641838e+00 | 1.185479e-10 |
| sd_rSTAT5A_rel | 1.415492e+01 | 3.586157e+06 | -3.586108e+06 | 2.441406e+01 | 7.172266e+06 | 1.025914e+01 | 4.202142e-01 |
Explanation of the gradient check result columns:
- grad: Objective gradient
- fd_f: FD forward difference
- fd_b: FD backward difference
- fd_c: Approximation of the FD central difference (reusing the information from fd_f and fd_b)
- fd_err: Deviation between the forward and backward differences fd_f, fd_b
- abs_err: Absolute error between grad and the central FD gradient fd_c
- rel_err: Relative error between grad and the central FD gradient fd_c
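For a single parameter, these quantities can be reproduced with plain finite differences. This sketch uses a toy 1-D objective, not the pyPESTO objective:

```python
def f(x):
    # toy 1-D objective
    return x**3


def grad(x):
    # analytic gradient of f
    return 3 * x**2


x, eps = 1.5, 1e-5
fd_f = (f(x + eps) - f(x)) / eps   # forward difference
fd_b = (f(x) - f(x - eps)) / eps   # backward difference
fd_c = (fd_f + fd_b) / 2           # central difference, reusing fd_f and fd_b
fd_err = abs(fd_f - fd_b)          # deviation between forward and backward differences
abs_err = abs(grad(x) - fd_c)      # absolute error vs. the central FD gradient
rel_err = abs_err / abs(fd_c)      # relative error vs. the central FD gradient
```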
If the problem contains fixed parameters, use Problem.get_reduced_vector to get the reduced vector with only free (estimated) parameters. Below, we decrease the FD step size to eps = 1e-6 and observe that the errors change:
[5]:
parameter_vector = pypesto_problem.get_reduced_vector(startpoints[0])
pypesto_problem.objective.check_grad(
x=parameter_vector,
eps=1e-6,
verbosity=0,
)
[5]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err |
|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 2.220912e+09 | 2.246010e+09 | 2.218966e+09 | 2.232488e+09 | 2.704346e+07 | 1.157581e+07 | 5.185163e-03 |
| k_exp_hetero | 7.022117e+01 | 2.099487e+07 | -3.694434e+07 | -7.974731e+06 | 5.793921e+07 | 7.974802e+06 | 1.000009e+00 |
| k_exp_homo | 8.402213e+04 | 2.855469e+06 | -2.802148e+07 | -1.258301e+07 | 3.087695e+07 | 1.266703e+07 | 1.006677e+00 |
| k_imp_hetero | 2.825850e+07 | -4.105957e+06 | 1.072534e+07 | 3.309692e+06 | 1.483130e+07 | 2.494881e+07 | 7.538104e+00 |
| k_imp_homo | 5.956185e+03 | 2.987256e+07 | -4.359131e+06 | 1.275671e+07 | 3.423169e+07 | 1.275076e+07 | 9.995331e-01 |
| k_phos | -2.433153e+09 | -2.414397e+09 | -2.440591e+09 | -2.427494e+09 | 2.619385e+07 | 5.659110e+06 | 2.331256e-03 |
| sd_pSTAT5A_rel | -5.862715e+12 | -5.862666e+12 | -5.862765e+12 | -5.862715e+12 | 9.872021e+07 | 1.063561e+03 | 1.814109e-10 |
| sd_pSTAT5B_rel | -4.759122e+10 | -4.755525e+10 | -4.762719e+10 | -4.759122e+10 | 7.194189e+07 | 6.760034e+01 | 1.420437e-09 |
| sd_rSTAT5A_rel | 1.415492e+01 | 3.586108e+07 | -3.586108e+07 | 0.000000e+00 | 7.172217e+07 | 1.415492e+01 | 1.415492e+07 |
The method check_grad_multi_eps calls check_grad multiple times with different FD step sizes and reports, for each parameter, the step size that results in the smallest error. You can supply a list of FD step sizes to be tested via the multi_eps argument (or use the defaults), and select via the label argument which error measure (fd_err, abs_err, or rel_err) is minimized.
[6]:
gc = pypesto_problem.objective.check_grad_multi_eps(
x=parameter_vector,
verbosity=0,
label="rel_err", # default
)
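Conceptually, the per-parameter step-size selection works like the following simplified sketch. This is an illustration of the idea, not pyPESTO's implementation; the function and candidate step sizes are arbitrary:

```python
import numpy as np


def best_eps_for(f, grad, x, multi_eps=(1e-1, 1e-3, 1e-5, 1e-7)):
    # for each candidate step size, compute the relative error of the
    # central FD gradient vs. the analytic gradient; keep the smallest
    best = None
    for eps in multi_eps:
        fd_c = (f(x + eps) - f(x - eps)) / (2 * eps)
        rel_err = abs(grad(x) - fd_c) / max(abs(fd_c), 1e-300)
        if best is None or rel_err < best[1]:
            best = (eps, rel_err)
    return best


# exp is its own derivative, so the analytic gradient is exact
eps, err = best_eps_for(np.exp, np.exp, 1.0)
```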
Use the pandas style methods to visualise the results of the gradient check, e.g.:
[7]:
def highlight_value_above_threshold(x, threshold=1):
return ["color: darkorange" if xi > threshold else None for xi in x]
def highlight_gradient_check(gc: pd.DataFrame):
return (
gc.style.apply(
highlight_value_above_threshold,
subset=["fd_err"],
)
.background_gradient(
cmap=sns.light_palette("purple", as_cmap=True),
subset=["abs_err"],
)
.background_gradient(
cmap=sns.light_palette("red", as_cmap=True),
subset=["rel_err"],
)
.background_gradient(
cmap=sns.color_palette("viridis", as_cmap=True),
subset=["eps"],
)
)
highlight_gradient_check(gc)
[7]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 2220911980.111206 | 2218815574.707031 | 2223017339.599609 | 2220916457.153320 | 4201764.892578 | 4477.042115 | 0.000002 | 0.001000 |
| k_exp_hetero | 70.221175 | 501.232910 | 265.585938 | 383.409424 | 235.646973 | 313.188249 | 0.816638 | 0.100000 |
| k_exp_homo | 84022.128076 | 85081.540527 | 82042.426758 | 83561.983643 | 3039.113770 | 460.144434 | 0.005507 | 0.100000 |
| k_imp_hetero | 28258498.729311 | 28309819.091797 | 28160520.996094 | 28235170.043945 | 149298.095703 | 23328.685366 | 0.000826 | 0.001000 |
| k_imp_homo | 5956.185241 | 5343.374023 | 6553.217773 | 5948.295898 | 1209.843750 | 7.889343 | 0.001326 | 0.100000 |
| k_phos | -2433152761.910034 | -2435951032.958984 | -2430378865.966797 | -2433164949.462891 | 5572166.992188 | 12187.552857 | 0.000005 | 0.001000 |
| sd_pSTAT5A_rel | -5862715285801.673828 | -5862576707983.398438 | -5862853868212.890625 | -5862715288098.144531 | 277160229.492188 | 2296.470703 | 0.000000 | 0.000010 |
| sd_pSTAT5B_rel | -47591221503.147217 | -47586539550.781242 | -47595903466.796867 | -47591221508.789055 | 9363916.015625 | 5.641838 | 0.000000 | 0.000010 |
| sd_rSTAT5A_rel | 14.154924 | 35874.755859 | -35847.167969 | 13.793945 | 71721.923828 | 0.360978 | 0.026167 | 0.001000 |
There are consistently large discrepancies between forward and backward FD and a large relative error for the parameter k_exp_hetero.
Ideally, all gradients would agree, but especially at not-so-smooth points of the objective, such as (local) optima, large FD errors can occur. It is recommended to check gradients at many random points and see whether the errors are consistently large for specific parameters.
Below we perform a gradient check for another random point and observe small errors:
[8]:
parameter_vector = startpoints[1]
gc = pypesto_problem.objective.check_grad_multi_eps(
x=parameter_vector,
verbosity=0,
label="rel_err", # default
)
highlight_gradient_check(gc)
[8]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 0.000266 | 0.000299 | 0.000238 | 0.000268 | 0.000061 | 0.000002 | 0.000024 | 0.100000 |
| k_exp_hetero | 0.000174 | 0.000170 | 0.000176 | 0.000173 | 0.000007 | 0.000001 | 0.000007 | 0.100000 |
| k_exp_homo | -0.000254 | -0.000270 | -0.000237 | -0.000253 | 0.000032 | 0.000000 | 0.000001 | 0.100000 |
| k_imp_hetero | 0.125483 | 0.125345 | 0.125628 | 0.125486 | 0.000283 | 0.000003 | 0.000026 | 0.001000 |
| k_imp_homo | 0.153206 | 0.153031 | 0.153370 | 0.153201 | 0.000339 | 0.000006 | 0.000036 | 0.001000 |
| k_phos | -0.280337 | -0.280660 | -0.280019 | -0.280339 | 0.000641 | 0.000002 | 0.000008 | 0.001000 |
| sd_pSTAT5A_rel | -2344.115830 | -2344.060317 | -2344.171330 | -2344.115824 | 0.111014 | 0.000007 | 0.000000 | 0.000010 |
| sd_pSTAT5B_rel | -122250603.677674 | -122247788.796946 | -122253418.645635 | -122250603.721291 | 5629.848689 | 0.043617 | 0.000000 | 0.000010 |
| sd_rSTAT5A_rel | -31780.671353 | -31779.938564 | -31781.404465 | -31780.671515 | 1.465902 | 0.000162 | 0.000000 | 0.000010 |
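Checking many random points can be automated along these lines. This is a self-contained sketch with a toy objective; with pyPESTO one would instead loop check_grad_multi_eps over the startpoints:

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    # toy objective
    return np.sum(np.sin(x))


def grad(x):
    # analytic gradient of f
    return np.cos(x)


def rel_errs(x, eps=1e-5):
    # relative error of the central FD gradient vs. the analytic gradient
    g = grad(x)
    fd_c = np.array(
        [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(len(x))]
    )
    return np.abs(g - fd_c) / np.maximum(np.abs(fd_c), 1e-300)


# median relative error per parameter over many random points;
# consistently large values would flag a suspicious parameter
errs = np.median(
    [rel_errs(rng.uniform(-1, 1, size=3)) for _ in range(20)], axis=0
)
```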
Gradient check after optimization
Next, we do optimization and perform a gradient check at a local optimum.
[9]:
%%capture
result = optimize.minimize(
problem=pypesto_problem,
optimizer=optimize.ScipyOptimizer(),
n_starts=4,
)
(Local) optima can be points where FD approximations behave poorly. At a sharp optimum, a large fd_err is expected.
At the local optimum shown below, the sd_pSTAT5B_rel forward and backward FD have opposite signs and are quite large, resulting in a substantial fd_err.
[10]:
# parameter vector at the local optimum, obtained from optimization
parameter_vector = pypesto_problem.get_reduced_vector(
result.optimize_result[0].x
)
highlight_gradient_check(
gc=pypesto_problem.objective.check_grad_multi_eps(
x=parameter_vector,
verbosity=0,
label="rel_err", # default
)
)
[10]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | -0.035754 | 1.264307 | -1.337803 | -0.036748 | 2.602110 | 0.000994 | 0.027800 | 0.001000 |
| k_exp_hetero | -0.000614 | 0.064476 | -0.044984 | 0.009746 | 0.109460 | 0.010360 | 0.094401 | 0.100000 |
| k_exp_homo | -0.000413 | -0.000358 | -0.000499 | -0.000429 | 0.000141 | 0.000016 | 0.000161 | 0.100000 |
| k_imp_hetero | -0.019552 | 0.731206 | -0.774481 | -0.021638 | 1.505687 | 0.002086 | 0.101065 | 0.001000 |
| k_imp_homo | -0.006244 | 0.153369 | -0.165213 | -0.005922 | 0.318583 | 0.000323 | 0.065528 | 0.001000 |
| k_phos | 0.012015 | 0.487501 | -0.467896 | 0.009803 | 0.955396 | 0.002213 | 0.204835 | 0.001000 |
| sd_pSTAT5A_rel | -0.008130 | -2354.837221 | 2354.820964 | -0.008129 | 4709.658185 | 0.000002 | 0.000231 | 0.000000 |
| sd_pSTAT5B_rel | -0.000448 | -0.235082 | 0.234188 | -0.000447 | 0.469269 | 0.000001 | 0.002246 | 0.000010 |
| sd_rSTAT5A_rel | -0.001094 | -2354.830201 | 2354.828013 | -0.001094 | 4709.658214 | 0.000000 | 0.000401 | 0.000000 |
How to “fix” my gradients?
- Find suitable simulation tolerances.
- Specific to the PEtab-AMICI pipeline:
  - Check the simulation logs for warnings and errors.
  - Consider switching between forward and adjoint sensitivity algorithms.