Gradient checks
It is best practice to do gradient checks before and after gradient-based optimization.
Find suitable tolerances to use during optimization. Importantly, check your gradients with the same settings you will use later during optimization.
At the optimum the values should be close to 0, except for parameters with active bounds.
Gradient checks can help you identify inconsistencies and errors, especially when using custom gradient calculations or objectives.
Here we show how to use the gradient check methods implemented in pyPESTO, using the finite differences (FD) method as a comparison. There is a trade-off between the quality of the approximation and numerical noise, so it is recommended to try different FD step sizes.
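To illustrate the idea independently of pyPESTO, the following sketch compares an analytic gradient with a central finite-difference approximation; the toy objective and step size are chosen purely for illustration:

```python
import numpy as np


def f(x):
    # toy objective: sum of squares
    return np.sum(x**2)


def grad_f(x):
    # analytic gradient of f
    return 2 * x


def central_fd(f, x, eps=1e-5):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g


x = np.array([1.0, -2.0, 0.5])
# for a well-behaved objective, the two gradients should agree closely
assert np.allclose(grad_f(x), central_fd(f, x), atol=1e-6)
```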
[1]:
import benchmark_models_petab as models
import numpy as np
import pandas as pd
import seaborn as sns

import pypesto.optimize as optimize
import pypesto.petab

np.random.seed(2)
Set up an example problem
[2]:
%%capture
model_name = "Boehm_JProteomeRes2014"
petab_problem = models.get_problem(model_name)
importer = pypesto.petab.PetabImporter(petab_problem)
pypesto_problem = importer.create_problem(verbose=False)
Compiling amici model to folder /home/docs/checkouts/readthedocs.org/user_builds/pypesto/checkouts/1652/doc/example/amici_models/1.0.0/Boehm_JProteomeRes2014.
[3]:
startpoints = pypesto_problem.get_startpoints(n_starts=4)
Gradient check before optimization
Perform a gradient check at the location of one of the random parameter vectors. check_grad compares the gradient obtained by the finite differences (FD) method with the objective gradient. You can modify the FD step size via the eps argument.
[4]:
pypesto_problem.objective.check_grad(
x=startpoints[0],
eps=1e-5, # default
verbosity=0,
)
[4]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err |
|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 2.220912e+09 | 2.220538e+09 | 2.220430e+09 | 2.220484e+09 | 1.089355e+05 | 4.279957e+05 | 1.927489e-04 |
| k_exp_hetero | 7.022117e+01 | -3.842651e+06 | -1.103638e+06 | -2.473145e+06 | 2.739014e+06 | 2.473215e+06 | 1.000028e+00 |
| k_exp_homo | 8.402213e+04 | -1.905737e+06 | 1.543799e+06 | -1.809692e+05 | 3.449536e+06 | 2.649914e+05 | 1.464290e+00 |
| k_imp_hetero | 2.825850e+07 | 3.301833e+07 | 2.931040e+07 | 3.116437e+07 | 3.707935e+06 | 2.905869e+06 | 9.324331e-02 |
| k_imp_homo | 5.956185e+03 | 3.761719e+05 | -1.238330e+06 | -4.310791e+05 | 1.614502e+06 | 4.370353e+05 | 1.013817e+00 |
| k_phos | -2.433153e+09 | -2.430767e+09 | -2.432919e+09 | -2.431843e+09 | 2.152539e+06 | 1.309842e+06 | 5.386211e-04 |
| sd_pSTAT5A_rel | -5.862715e+12 | -5.862577e+12 | -5.862854e+12 | -5.862715e+12 | 2.771602e+08 | 2.296471e+03 | 3.917077e-10 |
| sd_pSTAT5B_rel | -4.759122e+10 | -4.758654e+10 | -4.759590e+10 | -4.759122e+10 | 9.363916e+06 | 5.641838e+00 | 1.185479e-10 |
| sd_rSTAT5A_rel | 1.415492e+01 | 3.586157e+06 | -3.586108e+06 | 2.441406e+01 | 7.172266e+06 | 1.025914e+01 | 4.202142e-01 |
Explanation of the gradient check result columns:
- grad: Objective gradient
- fd_f: FD forward difference
- fd_b: FD backward difference
- fd_c: Approximation of the FD central difference (reusing the information from fd_f and fd_b)
- fd_err: Deviation between the forward and backward differences fd_f, fd_b
- abs_err: Absolute error between grad and the central FD gradient fd_c
- rel_err: Relative error between grad and the central FD gradient fd_c
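For a single parameter, these quantities can be reproduced with plain finite differences. This sketch uses a toy 1-D objective, not the pyPESTO objective:

```python
def f(x):
    # toy 1-D objective
    return x**3


def grad(x):
    # analytic gradient of f
    return 3 * x**2


x, eps = 1.5, 1e-5
fd_f = (f(x + eps) - f(x)) / eps   # forward difference
fd_b = (f(x) - f(x - eps)) / eps   # backward difference
fd_c = (fd_f + fd_b) / 2           # central difference, reusing fd_f and fd_b
fd_err = abs(fd_f - fd_b)          # deviation between forward and backward differences
abs_err = abs(grad(x) - fd_c)      # absolute error vs. the central FD gradient
rel_err = abs_err / abs(fd_c)      # relative error vs. the central FD gradient
```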
If the problem contains fixed parameters, use Problem.get_reduced_vector to get the reduced vector with only free (estimated) parameters. Below, we decrease the FD step size to eps = 1e-6 and observe that the errors change:
[5]:
parameter_vector = pypesto_problem.get_reduced_vector(startpoints[0])
pypesto_problem.objective.check_grad(
x=parameter_vector,
eps=1e-6,
verbosity=0,
)
[5]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err |
|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 2.220912e+09 | 2.246010e+09 | 2.218966e+09 | 2.232488e+09 | 2.704346e+07 | 1.157581e+07 | 5.185163e-03 |
| k_exp_hetero | 7.022117e+01 | 2.099487e+07 | -3.694434e+07 | -7.974731e+06 | 5.793921e+07 | 7.974802e+06 | 1.000009e+00 |
| k_exp_homo | 8.402213e+04 | 2.855469e+06 | -2.802148e+07 | -1.258301e+07 | 3.087695e+07 | 1.266703e+07 | 1.006677e+00 |
| k_imp_hetero | 2.825850e+07 | -4.105957e+06 | 1.072534e+07 | 3.309692e+06 | 1.483130e+07 | 2.494881e+07 | 7.538104e+00 |
| k_imp_homo | 5.956185e+03 | 2.987256e+07 | -4.359131e+06 | 1.275671e+07 | 3.423169e+07 | 1.275076e+07 | 9.995331e-01 |
| k_phos | -2.433153e+09 | -2.414397e+09 | -2.440591e+09 | -2.427494e+09 | 2.619385e+07 | 5.659110e+06 | 2.331256e-03 |
| sd_pSTAT5A_rel | -5.862715e+12 | -5.862666e+12 | -5.862765e+12 | -5.862715e+12 | 9.872021e+07 | 1.063561e+03 | 1.814109e-10 |
| sd_pSTAT5B_rel | -4.759122e+10 | -4.755525e+10 | -4.762719e+10 | -4.759122e+10 | 7.194189e+07 | 6.760034e+01 | 1.420437e-09 |
| sd_rSTAT5A_rel | 1.415492e+01 | 3.586108e+07 | -3.586108e+07 | 0.000000e+00 | 7.172217e+07 | 1.415492e+01 | 1.415492e+07 |
The method check_grad_multi_eps calls check_grad multiple times with different FD step sizes and reports, for each parameter, the step size that results in the smallest error. You can supply a list of FD step sizes to be tested via the multi_eps argument (or use the defaults), and select via the label argument which error measure (fd_err, abs_err, or rel_err) is minimized.
[6]:
gc = pypesto_problem.objective.check_grad_multi_eps(
x=parameter_vector,
verbosity=0,
label="rel_err", # default
)
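Conceptually, the per-parameter step-size selection works like the following simplified sketch. This is an illustration of the idea, not pyPESTO's implementation; the function and candidate step sizes are arbitrary:

```python
import numpy as np


def best_eps_for(f, grad, x, multi_eps=(1e-1, 1e-3, 1e-5, 1e-7)):
    # for each candidate step size, compute the relative error of the
    # central FD gradient vs. the analytic gradient; keep the smallest
    best = None
    for eps in multi_eps:
        fd_c = (f(x + eps) - f(x - eps)) / (2 * eps)
        rel_err = abs(grad(x) - fd_c) / max(abs(fd_c), 1e-300)
        if best is None or rel_err < best[1]:
            best = (eps, rel_err)
    return best


# exp is its own derivative, so the analytic gradient is exact
eps, err = best_eps_for(np.exp, np.exp, 1.0)
```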
Use the pandas style methods to visualise the results of the gradient check, e.g.:
[7]:
def highlight_value_above_threshold(x, threshold=1):
return ["color: darkorange" if xi > threshold else None for xi in x]
def highlight_gradient_check(gc: pd.DataFrame):
return (
gc.style.apply(
highlight_value_above_threshold,
subset=["fd_err"],
)
.background_gradient(
cmap=sns.light_palette("purple", as_cmap=True),
subset=["abs_err"],
)
.background_gradient(
cmap=sns.light_palette("red", as_cmap=True),
subset=["rel_err"],
)
.background_gradient(
cmap=sns.color_palette("viridis", as_cmap=True),
subset=["eps"],
)
)
highlight_gradient_check(gc)
[7]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 2220911980.111206 | 2218815574.707031 | 2223017339.599609 | 2220916457.153320 | 4201764.892578 | 4477.042115 | 0.000002 | 0.001000 |
| k_exp_hetero | 70.221175 | 501.232910 | 265.585938 | 383.409424 | 235.646973 | 313.188249 | 0.816638 | 0.100000 |
| k_exp_homo | 84022.128076 | 85081.540527 | 82042.426758 | 83561.983643 | 3039.113770 | 460.144434 | 0.005507 | 0.100000 |
| k_imp_hetero | 28258498.729311 | 28309819.091797 | 28160520.996094 | 28235170.043945 | 149298.095703 | 23328.685366 | 0.000826 | 0.001000 |
| k_imp_homo | 5956.185241 | 5343.374023 | 6553.217773 | 5948.295898 | 1209.843750 | 7.889343 | 0.001326 | 0.100000 |
| k_phos | -2433152761.910034 | -2435951032.958984 | -2430378865.966797 | -2433164949.462891 | 5572166.992188 | 12187.552857 | 0.000005 | 0.001000 |
| sd_pSTAT5A_rel | -5862715285801.673828 | -5862576707983.398438 | -5862853868212.890625 | -5862715288098.144531 | 277160229.492188 | 2296.470703 | 0.000000 | 0.000010 |
| sd_pSTAT5B_rel | -47591221503.147217 | -47586539550.781242 | -47595903466.796867 | -47591221508.789055 | 9363916.015625 | 5.641838 | 0.000000 | 0.000010 |
| sd_rSTAT5A_rel | 14.154924 | 35874.755859 | -35847.167969 | 13.793945 | 71721.923828 | 0.360978 | 0.026167 | 0.001000 |
There are consistently large discrepancies between forward and backward FD and a large relative error for the parameter k_exp_hetero.
Ideally, all gradients would agree, but especially at not-so-smooth points of the objective, such as (local) optima, large FD errors can occur. It is recommended to check gradients at many random points and see whether the errors are consistently large for specific parameters.
Below we perform a gradient check for another random point and observe small errors:
[8]:
parameter_vector = startpoints[1]
gc = pypesto_problem.objective.check_grad_multi_eps(
x=parameter_vector,
verbosity=0,
label="rel_err", # default
)
highlight_gradient_check(gc)
[8]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | 0.000266 | 0.000299 | 0.000238 | 0.000268 | 0.000061 | 0.000002 | 0.000024 | 0.100000 |
| k_exp_hetero | 0.000174 | 0.000170 | 0.000176 | 0.000173 | 0.000007 | 0.000001 | 0.000007 | 0.100000 |
| k_exp_homo | -0.000254 | -0.000270 | -0.000237 | -0.000253 | 0.000032 | 0.000000 | 0.000001 | 0.100000 |
| k_imp_hetero | 0.125483 | 0.125345 | 0.125628 | 0.125486 | 0.000283 | 0.000003 | 0.000026 | 0.001000 |
| k_imp_homo | 0.153206 | 0.153031 | 0.153370 | 0.153201 | 0.000339 | 0.000006 | 0.000036 | 0.001000 |
| k_phos | -0.280337 | -0.280660 | -0.280019 | -0.280339 | 0.000641 | 0.000002 | 0.000008 | 0.001000 |
| sd_pSTAT5A_rel | -2344.115830 | -2344.060317 | -2344.171330 | -2344.115824 | 0.111014 | 0.000007 | 0.000000 | 0.000010 |
| sd_pSTAT5B_rel | -122250603.677674 | -122247788.796946 | -122253418.645635 | -122250603.721291 | 5629.848689 | 0.043617 | 0.000000 | 0.000010 |
| sd_rSTAT5A_rel | -31780.671353 | -31779.938564 | -31781.404465 | -31780.671515 | 1.465902 | 0.000162 | 0.000000 | 0.000010 |
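Checking many random points can be automated along these lines. This is a self-contained sketch with a toy objective; with pyPESTO one would instead loop check_grad_multi_eps over the startpoints:

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    # toy objective
    return np.sum(np.sin(x))


def grad(x):
    # analytic gradient of f
    return np.cos(x)


def rel_errs(x, eps=1e-5):
    # relative error of the central FD gradient vs. the analytic gradient
    g = grad(x)
    fd_c = np.array(
        [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(len(x))]
    )
    return np.abs(g - fd_c) / np.maximum(np.abs(fd_c), 1e-300)


# median relative error per parameter over many random points;
# consistently large values would flag a suspicious parameter
errs = np.median(
    [rel_errs(rng.uniform(-1, 1, size=3)) for _ in range(20)], axis=0
)
```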
Gradient check after optimization
Next, we do optimization and perform a gradient check at a local optimum.
[9]:
%%capture
result = optimize.minimize(
problem=pypesto_problem,
optimizer=optimize.ScipyOptimizer(),
n_starts=4,
)
(Local) optima can be points where FD approximations behave poorly. At a sharp optimum, a large fd_err is expected.
At the local optimum shown below, the sd_pSTAT5B_rel forward and backward FD have opposite signs and are quite large, resulting in a substantial fd_err.
[10]:
# parameter vector at the local optimum, obtained from optimization
parameter_vector = pypesto_problem.get_reduced_vector(
result.optimize_result[0].x
)
highlight_gradient_check(
gc=pypesto_problem.objective.check_grad_multi_eps(
x=parameter_vector,
verbosity=0,
label="rel_err", # default
)
)
[10]:
| | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---|---|---|---|---|---|---|---|
| Epo_degradation_BaF3 | -0.035754 | 1.264307 | -1.337803 | -0.036748 | 2.602110 | 0.000994 | 0.027800 | 0.001000 |
| k_exp_hetero | -0.000614 | 0.064476 | -0.044984 | 0.009746 | 0.109460 | 0.010360 | 0.094401 | 0.100000 |
| k_exp_homo | -0.000413 | -0.000358 | -0.000499 | -0.000429 | 0.000141 | 0.000016 | 0.000161 | 0.100000 |
| k_imp_hetero | -0.019552 | 0.731206 | -0.774481 | -0.021638 | 1.505687 | 0.002086 | 0.101065 | 0.001000 |
| k_imp_homo | -0.006244 | 0.153369 | -0.165213 | -0.005922 | 0.318583 | 0.000323 | 0.065528 | 0.001000 |
| k_phos | 0.012015 | 0.487501 | -0.467896 | 0.009803 | 0.955396 | 0.002213 | 0.204835 | 0.001000 |
| sd_pSTAT5A_rel | -0.008130 | -2354.837221 | 2354.820964 | -0.008129 | 4709.658185 | 0.000002 | 0.000231 | 0.000000 |
| sd_pSTAT5B_rel | -0.000448 | -0.235082 | 0.234188 | -0.000447 | 0.469269 | 0.000001 | 0.002246 | 0.000010 |
| sd_rSTAT5A_rel | -0.001094 | -2354.830201 | 2354.828013 | -0.001094 | 4709.658214 | 0.000000 | 0.000401 | 0.000000 |
How to “fix” my gradients?
- Find suitable simulation tolerances.
- Specific to the PEtab-AMICI pipeline:
  - Check the simulation logs for warnings and errors.
  - Consider switching between forward and adjoint sensitivity algorithms.