{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# Result Storage\n", "\n", "This notebook will show how to store pypesto result objects to be able to load them later on for visualization and further analysis.\n", "This includes sampling, profiling and optimization. Additionally, we will show how to use optimization history to look further into an optimization run and how to store the history.\n", "\n", "After this notebook, you will...\n", "\n", "* know how to store and load optimization, profiling and sampling results\n", "* know how to store and load optimization history\n", "* know basic plotting functions for optimization history to inspect optimization convergence" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# install if not done yet\n", "# %pip install pypesto --quiet" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "import logging\n", "import tempfile\n", "\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from IPython.display import Markdown, display\n", "\n", "import pypesto.optimize as optimize\n", "import pypesto.petab\n", "import pypesto.profile as profile\n", "import pypesto.sample as sample\n", "import pypesto.visualize as visualize\n", "\n", "mpl.rcParams[\"figure.dpi\"] = 100\n", "mpl.rcParams[\"font.size\"] = 18\n", "# set a random seed to get reproducible results\n", "np.random.seed(3142)\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 0. Objective function and problem definition\n", "\n", "We will use the Boehm model from the [benchmark initiative](https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab) in this notebook as an example.\n", "We load the model through [PEtab](https://petab.readthedocs.io/en/latest/), a data format for specifying parameter estimation problems in systems biology." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "%%capture\n", "# directory of the PEtab problem\n", "petab_yaml = \"./conversion_reaction/conversion_reaction.yaml\"\n", "\n", "importer = pypesto.petab.PetabImporter.from_yaml(petab_yaml)\n", "problem = importer.create_problem(verbose=False)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 1. Filling in the result file\n", "\n", "We will now run a standard parameter estimation pipeline with this model. Aside from the part on the history, we shall not go into detail here,\n", "as this is covered in other tutorials such as [Getting Started](getting_started.ipynb) and [AMICI in pyPESTO](amici.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Optimization" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "# create optimizers\n", "optimizer = optimize.FidesOptimizer(\n", " verbose=logging.ERROR, options={\"maxiter\": 200}\n", ")\n", "\n", "# set number of starts\n", "n_starts = 10 # usually a larger number >=100 is used\n", "\n", "# Optimization\n", "result = pypesto.optimize.minimize(\n", " problem=problem, optimizer=optimizer, n_starts=n_starts\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "display(Markdown(result.summary()))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Profiling" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "# Profiling\n", "result = profile.parameter_profile(\n", " problem=problem,\n", " result=result,\n", " optimizer=optimizer,\n", " profile_index=np.array([0, 1]),\n", ")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Sampling" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "# Sampling\n", "sampler = sample.AdaptiveMetropolisSampler()\n", "result = sample.sample(\n", " problem=problem,\n", " sampler=sampler,\n", " n_samples=1000, # rather low\n", " result=result,\n", " filename=None,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 2. Storing the result file\n", "\n", "We filled all our analyses into one result file. We can now store this result object into HDF5 format to reload this later on." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# create temporary file\n", "fn = tempfile.NamedTemporaryFile(suffix=\".hdf5\", delete=False)\n", "# write the result with the write_result function.\n", "# Choose which parts of the result object to save with\n", "# corresponding booleans.\n", "pypesto.store.write_result(\n", " result=result,\n", " filename=fn.name,\n", " problem=True,\n", " optimize=True,\n", " profile=True,\n", " sample=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "As easy as we can save the result object, we can also load it again:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# load result with read_result function\n", "result_loaded = pypesto.store.read_result(fn.name)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "As you can see, when loading the result object, we get a warning regarding the problem. This is the case, as the problem is not fully saved into hdf5, as a big part of the problem is the objective function. Therefore, after loading the result object, we cannot evaluate the objective function anymore. We can, however, still use the result object for plotting and further analysis.\n", "\n", "The best practice would be to still create the problem through petab and insert it into the result object after loading it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# dummy call to non-existent objective function would fail\n", "test_parameter = result.optimize_result[0].x[problem.x_free_indices]\n", "# result_loaded.problem.objective(test_parameter)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "result_loaded.problem = problem\n", "print(\n", " f\"Objective function call: {result_loaded.problem.objective(test_parameter)}\"\n", ")\n", "print(f\"Corresponding saved value: {result_loaded.optimize_result[0].fval}\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "To show that for visualizations however, the storage and loading of the result object is accurate, we will plot some result visualizations." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 3. Visualization Comparison" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Optimization" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# waterfall plot original\n", "ax = visualize.waterfall(result)\n", "ax.title.set_text(\"Original Result\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# waterfall plot loaded\n", "ax = visualize.waterfall(result_loaded)\n", "ax.title.set_text(\"Loaded Result\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Profiling" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# profile plot original\n", "ax = visualize.profiles(result)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# profile plot loaded\n", "ax = visualize.profiles(result_loaded)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Sampling" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# sampling plot original\n", "ax = visualize.sampling_fval_traces(result)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# sampling plot loaded\n", "ax = visualize.sampling_fval_traces(result_loaded)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We can see that we are perfectly able to reproduce the plots from the loaded result object. With this, we can reuse the result object for further analysis and visualization again and again without spending time and resources on rerunning the analyses." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 4. Optimization History" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "During optimization, it is possible to regularly write the objective function trace to file. This is useful, e.g., when runs fail, or for various diagnostics. Currently, pyPESTO can save histories to 3 backends: in-memory, as CSV files, or to HDF5 files." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Memory History" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "To record the history in-memory, just set `trace_record=True` in the `pypesto.HistoryOptions`. Then, the optimization result contains those histories:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "%%time\n", "# record the history\n", "history_options = pypesto.HistoryOptions(trace_record=True)\n", "\n", "# Run optimizations\n", "result = optimize.minimize(\n", " problem=problem,\n", " optimizer=optimizer,\n", " n_starts=n_starts,\n", " history_options=history_options,\n", " filename=None,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Now, in addition to queries on the result, we can also access the history." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "print(\"History type: \", type(result.optimize_result.list[0].history))\n", "# print(\"Function value trace of best run: \", result.optimize_result.list[0].history.get_fval_trace())\n", "\n", "fig, ax = plt.subplots(1, 2)\n", "visualize.waterfall(result, ax=ax[0])\n", "visualize.optimizer_history(result, ax=ax[1])\n", "fig.set_size_inches((15, 5))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### CSV History" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "The in-memory storage is, however, not stored anywhere. To do that, it is possible to store either to CSV or HDF5. This is specified via the `storage_file` option. If it ends in `.csv`, a `pypesto.objective.history.CsvHistory` will be employed; if it ends in `.hdf5` a `pypesto.objective.history.Hdf5History`. Occurrences of the substring `{id}` in the filename are replaced by the multistart id, allowing to maintain a separate file per run (this is necessary for CSV as otherwise runs are overwritten)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "# create temporary file\n", "with tempfile.NamedTemporaryFile(suffix=\"_{id}.csv\") as fn_csv:\n", " # record the history and store to CSV\n", " history_options = pypesto.HistoryOptions(\n", " trace_record=True, storage_file=fn_csv.name\n", " )\n", "\n", " # Run optimizations\n", " result = optimize.minimize(\n", " problem=problem,\n", " optimizer=optimizer,\n", " n_starts=n_starts,\n", " history_options=history_options,\n", " filename=None,\n", " )" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Note that for this simple cost function, saving to CSV takes a considerable amount of time. This overhead decreases for more costly simulators, e.g., using ODE simulations via AMICI." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "print(\"History type: \", type(result.optimize_result.list[0].history))\n", "# print(\"Function value trace of best run: \", result.optimize_result.list[0].history.get_fval_trace())\n", "\n", "fig, ax = plt.subplots(1, 2)\n", "visualize.waterfall(result, ax=ax[0])\n", "visualize.optimizer_history(result, ax=ax[1])\n", "fig.set_size_inches((15, 5))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### HDF5 History" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Just as in CSV, writing the history to HDF5 takes a considerable amount of time.\n", "If a user specifies a HDF5 output file named `my_results.hdf5` and uses a parallelization engine, then:\n", "* a folder is created to contain partial results, named `my_results/` (the stem of the output filename)\n", "* files are created to store the results of each start, named `my_results/my_results_{START_INDEX}.hdf5`\n", "* a file is created to store the combined result from all starts, named `my_results.hdf5`.\n", "Note that this file depends on the files in the `my_results/` directory, so **cease to function** if\n", "`my_results/` is deleted." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "# create temporary file\n", "f_hdf5 = tempfile.NamedTemporaryFile(suffix=\".hdf5\", delete=False)\n", "fn_hdf5 = f_hdf5.name\n", "\n", "# record the history and store to CSV\n", "history_options = pypesto.HistoryOptions(\n", " trace_record=True, storage_file=fn_hdf5\n", ")\n", "\n", "# Run optimizations\n", "result = optimize.minimize(\n", " problem=problem,\n", " optimizer=optimizer,\n", " n_starts=n_starts,\n", " history_options=history_options,\n", " filename=fn_hdf5,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "print(\"History type: \", type(result.optimize_result.list[0].history))\n", "# print(\"Function value trace of best run: \", result.optimize_result.list[0].history.get_fval_trace())\n", "\n", "fig, ax = plt.subplots(1, 2)\n", "visualize.waterfall(result, ax=ax[0])\n", "visualize.optimizer_history(result, ax=ax[1])\n", "fig.set_size_inches((15, 5))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "For the HDF5 history, it is possible to load the history from file, and to plot it, together with the optimization result." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "# load the history\n", "result_loaded_w_history = pypesto.store.read_result(fn_hdf5)\n", "\n", "fig, ax = plt.subplots(1, 2)\n", "visualize.waterfall(result_loaded_w_history, ax=ax[0])\n", "visualize.optimizer_history(result_loaded_w_history, ax=ax[1])\n", "fig.set_size_inches((15, 5))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# close the temporary file\n", "f_hdf5.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }