# An introduction to mathematical modeling in systems biology

Gunnar Cedersund
Rikard Johansson
William Lövfors
Elin Nyman
Tilda Herrgårdh

Jan 2022

sequenceDiagram Hypotheses & Data ->> Models: formulate Models->>Rejection: fail statistical test Models->>Predictions: pass statistical test Predictions->>New experiment: core prediction

Modelling cycle.

## Introduction

First thing to do, is to read the introduction "hidden" below.

Click to expand introduction

This is a new version of the lab introduced in 2019, more oriented towards writing your own scripts to further your understanding of the different parts, and also to give you valuable programming skills for your future career. The current version of the lab is implemented in MATLAB, but you are free to pick other languages if you can find the proper tools. Please note that the supervisors will only help with code issues in MATLAB.

### Purpose of the lab

This lab will on a conceptual level guide you through all the main sub-tasks in a systems biology project: model formulation, optimization of the model parameters with respect to experimental data, model evaluation, statistical testing, (uniqueness) analysis, identification of interesting predictions, and testing of those predictions.

### How to pass

To pass the entire lab you need to individually pass the quiz on lisam. ("Computer lab - coding version" if you're doing this version) AND pass an individual summary and final discussion of the entire lab with one of the supervisors. Along the instructions, there are check-points that correspond to questions in the quiz. These are marked with checkpoint x.

The deadline to be approved is at the final computer lab session

### Preparations

Get familiar with MATLAB by doing Mathworks intro tutorial. You can find the tutorial at this link.

Most of the tasks below start with Background. If one wants to prepare, these can be read in advance of the lab. They are written more as a background about the different topics, and can be read without solving the previous tasks.

### Getting MATLAB running

Follow the installation guide on Lisam, or here.

#### In the computer halls

If you are working on a Linux based computer in the computer hall, use the following steps to get MATLAB going.
First, open a terminal. Then run module add prog/matlab/9.5.
Note that the version sometimes changes, if 9.5 can't be found, check available versions with module avail.

### Interpreting function relationship graphs

Throughout this lab, we will use graphs similar to the one below to illustrate how different functions communicate.

sequenceDiagram F1 ->> F2: arg1, arg2 F2 ->> F1: arg3

Here, a function F1 is sending two arguments, arg1, arg2, to another function F2. F2 then returns arg3to function F1.

### Questions?

If you have questions when going through the lab, first have a look at Q&A in the course room (lisam), if you cannot find an answer, bring the questions to the computer lab sessions, or you can also raise them in Teams. In many situations, your fellow students probably know the answer to your question. Please help each other if possible!

## Main computer lab tasks

### Implementing and simulating a model

Background

Now, let us start with implementing a simple model! Here we will learn the basics of how to create models and simulate them. Implement a model with a single state A, with a constant increase ($k_1 = 2$) and an initial amount of 4 ($A(0)=4$). Let the amount of A be the measurement equations ($\hat{y}$).

Software/toolboxes/compilers.

Check that you have them installed.
Mex-compiler install:
Run mex -setup. If Matlab doesn't find a compiler, install one, e.g. mingw.
Then, download IQM toolbox from LISAM or here, extract the files, move to the folder in MATLAB and then run the file called installIQMtoolsInitial.m. Move back to the folder where you will work on this lab.

Implement simple model

Create a text file and use the following model formulation.
Note that the text file must contain all headers (beginning with “********** “)

********** MODEL NAME
SimpleModel
********** MODEL NOTES
********** MODEL STATES
d/dt(A)=v1
A(0)=4
********** MODEL PARAMETERS
k1=2
********** MODEL VARIABLES
ySim=A
********** MODEL REACTIONS
v1=k1
********** MODEL FUNCTIONS
********** MODEL EVENTS
********** MODEL MATLAB FUNCTIONS

Compile model

Now, we will first convert your equations into a model object, so that you can compile the model into a so called MEX-file. The MEX-file is essentially a way to run C-code inside MATLAB, which is much faster then ordinary MATLAB code.

modelName='SimpleModel'
model=IQMmodel([modelName '.txt']);
IQMmakeMEXmodel(model) %this creates a MEX file which is ~100 times faster


Note: The compiled file will have a filename based on what is written under
****** MODEL NAME, and an ending based on what OS it is compiled on. In this case, SimpleModel.mex[w64/maci64/a64]

Simulate model

Create a main script.
This is the script we will use to call the other functions we create. Here, we will call this script Main.
To create a function/script, press “new” in the toolbar and choose function/script.

Create simulation function (SimulateExperiment) that takes (modelName, params, and time) as input and returns a vector ySim as output. To do this, click on New in MATLABs command window and select Function. Edit the input and outputs to use the following structure:

sequenceDiagram Main ->> SimulateExperiment: modelName, params, time SimulateExperiment ->> Main: ySim

The blue boxes represents a function/script with the name written in it. The black arrows represents the direction a set of variable is sent between two functions. In the above example, the script Main passes the variables modelName, params, time (e.g. = 0:0.2:2) to the function SimulateExperiment, that returns ySim back to Main.

The next step will be to implement the actual simulation, but for now, just set ySim to some dummy values and leave the rest of the script empty.

Implement the actual simulation.
There exists multiple ways of simulating the model. 3 examples are given below:
(Note, IC is the initial conditions)

1. sim = IQMPsimulate(model, time, IC, paramNames, params)
2. sim = IQMPsimulate(modelName, time, IC, paramNames, params)
3. sim = SimpleModel(time, IC, params)

1) takes a IQM model as first input, and will automatically compile the model. This might sound good, but will result in the model being compiled every time a simulation is done. Running a compiled model is much faster, but the compilation takes some time. Therefore this is a bad idea.
2) and 3) both uses the MEX-compiled model, however

3) uses the file directly and is almost always the best alternative (5x faster and more convenient). Here, it is also possible to skip IC and/or params as inputs. The model will then instead use the values written in the model file (when the MEX-file was compiled). In practice, this is done by passing along [] instead of e.g. IC.

Hint Use alternative 3.

Instead of using the actual name (e.g. SimpleModel above), you can use any valid string. See below for an example.

model = str2func(modelName)
sim = model(time, IC, params)


Define output of simulation.
The simulation returns a struct, with many relevant fields. In our case, we are interested in the variable values. You can access these values in the following way:
ySim = sim.variablevalues

Dealing with problematic simulations.
Sometimes the simulation crashes for certain sets of parameters. To bypass that, add a try-catch block in your simulation script and return a vector of the same size as ySim should be (the size of time’), but with a very large value for each element.

try
sim = model(time, [], params); % Replace [] with IC to use custom values
ySim = sim.variablevalues;
catch error
%disp(getReport(error)) %Print error message
ySim=1e50*ones(size(time));
end


Call your new SimulateExperiment function from a Main script with relevant inputs. For now, set params = 2 (same values as in the model file).

Run Main script

To see that it works. As a time-vector, use [0:0.2:2] and for params use the default values. Does your SimulateExperiment return ySim=[4, 4.4, 4.8, 5.2, 5.6, 6, 6.4, 6.8, 7.2, 7.6, 8]?

[ ] Checkpoint 1: Does your simple model behave as you expected?

### Data

Background

During the following exercises you will take on the role of a system biologist working in collaboration with an experimental group.

The experimental group is focusing on the cellular response to growth factors, like the epidermal growth factor (EGF), and their role in cancer. The group have a particular interest in the phosphorylation of the receptor tyrosine kinase EGFR in response to EGF and therefore perform an experiment to elucidate the role of EGFR. In the experiment, they stimulate a cancer cell line with EGFR and use an antibody to measure the tyrosine phosphorylation at site 992, which is known to be involved in downstream growth signaling. The data from these experiments show an interesting dynamic behavior with a fast increase in phosphorylation followed by a decrease down to a steady-state level which is higher than before stimulation (Figure 1) .

Figure 1: The experimental data. A cancer cell line is stimulated with EGF at time point 0, and the response at EGFR-pY992 is measured

The experimental group does not know the underlying mechanisms that give rise to this dynamic behavior. However, they have two different ideas of possible mechanisms, which we will formulate as hypotheses below. The first idea is that there is a negative feedback transmitted from EGFR to a downstream signaling molecule that is involved in dephosphorylation of EGFR. The second idea is that a downstream signaling molecule is secreted to the surrounding medium and there catalyzes breakdown of EGF, which also would reduce the phosphorylation of EGFR. The experimental group would like to know if any of these ideas agree with their data, and if so, how to design experiments to find out which is the more likely idea.

To generalize this problem, we use the following notation:
EGF – Substrate (S)
EGFR – Receptor (R)

Download the experimental data from here or on LISAM.

Load the experimental data into MATLAB by using the load function in your main script. Remember, you can read more about a function with the command help or doc. Run your main script and note that the variable is now in the workspace. Click on it and have a look at the subfields. The subfields can be accessed as e.g. expData.time.

Plot data

Create a function that takes data as input and plot all the points, with error bars.
Use the following structure:

sequenceDiagram Main ->> PlotExperiment: expData

Make sure your plots look nice. Give the axes labels.

Useful MATLAB commands:
errorbar, xlabel, ylabel, title, plot

[ ] Checkpoint 2: How does the data look?

### Hypotheses

Background

The experimentalists have 2 ideas for how they think the biological system they have studied work. These hypotheses are presented below.

Hypothesis 1

From this knowledge, we can draw an interaction graph, along with a description of the reactions.

▪ Reaction 1 is the phosphorylation of the receptor (𝑅). It depends on states 𝑅 and 𝑆 with the parameter 𝑘1.
▪ Reaction 2 is the basal dephosphorylation of 𝑅𝑝 to 𝑅. It depends on the state 𝑅𝑝 with the parameter 𝑘2.
▪ Reaction 3 is the feedback-induced dephosphorylation of 𝑅𝑝 to 𝑅 exerted by 𝑅𝑆𝑝. This reaction depends on the states 𝑅𝑝 and 𝑅𝑆𝑝 with the parameter 𝑘𝑓𝑒𝑒𝑑.
▪ Reaction 4 is the basal dephosphorylation of 𝑅𝑆𝑝 to 𝑅𝑆. It depends on the state 𝑅𝑆𝑝 with the parameter 𝑘4.
▪ Reaction 5 is the phosphorylation of 𝑅𝑆 to 𝑅𝑆𝑝. This phosphorylation is regulated by 𝑅𝑝. The reaction depends on the states 𝑅𝑆 and 𝑅𝑝, with the parameter 𝑘5.

The measurement equation is a one to one relationship with R_p.

Implement hypothesis 1
1. Formulate your equations based on the interaction graph.
2. Create a new model text file and enter your equations for model1.
Use the following values:
Initial values Parameter values
R(0) = 1 k1 = 1
Rp(0) = 0 k2 = 0.0001
RS(0) = 1 kfeed = 1000000
RSp(0) = 0 k4 = 1
S(0) = 1 k5 = 0.01
1. Use your simulation function you created, but with your new model1 and the new parameters. Make sure that you simulate for the same time as your data (you can pass along expData.time directly).
2. Update your plot function to also plot the model simulation. Call your simulation function inside the plot function. (See tips below)
sequenceDiagram Main->>PlotExperiment: expData, params, modelName PlotExperiment ->> SimulateExperiment: modelName, params, time SimulateExperiment ->> PlotExperiment: ySim

You now have ONE “main script” that:
a. Loads the experimental data
b. Compiles a model
c. Plot data with simulation in the same graph.

Tip!
MATLAB automatically overwrites your plot if you do not explicitly tell it to keep the information. To do this, use “hold on” to keep everything plotted in the figure:

errorbar(…)
hold on
plot(…)
...
hold off % This is not mandatory.


Tip!
When simulating for a plot, it is generally a good idea to simulate with a high time resolution, so yo can se what happens between the data points. Instead of passing expData.time (experimental time-points) to model(time,…), create timeHighres and pass that along.

One way of doing this is:

timeHighres = time(1):step:time(end)
sim = model(timeHighres,…)


Where step is the desired step-length (in this case, step=0.001 is probably good)

Hypothesis 2

This hypothesis is similar to the first one, but with some modifications. Here, the feedback-induced dephosphorylation of 𝑅𝑝 (reaction 3 in the first model) is removed. Instead we add a feedback-dependent breakdown of the substrate 𝑆. This reaction is dependent on the states 𝑅𝑆𝑝 and 𝑆 with a parameter 𝑘𝑓𝑒𝑒𝑑 (same name, but a different mode of action). These differences yields the following interaction-graph:

Implement hypothesis 2

Now, let us do the same thing again but for hypothesis 2.
Create a copy of your previous model file and implemented the changes in the hypothesis 2. Use the following values:

Initial values Parameter values
R(0) = 1 k1 = 5
Rp(0) = 0 k2 = 20
RS(0) = 1 kfeed = 10
RSp(0) = 0 k4 = 15
S(0) = 1 k5 = 30

You can simulate the new model by only changing modelName and params in your Main script.

[ ] Checkpoint 3: Is the agreement between models and data convincing? Can we test quantitatively if it is good enough?

### Testing models

Background

After implementing the two hypotheses into two models, you have two questions about your models:

### How good are the models?

To better evaluate your models you want to test them quantitatively.
Let us first establish a measure of how well the simulation can explain data, referred to from now on as the cost of the model. We use the following formulation of the cost:
$v\left(\theta\right)=\ \sum_{\forall t}\frac{{(y_t-\ {\hat{y}}_t\left(\theta\right))}^2}{{SEM}_t^2}$
Where $v(\theta)$ is the cost, given a set of parameters $\theta$. $y_t$ and $SEM_t$ is experimental data (mean and SEM) at time $t$, and ${\hat{y}}_t$ is model simulation at time $t$.
$y_t-\ {\hat{y}}_t\left(\theta\right)$, is commonly referred to as residuals, and represent the difference between the simulation and the data. These are summed in the equation above, or rather, the squared residuals are summed. Note that the residuals in the equation above is also weighted with the SEM.

### Are the models good enough?

Well, you see that they don't look like the data, but when are they are good enough?
There exists many statistical tests that can be used to reject models. One of the most common, and the one we will focus on in this lab is the $\chi^2$-test. A $\chi^2$-test is used to test the size of the residuals, i.e. the distance between model simulations and the mean of the measured data (Figure 2).

Figure 2: Residuals are the difference between model simulation and the mean of the measured data

For normally distributed noise, the cost function, $v(\theta)$, can be used as test variable in the $\chi^2$-test. Therefore, we test whether $v(\theta)$ is larger than a certain threshold. This threshold is determined by the inverse of the cumulative $\chi^2$ distribution function. In order to calculate the threshold, we need to decide significance level and degrees of freedom. A significance level of 5% (p=0.05) means a 5% risk of rejecting the hypothesis when it should not be rejected (i.e. a 5 % risk to reject the true model). The degrees of freedom is most often the number of terms being summed in the cost (i.e. the number of data points).

If the calculated cost is larger than the threshold, we must reject the hypothesis given the set of parameters used to simulate the model. However, there might be another parameter set that would give a lower cost for the same model structure. Therefore, to reject the hypothesis, we must search for the best possible parameter set before we use the $\chi^2$-test. To find the parameter set with the lowest cost, we use optimization.

Calculate the cost

Call your simulation function from inside the CostFunction to get the simulated values.

sequenceDiagram Main->>CostFunction: expData, params, modelName CostFunction ->> SimulateExperiment: modelName, params, time SimulateExperiment ->> CostFunction: ySim CostFunction ->> Main: cost

Tip!
MATLAB interprets *, /, ^ between vectors as vector operations. You likely want elementwise operations. This can be solved by changing / to ./ (note the dot) for any of these operations. Note that this is not an issue for +, -, if the vectors are of the same size.

Tip!
You probably have to transpose ySim. This can be done by adding a ' after the variable (e.g. ySim')

Evaluate the cost

Get the cost for both models.
Here, we will try to reject the hypothesis with a $\chi^2$-test. Calculate the limit to when the model must be rejected. You can get a limit by using the function chi2inv(0.95,dgf). Where 0.95 represents a p-value of 0.05 (1 - 0.05), and dgf is the degrees of freedom.

When selecting the degrees of freedom, there exists two major alternatives. The first is to set dgf = number of data points, and the second is dgf = number of data points - number of parameters
In this case, using the number of data points is probably the best alternative.

A third alternative could be to set limit = optimal cost + chi2inv(0.95,1) (note, dgf = 1)

If cost>limit, reject the model!

[ ] Checkpoint 4: Do you reject any of the hypotheses? Are there other statistical tests you can use?

### Improving the agreement to data

Background

To do be able to test the hypothesis, we must find the best agreement between data and simulation. If the model is not good enough assuming we have the best possible parameters, the hypothesis must be rejected. It is possible to find this best combination of parameters by manually tuning the parameters. However, that would almost certainly take way too much time. Instead, we can find this best agreement using optimization methods.

An optimization problem generally consists of two parts, an objective function with which a current solution is evaluated and some constraints that define what is considered an acceptable solution. In our case, we can use our cost function as an objective function and set lower and upper bounds (lb/ub) on the parameter values as constraints. The problem formulation typically looks something like the equations below.

$min\ v=\ Cost\ function\ (\theta) \\ Subject\ to: lb<\theta

An optimization problem can be hard to solve, and the solution we find is therefore not guaranteed to be the best possible solution. We can for example find a solution which is minimal in a limited surrounding, a so-called local minimum, instead of the best possible solution – the global minimum (Figure 3). There are methods to try to reach the global minimum, for example you can run the optimization several times from different starting points and look for the lowest solution among all runs. Please note that there might be multiple solutions with equal values.

Figure 3: The difference between a local and a global minimum.

Furthermore, there exists many different methods of solving an optimization problem. If a method is implemented as an algorithm, it is typically referred to as a solver.

Pick a solver

Some alternatives (read about them with doc solver)
MATLABs versions: particleswarm, simulannealbnd, fmincon,
IQM toolbox versions: simannealingIQM, simplexIQM

It is possible that not all solvers are equally good on all problems.

If you have a reasonably good start guess, it is probably a good idea to use a simulated annealing based solver. If you don't, particle swarm (which don't need a start guess) is probably a good solution.

Implement optimization

Add global optimization toolbox
To do an optimization, you first need to add the global optimization toolbox: search for "global optimization" in the add-ons explorer.

Enable passing one input variable to the cost function.
Most solvers can only call the cost function with one input variable (the parameter values). We must therefore update our script so that we only pass one input from the Main script to the cost function.

One way to bypass this limitation is to update the CostFunction to only accept one input and instead get the data and select model by either loading the data and model name in the cost function, or by getting them in as global variables. However, both of these methods have issues. A third (and better) alternative is to use an anonymous function. You can read more about anonymous functions here. Essentially, we create a new function which takes one input, and then calls CostFunction with that input as well as extra inputs.

How to implement an anonymous function:
Replace (in main_function):
cost = CostFunction(expdata, modelName, params)
With:

func = @(p) CostFunction(expdata, modelName, p)
cost = func(params)


Here, func acts as an intermediary function, passing along more arguments to the real cost function. Note that if expdata is changed (after you have defined it), the output from func will not change. To do that, func must be redefined (as above).

In the end, you scripts should have a structure similar to the figure below.

sequenceDiagram Main->> optimization solver*: func, x0, lb, ub optimization solver* ->> CostFunction: expData, params, modelName CostFunction ->> SimulateExperiment: modelName, params, time SimulateExperiment ->> CostFunction: ySim CostFunction ->> optimization solver*: cost optimization solver* ->> Main: cost

* Instead of optimization solver, you have the name of the optimization solver you picked, and that you shouldn't make a new script with this name - it refers to the matlab function.

Note also that not all optimization solvers require the same input, but in general they all require a function that can be used to evaluate a parameter set, a start guess of the parameter values (x0), as well as lower and upper bounds on which values the parameter can have.

An example with simulannealbnd

lb = [1e-5, 1e-5, 1e-5, 1e-5, 1e-5]
ub = [1e5, 1e5, 1e5, 1e5, 1e5]
[bestParam, minCost]= simulannealbnd(func, x0, lb, ub)


Tip 1:** using multiple solvers

You can string multiple solvers together. E.g since particle swarm does not require a start guess, it can be used to find one and then pass it on to simannealing.

[bestParamPS, minCostPS] = particleswarm(func, length(lb), lb, ub)
[bestParam, minCost] = simulannealbnd(func, bestParamPS, lb, ub)

Tip 2:** showing the progress while solving

If you want to see how the optimization progresses, you need to send in an options variable to the optimization algorithm. Below is an example for particleswarm. All MATLAB solvers (the IQM toolbox ones are a bit different) handles options in the same way, but note that the solver name has to be passed when setting the options.

options=optimoptions(@particleswarm, 'display','iter')
particleswarm(func, length(lb), lb, ub, options);

Tip 3:** searching in log-space (you want to do this)

It is often better to search for parameter values in log-space. To do this, simply take the logarithm of the start guess and lower/upper bounds. Then, go back to linear space before the simulation is done in SimulateExperiment.

sequenceDiagram Note over Main: Convert to log Main->> optimization solver': func, x0, lb, ub optimization solver ->> CostFunction: expData, params, modelName CostFunction ->> SimulateExperiment: modelName, params, time Note over SimulateExperiment: Convert to linear SimulateExperiment ->> CostFunction: ySim CostFunction ->> optimization solver: cost optimization solver ->> Main: cost

In Main:

x0=log(x0)
lb=log(lb)
ub=log(ub)


in SimulateExperiment:

param=exp(param)


Note that the simulation always have to get the "normal" parameters, if you pass along logarithmized parameters, some will be negative and the ODEs will be hard to solve (resulting in a CV_ODE_ERROR error)

Save results

Nothing is worse than having found a good parameter set, only to overwrite it with a bad one, and then being unable to find the good one again. To avoid this, you should always save your results in a way that does not overwrite any previous results. Use MATLABS save function.

save('filename', 'variable')


Note: Change filename to the file name you want, and variable to the name of the variable you want to save (e.g. bestParam)

Tip:** Using the current time as file name

A good file name to use, that would not overwrite previous results would be to use the current time and date in the name. For this, you can use the MATLAB function datestr. The default layout of the time and date contains characters you can't use when saving a file. To avoid this, we pass along a second argument, with the desired format of the date string.

datestr(now, 'yymmdd-HHMMSS')


Or even better, with also the model name:

[modelName '_' datestr(now, 'yymmdd-HHMMSS')]

Improve optimization

Experiment with the solvers. What is the best fit you can get? How does it look?<
You should get costs for both models that are at least below 20 (likely even better, but let's not spoil that). Plot your best results, to make sure that nothing strange is happening with the simulations.

[ ] Checkpoint 5: Do you have to reject any of the models?

### Model uncertainty

Background

If the cost function has a lower value than the threshold, we will move on and further study the behavior of the model. Unless the best solution is strictly at the boundary, there exists worse solutions than the best one, which would still be able to pass the statistical test. We will now try to find acceptable solutions, which are as different as possible. In this analysis, we study all found parameter sets that give an agreement between model simulation and experimental data – not only best parameter set. The parameter sets are used to simulate the uncertainty of the hypothesis given available data. This uncertainty can tell us things like: Do we have enough experimental data in relation to the complexity of the hypothesis? Are there well determined predictions (i.e. core predictions) to be found when we use the model to simulate new potential experiments?

Save all ok parameters

The easiest way of doing this is to create a text file (filename.csv in this case), and write to that file every time the cost of a tested parameter set is below the limit (e.g. the chi2-limit). This you can do using fopen, which opens he file and creates a file identifier (fid) that can be passed to different functions.

In Main:

fid = fopen('filename.csv','a') % preferably pick a better name. a = append, change to w to replace file
----do the optimization ----
fclose(fid) % We need to close the file when we are done. If you forget this, use fclose('all')


In CostFunction:

----simulate and calculate cost ----
if cost< limit
fprintf(fid, [sprintf('%.52f, ',params) sprintf('%.52f\n',cost)]); % Crashes if no fid is supplied. See tip below. Note that the cost is saved as the last column
end


Suggestion on function/variable names and structure:

[cost] = CostFunction(expdata, modelName, params, fid)

Tip:** Avoiding crashes if fid is not supplied

To not have the function crash if fid is not supplied, we can use “nargin”, which is the number of variables sent to the function. If we have said that fid will be the 4th input argument (see above), we can add another statement to the if-statement in the CostFunction.

if cost< limit && nargin==4
fprintf(fid, [sprintf('%.52f, ',params) sprintf('%.52f\n',cost)]);
end

Run many optimizations

In order to gather many parameter sets to give us an estimation of the model uncertainty, re-run your optimization a few times to find multiple different parameter sets. Ideally we want to find as different solutions as possible. To do this the easy way, just restart your optimization a few times and let it run to finish. Hopefully, it will find a reasonable spread of the parameter values. More optimizations will give an uncertainty closer to the truth. In the end, open your text file and check that there are a good amount of parameter sets in there.

Sample parameters

Create a new “main” script for sampling your saved parameters and plotting. You can essentially copy your old Main script and use that as a base. We will not be optimizing anymore, so those lines can be removed.

Import your .csv files using e.g load, readtable or dlmread
To convert from a table to array, use the function table2array.

Sample your parameter sets.
For computational reasons, it is infeasible to plot all parameter sets. Therefore we need to sample the parameter sets. There are multiple ways of sampling the parameter sets, where some are better than other. One way is to pick at random

It is likely reasonable to plot around 1,000 parameter sets in total.

Suggestion on function/variable names and structure:
[sampledParameters] = SampleParameters(allparameters)

A good way to select parameters

Use sortrows to sort based on the values in a column, then select some parameter sets with a fixed distance. Try picking ~100 sets. Then sort based on second column and repeat.

Some easy ways to select parameters

Pick indices at random
Use randi to generate ~1000 positions from 1 to the total number of parameter sets you have. E.g. ind=randi(size(allParameters,1),1,1000)

Pick indices with equal spacing
Use linspace to generate indices with equal spacing. Note, some values you get from linspace might not be integers. Therefore, round the values using round. E.g ind=round(linspace(1,size(allParameters,1),1000)).

Then, extract those parameter sets from allParameters.
sampledParameters=allParameters(ind,1:end-1) % last column is the cost

Plot model uncertainty

Use your simulation script to simulate your selected parameters. Plot model 1 with uncertainty in one color. Then, do the same thing for model 2, and plot in the same windows with another color. Lastly, plot the data. In the end, you should have both models and the data in one plot.

Suggestions on how to implement this:
Make a copy of PlotExperiment, and pass along sampled parameters and model names for both models. Then do any of the following:

1) Plot multiple lines, with different colors for the different models.

Update the PlotExperiment function to take an extra input argument for color, and loop over your sampledParameters for one model, then loop over the next.
Or, create a new plot function and use your simulation script to simulate your selected parameters. Plot all the simulations as multiple lines in the same figure with the same color (one model = one color).
Then, do the same thing for model 2 and plot in the same windows.
Lastly, plot the data.

2) Create a new function and plot an area using fill

First, simulate all parameter sets and save the maximum and minimum value (ymin and ymax in the example below), then plot using fill. Note that fill creates a shape and always connects the first and last points. Therefore, we must plot the maximum or lower value with the other one flipped (see fliplr). See example below

fill([time fliplr(time)], [ymin fliplr(ymax)], 'k')


[ ] Checkpoint 6: Are the uncertainties reliable?

### Predictions

Background

In order to see if the model behaves reasonable, we want to see if we can predict something that the model has not been trained on, and then test the prediction with a new experiment. Note that in order for the prediction to be testable, it must be limited in at least one point, often referred to as a core prediction.

#### Core predictions

Core predictions are well determined predictions that can be used to differentiate between two models. Such a differentiation is possible when model simulations behave differently and have low uncertainty for both models. Core predictions are used to find experiments that would differentiate between the suggested hypotheses and therefore, if measured, would reject at least one of the suggested hypotheses. The outcome could also be that all hypotheses are rejected, if none of the simulations agree with the new experiment. A core prediction could for example be found at one of the model states where we do not have experimental data, or at the model state we can measure, under new experimental conditions (extended time, a change in input strength etc.).

#### Experimental validation

The last step of the hypothesis testing is to hand over your analysis to the experimental group. The core predictions that discriminate between Model 1 and Model 2 should be discussed to see if the corresponding experiment is possible to perform. Now it is important to go back to think about the underlying biology again. What is the biological meaning of your core predictions? Do you think the experiment you suggest can be performed?

The outcome of the new experiment can be that you reject either Model 1 or Model 2, or that you reject both Model 1 and Model 2. What biological conclusions can we draw in each of these scenarios? What is the next step for you, the systems biologist, in each of these scenarios?

Design new experiment

Try to find a new experiment where the models behave differently.
Note, you must do the same thing in both models, so that you can differentiate them.
There are many things you can test in order to find a prediction where the models are behaving differently. Some alternatives are listed below.

Observing a different state Even though the state we have been observing so far have been behaving in the same way in the two models, that might not be true for other states. Try to plot some different states and see if you can find one that behaves differently.

Manipulating the input signal If we perturb the system a bit, it might be possible that the models will behave differently. One way could be to change the strength of the input signal (essentially changing the concentration of the substrate), or to manipulate it more complexly by for example changing the input at certain time points. In the model, this can be done by adding a so called *event*.

To create an event called event1 (a name is necessary, but not important as long as there are no duplicates), where you set S = 1 when time > 4, add the following to the model file:

...
********** MODEL EVENTS
event1 = gt(time,4), S, 1



You can change the inequality by changing gt (greater than) to ge (greater or equal), lt (less than), or le (less of equal).
Note: the simulation time must be longer then when the event is meant to occur for the event to happen.

[ ] Checkpoint 7: What is a core prediction and why is it important? Do you think your prediction can be tested experimentally?

### Evaluate your prediction

Background

You show the experimental group your predictions, and they tell you that they can do a second stimulation at time point 4. They go about and perform the new experiment. Download the new experimental data from here or on LISAM.

Plot your prediction together with the validation data and evaluate the prediction with the cost and a $\chi^2$-test as before.

Validation

Plot your prediction together with the validation data and evaluate the prediction with the cost and a $\chi^2$-test as before.

[ ] Checkpoint 8: What do you want to tell the experimentalists?

### Summary and final discussion

You have now gone through the computer lab, and it is time to look back upon what you have actually done, as well as to discuss some points that might be easy to misunderstand. Below are some questions to discuss with the supervisor.

Some summarizing questions to discuss

You should know the answers to these questions, but if you do not remember the answer, see this part as a good repetition for the coming dugga.

1. Look back at the lab and, can you give a quick recap of what you did?
2. Why did we square the residuals? And why did we normalize with the SEM?
3. Why is it important that we find the "best" parameter sets when testing the hypothesis? Do we always have to find the best?
4. Can we expect to always find the best solution for our type of problems?
5. What are the problems with underestimating the model uncertainty?
6. If the uncertainty is very big, how could you go about to reduce it?
7. Why do we want to find core predictions when comparing one or more models?

This marks the end of the computer lab. We hope that you have learnt things that will be relevant in your future work!

# Extras

The following sections regard things that might be relevant for you in the future but is outside of the scope of the lab. However, do feel free to come back here and revisit them in the future.

For some models, you should do a steady-state simulation. A steady-state simulation is a simulation of the steady-state behavior of the model - when the states of the model are not changing (their derivatives are 0). When simulating a model, it might take some time for it to settle down into its steady-state. You want your model to be in steady-state before you start your actual simulation so that the simulation that you compare with data is only dependent on the processes that you want to study (i.e. the model structure), and not on the model finding its steady-state.

In practice, get the model to be in steady-state by simulating for a long time and then use the end values of that simulation as initial values to the actual simulation in SimulateExperiment.

SteadyState = model(longTime, [], params);

IC = SteadyState.statevalues(:,end); % these are your steady-state initial conditions
sim = model(time, IC, params); % this is the simulation that you compare with data



Note that you can do this steady-state simulation in a separate function, like this:

sequenceDiagram Main ->> SimulateExperiment: modelName, params, time SimulateExperiment ->> Main: ySim SimulateExperiment ->> SimulateSteadyState: modelName, params SimulateSteadyState ->> SimulateExperiment: IC
Introduction to profile-likelihood

In a profile-likelihood analysis, we plot one parameters value on the x-axis and the cost of the whole parameter set on the y-axis. Such a curve is typically referred to as a profile-likelihood curve. To get the full range of values for the parameter, it is common practice to fixate the parameter being investigated, by removing it from the list of parameters being optimized, and setting it explicitly in the cost function, and re-optimize the rest of the parameters. This is done several times, and each time the parameter is fixed to a new value, spanning a range to see how the cost is affected from changes in the value of this specific parameter (Figure 4). If the resulting curve is defined and passes a threshold for rejection in both the positive and negative direction (Figure 4A), we say that the parameter is identifiable (that is, we can define a limited interval of possible values), and if instead the cost stays low in the positive and/or the negative direction, we say that the parameter is unidentifiable (Figure 4B-D). This whole procedure is repeated for each model parameter.

Figure 4: Examples of identifiable (A) and unidentifiable parameters (B-D). The blue line is the threshold for rejection, the red line is the value of the cost function for different values of the parameter.

In a prediction profile-likelihood analysis, predictions instead of parameter values are investigated over the whole range of possible values. Predictions cannot be fixated, like parameters can. Therefore, the prediction profile-likelihood analysis is often more computationally heavy to perform than the profile-likelihood analysis. Instead of fixation, a term is added to the cost to force the optimization to find a certain value of the prediction, while at the same time minimizing the residuals. Resulting in an objective function something like the following:
$v\left(\theta\right)=\ \sum_{\forall t}\frac{{(y_t-\ {\hat{y}}_t\left(\theta\right))}^2}{{SEM}_t^2}+\lambda\left(p-\hat{p}\right)$
Again, the whole range of values of the predictions are iterated through. A prediction profile-likelihood analysis can be used in the search for core predictions.

A simple (but not ideal) approach

Here, we will not fixate one parameter at time and reoptimize, instead we will use the already collected data. This is worse than fixing the parameters, but we will do it for the sake of time. Furthermore, as opposed to the uncertainty plots before, we do not have to select some parameter sets, we can instead use all of our collected sets. The easiest way to do a profile-likelihood plot, is to simply plot the cost against the values of one parameter. Then repeat for all parameters.

To make it slightly more pretty, one can filter the parameters to only give the lowest cost for each parameter value. This can be done using the function unique. A good idea is likely to round your parameter values to something with less decimals. This can be done with round. Note that round only rounds to integers. To get a specific decimal precision, multiply and then divide the parameter values with a reasonable value. E.g.round(p*100)/100.

Changing figures

There are two main ways you can go about to change your figures to look nicer:

1. plot your figure first, and then make changes through the property inspector (Edit>Figure properties in the figure window), or

2. change the code for how you plot. This way, your figure will look like you want them to every time you run your plot scripts.

Some things that you can do to make your figure look nicer and easier to understand: