{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5e9cf430",
   "metadata": {},
   "source": [
    "# Mathematical details\n",
    "\n",
    "To simulate times series with multivariate effects, the users has to minimally provide:\n",
    "\n",
    "- ``X``: design matrix ($n_{trials} \\times n_{conditions}$)\n",
    "- ``effects``: specifies the ``condition`` for which the effect should be present, its time ``windows`` and the ``effect_size`` ($\\Delta$)\n",
    "- ``noise_std``: the additive noise within each subject ($\\sigma^2$)\n",
    "- ``n_channels``: the number of channels/sensors\n",
    "- ``tmin, tmax, sfreq``: the timing information of each trial required to determine the number of samples ($n_t$) per trials \n",
    "- ``ch_cov``: channel by channel covariance ($\\Sigma$)\n",
    "\n",
    "Based on this inputs, time series data with multivariate effects are simulated using a multivariate general linear model (GLM). For each subject $s$, the simulated data matrix $\\boldsymbol{Y_s}\\in\\mathbb{R}^{n_{samples} \\times n_{channels}}$ is generated as:\n",
    "\n",
    "$$\n",
    "\\mathbf{Y}_s = \\mathbf{X}_{full} \\mathbf{B}_s + \\mathbf{1} \\boldsymbol{\\beta}_{0,s}^\\top + \\boldsymbol{\\varepsilon}_s,\n",
    "$$\n",
    "\n",
    "where $\\mathbf{X}_{full}$ is the full design matrix, $\\mathbf{B}_s$ is the subject specific matrix of $\\beta$ regression coefficient, $\\boldsymbol{\\beta}_{0,s}$ is the channel specific intercept for each subject, and $\\boldsymbol{\\varepsilon}_s$ is the subject specific multivariate additive noise matrix. \n",
    "\n",
    "The full design matrix $\\mathbf{X}_{full}$ is obtained by taking the Kronecker product between the user specified trial-wise design matrix $X \\in \\mathbb{R}^{n_{trials}\\times n_{conditions}}$ (i.e. ``X``) and the identity matrix $\\mathbf{I}_t \\in \\mathbb{R}^{nt \\times nt}$ where $n_t$ is the number of sample per trials (derived from ``tmin, tmax, sfreq``):\n",
    "\n",
    "$$\\mathbf{X}_{full} = \\mathbf{X}\\bigotimes \\mathbf{I}_t$$\n",
    "\n",
    "yielding a $\\mathbf{X}_{full}$ with $N_{samples}=n_{trials}\\times n_t$ rows-one for every trial–time pair-and $n_{conditions} \\times n_t$ columns-one for every condition–time pair.\n",
    "\n",
    "The intercept is sampled independently for every channel,\n",
    "\n",
    "$$\\boldsymbol{\\beta}_{0,s} \\sim \\mathcal{N(0, \\sigma^2)} \\in \\mathbb{R}^{n_{channels}}$$\n",
    "\n",
    "and the subject specific noise $\\boldsymbol{\\varepsilon}_s $ is drawn from a multivariate normal distribution with spatial covariance $\\mathbf{\\Sigma}$:\n",
    "\n",
    "$$\n",
    "\\boldsymbol{\\varepsilon}_s \\sim \\mathcal{N}(0, \\sigma^2 \\mathbf{\\Sigma})\n",
    "$$\n",
    "\n",
    "The $\\mathbf{B}_s\\in\\mathbb{R}^{(n_{conditions} \\times n_t)\\times n_{channels}}$​ stacks, row-wise, the time-resolved regression coefficients (β-weights) for every condition. Each row therefore corresponds to a specific condition–time pair, and each column to a sensor or feature. To embed a multivariate effect at selected time points, we draw a random spatial pattern from a standard normal distribution $v \\sim \\mathcal{N}(0, \\mathbf{I})$ across channels, and rescale it by a constant $a$ so that its Mahalanobis length equals the user-requested effect size (``effect_size``, see below). Rows of $\\mathbf{B}_s$ that correspond to the chosen condition-times windows are then set to $av^\\top$.\n",
    "\n",
    "## Effect size\n",
    "\n",
    "Effect sizes are simulated based on the Mahalanobis distance ({cite}`mclachlan1999mahalanobis`, {cite}`mahalanobis1930tests`), which is a multivariate generalization of the standard z-score, taking into account the covariance structure of the data. For two classes, the Mahalanobis distance is defined as:\n",
    "\n",
    "$$\\Delta =  \\sqrt{(\\mu_{1} - \\mu_{2})^{T}\\Sigma^{-1}(\\mu_{1} - \\mu_{2})}$$\n",
    "\n",
    "- $\\mu_{1}$ is the mean of the first condition\n",
    "- $\\mu_{2}$ is the mean of the second condition\n",
    "- $\\Sigma$ is the covariance matrix\n",
    "- $T$ denotes matrix transpose\n",
    "\n",
    "In our specific case, as the additive noise is multiplied by the covariance matrix to generate the final data, the effect size is equal to:\n",
    "\n",
    "$$\\Delta =  \\sqrt{(\\mu_{1} - \\mu_{2})^{T}(\\sigma^{2}\\Sigma)^{-1}(\\mu_{1} - \\mu_{2})}$$\n",
    "\n",
    "Which simplifies to:\n",
    "\n",
    "$$\\Delta =  \\frac{1}{\\sigma}\\sqrt{(\\mu_{1} - \\mu_{2})^{T}\\Sigma^{-1}(\\mu_{1} - \\mu_{2})}$$\n",
    "\n",
    "To simulate data with the require effect size $\\Delta$, we generate a random vector $\\tilde v$ by drawing a random number from a standard normal distribution for each channel (which are used as the $\\mathbf{B}$ of our generative GLM). We then normalize that vector to have a Mahalanobis length of 1:\n",
    "\n",
    "$$ v = \\frac{\\tilde v}{\\sqrt{(\\tilde v)^{T}\\Sigma^{-1}(\\tilde v)}}$$\n",
    "\n",
    "We can scale up or down that vector by a constant $a$ and place the centroid of each class on each side thereof to generate a pattern of the desired effect size. Therefore, the Mahalanobis distance of our effect is:\n",
    "\n",
    "$$\\Delta =  \\frac{1}{\\sigma} \\sqrt{av^{T}\\Sigma^{-1}av}$$\n",
    "\n",
    "Which simplifies to:\n",
    "\n",
    "$$\\Delta =  \\frac{a}{\\sigma} \\sqrt{v^{T}\\Sigma^{-1}v}$$\n",
    "\n",
    "Where: \n",
    "- $v$ is a random vector of Mahalanobis length of 1 (i.e. Mahalanobis unit length vector)\n",
    "- $a$ is a constant to scale the effect up or down to achieve the desired effect size.\n",
    "\n",
    "As $v$ is of unit length, the term $\\sqrt{v^{T}\\Sigma^{-1}v}$ is equal to 1. Accordingly, the equation simplifies to:\n",
    "\n",
    "$$\\Delta =  \\frac{a}{\\sigma}$$\n",
    "\n",
    "Accordingly, to generate a multivariate pattern of the desired effect size, we have to resolve $a$ for $||av||_{\\Sigma^{-1}}$ and $sigma$, which gives:\n",
    "\n",
    "$$a = \\Delta * \\sigma$$\n",
    "\n",
    "where: \n",
    "- $\\Delta =$``effect_size``\n",
    "- $\\sigma$=``noise_std``\n",
    "\n",
    "By multplying our vector $v$ by the constant $a$, we ensure that the distance between the two classes matches the desired effect size.\n",
    "\n",
    "### Effect size and decoding accuracy\n",
    "\n",
    "With equal class covariances, a Bayes-optimal linear classifier achieves:\n",
    "\n",
    "$$\\Phi(-\\frac{1}{2}d')$$\n",
    "\n",
    "Where $\\Phi$ is the normal distribution cummulative distribution function ({cite}`mclachlan1999mahalanobis`, {cite}`mclachlan2005discriminant`). Accordingly, the maximal theoretical decoding accuracy is equal to:\n",
    "\n",
    "$$1 - \\Phi(-\\frac{1}{2}d')$$\n",
    "\n",
    "Thus an effect size of $d'=0.5$ implies a theoretical ceiling of ≈ 69 % accuracy, $d'=1$ gives ≈ 84 %, and so on.  By scaling the injected pattern according to the formula above, **multisim** ensures that simulated data respect this relationship irrespective of the number of channels or their covariance.\n",
    "\n",
    "\n",
    "## References\n",
    "```{bibliography}\n",
    ":style: unsrt\n",
    ":filter: docname in docnames\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}