{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Data Science\n",
    "## Lab 14: Nonlinear regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part A: Spline regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this problem, you will implement a general spline regression.\n",
    "Recall that a spline of degree $d$ is a piecewise polynomial of degree $d$ with continuity of derivatives of orders $0, 1, \\ldots, d-1$.\n",
    "\n",
    "In the case of a cubic spline ($d = 3$) with $K$ knots, this regression function can be expressed by\n",
    "\n",
    "$$ f(x_i) = \\beta_0 + \\beta_1 \\, b_1(x_i) + \\ldots + \\beta_{K+3} \\, b_1(x_i) $$\n",
    "\n",
    "using appropriate basis functions.\n",
    "\n",
    "From Slide 355, you know that we can start off with monomials $x, x^2, x^3$ and then add for each knot $\\xi$ one **truncated monomial**\n",
    "\n",
    "$$ h(x,\\xi) = (x-\\xi)_+^3 = \\begin{cases} (x - \\xi)^3 & \\text{if } x > \\xi, \\\\ 0 & \\text{otherwise.} \\end{cases} $$\n",
    "\n",
    "\n",
    "A one-dimensional example is provided below.\n",
    "Shown are 100 samples of an artificial data set together with their (unknown) population line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Implement 'unknown' function\n",
    "def y_func(x):\n",
    "    return np.sin(8*x) / np.exp((5*x))\n",
    "\n",
    "# Number of samples\n",
    "n = 100\n",
    "\n",
    "# Degree of randomness\n",
    "eps = 0.1\n",
    "np.random.seed(1)\n",
    "\n",
    "# Draw samples\n",
    "x = np.random.rand(n)\n",
    "y = y_func(x) + eps * np.random.randn(n)\n",
    "\n",
    "# Plot samples\n",
    "plt.scatter(x, y, label='Samples')\n",
    "\n",
    "# Plot population line\n",
    "xlin = np.linspace(0,1,100)\n",
    "plt.plot(xlin, y_func(xlin),'r--', label='Population line')\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Complete the implementation of the function `generateSplineRegressionMatrix`\n",
    "that implements spline regression of degree $d$.\n",
    "Each column should contain one of the basis functions evaluated in all \n",
    "of the knots $\\xi_i$, $i=1,\\ldots,K$.\n",
    "\n",
    "**Hint**: If you have no idea how to get started, have a look at **Part B**. There, we perform a **global cubic regression**.\n",
    "This can be adapted to perform a **cubic spline regression**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "f286726b154df7207bd2aa7db5d30fec",
     "grade": false,
     "grade_id": "cell-0751201643d9b229",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def generateSplineRegressionMatrix(x, xi, d = 3):\n",
    "\n",
    "    # Get number of samples\n",
    "    n = len(x)\n",
    "    \n",
    "    # Get number of knots\n",
    "    K = len(xi)\n",
    "    \n",
    "    # Initialize matrix with predictor variables 1\n",
    "    X = np.ones((n,1))\n",
    "\n",
    "    # TASK: Append columns for the monomials x^1, ..., x^d\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()\n",
    "        \n",
    "    # TASK: Append one column for each knot using the \n",
    "    # truncated monomial (x - xi_k)_+^d\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()\n",
    "    return X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8a912089b5c3cf007827d935df5cd749",
     "grade": true,
     "grade_id": "cell-241da47f399a0327",
     "locked": true,
     "points": 0,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "xi = np.linspace(0,1,4)[1:-1]\n",
    "X = generateSplineRegressionMatrix(x, xi, 2)\n",
    "assert abs(X.mean() - 0.3831258879866572) < 1e-8\n",
    "\n",
    "xi = np.linspace(0,1,6)[1:-1]\n",
    "X = generateSplineRegressionMatrix(x, xi, 4)\n",
    "assert X.shape == (100,9)\n",
    "assert abs(X.std() - 0.3610267467203577) < 1e-9"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Estimate the function `y_func` using **cubic** spline regression with $K = 4$ equidistant knots, i.e., $\\xi_1=0.2, \\xi_2=0.4, \\xi_3=0.6, \\xi_4=0.8$.\n",
    "Store your regression matrix as `X`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "258f968df64f55d60990e40f4d1d1377",
     "grade": false,
     "grade_id": "cell-1356bfd72144750b",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6a10832bf1d1e890cf504bbd9e9b1511",
     "grade": true,
     "grade_id": "cell-76f96871567adea9",
     "locked": true,
     "points": 0,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(X.mean() - 0.2732634981273543) < 1e-8\n",
    "assert X.shape == (100,8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Solve the regression problem and plot the regression line together with the population line and the data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ea8347123eacf74fc8f723dbfabfa315",
     "grade": false,
     "grade_id": "cell-a6fd26c3954d4415",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part B: Global spline regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hint**: If you have no idea how to get started, the following code cells might help you.\n",
    "They perform a **global cubic regression** and can be adapted to perform a **cubic spline regression**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generatePolynomialRegressionMatrix(x, d = 3):\n",
    "    \n",
    "    # Get number of samples\n",
    "    n = len(x)\n",
    "    \n",
    "    # Initialize matrix with predictor variables 1 (for the intercept)\n",
    "    X = np.ones((n,1))\n",
    "\n",
    "    # Append columns for the monomials x^1, ..., x^d\n",
    "    for i in range(d):\n",
    "        X = np.hstack((X,np.power(x,i+1).reshape(n,1)))\n",
    "    \n",
    "    return X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set degree of spline regression\n",
    "d = 3\n",
    "\n",
    "# Generate Spline matrix\n",
    "X = generatePolynomialRegressionMatrix(x, d)\n",
    "\n",
    "# Solve the normal equation, remember the @ sign\n",
    "# performs the ordinary matrix multiplication for numpy arrays.\n",
    "# You should also avoid to invert the matrix X^T * X!\n",
    "\n",
    "beta = np.linalg.solve(X.T @ X, X.T @ y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(x, y, label='Samples')\n",
    "\n",
    "Xreg = generatePolynomialRegressionMatrix(xlin,d)\n",
    "\n",
    "plt.plot(xlin, Xreg @ beta, 'g-', label = 'Regression line')\n",
    "plt.plot(xlin, y_func(xlin), 'r--', label='Population line')\n",
    "plt.legend();"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}