{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name below.\n",
    "\n",
    "Rename this problem sheet as follows:\n",
    "\n",
    "    ps{number of lab}_{your user name}_problem{number of problem sheet in this lab}\n",
    "    \n",
    "for example\n",
    "    \n",
    "    ps2_blja_problem1\n",
    "\n",
    "Submit your homework within one week until next Monday, 9 a.m."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "EMAIL = \"\"\n",
    "USERNAME = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Data Science\n",
    "## Lab 10: Verification of the Leave-One-Out cross-validation (LOOCV) magic formula"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this exercise, we want to verify the magic formula for the Leave-One-Out cross-validation known from the lecture, slide 229.\n",
    "\n",
    "According to this, the cross-validation estimate of the error is\n",
    "\n",
    "$$\n",
    "\\text{CV}^{\\text{LOOCV}}_{(n)} = \\frac{1}{n} \\sum_{i=1}^n \\left(\n",
    "\\frac{y_i - \\hat y_i}{1 - h_i}\n",
    "\\right)^2,\n",
    "\\quad \\text{with } \n",
    "h_i = \\frac{1}{n} + \\frac{(x_i - \\bar x)^2}{\\sum_{j=1}^n (x_j - \\bar x)^2}\n",
    "$$\n",
    "being the leverage statistic of observation $i$.\n",
    "\n",
    "The general formula for the CV estimate, which is valid for all cross-validition methods, is\n",
    "\n",
    "$$\n",
    "\\text{CV}_{(k)} = \\frac{1}{k} \\sum_{i=1}^{k} \\text{MSE}_i.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we generate an example to verify that $\\text{CV}^{\\text{LOOCV}}_{(n)} = \\text{CV}_{(n)}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "cd64dc79811e1c91b820e1b65bb1a503",
     "grade": false,
     "grade_id": "cell-5fb7e0b6a98e747e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "n = 20\n",
    "X = np.arange(n)\n",
    "np.random.seed(1)\n",
    "y = 4*X+0.1*np.random.randn(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: Fit a simple linear regression model on our data `X, y`. Here, it should be sufficient to use `numpy`'s `polyfit` function together with `polyval`.\n",
    "Store the regression coefficients in the variable `p`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "44110611c74841efab5e9e9bceff5415",
     "grade": false,
     "grade_id": "cell-a91bdd71d8619f08",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2f076ffa3ab83b4c95e0dcd0415eb707",
     "grade": true,
     "grade_id": "cell-af008d8fa468b94f",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert np.max(np.abs(p - np.array([ 3.99916958, -0.0054475 ]))) < 1e-7"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: Determine the predicted values of your data set and store them in the variable `ypred`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c468241c9f027deb93ff2f48fdb11a6d",
     "grade": false,
     "grade_id": "cell-b7707ff29de634ae",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ff4cb6ddef7dbadc312b05fcb9aa3c37",
     "grade": true,
     "grade_id": "cell-8ef2a783a2bee007",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert np.abs(ypred.mean() - 37.98666353635393) < 1e-8\n",
    "assert np.abs(ypred.std() - 23.06033677189789) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (2 points)**: Implement the function `magicCV` to compute the $\\text{CV}^{\\text{LOOCV}}_{(n)}$ and evaluate it for our example.\n",
    "Store the obtained value $\\text{CV}^{\\text{LOOCV}}_{(n)}$ in a variable `cvn1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2eb968392c8bb875108050e688fe475b",
     "grade": false,
     "grade_id": "cell-a81d6910b7f18622",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def magicCV(X, y, ypred):\n",
    "    \"\"\"Function to compute and return the\n",
    "    CV error estimate using the magic formula\n",
    "    from slide 229.\"\"\"\n",
    "    # Implement the magicCV function here\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()\n",
    "    \n",
    "# Evaluate the function for our example\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "b1902d1a8b43b35f240daf4f5cec7fa4",
     "grade": true,
     "grade_id": "cell-6216ebbea69b7d80",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert np.abs(cvn1 - 0.014723772434852839) < 1e-6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (2 points)**: Now, you should determine the value $\\text{CV}_{(n)}$ by performing simple linear regression fits in a LOOCV fashion.\n",
    "You should use the function `LeaveOneOut().split(...)` from `scikit-learn` to generate the test and training indices for each iterate, and update the quantity `cvn2` in each iterate.\n",
    "\n",
    "You should observe that both values `cvn1` and `cvn2` are almost identical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "fd8fbc89c30b1691b88b004d5b520cd7",
     "grade": false,
     "grade_id": "cell-e0cd0baa3b5ce671",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import LeaveOneOut\n",
    "cvn2 = 0.0\n",
    "for train_idx, test_idx in LeaveOneOut().split(X):\n",
    "    # Determine the mean-squared error \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "aa4509e77bfc108c324834f13d00bd60",
     "grade": true,
     "grade_id": "cell-915d5fbc022f1a91",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert np.abs(cvn2 - 0.014723772434852688) < 1e-6\n",
    "assert abs(cvn1-cvn2)  < 1e-12"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}