{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name below.\n",
    "\n",
    "Rename this problem sheet as follows:\n",
    "\n",
    "    ps{number of lab}_{your user name}_problem{number of problem sheet in this lab}\n",
    "    \n",
    "for example\n",
    "    \n",
    "    ps2_blja_problem1\n",
    "\n",
    "Submit your homework within one week until next Monday, 9 a.m."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "EMAIL = \"\"\n",
    "USERNAME = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "da465154543cb14b6a58467e18d23899",
     "grade": false,
     "grade_id": "cell-09e40ef55e74ddbf",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Introduction to Data Science\n",
    "## Lab 8: Cross-validation for a diabetes data set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "96c70e1e5d52ad0fd39374d921d1f24e",
     "grade": false,
     "grade_id": "cell-6a9c350b8253c1c6",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "### Part A: Importing the data set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "cca43054d388b13e54a7d7219a290adf",
     "grade": false,
     "grade_id": "cell-e9f33802d3e4ed8f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The diabetes data set contains ten measurements (age, sex, body mass index, average blood pressure, and six blood serum measurements) for each of the `n = 442` patients.\n",
    "\n",
    "The response variable is a quantitative measure of disease progression one year after baseline.\n",
    "\n",
    "**Task**: The data set is part of scikit learn, you can import it by executing the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "60a65c36d4e824c587423b2ff9770ba4",
     "grade": false,
     "grade_id": "cell-64880bb943a3eeb3",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "diabetes = datasets.load_diabetes()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "33e371c44f82b5e59167810ad53e13d6",
     "grade": false,
     "grade_id": "cell-233004448533d63f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Here, `diabetes` will be a dictionary.\n",
    "A *dictionary* is an unordered collection of items.\n",
    "While other compound data types have only `values` as elements (a *list* for example), a dictionary consists of `key: value` pairs.\n",
    "\n",
    "**Task**: You can return the keys using the method `.keys()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "56a1c617b946cc02d136ac354ce8580d",
     "grade": false,
     "grade_id": "cell-230736cbe8e06f83",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "diabetes.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "91781a52b045bf88ce5a8a43053bd49a",
     "grade": false,
     "grade_id": "cell-f0857d4c565e2188",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Here, you find that the *dictionary* diabetes contains the keys\n",
    "\n",
    "    'data', 'target', 'DESCR', 'feature_names', 'data_filename', 'target_filename'\n",
    "\n",
    "Since `DESCR` sounds like description, we print its *value* by the following command\n",
    "    \n",
    "    print(diabetes.DESCR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "720b643071033e63bb59cadd6d81b9b2",
     "grade": false,
     "grade_id": "cell-d6d1064cfde4400b",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "print(diabetes.DESCR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "5a2b4d9ac718e0d022221030fa2e281b",
     "grade": false,
     "grade_id": "cell-9d7423f254773e52",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Your first task will be to create a `pandas.DataFrame` to hold this information.\n",
    "\n",
    "**Task (2 points)**:\n",
    "Create a pandas data frame `X` holding the ten predictor variables. You should name the columns in the data frame using the optional argument `columns=cols`, where `cols` is given by\n",
    "    \n",
    "    cols = [\"age\", \"sex\", \"bmi\", \"map\", \"tc\",\n",
    "            \"ldl\", \"hdl\", \"tch\", \"ltg\", \"glu\"]\n",
    "            \n",
    "Store the response variables as an numpy array `y`\n",
    "\n",
    "**Hint**:\n",
    "As in the iris data set, the diabetes data set is as a python dictionary.\n",
    "The predictor variables can be accessed by `diabetes.data`, the responses via `diabetes.target`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "171e20fd57fab2b905c00bf35dc7604b",
     "grade": false,
     "grade_id": "cell-db36fecb18abea08",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8af74716c4794ab6a3efb2e5d1eca262",
     "grade": true,
     "grade_id": "cell-29c0a4f3fa57f9b3",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert X.shape == (442,10)\n",
    "assert all(X.columns == [\"age\", \"sex\", \"bmi\", \"map\", \"tc\", \"ldl\", \"hdl\", \"tch\", \"ltg\", \"glu\"])\n",
    "assert abs(X.age.mean()) < 1e-10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "e0217ba271e8f6a96547a693e814d8eb",
     "grade": false,
     "grade_id": "cell-671c2584a7cbb043",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "In the following, we want to try two different estimation approaches:\n",
    "1. At first, we use a plain training-test set approach, where we exclude $1/5$ of the data from training.\n",
    "2. Our second approach is to estimate $5$ different models using 5-fold cross-validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "f04e342ee40d9681f67a58634bb6defc",
     "grade": false,
     "grade_id": "cell-33c796b1ed15517f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "### Part B: Simple splitting into training and validation set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "b206583ebc778f1220606cb32f5d5236",
     "grade": false,
     "grade_id": "cell-c1f1f7d6b17407a1",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "In this part, we want to train a linear model using a subset of our samples.\n",
    "We have done this by hand so far, but there are also methods provided by `sklearn` which will do this work for us.\n",
    "\n",
    "Use the function `train_test_split` from the module `sklearn.model_selection` to divide your data inta a training and a validation set. SInce this selection is made randomly, you should set the optional input `random_state` to fix the seed of the random number generator to ensure comparability, e.g., by setting `random_state = 1`.\n",
    "\n",
    "**Task (1 point)**: Split your data into a training and a test set using the function `train_test_split`.\n",
    "Your *test set should contain 20\\% of the data*.\n",
    "Use `random_state=1`.\n",
    "Store your sets in variables `X_train, X_test, y_train, y_test`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "b6e437dfd4263f73d0772f81208be3d3",
     "grade": false,
     "grade_id": "cell-253d355e804f66c1",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5ff33f7c9565e8eee8833ae10d020ded",
     "grade": true,
     "grade_id": "cell-5c8e4e639d5436be",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert X_train.shape == (353,10)\n",
    "assert y_test.shape == (89,)\n",
    "assert abs(y_test.mean() - 147.20224719101122) < 1e-8\n",
    "assert abs(X_test.age.var() - 0.0023730443017513166) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**:\n",
    "Fit a *LinearRegression model* to your **training** data.\n",
    "Use the appropriate method from `sklearn`.\n",
    "\n",
    "Use your model to predict the response on the test set and store your prediction in a variable `test_pred`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6a3ba410a65a7e6c817eb978e4109b80",
     "grade": false,
     "grade_id": "cell-9d2306f93189f453",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3630d23a04894298c3b05029de0b157f",
     "grade": true,
     "grade_id": "cell-44c6b5d6f023a572",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(test_pred.mean() - 143.7088817962804) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "a8cd7f8f8f968b6f11c13d01d023d564",
     "grade": false,
     "grade_id": "cell-cfb671f87ebfc359",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Until now, our plots were always of the type predictor against response or against regression line.\n",
    "Another way to display the quality of a regression fit is to plot the true values against the predicted values.\n",
    "The closer the values are to the identity $f(x) = x$, the better the fit.\n",
    "\n",
    "**Task (2 points)**:\n",
    "Produce a scatterplot of the true values in the validation response against the predicted values. Draw also a line corresponding to the *ideal prediction*, i.e., each prediction is equal to its true value.\n",
    "Label the axes accordingly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8ebe308a38f31a94c988dc7ab0c147ed",
     "grade": true,
     "grade_id": "cell-0d7703946fd94797",
     "locked": false,
     "points": 2,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "4e9e52aed23b79c9f26595cfa240628e",
     "grade": false,
     "grade_id": "cell-0bfc37e71a4e2058",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task (1 point)**: Compute the mean squared error $\\text{MSE}_\\text{val}$ on the validation set.\n",
    "You can either use the method `mean_squared_error` from the module `sklearn.metrics`, or you can implement it by yourself.\n",
    "Store the mean squared error in a variable `mse_test`."
   ]
  },
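  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reminder, one common way to write the mean squared error is sketched below. The symbols $n_\\text{val}$ (number of validation samples), $y_i$ (true response) and $\\hat{y}_i$ (predicted response) are notation assumed here and are not defined elsewhere in this sheet.\n",
    "\n",
    "$$\\text{MSE}_\\text{val} = \\frac{1}{n_\\text{val}} \\sum_{i=1}^{n_\\text{val}} \\left(y_i - \\hat{y}_i\\right)^2$$\n",
    "\n",
    "With the split from above, the validation set holds $n_\\text{val} = 89$ samples."
   ]
  },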
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a835cb6a9fe99c014d786c4b3224118c",
     "grade": false,
     "grade_id": "cell-0cf189f73fe1b8bb",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "10b811c5dee79e74c1905c38bf9bb2bb",
     "grade": true,
     "grade_id": "cell-39655950c0337042",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(mse_test - 2992.5576814529445) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "302266d9dfa009268d1ed0539321cb42",
     "grade": false,
     "grade_id": "cell-6e3f79f47f759691",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task (1 point)**: What is the proportion of variability that is explained by this linear fit. Store your answer in a variable `expl_var`.\n",
    "\n",
    "*Remember*: A `LinearRegression` has a method that computes exactly this."
   ]
  },
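  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The proportion of explained variability is usually called the $R^2$-score. A sketch of its definition follows, assuming the notation $y_i$ for the true responses, $\\hat{y}_i$ for the predictions and $\\bar{y}$ for the mean of the true responses in the evaluated set (none of these symbols is fixed by this sheet).\n",
    "\n",
    "$$R^2 = 1 - \\frac{\\sum_{i} \\left(y_i - \\hat{y}_i\\right)^2}{\\sum_{i} \\left(y_i - \\bar{y}\\right)^2}$$\n",
    "\n",
    "A value of $1$ corresponds to a perfect prediction, while a model that always predicts $\\bar{y}$ scores $0$."
   ]
  },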
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "35c5865db1c7dbf69560091ebb9f6c0d",
     "grade": false,
     "grade_id": "cell-02da9e3b706ff219",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5a5f4ede57677febecab982bb1b518cd",
     "grade": true,
     "grade_id": "cell-cd94b31e3414158f",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(expl_var - 0.43843604017332694) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "2895bb3f81c3b25c143ac1e3b17dcbac",
     "grade": false,
     "grade_id": "cell-63670992fba16704",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "### Part C: K-Fold Cross-Validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "8995999bcba98f079fe9ed13c47427a7",
     "grade": false,
     "grade_id": "cell-e8c06d868e91a629",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Next, we want to use cross-validation to select our model.\n",
    "Scikit-learn is a powerful library and possesses numerous modules and functions.\n",
    "Here, we explore the function `cross_val_score`, which can be imported by\n",
    "\n",
    "    from sklearn.model_selection import cross_val_score\n",
    "    \n",
    "This function performs K-fold cross-validation and returns a score for each fold (this is the $R^2$-score by default).\n",
    "    \n",
    "**Task**: Please read the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score) and import the function `cross_val_score`."
   ]
  },
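  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, here is a sketch of what K-fold cross-validation computes. The notation $F_1, \\dots, F_K$ for the folds and $s_k$ for the fold scores is assumed for illustration only. The $n$ samples are partitioned into $K$ disjoint folds; for each $k$, the model is fitted on all samples outside $F_k$ and scored on $F_k$, which gives $K$ scores. A common summary is their mean:\n",
    "\n",
    "$$\\text{CV}_{(K)} = \\frac{1}{K} \\sum_{k=1}^{K} s_k$$\n",
    "\n",
    "For $n = 442$ and $K = 5$ the folds cannot all have the same size, since $442 = 5 \\cdot 88 + 2$. If the folds are as equal in size as possible, two of them hold $89$ samples and three hold $88$."
   ]
  },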
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "df0129b8049eac98c44bf8eb16284407",
     "grade": false,
     "grade_id": "cell-13e697d4b0ee21f2",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import cross_val_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "f7ec837aa594008a7463680567e93b59",
     "grade": false,
     "grade_id": "cell-b40037e4f1431811",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The functions expects as a first argument an `estimator`.\n",
    "We are informed by the documentation that this should be an \"estimator object implementing \\[the method\\] ‘fit’\".\n",
    "\n",
    "This is fulfilled by all estimation methods used so far (e.g. linear models, logistic regression, LDA).\n",
    "In the case of a linear regression fit, this could be\n",
    "    \n",
    "    model = linear_model.LinearRegression()\n",
    "\n",
    "**Task (1 point)**: Perform a 5-fold cross-validation for a linear model on the diabetes data set and print the scores.\n",
    "Store the output of the function `cross_val_score` in a variable `cv_scores`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "278b6e777f3d6fb0e14441fe1cb94773",
     "grade": false,
     "grade_id": "cell-c553fd69d4271177",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d25ec6da7f8965685f11c6ebb2073562",
     "grade": true,
     "grade_id": "cell-99d619405f87a651",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert (cv_scores.mean() - 0.48231812211149394) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "6ea091c1daace19403d9e9849cfc441d",
     "grade": false,
     "grade_id": "cell-1243681c5a0df9e3",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task (1 point)**: Use the function `cross_val_predict` in the module `sklearn.model_selection` to make prediction on the diabetes data set.\n",
    "Store your answer in a variable `cv_pred`. Use again 5 folds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ebefdea203e886bf52e1ba3b784ceb34",
     "grade": false,
     "grade_id": "cell-007de9e596ef7ff7",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "f5cda670c79c6416ec792129c62ac9c4",
     "grade": true,
     "grade_id": "cell-c0de606ebc88d567",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert cv_pred.shape == (442,)\n",
    "assert abs(cv_pred.mean() - 151.7873610258396) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "d963193ea6d66d843c72229747ac4cad",
     "grade": false,
     "grade_id": "cell-91ce7cdbeca42c0e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task (1 point)**: Make a scatterplot of the true values in the test response against the predicted values similar to the one in **Part B**, but now using all of the data. Label the axes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "739cceb0711dfd4d999913729d695c13",
     "grade": true,
     "grade_id": "cell-688b3f2f9f7696d3",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "cb010681074690de7cbe85a135d2429a",
     "grade": false,
     "grade_id": "cell-f37498936a09e60f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task (1 point)**: Compute the $R^2$-score this model and store it in a variable `accuracy`. You can use the function `r2_score` from the module `sklearn.metrics`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9edd8eb48fb829f877d9c890099b64bb",
     "grade": false,
     "grade_id": "cell-924703d0d9799788",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "print(\"Cross-validated Accuracy:\", accuracy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0ffb8018ee2ae036e1c3f47e31550511",
     "grade": true,
     "grade_id": "cell-bec19fb2b666940c",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(accuracy - 0.49532382463572844) < 1e-8"
   ]
  },
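  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that this pooled score need not equal the mean of the five fold scores computed earlier. A sketch of the reason, using assumed notation ($F_k$ for fold $k$, $\\bar{y}_{F_k}$ for the mean response within fold $k$, and $\\bar{y}$ for the mean response over all $n$ samples):\n",
    "\n",
    "$$R^2_\\text{pooled} = 1 - \\frac{\\sum_{k=1}^{K} \\sum_{i \\in F_k} \\left(y_i - \\hat{y}_i\\right)^2}{\\sum_{i=1}^{n} \\left(y_i - \\bar{y}\\right)^2}, \\qquad R^2_k = 1 - \\frac{\\sum_{i \\in F_k} \\left(y_i - \\hat{y}_i\\right)^2}{\\sum_{i \\in F_k} \\left(y_i - \\bar{y}_{F_k}\\right)^2}$$\n",
    "\n",
    "The pooled score normalizes by the variability around the overall mean, while each fold score normalizes by the variability within its own fold, so averaging the $R^2_k$ generally gives a different number."
   ]
  },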
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "b3db316ac354719bddc3db6578e7e5f9",
     "grade": false,
     "grade_id": "cell-d59732cd980a7b79",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Caution**: Altough this $R^2$-score is higher than the score for the training/validation set split, they are not really comparable since we computed them on different subsets of the data.\n",
    "To get a more reliable comparison, we must keep part of the data as a so-called *hold-out* data set to be used for estimating the true learning error."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}