Introduction to Data Science (4 hours lecture, 2 hours exercises per week), Prof. Ernst, winter semester 2018/19
Content
Tentative list of course topics:
- Introduction: What is Data Science
- Learning Theory
- Regression
- Neural Networks
- Classification
- Clustering and Tree-Based Methods
- Support Vectors
- Unsupervised Learning
Notices
Extra lab session
In place of the lecture on Thursday, December 20, there will be an extra lab
session during the regular class time in the computer pool.
Class cancellations
There will be no class on Monday, December 17.
Rescheduled lecture
To make up for the lecture on December 10 lost to the railway strike, we will hold a lecture in place of the lab session (13:35-15:05) on Tuesday, December 11, in the computer pool.
Temporary lecture room change
On Thursday, November 1 and Monday, November 5, the lecture will take place in Room 2/N101.
New time for Lab Exercises
Beginning on Tuesday, October 23, 2018, labs will start 10 minutes earlier, i.e., at 13:35.
Cancellation
There will be no lab exercise session on Tuesday, October 16.
Note
To participate in the lab exercises, all students need an account with the MRZ (Mathematics Computing Center). If you do not have one yet, please apply for one by following this page and collect your login credentials from Ms. Margit Matt (Rh39, Room 704).
First Lab Exercises
Tuesday, October 9, 2018.
First Class
Monday, October 8, 2018.
This course is listed in the electronic Vorlesungsverzeichnis (course directory).
Lecture
Literature
- James, Witten, Hastie & Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, 2013. Available online at this page.
- A continually updated, annotated reading list for the course (last updated 16.01.2019).
Slides
- What is Data Science? (05.02.2019)
- Learning Theory (05.02.2019)
- Linear Regression (05.02.2019)
- Classification (05.02.2019)
- Resampling Methods (05.02.2019)
- Linear Model Selection and Regularization (05.02.2019)
- Nonlinear Regression Models (05.02.2019)
- Tree-Based Methods (05.02.2019)
- Support Vector Machines (05.02.2019)
- Unsupervised Learning (05.02.2019)
- All slides (05.02.2019)
Exercises
Installation of the programming environment under Linux (64-bit)
If you want to do the homework on your own computer, you can clone the programming environment used in the labs. Get Miniconda from this web page and follow the steps in the installation dialogue. Next, download the specification file spec-file.txt used in the labs and create a conda environment (under Linux):
conda create --name DS2018 --file spec-file.txt
Installation of the programming environment under Windows and macOS
Download Miniconda for your operating system by following this link and follow the installation instructions. Next, download the yml file containing the packages used in the labs and create a conda environment in a Miniconda/Anaconda shell:
conda env create -f DS2018.yml
If your plots are not displayed in the browser, a package might be missing.
After activating the correct environment, the following command might help in some cases:
python -m ipykernel install --user
Please refer to Conda (Installation under Windows, Linux and MacOS) and Conda (Managing environments) for further information.
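When plots or imports fail in a notebook, one possible cause is that the notebook kernel is not running in the DS2018 environment. The following is a minimal sketch you could paste into a notebook cell to check this. The helper names kernel_env_name and check_env are hypothetical, and the code assumes that conda stores each named environment in a directory of the same name (as `conda create --name DS2018` does), so the last component of `sys.prefix` is the environment name; this assumption may not hold for the environment sourced in the computer pool.

```python
import os
import sys


def kernel_env_name():
    """Return the name of the environment the running interpreter belongs to.

    Assumption: conda creates each named environment in a directory with the
    same name (e.g. .../envs/DS2018), so the last component of sys.prefix is
    the environment name. This may not hold for the lab setup script.
    """
    return os.path.basename(os.path.normpath(sys.prefix))


def check_env(expected="DS2018"):
    """Print a short report and return True if the kernel runs in `expected`."""
    name = kernel_env_name()
    if name == expected:
        print(f"OK: kernel runs in environment '{name}'.")
        return True
    print(f"Warning: kernel runs in '{name}', expected '{expected}'.")
    print(f"Interpreter in use: {sys.executable}")
    return False


if __name__ == "__main__":
    check_env()
```

The sketch inspects `sys.prefix` rather than the `CONDA_DEFAULT_ENV` variable because a kernel inherits environment variables from the shell that started the notebook server, so that variable can name a different environment than the interpreter actually running the kernel.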
Material
To start the Jupyter notebooks, open a terminal and source our conda environment DS2018 via
source /LOCAL/Software/DataScience2018/setup_env
Next, change to your exercise folder and download the Jupyter notebook into it (right-click and "Save link as").
Finally, start the notebook with the following command (make sure the prompt shows (DS2018) in front of your username):
jupyter notebook Problem_Sheet_XX.ipynb
- Problem sheet 1
- Problem sheet 2
- Problem sheet 3 (jupyter notebook including homework 3)
- Solution to Problem sheet 3
- Problem sheet 4 (jupyter notebook)
- Solution to Problem sheet 4
- Homework 4 (jupyter notebook)
- Solution to Homework 4 (jupyter notebook)
- Problem sheet 5 (jupyter notebook)
- Solution to Problem sheet 5 (jupyter notebook)
- Homework 5 (jupyter notebook)
- Solution to Homework 5 (jupyter notebook)
- Problem sheet 6 (jupyter notebook)
- Solution to Problem sheet 6 (jupyter notebook)
- Homework 6 (jupyter notebook)
- Introduction to R (R script by Vincent Rost)
- Problem sheet and homework 7 (jupyter notebook)
- Solution to Problem sheet 7 (jupyter notebook)
- Problem sheet 8 (jupyter notebook)
- Solution to Problem sheet 8 (jupyter notebook)
- Homework 8 (jupyter notebook)
- Solution to Homework 8 (jupyter notebook)
- Problem sheet 9 (jupyter notebook)
- Solution to Problem sheet 9 (jupyter notebook)
- Homework 9 (jupyter notebook)
- Solution to Homework 9 (jupyter notebook)
- Problem sheet 10 (jupyter notebook)
- Solution to Problem sheet 10 (jupyter notebook)
- Using GEO data in Python (jupyter notebook by Thomas Kranzkowski)
- Homework 10 (jupyter notebook)
- Solution to Homework 10 (jupyter notebook)
- Problem sheet 11 (jupyter notebook)
- Solution to Problem sheet 11 (jupyter notebook)
- Problem sheet 12 (jupyter notebook)
- Solution to Problem sheet 12 (jupyter notebook)
- Homework 12 (jupyter notebook)
- Solution to Homework 12 (jupyter notebook)
- Problem sheet 13 (jupyter notebook)