6 Exercise Checklist

This checklist should help you keep track of your exercises. Remember that you have to hand in satisfactory solutions to at least two thirds of the exercises. If you’re on the beginner track, this refers to two thirds of Part A (EDA) only. If you’re on the advanced track, you have to hand in at least two thirds of Part A and at least two thirds of Part B. Hence, you cannot hand in 100 percent of the first part and only 50 percent of the second one: you need at least two thirds in each part to earn a certificate. After all, you’re not really that advanced if you only did half of it, right?

Part A: Exploratory Data Analysis (Beginner and Advanced)

  1. Visualize Google Searches
    • Change the format of the dates
    • Plot the time series for “coronavirus” Google searches
    • Find out what happened on the date the spike occurred
  2. Visualize the stock market data
    • Change the names of the stock symbols
    • Plot the stocks in different panels and add vertical lines
  3. COVID-19 Visualization
    • Create a line plot of the worldwide development of confirmed cases
      • Replace missing values with 0
      • Aggregate the data over all countries by date
      • Plot the worldwide time series of confirmed cases
      • Why is there a kink around February 10th?
    • Stacked line plot by country, top 9 countries
      • Find countries with the highest number of confirmed cases
      • Plot the stacked line plot
    • Area Plot with new variable netinfected
      • Merge the data sets by ID
      • Create new variable netinfected
      • Plot an area plot for the coronavirus variables for each country
      • Create a bar plot with the same measures for each country
    • Bar plot of the Mortality Rate
      • Create a new variable “mortality rate” with a measure of your choice
      • Select the 35 countries with the highest mortality rate
      • Create a bar plot of the mortality rate of those countries
      • Highlight the countries with the highest number of confirmed cases
      • How can you explain the difference in the mortality rate for the top 5 countries?
      • Do you see an issue with how you calculated the mortality rate? How would you measure the mortality rate if you had all the resources you needed?
    • Visualization with Maps (the tasks differ between R and Python; check the boxes in the exercise for the exact tasks)
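
A few of the data-preparation steps in Part A (replacing missing values with 0, aggregating over all countries by date, and building a mortality measure) can be sketched as follows. This is only a minimal illustration in plain Python, not the exercise solution: the records are invented toy numbers, the column names `country`, `date`, `confirmed` and `deaths` are assumptions about how your data might look, and deaths divided by confirmed cases is just one possible choice for the "measure of your choice". In practice you would probably do the same steps with a data frame library.

```python
from collections import defaultdict

# Toy records (invented numbers, for illustration only): one row per country and date.
# None marks a missing value. Column names are assumptions, not the course data set.
records = [
    {"country": "A", "date": "2020-03-01", "confirmed": 100, "deaths": 2},
    {"country": "B", "date": "2020-03-01", "confirmed": None, "deaths": None},
    {"country": "A", "date": "2020-03-02", "confirmed": 150, "deaths": 3},
    {"country": "B", "date": "2020-03-02", "confirmed": 40, "deaths": 4},
]


def fill_missing(rows, columns, value=0):
    """Return copies of the rows with None replaced by `value` in the given columns."""
    return [
        {**r, **{c: (value if r[c] is None else r[c]) for c in columns}}
        for r in rows
    ]


def worldwide_by_date(rows, column):
    """Sum `column` over all countries for each date, returned in date order."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["date"]] += r[column]
    # ISO-formatted date strings sort chronologically.
    return dict(sorted(totals.items()))


def case_fatality_ratio(rows, date):
    """Deaths divided by confirmed cases per country on one date (skips zero cases)."""
    return {
        r["country"]: r["deaths"] / r["confirmed"]
        for r in rows
        if r["date"] == date and r["confirmed"] > 0
    }


def top_n(values, n):
    """Names of the n entries with the largest values, largest first."""
    return sorted(values, key=values.get, reverse=True)[:n]


clean = fill_missing(records, ["confirmed", "deaths"])
world = worldwide_by_date(clean, "confirmed")   # worldwide confirmed cases per date
cfr = case_fatality_ratio(clean, "2020-03-02")  # one possible mortality measure
ranking = top_n(cfr, 2)                         # countries ordered by that measure
```

The same `top_n` helper could be reused to pick the top 9 countries by confirmed cases or the 35 countries with the highest mortality rate. Note that replacing missing values with 0 changes the worldwide totals, which is worth keeping in mind when you interpret kinks in the time series.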

Part B: COVID-19 Predictions (Advanced)

  1. Feature Engineering and Correlation Analysis
    • Generate “New Confirmed” variable
    • Generate lag(GoogleSearch) (0-10)
    • Generate correlation matrix
    • Visualize relationships in e.g. scatterplots
  2. Build a Simple Model Prototype
    • Filter Germany only
    • Generate train/test split by date
    • Set up simple univariate linear regression
    • Calculate model performance (MAPE)
    • Plot \(y_{it}\) and \(\hat{y}_{it}\) vs. time in one plot and visualize train/test split
  3. Refine Your Simple Regression Model
    • Use the same observations as above, but with lag(GoogleSearch) as the independent variable
    • Compare model performance (MAPE and visually) with the model that uses the current Google Search variable without lag
    • Try other features and check if they increase performance
  4. Extend Your Model to Several Countries
    • Pick your best model and train and test it on the entire data set with several countries
    • Evaluate model performance (MAPE and visually)
    • Find reasons why country-specific differences exist and outline if your approach is viable
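
The core mechanics of Part B (a lagged feature, a train/test split by date, a univariate linear regression, and MAPE) can be sketched in plain Python as shown here. Treat it as a rough illustration under stated assumptions rather than the expected solution: the series `searches` and `new_confirmed` contain invented numbers constructed so that the relationship is exactly linear at lag 2, and the cutoff date and all function names are hypothetical. MAPE is taken here as the mean of |(y - ŷ) / y| in percent, which is undefined whenever an actual value is 0.

```python
from datetime import date, timedelta


def lag(values, k):
    """Shift a series k steps back in time; the first k entries have no value (None)."""
    if k == 0:
        return list(values)
    return [None] * k + list(values[:-k])


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if an actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented toy series for one country, ten consecutive days (illustration only).
dates = [date(2020, 3, 1) + timedelta(days=i) for i in range(10)]
searches = [10, 12, 15, 20, 18, 25, 30, 28, 35, 40]
# Constructed so that new_confirmed[t] = 5 + 2 * searches[t - 2] for t >= 2.
new_confirmed = [20, 22, 25, 29, 35, 45, 41, 55, 65, 61]

# Feature engineering: lagged searches; drop rows where the lag is undefined.
lagged = lag(searches, 2)
rows = [(d, x, y) for d, x, y in zip(dates, lagged, new_confirmed) if x is not None]

# Train/test split by date: everything before the (hypothetical) cutoff is training data.
cutoff = date(2020, 3, 8)
train = [(x, y) for d, x, y in rows if d < cutoff]
test = [(x, y) for d, x, y in rows if d >= cutoff]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
predictions = [a + b * x for x, _ in test]
test_mape = mape([y for _, y in test], predictions)
```

Splitting by date rather than at random keeps the test period strictly after the training period, which matches how a forecast would be used. To compare lags 0 to 10, the same steps could be wrapped in a loop over `k`, keeping the cutoff fixed so the MAPE values stay comparable.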