- Explore data
- Augment data
- Train a simple model and evaluate the effect of using augmented data
- Accept assignment: https://classroom.github.com/a/Zz9PEnY2
- Clone your student repository (`git clone`)
- Run `uv sync` and check everything is correct with `uv run hello.py`
- `cd exercise`
- Unzip `02_Files.zip`
- Start Jupyter
- Evaluate the nearest neighbour baseline "properly" in this notebook.
Complete the functions in `tasks.py` and pass the tests. You will have to:
- Create a subset of the original datasets with 500 images, then create an augmented dataset of 2500 images from that subset. Keep a representative balance between healthy and malignant samples. The augmented images should be 64×64 pixels.
- Fit a kNN classifier using
  - the original subset of 500 samples. You should achieve >60% accuracy.
  - the augmented dataset. You should achieve >70% accuracy.

  Note: You will have to experiment with the number, type and hyperparameters of the augmentations and of the kNN classifier.

  Note: You will have to implement the functions to train, predict and evaluate the kNN model.
- Compare the performance of the two classifiers using the confusion matrix. Plot it in this notebook.
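The subset and augmentation step could be sketched roughly as follows. This is a NumPy-only illustration under several assumptions: the images are random stand-in arrays rather than the exercise data, label 1 is assumed to mean malignant, and the function names (`stratified_subset`, `resize_nearest`, `augment_once`, `augment_dataset`) are hypothetical and do not reflect the signatures expected in `tasks.py`. The augmentations shown (flips and 90-degree rotations) are just one possible choice.

```python
import numpy as np


def stratified_subset(images, labels, n_total, rng):
    """Draw n_total samples while keeping the class proportions of `labels`."""
    classes, counts = np.unique(labels, return_counts=True)
    # Per-class quota proportional to class frequency, rounded down.
    quotas = np.floor(n_total * counts / counts.sum()).astype(int)
    # Hand the samples lost to rounding to the most frequent classes.
    for i in np.argsort(-counts)[: n_total - quotas.sum()]:
        quotas[i] += 1
    picked = [
        rng.choice(np.flatnonzero(labels == cls), size=quota, replace=False)
        for cls, quota in zip(classes, quotas)
    ]
    order = rng.permutation(np.concatenate(picked))
    return images[order], labels[order]


def resize_nearest(image, size=64):
    """Resize to (size, size) by nearest-neighbour index sampling."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[rows][:, cols]


def augment_once(image, rng):
    """Resize, then apply random flips and a rotation by a multiple of 90 degrees."""
    image = resize_nearest(image)
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    return np.rot90(image, k=int(rng.integers(0, 4)))


def augment_dataset(images, labels, copies, rng):
    """Create `copies` augmented versions of every image; labels are repeated to match."""
    out_x = [augment_once(img, rng) for img in images for _ in range(copies)]
    return np.stack(out_x), np.repeat(labels, copies)


# Synthetic stand-in data (assumption): 600 grayscale images of 80x80 pixels.
rng = np.random.default_rng(0)
images = rng.random((600, 80, 80), dtype=np.float32)
labels = (rng.random(600) < 0.3).astype(int)  # assumed: 1 = malignant, 0 = healthy

sub_x, sub_y = stratified_subset(images, labels, 500, rng)
aug_x, aug_y = augment_dataset(sub_x, sub_y, copies=5, rng=rng)
print(sub_x.shape, aug_x.shape)
print(labels.mean(), sub_y.mean(), aug_y.mean())  # malignant fraction at each stage
```

Because every image in the subset is augmented the same number of times, the class balance of the 500-image subset carries over unchanged to the 2500 augmented images.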
Tip: Solve each of the tasks first in the notebook, so it is easier to see the inputs and outputs of the functions. Check the file `test_exercise2.py` and the docstrings of each function for more information on how to implement them.
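For the train, predict and evaluate steps, a minimal kNN sketch with a confusion matrix might look like the following. It is written in plain NumPy and runs on synthetic two-class data. The function names (`knn_predict`, `confusion_matrix`, `accuracy`) and the choice of Euclidean distance with `k=5` are assumptions for illustration, not the interface defined in `tasks.py` or checked by `test_exercise2.py`.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    a = train_x.reshape(len(train_x), -1).astype(np.float64)
    b = test_x.reshape(len(test_x), -1).astype(np.float64)
    # Squared distances via ||b||^2 - 2 b.a + ||a||^2, shape (n_test, n_train).
    d2 = (b**2).sum(1)[:, None] - 2.0 * (b @ a.T) + (a**2).sum(1)[None, :]
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]  # labels of the k neighbours, shape (n_test, k)
    n_classes = int(train_y.max()) + 1
    counts = np.stack([np.bincount(v, minlength=n_classes) for v in votes])
    # On a tie, argmax returns the lowest class index.
    return counts.argmax(axis=1)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


# Synthetic stand-in data (assumption): two Gaussian blobs in 16 dimensions.
rng = np.random.default_rng(0)
n_per_class, dim = 200, 16
x = np.concatenate([
    rng.normal(0.0, 1.0, (n_per_class, dim)),
    rng.normal(2.0, 1.0, (n_per_class, dim)),
])
y = np.repeat([0, 1], n_per_class)
order = rng.permutation(len(y))
x, y = x[order], y[order]
train_x, test_x = x[:300], x[300:]
train_y, test_y = y[:300], y[300:]

pred = knn_predict(train_x, train_y, test_x, k=5)
cm = confusion_matrix(test_y, pred, n_classes=2)
acc = accuracy(test_y, pred)
print(cm)
print(f"accuracy = {acc:.3f}")
```

The diagonal of the matrix counts correct predictions per class, so comparing the off-diagonal entries for the original and the augmented training sets shows which kind of error the augmentation reduces. In a notebook, the matrix can be displayed with, for example, matplotlib's `imshow`.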