Some notes for DNN lab5
- No need for GPU today.
- Tasks 1-4 only need 8 epochs; tasks 5-7 only need 5 epochs.
- In task 3, use 6 hidden layers instead of 5.
- In task 4, use e.g. 0.25 as the dropout probability. Fix the trainer to call net.train() before training and net.eval() before evaluation (see the trainer sketch after this list).
- Dropout (and BatchNorm) after every layer except the last one.
- Dropout usually goes after the activation (for ReLU the order doesn't matter, since zeroing/scaling and ReLU commute).
- Dropout will probably not improve max accuracy on MNIST (maybe with a wider network and better hyperparameters?).
- In task 5, use dropout (otherwise you won't see a difference).
- BatchNorm goes before the activation and before dropout (see the MLP block sketch after this list).
- In task 7: use a torch.nn.Sequential of Conv2d, BatchNorm2d, MaxPool2d, ReLU, Dropout2d, Linear, Flatten, BatchNorm1d (see the CNN sketch after this list).
- Don't make it too generic; you can just try a few different fixed sequences of layers.
- Dropout2d zeroes a whole channel at a time.
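
A minimal sketch of the trainer fix from the task-4 note; the function name `fit`, the loaders, and the loss are illustrative, not the lab's actual trainer. The point is where net.train() and net.eval() go:

```python
import torch
import torch.nn.functional as F

def fit(net, train_loader, test_loader, optimizer, epochs):
    for epoch in range(epochs):
        net.train()  # enable dropout, use batch statistics in BatchNorm
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(net(x), y)
            loss.backward()
            optimizer.step()

        net.eval()  # disable dropout, use running statistics in BatchNorm
        correct = 0
        with torch.no_grad():
            for x, y in test_loader:
                correct += (net(x).argmax(dim=1) == y).sum().item()
        print(f'epoch {epoch}: test acc {correct / len(test_loader.dataset):.4f}')
```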
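One way to arrange a hidden block with the ordering from the notes (Linear → BatchNorm → ReLU → Dropout, nothing extra after the final Linear); the widths and dropout probability here are just examples:

```python
import torch.nn as nn

# BatchNorm before the activation, Dropout after it;
# no Dropout/BatchNorm after the last Linear layer.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(0.25),
    nn.Linear(256, 10),
)
```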
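For task 7, one possible fixed sequence using those layer types; the channel counts, kernel sizes, and dropout probabilities are illustrative and assume MNIST 1x28x28 input:

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),          # 28x28 -> 14x14
    nn.Dropout2d(0.25),       # zeroes a whole channel at a time
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),          # 14x14 -> 7x7
    nn.Dropout2d(0.25),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
```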
A sketch for implementing BatchNorm yourself:

```python
import torch
from torch import Tensor
from torch.nn.parameter import Parameter

# Hand-written BatchNorm1d for (batch, num_features) inputs: gamma/beta are
# learnable Parameters, the running statistics are buffers (saved, not trained).
class MyBatchNorm1d(torch.nn.Module):
    def __init__(self, num_features: int, eps: float = 1e-5, momentum: float = 0.1):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.gamma = Parameter(torch.ones(num_features))
        self.beta = Parameter(torch.zeros(num_features))
        self.running_mean: Tensor
        self.running_var: Tensor
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))

    def forward(self, x: Tensor) -> Tensor:
        if self.training:
            # Normalize with the current batch statistics and update the running ones.
            mean, var = x.mean(dim=0), x.var(dim=0, unbiased=False)
            with torch.no_grad():
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta
```
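
A quick sanity check of the sketch against the built-in layer; the batch size, the scale/shift of the fake data, and the tolerance are arbitrary:

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 10) * 3 + 5
mine, ref = MyBatchNorm1d(10), torch.nn.BatchNorm1d(10)

# In training mode both normalize with the current batch statistics.
assert torch.allclose(mine(x), ref(x), atol=1e-5)

# In eval mode the running statistics are used instead; after a single batch
# they only roughly approximate the batch statistics.
mine.eval()
print(mine(x).mean(dim=0))  # no longer exactly zero-mean
```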