fast.ai 2020 — Lesson 6
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2, base_lr=0.1)
The learning rate finder helps pick a good learning rate. The idea is to increase the learning rate after every mini-batch and plot the loss. A good learning rate is somewhere between the steepest downward point and the minimum. So, for example, based on the plot below, a good learning rate for that situation might be around 10^-2. One rule of thumb is to pick a learning rate one order of magnitude smaller than the point where the loss is at its minimum, which in this case would be something like 10^-2.

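A minimal sketch of running the learning rate finder (assuming dls is an already-built DataLoaders; in the 2020-era fastai the call returns two suggested values, newer versions may return something slightly different):
learn = cnn_learner(dls, resnet34, metrics=error_rate)
lr_min, lr_steep = learn.lr_find()      # plots loss vs. learning rate and returns suggestions
learn.fine_tune(2, base_lr=lr_steep)    # retrain with a learning rate picked from the plot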
Transfer learning (learn.fine_tune)
freeze() "freezes" all layers except the last ones (the newly added head), meaning the optimizer updates only the parameters of those last layers.
After freezing, it's common to train for just one epoch to update the parameters in the head.
The learning rate is then divided by two.
The model is unfrozen.
More epochs follow (= the whole model is trained normally).
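Roughly, those steps correspond to something like this sketch (not the exact fastai source; the epoch count and learning rates are illustrative, and the real fine_tune also passes a slice of learning rates here, which is the discriminative learning rates idea covered below):
learn.freeze()
learn.fit_one_cycle(1, base_lr)            # one epoch training only the new head
learn.unfreeze()
learn.fit_one_cycle(epochs, base_lr / 2)   # then train the whole model with a halved learning rate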
A model that has already been trained doesn't produce the same learning rate finder shape as an untrained model, because it's much harder to improve further. In that case, a good learning rate can be found by locating the point where the loss starts to increase sharply (e.g. 10^-4) and then going ten times smaller (10^-5).

Discriminative learning rates
People have found that it's better to train the earlier layers with a smaller learning rate than the later layers.
Instead of giving a single number as the learning rate, it's possible to give a slice:
learn.fit_one_cycle(12, lr_max=slice(1e-6, 1e-4))
In the example above the first layers get 1e-6 as the learning rate, the last layers get 1e-4, and the layers in between get learning rates spread evenly (on a log scale) between those two values.
fit_one_cycle
It increases the learning rate for roughly the first third of the batches and then decreases it for the remaining two thirds. Research has found that this kind of learning rate schedule helps training.
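The schedule can be inspected after training; assuming a reasonably recent fastai version, something like this plots the learning rate (and momentum) over the batches:
learn.fit_one_cycle(3, 3e-3)
learn.recorder.plot_sched()   # shows the one-cycle learning rate and momentum curves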
Sometimes the lowest loss appears before the last epoch. In that case it's often better to retrain the model from scratch with the number of epochs set to the epoch where the loss was lowest, rather than just keeping the intermediate model that happened to have the lowest loss.
learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16()
☝️ Half-precision floating point. Training is faster, and the results often look the same (sometimes even better).
Dataset = A collection that can be indexed into and has a length. Usually returns a tuple of the independent and dependent variable.
DataLoader = Takes a Dataset and serves the data in mini-batches.
Datasets = Contains a training and a validation Dataset. The items can be transformed by passing lists of functions (code below).
import string

a = list(string.ascii_lowercase)
a[0], len(a)
CONSOLE: ('a', 26)

def f1(o): return o+'a'
def f2(o): return o+'b'

dss = Datasets(a, [[f1,f2]])
dss[0]
CONSOLE: ('aab',)

dss = Datasets(a, [[f1],[f2]])
dss[0]
CONSOLE: ('aa', 'ab')
DataLoaders = Contains a training DataLoader and a validation DataLoader.
DataBlock = A blueprint for building all of the above things (Datasets and DataLoaders) from a description of the data.
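For example, a minimal image-classification DataBlock might look like this (a sketch, assuming path points at a folder of images organised by class; get_image_files and parent_label are standard fastai helpers):
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # independent = image, dependent = category
    get_items=get_image_files,                        # how to collect the items
    get_y=parent_label,                               # label = name of the parent folder
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # training/validation split
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path)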
partial
Takes a function plus some arguments and returns a new function with those arguments already filled in (it comes from Python's functools module).
from functools import partial

def say_hello(name, say_what="Hello"): return f"{say_what} {name}."
say_hello('Jeremy'), say_hello('Jeremy', 'Ahoy!')
CONSOLE: ('Hello Jeremy.', 'Ahoy! Jeremy.')

f = partial(say_hello, say_what="Bonjour")
f("Jeremy"), f("Sylvain")
CONSOLE: ('Bonjour Jeremy.', 'Bonjour Sylvain.')
Cross-validation can help training, but it's not something Jeremy uses, because the improvement is often so small that it doesn't matter. It's mainly useful when the dataset is small.
Collaborative filtering is based on the idea that a model doesn't need to understand anything about the items it's recommending; it can make meaningful recommendations just by looking at what other people with similar patterns have done. So if A buys two books and B buys one of those books, the model might recommend the other book to B. Going a step deeper, the model might notice that certain books are similar to each other, so even though neither A nor B has read one of them, they might like it because it resembles a book they did read.

In the image above, the values in yellow and blue are parameters (the latent factors). The bold numbers are user and movie ids. The other cells each hold the rating a user gives a movie, predicted as the dot product of two vectors (the user's factor vector and the movie's factor vector). The idea is that these vectors learn to represent the movie and the user. The parameters are trained the same way as in a neural network, by comparing the predicted ratings to the real values.
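To make the dot product concrete, here is a tiny worked example with made-up latent factors (purely illustrative numbers, not from the lesson):
import torch

user  = torch.tensor([0.9, 0.1, 0.3])   # hypothetical latent factors for one user
movie = torch.tensor([0.8, 0.2, 0.5])   # hypothetical latent factors for one movie
pred  = (user * movie).sum()            # dot product: 0.72 + 0.02 + 0.15 = 0.89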

class Collab(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = Embedding(n_users, n_factors)     # latent factors per user
        self.user_bias = Embedding(n_users, 1)                # one bias value per user
        self.movie_factors = Embedding(n_movies, n_factors)   # latent factors per movie
        self.movie_bias = Embedding(n_movies, 1)              # one bias value per movie
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors(x[:,0])                     # look up factors for the batch of user ids
        movies = self.movie_factors(x[:,1])                   # look up factors for the batch of movie ids
        res = (users * movies).sum(dim=1, keepdim=True)       # dot product per row
        res += self.user_bias(x[:,0]) + self.movie_bias(x[:,1])
        return sigmoid_range(res, *self.y_range)              # squash predictions into the rating range
The bias terms handle the fact that some users tend to rate everything highly in general, or that some movies are just rated highly in general.
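A minimal sketch of training this model, assuming dls is a CollabDataLoaders built from the MovieLens ratings and n_users / n_movies come from its vocab sizes:
model = Collab(n_users, n_movies, n_factors=50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.1)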
Originally posted: https://www.notion.so/lankinen/Folder-Fast-ai-Part-1-2020-e6bc5e0f9bce4d4d9f494ec8259b1119