Originally published here: https://www.notion.so/lankinen/2021-My-working-methods-458bba133c49484ab371b648b873ff6c
I work best when everything is in order and I don’t need to waste time in the middle of a task, e.g. to find certain notes or files. Over the past two years of focusing on the way I work, I have learned a few things and picked up some habits that I find useful. I wanted to go through all of these methods so that I can later look back at them and see how much has changed, or remember some method that I stopped doing.
I store all my notes on Notion. Most of…
This article is about how I tested a hypothesis I had by building a website in 3 days. It’s not my first time building something to test a business hypothesis, but this time I wanted to do it faster and document the process.
The hypothesis is that people enjoy commenting 💩 on and reading 😖 comments about news the same way they do with YouTube videos. There are some news sites offering a comment option, but I don’t like all of them. For example, Hacker News is a great website, but the “news” (sometimes just a Wikipedia article) rarely interests me.
The first step is…
High school math is enough to understand deep learning.
You don’t need lots of data; some record-breaking results have been achieved with fewer than 50 items.
You don’t need an expensive computer; state-of-the-art results can be achieved for free.
Deep learning is the same as deep neural networks.
The idea of this course is to approach deep learning from the top down. It means that first we get to see and test what DL can be used for, and then we start learning more details.
The biggest regret a lot of people are saying is that…
Language model = a model that tries to predict the next word of a sentence.
A language model works well as the base model for transfer learning because it already knows something about language: it can predict the next word of a sentence.
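As a toy illustration of “predict the next word”, here is a minimal bigram-count sketch. The corpus and the function name are made up for this example; real language models are neural networks, not count tables, but the prediction task is the same idea:

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny made-up corpus and
# always predict the most frequent follower.
corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

def predict_next(word):
    # Most frequent word seen after `word`
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # cat ("cat" follows "the" twice)
```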
The base language model should first be fine-tuned into a task-specific language model instead of being used directly in a classifier.
Language model from scratch
Weight decay (L2 regularization)
The idea is to add the sum of all the weights squared to the loss function. This way the model tries to keep the weights as small as possible, because bigger weights increase the final loss.
loss_with_wd = loss + wd * (parameters**2).sum()
It’s often inefficient to calculate this big sum, which is why it’s possible to add the equivalent term directly to the gradient instead.
weight.grad += wd * 2 * weight
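The two formulations match because the derivative of `wd * w**2` is `2 * wd * w`. A quick pure-Python check with finite differences (the numbers here are made up for the sketch):

```python
wd = 0.1
w = [1.0, -2.0, 3.0]

def penalty(ws):
    # The weight-decay term added to the loss
    return wd * sum(x * x for x in ws)

# Analytic gradient of the penalty: 2 * wd * w (what we add to .grad)
analytic = [2 * wd * x for x in w]

# Numerical gradient by finite differences
eps = 1e-6
numeric = []
for i in range(len(w)):
    bumped = w.copy()
    bumped[i] += eps
    numeric.append((penalty(bumped) - penalty(w)) / eps)

print(all(abs(a - n) < 1e-4 for a, n in zip(analytic, numeric)))  # True
```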
Creating own Embedding module
def create_params(size):
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))

class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5))…
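Framework-free, an embedding module is just a lookup table of learnable vectors, one row per id, initialized with small normal noise like the `nn.Parameter` line above. The class name and sizes below are made up for this sketch:

```python
import random

class ToyEmbedding:
    def __init__(self, n_items, n_factors, seed=0):
        # One n_factors-dim vector per id, small normal init
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0, 0.01) for _ in range(n_factors)]
                       for _ in range(n_items)]

    def __call__(self, idx):
        # "Embedding lookup" is just row indexing
        return self.weight[idx]

users = ToyEmbedding(n_items=5, n_factors=3)
print(len(users(2)))  # 3: one 3-dim factor vector per user id
```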
learn = cnn_learner(dls, resnet34, metrics=error_rate)
Learning rate finder helps to pick the best learning rate. The idea is to change the learning rate after every mini-batch and then plot the loss. A good learning rate is somewhere between the steepest point and the minimum. So for example, based on the picture below, a good learning rate for this situation might be 5×10^-2. One rule of thumb is to pick a learning rate one order of magnitude smaller than the minimum, which in this case would be something like 10^-2.
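The mechanics can be sketched on a toy 1-D problem. Everything here is invented for illustration: the "model" is a single number with loss (w − 3)², and the learning rate grows 30% per step, standing in for one mini-batch each:

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w, lr = 0.0, 1e-4
lrs, losses = [], []
while lr < 10:
    lrs.append(lr)
    losses.append(loss(w))
    w -= lr * grad(w)   # one "mini-batch" step with the current LR
    lr *= 1.3           # exponentially increase the learning rate

mid = len(losses) // 2
print(losses[mid] < losses[0])   # True: loss fell while the LR was sensible
print(losses[-1] > losses[mid])  # True: loss blew up once the LR got too big
```

Plotting `losses` against `lrs` (log scale) gives the characteristic lr_find curve: flat, then falling, then exploding.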
Transfer learning (
freeze() "freezes" all layers except the last, meaning…
train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28)
train_y = tensor([1] * len(threes) + [0] * len(sevens)).unsqueeze(1)
CONSOLE: (torch.Size([12396, 784]), torch.Size([12396, 1]))
dset = list(zip(train_x, train_y))
x, y = dset[0]
PRINT: (torch.Size([784]), tensor([1]))
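The dataset here is just a list of (x, y) tuples, so indexing returns one flattened image together with its label. A stand-alone sketch of the same pattern with fake stand-in data:

```python
xs = [[0.0] * 784 for _ in range(3)]   # three fake flattened 28*28 images
ys = [[1], [1], [0]]                   # their labels (1 = three, 0 = seven)

dset = list(zip(xs, ys))
x, y = dset[0]
print(len(x), y)  # 784 [1]
```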
def init_params(size, variance=1.0):
    return (torch.randn(size)*variance).requires_grad_()

weights = init_params((28*28,1))
bias = init_params(1)
requires_grad_ tells PyTorch that we want gradients computed for these weights.
The _ at the end means that it's an in-place operation.
weights * pixel is always zero if the pixels are zero, and that is why we need to add some number, called the bias.
The first (red and yellow) circle…
# Load saved learner
learn_inf = load_learner(path/'export.pkl')
# Predict given image
learn_inf.predict(img)
# See labels
learn_inf.dls.vocab
Jupyter Notebook widgets are created using IPython widgets.
Deploying on CPU is easier than on GPU in most cases. A GPU is needed only if the deployed model still requires a lot of computation, like some video models, or if the model gets thousands of requests every second, making it possible to batch them.
Out-of-domain data = the training data might be too optimistic. For example, when predicting bears, the training images might be too clear and centered in the middle of the image, when in the real…
Classification = Predict label (e.g. dog or cat)
Regression = Predict number (e.g. how old)
Last time we looked at this code:
valid_pct=0.2 (validation percent) means that the loader is going to put 20% of the given data aside during training. This data is then used to measure how good the model is after training. If the same data were used both to train and to test, the model might over-fit to the data. Last time Jeremy briefly mentioned that over-fitting means the model gives good predictions for the data it was trained on but not for other data. For…
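What valid_pct does can be sketched in plain Python. The function name and seed below are made up for this example; fastai does the split for you:

```python
import random

def split_data(items, valid_pct=0.2, seed=42):
    # Shuffle, then hold out valid_pct of the items for validation
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * valid_pct)
    return items[cut:], items[:cut]   # (train, valid)

train, valid = split_data(range(100))
print(len(train), len(valid))  # 80 20
```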
Generic iOS Device
Product > Archive (it will open a new window)