- Why can’t we use accuracy as a loss function?
A loss function must be differentiable to perform gradient descent. It seems like you’re trying to measure some sort of 1-accuracy. This doesn’t have a derivative, so you can’t use it.
- gradient descent : An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. -> This is the way to fine linear and the machine learns it.
2. Draw the sigmoid function. What is special about its shape?
3.What is the difference between a loss function and a metric?
4. What is the function to calculate new weights using a learning rate?
- What does the DataLoader class do?
- Write pseudocode showing the basic steps taken in each epoch for SGD.
- Create a function that, if passed two arguments [1,2,3,4]
and ‘abcd’, returns [(1, ‘a’), (2, ‘b’), (3, ‘c’), (4, ‘d’)]. What is special about that output data structure? - What does view do in PyTorch?
- What are the “bias” parameters in a neural network? Why do we need them?
- What does the @ operator do in Python?
- What does the backward
method do? - Why do we have to zero the gradients?
- What information do we have to pass to Learner?
- Show Python or pseudocode for the basic steps of a training loop.
- What is “ReLU”? Draw a plot of it for values from -2
to +2. - What is an “activation function”?
- What’s the difference between F.relu and nn.ReLU?
- The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?
- Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?
- If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book’s website for suggestions.
- What are the two ways in which data is most commonly provided, for most deep learning datasets?
- Look up the documentation for L
and try using a few of the new methods is that it adds. - Look up the documentation for the Python pathlib
module and try using a few methods of the Path
class. - Give two examples of ways that image transformations can degrade the quality of the data.
- What method does fastai provide to view the data in a DataLoaders?
- What method does fastai provide to help you debug a DataBlock?
- Should you hold off on training a model until you have thoroughly cleaned your data?
- What are the two pieces that are combined into cross-entropy loss in PyTorch?
- What are the two properties of activations that softmax ensures? Why is this important?
- When might you want your activations to not have these two properties?
- Calculate the exp and softmax columns of <<bear_softmax>> yourself (i.e., in a spreadsheet, with a calculator, or in a notebook)