1943: Warren McCulloch & Walter Pitts—mathematical model of artificial neuron.
1958: Frank Rosenblatt—perceptron.
1961: Arthur Samuel’s checkers program.
1986: James McClelland, David Rumelhart & PDP Research Group—book: “Parallel Distributed Processing”.
A neural network (NN) is a parameterized function which can, in theory, approximate any function to any level of accuracy (universal approximation theorem).
Learning is the process of adjusting the parameters so that the inputs in a training set get mapped to their corresponding outputs.
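As a minimal sketch of this idea (using PyTorch, which fastai builds on; the linear target and learning rate are arbitrary choices for illustration), a function with two parameters can be adjusted step by step until it maps the training inputs to the training outputs:

```python
import torch

# Training set: inputs and the outputs of the mapping to learn (here y = 3x + 2).
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 2

# A parameterized function f(x) = w*x + b, parameters set to initial values.
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(200):
    pred = w * x + b                     # outputs of the function
    loss = ((pred - y) ** 2).mean()      # distance between predictions and targets
    loss.backward()                      # gradient of the loss w.r.t. w and b
    with torch.no_grad():
        w -= 0.1 * w.grad                # adjust the parameters slightly
        b -= 0.1 * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())                # ≈ 3 and 2 after training
```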
Image by <a href="https://news.berkeley.edu/2020/03/19/high-speed-microscope-captures-fleeting-brain-signals/" target="_blank">Na Ji, UC Berkeley</a>
Modified from <a href="https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0163" target="_blank">O.C. Akgun & J. Mei 2019</a>
Single layer of artificial neurons → Unable to learn even some simple mathematical functions, such as XOR (Marvin Minsky & Seymour Papert).
Two layers → Can theoretically approximate any mathematical function, but in practice very slow.
More layers → Deeper networks → deep learning.
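To make the progression concrete, here is a sketch in PyTorch (layer sizes are arbitrary, chosen for illustration):

```python
import torch.nn as nn

# Single layer: limited, cannot learn functions such as XOR.
single_layer = nn.Linear(784, 10)

# Two layers with a nonlinearity: a universal approximator,
# but may need impractically many neurons.
two_layers = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))

# More layers: a deeper network, i.e. deep learning.
deep = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
```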
Used for spatially structured data (e.g. images).
Convolution layers → each neuron receives input only from a subarea of the previous layer.
Pooling → combines the outputs of neurons in a subarea to reduce the data dimensions.
Not fully connected.
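A minimal sketch of these two layer types in PyTorch (channel counts and sizes are arbitrary):

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    # Convolution: each output neuron sees only a 3x3 subarea of its input.
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    # Pooling: combines each 2x2 subarea, halving the spatial dimensions.
    nn.MaxPool2d(2),
)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(cnn(x).shape)             # torch.Size([1, 16, 16, 16])
```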
Used for chain-structured data (e.g. text).
Not feedforward: the output at each step is fed back into the network along with the next element of the chain.
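A minimal sketch in PyTorch (sizes arbitrary): the hidden state is carried from one element of the chain to the next, which is what makes the network recurrent rather than feedforward:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(1, 5, 10)   # a chain of 5 elements, each a 10-d vector
out, h = rnn(x)             # h: hidden state after processing the whole chain
print(out.shape, h.shape)   # torch.Size([1, 5, 20]) torch.Size([1, 1, 20])
```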
First, we need an architecture (size, depth, types of layers, etc.).
This is set before training and does not change.
A model also comprises parameters.
Those are set to some initial values, but will change during training.
To train the model, we need labelled data in the form of input/output pairs.
Inputs and parameters are fed to the architecture.
We get predictions as outputs.
A metric (e.g. error rate) compares predictions and labels and is a measure of model performance.
Because the metric is not always sensitive enough to small changes in parameter values, we also compute a loss function …
… which allows us to adjust the parameters slightly through backpropagation.
This cycle gets repeated for a number of steps.
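In PyTorch terms, the cycle looks roughly like this sketch (the model, data, and hyperparameters are placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                           # architecture + parameters
loss_fn = nn.CrossEntropyLoss()                    # loss function
opt = torch.optim.SGD(model.parameters(), lr=0.1)

xb = torch.randn(64, 10)                           # labelled data: inputs …
yb = torch.randint(0, 2, (64,))                    # … and outputs (random here)

for step in range(100):                            # the cycle gets repeated
    preds = model(xb)                              # predictions as outputs
    loss = loss_fn(preds, yb)                      # compare predictions and labels
    loss.backward()                                # backpropagation
    opt.step()                                     # adjust the parameters slightly
    opt.zero_grad()
```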
At the end of the training process, what matters is the combination of architecture and trained parameters.
That’s what constitutes a model.
A model can be considered a regular program …
… and be used to obtain outputs from inputs.
A language model is a probability distribution over sequences of words. It attempts to predict the next word after any sequence of words.
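As a toy illustration (not a real language model), a bigram model estimates such a distribution by counting which word follows which in a corpus:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count, for each word, which words follow it.
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def next_word_probs(word):
    """Probability distribution over the next word."""
    c = counts[word]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(next_word_probs("the"))   # {'cat': 0.67, 'mat': 0.33} (approximately)
```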
Pre-trained models can be used to quickly get good results on all sorts of problems, such as sentiment analysis.
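For instance, fastai’s quick-start example fine-tunes a pre-trained AWD_LSTM language model into an IMDB sentiment classifier in a few lines:

```python
from fastai.text.all import *

# Pre-trained model fine-tuned on the IMDB movie reviews dataset.
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

learn.predict("I really liked that movie!")   # ('pos', …)
```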
Bias is always present in data.
Document the limitations and scope of your data as best as possible.
Problems to watch for:
The last one is particularly problematic whenever the model’s current outputs, through their interactions with the real world, generate the next round of data (a feedback loop).
Solution: ensure there are human circuit breakers and oversight.
fastai
is a deep learning library that builds on top of PyTorch, adding a higher level of functionality.
[It] is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable.
Manual
Tutorials
Peer-reviewed paper
Paperback version
Free MOOC version of part 1 of the book
Jupyter notebooks version of the book
Create iterators with the training and validation data.
Train the model.
Get predictions from our model.
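These three steps map directly onto fastai’s quick-start pet-classification example (dataset, labelling function, and settings follow the fastai documentation):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()   # in this dataset, cat breeds are capitalized

# 1. Create iterators with the training and validation data.
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# 2. Train the model.
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

# 3. Get predictions from our model.
img = PILImage.create(get_image_files(path)[0])
pred, idx, probs = learn.predict(img)
```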