Transformer

In-depth explanation

Introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), the Transformer replaced RNNs for many sequence tasks. It uses self-attention to weigh the importance of every input element relative to every other, regardless of how far apart they are in the sequence. Because all positions are processed at once rather than one step at a time, computation parallelizes well and long-range dependencies are easier to capture. Transformers power GPT, BERT, and modern LLMs.
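
To make the self-attention idea concrete, here is a minimal sketch in plain Python. It assumes identity projections (queries, keys, and values are all the input vectors themselves), whereas a real Transformer learns separate projection matrices and uses multiple heads. The function names and the toy `tokens` input are illustrative, not taken from any library.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are all x itself; a real
    Transformer applies learned projections first.
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for q in x:
        # Score this token against every token, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # The output is the weighted average of all value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy input: tokens 0 and 2 are identical, token 1 differs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this toy input, token 0 attends to token 2 exactly as strongly as to itself even though they are two positions apart, which is the "regardless of distance" property. Each token's scores depend only on the input, not on another token's output, so the outer loop could run in parallel.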

Examples

GPT-4

BERT

Vision Transformer (ViT)

Related terms

BERT

GPT

More in Deep Learning

Attention Mechanism

A technique that allows models to focus on relevant parts of the input when producing output.

Convolutional Neural Network (CNN)

A neural network architecture designed for processing grid-like data such as images.

Dropout

A regularization technique that randomly drops neurons during training to prevent overfitting.

Fine-Tuning

Adapting a pre-trained model to a new task by training on task-specific data.

LSTM (Long Short-Term Memory)

An RNN variant with gates that control information flow, enabling learning of long-term dependencies.

Recurrent Neural Network (RNN)

A neural network architecture designed for sequential data with connections between nodes forming cycles.
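
As a rough illustration of the recurrence described above, the sketch below runs a one-unit RNN over a short sequence. The weights `w_x`, `w_h`, and `b` are arbitrary values chosen for the example, not trained parameters, and the scalar hidden state is a simplification of the vectors a real network would use.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state,
        # so each step has to wait for the one before it.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero; later states still carry its trace.
states = rnn_forward([1.0, 0.0, 0.0])
```

The loop cannot be parallelized across time steps because each state feeds the next, which is the sequential bottleneck that the Transformer's self-attention avoids.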