Batch Size

In-depth explanation

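Batch size is the number of training examples the model processes before it updates its weights once. Stochastic gradient descent updates the weights after every single example (batch size 1). Full-batch gradient descent updates them once after seeing all examples. Mini-batch gradient descent sits in between: each update uses the gradient averaged over a fixed number of examples. Larger batches give more stable, lower-variance gradient estimates but need more memory per step. Smaller batches produce noisier gradients, and that noise can help the optimizer escape poor local minima. Common sizes range from 16 to 256.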
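The update rule described above can be sketched in plain Python. This is a minimal sketch that assumes a one-feature linear model y ≈ w·x + b trained with mean squared error. The function `minibatch_gd` and its parameters are hypothetical names for illustration, not part of any library. Setting `batch_size=1` gives stochastic gradient descent, and setting it to the dataset size gives full-batch gradient descent.

```python
import random


def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w*x + b with mini-batch gradient descent on mean squared error.

    Hypothetical helper for illustration: batch_size=1 is stochastic GD,
    batch_size=len(xs) is full-batch GD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # reshuffle every epoch so the batches differ
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]  # last batch may be smaller
            # Average the MSE gradient over this batch only.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            # Exactly one weight update per batch.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 3x + 1.
xs = [i / 100 for i in range(100)]
ys = [3 * x + 1 for x in xs]
for bs in (1, 10, len(xs)):
    print(bs, minibatch_gd(xs, ys, batch_size=bs))
```

Because `epochs` is held fixed, batch size 1 makes 100 updates per epoch while the full batch makes only one. After 50 epochs the full-batch run therefore ends noticeably farther from (3, 1). The difference comes from the number of updates, not from the quality of each gradient.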

Examples

Batch size of 32

Batch size of 128
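Batch size also sets how many weight updates happen in one epoch. The sketch below compares the two example sizes on a hypothetical dataset of 1,000 examples. The helper names are illustrative only. This version keeps the final partial batch; some data loaders can be configured to drop it instead.

```python
import math


def updates_per_epoch(num_examples, batch_size):
    """Weight updates in one epoch when the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def batch_sizes(num_examples, batch_size):
    """Sizes of the consecutive batches that make up one epoch."""
    full, rem = divmod(num_examples, batch_size)
    return [batch_size] * full + ([rem] if rem else [])


for bs in (32, 128):
    sizes = batch_sizes(1000, bs)  # 1,000 examples is a hypothetical dataset size
    print(f"batch size {bs}: {updates_per_epoch(1000, bs)} updates, "
          f"last batch has {sizes[-1]} examples")
```

With 1,000 examples, a batch size of 32 gives 32 updates per epoch and a last batch of 8 examples. A batch size of 128 gives 8 updates per epoch and a last batch of 104 examples.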

Related terms

Epoch

Gradient Descent

More in Neural Networks

Activation Function

A mathematical function that determines the output of a neuron based on its weighted input sum.

Backpropagation

The algorithm for calculating gradients of the loss function with respect to network weights.

Epoch

One complete pass through the entire training dataset during model training.

Neural Network

A computing system inspired by biological neural networks, consisting of interconnected nodes (neurons).

Neuron

A basic computational unit in a neural network that receives inputs, applies weights and an activation function, and produces an output.