Data Science
Train-Test Split
Dividing a dataset into separate subsets: one for training a model and one for evaluating its performance.
In-depth explanation
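The training set is used to fit the model; the test set is held out and used to estimate final performance on unseen data. A validation set, carved out of the training data, is used for hyperparameter tuning, so tuning decisions never touch the test set. Typical splits are 80/20 (train/test) or 70/15/15 (train/validation/test). Proper splitting prevents data leakage, where information from the test set influences training, and so gives an honest performance estimate. Time-series data requires a temporal split: train on earlier observations and test on later ones, rather than shuffling at random.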
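The temporal split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `temporal_split`, the `(timestamp, value)` record layout, and the sample data are all assumptions made for the example.

```python
from datetime import date


def temporal_split(records, test_fraction=0.2):
    """Split (timestamp, value) records so that no test record predates a training record.

    Hypothetical helper for illustration; assumes each record's first
    element is a sortable timestamp.
    """
    ordered = sorted(records, key=lambda r: r[0])   # oldest first, no shuffling
    n_test = round(len(ordered) * test_fraction)    # size of the held-out tail
    cutoff = len(ordered) - n_test
    return ordered[:cutoff], ordered[cutoff:]


# Made-up daily observations, deliberately out of order
records = [(date(2024, 1, d), d * 10) for d in (5, 1, 9, 3, 7, 2, 10, 4, 8, 6)]
train, test = temporal_split(records, test_fraction=0.2)
```

Because the data is sorted rather than shuffled, the model is never trained on observations that come after the ones it is evaluated on, which is the leakage a random split would introduce for time-ordered data.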
Examples
80% train, 20% test
70% train, 15% validation, 15% test
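Both splits listed above can be produced with a small shuffled-index helper. The sketch below uses only the standard library; the name `split_indices`, the fixed seed, and the dataset size of 100 are assumptions chosen for illustration. It applies to data without a time ordering.

```python
import random


def split_indices(n, fractions, seed=0):
    """Shuffle range(n) and cut it into consecutive parts sized by `fractions`.

    Hypothetical helper for illustration; `fractions` must sum to 1.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes the remainder so rounding never drops an index
        end = n if i == len(fractions) - 1 else start + round(n * frac)
        parts.append(indices[start:end])
        start = end
    return parts


# 80% train, 20% test
train, test = split_indices(100, [0.8, 0.2])

# 70% train, 15% validation, 15% test
train3, val3, test3 = split_indices(100, [0.7, 0.15, 0.15])
```

Every index lands in exactly one part, so no example is shared between the training, validation, and test sets.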