Overview
This prompt guides users through revising a Python CNN training script to improve model evaluation practice. Programmers and data scientists will learn how to implement a proper validation split and avoid information leakage.
Prompt Overview
Purpose: Guide the revision of a CNN model training pipeline so that data is handled without information leakage.
Audience: It is designed for data scientists and machine learning engineers working with PyTorch for model training.
Distinctive Feature: The updated script includes a dedicated validation set and uses F1-score for early stopping and model selection.
Outcome: Users will achieve unbiased performance metrics by evaluating the model on the test set only after training.
Quick Specs
- Media: Text
- Use case: Generation
- Industry: Business Communications, CRM & Sales Software, Development Tools & DevOps
- Techniques: Decomposition, Self-Critique / Reflection, Structured Output
- Models: Claude 3.5 Sonnet, Gemini 2.0 Flash, GPT-4o, Llama 3.1 70B
- Estimated time: 5-10 minutes
- Skill level: Beginner
Variables to Fill
No inputs required — just copy and use the prompt.
Example Variables Block
No example values needed for this prompt.
The Prompt
You are given a Python script implementing a CNN model training pipeline that currently lacks a dedicated validation set. The existing script improperly uses the test set for early stopping and selecting the best model, leading to information leakage and overly optimistic test performance metrics.
Your task is to analyze the provided CNN training script and revise it to include the following improvements:
1. Data Splitting:
– Split the original training dataset into a proper training subset and a validation subset (e.g., 80% train, 20% validation) using `torch.utils.data.random_split` before creating `train_ds_lungs`.
– Create separate DataLoader objects for the training set and the validation set (e.g., `train_loader` and `validation_loader`).
2. Model Training and Evaluation:
– Perform model training epochs using the training set.
– At the end of each epoch, evaluate the model on the validation set and compute validation metrics, especially the F1-score.
3. Early Stopping and Model Selection:
– Use the validation F1-score (or validation loss) as the criterion for early stopping and for selecting and saving the best model’s state (`best_model_state`).
4. Test Set Evaluation:
– Only after the training concludes and the best model state has been loaded, run evaluation on the test set to report final unbiased performance metrics.
5. Maintain clarity and modularity, and comment the updated code to explain the changes.
# Steps
– Identify where the original dataset is loaded before `train_ds_lungs`.
– Insert a `random_split` to create train and validation subsets.
– Create DataLoaders for train and validation subsets.
– Modify the training loop to evaluate on the validation set after each epoch.
– Implement early stopping using validation F1-score.
– After training finishes, load the best model and evaluate on the test set only once.
# Output Format
– Provide the fully updated Python script implementing these changes.
– Ensure the code includes necessary imports, dataset splitting, DataLoader creation, training loop, validation evaluation, early stopping logic, and final test evaluation.
– Include comments explaining the key modifications.
# Notes
– Assume the necessary standard PyTorch and sklearn imports for metrics and data handling.
– Retain all other original functionalities unless they conflict with the above requirements.
– Focus on clear separation of training, validation, and test phases to prevent information leakage.
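The pipeline the prompt asks for can be sketched end to end. Everything below is illustrative, not the original script: the `nn.Linear` stand-in for the CNN, the synthetic tensors, and the patience value are assumptions made so the sketch is self-contained.

```python
# Minimal sketch of the revised pipeline; model, data, and hyperparameters
# are placeholders standing in for the original script's.
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split
from sklearn.metrics import f1_score

torch.manual_seed(0)

# Stand-in for the original full training dataset.
full_train_ds = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))

# 1. Split BEFORE any training: 80% train, 20% validation.
n_val = int(0.2 * len(full_train_ds))
train_ds_lungs, val_ds = random_split(full_train_ds, [len(full_train_ds) - n_val, n_val])
train_loader = DataLoader(train_ds_lungs, batch_size=16, shuffle=True)
validation_loader = DataLoader(val_ds, batch_size=16)

model = nn.Linear(8, 2)  # placeholder for the CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

best_f1, best_model_state, patience, bad_epochs = -1.0, None, 3, 0
for epoch in range(10):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()

    # 2. Validation pass at the end of each epoch -- never the test set.
    model.eval()
    preds, targets = [], []
    with torch.no_grad():
        for xb, yb in validation_loader:
            preds.extend(model(xb).argmax(dim=1).tolist())
            targets.extend(yb.tolist())
    val_f1 = f1_score(targets, preds, average="macro")

    # 3. Early stopping and model selection driven by validation F1 only.
    if val_f1 > best_f1:
        best_f1, bad_epochs = val_f1, 0
        best_model_state = copy.deepcopy(model.state_dict())
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

# 4. Restore the best state; only now may the held-out test set be evaluated.
model.load_state_dict(best_model_state)
```

In a real script, step 4 would be followed by a single pass over the test DataLoader to report final metrics.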
How to Use This Prompt
- Copy the prompt provided above.
- Paste the prompt into your preferred coding environment.
- Analyze the existing CNN training script for data handling.
- Implement the suggested improvements in the script.
- Test the updated script for proper functionality.
- Review the code for clarity and comments.
Tips for Best Results
- Data Splitting: Use `torch.utils.data.random_split` to create separate training and validation datasets, ensuring no information leakage.
- Model Training: Train the model on the training set and evaluate it on the validation set at the end of each epoch to compute metrics like the F1-score.
- Early Stopping: Implement early stopping based on the validation F1-score to select and save the best model’s state during training.
- Test Evaluation: After training, evaluate the best model on the test set to obtain unbiased performance metrics, ensuring a clear separation of phases.
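One detail worth carrying into the revised script: passing a seeded `torch.Generator` to `random_split` makes the train/validation split reproducible across runs. The dataset and sizes below are illustrative only.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset; a seeded generator yields the same split every run.
ds = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.zeros(10))
g = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(ds, [8, 2], generator=g)
assert len(train_subset) == 8 and len(val_subset) == 2
```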
FAQ
- What is the purpose of a validation set in model training?
- What is the purpose of a validation set in model training?
  A validation set helps tune model parameters and prevents overfitting by providing performance estimates during training.
- How can we split the dataset into training and validation sets?
  Use `torch.utils.data.random_split` to divide the dataset into training and validation subsets, typically in an 80/20 ratio.
- What metric should be used for early stopping?
  The validation F1-score is commonly used for early stopping to ensure the model generalizes well.
- When should the test set be evaluated?
  The test set should be evaluated only after training is complete and the best model has been selected.
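As a concrete instance of the F1 metric the FAQ refers to, scikit-learn's `f1_score` computes it directly from true and predicted labels. The label vectors here are made up, and `average="macro"` is one common choice (binary tasks may use the default `average="binary"` instead).

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
# Macro F1 averages the per-class F1 scores; here both classes score 0.8.
print(f1_score(y_true, y_pred, average="macro"))  # 0.8
```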
Compliance and Best Practices
- Best Practice: Review AI output for accuracy and relevance before use.
- Privacy: Avoid sharing personal, financial, or confidential data in prompts.
- Platform Policy: Your use of AI tools must comply with their terms and your local laws.
Revision History
- Version 1.0 (February 2026): Initial release.