If FlexAI is not yet connected to your GitHub account, connect it first by running the GitHub connection command. This enables FlexAI to automatically pull code from repositories referenced in --repository-url.

Overview
This blueprint covers:

- Object Detection Training: Fine-tune YOLO11 on custom datasets for object detection
- Instance Segmentation: Train models to detect and segment objects at the pixel level
- Pose Estimation: Train models for human pose detection and keypoint tracking
- Model Validation: Evaluate model performance with comprehensive metrics
- Model Export: Export to optimized formats (ONNX, TensorRT) for deployment
- Inference Deployment: Deploy trained models as FlexAI inference endpoints
YOLO11 supports multiple computer vision tasks (detect, segment, pose, track, etc.). This guide demonstrates the core workflows that apply across all computer vision tasks.
Prepare the Dataset
YOLO models require datasets in a specific format. We’ll use the COCO8 dataset (a small subset of COCO) for this example, but you can easily adapt this to your own custom dataset.

Each label file contains annotations in YOLO format (one object per line). All coordinates must be normalized to the range [0, 1].

Create a data configuration file (data.yaml).
Option A: Use COCO8 Dataset (Quick Start)
The COCO8 dataset will be automatically downloaded by Ultralytics during training. No manual download is required.

Option B: Prepare Your Own Custom Dataset
For custom datasets, follow the YOLO format structure and reference it from your data configuration file (data.yaml). Both mapping and list formats are supported for class names. You can also use:

names: ["person", "bicycle", "car"]

YOLO accepts both relative and absolute paths, but using absolute paths (/input/...) reduces ambiguity inside FlexAI jobs. If your dataset uses a different annotation format (COCO JSON, Pascal VOC, etc.), convert it to YOLO format before uploading. Refer to the Ultralytics Data Format documentation for conversion guidance.

Upload Custom Dataset to FlexAI

Once your dataset is prepared, upload it to FlexAI. When you use the dataset in a training job with --dataset yolo-custom-dataset, FlexAI mounts the dataset contents directly under /input in your training environment, preserving their original folder structure. This means:

- If your dataset structure is dataset/images/train/..., it will be accessible at /input/images/train/...
- Your data.yaml should be at /input/data.yaml
- The path field in your data.yaml should be /input as shown in the example above
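As a sketch of what such a configuration could be generated from, the following plain-Python helper (the function name and placeholder class names are ours, not part of FlexAI or Ultralytics) emits a minimal data.yaml consistent with the /input mount layout described above:

```python
# Sketch: generate a minimal data.yaml matching the FlexAI mount layout
# described above. Class names are placeholders -- use your dataset's own.
def write_data_yaml(root, names):
    """Build YOLO data.yaml text rooted at the given mount point."""
    lines = [
        f"path: {root}",        # dataset root inside the training job
        "train: images/train",  # relative to `path`
        "val: images/val",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"

print(write_data_yaml("/input", ["person", "bicycle", "car"]))
```

Write the result to data.yaml at the root of your dataset folder before uploading.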
Verify your data.yaml before starting a training job.

Train an Object Detection Model
Train a YOLO11 model for object detection. We’ll start with the nano model (YOLO11n), which is fast and efficient.
Training on COCO8 Dataset
--accels specifies the number of GPUs to allocate (e.g., --accels 4 = 4 GPUs of the chosen accelerator type).

Training on Custom Dataset
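As a minimal sketch of what a custom-dataset training entrypoint could look like (assuming the Ultralytics Python API and the /input mount described earlier; the batch_for helper and the RUN_TRAINING guard are ours, not FlexAI features):

```python
import os

# Batch-size upper bounds taken from the resource table later in this guide.
RECOMMENDED_BATCH = {"n": 32, "s": 16, "m": 8, "l": 4}

def batch_for(variant):
    """Pick a batch size for a YOLO11 variant: n, s, m, or l."""
    return RECOMMENDED_BATCH[variant]

# Guarded so the sketch only trains when explicitly requested.
if os.environ.get("RUN_TRAINING") == "1":
    from ultralytics import YOLO  # pip install ultralytics
    model = YOLO("yolo11n.pt")    # start from pre-trained weights
    model.train(
        data="/input/data.yaml",  # custom dataset mounted by FlexAI
        epochs=100,
        imgsz=640,
        batch=batch_for("n"),
    )
```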
Training with Larger Models
For better accuracy, use larger YOLO variants. Adjust batch size based on model size:

YOLO11s (Small):

YOLO11m (Medium):

YOLO11l (Large):

Train an Instance Segmentation Model
Instance segmentation detects objects and generates pixel-level masks for each instance.

For custom segmentation datasets, ensure your labels include polygon annotations in YOLO segmentation format.
Validate the Model
After training, validate your model’s performance on the validation dataset.

For validation with a custom dataset:
List Training Checkpoints
Run Validation
For validation with the COCO8 dataset (will be auto-downloaded):

Understanding Validation Metrics
YOLO provides comprehensive metrics:

- mAP50: Mean Average Precision at IoU threshold 0.5
- mAP50-95: Mean Average Precision averaged across IoU thresholds 0.5-0.95
- Precision: Ratio of true positive detections
- Recall: Ratio of detected ground truth objects
- F1-Score: Harmonic mean of precision and recall
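To make the relationships between these metrics concrete, here is a small plain-Python sketch (the input numbers are made-up examples, not benchmark results):

```python
def f1_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def map50_95(ap_per_threshold):
    """mAP50-95: average AP over the 10 IoU thresholds 0.50, 0.55, ..., 0.95."""
    assert len(ap_per_threshold) == 10
    return sum(ap_per_threshold) / len(ap_per_threshold)

# Made-up example: 80% precision, 60% recall
print(round(f1_score(0.8, 0.6), 4))  # -> 0.6857
```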
Export the Model
Export your trained model to various formats for optimized deployment.
See the Ultralytics Export documentation for more formats and options.
Download Checkpoint Locally
First, download the best checkpoint to your local machine:

Export to ONNX
ONNX format is widely supported and optimized for cross-platform inference:

Export to TensorRT
For NVIDIA GPU deployment with maximum performance:

Available Export Formats
| Format | format Argument | Use Case |
|---|---|---|
| TorchScript | torchscript | General PyTorch deployment |
| ONNX | onnx | Cross-platform inference |
| TensorRT | engine | NVIDIA GPU optimization |
| CoreML | coreml | Apple devices (iOS/macOS) |
| TFLite | tflite | Mobile and embedded devices |
| OpenVINO | openvino | Intel hardware acceleration |
| NCNN | ncnn | Mobile deployment |
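As a sketch, a local export script could validate the format string against the table above before calling Ultralytics (the check_format helper and RUN_EXPORT guard are ours; model.export(format=...) is the documented Ultralytics call):

```python
import os

# Format strings from the table above, as accepted by model.export(format=...).
EXPORT_FORMATS = {"torchscript", "onnx", "engine", "coreml",
                  "tflite", "openvino", "ncnn"}

def check_format(fmt):
    """Fail fast on a mistyped export format name."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return fmt

# Guarded: runs only when a trained checkpoint is available locally.
if os.environ.get("RUN_EXPORT") == "1":
    from ultralytics import YOLO
    model = YOLO("best.pt")                    # downloaded checkpoint
    model.export(format=check_format("onnx"))  # writes an .onnx next to the weights
```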
Run Inference on Your Trained Model
After training, you can run inference on images or videos using your trained model. The predictions and annotated images will be saved in the job’s output directory.
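As a sketch of prediction with a confidence threshold (the filter_by_confidence helper and RUN_PREDICT guard are ours; model.predict with conf and save is the documented Ultralytics call):

```python
import os

def filter_by_confidence(detections, conf=0.25):
    """Keep detections at or above the confidence threshold.

    `detections` is a list of (class_id, score) pairs -- a simplified
    stand-in for the Results objects YOLO returns.
    """
    return [d for d in detections if d[1] >= conf]

# Guarded: actual prediction requires a checkpoint and the ultralytics package.
if os.environ.get("RUN_PREDICT") == "1":
    from ultralytics import YOLO
    model = YOLO("best.pt")
    model.predict("image.jpg", conf=0.25, save=True)  # saves annotated output
```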
Run Inference as a Training Job
FlexAI’s managed inference endpoints currently support vLLM only. For YOLO models, use a training job to execute predictions directly:

Run Batch Predictions
To run inference on a directory of images, first upload them as a dataset:

Download Prediction Results
After the prediction job completes, download the results:

Quick Local Testing

You can test your model locally after downloading the checkpoint; prediction results will be saved to runs/detect/predict/.

Monitoring Training Progress
Check Training Status
View Training Logs
Training Observability with TensorBoard
Ultralytics automatically logs training metrics. Access FlexAI’s hosted TensorBoard instance to track:

- Training and validation loss curves
- mAP (mean Average Precision) metrics
- Precision and recall curves
- Learning rate schedules
Weights & Biases Integration
For advanced monitoring, integrate with Weights & Biases by adding environment variables:

Advanced Use Cases
Object Tracking
YOLO11 supports multi-object tracking in videos. First upload your video as a dataset:

Model Benchmarking
Compare model performance across different formats and hardware:

- PyTorch inference speed
- ONNX inference speed
- TensorRT inference speed (if available)
- Model accuracy metrics
Hyperparameter Tuning
Use Ultralytics’ built-in hyperparameter tuning:

Expected Results
Detection Performance (YOLO11n on COCO8)
- mAP50: typically 0.45–0.55
- mAP50-95: typically 0.30–0.40
- Inference Speed: 1.5–2.5 ms/image (H100 TensorRT)
- Model Size: ~3 MB
Segmentation Performance (YOLO11n-seg on COCO8-seg)
- Box mAP50: typically 0.45–0.55
- Mask mAP50: typically 0.40–0.50
- Inference Speed: 2–3 ms/image (H100 TensorRT)
Training Time
- YOLO11n on COCO8: ~5–10 minutes (4 × H100, 100 epochs).
- YOLO11s on COCO8: ~10–15 minutes (4 × H100, 100 epochs).
- YOLO11m on full COCO: ~4–6 hours (4 × H100, 100 epochs).
Technical Details
Recommended Resource Configuration
| Model | GPUs | Batch Size | Memory | Training Time (100 epochs, COCO8) |
|---|---|---|---|---|
| YOLO11n | 1-4 × H100 | 16-32 | 8GB+ | 5-10 min |
| YOLO11s | 2-4 × H100 | 8-16 | 12GB+ | 10-15 min |
| YOLO11m | 4 × H100 | 4-8 | 16GB+ | 15-20 min |
| YOLO11l | 4-8 × H100 | 2-4 | 24GB+ | 20-30 min |
Key Training Parameters
Image Size (imgsz):
- Standard: 640×640
- Small objects: 1280×1280 (slower but better detection)
- Real-time applications: 320×320 or 416×416 (faster inference)
Batch Size (batch):
- Larger batches generally lead to better convergence
- Adjust based on GPU memory: YOLO11n (16-32), YOLO11s (8-16), YOLO11m (4-8)
Epochs (epochs):
- Small datasets (< 1000 images): 100-150 epochs
- Medium datasets (1000-10000): 50-100 epochs
- Large datasets (> 10000): 30-50 epochs
Early Stopping (patience):
- Stops training if no improvement for N epochs
- Recommended: 50 epochs for COCO8, 30 epochs for larger datasets
Data Augmentation:
- Enabled by default with optimized settings
- Includes mosaic, mixup, HSV augmentation, and geometric transforms
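The epoch guidance above can be captured in a small helper (thresholds taken from the list; the function name and upper-bound choices are ours):

```python
def recommended_epochs(num_images):
    """Epoch budget from the dataset-size guidance above (upper bounds)."""
    if num_images < 1000:
        return 150   # small datasets: 100-150 epochs
    if num_images <= 10000:
        return 100   # medium datasets: 50-100 epochs
    return 50        # large datasets: 30-50 epochs

print(recommended_epochs(800))  # -> 150
```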
Multi-GPU Training
FlexAI automatically enables distributed training when multiple GPUs are requested:

Transfer Learning
Continue training from a previous checkpoint:

Troubleshooting
Common Issues
Fixing Training Job Failures:

Out-of-memory errors:
- Reduce batch size: batch=8 or batch=4
- Use smaller image size: imgsz=416
- Try a smaller model variant: YOLO11n instead of YOLO11s/m/l

Low accuracy (mAP):
- Train for more epochs: epochs=200 or epochs=300
- Increase image size: imgsz=1280
- Use a larger model: YOLO11s/m instead of YOLO11n
- Check dataset quality and annotation accuracy
- Ensure balanced class distribution

Dataset loading errors:
- Verify YOLO format: <class_id> <x_center> <y_center> <width> <height>
- Check that coordinates are normalized (0-1 range)
- Ensure data.yaml paths are correct
- Verify image-label pairs match (same filename, different extension)

Export failures:
- Update Ultralytics: add ultralytics>=8.3.0 to requirements
- Check CUDA/TensorRT compatibility
- Try without optimization: simplify=False for ONNX
- Ensure model path is correct

Inference failures:
- Verify checkpoint ID is correct
- Check that model path /checkpoint/weights/best.pt exists
- Ensure source path is correct (use /input for datasets)
- Check logs: flexai training logs <job-name>
Dataset Format Requirements
If you have a dataset in another format (COCO JSON, Pascal VOC, etc.), you’ll need to convert it to YOLO format before using it with this blueprint.

YOLO Format Specification
Each image should have a corresponding text file with the same name:

- class_id: Integer class ID (0-indexed)
- x_center, y_center, width, height: Normalized coordinates (0-1 range)

Example (image001.txt):
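As a sketch, a small validator can catch malformed label lines before training (the function name and the sample line are ours, chosen only to illustrate the format):

```python
def parse_yolo_label(line):
    """Parse and validate one line of a YOLO detection label file."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    coords = [float(p) for p in parts[1:]]
    if class_id < 0:
        raise ValueError("class_id must be >= 0")
    if not all(0.0 <= c <= 1.0 for c in coords):
        raise ValueError(f"coordinates must be normalized to [0, 1]: {coords}")
    return (class_id, *coords)

print(parse_yolo_label("0 0.5 0.5 0.25 0.25"))  # -> (0, 0.5, 0.5, 0.25, 0.25)
```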
Converting Other Formats
For converting from other formats:

- COCO JSON: See Ultralytics COCO format documentation
- Pascal VOC: See Ultralytics VOC format documentation
- Other formats: Refer to Ultralytics Data Formats guide
After conversion, ensure your data.yaml has path: /input before uploading to FlexAI.
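As an illustrative sketch of the box-geometry step in a Pascal VOC conversion (the helper name is ours; real converters also remap class ids and rearrange the file layout), VOC pixel corners become normalized YOLO center/size values:

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a Pascal VOC pixel box to normalized YOLO (xc, yc, w, h)."""
    xc = (xmin + xmax) / 2 / img_w  # box center, normalized by image width
    yc = (ymin + ymax) / 2 / img_h  # box center, normalized by image height
    w = (xmax - xmin) / img_w       # box width, normalized
    h = (ymax - ymin) / img_h       # box height, normalized
    return xc, yc, w, h

print(voc_to_yolo(100, 100, 300, 200, 640, 480))
```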
Best Practices
Dataset Preparation
- Use high-quality, diverse images
- Ensure balanced class distribution
- Include various lighting conditions, angles, and backgrounds
- Minimum 1500 images per class recommended
- Validate annotation accuracy before training
Training
- Start with pre-trained weights (transfer learning)
- Use data augmentation (enabled by default)
- Monitor validation metrics, not just training loss
- Use early stopping to prevent overfitting
- Save checkpoints regularly
Deployment
- Export to optimized formats (ONNX/TensorRT) for production
- Test on representative images before deployment
- Set appropriate confidence thresholds (0.25-0.5 typical range)
- Benchmark inference speed on target hardware