Qwen2.5-7B model and the openhermes-fr dataset.
You will see that this process requires configuring Axolotl’s training parameters, leveraging FlexAI’s managed training infrastructure, and deploying the fine-tuned model as a scalable inference endpoint.
If you haven’t already connected FlexAI to GitHub, run flexai code-registry connect to set up a code registry connection. This allows FlexAI to pull repositories directly using the repository URL in training commands.

Verify Dataset Configuration
First, ensure your domain-specific dataset is properly configured in your Axolotl YAML file. For our French language example, we’ll use the openhermes-fr dataset; for your own use case, replace this with your domain-specific dataset. The openhermes-fr dataset is specifically designed for French language tasks and serves as an excellent example of domain specialization.

Navigate to code/axolotl/qwen2/fft-7b-french.yaml and verify the dataset configuration.

Configure Training Parameters
The qwen2/fft-7b-french.yaml file contains the training configuration for domain-specific fine-tuning. Key settings include:

- Model: Qwen/Qwen2.5-7B - an excellent multilingual base model suitable for domain adaptation
- Stage: Full fine-tuning - well suited to task-specific and domain-specific adaptation
- Dataset: openhermes-fr - an example domain-specific dataset (replace with your own)
- Training: Full fine-tuning with FSDP (Fully Sharded Data Parallel) for optimal performance
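For reference, the key settings above correspond to an Axolotl configuration along these lines. This is a sketch using standard Axolotl key names; treat the actual qwen2/fft-7b-french.yaml in the repository as authoritative:

```yaml
base_model: Qwen/Qwen2.5-7B       # multilingual base model

datasets:
  - path: legmlai/openhermes-fr   # replace with your domain-specific dataset
    type: sharegpt                # must match your dataset's format (sharegpt, alpaca, etc.)

# Full fine-tuning: no `adapter` key is set (LoRA/QLoRA runs would set one).
# FSDP shards model states across the node's GPUs:
fsdp:
  - full_shard
  - auto_wrap
```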
Create Secrets
To access the Qwen2.5-7B model and OpenHermes-FR dataset, you need a HuggingFace token. Use the flexai secret create command to store your HuggingFace token as a secret. Replace <HF_AUTH_TOKEN_SECRET_NAME> with your desired name for the secret.
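A minimal invocation looks like the following sketch; depending on your FlexAI CLI version, the token value may be supplied interactively or via a flag, so check the command’s help output:

```bash
flexai secret create <HF_AUTH_TOKEN_SECRET_NAME>
# Paste your HuggingFace token (from https://huggingface.co/settings/tokens) when prompted
```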
[Optional] Pre-fetch the Model
To speed up training and avoid downloading large models at runtime, you can pre-fetch your HuggingFace model to FlexAI storage. For example, to pre-fetch the Qwen/Qwen2.5-7B model:
- Create a HuggingFace storage provider.
- Push the model checkpoint to your storage.
Training
For a 7B model, we recommend using 1 node (4 × H100 GPUs) to ensure reasonable training time and avoid out-of-memory issues.

Standard Training (without prefetch)
Training with Model Prefetch
To take advantage of the model pre-fetching performed in the [Optional] Pre-fetch the Model section, adjust the training command to reference the pre-fetched model in FlexAI storage.

Monitoring Training Progress
You can check the status and lifecycle events of your Training Job by running flexai training inspect.

Training Observability with Weights & Biases
For advanced monitoring and visualization of training metrics, Axolotl supports Weights & Biases (wandb) integration. You can leverage wandb logging for detailed insights into training progress, loss curves, and model performance. To enable wandb logging, add the wandb settings (for example, wandb_project and wandb_name) to your YAML configuration.

Getting Training Checkpoints
Once the Training Job completes successfully, you will be able to list all the produced checkpoints. Checkpoints marked INFERENCE READY = true are ready for serving.
Serving the Trained Model
Deploy your trained model directly from the checkpoint using FlexAI inference. Replace <CHECKPOINT_ID> with the ID of an inference-ready checkpoint.
GPU specification for inference endpoints is currently managed automatically by FlexAI. Future versions will allow explicit GPU count specification for inference workloads to optimize cost and performance based on your specific requirements.
Testing Your Domain-Specific Model
Once the endpoint is running, you can test it with domain-specific prompts. For our French language example, the model should demonstrate strong French language understanding, proper grammar and syntax, and cultural context awareness.

Before and After Training Comparison
To illustrate the improvement from fine-tuning on French data, here’s a comparison using the question “Qui a gagné la Coupe du monde 2018 ?” (“Who won the 2018 World Cup?”). Compare the base model’s response (Qwen/Qwen2.5-7B before training) with the fine-tuned model’s output.

Example API Call
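As an illustrative sketch of an API call, assuming the FlexAI endpoint exposes an OpenAI-compatible chat completions route (the endpoint URL, API key handling, and model field below are placeholders, not confirmed FlexAI specifics):

```bash
curl https://<YOUR_ENDPOINT_URL>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "<CHECKPOINT_ID>",
    "messages": [
      {"role": "user", "content": "Qui a gagné la Coupe du monde 2018 ?"}
    ]
  }'
```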
Expected Results
After fine-tuning on domain-specific data, your model should achieve:

- Domain Expertise: Specialized knowledge and terminology understanding for your target domain
- Task-Specific Performance: Enhanced capabilities for domain-relevant tasks and workflows
- Maintained General Capabilities: Preserved reasoning, problem-solving, and general language skills
- Strong French Language Understanding: Natural conversation flow, proper grammar, cultural context
- High Performance on French Tasks: Question answering, text summarization, creative writing
Technical Details
Training Configuration Breakdown:
- Full Fine-tuning with FSDP: Enables training of 7B model on 1 node efficiently
- Mixed Precision (bf16): Accelerates training while maintaining numerical stability
- Gradient Accumulation: Effective batch size of 2 (2 steps × 1 per device)
- Learning Rate Schedule: Cosine decay with 10% warmup for stable convergence
- Context Length: 2048 tokens, optimized for conversation tasks
- Sample Packing: Efficient batch utilization for variable-length sequences
- Flash Attention: Optimized attention mechanism for faster training
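Assuming standard Axolotl key names, the breakdown above maps onto configuration entries roughly like the following (a sketch, not a verbatim copy of fft-7b-french.yaml):

```yaml
bf16: true                      # mixed precision for speed with numerical stability
flash_attention: true           # optimized attention kernel
sequence_len: 2048              # context length for conversation tasks
sample_packing: true            # pack variable-length sequences into full batches
micro_batch_size: 1             # per-device batch size
gradient_accumulation_steps: 2  # effective per-device batch size of 2
lr_scheduler: cosine
warmup_ratio: 0.1               # 10% warmup (some configs use warmup_steps instead)
```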
Resource Requirements
Recommended Configuration for Qwen2.5-7B:

- Nodes: 1 node (cost-effective for 7B models)
- Accelerators: 4 × H100 GPUs per node
- Memory: ~200GB+ GPU memory total
- Training Time: ~2-4 hours for 3 epochs
- Storage: ~30GB for checkpoints
Note: Set FORCE_TORCHRUN=1 in the training command to ensure a proper distributed training setup.
Scaling Options:

- For faster training: Increase to 2 nodes (8 × H100, at 4 GPUs per node)
- For larger datasets: Adjust the num_epochs parameter
- For longer context: Increase sequence_len (requires more memory)
- For memory efficiency: Switch to QLoRA with load_in_4bit: true and adapter: qlora
- For other models: Use the configs in the code/axolotl/ directory (Llama, Mistral, Gemma, Phi, etc.)
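The QLoRA switch mentioned above amounts to a small configuration change. A sketch of the relevant Axolotl keys (the LoRA hyperparameter values here are illustrative defaults, not taken from this guide):

```yaml
load_in_4bit: true      # 4-bit quantized base model weights
adapter: qlora          # train a low-rank adapter instead of full weights
lora_r: 32              # illustrative values -- tune for your task
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
```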
Additional Training Examples
Axolotl provides extensive configuration examples for various models and training strategies:

Llama 3.1 8B with LoRA

Mistral 7B with QLoRA

See the code/axolotl/ directory for more examples including Gemma, Phi, Qwen2, multimodal models, and advanced configurations.
Troubleshooting
Training Job Fails to Start:

- Reduce micro_batch_size (it cannot go below 1)
- Increase gradient_accumulation_steps to maintain the effective batch size
- Consider switching to QLoRA: set load_in_4bit: true and adapter: qlora for memory efficiency
- Enable fsdp_offload_params: true for additional memory savings

Checkpoints Not Appearing:

- Wait for training to complete fully (check with flexai training inspect)
- Ensure the Axolotl configuration saves the model in a compatible format
- Verify training completed successfully without errors

Inference Endpoint Issues:

- Verify the checkpoint shows INFERENCE READY = true status
- Check FlexAI cluster availability with flexai inference list
- Review detailed logs with flexai inference logs <endpoint-name>

Dataset Loading Issues:

- Verify the dataset path is correct in the YAML configuration (e.g., legmlai/openhermes-fr)
- Ensure your HuggingFace token has access to private datasets
- Check that the dataset format matches the specified type (sharegpt, alpaca, etc.)