This experiment demonstrates how to use FlexAI to fine-tune language models using reinforcement learning (RL) techniques with EasyR1, a framework for training reasoning-capable models using GRPO (Group Relative Policy Optimization), DAPO, and REINFORCE algorithms.
For illustration purposes, we’ll fine-tune the Qwen2.5-7B-Instruct model on mathematical reasoning tasks, using the math12k dataset and the GRPO algorithm to improve reasoning capabilities.
If you haven’t already connected FlexAI to GitHub, run flexai code-registry connect to set up a code registry connection. This allows FlexAI to pull repositories directly using the repository URL in training commands.
Quick Start
Run GRPO training on Qwen2.5-7B with this single command:
flexai training run grpo \
--accels 8 --nodes 1 \
--repository-url https://github.com/flexaihq/blueprints \
--env FORCE_TORCHRUN=1 \
--secret WANDB_API_KEY=<WANDB_API_KEY_SECRET_NAME> \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/easyR1/requirements.txt \
--runtime pytorch-28-vllm-0110-nvidia \
-- python3 -m verl.trainer.main \
config=code/easyR1/config.yaml \
worker.actor.model.model_path=Qwen/Qwen2.5-7B-Instruct
Replace <WANDB_API_KEY_SECRET_NAME> and <HF_AUTH_TOKEN_SECRET_NAME> with the names of the secrets you created (see the Create Secrets section below).
What is EasyR1?
EasyR1 is a reinforcement learning framework specifically designed for training language models with enhanced reasoning capabilities. It implements several RL algorithms optimized for LLMs:
- GRPO (Group Relative Policy Optimization): Efficient policy optimization using group-based advantage estimation
- DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization): Enhanced training with dynamic sampling and decoupled clipping for more stable optimization
- REINFORCE: Classic policy gradient method for LLM fine-tuning
The framework is built on top of verl (Volcano Engine Reinforcement Learning), providing distributed training capabilities with FSDP and vLLM integration.
Directory Structure
The code/easyR1/ directory contains:
config.yaml - Main GRPO training configuration
format_prompt/ - Jinja templates for prompt formatting
reward_function/ - Custom reward scoring functions
For baseline training scripts and additional examples, refer to the EasyR1 GitHub repository.
Understand the Configuration
EasyR1 uses a comprehensive YAML configuration file that controls all aspects of RL training. The main configuration file is located at code/easyR1/config.yaml in this repository.
Key Configuration Sections
Data Configuration
data:
train_files: hiyouga/math12k@train
val_files: hiyouga/math12k@test
prompt_key: problem
answer_key: answer
format_prompt: ./code/easyR1/format_prompt/math.jinja
max_prompt_length: 2048
max_response_length: 2048
rollout_batch_size: 512
Algorithm Settings
algorithm:
adv_estimator: grpo # GRPO, DAPO, or REINFORCE
use_kl_loss: true
kl_coef: 1.0e-2
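To build intuition for what adv_estimator: grpo does, here is a minimal sketch of group-relative advantage estimation: each prompt gets rollout.n sampled responses, and each response's advantage is its reward normalized against the group's mean and standard deviation. This is an illustrative simplification, not EasyR1's exact implementation.

```python
# Illustrative sketch of GRPO's group-relative advantage estimation:
# within one prompt's group of rollouts, each reward is normalized
# against the group mean and standard deviation.
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's rollout rewards within the group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# With rollout.n = 5, one prompt might yield these reward scores:
rewards = [1.0, 0.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Correct responses receive positive advantages and incorrect ones negative, so the policy is pushed toward the better responses within each group without needing a learned value function.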
Worker Configuration
worker:
actor:
model:
model_path: Qwen/Qwen2.5-7B-Instruct
optim:
lr: 1.0e-6
rollout:
n: 5 # number of rollout samples per prompt
temperature: 1.0
reward:
reward_type: batch
reward_function: ./code/easyR1/reward_function/math.py:compute_score
Reference Baseline Examples
For pre-configured training scripts and baseline examples, refer to the EasyR1 repository. The repository provides multiple baseline configurations for different models and tasks:
Available Baselines (in EasyR1 repo)
- Mathematical Reasoning:
qwen2_5_7b_math_grpo.sh, qwen3_4b_math_grpo.sh
- Geometric Reasoning (Vision-Language):
qwen2_5_vl_7b_geo3k_grpo.sh, qwen2_5_vl_7b_geo3k_dapo.sh, qwen2_5_vl_7b_geo3k_reinforce.sh
- Multi-Image Tasks:
qwen2_5_vl_7b_multi_image.sh
You can adapt these examples to work with FlexAI by following the training commands in this blueprint.
Customize Your Configuration
For your specific use case, you may want to create a custom configuration. Here’s how to customize config.yaml:
Custom Dataset
Replace the dataset configuration:
data:
train_files: your-username/your-dataset@train
val_files: your-username/your-dataset@test
prompt_key: question # adjust based on your dataset
answer_key: solution # adjust based on your dataset
Custom Reward Function
Create your own reward function in code/easyR1/reward_function/custom.py. With reward_type: batch (as set in config.yaml), the function receives one dict per sample and returns one dict of float scores per sample, matching the signature of the bundled math.py:
from typing import Any

def compute_score(reward_inputs: list[dict[str, Any]]) -> list[dict[str, float]]:
    """
    Args:
        reward_inputs: one dict per sample, containing at least the
            "response" (model output) and "ground_truth" keys
    Returns:
        One dict of float scores per sample; the "overall" score
        drives training
    """
    scores = []
    for reward_input in reward_inputs:
        # Your custom reward logic here
        score = your_evaluation_function(
            reward_input["response"], reward_input["ground_truth"]
        )
        scores.append({"overall": score})
    return scores
Then update the config to reference your custom reward function:
worker:
reward:
reward_function: ./code/easyR1/reward_function/custom.py:compute_score
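It is worth smoke-testing a reward function locally before launching a training run. The sketch below uses a hypothetical exact-match scorer written in the batch format shown in the bundled math.py (one input dict and one score dict per sample):

```python
# Hypothetical exact-match batch reward function, smoke-tested locally.
from typing import Any

def compute_score(reward_inputs: list[dict[str, Any]]) -> list[dict[str, float]]:
    scores = []
    for item in reward_inputs:
        # Reward 1.0 for an exact (whitespace-insensitive) match, else 0.0.
        correct = item["response"].strip() == item["ground_truth"].strip()
        scores.append({"overall": 1.0 if correct else 0.0})
    return scores

samples = [
    {"response": "60 mph", "ground_truth": "60 mph"},
    {"response": "55 mph", "ground_truth": "60 mph"},
]
scores = compute_score(samples)
print(scores)
```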
Custom Prompt Template
Create a custom Jinja template in code/easyR1/format_prompt/custom.jinja:
{{ problem }}
Please solve this step by step and provide your final answer.
Update the config:
data:
format_prompt: ./code/easyR1/format_prompt/custom.jinja
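To preview what the model will actually see, you can render the template yourself. The sketch below assumes the hypothetical custom.jinja above and uses the third-party jinja2 package to substitute the dataset's prompt_key value for {{ problem }}:

```python
# Render the hypothetical custom template the way a Jinja engine would:
# the dataset's prompt_key value is substituted for {{ problem }}.
from jinja2 import Template  # third-party; pip install jinja2

template = Template(
    "{{ problem }}\n"
    "Please solve this step by step and provide your final answer."
)
prompt = template.render(problem="What is 15 * 8?")
print(prompt)
```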
Create Secrets
To access HuggingFace models and datasets, you need a HuggingFace token.
Use the flexai secret create command to store your HuggingFace Token as a secret:
flexai secret create <HF_AUTH_TOKEN_SECRET_NAME>
Then paste your HuggingFace token value when prompted.
Use the same command to store your Weights & Biases (wandb) API key as a secret:
flexai secret create <WANDB_API_KEY_SECRET_NAME>
Then paste your Weights & Biases API key value.
[Optional] Pre-fetch the Model
To speed up training and avoid downloading large models at runtime, you can pre-fetch your HuggingFace model to FlexAI storage:
1. Create a HuggingFace storage provider:
flexai storage create HF-STORAGE --provider huggingface --hf-token-name <HF_AUTH_TOKEN_SECRET_NAME>
2. Push the model checkpoint to your storage:
flexai checkpoint push qwen25-7b-instruct --storage-provider HF-STORAGE --source-path Qwen/Qwen2.5-7B-Instruct
Training
For RL training with EasyR1, we recommend using 1 node (8 × H100 GPUs) for 7B models to handle the actor, reference model, and rollout workers efficiently.
The commands below use this repository which contains all necessary configuration files in the code/easyR1/ directory.
Standard Training: Mathematical Reasoning with GRPO
flexai training run grpo \
--accels 8 --nodes 1 \
--repository-url https://github.com/flexaihq/blueprints \
--env FORCE_TORCHRUN=1 \
--secret WANDB_API_KEY=<WANDB_API_KEY_SECRET_NAME> \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/easyR1/requirements.txt \
--runtime pytorch-28-vllm-0110-nvidia \
-- python3 -m verl.trainer.main \
config=code/easyR1/config.yaml \
worker.actor.model.model_path=Qwen/Qwen2.5-7B-Instruct
Training with Model Prefetch
flexai training run grpo-prefetched \
--accels 8 --nodes 1 \
--repository-url https://github.com/flexaihq/blueprints \
--checkpoint qwen25-7b-instruct \
--env FORCE_TORCHRUN=1 \
--secret WANDB_API_KEY=<WANDB_API_KEY_SECRET_NAME> \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/easyR1/requirements.txt \
--runtime pytorch-28-vllm-0110-nvidia \
-- python3 -m verl.trainer.main \
config=code/easyR1/config.yaml \
worker.actor.model.model_path=/input-checkpoint/qwen25-7b-instruct
Training with Custom Configuration
To use a modified configuration or different dataset, override config values:
flexai training run grpo-custom \
--accels 8 --nodes 1 \
--repository-url https://github.com/flexaihq/blueprints \
--env FORCE_TORCHRUN=1 \
--secret WANDB_API_KEY=<WANDB_API_KEY_SECRET_NAME> \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/easyR1/requirements.txt \
--runtime pytorch-28-vllm-0110-nvidia \
-- python3 -m verl.trainer.main \
config=code/easyR1/config.yaml \
worker.actor.model.model_path=Qwen/Qwen2.5-7B-Instruct \
data.train_files=your-username/your-dataset@train \
data.val_files=your-username/your-dataset@test \
trainer.experiment_name=custom-experiment
Monitoring Training Progress
You can check the status and lifecycle events of your Training Job:
flexai training inspect grpo
View the logs of your Training Job:
flexai training logs grpo
Training Observability with Weights & Biases
EasyR1 supports Weights & Biases (wandb) integration for detailed training metrics visualization. The configuration already includes wandb logging:
trainer:
logger: ["file", "wandb"]
project_name: easy_r1
experiment_name: qwen2_5_7b_math_grpo
Getting Training Checkpoints
Once the Training Job completes successfully, you can list all produced checkpoints:
flexai training checkpoints grpo
Look for checkpoints marked as INFERENCE READY = true - these are ready for serving.
Serving the Trained Model
Deploy your RL-trained model directly from the checkpoint using FlexAI inference. Replace <CHECKPOINT_ID> with the ID from an inference-ready checkpoint:
flexai inference serve easyr1-reasoning-endpoint --checkpoint <CHECKPOINT_ID>
Monitor your inference endpoint status:
# List all inference endpoints
flexai inference list
# Get detailed endpoint information
flexai inference inspect easyr1-reasoning-endpoint
# Check endpoint logs
flexai inference logs easyr1-reasoning-endpoint
Testing Your RL-Trained Model
Once the endpoint is running, you can test it with reasoning tasks. For our mathematical reasoning example, the model should demonstrate improved step-by-step reasoning and accurate problem-solving.
Before and After Training Comparison
To illustrate the improvement from RL fine-tuning, here’s a comparison using a math problem:
Problem: “If a train travels 120 miles in 2 hours, what is its average speed in miles per hour?”
Base Model Response (Qwen2.5-7B-Instruct before RL training):
The average speed is 60 mph.
Issues: Correct answer but no reasoning steps shown
RL Fine-tuned Model Response (after GRPO training on math12k):
Let me solve this step by step:
Step 1: Identify the given information
- Distance traveled = 120 miles
- Time taken = 2 hours
Step 2: Apply the speed formula
Speed = Distance / Time
Step 3: Calculate
Speed = 120 miles / 2 hours = 60 miles per hour
Therefore, the average speed of the train is 60 mph.
Improvements: Clear reasoning steps, structured approach, educational value
This demonstrates how RL training encourages the model to show its reasoning process, making it more reliable and transparent.
Example API Call
curl -X POST "https://your-endpoint-url/v1/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"prompt": "Solve the following problem step by step: A rectangle has a length of 15 cm and a width of 8 cm. What is its area?",
"max_tokens": 500,
"temperature": 0.7
}'
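The same request can be issued from Python using only the standard library. As in the curl example, the endpoint URL and API key below are placeholders to replace with your own:

```python
# Build the same completion request with Python's standard library.
# The endpoint URL and API key are placeholders.
import json
import urllib.request

payload = {
    "prompt": ("Solve the following problem step by step: A rectangle has a "
               "length of 15 cm and a width of 8 cm. What is its area?"),
    "max_tokens": 500,
    "temperature": 0.7,
}
request = urllib.request.Request(
    "https://your-endpoint-url/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
    method="POST",
)
# response = urllib.request.urlopen(request)  # uncomment with a live endpoint
print(request.get_method(), request.full_url)
```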
Expected Results
After RL fine-tuning with EasyR1, your model should achieve:
- Enhanced Reasoning: Step-by-step problem-solving with clear explanations
- Improved Accuracy: Higher success rate on reasoning tasks
- Better Generalization: Ability to apply learned reasoning patterns to new problems
- Structured Outputs: More organized and educational responses
For mathematical reasoning tasks:
- Explicit Step-by-Step Solutions: Clear breakdown of problem-solving process
- Higher Success Rate: Improved accuracy on math benchmarks
- Better Error Detection: Ability to identify and correct mistakes
Technical Details
Training Configuration Breakdown
Reinforcement Learning Components:
- Actor Model: The model being trained (policy network)
- Reference Model: Frozen copy for KL divergence computation
- Rollout Workers: Generate multiple responses for each prompt (n=5)
- Reward Function: Evaluates response quality (custom per task)
Distributed Training:
- FSDP (Fully Sharded Data Parallel): Efficient memory usage for large models
- vLLM Integration: Fast inference during rollout generation
- Tensor Parallelism: For rollout workers (size=2)
Optimization:
- GRPO Algorithm: Group-based advantage estimation for stable training
- KL Penalty: Prevents model from deviating too far from base model
- Gradient Checkpointing: Reduces memory usage during backpropagation
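The config's kl_penalty: low_var_kl setting refers to a low-variance KL estimator (often called "k3"). A rough per-token sketch, assuming actor and reference log-probabilities are available:

```python
# Sketch of the low-variance "k3" KL estimator: for log-prob ratio
# log_ratio = logp_ref - logp_actor, the estimate is
# exp(log_ratio) - log_ratio - 1, which is always non-negative.
import math

def low_var_kl(logp_actor, logp_ref):
    log_ratio = logp_ref - logp_actor
    return math.exp(log_ratio) - log_ratio - 1.0

# Identical distributions contribute zero penalty...
print(low_var_kl(-1.2, -1.2))
# ...and any divergence in either direction contributes a positive one.
print(low_var_kl(-1.2, -1.5))
```

Scaled by kl_coef (1.0e-2 here), this penalty discourages the actor from drifting too far from the frozen reference model.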
Resource Requirements
Recommended Configuration for Qwen2.5-7B:
- Nodes: 1 node (sufficient for RL training with actor + reference + rollout)
- Accelerators: 8 × H100 GPUs per node
- Memory: ~400GB+ GPU memory total (actor, reference, and rollout workers)
- Training Time: ~8-12 hours for 15 epochs
- Storage: ~50GB for checkpoints
Command Line Parameters Explained:
FORCE_TORCHRUN=1: Ensures proper distributed training setup
--runtime pytorch-28-vllm-0110-nvidia: PyTorch 2.8 with vLLM 0.11.0 optimized for EasyR1
--repository-url: Points to the FlexAI blueprints repository
config=code/easyR1/config.yaml: Main configuration file path relative to repository root
Key Configuration Parameters
Data Settings:
rollout_batch_size: 512: Number of prompts per training iteration
max_prompt_length: 2048: Maximum input length
max_response_length: 2048: Maximum output length
Algorithm Settings:
adv_estimator: grpo: Choice of RL algorithm
kl_coef: 1.0e-2: Strength of KL penalty
use_kl_loss: true: Enable KL divergence loss
Training Settings:
total_epochs: 15: Number of training epochs
n_gpus_per_node: 8: GPUs per node
val_freq: 5: Validation every 5 epochs
save_freq: 5: Save checkpoint every 5 epochs
Scaling Options
- For faster training: Increase to 2 nodes (16 × H100)
- For larger models: Increase tensor_parallel_size for rollout
- For better exploration: Increase rollout.n (more samples per prompt)
- For memory efficiency: Enable CPU offloading (enable_cpu_offload: true)
- For different tasks: Modify reward function and prompt templates
Advanced Examples
Vision-Language Model with Geometric Reasoning
flexai training run grpo-VL-Geo \
--accels 8 --nodes 1 \
--repository-url https://github.com/flexaihq/blueprints \
--env FORCE_TORCHRUN=1 \
--secret WANDB_API_KEY=<WANDB_API_KEY_SECRET_NAME> \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/easyR1/requirements.txt \
--runtime pytorch-28-vllm-0110-nvidia \
-- python3 -m verl.trainer.main \
config=code/easyR1/config.yaml \
worker.actor.model.model_path=Qwen/Qwen2.5-VL-7B-Instruct \
data.train_files=hiyouga/geometry3k@train \
data.val_files=hiyouga/geometry3k@test \
data.format_prompt=./code/easyR1/format_prompt/r1v.jinja \
worker.reward.reward_function=./code/easyR1/reward_function/r1v.py:compute_score \
trainer.experiment_name=qwen2_5_vl_7b_geo3k_grpo
Using DAPO Algorithm
flexai training run Dapo-14B \
--accels 8 --nodes 1 \
--repository-url https://github.com/flexaihq/blueprints \
--env FORCE_TORCHRUN=1 \
--secret WANDB_API_KEY=<WANDB_API_KEY_SECRET_NAME> \
--secret HF_TOKEN=<HF_AUTH_TOKEN_SECRET_NAME> \
--requirements-path code/easyR1/requirements.txt \
--runtime pytorch-28-vllm-0110-nvidia \
-- python3 -m verl.trainer.main \
config=code/easyR1/config.yaml \
worker.actor.model.model_path=Qwen/Qwen3-14B \
algorithm.adv_estimator=dapo \
algorithm.online_filtering=true \
data.train_files=hiyouga/dapo17k@train \
data.val_files=hiyouga/dapo17k@test \
data.format_prompt=./code/easyR1/format_prompt/dapo.jinja \
worker.reward.reward_function=./code/easyR1/reward_function/dapo.py:compute_score \
trainer.experiment_name=qwen3_14b_dapo17k_dapo
Troubleshooting
Training Job Fails to Start:
# Check FlexAI authentication
flexai auth status
# Verify repository access
git clone https://github.com/flexaihq/blueprints
Out of Memory Errors:
- Reduce rollout_batch_size from 512 to 256
- Reduce rollout.n from 5 to 3 (fewer samples per prompt)
- Enable CPU offloading: enable_cpu_offload: true in the FSDP config
- Reduce tensor_parallel_size for rollout workers
Reward Function Errors:
- Verify reward function path is correct in config
- Test reward function locally before training
- Ensure reward function returns float scores for all inputs
- Check for NaN or infinite reward values
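The last two checks can be automated with a small sanity harness run before training. The helper below is a hypothetical illustration that validates scores in the batch dict format used by this blueprint's reward functions:

```python
# Sanity-check a batch reward function's outputs before training:
# every score must be a finite float (no NaN or inf values).
import math

def check_reward_outputs(scores):
    for i, score in enumerate(scores):
        for key, value in score.items():
            if not isinstance(value, float) or not math.isfinite(value):
                raise ValueError(f"sample {i}: {key}={value!r} is not a finite float")
    return True

# A well-formed batch passes...
assert check_reward_outputs([{"overall": 1.0, "format": 0.0}])
# ...while NaN is caught before it can poison training.
try:
    check_reward_outputs([{"overall": float("nan")}])
except ValueError as e:
    print("caught:", e)
```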
Checkpoint Not Inference Ready:
- Wait for training to complete fully
- Check save_model_only: false in config to include all necessary files
- Verify training completed without errors
Endpoint Deployment Issues:
- Verify the checkpoint shows INFERENCE READY = true status
- Check FlexAI cluster availability
- Review detailed logs with flexai inference logs <endpoint-name>
Dataset Loading Issues:
- Verify the dataset path format: username/dataset@split
- Ensure your HuggingFace token has access to the dataset
- Check that prompt_key and answer_key match your dataset schema
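A quick way to catch malformed paths before a run is to validate them locally. The helper below is a hypothetical validator for the username/dataset@split convention described above:

```python
# Hypothetical validator for the username/dataset@split path format.
import re

PATTERN = re.compile(r"^(?P<repo>[\w.-]+/[\w.-]+)@(?P<split>\w+)$")

def parse_dataset_path(path):
    match = PATTERN.match(path)
    if not match:
        raise ValueError(f"bad dataset path: {path!r}")
    return match.group("repo"), match.group("split")

print(parse_dataset_path("hiyouga/math12k@train"))
```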
vLLM Rollout Errors:
- Adjust gpu_memory_utilization (default 0.6)
- Reduce tensor_parallel_size if GPUs are insufficient
- Enable enforce_eager: true for debugging
References
Code
requirements.txt
git+https://github.com/hiyouga/EasyR1.git@d146d24e990c8102fee44e61e5ca389907712960
config.yaml
---
data:
train_files: hiyouga/math12k@train
val_files: hiyouga/math12k@test
prompt_key: problem
answer_key: answer
image_key: images
video_key: videos
image_dir:
video_fps: 2.0
max_prompt_length: 2048
max_response_length: 2048
rollout_batch_size: 512
mini_rollout_batch_size:
val_batch_size: 1024
format_prompt: ./code/easyR1/format_prompt/math.jinja
override_chat_template:
shuffle: true
seed: 1
min_pixels: 262144
max_pixels: 4194304
filter_overlong_prompts: true
algorithm:
adv_estimator: grpo
disable_kl: false
use_kl_loss: true
kl_penalty: low_var_kl
kl_coef: 1.0e-2
online_filtering: false
filter_key: overall
filter_low: 0.01
filter_high: 0.99
worker:
actor:
global_batch_size: 128
micro_batch_size_per_device_for_update: 1
micro_batch_size_per_device_for_experience: 2
max_grad_norm: 1.0
padding_free: true
dynamic_batching: true
ulysses_size: 1
model:
model_path: Qwen/Qwen2.5-7B-Instruct
enable_gradient_checkpointing: true
trust_remote_code: false
freeze_vision_tower: false
optim:
lr: 1.0e-6
weight_decay: 1.0e-2
strategy: adamw
lr_warmup_ratio: 0.0
fsdp:
enable_full_shard: true
enable_cpu_offload: false
enable_rank0_init: true
offload:
offload_params: true
offload_optimizer: true
rollout:
n: 5
temperature: 1.0
top_p: 1.0
limit_images: 0
gpu_memory_utilization: 0.6
enforce_eager: false
enable_chunked_prefill: false
tensor_parallel_size: 2
disable_tqdm: true
val_override_config:
temperature: 0.6
top_p: 0.95
n: 1
ref:
fsdp:
enable_full_shard: true
enable_cpu_offload: true
enable_rank0_init: true
offload:
offload_params: false
reward:
reward_type: batch
reward_function: ./code/easyR1/reward_function/math.py:compute_score
trainer:
total_epochs: 15
max_steps:
project_name: easy_r1
experiment_name: qwen2_5_7b_math_grpo
logger: [file, wandb]
nnodes: 1
n_gpus_per_node: 8
max_try_make_batch: 20
val_freq: 5
val_before_train: true
val_only: false
val_generations_to_log: 3
save_freq: 5
save_limit: 3
save_model_only: false
save_checkpoint_path: /output-checkpoint
load_checkpoint_path:
find_last_checkpoint: true
format_prompt/math.jinja
{{ content | trim }} You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}.
reward_function/math.py
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
import re
from typing import Any
from mathruler.grader import extract_boxed_content, grade_answer
def format_reward(response: str) -> float:
pattern = re.compile(r"<think>.*</think>.*\\boxed\{.*\}.*", re.DOTALL)
format_match = re.fullmatch(pattern, response)
return 1.0 if format_match else 0.0
def accuracy_reward(response: str, ground_truth: str) -> float:
answer = extract_boxed_content(response)
return 1.0 if grade_answer(answer, ground_truth) else 0.0
def compute_score(
reward_inputs: list[dict[str, Any]], format_weight: float = 0.1
) -> list[dict[str, float]]:
if not isinstance(reward_inputs, list):
raise ValueError("Please use `reward_type=batch` for math reward function.")
scores = []
for reward_input in reward_inputs:
response = re.sub(
r"\s*(<|>|/)\s*", r"\1", reward_input["response"]
)
format_score = format_reward(response)
accuracy_score = accuracy_reward(response, reward_input["ground_truth"])
scores.append(
{
"overall": (1 - format_weight) * accuracy_score
+ format_weight * format_score,
"format": format_score,
"accuracy": accuracy_score,
}
)
return scores