Querying an Inference Endpoint
After your Inference Endpoint is up and running, you can start querying it to get predictions from the deployed model.
Back in the Inference Endpoints list, the URL column for your newly created Inference Endpoint is populated after a few seconds. This is the base URL of your endpoint.
The API path varies depending on the model you are using. For instance, here’s how you can query TinyLlama/TinyLlama-1.1B-Chat-v1.0 using curl. Because the request body carries a messages array, it targets the chat completions route:
curl -X POST https://inference-efcaac4c-c228-43d5-bdcf-f6689feb8747-d5e532c3.flex.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer flex_ACY8W7..." \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "What is love?"}],
    "max_tokens": 256
  }'
A response similar to this will be returned:
{
  "id": "chatcmpl-e6c2d199-b57d-4904-a1cb-89d604b146d9",
  "object": "chat.completion",
  "created": 1748238665,
  "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Love is a powerful and meaningful emotion that brings people close together. It is a connection between two individuals that transcends time and space, supporting and nurturing a deep and deepening relationship. Love is the foundation that holds relationships together, reflecting in the happiness, relationships, and harmony surrounding us in every aspect of life. To understand love and its complexities, one would need to delve deep into the arena of psychology, philosophy, anthropology, literature, and social sciences. It is an elusive but natural human reaction, serving as an enabling force for the bonding of our species, and it is an integral aspect of human nature.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "total_tokens": 162,
    "completion_tokens": 142,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
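The same request can be issued from a script. The following is a minimal sketch using only the Python standard library, not an official FlexAI client. It assumes the endpoint exposes an OpenAI-style chat route at /v1/chat/completions (which the chat.completion object in the response suggests) and that the response has the shape shown above. The function names build_chat_request, extract_reply, and query_endpoint are illustrative.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt, max_tokens=256):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        # Assumed route: base URL from the endpoint list + OpenAI-style chat path.
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text and the token usage from a parsed response."""
    first_choice = response_body["choices"][0]
    return first_choice["message"]["content"], response_body["usage"]


def query_endpoint(base_url, api_key, model, prompt, max_tokens=256, timeout=60):
    """Send the request and return (reply_text, usage)."""
    request = build_chat_request(base_url, api_key, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

Calling query_endpoint with your endpoint's base URL, your API key, the model name, and a prompt returns the generated text together with the usage object, so you can track prompt and completion tokens per request.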
Inference Endpoint Details
Select the gear icon ⚙️ (labeled Configure) in the Actions column of your newly created Endpoint's row in the Inference Endpoints list to open a detailed overview of the deployment.
The Details tab opens by default and shows the relevant information about your Inference Endpoint.
The Details tab
This tab provides you with detailed information about your Inference Endpoint, including:
Summary
Field | Description |
---|---|
ID | The unique identifier of the Inference Endpoint. |
Name | The name you assigned to the Inference Endpoint. |
Status | The current status of the Inference Endpoint (e.g., Running, Stopped). |
URL | The base URL of the Inference Endpoint, which you can use to query the model. |
Playground URL | The URL of the Inference Playground, a user-friendly interface to interact with your deployed model. |
Dashboard URL | The URL of the Inference Endpoint dashboard, where you can monitor the performance and usage of your model. |
Configuration
Field | Description |
---|---|
Device Architecture | The architecture of the device where the Inference Endpoint is running (e.g., nvidia). |
Runtime Args | The vLLM runtime arguments that were used to deploy the Inference Endpoint. These can be customized when creating or updating the Inference Endpoint. |
HF Token Secret Name | The name of the FlexAI Secret that contains the Hugging Face Access Token. This field is shown only if the Inference Endpoint requires a Hugging Face Access Token to access the model. |
API Key Secret Name | The name of the FlexAI Secret that contains the API Key used to authenticate requests to the Inference Endpoint. |
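When calling the endpoint from a script, it helps to keep the API key value out of the source code. The sketch below reads it from an environment variable and builds the same headers used in the curl example. The variable name FLEXAI_API_KEY and the helper auth_headers are hypothetical choices for illustration, not names defined by FlexAI.

```python
import os


def auth_headers(env_var="FLEXAI_API_KEY"):
    """Build request headers from an API key stored in an environment variable.

    FLEXAI_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Failing early when the variable is missing gives a clearer error than an authentication failure returned by the endpoint.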
The Logs tab
The Logs tab provides you with real-time logs from your Inference Endpoint, allowing you to monitor its activity and troubleshoot any issues that may arise.
You can use the Search bar input field to filter the logs by a specific keyword. This is useful to quickly find relevant information in the logs.
The Playground
The FlexAI Inference Playground is a convenient way to interact with your deployed models without writing any code or cURL requests. It allows you to test your Inference Endpoint using a user-friendly interface.
It is a FlexAI-hosted instance of Chainlit, a UI tool for interacting with multimodal AI models.

You can get the Inference Playground URL by opening the Inference Endpoint’s Details drawer menu. You will find the URL under the default Details tab’s Summary section.