scripts/inference-providers/templates/task/chat-completion.handlebars

## Chat Completion

Generate a response given a list of messages in a conversational context, supporting both conversational Language Models (LLMs) and conversational Vision-Language Models (VLMs). This is a subtask of [`text-generation`](https://huggingface.co/docs/inference-providers/tasks/text-generation) and [`image-text-to-text`](https://huggingface.co/docs/inference-providers/tasks/image-text-to-text).

### Recommended models

#### Conversational Large Language Models (LLMs)

{{#each recommendedModels.chat-completion}}
- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
{{/each}}

#### Conversational Vision-Language Models (VLMs)

{{#each recommendedModels.conversational-image-text-to-text}}
- [{{this.id}}](https://huggingface.co/{{this.id}}): {{this.description}}
{{/each}}

{{{tips.listModelsLink.image-text-to-text}}}

### API Playground

For Chat Completion models, we provide an interactive UI Playground for easier testing:

- Quickly iterate on your prompts from the UI.
- Set and override system, assistant and user messages.
- Browse and select models currently available on the Inference API.
- Compare the output of two models side-by-side.
- Adjust request parameters from the UI.
- Easily switch between UI view and code snippets.

<a href="https://huggingface.co/playground" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/9_Tgf0Tv65srhBirZQMTp.png" style="max-width: 400px; width: 100%;"/></a>

Access the Inference UI Playground and start exploring: [https://huggingface.co/playground](https://huggingface.co/playground)

### Using the API

The API supports:

* Using the chat completion API compatible with the OpenAI SDK.
* Using grammars, constraints, and tools.
* Streaming the output.

#### Code snippet example for conversational LLMs

{{{snippets.chat-completion}}}

#### Code snippet example for conversational VLMs

{{{snippets.conversational-image-text-to-text}}}

### API specification

#### Request

{{{constants.specsHeaders}}}

{{{specs.chat-completion.input}}}

#### Response

Output type depends on the `stream` input parameter. If `stream` is `false` (default), the response will be a JSON object with the following fields:

{{{specs.chat-completion.output}}}

If `stream` is `true`, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out [this guide](https://huggingface.co/docs/text-generation-inference/conceptual/streaming).

{{{specs.chat-completion.stream_output}}}
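To make the SSE mechanics concrete: each event in the stream is a line starting with `data: ` whose payload is a JSON chunk, terminated by a `data: [DONE]` sentinel. Below is a minimal, hedged sketch of client-side parsing; the sample payload is illustrative and hand-written (not captured from a live response), and the exact chunk fields are defined by the stream-output specification above, so rely on higher-level clients (e.g. the OpenAI SDK or `huggingface_hub`) in real code.

```python
import json


def parse_sse_line(line: str):
    """Extract the token delta from one Server-Sent Events line, or None.

    Returns None for non-data lines (comments, keep-alives) and for the
    [DONE] sentinel that closes the stream.
    """
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    # Each chunk carries an incremental "delta" rather than the full message.
    return chunk["choices"][0]["delta"].get("content")


# Illustrative (hand-written) SSE line in the chat-completion stream shape:
sample = 'data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}'
print(parse_sse_line(sample))  # Hello
```

Accumulating the non-`None` deltas in order reconstructs the full assistant message as it streams in.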