Shopping News / Articles
stateless_coordinator
9+ hour, 24+ min ago (53+ words) stateless_coordinator - vLLM Docs A stateless version of the GroupCoordinator class in parallel_state. It will create CPU, device and TCPStore based communication groups that are independent of PyTorch's WORLD group. Hence, communication groups with a different set of participant GPUs can be created…...
End-to-End Profiling - vLLM Hardware Plugin for Intel® Gaudi®
1+ mon, 1+ week ago (201+ words) E2E profiling captures all relevant data in a single run, combining: Due to the large amount of data collected during E2E profiling, Python stack events in the PyTorch Profiler are disabled by default. If you need Python stack events, use either PyTorch…...
Release Notes
3+ mon, 2+ week ago (1724+ words) This is the final release of v0.13.0 for vLLM Ascend. Please follow the official doc to get started. Qwen3-Next: Full support for Qwen3-Next series including 80B-A3B-Instruct with full graph mode, MTP, quantization (W8A8), NZ optimization, and chunked prefill. Fixed multiple accuracy and…...
Supported Models
3+ mon, 3+ week ago (1230+ words) vLLM supports generative and pooling models across various tasks. For each task, we list the model architectures that have been implemented in vLLM. Alongside each architecture, we include some popular models that use it. If vLLM natively supports a model, its…...
speech_to_text
3+ mon, 3+ week ago (105+ words) Convert tokens to verbose segments. This method expects the model to produce timestamps as tokens (similar to Whisper). If the tokens do not include timestamp information, the segments may not be generated correctly. Note: Fields like avg_logprob, compression_ratio, and no_speech_prob are not supported…...
multi_connector
3+ mon, 3+ week ago (81+ words) A wrapper for using multiple KVConnectors at the same time. Returns the required KV cache layout, e.g. HND or NHD, or None if the connector does not require a specific layout. Set xPU-specific copy ops for all sub-connectors. Set the KV connector handshake…...
Troubleshooting
3+ mon, 3+ week ago (961+ words) This document outlines some troubleshooting strategies you can consider. If you think you've discovered a bug, please search existing issues first to see if it has already been reported. If not, please file a new issue, providing as much relevant…...
weight_utils
3+ mon, 3+ week ago (481+ words) Context manager that provides an atomic file writing routine. The context manager writes to a temporary file and, if successful, atomically replaces the original file. The path to the file to write. The file mode for the temporary file (e.g., 'w…...
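The write-to-temporary-then-replace pattern described in this snippet can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the weight_utils implementation; the name `atomic_write` and its parameters are assumptions made for the example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path, mode="w"):
    """Hypothetical helper: write to a temp file, then atomically replace `path`."""
    # Create the temp file next to the target so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, mode) as f:
            yield f
        # Reached only if the caller's block finished without raising.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, discard the partial temp file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A caller would use it as `with atomic_write("weights.json") as f: f.write(...)`; readers of the target path then see either the old file or the complete new one, never a partial write.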
modelopt - vLLM
3+ mon, 3+ week ago (222+ words) Supports loading kv-cache scaling factors from FP8 checkpoints. Linear method for Model Optimizer static quantization. Supports loading FP8 checkpoints with static weight scale and activation scale. Future support might be added for dynamic scales. Pad intermediate size so FlashInfer kernels' alignment constraints…...
step3_reasoning_parser
3+ mon, 3+ week ago (120+ words) step3_reasoning_parser - vLLM Docs Reasoning parser for Step3 model. The Step3 model uses a token to denote the end of reasoning text. This parser extracts all content before that token as reasoning content. Extract reasoning content from a delta message. Handles streaming output where previous + delta = current....
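The splitting behaviour described in this snippet can be illustrated with a small string-based sketch. It is not the vLLM parser: the end-of-reasoning token is missing from the snippet, so `END_TOKEN` below is a made-up placeholder, and the function names are hypothetical. The sketch also assumes the end token arrives whole within a single delta, and that text with no end token is treated entirely as reasoning.

```python
END_TOKEN = "<END_OF_REASONING>"  # placeholder; the real token is not given in the snippet


def split_reasoning(text, end_token=END_TOKEN):
    """Non-streaming case: return (reasoning, content) for a complete output."""
    if end_token not in text:
        # Assumption: no end token means everything so far is reasoning.
        return text, None
    reasoning, _, content = text.partition(end_token)
    return reasoning, content or None


def split_delta(previous, delta, end_token=END_TOKEN):
    """Streaming case, where previous + delta = current.

    Returns (reasoning_piece, content_piece) for this delta only.
    """
    if end_token in previous:
        # Reasoning already ended earlier; the whole delta is content.
        return None, delta or None
    if end_token in delta:
        # Reasoning ends inside this delta; split around the token.
        reasoning, _, content = delta.partition(end_token)
        return reasoning or None, content or None
    # Still inside the reasoning section.
    return delta or None, None
```

Feeding the deltas of a stream through `split_delta` one at a time and concatenating the pieces yields the same split as calling `split_reasoning` on the full text, under the assumptions stated above.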