hinmer

hinmer synced commits to wentao-small-refactor at hinmer/vllm from mirror

c9b968a349 Merge branch 'main' into wentao-small-refactor
855b101d75 [Frontend] add tools for dsv32 developer role (#30040) Signed-off-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
d0502b4928 [MoE][Refactor 1/N] Separate Online Quantization (#30627) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
3f175f18a2 [Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670) Signed-off-by: Max Hu <hyoung2991@gmail.com>
ed586e7724 [Refactor] [3/N] Move tool parser tests and run on CPU (#30693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Compare 25 commits »

8 hours ago

hinmer synced commits to wentao-enable-eplb-with-default-backend at hinmer/vllm from mirror

d160e1f33b Merge branch 'main' into wentao-enable-eplb-with-default-backend Signed-off-by: yewentao256 <zhyanwentao@126.com>
855b101d75 [Frontend] add tools for dsv32 developer role (#30040) Signed-off-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
d0502b4928 [MoE][Refactor 1/N] Separate Online Quantization (#30627) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
3f175f18a2 [Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670) Signed-off-by: Max Hu <hyoung2991@gmail.com>
ed586e7724 [Refactor] [3/N] Move tool parser tests and run on CPU (#30693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Compare 75 commits »

8 hours ago

hinmer synced commits to main at hinmer/vllm from mirror

c01d589813 [Benchmarks] `auto_tune.sh`: Use hostname variable for server requests (#30529) Signed-off-by: Kevin Musgrave <kevin.musgrave@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
60dbf7d8f1 Update batch invariant to use attention config (#30704) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
a450c64a30 [Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708) Signed-off-by: mgoin <mgoin64@gmail.com>
b2191abdca [docs][fix] Update Arm CPU vLLM wheel installation docs (#30594) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
51e5b3e3c4 [Bugfix] Fix ViT with FlashAttention on ROCm (#30703) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Compare 18 commits »

8 hours ago

hinmer synced commits to separate-online-quantization at hinmer/vllm from mirror

4d4c3c38ee Merge branch 'main' into separate-online-quantization
738648fb81 [CustomOp] Support object-level enable for CustomOp (#30547) Signed-off-by: shen-shanshan <467638484@qq.com>
917fdae5b2 [Log] Skip piecewise cudagraph warn when using full cudagraph (#30657) Signed-off-by: Boyuan Feng <boyuan@meta.com>
e2ed238885 Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
174e39ead7 CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>
Compare 30 commits »

23 hours ago

hinmer synced commits to main at hinmer/vllm from mirror

e3a1cd1c59 [XPU] fix Dockerfile.xpu, avoid wheel conflicts (#30662) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
3778673ea8 [Feat] Refactor for `parallel_config` in `FusedMoEModularKernel` (#30282) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
b337647aa0 [Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template (#30648) Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
a524d1ba0a [Bugfix] Fix deepseek_v32 tokenizer_mode (#30658) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
87b4d1557d [CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Compare 8 commits »

23 hours ago

hinmer synced commits to wentao-small-refactor at hinmer/vllm from mirror

db006e8db5 Merge branch 'main' into wentao-small-refactor
ae88aada38 [Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL (#29752) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Co-authored-by: deitxfge <huhaibo1990@126.com>
5ccf0efa84 [Bugfix] Improve error messages in ModelConfig validation (#30213) Signed-off-by: ytian218 <ytian218@bloomberg.net> Co-authored-by: ytian218 <ytian218@bloomberg.net>
994acec0cc [Bugfix] Fix fusion for VL models (#30244) Signed-off-by: ElizaWszola <ewszola@redhat.com>
48b8456ff9 [Bugfix] Revert Qwen2-VL part of change in #28271 (#30542) Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Compare 24 commits »

1 day ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

a5be22d9bc Merge branch 'main' into wentao-parallel_config-None-issue
e2ed238885 Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
174e39ead7 CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>
9ccbf6b692 [responsesAPI]add extra body parameters (#30532) Signed-off-by: Ri0S <aa248424@gmail.com>
ae2e503dda [NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Compare 26 commits »

1 day ago

hinmer synced commits to main at hinmer/vllm from mirror

e2ed238885 Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
174e39ead7 CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>
9ccbf6b692 [responsesAPI]add extra body parameters (#30532) Signed-off-by: Ri0S <aa248424@gmail.com>
ae2e503dda [NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>
9e33a1a75b [Model][Quantization] Override HF defaults to GGUF ones (incl. Qwen3 MoE) (#30118) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Compare 21 commits »

1 day ago

hinmer synced commits to wentao-small-refactor at hinmer/vllm from mirror

968934e77d Merge branch 'main' into wentao-small-refactor
763963aa73 set assume_32bit_indexing and pass unbacked hints (#30459) Signed-off-by: Laith Sakka <lsakka@meta.com>
39cefbdf17 [Refactor] `TokenizerRegistry` only uses lazy imports (#30609) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
ace34e3783 [Bugfix] Qwen3-next with --hf-overrides {\"num_hidden_layers\":8} (#30433) Signed-off-by: Chen Zhang <zhangch99@outlook.com>
e5db3e2774 [CI/Build] Fix broken mm processor test Mistral-3-large (#30597) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Compare 28 commits »

1 day ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

a8daac12d5 Merge branch 'main' into wentao-parallel_config-None-issue
c2dd335d69 fix failing test Signed-off-by: Robert Shaw <robshaw@redhat.com>
24429d5924 [Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414) Signed-off-by: Qidong Su <soodoshll@gmail.com>
6e78ed6ba7 [Logs] Optimize startup logs 4 (#29903) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
7c16f3fbcc [Doc] Add documents for multi-node distributed serving with MP backend (#30509) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Compare 23 commits »

1 day ago

hinmer synced commits to main at hinmer/vllm from mirror

f569c654e1 enable unbacked with aot_compile (#30462) Signed-off-by: Laith Sakka <lsakka@meta.com>
97f2f160fd [ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI (#30590) Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
29f7d97715 Improve parse_raw_prompt test cases for invalid input .v2 (#30512) Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
dc7fb5bebe [Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher (#30577) Co-authored-by: Qier Li <qier@fb.com>
24429d5924 [Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414) Signed-off-by: Qidong Su <soodoshll@gmail.com>
Compare 11 commits »

1 day ago

hinmer synced commits to ghsa-mcmc-2m55-j8jj at hinmer/vllm from mirror

5a2219a9cc noqa Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
70cc263b9e Fix pre-commit Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Compare 2 commits »

1 day ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

26760f3e62 address comments Signed-off-by: yewentao256 <zhyanwentao@126.com>
835f86c1b7 Merge branch 'main' into wentao-parallel_config-None-issue
13618626df [MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions (#29748) Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster> Signed-off-by: dafrimi <dafrimi@nvidia.com> Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
6ec0d8dbe4 [Fix]Load kv-cache dtype from hf_quant_config.json automatically (#29980) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
9693dd0fe3 [CI/Build] Add x86 CPU wheel release pipeline (#28848) Signed-off-by: jiang1.li <jiang1.li@intel.com>
Compare 75 commits »

2 days ago

hinmer synced commits to main at hinmer/vllm from mirror

e5db3e2774 [CI/Build] Fix broken mm processor test Mistral-3-large (#30597) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
64251f48df [Chore] Adjust tokenizer import to avoid circular imports (#30601) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
1cec5b7ea9 [Scheduler] Simplify stop checking for pooling models (#30591) Signed-off-by: Nick Hill <nhill@redhat.com>
b09806e28f [Bugfix] Dictionary MM embeddings for online chat (#30507) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
fdc135d768 [Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization (#30310) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Compare 28 commits »

2 days ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

60906cddc0 fix tests Signed-off-by: yewentao256 <zhyanwentao@126.com>
6c9552b4a6 fix layer Signed-off-by: yewentao256 <zhyanwentao@126.com>
Compare 2 commits »

3 days ago

hinmer synced commits to main at hinmer/vllm from mirror

f90319d5d1 [Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692)
302b2c1eb9 [CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>
8f8fda261a [Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
fe1787107e [compile] Parse compile range cache keys as Range during cache loading. (#30516) Signed-off-by: zhxchen17 <zhxchen17@fb.com>
783644e4ac [ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Compare 34 commits »

3 days ago

hinmer synced commits to main at hinmer/vllm from mirror

3a3b06ee70 [Misc] Improve error message for `is_multimodal` (#30483) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
f4417f8449 [KVConnector] Add KV events to KV Connectors (#28309) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
a11f4a81e0 [Misc][PCP&DCP] relocate PCP feature check (#30050) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
853611bb18 Fix typo of endpoint name in CLI args docs (#30473) Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
d917747c95 [Bugfix] Fix `task` still being passed in tests/benchmarks (#30476) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Compare 11 commits »

4 days ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

2fabee5ebb fix issue in main Signed-off-by: yewentao256 <zhyanwentao@126.com>
10787cb5de Merge branch 'main' into wentao-parallel_config-None-issue
fcb894222f [Docs] Update EPLB docs (#30426) Signed-off-by: mgoin <mgoin64@gmail.com>
c69c25adef update to moe parallel config Signed-off-by: yewentao256 <zhyanwentao@126.com>
0facc44a4f Merge branch 'main' into wentao-parallel_config-None-issue
Compare 82 commits »

4 days ago

hinmer synced commits to wentao-optimize-startup-logs-4 at hinmer/vllm from mirror

5d56f61cd7 Merge branch 'main' into wentao-optimize-startup-logs-4
eea41804a4 [bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used (#30241) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
9f042ba26b [Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
e72d65b959 [Deprecation] Remove tokenizer setter (#30400) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813) Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Will Eaton <me@wseaton.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Compare 115 commits »

4 days ago

hinmer synced commits to main at hinmer/vllm from mirror

d02d1043de fix: enhance human_readable_int function (#30337) Signed-off-by: Andy Xie <andy.xning@gmail.com>
979f50efd0 [Deprecation] Remove fallbacks for `embed_input_ids` and `embed_multimodal` (#30458) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
36c9ce2554 Ensure minimum frames for GLM 4.6V compatibility (#30285) Signed-off-by: Wayne Ferguson <wayneferguson@gmail.com>
1a516557e1 [Doc] Add Baidu Kunlun XPU support (#30455) Signed-off-by: xyDong0223 <dongxinyu23@gmail.com>
d6464f2679 [Chore] Fix torch precision warning (#30428) Signed-off-by: yewentao256 <zhyanwentao@126.com>
Compare 26 commits »

4 days ago