hinmer
hinmer synced commits to wentao-small-refactor at hinmer/vllm from mirror

  • c9b968a349 Merge branch 'main' into wentao-small-refactor
  • 855b101d75 [Frontend] add tools for dsv32 developer role (#30040) Signed-off-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
  • d0502b4928 [MoE][Refactor 1/N] Separate Online Quantization (#30627) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
  • 3f175f18a2 [Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670) Signed-off-by: Max Hu <hyoung2991@gmail.com>
  • ed586e7724 [Refactor] [3/N] Move tool parser tests and run on CPU (#30693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • Compare 25 commits »

8 hours ago

hinmer synced commits to wentao-enable-eplb-with-default-backend at hinmer/vllm from mirror

  • d160e1f33b Merge branch 'main' into wentao-enable-eplb-with-default-backend Signed-off-by: yewentao256 <zhyanwentao@126.com>
  • 855b101d75 [Frontend] add tools for dsv32 developer role (#30040) Signed-off-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
  • d0502b4928 [MoE][Refactor 1/N] Separate Online Quantization (#30627) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
  • 3f175f18a2 [Bugfix] Fix multimodal configuration for Qwen3VL MOE model (#30670) Signed-off-by: Max Hu <hyoung2991@gmail.com>
  • ed586e7724 [Refactor] [3/N] Move tool parser tests and run on CPU (#30693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • Compare 75 commits »

8 hours ago

hinmer synced commits to main at hinmer/vllm from mirror

  • c01d589813 [Benchmarks] `auto_tune.sh`: Use hostname variable for server requests (#30529) Signed-off-by: Kevin Musgrave <kevin.musgrave@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • 60dbf7d8f1 Update batch invariant to use attention config (#30704) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • a450c64a30 [Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708) Signed-off-by: mgoin <mgoin64@gmail.com>
  • b2191abdca [docs][fix] Update Arm CPU vLLM wheel installation docs (#30594) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
  • 51e5b3e3c4 [Bugfix] Fix ViT with FlashAttention on ROCm (#30703) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
  • Compare 18 commits »

8 hours ago

hinmer synced commits to separate-online-quantization at hinmer/vllm from mirror

  • 4d4c3c38ee Merge branch 'main' into separate-online-quantization
  • 738648fb81 [CustomOp] Support object-level enable for CustomOp (#30547) Signed-off-by: shen-shanshan <467638484@qq.com>
  • 917fdae5b2 [Log] Skip piecewise cudagraph warn when using full cudagraph (#30657) Signed-off-by: Boyuan Feng <boyuan@meta.com>
  • e2ed238885 Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
  • 174e39ead7 CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>
  • Compare 30 commits »

23 hours ago

hinmer synced commits to main at hinmer/vllm from mirror

  • e3a1cd1c59 [XPU] fix Dockerfile.xpu, avoid wheel conflicts (#30662) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
  • 3778673ea8 [Feat] Refactor for `parallel_config` in `FusedMoEModularKernel` (#30282) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
  • b337647aa0 [Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template (#30648) Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
  • a524d1ba0a [Bugfix] Fix deepseek_v32 tokenizer_mode (#30658) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
  • 87b4d1557d [CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
  • Compare 8 commits »

23 hours ago

hinmer synced commits to wentao-small-refactor at hinmer/vllm from mirror

  • db006e8db5 Merge branch 'main' into wentao-small-refactor
  • ae88aada38 [Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL (#29752) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Co-authored-by: deitxfge <huhaibo1990@126.com>
  • 5ccf0efa84 [Bugfix] Improve error messages in ModelConfig validation (#30213) Signed-off-by: ytian218 <ytian218@bloomberg.net> Co-authored-by: ytian218 <ytian218@bloomberg.net>
  • 994acec0cc [Bugfix] Fix fusion for VL models (#30244) Signed-off-by: ElizaWszola <ewszola@redhat.com>
  • 48b8456ff9 [Bugfix] Revert Qwen2-VL part of change in #28271 (#30542) Signed-off-by: Zifei Tong <zifeitong@gmail.com>
  • Compare 24 commits »

1 day ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

  • a5be22d9bc Merge branch 'main' into wentao-parallel_config-None-issue
  • e2ed238885 Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
  • 174e39ead7 CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>
  • 9ccbf6b692 [responsesAPI]add extra body parameters (#30532) Signed-off-by: Ri0S <aa248424@gmail.com>
  • ae2e503dda [NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>
  • Compare 26 commits »

1 day ago

hinmer synced commits to main at hinmer/vllm from mirror

  • e2ed238885 Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
  • 174e39ead7 CPU KV Offloading: Use more CUDA streams (#29013) Signed-off-by: Or Ozeri <oro@il.ibm.com>
  • 9ccbf6b692 [responsesAPI]add extra body parameters (#30532) Signed-off-by: Ri0S <aa248424@gmail.com>
  • ae2e503dda [NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>
  • 9e33a1a75b [Model][Quantization] Override HF defaults to GGUF ones (incl. Qwen3 MoE) (#30118) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
  • Compare 21 commits »

1 day ago

hinmer synced commits to wentao-small-refactor at hinmer/vllm from mirror

  • 968934e77d Merge branch 'main' into wentao-small-refactor
  • 763963aa73 set assume_32bit_indexing and pass unbacked hints (#30459) Signed-off-by: Laith Sakka <lsakka@meta.com>
  • 39cefbdf17 [Refactor] `TokenizerRegistry` only uses lazy imports (#30609) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • ace34e3783 [Bugfix] Qwen3-next with --hf-overrides \{\"num_hidden_layers\":8\} (#30433) Signed-off-by: Chen Zhang <zhangch99@outlook.com>
  • e5db3e2774 [CI/Build] Fix broken mm processor test Mistral-3-large (#30597) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  • Compare 28 commits »

1 day ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

  • a8daac12d5 Merge branch 'main' into wentao-parallel_config-None-issue
  • c2dd335d69 fix failing test Signed-off-by: Robert Shaw <robshaw@redhat.com>
  • 24429d5924 [Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414) Signed-off-by: Qidong Su <soodoshll@gmail.com>
  • 6e78ed6ba7 [Logs] Optimize startup logs 4 (#29903) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • 7c16f3fbcc [Doc] Add documents for multi-node distributed serving with MP backend (#30509) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  • Compare 23 commits »

1 day ago

hinmer synced commits to main at hinmer/vllm from mirror

  • f569c654e1 enable unbacked with aot_compile (#30462) Signed-off-by: Laith Sakka <lsakka@meta.com>
  • 97f2f160fd [ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI (#30590) Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
  • 29f7d97715 Improve parse_raw_prompt test cases for invalid input .v2 (#30512) Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
  • dc7fb5bebe [Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher (#30577) Co-authored-by: Qier Li <qier@fb.com>
  • 24429d5924 [Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414) Signed-off-by: Qidong Su <soodoshll@gmail.com>
  • Compare 11 commits »

1 day ago

hinmer synced commits to ghsa-mcmc-2m55-j8jj at hinmer/vllm from mirror

1 day ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

  • 26760f3e62 address comments Signed-off-by: yewentao256 <zhyanwentao@126.com>
  • 835f86c1b7 Merge branch 'main' into wentao-parallel_config-None-issue
  • 13618626df [MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions (#29748) Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster> Signed-off-by: dafrimi <dafrimi@nvidia.com> Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
  • 6ec0d8dbe4 [Fix]Load kv-cache dtype from hf_quant_config.json automatically (#29980) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
  • 9693dd0fe3 [CI/Build] Add x86 CPU wheel release pipeline (#28848) Signed-off-by: jiang1.li <jiang1.li@intel.com>
  • Compare 75 commits »

2 days ago

hinmer synced commits to main at hinmer/vllm from mirror

  • e5db3e2774 [CI/Build] Fix broken mm processor test Mistral-3-large (#30597) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  • 64251f48df [Chore] Adjust tokenizer import to avoid circular imports (#30601) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • 1cec5b7ea9 [Scheduer] Simplify stop checking for pooling models (#30591) Signed-off-by: Nick Hill <nhill@redhat.com>
  • b09806e28f [Bugfix] Dictionary MM embeddings for online chat (#30507) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • fdc135d768 [Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization (#30310) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
  • Compare 28 commits »

2 days ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

3 days ago

hinmer synced commits to main at hinmer/vllm from mirror

  • f90319d5d1 [Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692)
  • 302b2c1eb9 [CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>
  • 8f8fda261a [Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
  • fe1787107e [compile] Parse compile range cache keys as Range during cache loading. (#30516) Signed-off-by: zhxchen17 <zhxchen17@fb.com>
  • 783644e4ac [ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
  • Compare 34 commits »

3 days ago

hinmer synced commits to main at hinmer/vllm from mirror

  • 3a3b06ee70 [Misc] Improve error message for `is_multimodal` (#30483) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • f4417f8449 [KVConnector] Add KV events to KV Connectors (#28309) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
  • a11f4a81e0 [Misc][PCP&DCP] relocate PCP feature check (#30050) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
  • 853611bb18 Fix typo of endpoint name in CLI args docs (#30473) Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
  • d917747c95 [Bugfix] Fix `task` still being passed in tests/benchmarks (#30476) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • Compare 11 commits »

4 days ago

hinmer synced commits to wentao-parallel_config-None-issue at hinmer/vllm from mirror

  • 2fabee5ebb fix issue in main Signed-off-by: yewentao256 <zhyanwentao@126.com>
  • 10787cb5de Merge branch 'main' into wentao-parallel_config-None-issue
  • fcb894222f [Docs] Update EPLB docs (#30426) Signed-off-by: mgoin <mgoin64@gmail.com>
  • c69c25adef update to moe parallel config Signed-off-by: yewentao256 <zhyanwentao@126.com>
  • 0facc44a4f Merge branch 'main' into wentao-parallel_config-None-issue
  • Compare 82 commits »

4 days ago

hinmer synced commits to wentao-optimize-startup-logs-4 at hinmer/vllm from mirror

  • 5d56f61cd7 Merge branch 'main' into wentao-optimize-startup-logs-4
  • eea41804a4 [bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used (#30241) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
  • 9f042ba26b [Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
  • e72d65b959 {Deprecation] Remove tokenizer setter (#30400) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813) Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Will Eaton <me@wseaton.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
  • Compare 115 commits »

4 days ago

hinmer synced commits to main at hinmer/vllm from mirror

  • d02d1043de fix: enhance human_readable_int function (#30337) Signed-off-by: Andy Xie <andy.xning@gmail.com>
  • 979f50efd0 [Deprecation] Remove fallbacks for `embed_input_ids` and `embed_multimodal` (#30458) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
  • 36c9ce2554 Ensure minimum frames for GLM 4.6V compatibility (#30285) Signed-off-by: Wayne Ferguson <wayneferguson@gmail.com>
  • 1a516557e1 [Doc] Add Baidu Kunlun XPU support (#30455) Signed-off-by: xyDong0223 <dongxinyu23@gmail.com>
  • d6464f2679 [Chore] Fix torch precision warning (#30428) Signed-off-by: yewentao256 <zhyanwentao@126.com>
  • Compare 26 commits »

4 days ago
