4 Commits

Author SHA1 Message Date
  i-robot f106c71c19
!7845 【bugfix】【master】【docs】Update deepseek and telechat docs 1 week ago
  Yule100 1ec8a5cff3 bugfix: documentation fixes 1 week ago
  i-robot 08f2e2215c
!7826 【master】【bugfix】Remove redundant rearrangement in the cp MoE module 1 week ago
  lzy0920232 b0a2c47564 code_bugfix_cp_moe 1 week ago
4 changed files with 25 additions and 9 deletions
  1. mindformers/parallel_core/training_graph/transformer/multi_latent_attention.py (+1, -1)
  2. mindformers/parallel_core/training_graph/transformer/multi_token_prediction.py (+3, -4)
  3. research/deepseek3/README.md (+15, -0)
  4. research/telechat2/README.md (+6, -4)

mindformers/parallel_core/training_graph/transformer/multi_latent_attention.py (+1, -1)

@@ -182,7 +182,7 @@ class MultiLatentAttention(nn.Cell):
         cp = self.cp

         self.bs_transpose.shard(((dp, cp, tp),))
-        self.tnd_transpose.shard(((cp, dp, tp, 1),))
+        self.tnd_transpose.shard((layout("cp", "dp", "tp", "None"),))

     def construct(self, x: Tensor, attention_mask=None, rotary_pos_emb=None, rotary_pos_cos=None,
                   rotary_pos_sin=None, prefix_keys_values=None, pad_zeros=None, actual_seq_len=None):

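The change above replaces the numeric shard strategy `((cp, dp, tp, 1),)` with a named `layout("cp", "dp", "tp", "None")`. A minimal pure-Python sketch of what such a mapping means, not using MindSpore itself; the device-mesh axis sizes below are made up for illustration:

```python
# Illustration of how a 4-D tensor is partitioned under a named layout like
# ("cp", "dp", "tp", "None"). Pure Python; mesh sizes are hypothetical.
mesh = {"cp": 2, "dp": 4, "tp": 2}  # assumed device-mesh axis sizes

def shard_shape(tensor_shape, axis_names, mesh):
    """Divide each tensor dim by the size of the mesh axis it maps to.
    "None" means the dim is replicated (not split)."""
    out = []
    for dim, name in zip(tensor_shape, axis_names):
        split = mesh.get(name, 1)  # "None" -> 1, i.e. no split
        assert dim % split == 0, f"dim {dim} not divisible by {name}={split}"
        out.append(dim // split)
    return tuple(out)

# A hypothetical (seq, batch, heads, head_dim) tensor before the transpose:
full = (4096, 8, 16, 128)
local = shard_shape(full, ("cp", "dp", "tp", "None"), mesh)
print(local)  # (2048, 2, 8, 128): the per-device slice
```

The named form reads the same way as the numeric tuple, but ties each dimension to a mesh axis by name instead of by position.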

mindformers/parallel_core/training_graph/transformer/multi_token_prediction.py (+3, -4)

@@ -193,13 +193,12 @@ class MultiTokenPredictionLayer(nn.Cell):
        """Set parallel strategy."""
        dp = self.config.data_parallel_size
        tp = self.config.tensor_model_parallel_size
        cp = self.config.context_parallel_size
        self.concat.add_prim_attr("self_define_shard", True)
        self.concat.shard(in_strategy=((layout("cp", "dp", "None"), layout("cp", "dp", "None")),),
                          out_strategy=(layout("cp", "dp", "None"),))
        if self.use_seq_parallel and cp == 1:
            self.enorm.shard(config, in_strategy=(layout("tp", "dp", "None"), layout("None",)))
            self.hnorm.shard(config, in_strategy=(layout("tp", "dp", "None"), layout("None",)))
        if self.use_seq_parallel:
            self.enorm.shard(config, in_strategy=(layout(("cp", "tp"), "dp", "None"), layout("None",)))
            self.hnorm.shard(config, in_strategy=(layout(("cp", "tp"), "dp", "None"), layout("None",)))
            self.concat.add_prim_attr("self_define_shard", True)
            self.concat.shard(in_strategy=((layout(("cp", "tp"), "dp", "None"), layout(("cp", "tp"), "dp", "None")),),
                              out_strategy=(layout(("cp", "tp"), "dp", "None"),))

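The fix above merges `cp` and `tp` into a single layout axis `("cp", "tp")`, removing the `cp == 1` special case: the sharded dimension is simply split by `cp * tp` in all cases. A small pure-Python sketch of that arithmetic (mesh sizes are illustrative, not from the source):

```python
# Why merging ("cp", "tp") on one layout axis removes the cp == 1 branch:
# the dimension is split by cp * tp, which degenerates to plain tp when cp == 1.
def split_factor(axis, mesh):
    """An axis is a single mesh-axis name, a tuple of names to merge, or "None"."""
    if axis == "None":
        return 1
    names = axis if isinstance(axis, tuple) else (axis,)
    factor = 1
    for n in names:
        factor *= mesh[n]
    return factor

mesh = {"cp": 1, "dp": 2, "tp": 4}
print(split_factor(("cp", "tp"), mesh))  # 4 (cp=1 behaves like sharding by tp alone)
mesh["cp"] = 2
print(split_factor(("cp", "tp"), mesh))  # 8 (cp>1 folds in without a separate branch)
```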

research/deepseek3/README.md (+15, -0)

@@ -87,6 +87,11 @@ python research/deepseek3/fp8_cast_bf16.py \
--output-bf16-hf-path path/to/hf_model_bf16_dir/
```

Parameter description:

- input-fp8-hf-path: path to the folder containing the original fp8 weights.
- output-bf16-hf-path: path to the folder for the converted bf16 weights.

>`path/to/hf_model_bf16_dir/` can be changed to a custom path; make sure that path has enough free disk space (about 1.4 TB).

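Given the ~1.4 TB size note above, the conversion is best thought of as an upcast applied shard by shard rather than loading everything at once. A hedged sketch of that idea, not the actual `fp8_cast_bf16.py` implementation; numpy has neither fp8 nor bf16 dtypes, so float16 → float32 stands in here:

```python
import numpy as np

# Sketch of the per-shard upcast idea behind an fp8 -> bf16 conversion.
# float16 -> float32 is a stand-in pair: the destination type is strictly
# wider, so values are preserved exactly, just as bf16 can hold any fp8 value.
def upcast_shard(weights, dst=np.float32):
    """Upcast every tensor in one checkpoint shard to the wider dtype."""
    return {name: w.astype(dst) for name, w in weights.items()}

shard = {"w": np.array([0.5, -1.25], dtype=np.float16)}  # illustrative shard
out = upcast_shard(shard)
print(out["w"].dtype)  # float32
```

Processing one shard at a time keeps peak memory bounded by the largest shard, not the full checkpoint.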
## Inference
@@ -135,6 +140,11 @@ python research/deepseek3/convert_weight.py \
- infer: whether to convert the weights for inference. Default: `False`.
- mindspore_ckpt_path: save path of the folder for the converted MindSpore weights.
- worker_num: number of processes for multi-process conversion. Default: `4`.
- use_grouped_gemm: whether to use grouped_gemm. Default: `False`.
- n_head: number of attention heads in the model architecture. Default: `128`.
- v_head_dim: dimension of the Value vector within a single attention head. Default: `128`.
- save_format: format in which the weights are saved. Default: `safetensors`.
- param_json: JSON file name of the weight parameter mapping table. Default: `model.safetensors.index.json`.

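The parameters above can be assembled into a `convert_weight.py` invocation. A sketch under the assumption that the command-line flags mirror the parameter names one-to-one (the full flag list is not shown in this diff); it only builds the argument list and does not execute the script:

```python
# Assemble a convert_weight.py call from the documented parameters.
# Flag names are assumed to match the parameter names above; defaults are
# the documented ones. Build-only: nothing is executed here.
def build_convert_cmd(mindspore_ckpt_path, infer=False, worker_num=4,
                      use_grouped_gemm=False, n_head=128, v_head_dim=128,
                      save_format="safetensors",
                      param_json="model.safetensors.index.json"):
    args = {
        "infer": infer,
        "mindspore_ckpt_path": mindspore_ckpt_path,
        "worker_num": worker_num,
        "use_grouped_gemm": use_grouped_gemm,
        "n_head": n_head,
        "v_head_dim": v_head_dim,
        "save_format": save_format,
        "param_json": param_json,
    }
    cmd = ["python", "research/deepseek3/convert_weight.py"]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd

print(" ".join(build_convert_cmd("path/to/ms_ckpt", infer=True)))
```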
To run inference with weights saved after training, first convert them to the inference format with the `deepseek3_train2infer.py` script. Run the following command:

@@ -220,6 +230,11 @@ bash scripts/msrun_launcher.sh "research/deepseek3/run_predict_deepseek.py \
32 8 $master_ip 8888 3 output/msrun_log False 300
```

Parameter description:

- config: path to the YAML configuration file for inference.
- input: the question to feed to inference.

The expected inference result is as follows:

```txt
```

research/telechat2/README.md (+6, -4)

@@ -35,28 +35,24 @@ TeleChat2-7B:
| config | task | Datasets | SeqLength | phase | performance |
|:---------------------------------------------------:| :-------------------: |:----------:|:---------:|:---------------:|:------------:|
| [TeleChat2_7B](./telechat2-7b/finetune_telechat_7b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 2950 tokens/s/p |
| [TeleChat2_7B](./telechat2-7b/predict_telechat_7b.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 54.1 tokens/s |

TeleChat2-35B:

| config | task | Datasets | SeqLength | phase | performance |
|-----------------------------------------------------| --------------------- |------------|-----------|-----------------|--------------|
| [TeleChat2_35B](./telechat2-35b/finetune_telechat_35b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 516 tokens/s/p |
| [TeleChat2_35B](./telechat2-35b/predict_telechat_35b.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 27.7 tokens/s |

TeleChat2-115B:

| config | task | Datasets | SeqLength | phase | performance |
|-----------------------------------------------------| --------------------- |------------|-----------|-----------------|--------------|
| [TeleChat2_115B](./telechat2-115b/finetune_telechat_115b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 158 tokens/s/p |
| [TeleChat2_115B](./telechat2-115b/predict_telechat_115b.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 26.5 tokens/s |

TeleChat2-39B-A12B:

| config | task | Datasets | SeqLength | phase | performance |
| ------------------------------------------------------------ | --------------- | --------------- | --------- | ---------------- | ------------- |
| [TeleChat2_39B_A12B](./telechat2-39b-a12b/finetune_telechat_39b_a12b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 865 tokens/s/p |
| [TeleChat2_39B_A12B](./telechat2-39b-a12b/predict_telechat_39b_a12b_parallel.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 36.4 tokens/s |

## 模型文件

@@ -138,6 +134,12 @@ input_dataset_file: path to the pretraining dataset
vocab_file_path: path to the vocabulary model file (if downloaded via the link above, point it at the corresponding path)
max_length: sequence length of the dataset
output_path: path where the generated dataset is written
seed: random seed. Default: 2024
start_token: leading token of the input. Default: <_start>
user_token: prompt token marking user input. Default: <_usr>
bot_token: prompt token marking bot output. Default: <_bot>
end_token: terminator token. Default: <_end>
pad_token: token used for padding. Default: <_pad>
```

> Note: special `ids` such as `bos`, `eos`, and `pad` must stay consistent with the `model_config` section of the `yaml` configuration file; by default `bos_token_id=1`, `eos_token_id=2`, `pad_token_id=3`.

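That consistency requirement can be checked mechanically. A minimal sketch using the documented defaults (`bos_token_id=1`, `eos_token_id=2`, `pad_token_id=3`); the token-to-id mapping and the pairing of `<_start>`/`<_end>`/`<_pad>` with `bos`/`eos`/`pad` are illustrative assumptions:

```python
# Check that the preprocessing special tokens resolve to the same ids that
# model_config declares. Values are the documented defaults; the vocab dict
# is a hypothetical tokenizer mapping for the special tokens.
model_config = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 3}
vocab = {"<_start>": 1, "<_end>": 2, "<_pad>": 3}

def check_special_ids(vocab, model_config):
    """Return the list of tokens whose id disagrees with model_config."""
    pairs = [("<_start>", "bos_token_id"),
             ("<_end>", "eos_token_id"),
             ("<_pad>", "pad_token_id")]
    return [tok for tok, key in pairs if vocab[tok] != model_config[key]]

print(check_special_ids(vocab, model_config))  # []  (all consistent)
```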
