4 Commits

Author SHA1 Message Date
  i-robot f106c71c19
!7845 【bugfix】【master】【docs】Update deepseek and telechat docs 1 week ago
  Yule100 1ec8a5cff3 bugfix: documentation fixes 1 week ago
  i-robot 08f2e2215c
!7826 【master】【bugfix】Remove redundant rearrangement in the cp MoE module 1 week ago
  lzy0920232 b0a2c47564 code_bugfix_cp_moe 1 week ago
4 changed files with 25 additions and 9 deletions
  1. mindformers/parallel_core/training_graph/transformer/multi_latent_attention.py (+1, -1)
  2. mindformers/parallel_core/training_graph/transformer/multi_token_prediction.py (+3, -4)
  3. research/deepseek3/README.md (+15, -0)
  4. research/telechat2/README.md (+6, -4)

mindformers/parallel_core/training_graph/transformer/multi_latent_attention.py (+1, -1)

@@ -182,7 +182,7 @@ class MultiLatentAttention(nn.Cell):
         cp = self.cp

         self.bs_transpose.shard(((dp, cp, tp),))
-        self.tnd_transpose.shard(((cp, dp, tp, 1),))
+        self.tnd_transpose.shard((layout("cp", "dp", "tp", "None"),))

     def construct(self, x: Tensor, attention_mask=None, rotary_pos_emb=None, rotary_pos_cos=None,
                   rotary_pos_sin=None, prefix_keys_values=None, pad_zeros=None, actual_seq_len=None):

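The change above replaces the numeric shard strategy `((cp, dp, tp, 1),)` with a named `layout("cp", "dp", "tp", "None")`. A minimal pure-Python sketch of what such a mapping means, not using MindSpore itself; the device-mesh axis sizes below are made up for illustration:

```python
# Illustration of how a 4-D tensor is partitioned under a named layout like
# ("cp", "dp", "tp", "None"). Pure Python; mesh sizes are hypothetical.
mesh = {"cp": 2, "dp": 4, "tp": 2}  # assumed device-mesh axis sizes

def shard_shape(tensor_shape, axis_names, mesh):
    """Divide each tensor dim by the size of the mesh axis it maps to.
    "None" means the dim is replicated (not split)."""
    out = []
    for dim, name in zip(tensor_shape, axis_names):
        split = mesh.get(name, 1)  # "None" -> 1, i.e. no split
        assert dim % split == 0, f"dim {dim} not divisible by {name}={split}"
        out.append(dim // split)
    return tuple(out)

# A hypothetical (seq, batch, heads, head_dim) tensor before the transpose:
full = (4096, 8, 16, 128)
local = shard_shape(full, ("cp", "dp", "tp", "None"), mesh)
print(local)  # (2048, 2, 8, 128): the per-device slice
```

The named form reads the same way as the numeric tuple, but ties each dimension to a mesh axis by name instead of by position.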

mindformers/parallel_core/training_graph/transformer/multi_token_prediction.py (+3, -4)

@@ -193,13 +193,12 @@ class MultiTokenPredictionLayer(nn.Cell):
        """Set parallel strategy."""
        dp = self.config.data_parallel_size
        tp = self.config.tensor_model_parallel_size
        cp = self.config.context_parallel_size
        self.concat.add_prim_attr("self_define_shard", True)
        self.concat.shard(in_strategy=((layout("cp", "dp", "None"), layout("cp", "dp", "None")),),
                          out_strategy=(layout("cp", "dp", "None"),))
        if self.use_seq_parallel and cp == 1:
            self.enorm.shard(config, in_strategy=(layout("tp", "dp", "None"), layout("None",)))
            self.hnorm.shard(config, in_strategy=(layout("tp", "dp", "None"), layout("None",)))
        if self.use_seq_parallel:
            self.enorm.shard(config, in_strategy=(layout(("cp", "tp"), "dp", "None"), layout("None",)))
            self.hnorm.shard(config, in_strategy=(layout(("cp", "tp"), "dp", "None"), layout("None",)))
            self.concat.add_prim_attr("self_define_shard", True)
            self.concat.shard(in_strategy=((layout(("cp", "tp"), "dp", "None"), layout(("cp", "tp"), "dp", "None")),),
                              out_strategy=(layout(("cp", "tp"), "dp", "None"),))

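The fix above merges `cp` and `tp` into a single layout axis `("cp", "tp")`, removing the `cp == 1` special case: the sharded dimension is simply split by `cp * tp` in all cases. A small pure-Python sketch of that arithmetic (mesh sizes are illustrative, not from the source):

```python
# Why merging ("cp", "tp") on one layout axis removes the cp == 1 branch:
# the dimension is split by cp * tp, which degenerates to plain tp when cp == 1.
def split_factor(axis, mesh):
    """An axis is a single mesh-axis name, a tuple of names to merge, or "None"."""
    if axis == "None":
        return 1
    names = axis if isinstance(axis, tuple) else (axis,)
    factor = 1
    for n in names:
        factor *= mesh[n]
    return factor

mesh = {"cp": 1, "dp": 2, "tp": 4}
print(split_factor(("cp", "tp"), mesh))  # 4 (cp=1 behaves like sharding by tp alone)
mesh["cp"] = 2
print(split_factor(("cp", "tp"), mesh))  # 8 (cp>1 folds in without a separate branch)
```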

research/deepseek3/README.md (+15, -0)

@@ -87,6 +87,11 @@ python research/deepseek3/fp8_cast_bf16.py \
--output-bf16-hf-path path/to/hf_model_bf16_dir/
```

Parameter description:

- input-fp8-hf-path: path to the folder containing the original fp8 weights.
- output-bf16-hf-path: path to the folder for the converted bf16 weights.

>`path/to/hf_model_bf16_dir/` can be changed to a custom path; make sure that path has enough free disk space (about 1.4 TB).

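Given the ~1.4 TB size note above, the conversion is best thought of as an upcast applied shard by shard rather than loading everything at once. A hedged sketch of that idea, not the actual `fp8_cast_bf16.py` implementation; numpy has neither fp8 nor bf16 dtypes, so float16 → float32 stands in here:

```python
import numpy as np

# Sketch of the per-shard upcast idea behind an fp8 -> bf16 conversion.
# float16 -> float32 is a stand-in pair: the destination type is strictly
# wider, so values are preserved exactly, just as bf16 can hold any fp8 value.
def upcast_shard(weights, dst=np.float32):
    """Upcast every tensor in one checkpoint shard to the wider dtype."""
    return {name: w.astype(dst) for name, w in weights.items()}

shard = {"w": np.array([0.5, -1.25], dtype=np.float16)}  # illustrative shard
out = upcast_shard(shard)
print(out["w"].dtype)  # float32
```

Processing one shard at a time keeps peak memory bounded by the largest shard, not the full checkpoint.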
## Inference
@@ -135,6 +140,11 @@ python research/deepseek3/convert_weight.py \
- infer: whether to convert the weights for inference. Default: `False`.
- mindspore_ckpt_path: save path of the folder for the converted MindSpore weights.
- worker_num: number of processes for multi-process conversion. Default: `4`.
- use_grouped_gemm: whether to use grouped_gemm. Default: `False`.
- n_head: number of attention heads in the model architecture. Default: `128`.
- v_head_dim: dimension of the Value vector within a single attention head. Default: `128`.
- save_format: format in which the weights are saved. Default: `safetensors`.
- param_json: JSON file name of the weight parameter mapping table. Default: `model.safetensors.index.json`.

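The parameters above can be assembled into a `convert_weight.py` invocation. A sketch under the assumption that the command-line flags mirror the parameter names one-to-one (the full flag list is not shown in this diff); it only builds the argument list and does not execute the script:

```python
# Assemble a convert_weight.py call from the documented parameters.
# Flag names are assumed to match the parameter names above; defaults are
# the documented ones. Build-only: nothing is executed here.
def build_convert_cmd(mindspore_ckpt_path, infer=False, worker_num=4,
                      use_grouped_gemm=False, n_head=128, v_head_dim=128,
                      save_format="safetensors",
                      param_json="model.safetensors.index.json"):
    args = {
        "infer": infer,
        "mindspore_ckpt_path": mindspore_ckpt_path,
        "worker_num": worker_num,
        "use_grouped_gemm": use_grouped_gemm,
        "n_head": n_head,
        "v_head_dim": v_head_dim,
        "save_format": save_format,
        "param_json": param_json,
    }
    cmd = ["python", "research/deepseek3/convert_weight.py"]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd

print(" ".join(build_convert_cmd("path/to/ms_ckpt", infer=True)))
```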
To run inference with weights saved after training, first convert them to the inference format with the `deepseek3_train2infer.py` script. Run the following command:

@@ -220,6 +230,11 @@ bash scripts/msrun_launcher.sh "research/deepseek3/run_predict_deepseek.py \
32 8 $master_ip 8888 3 output/msrun_log False 300
```

Parameter description:

- config: path to the YAML configuration file for inference.
- input: the question to feed to inference.

The expected inference result is as follows:

```txt
```

research/telechat2/README.md (+6, -4)

@@ -35,28 +35,24 @@ TeleChat2-7B:
| config | task | Datasets | SeqLength | phase | performance |
|:---------------------------------------------------:| :-------------------: |:----------:|:---------:|:---------------:|:------------:|
| [TeleChat2_7B](./telechat2-7b/finetune_telechat_7b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 2950 tokens/s/p |
| [TeleChat2_7B](./telechat2-7b/predict_telechat_7b.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 54.1 tokens/s |

TeleChat2-35B:

| config | task | Datasets | SeqLength | phase | performance |
|-----------------------------------------------------| --------------------- |------------|-----------|-----------------|--------------|
| [TeleChat2_35B](./telechat2-35b/finetune_telechat_35b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 516 tokens/s/p |
| [TeleChat2_35B](./telechat2-35b/predict_telechat_35b.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 27.7 tokens/s |

TeleChat2-115B:

| config | task | Datasets | SeqLength | phase | performance |
|-----------------------------------------------------| --------------------- |------------|-----------|-----------------|--------------|
| [TeleChat2_115B](./telechat2-115b/finetune_telechat_115b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 158 tokens/s/p |
| [TeleChat2_115B](./telechat2-115b/predict_telechat_115b.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 26.5 tokens/s |

TeleChat2-39B-A12B:

| config | task | Datasets | SeqLength | phase | performance |
| ------------------------------------------------------------ | --------------- | --------------- | --------- | ---------------- | ------------- |
| [TeleChat2_39B_A12B](./telechat2-39b-a12b/finetune_telechat_39b_a12b.yaml) | text_generation | example_dataset | 8192 | [finetune](#微调) | 865 tokens/s/p |
| [TeleChat2_39B_A12B](./telechat2-39b-a12b/predict_telechat_39b_a12b_parallel.yaml) | text_generation | example_dataset | 8192 | [predict](#推理) | 36.4 tokens/s |

## 模型文件

@@ -138,6 +134,12 @@ input_dataset_file: path to the pretraining dataset
vocab_file_path: path to the vocabulary model file (if downloaded via the link above, point it at the corresponding path)
max_length: sequence length of the dataset
output_path: path where the generated dataset is written
seed: random seed. Default: 2024
start_token: leading token of the input. Default: <_start>
user_token: prompt token marking user input. Default: <_usr>
bot_token: prompt token marking bot output. Default: <_bot>
end_token: terminator token. Default: <_end>
pad_token: token used for padding. Default: <_pad>
```

> Note: special `ids` such as `bos`, `eos`, and `pad` must stay consistent with the `model_config` section of the `yaml` configuration file; by default `bos_token_id=1`, `eos_token_id=2`, `pad_token_id=3`.

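That consistency requirement can be checked mechanically. A minimal sketch using the documented defaults (`bos_token_id=1`, `eos_token_id=2`, `pad_token_id=3`); the token-to-id mapping and the pairing of `<_start>`/`<_end>`/`<_pad>` with `bos`/`eos`/`pad` are illustrative assumptions:

```python
# Check that the preprocessing special tokens resolve to the same ids that
# model_config declares. Values are the documented defaults; the vocab dict
# is a hypothetical tokenizer mapping for the special tokens.
model_config = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 3}
vocab = {"<_start>": 1, "<_end>": 2, "<_pad>": 3}

def check_special_ids(vocab, model_config):
    """Return the list of tokens whose id disagrees with model_config."""
    pairs = [("<_start>", "bos_token_id"),
             ("<_end>", "eos_token_id"),
             ("<_pad>", "pad_token_id")]
    return [tok for tok, key in pairs if vocab[tok] != model_config[key]]

print(check_special_ids(vocab, model_config))  # []  (all consistent)
```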
