73 Commits

Author SHA1 Message Date
  color 3d88f92b2e Merge pull request 'master' (#14) from MindSpore/docs:master into master 1 year ago
  i-robot b171f6d473 !14904 Supplement and optimize the Accuracy Problem Locating Guide and the Performance Tuning Guide 1 year ago
  liuqi 58a462fb2d optimize doc 1 year ago
  i-robot db7f7f427e !14901 Update the mindspore profiler documentation 1 year ago
  zhujiaxing 6ded2ed58f Update the mindspore profiler documentation 1 year ago
  i-robot e50bec0962 !14900 modify error links 1 year ago
  huan d213eb1d25 modify the file error and error links 1 year ago
  i-robot 529264cd61 !14897 Complete the profiling file descriptions 1 year ago
  chenjunjie 74de608d41 fix profiling file description 1 year ago
  i-robot 0f203d680d !14896 modify the links 1 year ago
  huan e7138d8600 modify the links 1 year ago
  i-robot c8fbf98361 !14895 modify mindchemistry conf zh 1 year ago
  yuhan d08d107451 modify mindchemistry conf zh 1 year ago
  i-robot d6655d553c !14890 Supplement the Performance Tuning documentation 1 year ago
  liuqi c645c5d7cf replenish perf optimize 1 year ago
  i-robot 9156c027f2 !14865 Add the Accuracy Tuning Guide 1 year ago
  liuqi ab768d082d add acc optimize 1 year ago
  i-robot a27b266248 !14866 Add the Performance Tuning Guide 1 year ago
  liuqi 3f17cb68e2 add perf optimize 1 year ago
  i-robot c7ae8466ec !14885 modify title and del model 1 year ago
  yuhan 1815e769ba modify title and del model 1 year ago
  i-robot 1f5820ca5f !14840 MFQ3 documentation rework: home, model, and installation pages 1 year ago
  zxq 1d11731377 MFQ3 documentation rework: home page, model repository, and installation pages 1 year ago
  i-robot 4f776c0e28 !14882 modify the index files 1 year ago
  huan b562d9b1b9 modify the index files 1 year ago
  i-robot 6170fc2041 !14878 modify the index files 1 year ago
  huan 801e62c7b9 modify the index files 1 year ago
  i-robot fb85f9690a !14873 Fix documentation errors 1 year ago
  i-robot adc3103f4e !14876 modify dump files 1 year ago
  熊攀 b412085052 Fix documentation errors 1 year ago
  huan de80931b00 modify dump files 1 year ago
  i-robot f24b6297a2 !14864 modify the error links 1 year ago
  huan 3a68fb0ac3 modify the error links 1 year ago
  i-robot 3653b47f75 !14857 modify the error links 1 year ago
  huan 4a5c4c6397 modify the error links 1 year ago
  i-robot cfecadbffe !14851 modify notebook errors 1 year ago
  yuhan da9fda8405 modify notebook errors 1 year ago
  i-robot 7a1def745e !14838 [mindchemistry] Remove unused references from the mindchemistry user page 1 year ago
  jian981105 6004c5d96e remove reference in user rst page 1 year ago
  i-robot 74df00e6d7 !14837 modify conf and index 1 year ago
  yuhan dc4e321a91 modify conf and index 1 year ago
  i-robot 71f35f0fde !14835 modify the error anchors 1 year ago
  huan 88890d0980 modify the error anchors 1 year ago
  i-robot 23c21625dc !14829 modify file format 1 year ago
  huan f1ef2132f2 modify file format 1 year ago
  TingWang 293f6e8310 !14830 modify url and del images 1 year ago
  yuhan a75658da96 modify url and del images 1 year ago
  i-robot 331fe02b70 !14828 modify the file location 1 year ago
  huan 2546281ffb modify the file location 1 year ago
  i-robot a30c2faa77 !14827 [MD] Adapt to the changed default value of the max_rowsize API 1 year ago
  Xiao Tianci bdf87fd9fd change API default value of param max_rowsize 1 year ago
  i-robot 7ac43d96d5 !14826 modify conf and urls 1 year ago
  yuhan 6c69703482 modify conf and urls 1 year ago
  i-robot 37e0a37f71 !14821 add backward pre hook 1 year ago
  zjun a0de9bceaa Add backward pre hook 1 year ago
  i-robot 59df33530b !14822 modify the orange_pi rst files 1 year ago
  huan 6366590d85 modify the rst files 1 year ago
  TingWang 239d24df3e !14818 modify the files 1 year ago
  TingWang 75e2906a41 !14815 modify err urls 1 year ago
  huan 6f110e7999 modify the files 1 year ago
  yuhan de08cec7ef modify err urls 1 year ago
  i-robot f9554148ff !14812 modify the file contents 1 year ago
  huan c4c5469520 modify the file contents 1 year ago
  TingWang 6ab0ebb2ac !14811 modify urls 1 year ago
  yuhan 07d29ad2af modify urls 1 year ago
  TingWang 1a709c9e3c !14807 modify the file framework 1 year ago
  i-robot 9a106c3fe7 !14808 modify frame 1 year ago
  yuhan 220a67b7f6 modify frame 1 year ago
  huan d01a6bd797 modify the file structure 1 year ago
  i-robot eacd77d637 !14802 modify the format of txt files 1 year ago
  huan 27c62b2a4c modify the format of txt files 1 year ago
  i-robot f2e1db9cb5 !14799 modify the error links 1 year ago
  huan 3d536e6c21 modify the error links 1 year ago
100 changed files with 866 additions and 1095 deletions
  1. +2 -1 .jenkins/rules/codespell/codespell.allow
  2. +2 -2 docs/golden_stick/docs/source_en/index.rst
  3. +1 -1 docs/golden_stick/docs/source_en/pruner/scop.md
  4. +2 -2 docs/golden_stick/docs/source_zh_cn/index.rst
  5. +1 -1 docs/golden_stick/docs/source_zh_cn/pruner/scop.md
  6. +1 -1 docs/hub/docs/source_en/loading_model_from_hub.md
  7. +1 -1 docs/hub/docs/source_zh_cn/loading_model_from_hub.md
  8. +1 -1 docs/lite/docs/source_en/quick_start/train_lenet.md
  9. +3 -3 docs/lite/docs/source_en/use/cloud_infer/runtime_distributed_cpp.md
  10. +3 -3 docs/lite/docs/source_en/use/cloud_infer/runtime_distributed_python.md
  11. +1 -1 docs/lite/docs/source_en/use/converter_tool.md
  12. +1 -1 docs/lite/docs/source_zh_cn/quick_start/train_lenet.md
  13. +3 -3 docs/lite/docs/source_zh_cn/use/cloud_infer/runtime_distributed_cpp.md
  14. +3 -3 docs/lite/docs/source_zh_cn/use/cloud_infer/runtime_distributed_python.md
  15. +1 -1 docs/lite/docs/source_zh_cn/use/converter_tool.md
  16. +5 -1 docs/mindchemistry/docs/source_en/index.rst
  17. +2 -2 docs/mindchemistry/docs/source_en/user/molecular_generation.md
  18. +2 -2 docs/mindchemistry/docs/source_en/user/molecular_prediction.md
  19. +1 -0 docs/mindchemistry/docs/source_zh_cn/conf.py
  20. +5 -1 docs/mindchemistry/docs/source_zh_cn/index.rst
  21. +1 -1 docs/mindchemistry/docs/source_zh_cn/user/molecular_generation.md
  22. +1 -1 docs/mindchemistry/docs/source_zh_cn/user/molecular_prediction.md
  23. +2 -2 docs/mindearth/docs/source_en/medium-range/FourCastNet.ipynb
  24. +2 -2 docs/mindearth/docs/source_en/medium-range/vit_kno.ipynb
  25. +2 -2 docs/mindearth/docs/source_zh_cn/medium-range/FourCastNet.ipynb
  26. +2 -2 docs/mindearth/docs/source_zh_cn/medium-range/vit_kno.ipynb
  27. +3 -3 docs/mindinsight/docs/source_en/accuracy_optimization.md
  28. +5 -5 docs/mindinsight/docs/source_en/accuracy_problem_preliminary_location.md
  29. +2 -2 docs/mindinsight/docs/source_en/performance_optimization.md
  30. +1 -1 docs/mindinsight/docs/source_en/performance_profiling_gpu.md
  31. +4 -4 docs/mindinsight/docs/source_en/performance_tuning_guide.md
  32. +1 -1 docs/mindinsight/docs/source_en/profiling/profiling_host_time.txt
  33. +42 -43 docs/mindinsight/docs/source_en/profiling/profiling_offline.txt
  34. +0 -17 docs/mindinsight/docs/source_en/profiling/profiling_resoure.txt
  35. +1 -1 docs/mindinsight/docs/source_en/profiling/profiling_training.txt
  36. +3 -3 docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
  37. +5 -5 docs/mindinsight/docs/source_zh_cn/accuracy_problem_preliminary_location.md
  38. +2 -2 docs/mindinsight/docs/source_zh_cn/performance_optimization.md
  39. +1 -1 docs/mindinsight/docs/source_zh_cn/performance_profiling_gpu.md
  40. +4 -4 docs/mindinsight/docs/source_zh_cn/performance_tuning_guide.md
  41. +1 -1 docs/mindinsight/docs/source_zh_cn/profiling/profiling_host_time.txt
  42. +42 -43 docs/mindinsight/docs/source_zh_cn/profiling/profiling_offline.txt
  43. +0 -17 docs/mindinsight/docs/source_zh_cn/profiling/profiling_resoure.txt
  44. +1 -1 docs/mindinsight/docs/source_zh_cn/profiling/profiling_training.txt
  45. +3 -2 docs/mindspore/Makefile
  46. +5 -1 docs/mindspore/_ext/generate_ops_mint_rst.py
  47. +1 -1 docs/mindspore/_ext/generate_rst_by_en.py
  48. +29 -0 docs/mindspore/source_en/api_python/index.rst
  49. +9 -4 docs/mindspore/source_en/conf.py
  50. +6 -6 docs/mindspore/source_en/design/distributed_training_design.md
  51. +1 -186 docs/mindspore/source_en/design/dynamic_graph_and_static_graph.md
  52. +17 -0 docs/mindspore/source_en/design/index.rst
  53. +7 -9 docs/mindspore/source_en/faq/data_processing.md
  54. +1 -1 docs/mindspore/source_en/faq/distributed_parallel.md
  55. +1 -1 docs/mindspore/source_en/faq/feature_advice.md
  56. +1 -1 docs/mindspore/source_en/faq/implement_problem.md
  57. +17 -0 docs/mindspore/source_en/faq/index.rst
  58. +3 -3 docs/mindspore/source_en/faq/network_compilation.md
  59. +2 -2 docs/mindspore/source_en/faq/operators_compile.md
  60. +9 -262 docs/mindspore/source_en/index.rst
  61. +9 -0 docs/mindspore/source_en/kits_tools/index.rst
  62. +2 -2 docs/mindspore/source_en/kits_tools/official_models.md
  63. +18 -0 docs/mindspore/source_en/kits_tools/overview.md
  64. +7 -7 docs/mindspore/source_en/migration_guide/analysis_and_preparation.md
  65. +9 -17 docs/mindspore/source_en/migration_guide/debug_and_tune.md
  66. +15 -12 docs/mindspore/source_en/migration_guide/faq.rst
  67. +14 -0 docs/mindspore/source_en/migration_guide/index.rst
  68. +1 -1 docs/mindspore/source_en/migration_guide/migrator_with_tools.md
  69. +1 -1 docs/mindspore/source_en/migration_guide/missing_api_processing_policy.md
  70. +4 -4 docs/mindspore/source_en/migration_guide/model_development/dataset.md
  71. +4 -4 docs/mindspore/source_en/migration_guide/model_development/gradient.md
  72. +1 -1 docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md
  73. +1 -1 docs/mindspore/source_en/migration_guide/model_development/loss_function.md
  74. +5 -5 docs/mindspore/source_en/migration_guide/model_development/model_and_cell.md
  75. +2 -2 docs/mindspore/source_en/migration_guide/model_development/model_development.rst
  76. +2 -2 docs/mindspore/source_en/migration_guide/model_development/training_and_evaluation.md
  77. +29 -23 docs/mindspore/source_en/migration_guide/overview.md
  78. +9 -0 docs/mindspore/source_en/migration_guide/reference.rst
  79. +4 -1 docs/mindspore/source_en/migration_guide/reproducing_algorithm.md
  80. +2 -2 docs/mindspore/source_en/migration_guide/sample_code.md
  81. +1 -1 docs/mindspore/source_en/migration_guide/sparsity.md
  82. +1 -1 docs/mindspore/source_en/migration_guide/use_third_party_op.md
  83. +1 -1 docs/mindspore/source_en/mindformers/appendix/env_variables.md
  84. +2 -2 docs/mindspore/source_en/mindformers/index.rst
  85. +10 -0 docs/mindspore/source_en/model_infer/index.rst
  86. +8 -0 docs/mindspore/source_en/model_infer/llm_infer.rst
  87. +3 -0 docs/mindspore/source_en/model_infer/llm_lite.md
  88. +1 -1 docs/mindspore/source_en/model_infer/model_compression.md
  89. +1 -1 docs/mindspore/source_en/model_infer/overview.md
  90. +3 -0 docs/mindspore/source_en/model_train/custom_program/fusion_pass.md
  91. +91 -313 docs/mindspore/source_en/model_train/custom_program/hook_program.md
  92. +2 -2 docs/mindspore/source_en/model_train/custom_program/initializer.md
  93. +198 -0 docs/mindspore/source_en/model_train/custom_program/layer.md
  94. +3 -3 docs/mindspore/source_en/model_train/custom_program/loss.md
  95. +98 -0 docs/mindspore/source_en/model_train/custom_program/network_custom.md
  96. +12 -0 docs/mindspore/source_en/model_train/custom_program/op_custom.rst
  97. +1 -1 docs/mindspore/source_en/model_train/custom_program/operation/ms_kernel.md
  98. +7 -7 docs/mindspore/source_en/model_train/custom_program/operation/op_custom.md
  99. +1 -1 docs/mindspore/source_en/model_train/custom_program/operation/op_custom_adv.md
  100. +2 -2 docs/mindspore/source_en/model_train/custom_program/operation/op_custom_aot.md

.jenkins/rules/codespell/codespell.allow (+2 -1)

@@ -17,4 +17,5 @@ mutiple
allEdges
statments
SubreE
- followings
+ followings
+ Rouge

docs/golden_stick/docs/source_en/index.rst (+2 -2)

@@ -50,7 +50,7 @@ General Process of Applying the MindSpore Golden Stick

- **Optimize the network using the MindSpore Golden Stick:** In the original training process, after the original network is defined and before the network is trained, use the MindSpore Golden Stick to optimize the network structure. Generally, this step is implemented by calling the `apply` API of MindSpore Golden Stick. For details, see `Applying the SimQAT Algorithm <https://mindspore.cn/golden_stick/docs/en/master/quantization/simqat.html>`_ .

- - **Register the MindSpore Golden Stick callback:** Register the callback of the MindSpore Golden Stick into the model to be trained. Generally, in this step, the `callback` function of MindSpore Golden Stick is called to obtain the corresponding callback object and `register the object into the model <https://www.mindspore.cn/tutorials/en/master/advanced/model/callback.html>`_ .
+ - **Register the MindSpore Golden Stick callback:** Register the callback of the MindSpore Golden Stick into the model to be trained. Generally, in this step, the `callback` function of MindSpore Golden Stick is called to obtain the corresponding callback object and `register the object into the model <https://www.mindspore.cn/docs/en/master/model_train/train_process/model/callback.html>`_ .

2. Deployment

@@ -59,7 +59,7 @@ General Process of Applying the MindSpore Golden Stick
.. note::
- For details about how to apply the MindSpore Golden Stick, see the detailed description and sample code in each algorithm section.
- For details about the "ms.export" step in the process, see `Exporting MINDIR Model <https://www.mindspore.cn/tutorials/en/master/beginner/save_load.html#saving-and-loading-mindir>`_ .
- - For details about the "MindSpore infer" step in the process, see `MindSpore Inference Runtime <https://mindspore.cn/tutorials/experts/en/master/infer/inference.html>`_ .
+ - For details about the "MindSpore infer" step in the process, see `MindSpore Inference Runtime <https://mindspore.cn/docs/en/master/model_infer/overview.html>`_ .

Roadmap
---------------------------------------


docs/golden_stick/docs/source_en/pruner/scop.md (+1 -1)

@@ -210,7 +210,7 @@ if __name__ == "__main__":
export(network, inputs, file_name="ResNet_SCOP", file_format='MINDIR')
```

- After the pruned model is exported, [use MindSpore for inference](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html).
+ After the pruned model is exported, [use MindSpore for inference](https://www.mindspore.cn/docs/en/master/model_infer/overview.html).

## Summary



docs/golden_stick/docs/source_zh_cn/index.rst (+2 -2)

@@ -50,7 +50,7 @@ MindSpore Golden Stick除了提供丰富的模型压缩算法外,一个重要

- **应用MindSpore Golden Stick算法优化网络:** 在原训练流程中,在定义原始网络之后,网络训练之前,应用MindSpore Golden Stick算法优化网络结构。一般这个步骤是调用MindSpore Golden Stick的 `apply` 接口实现的,可以参考 `应用SimQAT算法 <https://mindspore.cn/golden_stick/docs/zh-CN/master/quantization/simqat.html#%E5%BA%94%E7%94%A8%E9%87%8F%E5%8C%96%E7%AE%97%E6%B3%95>`_。

- - **注册MindSpore Golden Stick回调逻辑:** 将MindSpore Golden Stick算法的回调逻辑注册到要训练的model中。一般这个步骤是调用MindSpore Golden Stick的 `callback` 获取相应的callback对象, `注册到model <https://www.mindspore.cn/tutorials/zh-CN/master/advanced/model/callback.html>`_ 中。
+ - **注册MindSpore Golden Stick回调逻辑:** 将MindSpore Golden Stick算法的回调逻辑注册到要训练的model中。一般这个步骤是调用MindSpore Golden Stick的 `callback` 获取相应的callback对象, `注册到model <https://www.mindspore.cn/docs/zh-CN/master/model_train/train_process/model/callback.html>`_ 中。

2. 部署阶段

@@ -59,7 +59,7 @@ MindSpore Golden Stick除了提供丰富的模型压缩算法外,一个重要
.. note::
- 应用MindSpore Golden Stick算法的细节,可以在每个算法章节中找到详细说明和示例代码。
- 流程中的"ms.export"步骤可以参考 `导出mindir格式文件 <https://www.mindspore.cn/tutorials/zh-CN/master/beginner/save_load.html#保存和加载mindir>`_ 章节。
- - 流程中的"昇思推理优化工具和运行时"步骤可以参考 `昇思推理 <https://mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html>`_ 章节。
+ - 流程中的"昇思推理优化工具和运行时"步骤可以参考 `昇思推理 <https://mindspore.cn/docs/zh-CN/master/model_infer/overview.html>`_ 章节。

未来规划
----------


docs/golden_stick/docs/source_zh_cn/pruner/scop.md (+1 -1)

@@ -210,7 +210,7 @@ if __name__ == "__main__":
export(network, inputs, file_name="ResNet_SCOP", file_format='MINDIR')
```

- 导出剪枝模型后,请[使用MindSpore进行推理](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。
+ 导出剪枝模型后,请[使用MindSpore进行推理](https://www.mindspore.cn/docs/zh-CN/master/model_infer/overview.html)。

## 算法效果汇总



docs/hub/docs/source_en/loading_model_from_hub.md (+1 -1)

@@ -39,7 +39,7 @@ This document demonstrates the use of the models provided by MindSpore Hub for b

```

- 3. After loading the model, you can use MindSpore to do inference. You can refer to [Multi-Platform Inference Overview](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html).
+ 3. After loading the model, you can use MindSpore to do inference. You can refer to [Multi-Platform Inference Overview](https://www.mindspore.cn/docs/en/master/model_infer/overview.html).

## For Transfer Training



docs/hub/docs/source_zh_cn/loading_model_from_hub.md (+1 -1)

@@ -39,7 +39,7 @@

```

- 3. 完成模型加载后,可以使用MindSpore进行推理,参考[推理模型总览](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。
+ 3. 完成模型加载后,可以使用MindSpore进行推理,参考[推理模型总览](https://www.mindspore.cn/docs/zh-CN/master/model_infer/overview.html)。

## 用于迁移学习



docs/lite/docs/source_en/quick_start/train_lenet.md (+1 -1)

@@ -214,7 +214,7 @@ train_lenet_cpp/

Whether it is an off-the-shelf prepared model, or a custom written model, the model needs to be exported to a `.mindir` file. Here we use the already-implemented [LeNet model](https://gitee.com/mindspore/models/tree/master/research/cv/lenet).

- > This summary is exported using the MindSpore cloud side feature. For more information, please refer to [MindSpore Tutorial](https://www.mindspore.cn/tutorials/experts/en/master/index.html).
+ > This summary is exported using the MindSpore cloud side feature. For more information, please refer to [MindSpore Tutorial](https://www.mindspore.cn/tutorials/en/master/index.html).

```python
import numpy as np


docs/lite/docs/source_en/use/cloud_infer/runtime_distributed_cpp.md (+3 -3)

@@ -4,7 +4,7 @@

## Overview

- For scenarios where large-scale neural network models have many parameters and cannot be fully loaded into a single device for inference, distributed inference can be performed using multiple devices. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference using the [C++ interface](https://www.mindspore.cn/lite/api/en/master/index.html). Cloud-side distributed inference is roughly the same process as [Cloud-side single-card inference](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/runtime_cpp.html) and can be cross-referenced. For the related contents of distributed inference, please refer to [MindSpore Distributed inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/model_loading.html#inference), and MindSpore Lite cloud-side distributed inference has more optimization for performance aspects.
+ For scenarios where large-scale neural network models have many parameters and cannot be fully loaded into a single device for inference, distributed inference can be performed using multiple devices. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference using the [C++ interface](https://www.mindspore.cn/lite/api/en/master/index.html). Cloud-side distributed inference is roughly the same process as [Cloud-side single-card inference](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/runtime_cpp.html) and can be cross-referenced. For the related contents of distributed inference, please refer to [MindSpore Distributed inference](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#inference), and MindSpore Lite cloud-side distributed inference has more optimization for performance aspects.

MindSpore Lite cloud-side distributed inference is only supported to run in Linux environment deployments with Atlas training series and Nvidia GPU as the supported device types. As shown in the figure below, the distributed inference is currently initiated by a multi-process approach, where each process corresponds to a `Rank` in the communication set, loading, compiling and executing the respective sliced model, with the same input data for each process.

@@ -12,7 +12,7 @@ MindSpore Lite cloud-side distributed inference is only supported to run in Linu

Each process consists of the following main steps:

- 1. Model reading: Slice and [export the distributed MindIR model](https://www.mindspore.cn/tutorials/experts/en/master/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore. The number of MindIR models is the same as the number of devices for loading to each device for inference.
+ 1. Model reading: Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore. The number of MindIR models is the same as the number of devices for loading to each device for inference.
2. Context creation and configuration: Create and configure the [Context](https://www.mindspore.cn/lite/api/en/master/generate/classmindspore_Context.html), and hold the distributed inference parameters to guide distributed model compilation and model execution.
3. Model loading and compilation: Use the [Model::Build](https://www.mindspore.cn/lite/api/en/master/generate/classmindspore_Model.html) interface for model loading and model compilation. The model loading phase parses the file cache into a runtime model. The model compilation phase optimizes the front-end computational graph into a high-performance back-end computational graph. The process is time-consuming and it is recommended to compile once and inference multiple times.
4. Model input data padding.
@@ -24,7 +24,7 @@ Each process consists of the following main steps:

1. To download the cloud-side distributed inference C++ sample code, please select the device type: [Ascend](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/ascend_ge_distributed_cpp) or [GPU](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/gpu_trt_distributed_cpp). The directory will be referred to later as the example code directory.

- 2. Slice and [export the distributed MindIR model](https://www.mindspore.cn/tutorials/experts/en/master/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore and store it to the sample code directory. For a quick experience, you can download the two sliced Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir), [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).
+ 2. Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore and store it to the sample code directory. For a quick experience, you can download the two sliced Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir), [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).

3. For Ascend device type, generate the networking information file through hccl_tools.py as needed, store it in the sample code directory, and fill the path of the file into the configuration file `config_file.ini` in the sample code directory.



docs/lite/docs/source_en/use/cloud_infer/runtime_distributed_python.md (+3 -3)

@@ -4,7 +4,7 @@

## Overview

- For scenarios where large-scale neural network models have many parameters and cannot be fully loaded into a single device for inference, distributed inference can be performed using multiple devices. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference using the [Python interface](https://www.mindspore.cn/lite/api/en/master/mindspore_lite.html). Cloud-side distributed inference is roughly the same process as [Cloud-side single-card inference](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/runtime_python.html) and can be cross-referenced. For the related contents of distributed inference, please refer to [MindSpore Distributed inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/model_loading.html#inference), and MindSpore Lite cloud-side distributed inference has more optimization for performance aspects.
+ For scenarios where large-scale neural network models have many parameters and cannot be fully loaded into a single device for inference, distributed inference can be performed using multiple devices. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference using the [Python interface](https://www.mindspore.cn/lite/api/en/master/mindspore_lite.html). Cloud-side distributed inference is roughly the same process as [Cloud-side single-card inference](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/runtime_python.html) and can be cross-referenced. For the related contents of distributed inference, please refer to [MindSpore Distributed inference](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#inference), and MindSpore Lite cloud-side distributed inference has more optimization for performance aspects.

MindSpore Lite cloud-side distributed inference is only supported to run in Linux environment deployments with Atlas training series and Nvidia GPU as the supported device types. As shown in the figure below, the distributed inference is currently initiated by a multi-process approach, where each process corresponds to a `Rank` in the communication set, loading, compiling and executing the respective sliced model, with the same input data for each process.

@@ -12,7 +12,7 @@ MindSpore Lite cloud-side distributed inference is only supported to run in Linu

Each process consists of the following main steps:

- 1. Model reading: Slice and [export the distributed MindIR model](https://www.mindspore.cn/tutorials/experts/en/master/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore. The number of MindIR models is the same as the number of devices for loading to each device for inference.
+ 1. Model reading: Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore. The number of MindIR models is the same as the number of devices for loading to each device for inference.
2. Context creation and configuration: Create and configure the [Context](https://www.mindspore.cn/lite/api/en/master/mindspore_lite/mindspore_lite.Context.html#mindspore_lite.Context), and hold the distributed inference parameters to guide distributed model compilation and model execution.
3. Model loading and compilation: Use the [Model.build_from_file](https://www.mindspore.cn/lite/api/en/master/mindspore_lite/mindspore_lite.Model.html#mindspore_lite.Model.build_from_file) interface for model loading and model compilation. The model loading phase parses the file cache into a runtime model. The model compilation phase optimizes the front-end computational graph into a high-performance back-end computational graph. The process is time-consuming and it is recommended to compile once and inference multiple times.
4. Model input data padding.
@@ -24,7 +24,7 @@ Each process consists of the following main steps:

1. To download the cloud-side distributed inference python sample code, please select the device type: [Ascend](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/ascend_ge_distributed_cpp) or [GPU](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/gpu_trt_distributed_cpp). The directory will be referred to later as the example code directory.

- 2. Slice and [export the distributed MindIR model](https://www.mindspore.cn/tutorials/experts/en/master/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore and store it to the sample code directory. For a quick experience, you can download the two sliced Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir), [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).
+ 2. Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore and store it to the sample code directory. For a quick experience, you can download the two sliced Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir), [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).

3. For Ascend device type, generate the networking information file through hccl_tools.py as needed, store it in the sample code directory, and fill the path of the file into the configuration file `config_file.ini` in the sample code directory.



docs/lite/docs/source_en/use/converter_tool.md (+1 -1)

@@ -70,7 +70,7 @@ The following describes the parameters in detail.
| `--outputDataType=<OUTPUTDATATYPE>` | No | Set data type of output tensor of quantized model. Only valid for output tensor which has quantization parameters(scale and zero point). Keep same with the data type of output tensor of origin model by default. | FLOAT32, INT8, UINT8, DEFAULT | DEFAULT | - |
| `--outputDataFormat=<OUTPUTDATAFORMAT>` | No | Set the output format of exported model. Only valid for 4-dimensional outputs. | NHWC, NCHW | - | - |
| `--encryptKey=<ENCRYPTKEY>` | No | Set the key for exporting encrypted `ms` models. The key is expressed in hexadecimal. Only AES-GCM is supported, and the key length is only 16Byte. | - | - | - |
- | `--encryption=<ENCRYPTION>` | No | Set whether to encrypt when exporting the `ms` model. Exporting encryption can protect the integrity of the model, but it will increase the runtime initialization time. | true, false | true | - |
+ | `--encryption=<ENCRYPTION>` | No | Set whether to encrypt when exporting the `ms` model. Exporting encryption can protect the integrity of the model, but it will increase the runtime initialization time. | true, false | false | - |
| `--infer=<INFER>` | No | Set whether to pre-inference when conversion is complete. | true, false | false | - |

> - The parameter name and parameter value are separated by an equal sign (=) and no space is allowed between them.


docs/lite/docs/source_zh_cn/quick_start/train_lenet.md (+1 -1)

@@ -214,7 +214,7 @@ train_lenet_cpp/

首先我们需要基于MindSpore框架创建一个LeNet模型,本例中直接用MindSpore ModelZoo的现有[LeNet模型](https://gitee.com/mindspore/models/tree/master/research/cv/lenet)。

- > 本小结使用MindSpore云侧功能导出,更多信息请参考[MindSpore教程](https://www.mindspore.cn/tutorials/experts/zh-CN/master/index.html)。
+ > 本小结使用MindSpore云侧功能导出,更多信息请参考[MindSpore教程](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)。

```python
import numpy as np


docs/lite/docs/source_zh_cn/use/cloud_infer/runtime_distributed_cpp.md

@@ -4,7 +4,7 @@

## Overview

For large-scale neural network models whose many parameters cannot be fully loaded onto a single device for inference, multiple devices can be used for distributed inference. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference with the [C++ API](https://www.mindspore.cn/lite/api/zh-CN/master/index.html). The workflow of cloud-side distributed inference is largely the same as that of [cloud-side single-device inference](https://www.mindspore.cn/lite/docs/zh-CN/master/use/cloud_infer/runtime_cpp.html), and the two can be cross-referenced. For background on distributed inference, see [MindSpore distributed inference](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/model_loading.html#推理); compared with it, MindSpore Lite cloud-side distributed inference has more performance-oriented optimizations.
For large-scale neural network models whose many parameters cannot be fully loaded onto a single device for inference, multiple devices can be used for distributed inference. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference with the [C++ API](https://www.mindspore.cn/lite/api/zh-CN/master/index.html). The workflow of cloud-side distributed inference is largely the same as that of [cloud-side single-device inference](https://www.mindspore.cn/lite/docs/zh-CN/master/use/cloud_infer/runtime_cpp.html), and the two can be cross-referenced. For background on distributed inference, see [MindSpore distributed inference](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/model_loading.html#推理); compared with it, MindSpore Lite cloud-side distributed inference has more performance-oriented optimizations.

MindSpore Lite cloud-side distributed inference can be deployed and run only in a Linux environment, and the supported device types are Atlas training series products and NVIDIA GPUs. As shown in the figure below, distributed inference is currently launched as multiple processes; each process corresponds to one `Rank` in the communication group, loads, compiles, and executes its own partitioned model, and receives the same input data.

@@ -12,7 +12,7 @@ MindSpore Lite云侧分布式推理仅支持在Linux环境部署运行,支持

Each process mainly involves the following steps:

1. Model reading: partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/model_loading.html#分布式场景导出mindir文件); the number of MindIR models equals the number of devices, and each model is loaded onto its device for inference.
1. Model reading: partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/model_loading.html#分布式场景导出mindir文件); the number of MindIR models equals the number of devices, and each model is loaded onto its device for inference.
2. Context creation and configuration: create and configure the context [Context](https://www.mindspore.cn/lite/api/zh-CN/master/api_cpp/mindspore.html#context), which holds the distributed inference parameters that guide distributed model compilation and execution.
3. Model loading and compilation: load and compile the model through the [Model::Build](https://www.mindspore.cn/lite/api/zh-CN/master/api_cpp/mindspore.html#build-2) interface. The loading phase parses the file cache into a runtime model. The compilation phase optimizes the front-end computational graph into a high-performance back-end graph; this process is time-consuming, so compiling once and running inference many times is recommended.
4. Model input data filling.
@@ -24,7 +24,7 @@ MindSpore Lite云侧分布式推理仅支持在Linux环境部署运行,支持

1. Download the cloud-side distributed inference C++ sample code for your device type: [Ascend](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/ascend_ge_distributed_cpp) or [GPU](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/gpu_trt_distributed_cpp). This directory is referred to below as the sample code directory.

2. Partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/model_loading.html#分布式场景导出mindir文件), then store them in the sample code directory. For a quick start, you can download the two pre-partitioned Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir) and [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).
2. Partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/model_loading.html#分布式场景导出mindir文件), then store them in the sample code directory. For a quick start, you can download the two pre-partitioned Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir) and [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).

3. For the Ascend device type, generate the networking information file with hccl_tools.py as needed, store it in the sample code directory, and fill its path into the configuration file `config_file.ini` in that directory.



docs/lite/docs/source_zh_cn/use/cloud_infer/runtime_distributed_python.md

@@ -4,7 +4,7 @@

## Overview

For large-scale neural network models whose many parameters cannot be fully loaded onto a single device for inference, multiple devices can be used for distributed inference. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference with the [Python API](https://www.mindspore.cn/lite/api/zh-CN/master/mindspore_lite.html). The workflow of cloud-side distributed inference is largely the same as that of [cloud-side single-device inference](https://www.mindspore.cn/lite/docs/zh-CN/master/use/cloud_infer/runtime_python.html), and the two can be cross-referenced. For background on distributed inference, see [MindSpore distributed inference](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/model_loading.html#推理); compared with it, MindSpore Lite cloud-side distributed inference has more performance-oriented optimizations.
For large-scale neural network models whose many parameters cannot be fully loaded onto a single device for inference, multiple devices can be used for distributed inference. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference with the [Python API](https://www.mindspore.cn/lite/api/zh-CN/master/mindspore_lite.html). The workflow of cloud-side distributed inference is largely the same as that of [cloud-side single-device inference](https://www.mindspore.cn/lite/docs/zh-CN/master/use/cloud_infer/runtime_python.html), and the two can be cross-referenced. For background on distributed inference, see [MindSpore distributed inference](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/model_loading.html#推理); compared with it, MindSpore Lite cloud-side distributed inference has more performance-oriented optimizations.

MindSpore Lite cloud-side distributed inference can be deployed and run only in a Linux environment, and the supported device types are Atlas training series products and NVIDIA GPUs. As shown in the figure below, distributed inference is currently launched as multiple processes; each process corresponds to one `Rank` in the communication group, loads, compiles, and executes its own partitioned model, and receives the same input data.

@@ -12,7 +12,7 @@ MindSpore Lite云侧分布式推理仅支持在Linux环境部署运行,支持

Each process mainly involves the following steps:

1. Model reading: partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/model_loading.html#分布式场景导出mindir文件); the number of MindIR models equals the number of devices, and each model is loaded onto its device for inference.
1. Model reading: partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/model_loading.html#分布式场景导出mindir文件); the number of MindIR models equals the number of devices, and each model is loaded onto its device for inference.
2. Context creation and configuration: create and configure the context [Context](https://www.mindspore.cn/lite/api/zh-CN/master/mindspore_lite/mindspore_lite.Context.html#mindspore_lite.Context), which holds the distributed inference parameters that guide distributed model compilation and execution.
3. Model loading and compilation: load and compile the model through the [Model.build_from_file](https://www.mindspore.cn/lite/api/zh-CN/master/mindspore_lite/mindspore_lite.Model.html#mindspore_lite.Model.build_from_file) interface. The loading phase parses the file cache into a runtime model. The compilation phase optimizes the front-end computational graph into a high-performance back-end graph; this process is time-consuming, so compiling once and running inference many times is recommended.
4. Model input data filling.
@@ -24,7 +24,7 @@ MindSpore Lite云侧分布式推理仅支持在Linux环境部署运行,支持

1. Download the cloud-side distributed inference Python sample code for your device type: [Ascend](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/ascend_ge_distributed_python) or [GPU](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/gpu_trt_distributed_python). This directory is referred to below as the sample code directory.

2. Partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/model_loading.html#分布式场景导出mindir文件), then store them in the sample code directory. For a quick start, you can download the two pre-partitioned Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir) and [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).
2. Partition the model with MindSpore and [export the distributed MindIR models](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/model_loading.html#分布式场景导出mindir文件), then store them in the sample code directory. For a quick start, you can download the two pre-partitioned Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir) and [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).

3. For the Ascend device type, generate the networking information file with hccl_tools.py as needed, store it in the sample code directory, and fill its path into the configuration file `config_file.ini` in that directory.



docs/lite/docs/source_zh_cn/use/converter_tool.md

@@ -68,7 +68,7 @@ MindSpore Lite模型转换工具提供了多种参数设置,用户可根据需
| `--outputDataType=<OUTPUTDATATYPE>` | No | Set the data type of the quantized model's output tensor. Valid only when the quantization parameters (scale and zero point) of the model's output tensor are complete. Defaults to the same data type as the original model's output tensor. | FLOAT32, INT8, UINT8, DEFAULT | DEFAULT | - |
| `--outputDataFormat=<OUTPUTDATAFORMAT>` | No | Set the output format of the exported model. Valid only for 4-dimensional outputs. | NHWC, NCHW | - | - |
| `--encryptKey=<ENCRYPTKEY>` | No | Set the key for exporting an encrypted `ms` model, expressed in hexadecimal. Only AES-GCM is supported, and the key length is limited to 16 bytes. | - | - | - |
| `--encryption=<ENCRYPTION>` | No | Set whether to encrypt the `ms` model on export. Encryption protects model integrity but increases runtime initialization time. | true, false | true | - |
| `--encryption=<ENCRYPTION>` | No | Set whether to encrypt the `ms` model on export. Encryption protects model integrity but increases runtime initialization time. | true, false | false | - |
| `--infer=<INFER>` | No | Set whether to run pre-inference when conversion completes. | true, false | false | - |

> - The parameter name and value are joined by an equal sign (=) with no spaces in between.


docs/mindchemistry/docs/source_en/index.rst

@@ -236,13 +236,15 @@ arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
.. toctree::
:maxdepth: 1
:caption: Quick Start
:hidden:

quick_start/quick_start
quick_start/quick_start

.. toctree::
:glob:
:maxdepth: 1
:caption: User Guide
:hidden:

user/molecular_generation
user/molecular_prediction
@@ -250,6 +252,7 @@ arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
.. toctree::
:maxdepth: 1
:caption: API References
:hidden:

mindchemistry.cell
mindchemistry.e3
@@ -260,5 +263,6 @@ arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
:glob:
:maxdepth: 1
:caption: RELEASE NOTES
:hidden:

RELEASE

docs/mindchemistry/docs/source_en/user/molecular_generation.md

@@ -1,8 +1,8 @@
# Molecular Generation

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_generation.md)
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_generation.md)

Molecular generation uses deep-learning generative models to predict and generate the compositions of a particle system. We have integrated an active-learning-based method for high-entropy alloy design [1] to design high-entropy alloy compositions with extremely low thermal expansion coefficients. In the active-learning workflow, candidate high-entropy alloy compositions are first generated by AI models; the candidates are then screened by predicting their thermal expansion coefficients with predictive models and thermodynamic calculations; finally, researchers determine the final compositions through experimental verification.
Molecular generation uses deep-learning generative models to predict and generate the compositions of a particle system. We have integrated an active-learning-based method for high-entropy alloy design to design high-entropy alloy compositions with extremely low thermal expansion coefficients. In the active-learning workflow, candidate high-entropy alloy compositions are first generated by AI models; the candidates are then screened by predicting their thermal expansion coefficients with predictive models and thermodynamic calculations; finally, researchers determine the final compositions through experimental verification.

## Supported Networks



docs/mindchemistry/docs/source_en/user/molecular_prediction.md

@@ -1,8 +1,8 @@
# Molecular Prediction

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_prediction.md)
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_prediction.md)

Molecular property prediction predicts various properties of different particle systems through deep learning networks. We integrated the NequIP model [2] and the Allegro model [3], which build a graph-structure description from the positions and atomic numbers of the atoms in a molecular system and compute the system's energy with equivariant calculations and graph neural networks.
Molecular property prediction predicts various properties of different particle systems through deep learning networks. We integrated the NequIP model and the Allegro model, which build a graph-structure description from the positions and atomic numbers of the atoms in a molecular system and compute the system's energy with equivariant calculations and graph neural networks.
Density functional theory Hamiltonian prediction: we integrate the DeephE3nn model, an E3-equivariant neural network, to predict the Hamiltonian from the atomic structure.
Crystalline material property prediction: we integrate the Matformer model, based on graph neural networks and the Transformer architecture, to predict various properties of crystalline materials.



docs/mindchemistry/docs/source_zh_cn/conf.py

@@ -49,6 +49,7 @@ extensions = [
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'myst_parser',
'nbsphinx',
'sphinx.ext.mathjax',
'IPython.sphinxext.ipython_console_highlighting'
]


docs/mindchemistry/docs/source_zh_cn/index.rst

@@ -175,13 +175,15 @@ arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
.. toctree::
:maxdepth: 1
:caption: Quick Start
:hidden:

quick_start/quick_start
quick_start/quick_start

.. toctree::
:glob:
:maxdepth: 1
:caption: User Guide
:hidden:

user/molecular_generation
user/molecular_prediction
@@ -189,6 +191,7 @@ arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
.. toctree::
:maxdepth: 1
:caption: API References
:hidden:

mindchemistry.cell
mindchemistry.e3
@@ -199,5 +202,6 @@ arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
:glob:
:maxdepth: 1
:caption: RELEASE NOTES
:hidden:

RELEASE

docs/mindchemistry/docs/source_zh_cn/user/molecular_generation.md

@@ -2,7 +2,7 @@

[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_generation.md)

Molecular generation uses deep-learning generative models to predict and generate the compositions of a particle system. We have integrated an active-learning-based method for high-entropy alloy design [1] to design high-entropy alloy compositions with extremely low thermal expansion coefficients. In the active-learning workflow, candidate high-entropy alloy compositions are first generated by AI models; the candidates are then screened by predicting their thermal expansion coefficients with predictive models and thermodynamic calculations; finally, researchers determine the final compositions through experimental verification.
Molecular generation uses deep-learning generative models to predict and generate the compositions of a particle system. We have integrated an active-learning-based method for high-entropy alloy design to design high-entropy alloy compositions with extremely low thermal expansion coefficients. In the active-learning workflow, candidate high-entropy alloy compositions are first generated by AI models; the candidates are then screened by predicting their thermal expansion coefficients with predictive models and thermodynamic calculations; finally, researchers determine the final compositions through experimental verification.

## Supported Networks



docs/mindchemistry/docs/source_zh_cn/user/molecular_prediction.md

@@ -2,7 +2,7 @@

[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_prediction.md)

Molecular property prediction predicts various properties of different particle systems through deep learning networks. We integrated the NequIP model [2] and the Allegro model [3], which build a graph-structure description from the positions and atomic numbers of the atoms in a molecular system and compute the system's energy with equivariant calculations and graph neural networks.
Molecular property prediction predicts various properties of different particle systems through deep learning networks. We integrated the NequIP model and the Allegro model, which build a graph-structure description from the positions and atomic numbers of the atoms in a molecular system and compute the system's energy with equivariant calculations and graph neural networks.
Density functional theory Hamiltonian prediction: we integrated the DeephE3nn model, an E3-equivariant neural network, to predict the Hamiltonian from the atomic structure.
Crystalline material property prediction: we integrated the Matformer model, based on graph neural networks and the Transformer architecture, to predict various properties of crystalline materials.



docs/mindearth/docs/source_en/medium-range/FourCastNet.ipynb

@@ -137,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get parameters of model, data and optimizer from [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/FourCastNet.yaml)."
"You can get parameters of model, data and optimizer from [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/configs/FourCastNet.yaml)."
]
},
{
@@ -182,7 +182,7 @@
"\n",
"Download the statistic, training and validation dataset from [dataset](https://download.mindspore.cn/mindscience/mindearth/dataset/WeatherBench_1.4_69/) to `./dataset`.\n",
"\n",
"Modify the parameter of `root_dir` in the [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/FourCastNet.yaml), which sets the directory for dataset.\n",
"Modify the parameter of `root_dir` in the [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/configs/FourCastNet.yaml), which sets the directory for dataset.\n",
"\n",
"The `./dataset` is hosted with the following directory structure:\n",
"\n",


docs/mindearth/docs/source_en/medium-range/vit_kno.ipynb

@@ -83,7 +83,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get parameters of model, data and optimizer from [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/vit_kno.yaml)."
"You can get parameters of model, data and optimizer from [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/configs/vit_kno_1.4.yaml)."
]
},
{
@@ -120,7 +120,7 @@
"\n",
"Download the statistic, training and validation dataset from [dataset](https://download.mindspore.cn/mindscience/mindearth/dataset/WeatherBench_1.4_69/) to `./dataset`.\n",
"\n",
"Modify the `root_dir` parameter in the [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/vit_kno.yaml), which sets the directory for the dataset.\n",
"Modify the `root_dir` parameter in the [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/configs/vit_kno_1.4.yaml), which sets the directory for the dataset.\n",
"\n",
"The `./dataset` is hosted with the following directory structure:\n",
"\n",


docs/mindearth/docs/source_zh_cn/medium-range/FourCastNet.ipynb

@@ -137,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameters of the model, data, and optimizer can be obtained by loading the yaml file ([FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/FourCastNet.yaml))."
"The parameters of the model, data, and optimizer can be obtained by loading the yaml file ([FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/configs/FourCastNet.yaml))."
]
},
{
@@ -182,7 +182,7 @@
"\n",
"Download the normalization parameters, training dataset, and validation dataset from [dataset](https://download.mindspore.cn/mindscience/mindearth/dataset/WeatherBench_1.4_69/) to the `./dataset` directory.\n",
"\n",
"Modify the `root_dir` parameter in the [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/FourCastNet.yaml) configuration file, which sets the dataset path.\n",
"Modify the `root_dir` parameter in the [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/configs/FourCastNet.yaml) configuration file, which sets the dataset path.\n",
"\n",
"The directory structure of `./dataset` is as follows:\n",
"\n",


docs/mindearth/docs/source_zh_cn/medium-range/vit_kno.ipynb

@@ -83,7 +83,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameters of the model, data, and optimizer can be obtained by loading the yaml file ([vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/vit_kno.yaml))."
"The parameters of the model, data, and optimizer can be obtained by loading the yaml file ([vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/configs/vit_kno_1.4.yaml))."
]
},
{
@@ -119,7 +119,7 @@
"\n",
"Download the normalization parameters, training dataset, and validation dataset from [dataset](https://download.mindspore.cn/mindscience/mindearth/dataset/WeatherBench_1.4_69/) to the `./dataset` directory.\n",
"\n",
"Modify the `root_dir` parameter in the [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/vit_kno.yaml) configuration file, which sets the dataset path.\n",
"Modify the `root_dir` parameter in the [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/configs/vit_kno_1.4.yaml) configuration file, which sets the dataset path.\n",
"\n",
"The directory structure of `./dataset` is as follows:\n",
"\n",


docs/mindinsight/docs/source_en/accuracy_optimization.md

@@ -103,7 +103,7 @@ The causes of accuracy problems can be classified into hyperparameter problems,

2. The MindSpore constructor constraint is not complied with during graph construction.

The graph construction does not comply with the MindSpore construct constraints. That is, the network in graph mode does not comply with the constraints declared in the MindSpore static graph syntax support. For example, MindSpore does not support the backward computation of functions with key-value pair parameters. For details about complete constraints, see [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
The graph construction does not comply with the MindSpore construct constraints. That is, the network in graph mode does not comply with the constraints declared in the MindSpore static graph syntax support. For example, MindSpore does not support the backward computation of functions with key-value pair parameters. For details about complete constraints, see [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html).

- Computational Graph Structure Problems

@@ -581,13 +581,13 @@ For details about visualized data analysis during training, see [Viewing Dashboa

### Data Problem Handling

Perform operations such as standardization, normalization, and channel conversion on data. For image data processing, add images with random view and rotation. For details about data shuffle, batch, and multiplication, see [Processing Data](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html), [Data Augmentation](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html), and [Auto Augmentation](https://www.mindspore.cn/tutorials/experts/en/master/dataset/augment.html).
Perform operations such as standardization, normalization, and channel conversion on data. For image data processing, add images with random view and rotation. For details about data shuffle, batch, and multiplication, see [Processing and Loading Data](https://www.mindspore.cn/docs/en/master/model_train/index.html).

> For details about how to apply the data augmentation operation to a custom dataset, see the [mindspore.dataset.GeneratorDataset.map](https://www.mindspore.cn/docs/en/master/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.map.html#mindspore.dataset.Dataset.map) API.

### Hyperparameter Problem Handling

Hyperparameters in AI training include the global learning rate, epoch, and batch. For details about how to set the dynamic learning rate, see [Optimization Algorithm of Learning Rate](https://mindspore.cn/tutorials/zh-CN/master/advanced/modules/optimizer.html).
Hyperparameters in AI training include the global learning rate, epoch, and batch. For details about how to set the dynamic learning rate, see [Optimization Algorithm of Learning Rate](https://mindspore.cn/docs/zh-CN/master/model_train/custom_program/optimizer.html).
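For illustration, a per-epoch cosine decay schedule — one common dynamic learning rate — can be sketched in plain Python. The helper below is our own simplified illustration, not the MindSpore API:

```python
import math

def cosine_decay_lr(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Return one learning rate per step; the rate decays per epoch along a cosine curve."""
    lrs = []
    for step in range(total_step):
        epoch = min(step // step_per_epoch, decay_epoch)
        cosine = 0.5 * (1 + math.cos(math.pi * epoch / decay_epoch))
        lrs.append(min_lr + (max_lr - min_lr) * cosine)
    return lrs

# Six steps, two steps per epoch: the schedule starts at max_lr
# in epoch 0 and reaches min_lr in the final epoch
lrs = cosine_decay_lr(min_lr=0.01, max_lr=0.1, total_step=6, step_per_epoch=2, decay_epoch=2)
```

The resulting list of per-step rates is what a schedule like this hands to the optimizer.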

### Model Structure Problem Handling



docs/mindinsight/docs/source_en/accuracy_problem_preliminary_location.md

@@ -341,13 +341,13 @@ When you run a script on the Ascend backend or use the mixed precision function,
#### mp.01 Overflow occurs during training

Check method:
When the [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) or the Ascend AI processor is used for training, you are advised to check whether overflow occurs.
When the [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) or the Ascend AI processor is used for training, you are advised to check whether overflow occurs.

After the overflow problem is found, find and analyze the first overflow node. (For Ascend overflow data, find the node with the smallest timestamp based on the timestamp in the file name. For GPU overflow data, find the first node in the execution sequence.) Determine the overflow cause based on the input and output data of the API.

The common solutions to the overflow problem are as follows:

1. Enable dynamic loss scale or set a proper static loss scale value. For details, see [LossScale](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html). Note that when the static loss scale in the GPU scenario is directly used for Ascend training, unexpected frequent overflow may occur, affecting convergence. After the loss scale is enabled, you may need to perform multiple experiments to adjust the init_loss_scale (initial value), scale_factor, and scale_window of loss scale until there are few floating-point overflows during training.
1. Enable dynamic loss scale or set a proper static loss scale value. For details, see [LossScale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html). Note that when the static loss scale in the GPU scenario is directly used for Ascend training, unexpected frequent overflow may occur, affecting convergence. After the loss scale is enabled, you may need to perform multiple experiments to adjust the init_loss_scale (initial value), scale_factor, and scale_window of loss scale until there are few floating-point overflows during training.
2. If the overflow problem has a key impact on the accuracy and cannot be avoided, change the corresponding API to the FP32 API (the performance may be greatly affected after the adjustment).
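The adjustment logic in solution 1 can be sketched as a small state machine — a plain-Python illustration of the dynamic loss scale idea, not the MindSpore implementation: the scale shrinks by scale_factor on overflow and grows again after scale_window consecutive overflow-free steps.

```python
class DynamicLossScale:
    def __init__(self, init_loss_scale=2.0**16, scale_factor=2, scale_window=2000):
        self.scale = init_loss_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            # Gradients overflowed: shrink the scale and restart the window
            self.scale = max(self.scale / self.scale_factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.scale_window:
                # A long run of clean steps: it is safe to grow the scale again
                self.scale *= self.scale_factor
                self.good_steps = 0

scaler = DynamicLossScale(init_loss_scale=1024.0, scale_factor=2, scale_window=3)
scaler.update(overflow=True)       # one overflow: 1024 -> 512
for _ in range(3):
    scaler.update(overflow=False)  # three clean steps: 512 -> 1024
```

Tuning init_loss_scale, scale_factor, and scale_window shifts how aggressively this loop reacts, which is why several experiments are often needed.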

Conclusion:
@@ -358,7 +358,7 @@ Enter here.

Check method:

When [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) is used, you can train with the default parameter values of DynamicLossScaleManager or FixedLossScaleManager. If there are too many overflow steps and the final accuracy is affected, adjust the value of loss_scale based on the overflow phenomenon: if gradient overflow occurs, decrease loss_scale (divide the original value by 2); if gradient underflow occurs, increase loss_scale (multiply the original value by 2). In most cases, training on the Ascend AI processor is performed with mixed precision. Because the computation characteristics of the Ascend AI processor differ from those of GPU mixed precision, you may need to tune the LossScaleManager hyperparameters to values different from those on the GPU, based on the training result, to ensure precision.
When [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) is used, you can train with the default parameter values of DynamicLossScaleManager or FixedLossScaleManager. If there are too many overflow steps and the final accuracy is affected, adjust the value of loss_scale based on the overflow phenomenon: if gradient overflow occurs, decrease loss_scale (divide the original value by 2); if gradient underflow occurs, increase loss_scale (multiply the original value by 2). In most cases, training on the Ascend AI processor is performed with mixed precision. Because the computation characteristics of the Ascend AI processor differ from those of GPU mixed precision, you may need to tune the LossScaleManager hyperparameters to values different from those on the GPU, based on the training result, to ensure precision.

Conclusion:

@@ -368,7 +368,7 @@ Enter here.

Check method:

Gradient clip forcibly adjusts the gradient to a smaller value when the gradient is greater than a threshold. Gradient clip has a good effect on the gradient explosion problem in RNNs. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) and gradient clip are used, perform this check. Check the code to ensure that the application object of gradient clip is the original gradient value obtained by dividing the loss scale.
Gradient clip forcibly adjusts the gradient to a smaller value when the gradient is greater than a threshold. Gradient clip has a good effect on the gradient explosion problem in RNNs. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) and gradient clip are used, perform this check. Check the code to ensure that the application object of gradient clip is the original gradient value obtained by dividing the loss scale.

Conclusion:

@@ -378,7 +378,7 @@ Enter here.

Check method:

Gradient penalty is a technique that adds a gradient to a cost function to constrain the gradient length. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) and gradient penalty are used, perform this check. Check whether the entered gradient is a gradient without loss scale when computing the gradient penalty item. For example, a gradient substituted for the loss scale may be first divided by the loss scale, and then is used to compute the gradient penalty item.
Gradient penalty is a technique that adds a gradient to a cost function to constrain the gradient length. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) and gradient penalty are used, perform this check. Check whether the entered gradient is a gradient without loss scale when computing the gradient penalty item. For example, a gradient substituted for the loss scale may be first divided by the loss scale, and then is used to compute the gradient penalty item.

Conclusion:



docs/mindinsight/docs/source_en/performance_optimization.md

@@ -43,7 +43,7 @@ By observing the `queue relationship between operators` in the Data Processing t

*Figure 3: Data Preparation Details -- Data Processing*

We can refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html ) to adjust dataset operations to improve dataset performance.
We can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html ) to adjust dataset operations to improve dataset performance.
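One such adjustment is raising the parallelism of per-sample transforms. The effect can be illustrated with a plain-Python thread pool — our own sketch of the concept, not the MindSpore dataset API: more workers let independent samples be transformed concurrently, with identical results either way.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(sample):
    # Stand-in for a per-sample map operation (decode, resize, normalize, ...)
    return sample * 2

samples = list(range(8))

# One worker: samples are transformed strictly one at a time
with ThreadPoolExecutor(max_workers=1) as pool:
    out_serial = list(pool.map(transform, samples))

# Four workers: up to four samples are transformed concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    out_parallel = list(pool.map(transform, samples))

assert out_serial == out_parallel == [s * 2 for s in samples]
```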

By examining the data processing code of ResNet50, we find that the num_parallel_workers parameter of the map operation is 1 (the default value), as shown below:

@@ -95,7 +95,7 @@ Open the details page of Operator Time Consumption Ranking, and we find that Mat

*Figure 6: Finding operators that can be optimized via the details page of Operator Time Consumption Ranking*

For Operator Time Consumption optimization, the float16 type, which requires less computation, can usually be used to improve operator performance when there is no accuracy difference between float16 and float32. We can refer to [Enabling Mixed Precision](https://www.mindspore.cn/tutorials/en/master/advanced/mixed_precision.html) to improve operator performance.
For Operator Time Consumption optimization, the float16 type, which requires less computation, can usually be used to improve operator performance when there is no accuracy difference between float16 and float32. We can refer to [Enabling Mixed Precision](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html) to improve operator performance.

Optimization code is shown below:



docs/mindinsight/docs/source_en/performance_profiling_gpu.md

@@ -74,7 +74,7 @@ There are two ways to collect neural network performance data. You can enable Pr
- `timeline_limit`(int, optional) - Set the maximum storage size of the timeline file (unit M). When using this parameter, op_time must be set to true. Default value: 500.
- `data_process`(bool, optional) - Indicates whether to collect data to prepare performance data. Default value: true.
- `op_time` (bool, optional) - Whether to collect operators performance data. Default value: true.
- `profile_framework`(str, optional) - Whether to collect host memory and time, it must be one of ["all", "time", "memory", null]. Default: "all".
- `profile_framework`(str, optional) - Whether to collect host time, it must be one of ["all", "time", null]. Default: null.

## Launching MindSpore Insight



docs/mindinsight/docs/source_en/performance_tuning_guide.md

@@ -58,7 +58,7 @@ Step 1:Please jump to the `step interval` tab on the `data preparation details

- If there is no time-consuming customized logic in the script, it indicates that sending data from host to device is time-consuming; please report it to the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues).

Step 2: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which data processing operation has a performance bottleneck. The principles of judgment can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) to try to optimize the data processing performance.
Step 2: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which data processing operation has a performance bottleneck. The principles of judgment can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) to try to optimize the data processing performance.

#### Data Sinking Mode

@@ -69,7 +69,7 @@ Step 1:Please jump to the `step interval` tab on the `data preparation details

Step 2: See how the size curve of the host queue changes. If none of the sizes in the queue is 0, the process of sending training data from host to device is the performance bottleneck; please report it to the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues). Otherwise, the data processing pipeline is the performance bottleneck; refer to Step 3 to continue locating which data processing operation has performance problems.

- Step 3: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which data processing operation has a performance bottleneck. The judgment principles can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) and try to optimize the data processing performance.
+ Step 3: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which data processing operation has a performance bottleneck. The judgment principles can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) and try to optimize the data processing performance.
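The decision rule of Step 2 above can be sketched as a small helper; the function and its queue-sample input are hypothetical, standing in for values read off the host queue size curve:

```python
# Sketch of the Step 2 decision rule: given host queue size samples, decide
# where the bottleneck lies. The sample list is hypothetical input, not a
# real profiler API.

def diagnose_host_queue(queue_sizes):
    """Return the bottleneck stage that Step 2 points at."""
    if not queue_sizes:
        raise ValueError("no queue samples")
    if min(queue_sizes) > 0:
        # The queue never drains: the device consumes slower than the host
        # sends, so host-to-device transfer is the bottleneck.
        return "host-to-device transfer"
    # The queue hits zero: the device starves waiting for data processing.
    return "data processing"
```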

### Long Forward And Backward Propagation

@@ -125,7 +125,7 @@ Step 2:Observe the forward and backward propagation in the cluster step trace
Step 3: Observe the step tail on the cluster step trace page

- Users should first check whether the step tail of one device is much longer than that of the others. If it is, it is usually caused by a slow node in the cluster. Users can refer to Step 1 and Step 2 to find the slow node.
- - If the step tail of all devices is essentially the same and this phase is time-consuming, it is usually due to the long time taken by the AllReduce collective communication operators. Users can try to modify the all_reduce_fusion_config parameter and change the [AllReduce Fusion Sharding Strategy](https://mindspore.cn/tutorials/experts/en/master/parallel/overview.html) to reduce the time spent in this phase.
+ - If the step tail of all devices is essentially the same and this phase is time-consuming, it is usually due to the long time taken by the AllReduce collective communication operators. Users can try to modify the all_reduce_fusion_config parameter and change the [AllReduce Fusion Sharding Strategy](https://mindspore.cn/docs/en/master/model_train/parallel/overview.html) to reduce the time spent in this phase.
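As an illustration of what an AllReduce fusion sharding strategy does, the sketch below groups gradient indices into fused buckets at the cut points given by a hypothetical `all_reduce_fusion_config` value; it mirrors the parameter's documented intent, not MindSpore's internal implementation:

```python
# Sketch: a fusion config such as [20, 35] splits the gradients into fused
# AllReduce groups: indices [0, 20), [20, 35), and the remainder. This is an
# illustration of the parameter's intent, not MindSpore's implementation.

def fusion_buckets(num_gradients, fusion_config):
    """Split gradient indices into fused AllReduce groups at the cut points."""
    cuts = [0] + list(fusion_config) + [num_gradients]
    return [list(range(a, b)) for a, b in zip(cuts, cuts[1:]) if a < b]

buckets = fusion_buckets(40, [20, 35])
# Three buckets: gradients 0-19, 20-34, and 35-39 are each fused into
# one AllReduce launch, trading launch overhead against overlap.
```

Fewer, larger buckets reduce per-launch overhead; more, smaller buckets let communication start earlier and overlap with backward computation, which is the trade-off the step tail analysis above is probing.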

### Model Parallel

@@ -143,7 +143,7 @@ Please refer to step 2 of [Data Parallel](#data-parallel).
Step 3: Observe the pure communication time in the cluster step trace page

On the premise of confirming that there is no slow node through step 1 and step 2, the pure communication time of each card in the cluster should be basically the same. If this phase takes a short time, it means that the communication time caused by re-distribution of operators is very short, and users do not need to consider optimizing the parallel strategy. Otherwise, users need to focus on analyzing whether the parallel strategy can be optimized.
- Users need a certain understanding of the principles of model parallelism before continuing the analysis. Please refer to [Distributed Training](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html) for the basic principles. The following steps only assist users with a rationality analysis; whether the parallel strategy has room for optimization, and how to optimize it, must be judged by users after analyzing their own networks.
+ Users need a certain understanding of the principles of model parallelism before continuing the analysis. Please refer to [Distributed Training](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html) for the basic principles. The following steps only assist users with a rationality analysis; whether the parallel strategy has room for optimization, and how to optimize it, must be judged by users after analyzing their own networks.

- If this stage takes a long time, the user can choose any one of the devices and observe its timeline. In the timeline, MindSpore Insight marks the pure communication time, refer to `Pure Communication Op` below.



docs/mindinsight/docs/source_en/profiling/profiling_host_time.txt

@@ -1,4 +1,4 @@
Host Side Time Consumption Analysis
-------------------------------------

- If the Host side time collection function is enabled, the Host side time consumption of each stage can be saved in the specified directory after the training is completed. For example, when a Profiler is specified with ``output_path="/XXX/profiler_output"``, the file containing time consumption data on the Host side will be saved in the "/XXX/profiler_output/profile/host_info" directory. The file is in json format with the prefix ``timeline_`` and the rank_id as suffix. The Host side time consumption file can be viewed with ``chrome://tracing``. You can use W/S/A/D to zoom in, zoom out, and move left and right to view the timing information.
+ If the Host side time collection function is enabled, the time consumption can be viewed in ascend_timeline_display_[rank_id].json after the training finishes and displayed with ``chrome://tracing``. You can use W/S/A/D to zoom in, zoom out, and move left and right to view the timing information.
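The timeline file follows the Chrome trace-event format that ``chrome://tracing`` understands. A minimal sketch of that format; the event names and durations below are invented examples, not real profiler output:

```python
import json

# Sketch of the Chrome trace-event JSON that chrome://tracing renders.
# "ph": "X" marks a complete event; "ts" and "dur" are in microseconds.
# The phase names here are made-up examples, not real profiler output.
events = [
    {"name": "DataProcess", "ph": "X", "pid": 0, "tid": 0,
     "ts": 0, "dur": 1200},
    {"name": "LaunchKernel", "ph": "X", "pid": 0, "tid": 0,
     "ts": 1200, "dur": 300},
]
with open("timeline_demo.json", "w") as f:
    json.dump(events, f)
# Open timeline_demo.json in chrome://tracing and navigate with W/S/A/D.
```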

docs/mindinsight/docs/source_en/profiling/profiling_offline.txt

@@ -52,11 +52,6 @@ An example of the performance data catalog structure is shown below:
├──── container
├──── FRAMEWORK // Raw data collected on the frame side
│ └──── op_range_*
- ├──── host_info // The results generated by the framework profiling
- │ ├──── dataset_*.csv
- │ ├──── host_info_*.csv
- │ ├──── host_memory_*.csv
- │ └──── timeline_*.json
├──── PROF_{number}_{timestamp}_{string} // msprof performance data
│ ├──── analyse
│ ├──── device_*
@@ -68,7 +63,8 @@ An example of the performance data catalog structure is shown below:
│ └──── task.csv
├──── rank-*_{timestamp}_ascend_ms // MindStudio Insight Visualization Deliverables
│ ├──── ASCEND_PROFILER_OUTPUT // Performance data collected by the MindSpore Profiler interface
- │ └──── profiler_info_*.json
+ │ ├──── profiler_info_*.json
+ │ └──── profiler_metadata.json // Records user-defined metadata; generated by calling the add_metadata or add_metadata_json interface
├──── aicore_intermediate_*_detail.csv
├──── aicore_intermediate_*_type.csv
├──── aicpu_intermediate_*.csv
@@ -99,61 +95,64 @@ An example of the performance data catalog structure is shown below:
├──── profiler_info_*.json
├──── step_trace_point_info_*.json
└──── step_trace_raw_*_detail_time.csv
└──── dataset_*.csv
- \* represents rank id

Performance Data File Description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PROF_{number}_{timestamp}_{string} directory contains the performance data collected by CANN Profiling, mainly stored in mindstudio_profiler_output. For an introduction to the data, refer to `Performance data file description <https://www.hiascend.com/document/detail/en/mindstudio/70RC2/mscommandtoolug/mscommandug/atlasprofiling_16_0062.html>`_.

The profiler directory contains three types of files (csv, json, and txt), covering performance data on operator execution time, memory usage, communication, and so on. The file descriptions are shown in the following table. For detailed descriptions of some files, refer to `Performance data <https://www.mindspore.cn/mindinsight/docs/en/master/profiler_files_description.html>`_.

============================================== ==============================================================================
File Names                                     Descriptions
============================================== ==============================================================================
step_trace_point_info_*.json                   Information about the operator corresponding to the step node (only mode=GRAPH, export GRAPH_OP_RUN=0)
step_trace_raw_*_detail_time.csv               Time information for the nodes of each step (only mode=GRAPH, export GRAPH_OP_RUN=0)

dynamic_shape_info_*.json                      Operator information under dynamic shape

pipeline_profiling_*.json                      Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
minddata_pipeline_raw_*.csv                    Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
minddata_pipeline_summary_*.csv                Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
minddata_pipeline_summary_*.json               Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
framework_raw_*.csv                            Information about AI Core operators in MindSpore data processing
device_queue_profiling_*.txt                   Intermediate file dumped by MindSpore data processing, used for MindInsight visualization (data sinking scenarios only)
minddata_aicpu_*.txt                           Performance data for AI CPU operators in MindSpore data processing (data sinking scenarios only)
dataset_iterator_profiling_*.txt               Intermediate file dumped by MindSpore data processing, used for MindInsight visualization (data non-sinking scenarios only)

aicore_intermediate_*_detail.csv               AI Core operator data
aicore_intermediate_*_type.csv                 AI Core operator call counts and elapsed time statistics
aicpu_intermediate_*.csv                       Elapsed time data after parsing AI CPU operator information
flops_*.txt                                    Floating-point operation counts (FLOPs) and floating-point operations per second (FLOPS) of AI Core operators
flops_summary_*.json                           Total FLOPs of all operators, average FLOPs per operator, and average FLOPS_Utilization

ascend_timeline_display_*.json                 Timeline visualization file for MindStudio Insight
ascend_timeline_summary_*.json                 Timeline statistics
output_timeline_data_*.txt                     Operator timeline data, present only if AI Core operator data exists

cpu_ms_memory_record_*.txt                     Raw file for memory profiling
operator_memory_*.csv                          Operator-level memory information

minddata_cpu_utilization_*.json                CPU utilization rate

cpu_op_detail_info_*.csv                       CPU operator elapsed time data (mode=GRAPH only)
cpu_op_type_info_*.csv                         Elapsed time statistics for each category of CPU operator (mode=GRAPH only)
cpu_op_execute_timestamp_*.txt                 CPU operator execution start time and elapsed time (mode=GRAPH only)
cpu_framework_*.txt                            CPU operator elapsed time in heterogeneous scenarios (mode=GRAPH only)

ascend_cluster_analyse_model-xxx.csv           Computation and communication data in model-parallel or pipeline-parallel modes (mode=GRAPH only)

hccl_raw_*.csv                                 Per-device communication time and communication wait time (mode=GRAPH only)

parallel_strategy_*.json                       Operator parallel strategy dump, used for MindInsight visualization

profiler_info_*.json                           Profiler configuration and other info
dataset_*.csv                                  Time consumption of each stage of the data processing module
============================================== ==============================================================================

- \* represents rank id
- The complete name of ascend_cluster_analyse_model-xxx_*.csv is ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv, for example ascend_cluster_analyse_model-parallel_1_8_0.csv
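The naming convention above can be unpacked programmatically; a minimal sketch assuming only the documented pattern ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv:

```python
import re

# Sketch: recover the fields encoded in an ascend_cluster_analyse file name,
# per the documented pattern
# ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv.
PATTERN = re.compile(
    r"ascend_cluster_analyse_model-(?P<mode>[^_]+)_(?P<stage_num>\d+)"
    r"_(?P<rank_size>\d+)_(?P<rank_id>\d+)\.csv")

def parse_cluster_analyse_name(name):
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return m.groupdict()

info = parse_cluster_analyse_name("ascend_cluster_analyse_model-parallel_1_8_0.csv")
# info == {"mode": "parallel", "stage_num": "1", "rank_size": "8", "rank_id": "0"}
```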

docs/mindinsight/docs/source_en/profiling/profiling_resoure.txt

@@ -128,20 +128,3 @@ detailed information from ``Memory Usage``, including:
:alt: memory_graphics.png

*Figure: Memory Statistics*

- Host side memory usage
- ~~~~~~~~~~~~~~~~~~~~~~~~~
-
- If the host side memory collection function is enabled, the memory usage can be saved in the specified directory after the training is completed. For example, when a Profiler is specified with ``output_path="/XXX/profiler_output"``, the file containing host side memory data will be saved in the "/XXX/profiler_output/profile/host_info" directory. The file is in csv format with the prefix ``host_memory_`` and the rank_id as suffix. The meaning of the header is as follows:
-
- - tid: The thread ID of the current thread when collecting host side memory.
- - pid: The process ID of the current process when collecting host side memory.
- - parent_pid: The process ID of the parent process of the current process when collecting host side memory.
- - module_name: Name of the module that collects host side memory; a module may contain one or more events.
- - event: The event name under which host side memory was collected; an event may contain one or more stages.
- - stage: The stage name under which host side memory was collected.
- - level: 0 means used by framework developers, and 1 means used by users (algorithm engineers).
- - start_end: The mark for the start or end of the stage: 0 represents the start, 1 the end, and 2 an indistinguishable start or end.
- - custom_info: Component customization information used by framework developers to locate performance issues; possibly empty.
- - memory_usage: Host-side memory usage in kB; 0 means no memory data was collected at the current stage.
- - time_stamp: Timestamp in us.

docs/mindinsight/docs/source_en/profiling/profiling_training.txt

@@ -207,4 +207,4 @@ Note:

- `parallel_strategy` (bool, optional) - Indicates whether to collect parallel policy performance data. Default value: true.

- - `profile_framework` (str, optional) - Whether to collect host memory and time; must be one of ["all", "time", "memory", null]. Default: "all".
+ - `profile_framework` (str, optional) - Whether to collect host time; must be one of ["all", "time", null]. Default: null.

docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md

@@ -103,7 +103,7 @@

2. The graph construction does not follow the MindSpore construct constraints.

- Not following the MindSpore construct constraints means that a network in graph mode does not comply with the constraints declared in MindSpore's static graph syntax support. For example, MindSpore currently does not support computing gradients of functions with key-value pair arguments. For the complete constraints, see [Static Graph Syntax Support](https://www.mindspore.cn/docs/zh-CN/master/note/static_graph_syntax_support.html).
+ Not following the MindSpore construct constraints means that a network in graph mode does not comply with the constraints declared in MindSpore's static graph syntax support. For example, MindSpore currently does not support computing gradients of functions with key-value pair arguments. For the complete constraints, see [Static Graph Syntax Support](https://www.mindspore.cn/docs/zh-CN/master/model_train/program_form/static_graph.html).

- Computational graph structure issues

@@ -583,13 +583,13 @@ Xie, Z., Sato, I., & Sugiyama, M. (2020). A Diffusion Theory For Deep Learning D

### Handling Data Issues

- Apply standardization, normalization, channel conversion, and similar operations to the data. For image data, add randomly cropped and randomly rotated images, and apply data shuffling, batching, and data augmentation. See [Data Processing](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/dataset.html), [Data Augmentation](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/dataset.html), and [Automatic Data Augmentation](https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/augment.html).
+ Apply standardization, normalization, channel conversion, and similar operations to the data. For image data, add randomly cropped and randomly rotated images, and apply data shuffling, batching, and data augmentation. See [Data Loading and Processing](https://www.mindspore.cn/docs/zh-CN/master/model_train/index.html).

> For how to apply data augmentation operations to a custom dataset, refer to the [mindspore.dataset.GeneratorDataset.map](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.map.html#mindspore.dataset.Dataset.map) API.

### Handling Hyperparameter Issues

- Hyperparameters in AI training include the global learning rate, epoch, and batch size. To set dynamic learning rate hyperparameters, see [Learning Rate Optimization Algorithms](https://mindspore.cn/tutorials/zh-CN/master/advanced/modules/optimizer.html#学习率).
+ Hyperparameters in AI training include the global learning rate, epoch, and batch size. To set dynamic learning rate hyperparameters, see [Learning Rate Optimization Algorithms](https://mindspore.cn/docs/zh-CN/master/model_train/custom_program/optimizer.html#学习率).

### Handling Model Structure Issues



docs/mindinsight/docs/source_zh_cn/accuracy_problem_preliminary_location.md

@@ -341,13 +341,13 @@ MindSpore APIs differ somewhat from the APIs of other frameworks. When a benchmark script is available
#### mp.01 Overflow During Training

How to check:
- When training with [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) or on Ascend AI processors, it is recommended to check for overflow problems.
+ When training with [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) or on Ascend AI processors, it is recommended to check for overflow problems.

After an overflow problem is found, first locate and analyze the first node that overflows (for Ascend overflow data, sort by the timestamp in the file name and take the smallest one; for overflow on GPU, take the earliest one in the execution order), and determine the cause of the overflow from the API's input and output data.

Common remedies after an overflow problem occurs are as follows:

- 1. Enable the dynamic loss scale function, or set a reasonable static loss scale value; see [LossScale](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html#损失缩放). Note that directly reusing a static loss scale from a GPU scenario for training on Ascend may cause unexpectedly frequent overflow and hinder convergence. After loss scale is enabled, several experiments may be needed to tune the initial value init_loss_scale, the adjustment ratio scale_factor, the adjustment window scale_window, and other parameters until floating-point overflow is very rare during training; see [DynamicLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.DynamicLossScaleManager.html) for the meaning of these parameters.
+ 1. Enable the dynamic loss scale function, or set a reasonable static loss scale value; see [LossScale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html#损失缩放). Note that directly reusing a static loss scale from a GPU scenario for training on Ascend may cause unexpectedly frequent overflow and hinder convergence. After loss scale is enabled, several experiments may be needed to tune the initial value init_loss_scale, the adjustment ratio scale_factor, the adjustment window scale_window, and other parameters until floating-point overflow is very rare during training; see [DynamicLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.DynamicLossScaleManager.html) for the meaning of these parameters.
2. If the overflow problem has a critical impact on accuracy and cannot be avoided, change the corresponding APIs to FP32 APIs (which may significantly affect performance).
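The interplay of init_loss_scale, scale_factor, and scale_window described above can be sketched as follows; this illustrates the documented semantics and is not the DynamicLossScaleManager source:

```python
# Sketch of dynamic loss scale behaviour driven by init_loss_scale,
# scale_factor and scale_window; an illustration of the documented
# semantics, not MindSpore's DynamicLossScaleManager implementation.

class DynamicLossScale:
    def __init__(self, init_loss_scale=2.0 ** 24, scale_factor=2, scale_window=2000):
        self.scale = init_loss_scale
        self.factor = scale_factor
        self.window = scale_window
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            # Overflow detected: shrink the scale and restart the window.
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.window:
                # scale_window overflow-free steps in a row: try a larger scale.
                self.scale *= self.factor
                self.good_steps = 0
```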

Check conclusion:
@@ -358,7 +358,7 @@ MindSpore APIs differ somewhat from the APIs of other frameworks. When a benchmark script is available

How to check:

- When using [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html), you should generally confirm that [DynamicLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.DynamicLossScaleManager.html) or [FixedLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.FixedLossScaleManager.html) is enabled; DynamicLossScaleManager is recommended. You can first train with the default parameter values of DynamicLossScaleManager or FixedLossScaleManager. If too many iterations overflow and the final accuracy is affected, adjust the loss_scale value according to the dominant overflow phenomenon: when gradients mainly overflow, decrease loss_scale (try dividing the original value by 2); when gradients mainly underflow, increase loss_scale (try multiplying the original value by 2). Training on Ascend AI processors is mixed-precision training in most cases. Because the compute characteristics of Ascend AI processors differ from GPU mixed-precision computation, the LossScaleManager hyperparameters may also need to be adjusted to values different from those on GPU to preserve accuracy.
+ When using [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html), you should generally confirm that [DynamicLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.DynamicLossScaleManager.html) or [FixedLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.FixedLossScaleManager.html) is enabled; DynamicLossScaleManager is recommended. You can first train with the default parameter values of DynamicLossScaleManager or FixedLossScaleManager. If too many iterations overflow and the final accuracy is affected, adjust the loss_scale value according to the dominant overflow phenomenon: when gradients mainly overflow, decrease loss_scale (try dividing the original value by 2); when gradients mainly underflow, increase loss_scale (try multiplying the original value by 2). Training on Ascend AI processors is mixed-precision training in most cases. Because the compute characteristics of Ascend AI processors differ from GPU mixed-precision computation, the LossScaleManager hyperparameters may also need to be adjusted to values different from those on GPU to preserve accuracy.

Check conclusion:

@@ -368,7 +368,7 @@ MindSpore APIs differ somewhat from the APIs of other frameworks. When a benchmark script is available

How to check:

- Gradient clipping forcibly shrinks gradients whose magnitude exceeds a threshold. It works well against gradient explosion in RNN networks. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html#损失缩放) and gradient clipping are used, this check is required. Verify against the code that gradient clipping is applied to the original gradients obtained by dividing by the loss scale.
+ Gradient clipping forcibly shrinks gradients whose magnitude exceeds a threshold. It works well against gradient explosion in RNN networks. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html#损失缩放) and gradient clipping are used, this check is required. Verify against the code that gradient clipping is applied to the original gradients obtained by dividing by the loss scale.
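The check above boils down to an order of operations: unscale first, then clip. A plain-Python sketch, with lists standing in for gradient tensors:

```python
# Sketch of the check above: gradient clipping must act on the original
# gradients recovered by dividing out the loss scale first. Plain Python
# lists stand in for gradient tensors.

def clip_by_norm(grads, clip_norm):
    norm = sum(g * g for g in grads) ** 0.5
    if norm <= clip_norm:
        return grads
    return [g * clip_norm / norm for g in grads]

def clipped_unscaled_grads(scaled_grads, loss_scale, clip_norm):
    # Correct order: divide by the loss scale first, then clip.
    unscaled = [g / loss_scale for g in scaled_grads]
    return clip_by_norm(unscaled, clip_norm)

grads = clipped_unscaled_grads([3000.0, 4000.0], loss_scale=1000.0, clip_norm=10.0)
# The unscaled gradients [3.0, 4.0] have norm 5.0 <= 10.0, so they pass
# through unclipped; clipping the scaled values instead would wrongly
# shrink them.
```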

Check conclusion:

@@ -378,7 +378,7 @@ MindSpore APIs differ somewhat from the APIs of other frameworks. When a benchmark script is available

How to check:

- Gradient penalty adds the gradient to the cost function to constrain the gradient magnitude. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) and gradient penalty are used, this check is required. Verify that when the gradient penalty term is computed, the input gradients carry no loss scale; for example, first divide the loss-scaled gradients by the loss scale, then use them to compute the penalty term.
+ Gradient penalty adds the gradient to the cost function to constrain the gradient magnitude. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) and gradient penalty are used, this check is required. Verify that when the gradient penalty term is computed, the input gradients carry no loss scale; for example, first divide the loss-scaled gradients by the loss scale, then use them to compute the penalty term.

Check conclusion:



docs/mindinsight/docs/source_zh_cn/performance_optimization.md

@@ -41,7 +41,7 @@ For the Profiler feature introduction and usage instructions, see the tutorial:

*Figure 3: Data Preparation Details Page - Data Processing*

- For performance optimization of data processing operations, refer to the [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/optimize.html) page.
+ For performance optimization of data processing operations, refer to the [Optimizing the Data Processing](https://www.mindspore.cn/docs/zh-CN/master/model_train/dataset/optimize.html) page.
Inspecting the data processing code of the ResNet50 network shows that the num_parallel_workers parameter of the map operation is not set and defaults to 1. The code is as follows:

@@ -91,7 +91,7 @@ data_set = data_set.map(operations=trans, input_columns="image", num_parallel_wo

*Figure 6: Finding Optimizable Operators on the Operator Time Details Page*

- For operator time optimization, when the float16 and float32 formats show no obvious accuracy difference, the computationally cheaper float16 format can usually be used to improve performance; see the [Enabling Mixed Precision](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html) page.
+ For operator time optimization, when the float16 and float32 formats show no obvious accuracy difference, the computationally cheaper float16 format can usually be used to improve performance; see the [Enabling Mixed Precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) page.

Reference code for the optimization is as follows:



docs/mindinsight/docs/source_zh_cn/performance_profiling_gpu.md

@@ -80,7 +80,7 @@
- `timeline_limit` (int, optional) - Sets the upper limit of timeline file storage (unit: MB). When using this parameter, op_time must be set to true. Default value: 500.
- `data_process` (bool, optional) - Indicates whether to collect data preparation performance data. Default value: true.
- `op_time` (bool, optional) - Indicates whether to collect operator performance data. Default value: true.
- - `profile_framework` (str, optional) - Whether to collect host-side memory and time; must be one of ["all", "time", "memory", null]. Default value: "all".
+ - `profile_framework` (str, optional) - Whether to collect host-side time; must be one of ["all", "time", null]. Default value: null.

## Launching MindSpore Insight



docs/mindinsight/docs/source_zh_cn/performance_tuning_guide.md

@@ -58,7 +58,7 @@ On the single-device performance tuning page, MindSpore Insight provides users with the `step trace`

- If there is no time-consuming custom logic in the user script, the framework takes a long time to send data from the host side to the device side; please report it to the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues).

- Step 2: Jump to the `data processing` tab on the `data preparation details` page and observe the inter-operator queues to determine which data processing operation has the performance bottleneck. For the judgment principles, see the `data processing pipeline analysis` part of the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/zh-CN/master/performance_profiling_ascend.html#数据准备性能分析) page. After finding the problematic operation, refer to the [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/optimize.html) page to try to improve its performance.
+ Step 2: Jump to the `data processing` tab on the `data preparation details` page and observe the inter-operator queues to determine which data processing operation has the performance bottleneck. For the judgment principles, see the `data processing pipeline analysis` part of the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/zh-CN/master/performance_profiling_ascend.html#数据准备性能分析) page. After finding the problematic operation, refer to the [Optimizing the Data Processing](https://www.mindspore.cn/docs/zh-CN/master/model_train/dataset/optimize.html) page to try to improve its performance.

#### Data Sinking Mode

@@ -69,7 +69,7 @@ On the single-device performance tuning page, MindSpore Insight provides users with the `step trace`

Step 2: Observe how the size curve of the host queue changes. If none of the queue sizes is 0, the process of sending training data from host to device is the performance bottleneck; please report it to the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues). Otherwise, the data processing pipeline is the performance bottleneck; follow Step 3 to continue locating which data processing operation has the performance problem.

- Step 3: Jump to the `data processing` tab of the `data preparation details` page and observe the inter-operator queues to determine which data processing operation has the performance bottleneck. For the judgment principles, see the `data processing pipeline analysis` part of the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/zh-CN/master/performance_profiling_ascend.html#数据准备性能分析) page. After finding the problematic operation, refer to the [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/optimize.html) page to try to improve its performance.
+ Step 3: Jump to the `data processing` tab of the `data preparation details` page and observe the inter-operator queues to determine which data processing operation has the performance bottleneck. For the judgment principles, see the `data processing pipeline analysis` part of the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/zh-CN/master/performance_profiling_ascend.html#数据准备性能分析) page. After finding the problematic operation, refer to the [Optimizing the Data Processing](https://www.mindspore.cn/docs/zh-CN/master/model_train/dataset/optimize.html) page to try to improve its performance.

### Long Forward and Backward Propagation

@@ -125,7 +125,7 @@ On the single-device performance tuning page, MindSpore Insight provides users with the `step trace`
Step 3: Observe the step tail time on the cluster page

- Check whether the step tail of one device is clearly longer than that of the other devices; this is usually caused by a slow node in the cluster. Users can refer to Step 1 and Step 2 to identify and fix the slow node.
- - If the step tail of all devices is basically the same and this phase takes long, it is usually because the AllReduce collective communication operators take a long time. Users can try to modify the all_reduce_fusion_config parameter and change the [AllReduce Fusion Sharding Strategy](https://mindspore.cn/tutorials/experts/zh-CN/master/parallel/overview.html) to reduce the time spent in this phase.
+ - If the step tail of all devices is basically the same and this phase takes long, it is usually because the AllReduce collective communication operators take a long time. Users can try to modify the all_reduce_fusion_config parameter and change the [AllReduce Fusion Sharding Strategy](https://mindspore.cn/docs/zh-CN/master/model_train/parallel/overview.html) to reduce the time spent in this phase.

### Model Parallel

@@ -143,7 +143,7 @@ On the single-device performance tuning page, MindSpore Insight provides users with the `step trace`
Step 3: Observe the pure communication time on the cluster page

On the premise that Steps 1 and 2 confirm there is no slow node, the pure communication time of each device in the cluster should be basically the same. If this phase takes a short time, the communication introduced by operator re-distribution has little impact on performance, and users do not need to consider optimizing the operator sharding strategy. Otherwise, users need to focus on analyzing whether the sharding strategy can be optimized.
- Before continuing with the steps below, users need a certain understanding of the principles of model parallelism; see [Distributed Training](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/overview.html) for the basic principles. The following steps only assist with a rationality analysis; whether the sharding strategy has room for optimization, and how to optimize it, must be judged by users after analyzing their own networks.
+ Before continuing with the steps below, users need a certain understanding of the principles of model parallelism; see [Distributed Training](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/overview.html) for the basic principles. The following steps only assist with a rationality analysis; whether the sharding strategy has room for optimization, and how to optimize it, must be judged by users after analyzing their own networks.

- If this phase takes a long time, the user can pick any one of the devices and observe its timeline. In the timeline, MindSpore Insight marks the pure communication time; see `Pure Communication Op` in the figure below:



docs/mindinsight/docs/source_zh_cn/profiling/profiling_host_time.txt

@@ -1,4 +1,4 @@
Host Side Time Consumption Analysis
-------------------------------------

- If the host-side time collection function is enabled, the host-side time consumption of each stage can be viewed in the specified directory after training completes. For example, when the Profiler is instantiated with output_path="/XXX/profiler_output", the host-side timing data is saved in the "/XXX/profiler_output/profiler/host_info" directory in json format, with the prefix timeline\_ and the rank_id as suffix. The host-side timing file can be displayed with ``chrome://tracing``. You can use W/S/A/D to zoom in, zoom out, and move left and right to view the timing information.
+ If the host-side time collection function is enabled, after training finishes the time consumption can be viewed in ascend_timeline_display_[rank_id].json and displayed with ``chrome://tracing``. You can use W/S/A/D to zoom in, zoom out, and move left and right to view the timing information.

docs/mindinsight/docs/source_zh_cn/profiling/profiling_offline.txt

@@ -51,11 +51,6 @@
├──── container
├──── FRAMEWORK // Raw data collected on the framework side
│ └──── op_range_*
- ├──── host_info // Results generated by framework profiling
- │ ├──── dataset_*.csv
- │ ├──── host_info_*.csv
- │ ├──── host_memory_*.csv
- │ └──── timeline_*.json
├──── PROF_{number}_{timestamp}_{string} // msprof performance data
│ ├──── analyse
│ ├──── device_*
@@ -67,7 +62,8 @@
│ └──── task.csv
├──── rank-*_{timestamp}_ascend_ms // MindStudio Insight visualization deliverables
│ ├──── ASCEND_PROFILER_OUTPUT // Performance data collected by the MindSpore Profiler interface
- │ └──── profiler_info_*.json
+ │ ├──── profiler_info_*.json
+ │ └──── profiler_metadata.json // Records user-defined metadata; generated by calling the add_metadata or add_metadata_json interface
├──── aicore_intermediate_*_detail.csv
├──── aicore_intermediate_*_type.csv
├──── aicpu_intermediate_*.csv
@@ -98,61 +94,64 @@
├──── profiler_info_*.json
├──── step_trace_point_info_*.json
└──── step_trace_raw_*_detail_time.csv
└──── dataset_*.csv
- \* represents rank id

Performance Data File Description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PROF_{number}_{timestamp}_{string} directory contains the performance data collected by CANN Profiling, mainly stored in mindstudio_profiler_output. For an introduction to the data, refer to `Performance data file description <https://www.hiascend.com/document/detail/zh/mindstudio/70RC2/mscommandtoolug/mscommandug/atlasprofiling_16_0062.html>`_.

The profiler directory contains three types of files (csv, json, and txt), covering performance data on operator execution time, memory usage, communication, and so on. The file descriptions are shown in the table below. For detailed descriptions of some files, refer to `Performance data <https://www.mindspore.cn/mindinsight/docs/zh-CN/master/profiler_files_description.html>`_.

============================================== ==============================================================================
File Names                                     Descriptions
============================================== ==============================================================================
step_trace_point_info_*.json                   Information about the operator corresponding to the step node (only mode=GRAPH, export GRAPH_OP_RUN=0)
step_trace_raw_*_detail_time.csv               Time information for the nodes of each step (only mode=GRAPH, export GRAPH_OP_RUN=0)

dynamic_shape_info_*.json                      Operator information under dynamic shape

pipeline_profiling_*.json                      Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
minddata_pipeline_raw_*.csv                    Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
minddata_pipeline_summary_*.csv                Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
minddata_pipeline_summary_*.json               Intermediate file dumped by MindSpore data processing, used for MindInsight visualization
framework_raw_*.csv                            Information about AI Core operators in MindSpore data processing
device_queue_profiling_*.txt                   Intermediate file dumped by MindSpore data processing, used for MindInsight visualization (data sinking scenarios only)
minddata_aicpu_*.txt                           Performance data for AI CPU operators in MindSpore data processing (data sinking scenarios only)
dataset_iterator_profiling_*.txt               Intermediate file dumped by MindSpore data processing, used for MindInsight visualization (data non-sinking scenarios only)

aicore_intermediate_*_detail.csv               AI Core operator data
aicore_intermediate_*_type.csv                 AI Core operator call counts and elapsed time statistics
aicpu_intermediate_*.csv                       Elapsed time data after parsing AI CPU operator information
flops_*.txt                                    Floating-point operation counts (FLOPs) and floating-point operations per second (FLOPS) of AI Core operators
flops_summary_*.json                           Total FLOPs of all operators, average FLOPs per operator, and average FLOPS_Utilization

ascend_timeline_display_*.json                 Timeline visualization file for MindStudio Insight
ascend_timeline_summary_*.json                 Timeline statistics
output_timeline_data_*.txt                     Operator timeline data, present only if AI Core operator data exists

cpu_ms_memory_record_*.txt                     Raw file for memory profiling
operator_memory_*.csv                          Operator-level memory information

minddata_cpu_utilization_*.json                CPU utilization rate

cpu_op_detail_info_*.csv                       CPU operator elapsed time data (mode=GRAPH only)
cpu_op_type_info_*.csv                         Elapsed time statistics for each category of CPU operator (mode=GRAPH only)
cpu_op_execute_timestamp_*.txt                 CPU operator execution start time and elapsed time (mode=GRAPH only)
cpu_framework_*.txt                            CPU operator elapsed time in heterogeneous scenarios (mode=GRAPH only)

ascend_cluster_analyse_model-xxx.csv           Computation and communication data in model-parallel or pipeline-parallel modes (mode=GRAPH only)

hccl_raw_*.csv                                 Per-device communication time and communication wait time (mode=GRAPH only)

parallel_strategy_*.json                       Operator parallel strategy dump, used for MindInsight visualization

profiler_info_*.json                           Profiler configuration and other info
dataset_*.csv                                  Time consumption of each stage of the data processing module
============================================== ==============================================================================
============================================== ==============================================================================

- *表示rank id
- \* 表示rank id
- ascend_cluster_analyse_model-xxx_*.csv完整的文件名应该是ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv,比如ascend_cluster_analyse_model-parallel_1_8_0.csv
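The cluster-analysis file name pattern described in the note above can be unpacked with a small regular expression. This is a plain-Python sketch, independent of MindSpore; the field names simply mirror the `{mode}_{stage_num}_{rank_size}_{rank_id}` placeholders in the note:

```python
import re

# Pattern from the note above:
# ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv
PATTERN = re.compile(
    r"ascend_cluster_analyse_model-(?P<mode>[a-z]+)_"
    r"(?P<stage_num>\d+)_(?P<rank_size>\d+)_(?P<rank_id>\d+)\.csv"
)

def parse_cluster_file(name):
    """Return the fields encoded in a cluster-analysis file name, or None."""
    m = PATTERN.fullmatch(name)
    return m.groupdict() if m else None

info = parse_cluster_file("ascend_cluster_analyse_model-parallel_1_8_0.csv")
print(info)  # {'mode': 'parallel', 'stage_num': '1', 'rank_size': '8', 'rank_id': '0'}
```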

docs/mindinsight/docs/source_zh_cn/profiling/profiling_resoure.txt (+0, -17)

@@ -78,20 +78,3 @@ Common scenarios for CPU utilization:

*Figure: memory usage line chart*

Host-side Memory Usage
~~~~~~~~~~~~~~~~~~~~~~

If Host-side memory collection is enabled, the memory usage can be viewed in the specified directory after training ends. For example, if output_path="/XXX/profiler_output" is specified when the Profiler is instantiated, the Host-side memory data is saved under "/XXX/profiler_output/profiler/host_info" as csv files with the prefix host_memory\_ and the rank_id as suffix. The header fields have the following meanings:

- tid: thread id of the current thread when the Host-side memory is collected.
- pid: process id of the current process when the Host-side memory is collected.
- parent_pid: process id of the parent process of the current process when the Host-side memory is collected.
- module_name: name of the component whose Host-side memory is collected; a component contains one or more events.
- event: name of the event for which Host-side memory is collected; an event contains one or more stages.
- stage: name of the stage for which Host-side memory is collected.
- level: 0 means intended for framework developers, 1 means intended for users (algorithm engineers).
- start_end: marker for the start or end of a stage; 0 marks the start, 1 marks the end, 2 means start and end are not distinguished.
- custom_info: component-defined information used by framework developers to locate performance issues; may be empty.
- memory_usage: Host-side memory usage in kB; 0 means no memory data was collected for the current stage.
- time_stamp: timestamp in us.
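As a sketch of how these columns might be consumed, the following reads a synthetic host_memory csv with the standard csv module and reports the peak memory_usage per module_name. The sample rows and their values are invented for illustration only; they merely follow the column order documented above:

```python
import csv
import io
from collections import defaultdict

# Synthetic sample in the documented column order; all values are invented.
SAMPLE = """tid,pid,parent_pid,module_name,event,stage,level,start_end,custom_info,memory_usage,time_stamp
101,50,1,dataset,epoch,load,1,0,,2048,1000
101,50,1,dataset,epoch,load,1,1,,4096,2000
102,50,1,runtime,launch,init,0,2,,1024,1500
"""

def peak_memory_by_module(fp):
    """Return the maximum memory_usage (kB) seen per module_name."""
    peaks = defaultdict(int)
    for row in csv.DictReader(fp):
        usage = int(row["memory_usage"])
        if usage:  # 0 means no memory data was collected for this stage
            peaks[row["module_name"]] = max(peaks[row["module_name"]], usage)
    return dict(peaks)

print(peak_memory_by_module(io.StringIO(SAMPLE)))
# {'dataset': 4096, 'runtime': 1024}
```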


docs/mindinsight/docs/source_zh_cn/profiling/profiling_training.txt (+1, -1)

@@ -206,4 +206,4 @@

- `parallel_strategy` (bool, optional) - Whether to collect parallel strategy performance data. Default: true.

- `profile_framework` (str, optional) - Whether to collect Host-side memory and time data. Valid options are ["all", "time", "memory", null]. Default: "all".
- `profile_framework` (str, optional) - Whether to collect Host-side memory and time data. Valid options are ["all", "time", null]. Default: "all".
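The change above narrows the accepted option set by dropping "memory". A toy validator for the new set can be sketched in plain Python; this is illustrative only and is not MindSpore's actual argument checking (the function name is hypothetical):

```python
# Valid values after the change; Python's None corresponds to the documented null.
VALID_PROFILE_FRAMEWORK = ("all", "time", None)

def check_profile_framework(value):
    """Reject options dropped from the old set ["all", "time", "memory", null]."""
    if value not in VALID_PROFILE_FRAMEWORK:
        raise ValueError(
            f"profile_framework must be one of {VALID_PROFILE_FRAMEWORK}, got {value!r}"
        )
    return value

print(check_profile_framework("time"))  # time
```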

docs/mindspore/Makefile (+3, -2)

@@ -19,9 +19,10 @@ help:
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

EXTRADIR = $(SOURCEDIR)/api_python
# EXTRADIR = $(SOURCEDIR)/api_python
# -rm -rf $(BUILDDIR)/* $(EXTRADIR)/*

.PHONY: clean

clean:
-rm -rf $(BUILDDIR)/* $(EXTRADIR)/*
-rm -rf $(BUILDDIR)/*

docs/mindspore/_ext/generate_ops_mint_rst.py (+5, -1)

@@ -29,7 +29,11 @@ def generate_ops_mint_rst(repo_path, ops_path, mint_path, pr_need='all'):
            modulename += '.functional'
        mint_ops_dict[modulename] = []
        # pylint: disable=eval-used
        reg_all = eval(f"{modulename}.__all__")
        try:
            reg_all = eval(f"{modulename}.__all__")
        except AttributeError as e:
            print(f'模块名有误:{e}')
            continue
        one_p = re.findall(r'from mindspore\.(ops|nn).*?(?<!extend) import (.*?)(\n|# )', content)
        two_p = [i[1] for i in one_p]
        for i in two_p:


docs/mindspore/_ext/generate_rst_by_en.py (+1, -1)

@@ -17,7 +17,7 @@ def get_api(fullname):
    try:
        api = eval(f"module_import.{api_name}")
    except AttributeError:
        print(f'failed to import {api_name}')
        print(f'failed to {module_import}.{api_name}')
        return ''
    return api



docs/mindspore/source_en/api_python/index.rst (+29, -0)

@@ -0,0 +1,29 @@
API
=========================

.. toctree::
:glob:
:maxdepth: 1

mindspore
mindspore.nn
mindspore.ops
mindspore.ops.primitive
mindspore.mint
mindspore.amp
mindspore.train
mindspore.communication
mindspore.communication.comm_func
mindspore.common.initializer
mindspore.hal
mindspore.dataset
mindspore.dataset.transforms
mindspore.mindrecord
mindspore.nn.probability
mindspore.rewrite
mindspore.multiprocessing
mindspore.boost
mindspore.numpy
mindspore.scipy
mindspore.experimental
../note/env_var_list

docs/mindspore/source_en/conf.py (+9, -4)

@@ -256,9 +256,15 @@ src_dir_en = os.path.join(repo_path, copy_path)
des_sir = "./api_python"

def copy_source(sourcedir, des_sir):
    if os.path.exists(des_sir):
        shutil.rmtree(des_sir)
    shutil.copytree(sourcedir, des_sir)
    for i in os.listdir(sourcedir):
        if os.path.isfile(os.path.join(sourcedir, i)):
            if os.path.exists(os.path.join(des_sir, i)):
                os.remove(os.path.join(des_sir, i))
            shutil.copy(os.path.join(sourcedir, i), os.path.join(des_sir, i))
        else:
            if os.path.exists(os.path.join(des_sir, i)):
                shutil.rmtree(os.path.join(des_sir, i))
            shutil.copytree(os.path.join(sourcedir, i), os.path.join(des_sir, i))

copy_source(src_dir_en, des_sir)

@@ -425,7 +431,6 @@ release_source = f'[![View Source On Gitee](https://mindspore-website.obs.cn-nor
with open(src_release, "r", encoding="utf-8") as f:
data = f.read()
if len(re.findall("\n## (.*?)\n",data)) > 1:
data = re.sub("\n## MindSpore 2.3.1 [\s\S\n]*?\n## ", "\n## ", data)
content = regex.findall("(\n## MindSpore [^L][\s\S\n]*?)\n## ", data, overlapped=True)
repo_version = re.findall("\n## MindSpore ([0-9]+?\.[0-9]+?)\.([0-9]+?)[ -]", content[0])[0]
content_new = ''


docs/mindspore/source_en/design/distributed_training_design.md (+6, -6)

@@ -94,9 +94,9 @@ This subsection describes how the `ParallelMode.SEMI_AUTO_PARALLEL` semi-automat

Semi-automatic parallelism supports the automatic mixing of multiple parallel modes, respectively:

**Operator-level parallelism**: Operator parallelism takes the operators in a neural network and slices the input tensor to multiple devices for computation. In this way, data samples and model parameters can be distributed among different devices to train large-scale deep learning models and use cluster resources for parallel computing to improve the overall speed. The user can set the shard strategy for each operator, and the framework will model the slice of each operator and its input tensor according to the shard strategy of the operator to maintain mathematical equivalence. This approach can effectively reduce the load on individual devices and improve computational efficiency, and is suitable for training large-scale deep neural networks. For more details, please refer to [operator-level parallelism](https://www.mindspore.cn/tutorials/experts/en/master/parallel/operator_parallel.html).
**Operator-level parallelism**: Operator parallelism takes the operators in a neural network and slices the input tensor to multiple devices for computation. In this way, data samples and model parameters can be distributed among different devices to train large-scale deep learning models and use cluster resources for parallel computing to improve the overall speed. The user can set the shard strategy for each operator, and the framework will model the slice of each operator and its input tensor according to the shard strategy of the operator to maintain mathematical equivalence. This approach can effectively reduce the load on individual devices and improve computational efficiency, and is suitable for training large-scale deep neural networks. For more details, please refer to [operator-level parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/operator_parallel.html).

**Pipeline parallelism**: When the number of cluster devices is large, if only operator parallelism is used, communication is required over the communication domain of the entire cluster, which may make communication inefficient and thus reduce the overall performance. Pipeline parallelism can slice the neural network structure into multiple stages, each stage running on a part of the devices, which limits the communication domain of collective communication to that part of the devices, while inter-stage communication uses point-to-point communication. The advantages of pipeline parallelism are: improving communication efficiency, and easily handling neural network structures stacked by layers. The disadvantage is that some nodes may be idle at the same time. For detailed information, refer to [pipeline parallelism](https://www.mindspore.cn/tutorials/experts/en/master/parallel/pipeline_parallel.html).
**Pipeline parallelism**: When the number of cluster devices is large, if only operator parallelism is used, communication is required over the communication domain of the entire cluster, which may make communication inefficient and thus reduce the overall performance. Pipeline parallelism can slice the neural network structure into multiple stages, each stage running on a part of the devices, which limits the communication domain of collective communication to that part of the devices, while inter-stage communication uses point-to-point communication. The advantages of pipeline parallelism are: improving communication efficiency, and easily handling neural network structures stacked by layers. The disadvantage is that some nodes may be idle at the same time. For detailed information, refer to [pipeline parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/pipeline_parallel.html).

**MoE parallelism**: MoE distributes the experts to different workers, and each worker takes on different batches of training data. For the non-MoE layers, expert parallelism is the same as data parallelism. In the MoE layer, the tokens in the sequence are sent via all-to-all communication to the workers holding their matching experts. After the corresponding expert finishes its computation, the results are passed back to the original workers by all-to-all and reorganized into the original sequence for the next layer's computation. Since MoE models usually have a large number of experts, expert parallelism scales with model size more than model parallelism does.

@@ -106,7 +106,7 @@ This subsection describes how the `ParallelMode.SEMI_AUTO_PARALLEL` semi-automat

![multi-copy Parallelism](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/multi_copy.png)

**Optimizer Parallelism**: When training in data parallelism or operator parallelism, the same copy of the model parameters may exist on multiple devices, which allows the optimizer to have redundant computations across multiple devices when updating that weight. In this case, the computation of the optimizer can be spread over multiple devices by optimizer parallelism. Its advantages are: reducing static memory consumption, and the amount of computation within the optimizer. The disadvantages are: increasing communication overhead. For detailed information, refer to [Optimizer Parallelism](https://www.mindspore.cn/tutorials/experts/en/master/parallel/optimizer_parallel.html).
**Optimizer Parallelism**: When training in data parallelism or operator parallelism, the same copy of the model parameters may exist on multiple devices, which allows the optimizer to have redundant computations across multiple devices when updating that weight. In this case, the computation of the optimizer can be spread over multiple devices by optimizer parallelism. Its advantages are: reducing static memory consumption, and the amount of computation within the optimizer. The disadvantages are: increasing communication overhead. For detailed information, refer to [Optimizer Parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/optimizer_parallel.html).

### Semi-automatic Parallel Code

@@ -143,11 +143,11 @@ In fact, the hybrid parallel strategy generation module is responsible for findi

Fully automatic parallelism is very difficult to implement, and MindSpore divides the provided strategy generation algorithm into L1 level and L2 level according to the degree of user intervention required (here we assume that the manually configured full graph strategy SEMI_AUTO is L0 level, and the scheme that does not require user participation is L3 level).

The strategy generation algorithm at the L1 level is called Strategy Broadcast (Sharding Propagation). In this mode, the user only needs to manually define the strategies for a few key operators, and the strategies for the remaining operators in the computational graph are automatically generated by the algorithm. Because the strategy of the key operator has been defined, the cost model of the algorithm mainly describes the redistribution cost between the operators, and the optimization objective is to minimize the redistribution cost of the whole graph. Because the main operator strategy has been defined, which is equivalent to a compressed search space, the search time of this scheme is shorter and its strategy performance depends on the definition of the key operator strategy, so it still requires the user to have the ability to analyze the defined strategy. Refer to [Sharding Propagation](https://www.mindspore.cn/tutorials/experts/en/master/parallel/sharding_propagation.html) for detailed information.
The strategy generation algorithm at the L1 level is called Strategy Broadcast (Sharding Propagation). In this mode, the user only needs to manually define the strategies for a few key operators, and the strategies for the remaining operators in the computational graph are automatically generated by the algorithm. Because the strategy of the key operator has been defined, the cost model of the algorithm mainly describes the redistribution cost between the operators, and the optimization objective is to minimize the redistribution cost of the whole graph. Because the main operator strategy has been defined, which is equivalent to a compressed search space, the search time of this scheme is shorter and its strategy performance depends on the definition of the key operator strategy, so it still requires the user to have the ability to analyze the defined strategy. Refer to [Sharding Propagation](https://www.mindspore.cn/docs/en/master/model_train/parallel/sharding_propagation.html) for detailed information.

There are two types of L2-level strategy generation algorithms, Dynamic Programming and Symbolic Automatic Parallel Planner (SAPP for short). Both methods have their advantages and disadvantages. The dynamic programming algorithm is able to search for the optimal strategy inscribed by the cost model, but it takes longer time to search for parallel strategies for huge networks. The SAPP algorithm is able to generate optimal strategies instantaneously for huge networks and large-scale cuts.
The core idea of the dynamic programming algorithm is to build a cost model of the full graph, including computation cost and communication cost, to describe the absolute time delay in the distributed training process, and to compress the search time using equivalent methods such as edge elimination and point elimination, but the search space actually grows exponentially with the number of devices and operators, so it is not efficient for large clusters with large models.
SAPP is modeled based on the parallelism principle by creating an abstract machine to describe the hardware cluster topology and optimizing the cost model by symbolic simplification. Its cost model compares not the predicted absolute latency, but the relative cost of different parallel strategies, so it can greatly compress the search space and guarantee minute search times for 100-card clusters. Refer to [Distributed Parallel Training Mode](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html)
SAPP is modeled based on the parallelism principle by creating an abstract machine to describe the hardware cluster topology and optimizing the cost model by symbolic simplification. Its cost model compares not the predicted absolute latency, but the relative cost of different parallel strategies, so it can greatly compress the search space and guarantee minute search times for 100-card clusters. Refer to [Distributed Parallel Training Mode](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html)

Sharding Propagation and SAPP currently support manual definition of Pipeline + automatic operator parallelism, and can be used in conjunction with optimizations such as recomputation, optimizer parallelism, etc. Dynamic Programming algorithms only support automatic operator parallelism.

@@ -362,7 +362,7 @@ When the EmbeddingTable reaches T level and the single machine memory cannot be

![heterogeneous-heter-ps](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/heter-ps.png)

Parameter Server encapsulates heterogeneous processes, and users only need to configure parameters to use PS. For the detailed configuration process, refer to [Parameter Server training process](https://www.mindspore.cn/tutorials/experts/en/master/parallel/parameter_server_training.html).
Parameter Server encapsulates heterogeneous processes, and users only need to configure parameters to use PS. For the detailed configuration process, refer to [Parameter Server training process](https://www.mindspore.cn/docs/en/master/model_train/parallel/parameter_server_training.html).

In addition, the process of using PS is also available in the wide&deep network and can be found at: <https://gitee.com/mindspore/models/tree/master/official/recommend/Wide_and_Deep>.



docs/mindspore/source_en/design/dynamic_graph_and_static_graph.md (+1, -186)

@@ -163,189 +163,4 @@ Similarly for the input y derivation, the same procedure can be used for the der

### Control Flow in PyNative Mode

In the PyNative mode, scripts are executed according to the Python syntax, so in MindSpore, there is no special treatment for the control flow syntax, which is directly expanded and executed according to the Python syntax, and automatic differentiation is performed on the expanded execution operator. For example, for a for loop, the statements in the for loop are continuously executed under PyNative and automatic differentiation is performed on the operators according to the specific number of loops.

## Dynamic and Static Unification

### Overview

The industry currently supports both dynamic and static graph modes. Dynamic graphs are executed by interpretation, with dynamic syntax affinity and flexible expression, while static graphs are executed with JIT compilation optimization, leaning toward static syntax with more syntactic restrictions. For the dynamic and static graph modes, MindSpore first unifies the API expression, using the same API in both modes, and then unifies the underlying differentiation mechanism of dynamic and static graphs.

### Interconversion of Dynamic and Static Graphs

In MindSpore, we can switch the execution between using dynamic or static graphs by controlling the mode input parameters. For example:

```python
ms.set_context(mode=ms.PYNATIVE_MODE)
```

Since there are restrictions on Python syntax under static graphs, switching from dynamic to static graphs requires compliance with the syntax restrictions of static graphs in order to execute correctly by using static graphs. For more syntax restrictions for static graphs, refer to [Static Graph Syntax Restrictions](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

### Combination of Static and Dynamic Graphs

MindSpore supports mixed execution by using static compilation under dynamic graphs. Function objects that need to be executed as static graphs are decorated with jit, and in this way mixed execution of dynamic and static graphs can be achieved. For more use of jit, refer to the [jit documentation](https://www.mindspore.cn/tutorials/en/master/beginner/accelerate_with_static_graph.html#decorator-based-startup-method).

For example:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

class AddMulMul(nn.Cell):
def __init__(self):
super(AddMulMul, self).__init__()
self.param = ms.Parameter(ms.Tensor(0.5, ms.float32))

@ms.jit
def construct(self, x):
x = x + x
x = x * self.param
x = x * x
return x

class CellCallSingleCell(nn.Cell):
def __init__(self):
super(CellCallSingleCell, self).__init__()
self.conv = nn.Conv2d(1, 2, kernel_size=2, stride=1, padding=0, weight_init="ones", pad_mode="valid")
self.bn = nn.BatchNorm2d(2, momentum=0.99, eps=0.00001, gamma_init="ones")
self.relu = nn.ReLU()
self.add_mul_mul = AddMulMul()

def construct(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.add_mul_mul(x)
x = self.relu(x)
return x

ms.set_context(mode=ms.PYNATIVE_MODE, device_target="CPU")
inputs = ms.Tensor(np.ones([1, 1, 2, 2]).astype(np.float32))
net = CellCallSingleCell()
out = net(inputs)
print(out)
```

```text
[[[[15.99984]]

[[15.99984]]]]
```

### Static Graph Syntax Enhancement

In the MindSpore static graph mode, users need to follow MindSpore [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html) when writing programs, and there are constraints on the use of syntax. In dynamic graph mode, Python script code will be executed according to Python syntax, and users can use any Python syntax. It can be seen that the syntax constraints of static and dynamic graphs are different.

JIT Fallback considers the unification of static and dynamic graphs from the perspective of static graphs. When unsupported syntax is found during compilation, that syntax falls back to the Python interpreter for interpreted execution. Through the JIT Fallback feature, static graphs can support as much dynamic graph syntax as possible, so that static graphs provide a syntax experience close to dynamic graphs, achieving dynamic and static unification.

In the graph mode scenario, the MindSpore framework will report an error when it encounters unsupported syntax or symbols during graph compilation, mostly in the type inference stage. In the graph compilation stage, the Python source code written by the user is parsed, and then subsequent static analysis, type derivation, optimization and other steps are performed. Therefore, the JIT Fallback feature needs to be pre-detected for unsupported syntax. Common unsupported syntax mainly includes: calling methods of third-party libraries, calling class names to create objects, calling unsupported Python built-in functions, etc. Interpret execution of unsupported syntax Fallback to the Python interpreter. Since the graph mode uses [MindSpore IR (MindIR)](https://www.mindspore.cn/docs/en/master/design/all_scenarios.html#mindspore-ir-mindir), it is necessary to convert the statement executed by the interpretation to the intermediate representation and record the information required by the interpreter.

The following mainly introduces the static graph syntax supported using the JIT Fallback extension. The default value of the JIT syntax support level option jit_syntax_level is 'LAX', extending the static graph syntax with the ability of JIT Fallback.
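The compile-else-interpret idea behind JIT Fallback can be illustrated with a toy sketch in plain Python. This is not MindSpore's actual mechanism; the "compiler" here is a stand-in that only accepts simple arithmetic, and everything else falls back to the Python interpreter:

```python
def toy_compile(expr):
    """Pretend 'compiler' that only knows digits, +, -, *; None means unsupported."""
    allowed = set("0123456789+-* ")
    if set(expr) <= allowed:
        code = compile(expr, "<graph>", "eval")
        return lambda: eval(code)  # pylint: disable=eval-used
    return None  # unsupported syntax -> caller must fall back

def run_with_fallback(expr):
    """Run through the toy compiler when possible, else interpret directly."""
    fn = toy_compile(expr)
    if fn is not None:
        return "compiled", fn()
    return "interpreted", eval(expr)  # fallback to the Python interpreter

print(run_with_fallback("2 * 3 + 1"))   # ('compiled', 7)
print(run_with_fallback("len('abc')"))  # ('interpreted', 3)
```

The real feature operates on MindIR during graph compilation rather than on expression strings, but the control flow is analogous: supported constructs are compiled, unsupported ones are recorded and handed back to the interpreter.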

#### Calling the Third-party Libraries

Complete support for third-party libraries such as NumPy and SciPy. The static graph mode supports many third-party data types such as np.ndarray and their operations, supports obtaining properties and calling methods of third-party libraries, and supports interacting with third-party libraries such as NumPy through methods such as Tensor's asnumpy(). In other words, in static graph mode users can call MindSpore's own interfaces and operators, directly call third-party library interfaces, or use both together.

- Supporting data types of third-party libraries (such as NumPy and SciPy), allowing calling and returning objects of third-party libraries.
- Supporting calling methods of third-party libraries.
- Supporting creating Tensor instances by using the data types of the third-party library NumPy.
- The assignment of subscripts for data types in third-party libraries is not currently supported.

For more usage, please refer to the [Calling the Third-party Libraries](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#calling-the-third-party-libraries) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

#### Supporting the Use of Custom Classes

Custom classes that are not decorated with `@jit_class` and do not inherit from `nn.Cell`. Through the JIT Fallback technical solution, static graph mode allows creating and referencing instances of custom classes, directly obtaining and calling the properties and methods of custom class instances, and modifying properties (inplace operations).

For more usage, please refer to the [Supporting the Use of Custom Classes](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-the-use-of-custom-classes) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

#### Basic Operators Support More Data Types

In the syntax of graph mode, the following basic operators are overloaded: ['+', '-', '*', '/', '//', '%', '**', '<<', '>>', '&', '|', '^', 'not', '==', '!=', '<', '>', '<=', '>=', 'in', 'not in', 'y=x[0]']. For more details, please refer to [Operators](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax/operators.html). When an operator receives an input type it does not support, it needs the extended static graph syntax to handle it, keeping the output consistent with the output in PyNative mode.

For more usage, please refer to the [Basic Operators Support More Data Type](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#basic-operators-support-more-data-type) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

#### Base Type

Use the JIT Fallback feature to extend support for Python's native data types 'List', 'Dictionary', 'None'. For more usage, please refer to the [Base Type](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#base-type) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

##### Supporting List Inplace Modification Operations

- Support for getting the original `List` object from a global variable.
- Inplace operations on input `List` objects are not supported.
- Support for in-place modification of some `List` built-in functions.

##### Supporting the High-Level Usage of Dictionary

- Supporting Top Graph Return Dictionary.
- Supporting Dictionary Index Value Retrieval and Assignment.

##### Supporting the Usage of None

`None` is a special value in Python that represents null and can be assigned to any variable. Functions that do not have a return value statement are considered to return `None`. At the same time, `None` is also supported as the input parameter or return value of the top graph or subgraph. Support `None` as a subscript of a slice as input to `List`, `Tuple`, `Dictionary`.

#### Built-in Functions Support More Data Types

Extend the support for built-in functions. Python built-in functions perfectly support more input types, such as third-party library data types. More support for built-in functions can be found in the [Python built-in functions](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax/python_builtin_functions.html) section.

#### Supporting Control Flow

To improve support for standard Python syntax and realize dynamic and static unification, support for more data types is extended in control flow statements. Control flow statements are flow control statements such as `if`, `for`, and `while`. In theory, syntax supported through the extension is also supported in control flow scenarios. For more usage, please refer to the [Supporting Control Flow](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-control-flow) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

#### Supporting Property Setting and Modification

More types of inplace operations are supported. The previous version only supported value modification of the Parameter type through the Inplace operator, and in the static graph mode of MindSpore version 2.1, the properties of custom classes, Cell subclasses, and jit_class classes were supported. In addition to supporting changing the properties of class self and global variables, it also supports inplace operations such as extend(), reverse(), insert(), pop() of the List type. For more usage, please refer to the [Supporting Property Setting and Modification](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-property-setting-and-modification) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

- Set and modify properties of custom class objects and third-party types.
- Make changes to the Cell's self object.
- Set and modify Cell objects and jit_class objects in the static graph.

#### Supporting Derivation

The static graph syntax supported by JIT Fallback also supports its use in derivation. For more usage, please refer to the [Supporting Derivation](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-derivation) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

#### Annotation Type

For the syntax supported by the runtime extensions, nodes are generated that cannot be derived by type and are called `Any` types. Since the type cannot derive the correct type at compile time, this `Any` will be operated with a default maximum precision 'Float64' to prevent loss of precision. To optimize performance, it is recommended to minimize the generation of `Any` types. When the user knows exactly what type of statement will be generated through the extension, it is recommended to use `Annotation @jit.typing:` to specify the corresponding Python statement type, thereby determining the type of the interpretation node and avoiding the generation of `Any` types. For more usage, please refer to the [Annotation Type](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#annotation-type) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).

#### Instructions for Use

When using the static graph extension support syntax, note the following points:

1. The goal is to match the support capability of dynamic graphs; that is, usage must stay within the scope of dynamic graph syntax, including but not limited to data types.

2. When extending the static graph syntax, more syntax is supported, but the execution performance may be affected and is not optimal.

3. Extended static graph syntax supports more constructs, but because Python is used for interpretation, the resulting graphs cannot be exported to or imported from MindIR.

4. Repeatedly defining global variables with the same name across different Python files, and then using those global variables in the network, is not currently supported.

### Conversion Technique from Dynamic Graph to Static Graph

MindSpore provides PIJit, a feature that directly converts a user's dynamic graph code into a static graph without code changes. It balances performance and ease of use, removes the cost of switching between static and dynamic modes, and truly unifies the two. PIJit analyzes Python bytecode and captures the execution flow of the Python program: subgraphs that can run as static graphs are run as static graphs, subgraphs using unsupported Python syntax are run as dynamic graphs, and the bytecode is modified and adjusted to link the static graphs together, achieving mixed static/dynamic execution. This improves performance while preserving ease of use.

#### Features of PIJit

1. Graph capture: preprocesses bytecode, dynamically traces interpreted execution, recognizes graph operations that MindSpore can handle, and splits graphs where needed to guarantee that the function's (bytecode's) behavior stays correct.
2. Bytecode support: currently supports Python 3.7, 3.8, 3.9, and 3.10 bytecode.
3. Graph optimization: optimizes the bytecode generated during graph capture, including branch pruning, bytecode filtering, function bytecode inlining, constant folding, and other passes.
4. Exception capture mechanism: supports the `with` and `try-except` syntax.
5. Loop handling: implements graph capture and graph splitting for loops by simulating the bytecode operand stack.
6. UD analysis: use-def chain analysis of variables solves the problem that some parameter types (Function, Bool, None) cannot be used as static graph return values, removes useless parameters, improves graph execution efficiency, and reduces data copying.
7. Side-effect analysis and processing: compensates for the weakness of static graphs in handling side effects. Depending on the scenario, the variables and bytecodes that produce side effects are collected and recorded, and the side effects are applied outside the static graph while the program's semantics are preserved.
8. Guard: records the conditions that the inputs must satisfy for a subgraph/optimization to be entered, and checks whether the inputs fit the corresponding subgraph optimization.
9. Cache: graph management caches the correspondence between subgraphs/optimizations and Guards.
10. Dynamic shape and symbolic shape: uses `input_signature` to accept dynamic-shape and symbolic-shape hints for Tensor/Tensor List/Tensor Tuple inputs, and also recognizes dynamic shapes automatically after multiple runs.
11. Compiling by trace: supports operator and other type derivation during tracing and bytecode analysis.
12. Automatic mixed precision: supports the native automatic mixed precision capability of `mindspore.nn.Cell`.
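
Graph capture works from raw CPython bytecode. A small, MindSpore-free illustration of that raw material using the standard `dis` module (the split-graph comment describes assumed behavior, not a verified PIJit decision):

```python
import dis

def forward(x, flag):
    # The arithmetic here is the kind of region a bytecode-based capturer
    # could place in a static graph; the print call would stay eager,
    # forcing a graph split around it.
    y = x * 2 + 1
    if flag:
        print("debug:", y)
    return y

# The opcode stream below is what bytecode-level capture, UD analysis,
# and inlining passes inspect.
opnames = [ins.opname for ins in dis.get_instructions(forward)]
```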

#### Usage

```python
def jit(fn=None, input_signature=None, hash_args=None, jit_config=None, mode="PIJit"):
```

The original `jit` function uses `mode="PSJit"`, and the new PIJit feature uses `mode="PIJit"`. `jit_config` accepts a dictionary of parameters that provide optimization and debugging options. For example, `print_after_all` prints the bytecode of the graph and graph-splitting information, `loop_unrolling` enables loop unrolling, and `enable_dynamic_shape` applies dynamic shape.

#### Limitations

- Running a function decorated with @jit(mode="PIJit") in static graph mode is not supported; in this case, the @jit(mode="PIJit") decorator is considered invalid.
- Calling a function decorated with @jit(mode="PIJit") from within another function decorated with @jit(mode="PIJit") is not supported; the inner @jit(mode="PIJit") decorator is considered invalid.

In PyNative mode, scripts are executed according to Python syntax, so MindSpore applies no special treatment to control flow syntax: it is directly expanded and executed according to Python syntax, and automatic differentiation is performed on the operators executed during expansion. For example, for a for loop, the statements in the loop body are executed repeatedly under PyNative, and automatic differentiation is performed on the operators according to the actual number of iterations.
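
The loop-unrolling behavior can be sketched in plain Python, with a hypothetical scalar operator `y = 3 * y` standing in for a network op:

```python
# Under PyNative, the for loop is simply executed; every iteration applies
# the operator once, and differentiation follows the executed steps.
def forward_and_grad(x, n):
    y, dy_dx = x, 1.0
    for _ in range(n):       # unrolled eagerly: one real step per iteration
        y = y * 3.0
        dy_dx = dy_dx * 3.0  # chain rule applied once per executed step
    return y, dy_dx

y, g = forward_and_grad(2.0, 3)  # y = 2 * 3**3 = 54.0, g = 3**3 = 27.0
```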

docs/mindspore/source_en/design/index.rst (+17 -0)

@@ -0,0 +1,17 @@
Design Concept
=========================

.. toctree::
:glob:
:maxdepth: 1

overview
tensor_view
programming_paradigm
dynamic_graph_and_static_graph
distributed_training_design
data_engine
all_scenarios
graph_fusion_engine
pluggable_device
glossary

docs/mindspore/source_en/faq/data_processing.md (+7 -9)

@@ -38,7 +38,7 @@ A: You can refer to the following steps to reduce CPU consumption (mainly due to

## Q:  Why there is no difference between the parameter `shuffle` in `GeneratorDataset`, and `shuffle=True` and `shuffle=False` when the task is run?

- A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `getitem` method). If data is returned in `yeild` mode in the user-defined `Dataset`, random access is not supported. For details, see section [Loading Dataset Overview](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html) in the tutorial.
+ A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `getitem` method). If data is returned in `yeild` mode in the user-defined `Dataset`, random access is not supported. For details, see section [GeneratorDataset example](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html).

<br/>

@@ -160,9 +160,7 @@ A: You can refer to the usage of YOLOv3 which contains the resizing of different

A: [build_seg_data.py](https://gitee.com/mindspore/models/blob/master/research/cv/FCN8s/src/data/build_seg_data.py) is the script of MindRecords generated by the dataset. You can directly use or adapt it to your dataset. Alternatively, you can use `GeneratorDataset` to customize the dataset loading if you want to implement the dataset reading by yourself.

- [GenratorDataset example](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html)
-
- [GeneratorDataset API description](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html#mindspore.dataset.GeneratorDataset)
+ [GeneratorDataset example](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html)

<br/>

@@ -191,7 +189,7 @@ ds.GeneratorDataset(..., num_shards=8, shard_id=7, ...)
A: The data schema can be defined as follows:`cv_schema_json = {"label": {"type": "int32", "shape": [-1]}, "data": {"type": "bytes"}}`

Note: A label is an array of the numpy type, where label values 1, 1, 0, 1, 0, 1 are stored. These label values correspond to the same data, that is, the binary value of the same image.
- For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/tutorials/en/master/advanced/dataset/record.html#converting-dataset-to-mindrecord).
+ For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/docs/en/master/model_train/dataset/record.html#Converting-Dataset-to-Record-Format).

<br/>

@@ -203,7 +201,7 @@ A: The MNIST gray scale image dataset is used for MindSpore training. Therefore,

## Q: Can you introduce the data processing framework in MindSpore?

- A: MindSpore Dataset module makes it easy for users to define data preprocessing pipelines and transform samples efficiently with multiprocessing or multithreading. MindSpore Dataset also provides variable APIs for users to load and process datasets, more introduction please refer to [MindSpore Dataset](https://mindspore.cn/docs/en/master/api_python/mindspore.dataset.html#introduction-to-data-processing-pipeline). If you want to further study the performance optimization of dataset pipeline, please read [Optimizing Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html).
+ A: MindSpore Dataset module makes it easy for users to define data preprocessing pipelines and transform samples efficiently with multiprocessing or multithreading. MindSpore Dataset also provides variable APIs for users to load and process datasets, more introduction please refer to [MindSpore Dataset](https://mindspore.cn/docs/en/master/api_python/mindspore.dataset.html#introduction-to-data-processing-pipeline). If you want to further study the performance optimization of dataset pipeline, please read [Optimizing Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html).

<br/>

@@ -314,7 +312,7 @@ dataset3 = dataset2.map(***)

## Q: What is the API corresponding to DataLoader in MindSpore?

- A: If the DataLoader is considered as an API for receiving user-defined datasets, the GeneratorDataset in the MindSpore data processing API is similar to that in the DataLoader and can receive user-defined datasets. For details about how to use the GeneratorDataset, see the [Loading Dataset Overview](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html), and for details about the differences, see the [API Mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html).
+ A: If the DataLoader is considered as an API for receiving user-defined datasets, the GeneratorDataset in the MindSpore data processing API is similar to that in the DataLoader and can receive user-defined datasets. For details about how to use the GeneratorDataset, see the [GeneratorDataset example](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html), and for details about the differences, see the [API Mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html).

<br/>

@@ -500,7 +498,7 @@ A: When using the data sinking mode (where `data preprocessing` -> `sending queu
2022-05-09-11:36:01.893.412 -> 2022-05-09-11:36:02.006.771
```

- Improvement method: View the time difference between the last item of `push_end_time` and GetNext error reporting time. If the default GetNext timeout is exceeded (default: 1900s, and can be modified through `mindspore.set_context(op_timeout=xx)`), it indicates poor data preprocessing performance. Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) to improve data preprocessing performance.
+ Improvement method: View the time difference between the last item of `push_end_time` and GetNext error reporting time. If the default GetNext timeout is exceeded (default: 1900s, and can be modified through `mindspore.set_context(op_timeout=xx)`), it indicates poor data preprocessing performance. Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) to improve data preprocessing performance.

4. When the log output is similar to the following, it indicates that data preprocessing has generated 182 batches of data and the 183st batch of data is being sent to the device. And the `device_queue` shows that there is sufficient data cache on the device side.

@@ -550,7 +548,7 @@ A: When using the data sinking mode (where `data preprocessing` -> `sending queu
2022-05-09-14:31:04.064.571 ->
```

- Improvement method: Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) to improve data preprocessing performance.
+ Improvement method: Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) to improve data preprocessing performance.

<br/>



docs/mindspore/source_en/faq/distributed_parallel.md (+1 -1)

@@ -49,7 +49,7 @@ Solution: Manually `kill` the training process and then restart the training tas
[CRITICAL] DISTRIBUTED [mindspore/ccsrc/distributed/cluster/cluster_context.cc:130] InitNodeRole] Role name is invalid...
```

- A: In the case where the user does not start the process using `mpirun` but still calls the `init()` method, MindSpore requires the user to configure several environment variables and verify according to training and [dynamic cluster startup methods](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/dynamic_cluster.html). If without configuring, MindSpore may display the above error message. Therefore, it is suggested that only when performing distributed training, `mindspore.communication.init` is called, and in the case of not using `mpirun`, it is configured the correct environment variables according to the documentation to start distributed training.
+ A: In the case where the user does not start the process using `mpirun` but still calls the `init()` method, MindSpore requires the user to configure several environment variables and verify according to training and [dynamic cluster startup methods](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/dynamic_cluster.html). If without configuring, MindSpore may display the above error message. Therefore, it is suggested that only when performing distributed training, `mindspore.communication.init` is called, and in the case of not using `mpirun`, it is configured the correct environment variables according to the documentation to start distributed training.

<br/>



docs/mindspore/source_en/faq/feature_advice.md (+1 -1)

@@ -50,7 +50,7 @@ A: The formats of `ckpt` of MindSpore and `ckpt`of TensorFlow are not generic.

## Q: How do I use models trained by MindSpore on Atlas 200/300/500 inference product? Can they be converted to models used by HiLens Kit?

- A: Yes. HiLens Kit uses Atlas 200/300/500 inference product as the inference core. Therefore, the two questions are essentially the same, which both need to convert as OM model. Atlas 200/300/500 inference product requires a dedicated OM model. Use MindSpore to export the ONNX and convert it into an OM model supported by Atlas 200/300/500 inference product. For details, see [Multi-platform Inference](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html).
+ A: Yes. HiLens Kit uses Atlas 200/300/500 inference product as the inference core. Therefore, the two questions are essentially the same, which both need to convert as OM model. Atlas 200/300/500 inference product requires a dedicated OM model. Use MindSpore to export the ONNX and convert it into an OM model supported by Atlas 200/300/500 inference product. For details, see [Multi-platform Inference](https://www.mindspore.cn/docs/en/master/model_infer/overview.html).

<br/>



docs/mindspore/source_en/faq/implement_problem.md (+1 -1)

@@ -243,7 +243,7 @@ print(network.layers)

## Q: When MindSpore is used for model training, there are four input parameters for `CTCLoss`: `inputs`, `labels_indices`, `labels_values`, and `sequence_length`. How do I use `CTCLoss` for model training?

- A: The `dataset` received by the defined `model.train` API can consist of multiple pieces of data, for example, (`data1`, `data2`, `data3`, ...). Therefore, the `dataset` can contain `inputs`, `labels_indices`, `labels_values`, and `sequence_length` information. You only need to define the dataset in the corresponding format and transfer it to `model.train`. For details, see [Data Processing API](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html).
+ A: The `dataset` received by the defined `model.train` API can consist of multiple pieces of data, for example, (`data1`, `data2`, `data3`, ...). Therefore, the `dataset` can contain `inputs`, `labels_indices`, `labels_values`, and `sequence_length` information. You only need to define the dataset in the corresponding format and transfer it to `model.train`. For details, see [Data Processing API](https://www.mindspore.cn/docs/en/master/model_train/index.html).

<br/>



docs/mindspore/source_en/faq/index.rst (+17 -0)

@@ -0,0 +1,17 @@
FAQ
========

.. toctree::
:glob:
:maxdepth: 1

installation
data_processing
implement_problem
network_compilation
operators_compile
performance_tuning
precision_tuning
distributed_parallel
inference
feature_advice

docs/mindspore/source_en/faq/network_compilation.md (+3 -3)

@@ -4,7 +4,7 @@

## Q: What is the set of syntaxes supported by static graph mode?

- A: Static graph mode can support a subset of common Python syntax to support the construction and training of neural networks. Some Python syntax is not supported yet. For more detailed supported syntax set, please refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html). In order to facilitate users to choose whether to extend the static graph syntax, the static graph mode provides JIT syntax support level options. For some network scenarios, it is recommended to use basic syntax (nn/ops, etc.) rather than extended syntax (such as numpy third-party library). In addition, it is recommended to use [Advanced Programming Techniques with Static Graphs](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html) to optimize compilation performance.
+ A: Static graph mode can support a subset of common Python syntax to support the construction and training of neural networks. Some Python syntax is not supported yet. For more detailed supported syntax set, please refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html). In order to facilitate users to choose whether to extend the static graph syntax, the static graph mode provides JIT syntax support level options. For some network scenarios, it is recommended to use basic syntax (nn/ops, etc.) rather than extended syntax (such as numpy third-party library). In addition, it is recommended to use [Advanced Programming Techniques with Static Graphs](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html) to optimize compilation performance.

<br/>

@@ -531,7 +531,7 @@ net = Net()
out = net(Tensor(x))
```

- 3) If a function decorated with a @jit decorator is called in a custom class, an error will be reported. In this scenario, it is recommended to add @jit_class decorators to custom classes in the network and avoid the JIT Fallback feature. For more use of custom classes, please refer to [Supporting the Use of Custom Classes](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-the-use-of-custom-classes). The use of jit_class decorators can be referred to [Use jit_class](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html#using-jit-class).
+ 3) If a function decorated with a @jit decorator is called in a custom class, an error will be reported. In this scenario, it is recommended to add @jit_class decorators to custom classes in the network and avoid the JIT Fallback feature. For more use of custom classes, please refer to [Supporting the Use of Custom Classes](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html#supporting-the-use-of-custom-classes). The use of jit_class decorators can be referred to [Use jit_class](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html#using-jit-class).

```python
import mindspore as ms
@@ -772,7 +772,7 @@ A: The following scenarios will trigger recompilation:

## Q: How to determine how many graphs there are in static graph mode? When will the subgraph be divided? What is the impact of multiple subgraphs? How to avoid multiple subgraphs?

- A: 1. The number of subgraphs can be obtained by viewing the IR file and searching for "Total subgraphs". For how to view and analyze IR files, please refer to [MindSpore IR Introduction](https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/mindir.html)
+ A: 1. The number of subgraphs can be obtained by viewing the IR file and searching for "Total subgraphs". For how to view and analyze IR files, please refer to [MindSpore IR Introduction](https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/mindir.html)

2. Subgraph segmentation in static graph mode is common in control flow scenarios, such as if/while. In addition to manual writing by users, the control flow syntax within the MindSpore may also lead to dividing into multiple subgraphs.



docs/mindspore/source_en/faq/operators_compile.md (+2 -2)

@@ -59,7 +59,7 @@ In MindSpore, you can manually initialize the weight corresponding to the `paddi

## Q: When the `Tile` operator in operations executes `__infer__`, the `value` is `None`. Why is the value lost?

- A: The `multiples input` of the `Tile` operator must be a constant (The value cannot directly or indirectly come from the input of the graph). Otherwise, the `None` data will be obtained during graph composition because the graph input is transferred only during graph execution and the input data cannot be obtained during graph composition. For the detailed imformation, refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
+ A: The `multiples input` of the `Tile` operator must be a constant (The value cannot directly or indirectly come from the input of the graph). Otherwise, the `None` data will be obtained during graph composition because the graph input is transferred only during graph execution and the input data cannot be obtained during graph composition. For the detailed imformation, refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html).

<br/>

@@ -71,7 +71,7 @@ A: TBE (Tensor Boost Engine) operator is Huawei's self-developed Ascend operator

## Q: Has MindSpore implemented the anti-pooling operation similar to `nn.MaxUnpool2d`?

- A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/tutorials/experts/en/master/operation/op_custom.html).
+ A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html).

<br/>



docs/mindspore/source_en/index.rst (+9 -262)

@@ -1,270 +1,17 @@
.. MindSpore documentation master file, created by
sphinx-quickstart on Thu Mar 24 11:00:00 2022.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

MindSpore Documentation
=======================

.. toctree::
:glob:
:maxdepth: 1
:caption: Design
:hidden:

design/overview
design/tensor_view
design/programming_paradigm
design/dynamic_graph_and_static_graph
design/dynamic_shape
design/distributed_training_design
design/data_engine
design/all_scenarios
design/graph_fusion_engine
design/pluggable_device
design/glossary

.. toctree::
:glob:
:maxdepth: 1
:caption: Models
:hidden:

note/official_models

.. toctree::
:glob:
:maxdepth: 1
:caption: API
:hidden:

api_python/mindspore
api_python/mindspore.nn
api_python/mindspore.ops
api_python/mindspore.ops.primitive
api_python/mindspore.mint
api_python/mindspore.amp
api_python/mindspore.train
api_python/mindspore.communication
api_python/mindspore.communication.comm_func
api_python/mindspore.common.initializer
api_python/mindspore.hal
api_python/mindspore.dataset
api_python/mindspore.dataset.transforms
api_python/mindspore.mindrecord
api_python/mindspore.nn.probability
api_python/mindspore.rewrite
api_python/mindspore.multiprocessing
api_python/mindspore.boost
api_python/mindspore.numpy
api_python/mindspore.scipy
api_python/mindspore.experimental

.. toctree::
:glob:
:maxdepth: 1
:caption: API Mapping
:hidden:

note/api_mapping/pytorch_api_mapping

.. toctree::
:glob:
:maxdepth: 1
:caption: Migration Guide
:titlesonly:
:hidden:

migration_guide/overview
migration_guide/enveriment_preparation
migration_guide/analysis_and_preparation
migration_guide/model_development/model_development
migration_guide/debug_and_tune
migration_guide/sample_code
migration_guide/faq

.. toctree::
:glob:
:maxdepth: 1
:caption: Syntax Support
:hidden:

note/static_graph_syntax_support
note/static_graph_syntax/operators
note/static_graph_syntax/statements
note/static_graph_syntax/python_builtin_functions
note/index_support

.. toctree::
:glob:
:maxdepth: 1
:caption: Environment Variables
:hidden:

note/env_var_list

.. toctree::
:glob:
:maxdepth: 1
:caption: FAQ
:hidden:

faq/installation
faq/data_processing
faq/implement_problem
faq/network_compilation
faq/operators_compile
faq/performance_tuning
faq/precision_tuning
faq/distributed_parallel
faq/inference
faq/feature_advice

.. toctree::
:glob:
:maxdepth: 1
:caption: RELEASE NOTES
:hidden:

design/index
model_train/index
model_infer/index
migration_guide/index
mindformers/index
api_python/index
orange_pi/index
kits_tools/index
faq/index
RELEASE

.. raw:: html

<div class="container">
<div class="row">
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./design/overview.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">Design</span>
</div>
<div class="doc-article-desc">
The design concept of MindSpore's main functions to help framework developers better understand the overall architecture.
</div>
</div>
</a>
</div>
</div>
</div>
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./note/official_models.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">Model Libraries</span>
</div>
<div class="doc-article-desc">
Contains model examples and performance data for different domains.
</div>
</div>
</a>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./api_python/mindspore.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">API</span>
</div>
<div class="doc-article-desc">
MindSpore API description list.
</div>
</div>
</a>
</div>
</div>
</div>
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./note/api_mapping/pytorch_api_mapping.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">API Mapping</span>
</div>
<div class="doc-article-desc">
API mapping between PyTorch and MindSpore provided by the community.
</div>
</div>
</a>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./migration_guide/overview.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">Migration Guide</span>
</div>
<div class="doc-article-desc">
The complete steps and considerations for migrating neural networks from other machine learning frameworks to MindSpore.
</div>
</div>
</a>
</div>
</div>
</div>
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./note/static_graph_syntax_support.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">Syntax Support</span>
</div>
<div class="doc-article-desc">
Syntax support for static graphs, Tensor indexes, etc.
</div>
</div>
</a>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./faq/installation.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">FAQ</span>
</div>
<div class="doc-article-desc">
Frequently asked questions and answers, including installation, data processing, compilation and execution, debugging and tuning, distributed parallelism, inference, etc.
</div>
</div>
</a>
</div>
</div>
</div>
<div class="col-md-6">
<div class="doc-article-list">
<div class="doc-article-item">
<a href="./RELEASE.html" class="article-link">
<div>
<div class="doc-article-head">
<span class="doc-head-content">RELEASE NOTES</span>
</div>
<div class="doc-article-desc">
Contains information on major features and augments, API changes for the release versions.
</div>
</div>
</a>
</div>
</div>
</div>
</div>
</div>

docs/mindspore/source_en/kits_tools/index.rst (+9 -0)

@@ -0,0 +1,9 @@
Models and Kits
=================

.. toctree::
:glob:
:maxdepth: 1

overview
official_models

docs/mindspore/source_en/note/official_models.md → docs/mindspore/source_en/kits_tools/official_models.md

@@ -1,6 +1,6 @@
# Official Models

- [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/note/official_models.md)
+ [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/kits_tools/official_models.md)

## Domain Suite and Extension Packages

@@ -334,4 +334,4 @@
| Computational Fluid Dynamics | [PDE-Net](https://arxiv.org/abs/1710.09668) | [Link](https://gitee.com/mindspore/mindscience/blob/master/MindFlow/applications/data_mechanism_fusion/pde_net/README.md#) | ✅ | ✅ |
| Computational Fluid Dynamics | [hfm](https://www.science.org/doi/abs/10.1126/science.aaw4741) | [Link](https://gitee.com/mindspore/mindscience/blob/master/SciAI/sciai/model/hfm/README.md) | ✅ | ✅ |
| Computational Fluid Dynamics | [label_free_dnn_surrogate](https://www.sciencedirect.com/science/article/pii/S004578251930622X) | [Link](https://gitee.com/mindspore/mindscience/blob/master/SciAI/sciai/model/label_free_dnn_surrogate/README.md) | ✅ | ✅ |
| Computational Fluid Dynamics | [nsf_nets](https://www.sciencedirect.com/science/article/pii/S0021999120307257) | [Link](https://gitee.com/mindspore/mindscience/blob/master/SciAI/sciai/model/nsf_nets/README.md) | ✅ | ✅ |

docs/mindspore/source_en/kits_tools/overview.md (+18 -0)

@@ -0,0 +1,18 @@
# Models and Kits

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/kits_tools/overview.md)

| Core Frameworks | Toolset |Domain Suites and Extension Packages | Scientific Computing | Foundation Model |
| ---- | ----- | ------- | --------- | ------------- |
| [MindSpore](https://www.mindspore.cn/docs/en/master/index.html) | [MindSpore Insight](https://www.mindspore.cn/mindinsight/docs/en/r2.3/index.html) | [MindSpore CV](https://mindspore-lab.github.io/mindcv/) | [MindSpore SciAI](https://www.mindspore.cn/sciai/docs/en/r0.1/index.html) | [MindSpore Transformers](https://mindformers.readthedocs.io/en/latest/)|
| [MindSpore Lite](https://www.mindspore.cn/lite) | [MindSpore Armour](https://www.mindspore.cn/mindarmour/docs/en/r2.0/index.html) | [MindSpore NLP](https://mindnlp.cqu.ai/) | [MindSpore Elec](https://www.mindspore.cn/mindelec/docs/en/r0.2/index.html) | [MindSpore Pet](https://github.com/mindspore-lab/mindpet)|
| [MindSpore AKG](https://gitee.com/mindspore/akg) | [MindSpore Serving](https://www.mindspore.cn/serving/docs/en/r2.0/index.html) | [MindSpore Audio](https://github.com/mindspore-lab/mindaudio) | [MindSpore SPONGE](https://www.mindspore.cn/mindsponge/docs/en/r1.0.0-rc2/index.html) | [MindSpore RLHF](https://github.com/mindspore-lab/mindrlhf)|
| | [MindSpore Federated](https://www.mindspore.cn/federated/docs/en/r0.1/index.html) | [MindSpore OCR](https://mindspore-lab.github.io/mindocr/) | [MindSpore Flow](https://www.mindspore.cn/mindflow/docs/en/r0.2/index.html) | [MindSpore One](https://github.com/mindspore-lab/mindone)|
| | [MindSpore Golden Stick](https://www.mindspore.cn/golden_stick/docs/en/r0.4/index.html) | [MindSpore YOLO](https://mindspore-lab.github.io/mindyolo/) | [MindSpore Earth](https://www.mindspore.cn/mindearth/docs/en/r0.2/index.html) | [MindSpore Recommender](https://www.mindspore.cn/recommender/docs/en/r0.3/index.html)|
| | [MindSpore XAI](https://www.mindspore.cn/xai/docs/en/r1.8/index.html) | [MindSpore Face](https://github.com/mindspore-lab/mindface) | [MindSpore Quantum](https://www.mindspore.cn/mindquantum/docs/en/r0.9/index.html)| |
| | [MindSpore Dev Toolkits](https://www.mindspore.cn/devtoolkit/docs/en/r2.2/index.html) | [MindSpore Graph Learning](https://www.mindspore.cn/graphlearning/docs/en/r0.2/index.html)| | |
| | | [MindSpore Reinforcement](https://www.mindspore.cn/reinforcement/docs/en/r0.7/index.html)| | |
| | | [MindSpore Probability](https://www.mindspore.cn/probability/docs/en/r1.7/index.html)| | |
| | | [MindSpore Pandas](https://www.mindspore.cn/mindpandas/docs/en/r0.2/index.html)| | |
| | | [MindSpore ModelZoo](https://gitee.com/mindspore/models)| | |
| | | [MindSpore Hub](https://www.mindspore.cn/hub/docs/en/r1.9/index.html)| | |

+ 7
- 7
docs/mindspore/source_en/migration_guide/analysis_and_preparation.md View File

@@ -99,11 +99,11 @@ See [TroubleShooter application scenarios](https://gitee.com/mindspore/toolkits/

MindSpore provides a Dump function that saves the graphs from model training and the input and output data of operators to disk files. It is generally used to locate complex problems during network migration (e.g., operator overflow) and can dump operator-level data.

To collect Dump data, refer to: [Synchronous Dump Step](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#dump-step) and [Asynchronous Dump Step](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#dump-step-1).
To collect Dump data, refer to: [Synchronous Dump Step](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#dump-step) and [Asynchronous Dump Step](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#dump-step-1).

For analyzing Dump data, refer to: [Synchronous Dump Data Analysis Sample](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#data-analysis-sample) and [Asynchronous Dump Data Analysis Sample](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#data-analysis-sample-1).
For analyzing Dump data, refer to: [Synchronous Dump Data Analysis Sample](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#data-analysis-sample) and [Asynchronous Dump Data Analysis Sample](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#data-analysis-sample-1).

See [Dump](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html) for details.
See [Dump](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html) for details.
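The Dump pipeline described above is driven by a JSON configuration file whose path is supplied through the `MINDSPORE_DUMP_CONFIG` environment variable. A minimal sketch of preparing such a file in Python follows; the key names mirror the synchronous-Dump layout in the linked documentation and should be verified against your MindSpore version:

```python
import json
import os
import tempfile

# Sketch of a synchronous Dump configuration (key names follow the linked
# Dump documentation; verify them against your MindSpore version).
dump_config = {
    "common_dump_settings": {
        "dump_mode": 0,          # 0: dump all operators
        "path": "/tmp/ms_dump",  # absolute path for the dumped data
        "net_name": "ResNet50",
        "iteration": "0",        # which training steps to dump
        "saved_data": "tensor",
        "input_output": 0,       # 0: dump both inputs and outputs
        "kernels": [],
        "support_device": [0, 1, 2, 3, 4, 5, 6, 7],
    }
}

config_path = os.path.join(tempfile.gettempdir(), "dump_config.json")
with open(config_path, "w") as f:
    json.dump(dump_config, f, indent=4)

# MindSpore reads the config path from this environment variable,
# which must be set before the training process starts.
os.environ["MINDSPORE_DUMP_CONFIG"] = config_path
```

After training, the dumped operator data appears under the configured `path`, organized per device and iteration as described in the data analysis samples linked above.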

### Performance Issues

@@ -133,7 +133,7 @@ Currently, there are two execution modes of a mainstream deep learning framework

- In dynamic graph mode, the program is executed line by line in the order the code is written. During forward execution, the backward execution graph is dynamically generated according to the backward propagation principle. In this mode, the compiler delivers the operators in the neural network to the device one by one for computing, making it easy for users to build and debug the neural network model.

### [Calling the Custom Class](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html#using-jit-class)
### [Calling the Custom Class](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html#using-jit-class)

In static graph mode, you can use `jit_class` to decorate a custom class. You can then create and call an instance of the custom class, and obtain its attributes and methods.

@@ -143,15 +143,15 @@ In static graph mode, you can use `jit_class` to modify a custom class. You can

Automatic differentiation can calculate the derivative value of a function at a certain point and is a generalization of the backward propagation algorithm. The main problem solved by automatic differentiation is decomposing a complex mathematical operation into a series of simple basic operations. This function shields users from a large number of differentiation details and processes, greatly lowering the barrier to using the framework.
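The decomposition into basic operations, each carrying its own derivative rule, can be illustrated with a toy forward-mode sketch using dual numbers (plain Python, unrelated to MindSpore's actual reverse-mode implementation):

```python
class Dual:
    """Dual number (value, derivative): each basic operation carries
    its own derivative rule, so the derivative of a composite function
    emerges from chaining simple rules."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative part with 1."""
    return f(Dual(x, 1.0)).dot


# d/dx (x*x + 3x) at x = 2 is 2x + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

Every user-visible derivative a framework returns is assembled from exactly such per-operation rules; the framework's job is to hide the bookkeeping.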

### [Mixed Precision](https://www.mindspore.cn/tutorials/en/master/advanced/mixed_precision.html)
### [Mixed Precision](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html)

Generally, when a neural network model is trained, the default data type is FP32. In recent years, to accelerate training, reduce the memory occupied during network training, and store a trained model with the same precision, more and more mixed-precision training methods have been proposed in the industry. Mixed-precision training here means that both single precision (FP32) and half precision (FP16) are used in the training process.
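Why FP16 training is usually paired with a loss scale can be seen from a small numeric sketch (pure Python; FP16 storage is crudely approximated by flushing magnitudes below the smallest FP16 subnormal to zero, ignoring mantissa rounding):

```python
def to_fp16(x):
    # Crude stand-in for FP16 storage: flush magnitudes below the
    # smallest FP16 subnormal (2**-24, about 6e-8) to zero.
    return 0.0 if abs(x) < 2 ** -24 else x

grad = 1e-8                     # a gradient that underflows in FP16
scale = 1024.0                  # loss scale factor (a power of two)

naive = to_fp16(grad)           # stored directly: lost to underflow
scaled = to_fp16(grad * scale)  # scaling the loss scales gradients too
recovered = scaled / scale      # unscale in FP32 before the update

print(naive, recovered)         # 0.0 1e-08
```

Power-of-two scale factors are preferred because multiplying and dividing by them is exact in binary floating point, so unscaling introduces no extra error.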

### [Auto Augmentation](https://www.mindspore.cn/tutorials/experts/en/master/dataset/augment.html)
### [Auto Augmentation](https://www.mindspore.cn/docs/en/master/model_train/dataset/augment.html)

MindSpore not only allows you to customize data augmentation, but also provides an automatic data augmentation mode to automatically perform data augmentation on images based on specific policies.

### [Gradient Accumulation](https://www.mindspore.cn/tutorials/experts/en/master/optimize/gradient_accumulation.html)
### [Gradient Accumulation](https://www.mindspore.cn/docs/en/master/model_train/train_process/optimize/gradient_accumulation.html)

Gradient accumulation is a method of splitting a training batch of data samples into several micro-batches and computing them in sequence, accumulating the gradients before applying a single update. Its purpose is to solve the out-of-memory (OOM) problem where a neural network cannot be trained, or a model cannot be loaded, due to insufficient memory.
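A pure-Python sketch of the idea, using a toy linear model: accumulating appropriately weighted micro-batch gradients reproduces the full-batch gradient exactly, while only one micro-batch needs to be in memory at a time:

```python
def grad_fn(w, batch):
    # Gradient of mean squared error for y = w*x over one batch of (x, y).
    n = len(batch)
    return sum(2 * (w * x - y) * x for x, y in batch) / n

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

# Full-batch gradient in one shot.
full = grad_fn(w, data)

# Same gradient accumulated over micro-batches of size 2: each
# micro-batch gradient is weighted by its share of the samples, and
# the weight update would happen only after all micro-batches.
micro = 2
acc = 0.0
for i in range(0, len(data), micro):
    chunk = data[i:i + micro]
    acc += grad_fn(w, chunk) * len(chunk) / len(data)

print(full, acc)  # identical gradients, smaller peak memory per step
```

The same weighting logic underlies framework-level gradient accumulation; the framework just manages the accumulator buffers and the deferred optimizer step for you.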



+ 9
- 17
docs/mindspore/source_en/migration_guide/debug_and_tune.md View File

@@ -1,8 +1,8 @@
# Debugging and Tuning
# Debug and Tune

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/migration_guide/debug_and_tune.md)

## Debugging Tools
## FAQs and Solutions

- The following common problems may be encountered during the accuracy debugging phase:
- The first loss and the benchmark are not aligned:
@@ -11,16 +11,8 @@
The problem is mainly caused by the network backward pass. You can use [TroubleShooter comparing MindSpore to PyTorch ckpt/pth](https://gitee.com/mindspore/toolkits/blob/master/troubleshooter/docs/migrator.md#%E5%BA%94%E7%94%A8%E5%9C%BA%E6%99%AF2%E6%AF%94%E5%AF%B9mindspore%E4%B8%8Epytorch%E7%9A%84ckptpth) to check the results of the backward updates by comparing the values of the corresponding parameters in the ckpt and pth files.
- Loss appears NAN/INF:
[TroubleShooter obtains INF/NAN value throw points](https://gitee.com/mindspore/toolkits/blob/master/troubleshooter/docs/tracker.md#%E5%BA%94%E7%94%A8%E5%9C%BA%E6%99%AF2%E8%8E%B7%E5%8F%96infnan%E5%80%BC%E6%8A%9B%E5%87%BA%E7%82%B9) is used to identify the first location in the network where a NAN or INF appears.
Overflow operator detection is also available via the [Dump](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html) tool.
- The following common problems may be encountered during the performance debugging phase:
- The first step is time-consuming
This phase mainly completes operations such as graph conversion, graph fusion, graph optimization, etc, which is the process of generating executable models. Refer to [How to Optimize Compilation Performance](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html#how-to-optimize-compilation-performance).
- Iteration gap is time-consuming
Most of the time consumption in this phase comes from data acquisition, see [Data Processing Performance Optimization](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html).
- Forward and reverse computation is time-consuming
This phase mainly executes the forward and reverse operators in the network and carries the main computational work of an iteration. Information such as operator time consumption during training can be recorded to a file via [Profiler](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling.html). The performance data provides the performance data of the framework host execution and operator execution, which can also be viewed and analyzed by users through the [MindInsight](https://www.mindspore.cn/mindinsight/docs/en/master/index.html) visualization interface, helping users to debug neural network performance more efficiently.
- Iteration trailing is time-consuming
This phase is time consuming, which may be caused by the collection communication, and you can set the fusion policy to optimize. Refer to [all_reduce_fusion_config set allreduce fusion policy](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.set_auto_parallel_context.html).
Overflow operator detection is also available via the [Dump](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html) tool.

- The following common problems may be encountered during the graph execution debugging phase:
- Malloc device memory failed:
MindSpore failed to allocate memory on the device side. The usual cause is that device memory is occupied by other processes; you can check the running processes with `ps -ef | grep "python"`.
@@ -32,9 +24,9 @@
### Function Debugging

During network migration, you are advised to use the PyNative mode for debugging. In PyNative mode, you can perform step-by-step debugging, and log printing is user-friendly. After debugging is complete, switch to graph mode, which delivers better execution performance. Graph mode can also expose some problems in network compilation, for example, gradient truncation caused by third-party operators.
For details, see [Error Analysis](https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/error_scenario_analysis.html).
For details, see [Error Analysis](https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/error_scenario_analysis.html).

### Accuracy Debugging
### Precision Tuning

The accuracy debugging process is as follows:

@@ -201,7 +193,7 @@ After the inference verification is complete, the basic model, data processing,
model = Model(network=train_net)
```

- Check whether overflow occurs. When loss scale is added, overflow detection is added by default to monitor the overflow result. If overflow occurs continuously, you are advised to use the [dump data](https://mindspore.cn/tutorials/experts/en/master/debug/dump.html) of MindSpore Insight to check why overflow occurs.
- Check whether overflow occurs. When loss scale is added, overflow detection is added by default to monitor the overflow result. If overflow occurs continuously, you are advised to use the [dump data](https://mindspore.cn/docs/en/master/model_train/debug/dump.html) of MindSpore Insight to check why overflow occurs.

```python
import numpy as np
@@ -290,7 +282,7 @@ If you find an operator with poor performance, you are advised to contact [MindS

The mixed precision training method accelerates the deep neural network training process by mixing the single-precision floating-point format and the half-precision floating-point format without compromising network accuracy. Mixed precision training can accelerate computation, reduce memory usage and access, and enable a larger model or batch size to be trained on specific hardware.

For details, see [Mixed Precision Tutorial](https://www.mindspore.cn/tutorials/en/master/advanced/mixed_precision.html).
For details, see [Mixed Precision Tutorial](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html).

- Enabling Graph Kernel Fusion

@@ -343,4 +335,4 @@ When the data processing speed is slow, the empty queue is gradually consumed fr

For details about data performance problems, see [Data Preparation Performance Analysis](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) of MindSpore Insight. This describes common data performance problems and solutions.

For more performance debugging methods, see [Performance Optimization](https://www.mindspore.cn/tutorials/experts/en/master/optimize/execution_opt.html).
For more performance debugging methods, see [Performance Optimization](https://www.mindspore.cn/docs/en/master/model_train/train_process/train_optimize.html).

+ 15
- 12
docs/mindspore/source_en/migration_guide/faq.rst View File

@@ -67,35 +67,35 @@ MindSpore provides a `FAQ <https://mindspore.cn/docs/en/master/faq/installation.

**Q: When using GeneratorDataset or map to load/process data, there may be syntax errors, calculation overflow and other issues that cause data errors, how to troubleshoot and debug?**

A: Locate the failing code block from the error stack information, then add a print statement or a breakpoint near the block of code where the error occurred for further debugging. For details, please refer to `Data Processing Debugging Method 1 <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/minddata_debug.html#method-1-errors-in-data-processing-execution,-print-logs-or-add-debug-points-to-code-debugging>`_ .
A: Locate the failing code block from the error stack information, then add a print statement or a breakpoint near the block of code where the error occurred for further debugging. For details, please refer to `Data Processing Debugging Method 1 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-1-errors-in-data-processing-execution,-print-logs-or-add-debug-points-to-code-debugging>`_ .


**Q: How to test each data processing operator in the map operation if a data-augmentation map operation reports an error?**

A: The map operation can be debugged by executing individual operators or through the data pipeline debugging mode. For details, please refer to `Data Processing Debugging Method 2 <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/minddata_debug.html#method-2-data-enhanced-map-operation-error,-testing-the-each-data-processing-operator-in-the-map-operation>`_ .
A: The map operation can be debugged by executing individual operators or through the data pipeline debugging mode. For details, please refer to `Data Processing Debugging Method 2 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-2-data-enhanced-map-operation-error,-testing-the-each-data-processing-operator-in-the-map-operation>`_ .


**Q: During training, we get many WARNINGs suggesting that our dataset performance is slow. How should we handle this?**

A: You can iterate through the dataset individually and check the processing time for each piece of data to determine how well the dataset performs. For details, please refer to `Data Processing Debugging Method 3 <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/minddata_debug.html#method-3-testing-data-processing-performance>`_ .
A: You can iterate through the dataset individually and check the processing time for each piece of data to determine how well the dataset performs. For details, please refer to `Data Processing Debugging Method 3 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-3-testing-data-processing-performance>`_ .


**Q: In the process of processing data, if abnormal result values are generated due to computational errors, numerical overflow, etc., resulting in operator computation overflow and weight update anomalies during network training, how should we troubleshoot them?**

A: Turn off shuffling and fix random seeds to ensure reproducibility, and then use tools such as NumPy to quickly verify the results. For details, please refer to `Data Processing Debugging Method 4 <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/minddata_debug.html#method-4-checking-for-exception-data-in-data-processing>`_ .
A: Turn off shuffling and fix random seeds to ensure reproducibility, and then use tools such as NumPy to quickly verify the results. For details, please refer to `Data Processing Debugging Method 4 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-4-checking-for-exception-data-in-data-processing>`_ .
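The first half of this recipe, fixing seeds so an anomalous value can be replayed, can be sketched in plain Python (in a real pipeline you would also fix the seeds of NumPy and MindSpore and disable dataset shuffling):

```python
import random

def make_samples(seed, n=5):
    # Stand-in for a data pipeline: with the seed fixed and shuffling
    # disabled, every run yields identical data, so an anomalous value
    # can be reproduced and inspected in isolation.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

run1 = make_samples(seed=42)
run2 = make_samples(seed=42)
assert run1 == run2  # identical across runs: the anomaly can be replayed
```

Once runs are bit-identical, the suspect sample index stays the same between runs, and the offending transform can be re-executed on just that sample with NumPy for verification.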


For more common data processing problems, please refer to `Analyzing Common Data Processing Problems <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/minddata_debug.html#analyzing-common-data-processing-problems>`_ , and for differences in data processing during migration, please refer to `Data Pre-Processing Differences Between MindSpore And PyTorch <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/dataset.html#comparison-of-data-processing-differences>`_ .
For more common data processing problems, please refer to `Analyzing Common Data Processing Problems <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#analyzing-common-data-processing-problems>`_ , and for differences in data processing during migration, please refer to `Data Pre-Processing Differences Between MindSpore And PyTorch <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/dataset.html#comparison-of-data-processing-differences>`_ .

- Gradient Derivation

**Q: How can I implement the backward computation of an operator?**

A: MindSpore provides an automated interface for gradient derivation, which shields the user from a great deal of the derivation details and process. However, in special scenarios where the user needs to manually control the backward computation, it can be defined through the Cell.bprop interface. For details, please refer to `Customize Cell reverse <https://www.mindspore.cn/tutorials/en/master/advanced/modules/layer.html#custom-cell-reverse>`_ .
A: MindSpore provides an automated interface for gradient derivation, which shields the user from a great deal of the derivation details and process. However, in special scenarios where the user needs to manually control the backward computation, it can be defined through the Cell.bprop interface. For details, please refer to `Customize Cell reverse <https://www.mindspore.cn/docs/en/master/model_train/custom_program/network_custom.html#custom-cell-reverse>`_ .

**Q: How to deal with training instability due to gradient overflow?**

A: Network overflow usually manifests as a NaN/INF loss or a loss that suddenly becomes very large. MindSpore provides `dump data <https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html>`_ to obtain information about the overflow operator. When there is gradient underflow in the network, loss scale can be used to support gradient derivation. For details, please refer to `loss scale <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html#loss-scale>`_; when the network has gradient explosion, consider adding gradient clipping. For details, please refer to `gradient cropping <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html#gradient-cropping>`_ .
A: Network overflow usually manifests as a NaN/INF loss or a loss that suddenly becomes very large. MindSpore provides `dump data <https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html>`_ to obtain information about the overflow operator. When there is gradient underflow in the network, loss scale can be used to support gradient derivation. For details, please refer to `loss scale <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html#loss-scale>`_; when the network has gradient explosion, consider adding gradient clipping. For details, please refer to `gradient cropping <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html#gradient-cropping>`_ .
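A minimal sketch of clipping by global norm, the usual remedy for gradient explosion (MindSpore exposes an equivalent operator, `mindspore.ops.clip_by_global_norm`; the plain-Python version below only shows the arithmetic):

```python
import math

def clip_by_global_norm(grads, clip_norm=1.0):
    """Scale all gradients down together when their global L2 norm
    exceeds clip_norm, preserving their relative directions."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= clip_norm:
        return grads
    scale = clip_norm / global_norm
    return [g * scale for g in grads]

exploding = [30.0, 40.0]  # global norm 50: gradient explosion
clipped = clip_by_global_norm(exploding, clip_norm=1.0)
print(clipped)  # scaled so that the global norm becomes 1.0
```

Clipping by the global norm (rather than per-element) keeps the update direction unchanged and only shrinks its magnitude.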

- Debugging and Tuning

@@ -134,11 +134,14 @@ MindSpore provides a `FAQ <https://mindspore.cn/docs/en/master/faq/installation.

`MindSpore Model Accuracy Tuning Practice (3): Common Accuracy Problems <https://www.hiascend.com/forum/thread-0235121941523411032-1-1.html>`_.

For more debugging and tuning FAQs, please refer to `Tuning FAQs and Solutions <https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#debugging-tools>`_ .
For more debugging and tuning FAQs, please refer to
`Function Debugging <https://www.mindspore.cn/docs/en/master/migration_guide/debug.html>`_,
`Precision Tuning <https://www.mindspore.cn/docs/en/master/migration_guide/acc_debug.html>`_,
`Performance Tuning <https://www.mindspore.cn/docs/en/master/migration_guide/perf_debug.html>`_ .

**Q: During model training, the first step takes a long time, how to optimize it?**

A: During the model training process, the first step contains the network compilation time. If you want to optimize the performance of the first step, you can analyze whether the model compilation can be optimized. For details, please refer to `Static graph network compilation performance optimization <https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html>`_.
A: During the model training process, the first step contains the network compilation time. If you want to optimize the performance of the first step, you can analyze whether the model compilation can be optimized. For details, please refer to `Static graph network compilation performance optimization <https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html>`_.

**Q: The non-first step takes a long time during model training, how to optimize it?**

@@ -176,7 +179,7 @@ MindSpore provides a `FAQ <https://mindspore.cn/docs/en/master/faq/installation.
loss = loss/response_gt
return loss

See `Static graph syntax support <https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html>`_ for details.
See `Static graph syntax support <https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html>`_ for details.

**Q: What can I do if the error `RuntimeError: Launch kernel failed, name:Default/...` is reported during training?**

@@ -190,7 +193,7 @@ MindSpore provides a `FAQ <https://mindspore.cn/docs/en/master/faq/installation.

A: There are many reasons for static graph errors, and the failure reason is generally printed in the log. If you cannot intuitively get the error information from the log, you can set `export GLOG_v=1` to raise the log verbosity and obtain more detailed information about the error.

Meanwhile, when the compilation of computational graphs reports errors, it will automatically save the file analyze_failed.ir, which can help to analyze the location of the error code. For more details, please refer to `Static Graph Mode Error Analysis <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/error_scenario_analysis.html>`_.
Meanwhile, when the compilation of computational graphs reports errors, it will automatically save the file analyze_failed.ir, which can help to analyze the location of the error code. For more details, please refer to `Static Graph Mode Error Analysis <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/error_scenario_analysis.html>`_.

**Q: Out Of Memory error is reported during Graph mode static graph training, what should I do?**

@@ -200,6 +203,6 @@ MindSpore provides a `FAQ <https://mindspore.cn/docs/en/master/faq/installation.
When there is not enough memory, try lowering the batch_size; analyze the memory to see if there are too many communication operators resulting in low overall memory reuse.
For more details, please refer to `Analysis of the problem of insufficient resources <https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/mindrt_debug.html#insufficient-resources>`_ .
For more details, please refer to `Analysis of the problem of insufficient resources <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/mindrt_debug.html#insufficient-resources>`_ .

See `Execution Issues <https://www.mindspore.cn/docs/en/master/faq/implement_problem.html>`_ for more tuning FAQs.

+ 14
- 0
docs/mindspore/source_en/migration_guide/index.rst View File

@@ -0,0 +1,14 @@
Model Migration
=========================

.. toctree::
:glob:
:maxdepth: 1

overview
enveriment_preparation
analysis_and_preparation
model_development/model_development
debug_and_tune
sample_code
reference

+ 1
- 1
docs/mindspore/source_en/migration_guide/migrator_with_tools.md View File

@@ -17,7 +17,7 @@ This guide describes how to apply various migration-related tools to improve the
| [MindSpore Dev Toolkit](https://www.mindspore.cn/devtoolkit/docs/en/master/index.html) | MindSpore Dev Toolkit is a development kit supporting cross-platform Python IDE plug-ins developed by MindSpore, providing functions such as project creation, intelligent code completion, API search, and document search. | Capabilities such as API search can improve the efficiency of users' network migration development. |
| [TroubleShooter](https://gitee.com/mindspore/toolkits/tree/master/troubleshooter) | TroubleShooter is a MindSpore network development and debugging toolkit designed to provide convenient, easy-to-use debugging capabilities. | A network debugging toolset (e.g., network weight migration, accuracy comparison, code tracing, error report analysis, and execution tracking) that helps users improve migration debugging efficiency. |
| [Profiler](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling.html) | Profiler can record information such as operator time consumption during the training process into a file, which can be viewed and analyzed by the user through a visual interface, helping the user to debug neural network performance more efficiently. | After the network migration, if the execution performance is not good, you can use Profiler to analyze the performance. Profiler provides Profiler analysis of the host execution of the framework, as well as the execution of the operator. |
| [Dump](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html) | The Dump function is provided to save the graphs from model training and the input and output data of the operators to a disk file. | Generally used to locate complex problems during network migration (e.g., operator overflow) and can dump operator-level data. |
| [Dump](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html) | The Dump function is provided to save the graphs from model training and the input and output data of the operators to a disk file. | Generally used to locate complex problems during network migration (e.g., operator overflow) and can dump operator-level data. |

## Examples of Network Migration Tool Applications



+ 1
- 1
docs/mindspore/source_en/migration_guide/missing_api_processing_policy.md View File

@@ -234,7 +234,7 @@ The final error is less than 1e-5, which is a reasonable accuracy error.

## 3. Customize operators

When existing APIs cannot be combined to implement the required functionality, or the performance of the Cell encapsulation is poor, you need to customize operators. For details, see [Custom Operators](https://www.mindspore.cn/tutorials/experts/en/master/operation/op_custom.html).
When existing APIs cannot be combined to implement the required functionality, or the performance of the Cell encapsulation is poor, you need to customize operators. For details, see [Custom Operators](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html).

In addition to migrating APIs, you can also use the `aot` development mode of the `Custom` operator to call the PyTorch Aten operator for quick verification. For details, see [Using Third-party Operator Libraries Based on Customized Interfaces](https://www.mindspore.cn/docs/en/master/migration_guide/use_third_party_op.html).



+ 4
- 4
docs/mindspore/source_en/migration_guide/model_development/dataset.md View File

@@ -6,11 +6,11 @@ This chapter focuses on considerations related to data processing in network mig

[Data Processing](https://www.mindspore.cn/tutorials/en/master/beginner/dataset.html)

[Auto Augmentation](https://www.mindspore.cn/tutorials/experts/en/master/dataset/augment.html)
[Auto Augmentation](https://www.mindspore.cn/docs/en/master/model_train/dataset/augment.html)

[Lightweight Data Processing](https://mindspore.cn/tutorials/en/master/advanced/dataset/eager.html)
[Lightweight Data Processing](https://mindspore.cn/docs/en/master/model_train/dataset/eager.html)

[Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html)
[Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html)

## Comparison of Data Processing Differences

@@ -19,7 +19,7 @@ The basic process of data construction in MindSpore and PyTorch mainly includes
### Processing Common Datasets

MindSpore provides [interfaces](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.html) for loading common datasets from many different domains.
In addition to the above datasets commonly used in the industry, MindSpore has also developed the MindRecord data format to handle efficient reading and storage of massive data; see [MindRecord](https://www.mindspore.cn/tutorials/en/master/advanced/dataset/record.html). Since this article introduces similar APIs and the differences in how they are written, we have selected one of the more classic dataset APIs as a migration comparison example. For other dataset interface differences, please refer to the [torchaudio](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchaudio), [torchtext](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchtext), and [torchvision](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchvision) modules of the PyTorch and MindSpore API mapping table.
In addition to the above datasets commonly used in the industry, MindSpore has also developed the MindRecord data format to handle efficient reading and storage of massive data; see [MindRecord](https://www.mindspore.cn/docs/en/master/model_train/dataset/record.html). Since this article introduces similar APIs and the differences in how they are written, we have selected one of the more classic dataset APIs as a migration comparison example. For other dataset interface differences, please refer to the [torchaudio](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchaudio), [torchtext](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchtext), and [torchvision](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchvision) modules of the PyTorch and MindSpore API mapping table.

Here is an example of FashionMnistDataset. The following figure shows how to use the PyTorch API (left) and the MindSpore API (right). The main reading process is: use the FashionMnist API to load the source dataset, then use transforms to transform the data content, and finally apply the batch operation to the dataset. The key parts of the code on both sides are marked with colored boxes.
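The load → transform → batch flow described above can be sketched framework-agnostically (toy data and a hypothetical `batch` helper, not the actual FashionMnist APIs of either framework):

```python
def batch(iterable, size):
    """Group an iterable into lists of `size` (the last batch may be smaller)."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf

samples = range(6)                      # stand-in for the source dataset
transformed = (x * 2 for x in samples)  # stand-in for the transform (map) stage
print(list(batch(transformed, 4)))      # [[0, 2, 4, 6], [8, 10]]
```

Both frameworks implement the same three stages; the migration work is aligning how each stage is expressed, not the flow itself.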



docs/mindspore/source_en/migration_guide/model_development/gradient.md (+4 -4)

@@ -358,7 +358,7 @@ MindSpore does not require this function. MindSpore is an automatic differentiat
## Automatic Differentiation Interfaces

After the forward network is constructed, MindSpore provides an interface to [automatic differentiation](https://mindspore.cn/tutorials/en/master/beginner/autograd.html) to calculate the gradient results of the model.
-In the tutorial of [automatic derivation](https://mindspore.cn/tutorials/en/master/advanced/derivation.html), some descriptions of various gradient calculation scenarios are given.
+The tutorial on [automatic derivation](https://mindspore.cn/docs/en/master/model_train/train_process/derivation.html) describes various gradient calculation scenarios.

### mindspore.grad

@@ -640,9 +640,9 @@ This function is similar to the function of grad, and it is not recommended in t

Since the gradient overflow may be encountered in the process of finding the gradient in the mixed accuracy scenario, we generally use the loss scale to accompany the gradient derivation.

-> On Ascend, because operators such as Conv, Sort, and TopK can only be float16, and MatMul is preferably float16 due to performance issues, it is recommended that loss scale operations be used as standard for network training. [List of operators on Ascend only support float16][https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#4-training-accuracy].
+> On Ascend, because operators such as Conv, Sort, and TopK can only be float16, and MatMul is preferably float16 due to performance issues, it is recommended that loss scale operations be used as standard for network training. See the [list of operators that only support float16 on Ascend](https://www.mindspore.cn/docs/en/master/migration_guide/acc_debug.html#4-training-accuracy).
>
-> The overflow can obtain overflow operator information via MindSpore Insight [dump data](https://mindspore.cn/tutorials/experts/en/master/debug/dump.html).
+> When overflow occurs, the overflowing operator information can be obtained via MindSpore Insight [dump data](https://mindspore.cn/docs/en/master/model_train/debug/dump.html).
>
> General overflow manifests itself as loss Nan/INF, loss suddenly becomes large, etc.
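The loss-scale bookkeeping described above can be illustrated with a small framework-agnostic sketch (a hypothetical class, not the MindSpore API): halve the scale when a step overflows and raise it again after a window of clean steps.

```python
import math

class DynamicLossScaler:
    """Toy dynamic loss-scale bookkeeping: halve the scale on overflow,
    raise it back after a window of clean steps."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale
        self.factor = factor
        self.window = window
        self.good_steps = 0

    def update(self, grads):
        """Return True if the step is clean and its gradients may be applied."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Overflow: drop this step's gradients and shrink the scale.
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
        else:
            # A run of clean steps earns a larger scale again.
            self.good_steps += 1
            if self.good_steps >= self.window:
                self.scale *= self.factor
                self.good_steps = 0
        return not overflow

scaler = DynamicLossScaler(init_scale=1024.0)
print(scaler.update([float("nan"), 0.5]))  # False: overflow, skip the update
print(scaler.scale)                        # 512.0 after halving
```

The overflowing step is skipped entirely, which is why a stuck `Nan/INF` loss usually points at a real numerical problem rather than at the scaler.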

@@ -708,4 +708,4 @@ grad = ops.clip_by_global_norm(grad)

Gradient accumulation is a technique in which the data samples used to train a neural network are split into several small batches and computed sequentially. It is used to solve the OOM (Out Of Memory) problem where, because of insufficient memory, a batch size that is too large cannot be trained, or the network model is too large to load.

-For detailed, refer to [Gradient Accumulation](https://www.mindspore.cn/tutorials/experts/en/master/optimize/gradient_accumulation.html).
+For details, refer to [Gradient Accumulation](https://www.mindspore.cn/docs/en/master/model_train/train_process/optimize/gradient_accumulation.html).
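A minimal sketch of the idea (a toy scalar linear model in plain numpy, not the MindSpore interface): gradients of several micro-batches are summed, and one optimizer step is taken with their mean, emulating a batch several times larger.

```python
import numpy as np

def train_with_accumulation(xs, ys, lr=0.05, accum_steps=4):
    """Toy scalar linear model y = w * x: sum gradients over `accum_steps`
    micro-batches, then take a single optimizer step with their mean."""
    w, grad_sum = 0.0, 0.0
    for step, (x, y) in enumerate(zip(xs, ys), start=1):
        grad = 2 * (w * x - y) * x   # d/dw of the squared error (w*x - y)**2
        grad_sum += grad             # accumulate instead of updating w
        if step % accum_steps == 0:
            w -= lr * grad_sum / accum_steps  # one update with the mean gradient
            grad_sum = 0.0
    return w

xs = np.array([1.0, 2.0, 3.0, 4.0])
print(train_with_accumulation(xs, 2.0 * xs))  # one step from 0 toward w = 2
```

Only the bookkeeping changes relative to ordinary SGD; peak memory is bounded by the micro-batch size rather than the effective batch size.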

docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md (+1 -1)

@@ -2,7 +2,7 @@

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md)

-Before reading this chapter, please read the official MindSpore tutorial [Optimizer](https://mindspore.cn/tutorials/en/master/advanced/modules/optimizer.html).
+Before reading this chapter, please read the official MindSpore tutorial [Optimizer](https://mindspore.cn/docs/en/master/model_train/custom_program/optimizer.html).

Here is an introduction to some special ways of using MindSpore optimizer and the principle of learning rate decay strategy.



docs/mindspore/source_en/migration_guide/model_development/loss_function.md (+1 -1)

@@ -2,7 +2,7 @@

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/migration_guide/model_development/loss_function.md)

-Before reading this chapter, please read the MindSpore official website tutorial first[Loss Function](https://www.mindspore.cn/tutorials/en/master/advanced/modules/loss.html).
+Before reading this chapter, please first read the MindSpore official website tutorial [Loss Function](https://www.mindspore.cn/docs/en/master/model_train/custom_program/loss.html).

The MindSpore official website tutorial on loss functions explains built-in, custom, and multi label loss functions, as well as guidance on their use in model training. Here is a list of differences in functionality and interface between MindSpore's loss function and PyTorch's loss function.



docs/mindspore/source_en/migration_guide/model_development/model_and_cell.md (+5 -5)

@@ -10,7 +10,7 @@ The basic logic of PyTorch and MindSpore is shown below:

It can be seen that PyTorch and MindSpore generally require network definition, forward computation, backward computation, and gradient update steps in the implementation process.

-- Network definition: In the network definition, the desired forward network, loss function, and optimizer are generally defined. To define the forward network in Net(), PyTorch network inherits from nn.Module; similarly, MindSpore network inherits from nn.Cell. In MindSpore, the loss function and optimizers can be customized in addition to using those provided in MindSpore. You can refer to [Model Module Customization](https://mindspore.cn/tutorials/en/master/advanced/modules.html). Interfaces such as functional/nn can be used to splice the required forward networks, loss functions and optimizers.
+- Network definition: In the network definition, the desired forward network, loss function, and optimizer are generally defined. To define the forward network in Net(), PyTorch network inherits from nn.Module; similarly, MindSpore network inherits from nn.Cell. In MindSpore, the loss function and optimizers can be customized in addition to using those provided in MindSpore. You can refer to [Model Module Customization](https://mindspore.cn/docs/en/master/model_train/index.html). Interfaces such as functional/nn can be used to splice the required forward networks, loss functions and optimizers.

- Forward computation: Run the instantiated network to get the logit, and use the logit and target as inputs to calculate the loss. It should be noted that if the forward function has more than one output, you need to pay attention to the effect of more than one output on the result when calculating the backward function.

@@ -538,9 +538,9 @@ x = initializer(Uniform(), [1, 2, 3], mindspore.float32)

##### Customizing Initialization Parameters

-Generally, the high-level API encapsulated by MindSpore initializes parameters by default. Sometimes, the initialization distribution is inconsistent with the required initialization and PyTorch initialization. In this case, you need to customize initialization. [Initializing Network Arguments](https://mindspore.cn/tutorials/en/master/advanced/modules/initializer.html#customized-parameter-initialization) describes a method of initializing parameters during using API attributes. This section describes a method of initializing parameters by using Cell.
+Generally, the high-level API encapsulated by MindSpore initializes parameters by default. Sometimes, the initialization distribution is inconsistent with the required initialization and PyTorch initialization. In this case, you need to customize initialization. [Initializing Network Arguments](https://mindspore.cn/docs/en/master/model_train/custom_program/initializer.html#customized-parameter-initialization) describes a method of initializing parameters when using API attributes. This section describes a method of initializing parameters by using Cell.

-For details about the parameters, see [Network Parameters](https://mindspore.cn/tutorials/zh-CN/master/advanced/modules/initializer.html). This section uses `Cell` as an example to describe how to obtain all parameters in `Cell` and how to initialize the parameters in `Cell`.
+For details about the parameters, see [Network Parameters](https://mindspore.cn/docs/zh-CN/master/model_train/custom_program/initializer.html). This section uses `Cell` as an example to describe how to obtain all parameters in `Cell` and how to initialize the parameters in `Cell`.

> Note that the method described in this section cannot be performed in `construct`. To change the value of a parameter on the network, use [assign](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.assign.html).

@@ -766,7 +766,7 @@ For `Cell`, MindSpore provides two image modes: `GRAPH_MODE` (static image) and

The **inference** behavior of the model in `PyNative` mode is the same as that of common Python code. However, during training, **once a tensor is converted into NumPy for other operations, the gradient of the network is truncated, which is equivalent to detach of PyTorch**.

-When `GRAPH_MODE` is used, syntax restrictions usually occur. In this case, graph compilation needs to be performed on the Python code. However, MindSpore does not support the complete Python syntax set. Therefore, there are some restrictions on compiling the `construct` function. For details about the restrictions, see [MindSpore Static Graph Syntax](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
+When `GRAPH_MODE` is used, syntax restrictions usually occur. In this case, graph compilation needs to be performed on the Python code. However, MindSpore does not support the complete Python syntax set. Therefore, there are some restrictions on compiling the `construct` function. For details about the restrictions, see [MindSpore Static Graph Syntax](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html).

Compared with the detailed syntax description, the common restrictions are as follows:

@@ -881,7 +881,7 @@ dx (Tensor(shape=[2, 5], dtype=Float32, value=
[0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000]]),)
```

-Now, let's see how to [customize backward network construction](https://www.mindspore.cn/tutorials/en/master/advanced/modules/layer.html#custom-cell-reverse).
+Now, let's see how to [customize backward network construction](https://www.mindspore.cn/docs/en/master/model_train/custom_program/network_custom.html#custom-cell-reverse).

```python
import numpy as np


docs/mindspore/source_en/migration_guide/model_development/model_development.rst (+2 -2)

@@ -49,7 +49,7 @@ The training process of the whole network consists of 5 modules:
details and procedures and greatly reduces the threshold of
framework. When you need to customize the gradient, MindSpore also
provides
-`interface <https://www.mindspore.cn/tutorials/en/master/advanced/modules/layer.html#custom-cell-reverse>`__
+`interface <https://www.mindspore.cn/docs/en/master/model_train/custom_program/network_custom.html#custom-cell-reverse>`__
to freely implement the gradient calculation.

- Optimizer: used to calculate and update network parameters during
@@ -199,7 +199,7 @@ for the following situations:
precision.
4. In Ascend environment, Conv, Sort and TopK can only be float16, and
add `loss
-   scale <https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html>`__
+   scale <https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html>`__
to avoid overflow.
5. In the Ascend environment, operators with the stride property such as
Conv and Pooling have rules about the length of the stride, which


docs/mindspore/source_en/migration_guide/model_development/training_and_evaluation.md (+2 -2)

@@ -336,8 +336,8 @@ mpirun --allow-run-as-root -n $RANK_SIZE python ../train.py --config_path=$CONFI

If on the GPU, you can set which cards to use by `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`. Specifying the card number is not currently supported on Ascend.

-Please refer to [Distributed Case](https://www.mindspore.cn/tutorials/experts/en/master/parallel/distributed_case.html) for more details.
+Please refer to [Distributed Case](https://www.mindspore.cn/docs/en/master/model_train/parallel/distributed_case.html) for more details.

## Offline Inference

-In addition to the possibility of online inference, MindSpore provides many offline inference methods for different environments. Please refer to [Model Inference](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html) for details.
+In addition to the possibility of online inference, MindSpore provides many offline inference methods for different environments. Please refer to [Model Inference](https://www.mindspore.cn/docs/en/master/model_infer/overview.html) for details.

docs/mindspore/source_en/migration_guide/overview.md (+29 -23)

@@ -13,24 +13,28 @@ E-.-text2(AI Platform ModelArts)
B-->|Step 2|F(<font color=blue>Model Analysis and Preparation</font>)
F-.-text3(Reproducing algorithm, analyzing API compliance using MindSpore Dev Toolkit and analyzing function compliance.)
B-->|Step 3|G(<font color=blue>Network Constructing Comparison</font>)
-G-->I(<font color=blue>Dataset</font>)
-I-.-text4(Aligning the process of dataset loading, augmentation and reading)
-G-->J(<font color=blue>Network Constructing</font>)
-J-.-text5(Aligning the network)
-G-->N(<font color=blue>Loss Function</font>)
-N-.-text6(Aligning the loss function)
-G-->K(<font color=blue>Learning Rate and Optimizer</font>)
-K-.-text7(Aligning the optimizer and learning rate strategy)
-G-->L(<font color=blue>Gradient</font>)
-L-.-text8(Aligning the reverse gradients)
-G-->M(<font color=blue>Training and Evaluation Process</font>)
-M-.-text9(Aligning the process of training and evaluation)
-B-->|Step 4|H(<font color=blue>Debug and Tuning</font>)
-H-.-text10(Aligning from three aspects: function, precision and performance)
+G-->K(<font color=blue>Dataset</font>)
+K-.-text4(Aligning the process of dataset loading, augmentation and reading)
+G-->L(<font color=blue>Network Constructing</font>)
+L-.-text5(Aligning the network)
+G-->P(<font color=blue>Loss Function</font>)
+P-.-text6(Aligning the loss function)
+G-->M(<font color=blue>Learning Rate and Optimizer</font>)
+M-.-text7(Aligning the optimizer and learning rate strategy)
+G-->N(<font color=blue>Gradient</font>)
+N-.-text8(Aligning the reverse gradients)
+G-->O(<font color=blue>Training and Evaluation Process</font>)
+O-.-text9(Aligning the process of training and evaluation)
+B-->|Step 4|H(<font color=blue>Function Debugging</font>)
+H-.-text10(Functional alignment)
+B-->|Step 5|I(<font color=blue>Precision Tuning</font>)
+I-.-text11(Precision alignment)
+B-->|Step 6|J(<font color=blue>Performance Tuning</font>)
+J-.-text12(Performance Alignment)
A-->C(<font color=blue>A Migration Sample</font>)
-C-.-text11(The network migration sample, taking ResNet50 as an example.)
+C-.-text13(The network migration sample, taking ResNet50 as an example.)
A-->D(<font color=blue>FAQs</font>)
-D-.-text12(Provides the frequently-asked questions and corresponding solutions in migration process.)
+D-.-text14(Provides the frequently-asked questions and corresponding solutions in migration process.)

click C "https://www.mindspore.cn/docs/en/master/migration_guide/sample_code.html"
click D "https://www.mindspore.cn/docs/en/master/migration_guide/faq.html"
@@ -38,12 +42,14 @@ click D "https://www.mindspore.cn/docs/en/master/migration_guide/faq.html"
click E "https://www.mindspore.cn/docs/en/master/migration_guide/enveriment_preparation.html"
click F "https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html"
click G "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_development.html"
-click H "https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html"
+click H "https://www.mindspore.cn/docs/en/master/migration_guide/debug.html"
+click I "https://www.mindspore.cn/docs/en/master/migration_guide/acc_debug.html"
+click J "https://www.mindspore.cn/docs/en/master/migration_guide/perf_debug.html"

-click I "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/dataset.html"
-click J "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_cell.html"
-click K "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/learning_rate_and_optimizer.html"
-click L "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html"
-click M "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/training_and_evaluation.html"
-click N "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/loss_function.html"
+click K "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/dataset.html"
+click L "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_cell.html"
+click M "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/learning_rate_and_optimizer.html"
+click N "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html"
+click O "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/training_and_evaluation.html"
+click P "https://www.mindspore.cn/docs/en/master/migration_guide/model_development/loss_function.html"
```

docs/mindspore/source_en/migration_guide/reference.rst (+9 -0)

@@ -0,0 +1,9 @@
Reference
==========

.. toctree::
:maxdepth: 1

../note/api_mapping/pytorch_api_mapping
migrator_with_tools
faq

docs/mindspore/source_en/migration_guide/reproducing_algorithm.md (+4 -1)

@@ -82,4 +82,7 @@ After obtaining the reference code, you need to reproduce the accuracy of the re

- Obtain the loss decrease trend to check whether the training convergence trend on MindSpore is normal.
- Obtain the parameter file for conversion and inference verification. For details, see [Inference and Training Process](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/training_and_evaluation.html).
-- Obtain the performance baseline for performance tuning. For details, see [Debugging and Tuning](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html).
+- Obtain the performance baseline for performance tuning. For details, see
+  [Function Debugging](https://www.mindspore.cn/docs/en/master/migration_guide/debug.html),
+  [Precision Tuning](https://www.mindspore.cn/docs/en/master/migration_guide/acc_debug.html),
+  [Performance Tuning](https://www.mindspore.cn/docs/en/master/migration_guide/perf_debug.html).

docs/mindspore/source_en/migration_guide/sample_code.md (+2 -2)

@@ -1081,7 +1081,7 @@ MindSpore has three methods to use mixed precision:

1. Use `Cast` to convert the network input into `float16` and the loss input into `float32`.
2. Use the `to_float` method of `Cell`. For details, see [Network Construction](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_cell.html).
-3. Use the `amp_level` interface of the `Model` to perform mixed precision. For details, see [Automatic Mixed-Precision](https://www.mindspore.cn/tutorials/en/master/advanced/mixed_precision.html#automatic-mix-precision).
+3. Use the `amp_level` interface of the `Model` to perform mixed precision. For details, see [Automatic Mixed-Precision](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html#automatic-mix-precision).

Use the third method to set `amp_level` in `Model` to `O3` and check the profiler result.
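A small numpy illustration (plain Python, not MindSpore code, with a hypothetical scale value) of why loss scaling accompanies `float16` training: tiny gradients underflow to zero in `float16` unless the loss is scaled up first and unscaled back in `float32`.

```python
import numpy as np

tiny_grad = 1e-8                        # below float16's smallest subnormal
print(np.float16(tiny_grad))            # 0.0: the gradient underflows and is lost

scale = 1024.0                          # hypothetical loss scale
scaled = np.float16(tiny_grad * scale)  # now representable in float16
recovered = np.float32(scaled) / scale  # unscale in float32
print(recovered > 0)                    # True: the gradient survives
```

The same mechanism is what `amp_level`'s loss-scale handling automates during training.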

@@ -1102,7 +1102,7 @@ If most of the data queues are empty, you need to optimize the data performance.

![resnet_profiler12](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/migration_guide/images/resnet_profiler12.png)

-In the queue of each data processing operation, the last operator and the `batch` operator are empty for a long time. In this case, you can increase the degree of parallelism of the `batch` operator. For details, see [Data Processing Performance Tuning](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html).
+In the queue of each data processing operation, the last operator and the `batch` operator are empty for a long time. In this case, you can increase the degree of parallelism of the `batch` operator. For details, see [Data Processing Performance Tuning](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html).

The code required for ResNet migration can be obtained from [code](https://gitee.com/mindspore/docs/tree/master/docs/mindspore/source_zh_cn/migration_guide/code).



docs/mindspore/source_en/migration_guide/sparsity.md (+1 -1)

@@ -7,4 +7,4 @@ A [sparse tensor](https://matteding.github.io/2019/04/25/sparse-matrices/) is a
In some scenarios (such as recommendation systems, molecular dynamics, graph neural networks), the data is sparse. If you use common dense tensors to represent the data, you may introduce many unnecessary calculations, storage, and communication costs. In this case, it is better to use sparse tensor to represent the data.

MindSpore now supports the most commonly used [CSR and COO data formats](https://www.mindspore.cn/tutorials/en/master/beginner/tensor.html#sparse-tensor). Currently, only a limited number of sparse operators are supported, and most sparse features are restricted. In this case, you are advised to check whether the corresponding operator supports sparse computing. If the operator does not support sparse computing, convert it into a common operator.
-After the operator is converted into a dense operator, the video memory used increases. Therefore, the batch size implemented by referring to may not be used for training. In this case, you can use [Gradient Accumulation](https://www.mindspore.cn/tutorials/experts/en/master/optimize/gradient_accumulation.html) to simulate large batch training.
+After the operator is converted into a dense operator, the memory used increases, so the batch size of the reference implementation may not be usable for training. In this case, you can use [Gradient Accumulation](https://www.mindspore.cn/docs/en/master/model_train/train_process/optimize/gradient_accumulation.html) to simulate large-batch training.
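The CSR layout mentioned above can be illustrated with a framework-agnostic sketch (plain Python, not the MindSpore `CSRTensor` API): only nonzero values are stored, together with their column indices and per-row offsets.

```python
import numpy as np

def dense_to_csr(mat):
    """Encode a dense matrix in CSR form: `indptr[i]:indptr[i+1]` slices the
    column `indices` and nonzero `values` belonging to row i."""
    indptr, indices, values = [0], [], []
    for row in mat:
        for col, v in enumerate(row):
            if v != 0:
                indices.append(col)
                values.append(int(v))
        indptr.append(len(values))
    return indptr, indices, values

dense = np.array([[1, 0, 2],
                  [0, 0, 0],
                  [0, 3, 0]])
print(dense_to_csr(dense))  # ([0, 2, 2, 3], [0, 2, 1], [1, 2, 3])
```

Three nonzeros are stored instead of nine entries; the savings grow with the sparsity of the data.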

docs/mindspore/source_en/migration_guide/use_third_party_op.md (+1 -1)

@@ -6,7 +6,7 @@

When lacking of the built-in operators during developing a network, you can use the primitive in [Custom](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.Custom.html#mindspore-ops-custom) to easily and quickly define and use different types of customized operators.

-Developers can choose different customized operator development methods according to their needs. For details, please refer to the [Usage Guide](https://www.mindspore.cn/tutorials/experts/en/master/operation/op_custom.html) of Custom operator.
+Developers can choose different customized operator development methods according to their needs. For details, please refer to the [Usage Guide](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html) of Custom operator.

One of the development methods for customized operators, the `aot` method, has its own special use: it can call the corresponding `cpp`/`cuda` functions by loading a precompiled `so`. Therefore, when a third-party library provides an API as a `cpp`/`cuda` function, you can try to call its function interface in the `so`, which is described below by taking the `Aten` library in PyTorch as an example.
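The general idea behind the `aot` path, calling a function that lives in a precompiled shared library, can be sketched with Python's standard `ctypes` module. Here the system C math library stands in for a custom-built `so` exposing `cpp`/`cuda` functions (an illustration only, not the Custom operator interface):

```python
import ctypes
import ctypes.util

# Locate and load a precompiled shared library (the C math library here
# stands in for a custom .so exposing cpp/cuda functions).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the signature of the function we want to call from the .so.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0
```

The `aot` method works the same way at a higher level: the framework loads your `so` and dispatches tensor buffers to the exported function.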



docs/mindspore/source_en/mindformers/appendix/env_variables.md (+1 -1)

@@ -1,3 +1,3 @@
# Environment Variable Descriptions

-![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)(https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/mindformers/appendix/env_variables.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/mindformers/appendix/env_variables.md)

docs/mindspore/source_en/mindformers/index.rst (+2 -2)

@@ -1,4 +1,4 @@
-Large Model Specialization
+Large Model Development
===========================

.. toctree::
@@ -69,6 +69,6 @@ Large Model Specialization
:caption: FAQ

faq/model_related
-   faq/func_releated
+   faq/func_related
faq/mindformers_contribution
faq/openmind_contribution

docs/mindspore/source_en/model_infer/index.rst (+10 -0)

@@ -0,0 +1,10 @@
Model Inference
=========================

.. toctree::
:glob:
:maxdepth: 1

overview
llm_infer
llm_lite

docs/mindspore/source_en/model_infer/llm_infer.rst (+8 -0)

@@ -0,0 +1,8 @@
LLM Inference
==============

.. toctree::
:glob:
:maxdepth: 1

model_compression

docs/mindspore/source_en/model_infer/llm_lite.md (+3 -0)

@@ -0,0 +1,3 @@
# Device-side Inference

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_infer/llm_lite.md)

tutorials/experts/source_en/infer/model_compression.md → docs/mindspore/source_en/model_infer/model_compression.md

@@ -1,6 +1,6 @@
# Model Compression

-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/experts/source_en/infer/model_compression.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_infer/model_compression.md)

## Overview


tutorials/experts/source_en/infer/inference.md → docs/mindspore/source_en/model_infer/overview.md

@@ -1,6 +1,6 @@
# Inference Model Overview

-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/experts/source_en/infer/inference.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_infer/overview.md)

MindSpore can execute inference tasks on different hardware platforms based on trained models.


docs/mindspore/source_en/model_train/custom_program/fusion_pass.md (+3 -0)

@@ -0,0 +1,3 @@
# Custom Fusion Pass

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/fusion_pass.md)

tutorials/source_en/advanced/modules/layer.md → docs/mindspore/source_en/model_train/custom_program/hook_program.md

@@ -1,306 +1,12 @@
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/advanced/modules/layer.md)
# Hook Programming

# Cell and Parameter

Cell, as the basic unit of neural network construction, corresponds to the concept of a neural network layer; its abstract encapsulation of Tensor computation operations represents the neural network structure more accurately and clearly. In addition to the basic Tensor computation flow definition, a neural network layer contains functions such as parameter management and state management. Parameter is the core of neural network training and is usually an internal member variable of the neural network layer. In this section, we systematically introduce parameters, neural network layers, and their related usage.

## Parameter

Parameter is a special class of Tensor, a variable whose value can be updated during model training. MindSpore provides the `mindspore.Parameter` class for Parameter construction. In order to distinguish between Parameters for different purposes, two different categories of Parameter are defined below:

- Trainable parameter. A Tensor that is updated after the gradient is obtained via the backward propagation algorithm during model training; `requires_grad` needs to be set to `True`.
- Untrainable parameter. A Tensor that does not participate in backward propagation but whose value still needs to be updated (e.g. the `mean` and `var` variables in BatchNorm); `requires_grad` needs to be set to `False`.

> Parameter is set to `requires_grad=True` by default.

We construct a simple fully-connected layer as follows:

```python
import numpy as np
import mindspore
from mindspore import nn
from mindspore import ops
from mindspore import Tensor, Parameter

class Network(nn.Cell):
    def __init__(self):
        super().__init__()
        self.w = Parameter(Tensor(np.random.randn(5, 3), mindspore.float32), name='w')  # weight
        self.b = Parameter(Tensor(np.random.randn(3,), mindspore.float32), name='b')  # bias

    def construct(self, x):
        z = ops.matmul(x, self.w) + self.b
        return z

net = Network()
```

In the `__init__` method of `Cell`, we define two parameters `w` and `b` and configure `name` for namespace management. In the `construct` method, they can be accessed directly via `self.attr` to participate in Tensor operations.

### Obtaining Parameter

After constructing the neural network layer by using Cell+Parameter, we can use various methods to obtain the Parameter managed by Cell.

#### Obtaining a Single Parameter

To get a particular parameter individually, just call a member variable of a Python class directly.

```python
print(net.b.asnumpy())
```

```text
[-1.2192779 -0.36789745 0.0946381 ]
```

#### Obtaining a Trainable Parameter

Trainable parameters can be obtained by using the `Cell.trainable_params` method, and this interface is usually called when configuring the optimizer.

```python
print(net.trainable_params())
```

```text
[Parameter (name=w, shape=(5, 3), dtype=Float32, requires_grad=True), Parameter (name=b, shape=(3,), dtype=Float32, requires_grad=True)]
```

#### Obtaining All Parameters

Use the `Cell.get_parameters()` method to get all parameters, at which point a Python iterator will be returned.

```python
print(type(net.get_parameters()))
```

```text
<class 'generator'>
```

Or you can call `Cell.parameters_and_names` to return the parameter names and parameters.

```python
for name, param in net.parameters_and_names():
    print(f"{name}:\n{param.asnumpy()}")
```

```text
w:
[[ 4.15680408e-02 -1.20311625e-01 5.02573885e-02]
[ 1.22175144e-04 -1.34980649e-01 1.17642188e+00]
[ 7.57667869e-02 -1.74758151e-01 -5.19092619e-01]
[-1.67846107e+00 3.27240258e-01 -2.06452996e-01]
[ 5.72323874e-02 -8.27963874e-02 5.94243526e-01]]
b:
[-1.2192779 -0.36789745 0.0946381 ]
```

### Modifying the Parameter

#### Modifying Parameter Values Directly

Parameter is a special kind of Tensor, so its value can be modified by Tensor indexing.

```python
net.b[0] = 1.
print(net.b.asnumpy())
```

```text
[ 1. -0.36789745 0.0946381 ]
```

#### Overriding the Parameter Values

The `Parameter.set_data` method can be called to override the Parameter by using a Tensor with the same Shape. This method is commonly used for [Cell traversal initialization](https://www.mindspore.cn/tutorials/en/master/advanced/modules/initializer.html) by using Initializer.

```python
net.b.set_data(Tensor([3, 4, 5]))
print(net.b.asnumpy())
```

```text
[3. 4. 5.]
```

#### Modifying Parameter Values During Runtime

The main role of parameters is to have their values updated during model training. This involves modifying parameters during runtime, either after backward propagation obtains the gradients or when untrainable parameters need to be updated. Due to the compiled design of MindSpore's [Accelerating with Static Graphs](https://www.mindspore.cn/tutorials/en/master/beginner/accelerate_with_static_graph.html), it is necessary in these cases to use the `mindspore.ops.assign` interface to assign values to parameters. This method is commonly used in [Custom Optimizer](https://www.mindspore.cn/tutorials/en/master/advanced/modules/optimizer.html) scenarios. The following is a simple example of modifying parameter values during runtime:

```python
import mindspore as ms

@ms.jit
def modify_parameter():
    b_hat = ms.Tensor([7, 8, 9])
    ops.assign(net.b, b_hat)
    return True

modify_parameter()
print(net.b.asnumpy())
```

```text
[7. 8. 9.]
```

### Parameter Tuple

`ParameterTuple` is a container for storing multiple `Parameter` objects. It inherits from the built-in `tuple` and additionally provides a clone function.

The following example provides the ParameterTuple creation method:

```python
from mindspore.common.initializer import initializer
from mindspore import ParameterTuple
# Creation
x = Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name="x")
y = Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))

# Clone from params and change the name to "params_copy"
params_copy = params.clone("params_copy")

print(params)
print(params_copy)
```

```text
(Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))
(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))
```

## Cell Training State Change

Some Tensor operations in neural networks behave differently during training and inference. For example, `nn.Dropout` performs random dropout during training but not during inference, and `nn.BatchNorm` updates the `mean` and `var` variables during training but keeps their values fixed during inference. We can therefore set the state of the neural network through the `Cell.set_train` interface.

When `set_train` is set to `True`, the neural network state is `train`. `True` is also the default value of the `set_train` interface:

```python
net.set_train()
print(net.phase)
```

```text
train
```

When `set_train` is set to False, the neural network state is `predict`:

```python
net.set_train(False)
print(net.phase)
```

```text
predict
```

## Custom Neural Network Layers

Normally, the neural network layer interfaces and function interfaces provided by MindSpore meet model construction requirements. However, since the AI field is constantly evolving, you may encounter new network structures for which no built-in module exists. In that case, we can build a custom neural network layer from the function interfaces and Primitive operators provided by MindSpore, and we can use the `Cell.bprop` method to customize the backward computation. The details of each customization method are given below.

### Constructing Neural Network Layers by Using the Function Interface

MindSpore provides a large number of basic function interfaces, which can be used to construct complex Tensor operations, encapsulated as neural network layers. The following is an example of `Threshold` with the following equation:

$$
y =\begin{cases}
x, &\text{ if } x > \text{threshold} \\
\text{value}, &\text{ otherwise }
\end{cases}
$$

It can be seen that `Threshold` checks whether each value of the Tensor is greater than `threshold`, keeps the values for which the comparison is `True`, and replaces the values for which it is `False` with `value`. Therefore, the corresponding implementation is as follows:

```python
class Threshold(nn.Cell):
    def __init__(self, threshold, value):
        super().__init__()
        self.threshold = threshold
        self.value = value

    def construct(self, inputs):
        cond = ops.gt(inputs, self.threshold)
        value = ops.fill(inputs.dtype, inputs.shape, self.value)
        return ops.select(cond, inputs, value)
```

Here `ops.gt`, `ops.fill`, and `ops.select` implement the comparison, the construction of the replacement values, and the element-wise selection, respectively. The custom `Threshold` layer can now be used:

```python
m = Threshold(0.1, 20)
inputs = mindspore.Tensor([0.1, 0.2, 0.3], mindspore.float32)
m(inputs)
```

```text
Tensor(shape=[3], dtype=Float32, value= [ 2.00000000e+01, 2.00000003e-01, 3.00000012e-01])
```

It can be seen that `inputs[0]` equals `threshold`, and since the comparison `ops.gt` is strict, it is replaced with `20`.

### Custom Cell Reverse

In special scenarios, we not only need to customize the forward logic of the neural network layer, but also want to manually control the computation of its backward pass, which we can define through the `Cell.bprop` interface. This capability is used in scenarios such as new neural network structure design and backward propagation speed optimization. In the following, we take `Dropout2d` as an example to introduce the custom Cell backward.

```python
class Dropout2d(nn.Cell):
    def __init__(self, keep_prob):
        super().__init__()
        self.keep_prob = keep_prob
        self.dropout2d = ops.Dropout2D(keep_prob)

    def construct(self, x):
        return self.dropout2d(x)

    def bprop(self, x, out, dout):
        _, mask = out
        dy, _ = dout
        if self.keep_prob != 0:
            dy = dy * (1 / self.keep_prob)
        dy = mask.astype(mindspore.float32) * dy
        return (dy.astype(x.dtype), )

dropout_2d = Dropout2d(0.8)
dropout_2d.bprop_debug = True
```

The `bprop` method has three input parameters:

- *x*: Forward input. When there are multiple forward inputs, the same number of parameters is required.
- *out*: Forward output.
- *dout*: The gradient passed back to the current Cell during backward propagation, i.e., the backward result of the preceding layer.

Generally, we need to compute the backward result according to the derivative formula, based on the forward output and the backward result of the preceding layer, and return it. The backward computation of `Dropout2d` masks the incoming gradient with the `mask` from the forward output and then scales it by `1 / keep_prob`. This implementation yields the correct result.

When customizing the backward of a Cell, an extended form is supported in PyNative mode that can also differentiate the weights inside the Cell. A specific example is as follows:

```python
class NetWithParam(nn.Cell):
    def __init__(self):
        super(NetWithParam, self).__init__()
        self.w = Parameter(Tensor(np.array([2.0], dtype=np.float32)), name='weight')
        self.internal_params = [self.w]

    def construct(self, x):
        output = self.w * x
        return output

    def bprop(self, *args):
        return (self.w * args[-1],), {self.w: args[0] * args[-1]}
```

The `bprop` method supports `*args` as its input parameters, where the last element `args[-1]` is the gradient passed back to the Cell. The weights to be differentiated are declared through `self.internal_params`, and the `bprop` function returns both a tuple and a dictionary: the tuple holds the gradients corresponding to the inputs, and the dictionary maps each weight (key) to its gradient (value).

## Hook Function
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/hook_program.md)

Debugging deep learning networks is a major task for every practitioner in the field. Since a deep learning network hides the input and output data and the backward gradients of the intermediate layer operators, and only exposes the gradients of the network inputs (features and weights), it is impossible to accurately observe the data changes of intermediate layer operators, which reduces debugging efficiency. To help users debug deep learning networks accurately and quickly, MindSpore provides Hook functions in dynamic graph mode. **Using Hook functions, you can capture the input and output data of intermediate layer operators as well as their backward gradients**.

Currently, five forms of Hook functions are provided in dynamic graph mode: HookBackward operator and register_forward_pre_hook, register_forward_hook, register_backward_pre_hook, register_backward_hook functions registered on Cell objects.

### HookBackward Operator

HookBackward implements the Hook function in the form of an operator. The user initializes a HookBackward operator and places it at the location in the deep learning network where the gradient needs to be captured. In the forward execution of the network, the HookBackward operator outputs the input data as is without any modification. When the network back propagates the gradient, the Hook function registered on HookBackward will capture the gradient back propagated to this point. The user can customize the operation on the gradient in the Hook function, such as printing the gradient, or returning a new gradient.


For more descriptions of the HookBackward operator, refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.HookBackward.html).

### register_forward_pre_hook Function in Cell Object

The user can use the `register_forward_pre_hook` function on the Cell object to register a custom Hook function to capture data that is passed to that Cell object. This function does not work in static graph mode and inside functions modified with `@jit`. The `register_forward_pre_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_forward_pre_hook` function returns a different `handle` object. Hook functions should be defined in the following way.

To avoid running failure when scripts switch to graph mode, it is not recommended to call the `register_forward_pre_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object. In PyNative mode, if the `register_forward_pre_hook` function is called in the `construct` function of the Cell object, the Cell object will register a new Hook function every time it runs.

For more information about the `register_forward_pre_hook` function of the Cell object, refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_forward_pre_hook).

### register_forward_hook Function of Cell Object

The user can use the `register_forward_hook` function on the Cell object to register a custom Hook function that captures the data passed forward to the Cell object and the output data of the Cell object. This function does not work in static graph mode and inside functions modified with `@jit`. The `register_forward_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_forward_hook` function returns a different `handle` object. Hook functions should be defined in the following way.

To avoid running failure when the script switches to graph mode, it is not recommended to call the `register_forward_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object. In PyNative mode, if the `register_forward_hook` function is called in the `construct` function of the Cell object, the Cell object will register a new Hook function every time it runs.

For more information about the `register_forward_hook` function of the Cell object, please refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_forward_hook).

### register_backward_pre_hook Function of Cell Object

The user can use the `register_backward_pre_hook` function on the Cell object to register a custom Hook function that captures the gradient associated with the Cell object when the network is back propagated. This function does not work in graph mode or inside functions modified with `@jit`. The `register_backward_pre_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_backward_pre_hook` function will return a different `handle` object.

Unlike the custom Hook function used by the HookBackward operator, the Hook function used by `register_backward_pre_hook` takes as inputs `cell`, which represents the information of the Cell object, and the gradient passed to the Cell object during back propagation.

The sample code is as follows:

```python
def backward_hook_pre_function(cell, grad_output):
    print(grad_output)
```

Here `cell` is the information of the Cell object, `grad_output` is the gradient passed to the Cell object when the network is back-propagated. Therefore, the user can use the `register_backward_pre_hook` function to capture the backward input gradients of a particular Cell object in the network. The user can customize the operations on the gradient in the Hook function, such as viewing, printing the gradient, or returning the new input gradient. If you need to return the new input gradient in the Hook function, the return value must be in the form of `tuple`.

The sample code is as follows:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

ms.set_context(mode=ms.PYNATIVE_MODE)

def backward_hook_pre_function(cell, grad_output):
    print(grad_output)

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(1, 2, kernel_size=2, stride=1, padding=0, weight_init="ones", pad_mode="valid")
        self.bn = nn.BatchNorm2d(2, momentum=0.99, eps=0.00001, gamma_init="ones")
        self.handle = self.bn.register_backward_pre_hook(backward_hook_pre_function)
        self.relu = nn.ReLU()

    def construct(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

net = Net()
grad_net = ms.grad(net)
output = grad_net(ms.Tensor(np.ones([1, 1, 2, 2]).astype(np.float32)))
print(output)
net.handle.remove()
output = grad_net(ms.Tensor(np.ones([1, 1, 2, 2]).astype(np.float32)))
print("-------------\n", output)
```

```text
(Tensor(shape=[1, 2, 1, 1], dtype=Float32, value=
[[[[ 1.00000000e+00]],
[[ 1.00000000e+00]]]]),)
[[[[1.99999 1.99999]
[1.99999 1.99999]]]]
-------------
[[[[1.99999 1.99999]
[1.99999 1.99999]]]]
```

To avoid running failure when the scripts switch to graph mode, it is not recommended to call the `register_backward_pre_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object. In PyNative mode, if the `register_backward_pre_hook` function is called in the `construct` function of the Cell object, the Cell object will register a new Hook function every time it runs.

For more information about the `register_backward_pre_hook` function of the Cell object, please refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_backward_pre_hook).

### register_backward_hook Function of Cell Object

The user can use the `register_backward_hook` function on the Cell object to register a custom Hook function that captures the gradient associated with the Cell object when the network is back propagated. This function does not work in graph mode or inside functions modified with `@jit`. The `register_backward_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_backward_hook` function will return a different `handle` object.

Unlike the custom Hook function used by the HookBackward operator, the inputs of the Hook function used by `register_backward_hook` contains `cell`, which represents the information of the Cell object, the gradient passed to the Cell object in reverse, and the gradient of the reverse output of the Cell object.

The sample code is as follows:

```python
def backward_hook_function(cell, grad_input, grad_output):
    print(grad_input)
    print(grad_output)
```

Here `cell` is the information of the Cell object, `grad_input` is the gradient of the reverse output of the Cell object. `grad_output` is the gradient passed to the Cell object when the network is back-propagated, which corresponds to the reverse output gradient of the next operator in the forward process. Therefore, the user can use the `register_backward_hook` function to capture the backward input and backward output gradients of a particular Cell object in the network. The user can customize the operations on the gradient in the Hook function, such as viewing, printing the gradient, or returning the new output gradient. If you need to return the new output gradient in the Hook function, the return value must be in the form of `tuple`.

The sample code is as follows:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

ms.set_context(mode=ms.PYNATIVE_MODE)

def backward_hook_function(cell, grad_input, grad_output):
    print(grad_input)
    print(grad_output)

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(1, 2, kernel_size=2, stride=1, padding=0, weight_init="ones", pad_mode="valid")
        self.bn = nn.BatchNorm2d(2, momentum=0.99, eps=0.00001, gamma_init="ones")
        self.handle = self.bn.register_backward_hook(backward_hook_function)
        self.relu = nn.ReLU()

    def construct(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

net = Net()
grad_net = ms.grad(net)
output = grad_net(ms.Tensor(np.ones([1, 1, 2, 2]).astype(np.float32)))
print(output)
net.handle.remove()
output = grad_net(ms.Tensor(np.ones([1, 1, 2, 2]).astype(np.float32)))
print("-------------\n", output)
```

To avoid running failure when the scripts switch to graph mode, it is not recommended to call the `register_backward_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object. In PyNative mode, if the `register_backward_hook` function is called in the `construct` function of the Cell object, the Cell object will register a new Hook function every time it runs.

For more information about the `register_backward_hook` function of the Cell object, please refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_backward_hook).

### Using Multiple Hook Functions of Cell Object

When the `register_backward_pre_hook` function, the `register_backward_hook` function, the `register_forward_pre_hook` function, and the `register_forward_hook` function act on the same Cell object at the same time, if the `register_forward_pre_hook` and the `register_forward_hook` functions add other operators for data processing, these new operators will participate in the forward calculation of the data before or after the execution of the Cell object, but the backward gradient of these new operators is not captured by the `register_backward_pre_hook` function or the `register_backward_hook` function. The Hook function registered in `register_backward_pre_hook` only captures the input gradients of the original Cell object. The Hook function registered in `register_backward_hook` only captures the input and output gradients of the original Cell object.

The sample code is as follows:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

ms.set_context(mode=ms.PYNATIVE_MODE)

def forward_pre_hook_fn(cell, inputs):
    print("forward inputs: ", inputs)

def forward_hook_fn(cell, inputs, outputs):
    print("forward inputs: ", inputs)
    print("forward outputs: ", outputs)
    outputs = outputs + outputs
    return outputs

def backward_pre_hook_fn(cell, grad_output):
    print("grad input: ", grad_output)

def backward_hook_fn(cell, grad_input, grad_output):
    print("grad input: ", grad_input)
    print("grad output: ", grad_output)

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.relu = nn.ReLU()
        self.handle = self.relu.register_forward_pre_hook(forward_pre_hook_fn)
        self.handle2 = self.relu.register_forward_hook(forward_hook_fn)
        self.handle3 = self.relu.register_backward_pre_hook(backward_pre_hook_fn)
        self.handle4 = self.relu.register_backward_hook(backward_hook_fn)

    def construct(self, x, y):
        x = x + y
        x = self.relu(x)
        return x

net = Net()
grad_net = ms.grad(net, grad_position=(0, 1))
gradient = grad_net(ms.Tensor(np.ones([1]).astype(np.float32)),
                    ms.Tensor(np.ones([1]).astype(np.float32)))
print(gradient)
```

```text
forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]),)
forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]),)
forward outputs: [2.]
grad input: (Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]),)
grad input: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]),)
grad output: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]),)
(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]))
```

Here `grad_output` is the gradient passed to `self.relu` when the gradient is back-propagated, not the gradient of the new `Add` operator in the `forward_hook_fn` function. Here `grad_input` is the reverse output gradient of the `self.relu` when the gradient is back-propagated, not the reverse output gradient of the new `Add` operator in the `forward_pre_hook_fn` function. The `register_forward_pre_hook` and `register_forward_hook` functions work before and after the execution of the Cell object and do not affect the gradient capture range of the reverse Hook function on the Cell object.

tutorials/source_en/advanced/modules/initializer.md → docs/mindspore/source_en/model_train/custom_program/initializer.md View File

@@ -1,6 +1,6 @@
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/advanced/modules/initializer.md)
# Custom Parameter Initialization

# Parameter Initialization
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/initializer.md)

## Initializing with Built-In Parameters


+ 198
- 0
docs/mindspore/source_en/model_train/custom_program/layer.md View File

@@ -0,0 +1,198 @@
# Cell and Parameter

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/layer.md)

Cell, as the basic unit of neural network construction, corresponds to the concept of neural network layer, and the abstract encapsulation of Tensor computation operation can represent the neural network structure more accurately and clearly. In addition to the basic Tensor computation flow definition, the neural network layer contains functions such as parameter management and state management. Parameter is the core of neural network training and is usually used as an internal member variable of the neural network layer. In this section, we systematically introduce parameters, neural network layers and their related usage.

## Parameter

Parameter is a special class of Tensor, which is a variable whose value can be updated during model training. MindSpore provides the `mindspore.Parameter` class for Parameter construction. In order to distinguish between Parameter for different purposes, two different categories of Parameter are defined below. In order to distinguish between Parameter for different purposes, two different categories of Parameter are defined below:

- Trainable parameter. Tensor that is updated after the gradient is obtained according to the backward propagation algorithm during model training, and `required_grad` needs to be set to `True`.
- Untrainable parameters. Tensor that does not participate in backward propagation needs to update values (e.g. `mean` and `var` variables in BatchNorm), when `requires_grad` needs to be set to `False`.

> Parameter is set to `required_grad=True` by default.

We construct a simple fully-connected layer as follows:

```python
import numpy as np
import mindspore
from mindspore import nn
from mindspore import ops
from mindspore import Tensor, Parameter

class Network(nn.Cell):
def __init__(self):
super().__init__()
self.w = Parameter(Tensor(np.random.randn(5, 3), mindspore.float32), name='w') # weight
self.b = Parameter(Tensor(np.random.randn(3,), mindspore.float32), name='b') # bias

def construct(self, x):
z = ops.matmul(x, self.w) + self.b
return z

net = Network()
```

In the `__init__` method of `Cell`, we define two parameters `w` and `b` and configure `name` for namespace management. Use `self.attr` in the `construct` method to call directly to participate in Tensor operations.

### Obtaining Parameter

After constructing the neural network layer by using Cell+Parameter, we can use various methods to obtain the Parameter managed by Cell.

#### Obtaining a Single Parameter

To get a particular parameter individually, just call a member variable of a Python class directly.

```python
print(net.b.asnumpy())
```

```text
[-1.2192779 -0.36789745 0.0946381 ]
```

#### Obtaining a Trainable Parameter

Trainable parameters can be obtained by using the `Cell.trainable_params` method, and this interface is usually called when configuring the optimizer.

```python
print(net.trainable_params())
```

```text
[Parameter (name=w, shape=(5, 3), dtype=Float32, requires_grad=True), Parameter (name=b, shape=(3,), dtype=Float32, requires_grad=True)]
```

#### Obtaining All Parameters

Use the `Cell.get_parameters()` method to get all parameters, at which point a Python iterator will be returned.

```python
print(type(net.get_parameters()))
```

```text
<class 'generator'>
```

Or you can call `Cell.parameters_and_names` to return the parameter names and parameters.

```python
for name, param in net.parameters_and_names():
print(f"{name}:\n{param.asnumpy()}")
```

```text
w:
[[ 4.15680408e-02 -1.20311625e-01 5.02573885e-02]
[ 1.22175144e-04 -1.34980649e-01 1.17642188e+00]
[ 7.57667869e-02 -1.74758151e-01 -5.19092619e-01]
[-1.67846107e+00 3.27240258e-01 -2.06452996e-01]
[ 5.72323874e-02 -8.27963874e-02 5.94243526e-01]]
b:
[-1.2192779 -0.36789745 0.0946381 ]
```

### Modifying the Parameter

#### Modifying Parameter Values Directly

Parameter is a special kind of Tensor, so its value can be modified by using the Tensor index modification.

```python
net.b[0] = 1.
print(net.b.asnumpy())
```

```text
[ 1. -0.36789745 0.0946381 ]
```

#### Overriding the Modified Parameter Values

The `Parameter.set_data` method can be called to override the Parameter by using a Tensor with the same Shape. This method is commonly used for [Cell traversal initialization](https://www.mindspore.cn/docs/en/master/model_train/custom_program/initializer.html) by using Initializer.

```python
net.b.set_data(Tensor([3, 4, 5]))
print(net.b.asnumpy())
```

```text
[3. 4. 5.]
```

#### Modifying Parameter Values During Runtime

The main role of parameters is to update their values during model training, which involves parameter modification during runtime after backward propagation to obtain gradients, or when untrainable parameters need to be updated. Due to the compiled design of MindSpore's [Accelerating with Static Graphs](https://www.mindspore.cn/tutorials/en/master/beginner/accelerate_with_static_graph.html), it is necessary at this point to use the `mindspore.ops.assign` interface to assign parameters. This method is commonly used in [Custom Optimizer](https://www.mindspore.cn/docs/en/master/model_train/custom_program/optimizer.html) scenarios. The following is a simple sample modification of parameter values during runtime:

```python
import mindspore as ms

@ms.jit
def modify_parameter():
b_hat = ms.Tensor([7, 8, 9])
ops.assign(net.b, b_hat)
return True

modify_parameter()
print(net.b.asnumpy())
```

```text
[7. 8. 9.]
```

### Parameter Tuple

ParameterTuple, variable tuple, used to store multiple Parameter, is inherited from tuple tuples, and provides cloning function.

The following example provides the ParameterTuple creation method:

```python
from mindspore.common.initializer import initializer
from mindspore import ParameterTuple
# Creation
x = Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name="x")
y = Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))

# Clone from params and change the name to "params_copy"
params_copy = params.clone("params_copy")

print(params)
print(params_copy)
```

```text
(Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))
(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))
```

## Cell Training State Change

Some Tensor operations in neural networks behave differently during training and inference. For example, `nn.Dropout` randomly zeroes elements during training but not during inference, and `nn.BatchNorm` updates its `mean` and `var` statistics during training but keeps them fixed during inference. We can therefore set the state of the neural network through the `Cell.set_train` interface.

When `set_train` is set to `True`, the neural network state is `train`; `True` is also the default value of the `set_train` interface:

```python
net.set_train()
print(net.phase)
```

```text
train
```

When `set_train` is set to False, the neural network state is `predict`:

```python
net.set_train(False)
print(net.phase)
```

```text
predict
```
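The training/inference difference that `set_train` toggles can be illustrated independently of MindSpore with a plain NumPy sketch of inverted dropout (a hypothetical helper written for illustration, not a MindSpore API):

```python
import numpy as np

def dropout(x, p, training, rng):
    # Inverted dropout: during training, zero each element with
    # probability p and rescale survivors by 1 / (1 - p) so the
    # expected value is preserved; at inference time, pass through.
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(8)
train_out = dropout(x, 0.5, training=True, rng=rng)   # elements are 0.0 or 2.0
infer_out = dropout(x, 0.5, training=False, rng=rng)  # unchanged
```

This mirrors why a network must be switched to the `predict` state before evaluation: leaving dropout in the `train` state would randomly zero activations at inference time.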

tutorials/source_en/advanced/modules/loss.md → docs/mindspore/source_en/model_train/custom_program/loss.md View File

@@ -1,6 +1,6 @@

# Loss Function
[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/loss.md)

A loss function, also called an objective function, is used to measure the difference between a predicted value and an actual value.

@@ -8,7 +8,7 @@ In deep learning, model training is a process of reducing the loss function valu

The `mindspore.nn` module provides many [general loss functions](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#loss-function), but these functions cannot meet all requirements. In many cases, you need to customize the required loss functions. The following describes how to customize loss functions.

![lossfun.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/model_train/custom_program/images/loss_function.png)

## Built-in Loss Functions


docs/mindspore/source_en/model_train/custom_program/network_custom.md View File

@@ -0,0 +1,98 @@
# Custom Neural Network Layers

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/network_custom.md)

Normally, the neural network layer interfaces and function interfaces provided by MindSpore can meet model construction requirements. However, since the AI field is constantly evolving, it is possible to encounter new network structures that have no built-in modules. In such cases, we can customize neural network layers through the function interfaces and Primitive operators provided by MindSpore, and we can use the `Cell.bprop` method to customize the backward computation. The three customization methods are described in detail below.

## Constructing Neural Network Layers by Using the Function Interface

MindSpore provides a large number of basic function interfaces, which can be combined into complex Tensor operations and encapsulated as neural network layers. The following takes `Threshold` as an example, defined by the equation:

$$
y =\begin{cases}
x, &\text{ if } x > \text{threshold} \\
\text{value}, &\text{ otherwise }
\end{cases}
$$

It can be seen that `Threshold` checks whether each value of the Tensor is greater than `threshold`, keeps the values for which the comparison is `True`, and replaces the values for which it is `False`. The corresponding implementation is as follows:

```python
import mindspore
from mindspore import nn, ops

class Threshold(nn.Cell):
    def __init__(self, threshold, value):
        super().__init__()
        self.threshold = threshold
        self.value = value

    def construct(self, inputs):
        cond = ops.gt(inputs, self.threshold)
        value = ops.fill(inputs.dtype, inputs.shape, self.value)
        return ops.select(cond, inputs, value)
```

Here `ops.gt`, `ops.fill`, and `ops.select` implement the comparison, the replacement value, and the selection respectively. The custom `Threshold` layer can now be used:

```python
m = Threshold(0.1, 20)
inputs = mindspore.Tensor([0.1, 0.2, 0.3], mindspore.float32)
m(inputs)
```

```text
Tensor(shape=[3], dtype=Float32, value= [ 2.00000000e+01, 2.00000003e-01, 3.00000012e-01])
```

It can be seen that `inputs[0]` equals `threshold`, so the condition `inputs[0] > threshold` is `False` and the value is replaced with `20`.
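The same selection logic can be cross-checked with a NumPy equivalent (a standalone sketch for verification, not the MindSpore implementation):

```python
import numpy as np

def threshold(x, thr, value):
    # Keep x where x > thr, otherwise substitute value.
    return np.where(x > thr, x, value)

out = threshold(np.array([0.1, 0.2, 0.3]), 0.1, 20.0)
# 0.1 fails the strict comparison and becomes 20.0
```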

## Custom Cell Reverse

In special scenarios, we not only need to customize the forward logic of a neural network layer, but also want to manually control the computation of its backward pass, which we can define through the `Cell.bprop` interface. This function is used in scenarios such as designing new neural network structures and optimizing backward propagation speed. In the following, we take `Dropout2d` as an example to introduce custom Cell backward computation.

```python
class Dropout2d(nn.Cell):
    def __init__(self, keep_prob):
        super().__init__()
        self.keep_prob = keep_prob
        self.dropout2d = ops.Dropout2D(keep_prob)

    def construct(self, x):
        return self.dropout2d(x)

    def bprop(self, x, out, dout):
        _, mask = out
        dy, _ = dout
        if self.keep_prob != 0:
            dy = dy * (1 / self.keep_prob)
        dy = mask.astype(mindspore.float32) * dy
        return (dy.astype(x.dtype),)

dropout_2d = Dropout2d(0.8)
dropout_2d.bprop_debug = True
```

The `bprop` method has three separate input parameters:

- *x*: Forward input. When there are multiple forward inputs, the same number of parameters is required.
- *out*: Forward output.
- *dout*: The gradient passed to the current Cell during backward propagation, i.e., the backward result of the previous layer.

Generally, we need to compute the backward result according to the derivative formula, based on the forward output and the gradient propagated from the following layer, and return it. The backward computation of `Dropout2d` masks the incoming gradient with the `mask` matrix from the forward output and then scales it by `1 / keep_prob`, which yields the correct result.
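The arithmetic of this backward rule can be sketched as a standalone NumPy function mirroring the `bprop` body above (a hypothetical helper for illustration):

```python
import numpy as np

def dropout2d_bprop(dout, mask, keep_prob):
    # Scale the incoming gradient by 1 / keep_prob, then zero it
    # wherever the forward pass dropped the channel (mask == 0).
    dy = dout * (1.0 / keep_prob) if keep_prob != 0 else dout
    return mask.astype(np.float32) * dy

mask = np.array([[1, 0], [1, 1]])        # channels kept in the forward pass
dout = np.full((2, 2), 0.8, np.float32)  # gradient from the next layer
grad = dropout2d_bprop(dout, mask, keep_prob=0.8)
# kept positions receive 0.8 / 0.8 = 1.0, dropped positions receive 0.0
```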

When customizing the backward pass of a Cell, extended syntax is supported in PyNative mode, which makes it possible to differentiate the weights inside the Cell. A specific example is as follows:

```python
import numpy as np
from mindspore import nn, Parameter, Tensor

class NetWithParam(nn.Cell):
    def __init__(self):
        super(NetWithParam, self).__init__()
        self.w = Parameter(Tensor(np.array([2.0], dtype=np.float32)), name='weight')
        self.internal_params = [self.w]

    def construct(self, x):
        output = self.w * x
        return output

    def bprop(self, *args):
        return (self.w * args[-1],), {self.w: args[0] * args[-1]}
```

The `bprop` method supports `*args` as its input parameter; the last element of `args`, `args[-1]`, is the gradient returned to the Cell. The weights to be differentiated are specified through `self.internal_params`, and the `bprop` function returns a tuple and a dictionary: the tuple holds the gradients corresponding to the inputs, and the dictionary maps each weight (key) to its gradient (value).
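For the network above, where `output = self.w * x`, the chain rule gives an input gradient of `w * dout` and a weight gradient of `x * dout`; these are exactly what the tuple and the dictionary carry. A plain NumPy check with illustrative values:

```python
import numpy as np

w = np.array([2.0], dtype=np.float32)     # the Cell's weight
x = np.array([3.0], dtype=np.float32)     # forward input (args[0])
dout = np.array([1.0], dtype=np.float32)  # incoming gradient (args[-1])

dx = w * dout  # input gradient, returned in the tuple
dw = x * dout  # weight gradient, returned in the dict under self.w
```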

docs/mindspore/source_en/model_train/custom_program/op_custom.rst View File

@@ -0,0 +1,12 @@
Custom Operators
=================

.. toctree::
  :glob:
  :maxdepth: 1

  operation/op_custom
  operation/ms_kernel
  operation/op_custom_adv
  operation/op_custom_aot
  operation/op_custom_ascendc

tutorials/experts/source_en/operation/ms_kernel.md → docs/mindspore/source_en/model_train/custom_program/operation/ms_kernel.md View File

@@ -1,6 +1,6 @@
# MindSpore Hybrid Syntax Specification

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/operation/ms_kernel.md)

## Overview


tutorials/experts/source_en/operation/op_custom.md → docs/mindspore/source_en/model_train/custom_program/operation/op_custom.md View File

@@ -1,6 +1,6 @@
# Custom Operators (Custom-based)

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/operation/op_custom.md)

## Overview

@@ -139,10 +139,10 @@ JIT (Just In Time) refers to operators compiled directly by the framework during

A custom operator of Hybrid type is the default type of custom operator. With a Hybrid-type custom operator, users can describe the operator's calculation logic in Python-like syntax without attending to the engineering details the MindSpore framework requires for operator definition, allowing them to focus on the algorithm itself.

Custom operators of Hybrid type use [MindSpore Hybrid DSL](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/ms_kernel.html#syntax-specification) to describe the implementation of the calculation logic inside the operator. Functions defined with MindSpore Hybrid DSL can be parsed by the [AKG Operator Compiler](https://gitee.com/mindspore/akg) and JIT-compiled into efficient operators for training and inference of large-scale models. At the same time, a function defined with MindSpore Hybrid DSL can be called directly as a `numpy` function, which is convenient for debugging and for flexibly switching to a [pyfunc-type custom operator](#the-introduction-to-custom-operator-an-example), so that a custom operator written once can be reused across multiple modes, platforms, and scenarios.

The following example (test_custom_hybrid.py) shows how to write a custom operator of the hybrid type. The operator computes the sum of two tensors.
Notice that custom operators of Hybrid type use the source-to-source transformation method to connect the graph compiler and the operator compiler. Users can use the keywords of MindSpore Hybrid DSL directly in the script, such as `output_tensor` below, without importing any Python modules. For more information about the keywords, refer to [MindSpore Hybrid DSL Keywords](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/ms_kernel.html#keywords).

```python
import numpy as np
@@ -198,7 +198,7 @@ The custom operator of akg type uses the [MindSpore AKG](https://gitee.com/minds

Operator output shape and data type inference can be realized by defining Python functions to describe the inference logic of operator output shape and data type.

If the operator contains attributes or only supports specific input and output data types or data formats, operator information needs to be registered, and for how to generate operator information, see [Registering the Operator Information](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_adv.html#registering-the-operator-information). If the operator information is not registered, when operator selection and mapping are made in the backend, the operator information is derived from the input of the current operator.

The following is an example of the development process of a custom operator of type akg in test_custom_akg.py, where the custom operator implements the addition of two input tensors.

@@ -252,7 +252,7 @@ For more complete examples of akg-type custom operators, see the [use cases](htt

## AOT-Compiled Custom Operator

An AOT-type custom operator means that the user compiles the operator into a binary file in advance and then connects it to the network. Usually, users optimize their implementations in programming languages such as C/C++/CUDA and compile their operators as dynamic libraries to accelerate MindSpore networks. As a result, users can fully optimize their operators and leverage the performance of the corresponding backend hardware. Here, we introduce some basic knowledge about AOT-type custom operators. For more advanced usage and functionality, please refer to [Advanced Usage of AOT Type Custom Operators](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_aot.html).

### Defining Custom Operator of aot Type

@@ -279,7 +279,7 @@ In the Python script, the format for the `func` input in `Custom` is `Path_To_Fu

Operator output shape and data type inference can be realized by defining Python functions to describe the inference logic.

If the operator only supports some specific input and output data types, the operator information needs to be registered. For the creation of operator information, please refer to [Registering the Operator Information](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_adv.html#registering-the-operator-information).

The following examples introduce the development process of aot type custom operator on GPU platform and CPU platform, where the custom operator implements the function of adding two input tensors.

@@ -452,7 +452,7 @@ The custom operator of julia type uses Julia to describe the internal calculatio

Operator output shape and data type inference can be realized by defining Python functions to describe the inference logic of the operator output shape and the data type.

If the custom operator only supports specific input and output data types, you need to define the operator information. For the creation of operator information, please refer to [Registering the Operator Information](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_adv.html#registering-the-operator-information).

Takes the function of adding two input tensors as an example to introduce how to define a custom operator of julia type.


tutorials/experts/source_en/operation/op_custom_adv.md → docs/mindspore/source_en/model_train/custom_program/operation/op_custom_adv.md View File

@@ -1,6 +1,6 @@
# Custom Operator Registration

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/operation/op_custom_adv.md)

## Registering the Operator Information


tutorials/experts/source_en/operation/op_custom_aot.md → docs/mindspore/source_en/model_train/custom_program/operation/op_custom_aot.md View File

@@ -1,12 +1,12 @@
# Advanced Usage of aot-type Custom Operators

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/model_train/custom_program/operation/op_custom_aot.md)

## Overview

aot-type custom operators use a pre-compilation approach: developers write the source code for the corresponding function against a specific interface and compile it in advance into a dynamic link library. Then, at network runtime, the framework automatically calls and executes the function in the dynamic link library. aot-type custom operators support the CUDA language on GPU platforms and C and C++ on CPU platforms. For basic knowledge of developing aot-type custom operators, please refer to the [basic tutorial](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html#defining-custom-operator-of-aot-type).

In this tutorial, we will demonstrate advanced features of aot-type custom operators, including:


Some files were not shown because too many files changed in this diff
