@@ -50,7 +50,7 @@ General Process of Applying the MindSpore Golden Stick
- **Optimize the network using the MindSpore Golden Stick:** In the original training process, after the original network is defined and before the network is trained, use the MindSpore Golden Stick to optimize the network structure. Generally, this step is implemented by calling the `apply` API of MindSpore Golden Stick. For details, see `Applying the SimQAT Algorithm <https://mindspore.cn/golden_stick/docs/en/master/quantization/simqat.html>`_ .
- **Register the MindSpore Golden Stick callback:** Register the callback of the MindSpore Golden Stick into the model to be trained. Generally, in this step, the `callback` function of MindSpore Golden Stick is called to obtain the corresponding callback object and `register the object into the model <https://www.mindspore.cn/docs/en/master/model_train/train_process/model/callback.html>`_ .
2. Deployment
@@ -59,7 +59,7 @@ General Process of Applying the MindSpore Golden Stick
.. note::
- For details about how to apply the MindSpore Golden Stick, see the detailed description and sample code in each algorithm section.
- For details about the "ms.export" step in the process, see `Exporting MINDIR Model <https://www.mindspore.cn/tutorials/en/master/beginner/save_load.html#saving-and-loading-mindir>`_ .
- For details about the "MindSpore infer" step in the process, see `MindSpore Inference Runtime <https://mindspore.cn/docs/en/master/model_infer/overview.html>`_ .
@@ -50,7 +50,7 @@ MindSpore Golden Stick除了提供丰富的模型压缩算法外,一个重要
- **Optimize the network with the MindSpore Golden Stick algorithm:** In the original training flow, after the original network is defined and before the network is trained, apply the MindSpore Golden Stick algorithm to optimize the network structure. Generally, this step is implemented by calling the `apply` API of MindSpore Golden Stick; see `Applying the SimQAT Algorithm <https://mindspore.cn/golden_stick/docs/zh-CN/master/quantization/simqat.html#%E5%BA%94%E7%94%A8%E9%87%8F%E5%8C%96%E7%AE%97%E6%B3%95>`_ .
- **Register the MindSpore Golden Stick callback logic:** Register the callback logic of the MindSpore Golden Stick algorithm into the model to be trained. Generally, this step calls `callback` of MindSpore Golden Stick to obtain the corresponding callback object and `registers it into the model <https://www.mindspore.cn/docs/zh-CN/master/model_train/train_process/model/callback.html>`_ .
2. Deployment
@@ -59,7 +59,7 @@ MindSpore Golden Stick除了提供丰富的模型压缩算法外,一个重要
.. note::
- For details about applying the MindSpore Golden Stick algorithms, see the detailed description and sample code in each algorithm section.
@@ -39,7 +39,7 @@ This document demonstrates the use of the models provided by MindSpore Hub for b
```
3. After loading the model, you can use MindSpore to do inference. You can refer to [Multi-Platform Inference Overview](https://www.mindspore.cn/docs/en/master/model_infer/overview.html).
Whether it is an off-the-shelf model or a custom-written one, the model needs to be exported to a `.mindir` file. Here we use the already-implemented [LeNet model](https://gitee.com/mindspore/models/tree/master/research/cv/lenet).
> This summary is exported using the MindSpore cloud-side feature. For more information, please refer to [MindSpore Tutorial](https://www.mindspore.cn/tutorials/en/master/index.html).
For scenarios where a large-scale neural network model has too many parameters to fit on a single device, distributed inference can be performed across multiple devices. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference using the [C++ interface](https://www.mindspore.cn/lite/api/en/master/index.html). Cloud-side distributed inference follows roughly the same process as [cloud-side single-card inference](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/runtime_cpp.html), which can be cross-referenced. For distributed inference in general, please refer to [MindSpore Distributed inference](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#inference); MindSpore Lite cloud-side distributed inference additionally includes performance-oriented optimizations.
MindSpore Lite cloud-side distributed inference can only run in Linux environments; the supported device types are the Atlas training series and Nvidia GPU. As shown in the figure below, distributed inference is currently launched as multiple processes: each process corresponds to a `Rank` in the communication group and loads, compiles, and executes its own sliced model, with the same input data fed to every process.
@@ -12,7 +12,7 @@ MindSpore Lite cloud-side distributed inference is only supported to run in Linu
Each process consists of the following main steps:
1. Model reading: Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore. The number of MindIR models equals the number of devices, so that each device can load its own model for inference.
2. Context creation and configuration: Create and configure the [Context](https://www.mindspore.cn/lite/api/en/master/generate/classmindspore_Context.html), which holds the distributed inference parameters that guide distributed model compilation and execution.
3. Model loading and compilation: Use the [Model::Build](https://www.mindspore.cn/lite/api/en/master/generate/classmindspore_Model.html) interface for model loading and compilation. The model loading phase parses the file cache into a runtime model. The model compilation phase optimizes the front-end computational graph into a high-performance back-end computational graph. Compilation is time-consuming, so it is recommended to compile once and run inference multiple times.
4. Filling the model input data.
@@ -24,7 +24,7 @@ Each process consists of the following main steps:
1. To download the cloud-side distributed inference C++ sample code, please select the device type: [Ascend](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/ascend_ge_distributed_cpp) or [GPU](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/gpu_trt_distributed_cpp). The directory will be referred to later as the example code directory.
2. Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore and store it to the sample code directory. For a quick experience, you can download the two sliced Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir), [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).
3. For the Ascend device type, generate the networking (rank table) information file with hccl_tools.py as needed, store it in the sample code directory, and fill the file's path into the configuration file `config_file.ini` in the sample code directory.
For scenarios where a large-scale neural network model has too many parameters to fit on a single device, distributed inference can be performed across multiple devices. This tutorial describes how to perform MindSpore Lite cloud-side distributed inference using the [Python interface](https://www.mindspore.cn/lite/api/en/master/mindspore_lite.html). Cloud-side distributed inference follows roughly the same process as [cloud-side single-card inference](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/runtime_python.html), which can be cross-referenced. For distributed inference in general, please refer to [MindSpore Distributed inference](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#inference); MindSpore Lite cloud-side distributed inference additionally includes performance-oriented optimizations.
MindSpore Lite cloud-side distributed inference can only run in Linux environments; the supported device types are the Atlas training series and Nvidia GPU. As shown in the figure below, distributed inference is currently launched as multiple processes: each process corresponds to a `Rank` in the communication group and loads, compiles, and executes its own sliced model, with the same input data fed to every process.
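The rank-to-model mapping of this multi-process launch can be sketched in plain Python. This is only an illustration of the launch plan, not the Lite API; the `Matmul{rank}.mindir` naming follows the sample models referenced below and is an assumption for illustration:

```python
def plan_rank(rank):
    # Each rank runs in its own OS process in the real launch and loads its
    # own sliced model; real code would build and run it with the Lite API.
    return {"rank": rank, "model_file": f"Matmul{rank}.mindir"}

# Two devices -> two processes; every rank receives the same input data.
launch_plan = [plan_rank(r) for r in range(2)]
print(launch_plan)
```

Each entry corresponds to one process in the communication group; the actual model loading and execution is done per process with the Lite interfaces described in the steps below.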
@@ -12,7 +12,7 @@ MindSpore Lite cloud-side distributed inference is only supported to run in Linu
Each process consists of the following main steps:
1. Model reading: Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore. The number of MindIR models equals the number of devices, so that each device can load its own model for inference.
2. Context creation and configuration: Create and configure the [Context](https://www.mindspore.cn/lite/api/en/master/mindspore_lite/mindspore_lite.Context.html#mindspore_lite.Context), which holds the distributed inference parameters that guide distributed model compilation and execution.
3. Model loading and compilation: Use the [Model.build_from_file](https://www.mindspore.cn/lite/api/en/master/mindspore_lite/mindspore_lite.Model.html#mindspore_lite.Model.build_from_file) interface for model loading and compilation. The model loading phase parses the file cache into a runtime model. The model compilation phase optimizes the front-end computational graph into a high-performance back-end computational graph. Compilation is time-consuming, so it is recommended to compile once and run inference multiple times.
4. Filling the model input data.
@@ -24,7 +24,7 @@ Each process consists of the following main steps:
1. To download the cloud-side distributed inference Python sample code, please select the device type: [Ascend](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/ascend_ge_distributed_cpp) or [GPU](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/cloud_infer/gpu_trt_distributed_cpp). This directory is referred to below as the sample code directory.
2. Slice and [export the distributed MindIR model](https://www.mindspore.cn/docs/en/master/model_train/parallel/model_loading.html#exporting-mindir-files-in-the-distributed-scenario) via MindSpore and store it to the sample code directory. For a quick experience, you can download the two sliced Matmul model files [Matmul0.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul0.mindir), [Matmul1.mindir](https://download.mindspore.cn/model_zoo/official/lite/quick_start/Matmul1.mindir).
3. For the Ascend device type, generate the networking (rank table) information file with hccl_tools.py as needed, store it in the sample code directory, and fill the file's path into the configuration file `config_file.ini` in the sample code directory.
@@ -70,7 +70,7 @@ The following describes the parameters in detail.
| `--outputDataType=<OUTPUTDATATYPE>` | No | Set the data type of the quantized model's output tensors. Only valid for output tensors that have quantization parameters (scale and zero point). Defaults to the data type of the original model's output tensors. | FLOAT32, INT8, UINT8, DEFAULT | DEFAULT | - |
| `--outputDataFormat=<OUTPUTDATAFORMAT>` | No | Set the output format of the exported model. Only valid for 4-dimensional outputs. | NHWC, NCHW | - | - |
| `--encryptKey=<ENCRYPTKEY>` | No | Set the key for exporting encrypted `ms` models. The key is expressed in hexadecimal. Only AES-GCM is supported, and the key length must be 16 bytes. | - | - | - |
| `--encryption=<ENCRYPTION>` | No | Set whether to encrypt the `ms` model on export. Encryption protects the integrity of the model but increases runtime initialization time. | true, false | false | - |
| `--infer=<INFER>` | No | Set whether to run pre-inference when conversion completes. | true, false | false | - |
> - The parameter name and parameter value are separated by an equal sign (=) and no space is allowed between them.
[](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_generation.md)
Molecular generation uses deep learning generative models to predict and generate compositions in a particle system. We have integrated an active-learning-based method for high-entropy alloy design, which designs high-entropy alloy compositions with extremely low thermal expansion coefficients. In the active learning loop, candidate high-entropy alloy compositions are first generated by AI models; the candidates are then screened with predictive models and thermodynamic calculations that predict the thermal expansion coefficient; finally, researchers determine the final high-entropy alloy compositions through experimental verification.
[](https://gitee.com/mindspore/docs/blob/master/docs/mindchemistry/docs/source_zh_cn/user/molecular_prediction.md)
Molecular property prediction predicts various properties of different particle systems through deep learning networks. We integrated the NequIP and Allegro models, which construct a graph-structure description based on the positions and numbers of atoms in the molecular system and compute the energy of the molecular system using equivariant calculations and graph neural networks.
Density functional theory Hamiltonian prediction. We integrate the DeephE3nn model, an E(3)-equivariant neural network, to predict the Hamiltonian from the atomic structure.
Prediction of crystalline material properties. We integrate the Matformer model based on graph neural networks and Transformer architectures, for predicting various properties of crystalline materials.
"You can get parameters of model, data and optimizer from [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/configs/FourCastNet.yaml)."
]
},
{
@@ -182,7 +182,7 @@
"\n",
"Download the statistic, training and validation dataset from [dataset](https://download.mindspore.cn/mindscience/mindearth/dataset/WeatherBench_1.4_69/) to `./dataset`.\n",
"\n",
"Modify the parameter of `root_dir` in the [FourCastNet.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/fourcastnet/configs/FourCastNet.yaml), which sets the directory for the dataset.\n",
"\n",
"The `./dataset` is hosted with the following directory structure:\n",
"You can get parameters of model, data and optimizer from [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/configs/vit_kno_1.4.yaml)."
]
},
{
@@ -120,7 +120,7 @@
"\n",
"Download the statistic, training and validation dataset from [dataset](https://download.mindspore.cn/mindscience/mindearth/dataset/WeatherBench_1.4_69/) to `./dataset`.\n",
"\n",
"Modify the parameter of `root_dir` in the [vit_kno.yaml](https://gitee.com/mindspore/mindscience/blob/master/MindEarth/applications/medium-range/koopman_vit/configs/vit_kno_1.4.yaml), which sets the directory for the dataset.\n",
"\n",
"The `./dataset` is hosted with the following directory structure:\n",
@@ -103,7 +103,7 @@ The causes of accuracy problems can be classified into hyperparameter problems,
2. The MindSpore construct constraints are not complied with during graph construction.
The graph construction does not comply with the MindSpore construct constraints. That is, the network in graph mode does not comply with the constraints declared in the MindSpore static graph syntax support. For example, MindSpore does not support the backward computation of functions with key-value pair parameters. For details about complete constraints, see [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html).
- Computational Graph Structure Problems
@@ -581,13 +581,13 @@ For details about visualized data analysis during training, see [Viewing Dashboa
### Data Problem Handling
Perform operations such as standardization, normalization, and channel conversion on data. For image data, add augmentations such as random cropping and rotation. For details about data shuffling, batching, and repetition, see [Processing and Loading Data](https://www.mindspore.cn/docs/en/master/model_train/index.html).
> For details about how to apply the data augmentation operation to a custom dataset, see the [mindspore.dataset.GeneratorDataset.map](https://www.mindspore.cn/docs/en/master/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.map.html#mindspore.dataset.Dataset.map) API.
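As a plain-Python illustration of the standardization and channel-conversion steps mentioned above (a generic sketch, not a `mindspore.dataset` transform):

```python
def standardize(img, mean, std):
    # Per-channel standardization: (pixel - mean[c]) / std[c],
    # applied to an H x W x C nested-list image.
    return [[[(v - mean[i]) / std[i] for i, v in enumerate(px)] for px in row] for row in img]

def hwc_to_chw(img):
    # Channel conversion: H x W x C -> C x H x W.
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]

img = [[[128, 64, 32] for _ in range(2)] for _ in range(2)]  # a 2x2 RGB image
out = hwc_to_chw(standardize(img, mean=[128, 64, 32], std=[1, 1, 1]))
print(len(out), len(out[0]), len(out[0][0]))  # 3 2 2
```

In a real pipeline these operations would be applied through the dataset `map` API referenced in the note above.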
### Hyperparameter Problem Handling
Hyperparameters in AI training include the global learning rate, epoch, and batch size. For details about how to set a dynamic learning rate, see [Optimization Algorithm of Learning Rate](https://mindspore.cn/docs/zh-CN/master/model_train/custom_program/optimizer.html).
@@ -341,13 +341,13 @@ When you run a script on the Ascend backend or use the mixed precision function,
#### mp.01 Overflow occurs during training
Check method:
When the [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) or the Ascend AI processor is used for training, you are advised to check whether overflow occurs.
After the overflow problem is found, find and analyze the first overflow node. (For Ascend overflow data, find the node with the smallest timestamp based on the timestamp in the file name. For GPU overflow data, find the first node in the execution sequence.) Determine the overflow cause based on the input and output data of the API.
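The "smallest timestamp" rule for Ascend overflow data can be sketched as follows. The dump filename layout used here (`name.timestamp`) is an illustrative assumption, not the actual Ascend dump format:

```python
import re

def first_overflow_file(dump_files):
    # The node whose dump file carries the smallest timestamp is the first
    # overflow node; files without a trailing numeric field sort last.
    def ts(name):
        m = re.search(r"\.(\d+)$", name)
        return int(m.group(1)) if m else float("inf")
    return min(dump_files, key=ts)

# Hypothetical dump file names for illustration.
files = ["MatMul_op12.300", "Conv2D_op3.100", "Add_op7.200"]
print(first_overflow_file(files))  # Conv2D_op3.100
```

Once the first overflow node is located, analyze its input and output data to determine the overflow cause, as described above.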
The common solutions to the overflow problem are as follows:
1. Enable dynamic loss scale or set a proper static loss scale value. For details, see [LossScale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html). Note that when the static loss scale in the GPU scenario is directly used for Ascend training, unexpected frequent overflow may occur, affecting convergence. After the loss scale is enabled, you may need to perform multiple experiments to adjust the init_loss_scale (initial value), scale_factor, and scale_window of loss scale until there are few floating-point overflows during training.
2. If the overflow problem has a key impact on the accuracy and cannot be avoided, change the corresponding API to the FP32 API (the performance may be greatly affected after the adjustment).
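The loss-scale adjustment rule described in item 1 can be sketched in plain Python. This mirrors the halve-on-overflow / double-after-`scale_window` behavior and reuses the parameter names from the text (init_loss_scale, scale_factor, scale_window); it is not the MindSpore implementation:

```python
class DynamicLossScale:
    # Sketch of dynamic loss scaling: halve the scale when a step overflows,
    # double it after `scale_window` consecutive overflow-free steps.
    def __init__(self, init_loss_scale=2.0 ** 16, scale_factor=2.0, scale_window=2000):
        self.scale = init_loss_scale
        self.factor = scale_factor
        self.window = scale_window
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.window:
                self.scale *= self.factor
                self.good_steps = 0

scaler = DynamicLossScale(init_loss_scale=1024.0, scale_factor=2.0, scale_window=3)
scaler.update(True)        # overflow step: 1024 -> 512
for _ in range(3):
    scaler.update(False)   # three clean steps: 512 -> 1024
print(scaler.scale)
```

Tuning amounts to choosing init_loss_scale, scale_factor, and scale_window so that overflow steps stay rare during training.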
Conclusion:
@@ -358,7 +358,7 @@ Enter here.
Check method:
When [mixed precision](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) is used, you can train with the default parameter values of DynamicLossScaleManager or FixedLossScaleManager. If there are too many overflow steps and the final accuracy is affected, adjust the value of loss_scale based on the overflow phenomenon: if gradient overflow occurs, decrease loss_scale (divide the original value by 2); if gradient underflow occurs, increase loss_scale (multiply the original value by 2). In most cases, training on the Ascend AI processor is performed with mixed precision. Because the computation characteristics of the Ascend AI processor differ from those of GPU mixed precision, you may need to tune the LossScaleManager hyperparameters to values different from those on the GPU, based on the training result, to ensure precision.
Conclusion:
@@ -368,7 +368,7 @@ Enter here.
Check method:
Gradient clip forcibly adjusts a gradient to a smaller value when it exceeds a threshold, and works well against gradient explosion in RNNs. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) and gradient clip are used, perform this check: verify in the code that gradient clip is applied to the original gradients, i.e. the scaled gradients divided by the loss scale.
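The correct order of operations can be sketched in plain Python (a generic norm-clip illustration, not a MindSpore API):

```python
def clip_after_unscaling(scaled_grads, loss_scale, clip_norm):
    # First divide out the loss scale to recover the true gradients,
    # then apply the global-norm clip to the unscaled values.
    grads = [g / loss_scale for g in scaled_grads]
    norm = sum(g * g for g in grads) ** 0.5
    if norm > clip_norm:
        grads = [g * clip_norm / norm for g in grads]
    return grads

# Scaled gradients are the true gradients [3.0, 4.0] times loss_scale=1024;
# the true global norm is 5, so clipping to 1 yields [0.6, 0.8].
print(clip_after_unscaling([3072.0, 4096.0], loss_scale=1024.0, clip_norm=1.0))
```

Clipping the scaled gradients directly would compare `norm * loss_scale` against the threshold and clip almost every step, which is exactly the bug this check looks for.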
Conclusion:
@@ -378,7 +378,7 @@ Enter here.
Check method:
Gradient penalty is a technique that adds a gradient term to the cost function to constrain the gradient length. If both [loss scale](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/mixed_precision.html) and gradient penalty are used, perform this check: verify that the gradients used to compute the gradient penalty term have the loss scale removed, i.e. the scaled gradients should first be divided by the loss scale before being used to compute the penalty term.
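A minimal sketch of this check in plain Python (the squared-norm penalty form and the coefficient are illustrative assumptions):

```python
def gradient_penalty_term(scaled_grads, loss_scale, coeff=10.0):
    # The penalty must be computed from unscaled gradients: divide out the
    # loss scale first, then take the squared norm. Feeding scaled gradients
    # in directly would inflate the penalty by loss_scale ** 2.
    grads = [g / loss_scale for g in scaled_grads]
    return coeff * sum(g * g for g in grads)

# True gradients are [0.5, 1.0]; squared norm 1.25, so the penalty is 12.5.
print(gradient_penalty_term([512.0, 1024.0], loss_scale=1024.0))
```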
@@ -43,7 +43,7 @@ By observing the `queue relationship between operators` in the Data Processing t
*Figure 3: Data Preparation Details -- Data Processing*
We can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) to adjust dataset operations to improve dataset performance.
By examining the data processing code of ResNet50, we find that the num_parallel_workers parameter of the map operation is 1 (the default value), as shown below:
@@ -95,7 +95,7 @@ Open the details page of Operator Time Consumption Ranking, and we find that Mat
*Figure 6: Finding operators that can be optimized via the details page of Operator Time Consumption Ranking*
For Operator Time Consumption optimization, the float16 type, which requires less computation, can usually be used to improve operator performance when there is no accuracy difference between the float16 and float32 types. We can refer to [Enabling Mixed Precision](https://www.mindspore.cn/tutorials/en/master/advanced/mixed_precision.html) to improve operator performance.
For Operator Time Consumption optimization, the float16 type, which requires less computation, can usually be used to improve operator performance when there is no accuracy difference between the float16 and float32 types. We can refer to [Enabling Mixed Precision](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html) to improve operator performance.
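A small stdlib check of one reason float16 helps: it occupies half the bytes of float32, so memory traffic is halved (this sketch only shows storage size, not device throughput, which depends on hardware support). Values whose mantissa fits in 10 bits round-trip exactly:

```python
import struct

# float32 occupies 4 bytes, float16 only 2.
f32 = struct.pack('<f', 3.140625)
f16 = struct.pack('<e', 3.140625)   # 'e' is the IEEE half-precision format
sizes = (len(f32), len(f16))

# 3.140625 = 201/64 needs only 7 mantissa bits, so the float16
# round trip is lossless for this value.
roundtrip = struct.unpack('<e', f16)[0]
```

Whether a given network tolerates the reduced precision still has to be verified by comparing accuracy, as the text above notes.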
@@ -74,7 +74,7 @@ There are two ways to collect neural network performance data. You can enable Pr
- `timeline_limit`(int, optional) - Set the maximum storage size of the timeline file (unit: MB). When using this parameter, op_time must be set to true. Default value: 500.
- `data_process`(bool, optional) - Indicates whether to collect data preparation performance data. Default value: true.
@@ -58,7 +58,7 @@ Step 1:Please jump to the `step interval` tab on the `data preparation details
- If there is no time-consuming customized logic in the script, it indicates that sending data from host to device is time-consuming; please report it to the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues).
Step 2: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which operation is the performance bottleneck in data processing. The principles of judgment can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) and try to optimize the data processing performance.
Step 2: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which operation is the performance bottleneck in data processing. The principles of judgment can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) and try to optimize the data processing performance.
#### Data Sinking Mode
@@ -69,7 +69,7 @@ Step 1:Please jump to the `step interval` tab on the `data preparation details
Step 2: Observe how the size curve of the host queue changes. If the queue size is never 0, it indicates that the process of sending training data from host to device is the performance bottleneck; please report it to the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues). Otherwise, the data processing process is the performance bottleneck; please refer to Step 3 to continue locating which data processing operation has performance problems.
Step 3: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which operation is the performance bottleneck in data processing. The principles of judgment can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) and try to optimize the data processing performance.
Step 3: Please jump to the `data processing` tab on the `data preparation details` page, observe the inter-operator queues, and determine which operation is the performance bottleneck in data processing. The principles of judgment can be found on the [Performance Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) page. Users can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) and try to optimize the data processing performance.
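The host-queue judgment in Step 2 of the sinking-mode analysis can be sketched as a hypothetical helper (the function name and the list-of-sizes input are illustrative only, not a Profiler API):

```python
def locate_bottleneck(host_queue_sizes):
    """Given sampled host queue sizes over time: if the queue never
    empties, the host-to-device transfer cannot keep up and is the
    bottleneck; if it does empty, data processing upstream is too slow."""
    if all(size > 0 for size in host_queue_sizes):
        return "host-to-device transfer"
    return "data processing"
```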
### Long Forward And Backward Propagation
@@ -125,7 +125,7 @@ Step 2:Observe the forward and backward propagation in the cluster step trace
Step 3: Observe the step tail in the cluster step trace page
- Users should first check whether the step tail of one device is much longer than the others. If so, it is usually caused by a slow node in the cluster, and users can refer to Step 1 and Step 2 to find the slow node.
- If the step tail of all devices is essentially the same and this phase is time-consuming, it is usually due to the long time taken by the AllReduce collective communication operators. Users can try to modify the all_reduce_fusion_config parameter and change the [AllReduce Fusion Sharding Strategy](https://mindspore.cn/tutorials/experts/en/master/parallel/overview.html) to reduce the time spent in this phase.
- If the step tail of all devices is essentially the same and this phase is time-consuming, it is usually due to the long time taken by the AllReduce collective communication operators. Users can try to modify the all_reduce_fusion_config parameter and change the [AllReduce Fusion Sharding Strategy](https://mindspore.cn/docs/en/master/model_train/parallel/overview.html) to reduce the time spent in this phase.
### Model Parallel
@@ -143,7 +143,7 @@ Please refer to step 2 of [Data Parallel](#data-parallel).
Step 3: Observe the pure communication time in the cluster step trace page
After confirming through Step 1 and Step 2 that there is no slow node, the pure communication time of each card in the cluster should be basically the same. If this phase takes a short time, the communication caused by the re-distribution of operators is very short and users do not need to consider optimizing the parallel strategy. Otherwise, users need to focus on analyzing whether the parallel strategy can be optimized.
Users need a certain understanding of the principles of model parallelism before continuing the analysis. Please refer to [Distributed Training](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html) for the basic principles. The following steps only assist users in a rationality analysis; whether the parallel strategy has room for optimization, and how to optimize it, requires users to judge after analyzing their own networks.
Users need a certain understanding of the principles of model parallelism before continuing the analysis. Please refer to [Distributed Training](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html) for the basic principles. The following steps only assist users in a rationality analysis; whether the parallel strategy has room for optimization, and how to optimize it, requires users to judge after analyzing their own networks.
- If this stage takes a long time, the user can choose any one of the devices and observe its timeline. In the timeline, MindSpore Insight marks the pure communication time, refer to `Pure Communication Op` below.
If the Host side time collection function is enabled, the Host side time consumption of each stage is saved in the specified directory after training completes. For example, when the Profiler is created with ``output_path="/XXX/profiler_output"``, the files containing Host side time consumption data are saved in the "/XXX/profiler_output/profile/host_info" directory. The files are in json format, with the prefix ``timeline_`` and the rank_id as suffix. The Host side time consumption file can be viewed with ``chrome://tracing``. You can use W/S/A/D to zoom in, zoom out, and move left and right to view time consumption information.
If the Host side time collection function is enabled, you can view the time consumption in ascend_timeline_display_[rank_id].json after the training finishes and display it with ``chrome://tracing``. You can use W/S/A/D to zoom in, zoom out, and move left and right to view time consumption information.
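Assuming the file follows the chrome://tracing event-list layout with ``name``/``ts``/``dur`` keys (an assumption for illustration, not a documented guarantee of the host_info format), the per-stage time can also be summed with a short script instead of the browser viewer:

```python
import json

def total_time_by_name(trace_text):
    # Sum the "dur" (microseconds) of all events sharing a name.
    events = json.loads(trace_text)
    totals = {}
    for ev in events:
        totals[ev["name"]] = totals.get(ev["name"], 0) + ev.get("dur", 0)
    return totals

# Toy two-event trace in the assumed layout.
sample = ('[{"name": "DataQueue", "ts": 0, "dur": 30},'
          ' {"name": "RunOp", "ts": 30, "dur": 70}]')
```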
│ ├──── ASCEND_PROFILER_OUTPUT // Performance data collected by the MindSpore Profiler interface
│ └──── profiler_info_*.json
│ ├──── profiler_info_*.json
│ └──── profiler_metadata.json // To record user-defined meta data, call the add_metadata or add_metadata_json interface to generate the file
├──── aicore_intermediate_*_detail.csv
├──── aicore_intermediate_*_type.csv
├──── aicpu_intermediate_*.csv
@@ -99,61 +95,64 @@ An example of the performance data catalog structure is shown below:
├──── profiler_info_*.json
├──── step_trace_point_info_*.json
└──── step_trace_raw_*_detail_time.csv
└──── dataset_*.csv
- *represents rank id
- \*represents rank id
Performance Data File Description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PROF_{number}_{timestamp}_{string} directory is the performance data collected by CANN Profiling, which is mainly stored in mindstudio_profiler_output. The data introduction can be referred to `Performance data file description <https://www.hiascend.com/document/detail/en/mindstudio/70RC2/mscommandtoolug/mscommandug/atlasprofiling_16_0062.html>`_.
The profiler directory contains three types of files, csv, json, and txt, which cover performance data in terms of operator execution time, memory usage, communication, etc. The file descriptions are shown in the following table. For detailed descriptions of some files, refer to `Performance data <https://www.mindspore.cn/mindinsight/docs/en/master/profiler_files_description.html>`_.
step_trace_point_info_*.json Information about the operator corresponding to the step node (only mode=GRAPH, export GRAPH_OP_RUN=0)
step_trace_raw_*_detail_time.csv Time information for the nodes of each STEP (only mode=GRAPH, export GRAPH_OP_RUN=0)
dynamic_shape_info_*.json Operator information under dynamic shape
pipeline_profiling_*.json Intermediate file dumped to disk by MindSpore data processing for MindInsight visualization
minddata_pipeline_raw_*.csv Intermediate file dumped to disk by MindSpore data processing for MindInsight visualization
minddata_pipeline_summary_*.csv Intermediate file dumped to disk by MindSpore data processing for MindInsight visualization
minddata_pipeline_summary_*.json Intermediate file dumped to disk by MindSpore data processing for MindInsight visualization
framework_raw_*.csv Information about AI Core operators in MindSpore data processing
device_queue_profiling_*.txt Intermediate file dumped to disk by MindSpore data processing for MindInsight visualization (data sinking scenarios only)
minddata_aicpu_*.txt Performance data for AI CPU operators in MindSpore data processing (data sinking scenarios only)
dataset_iterator_profiling_*.txt Intermediate file dumped to disk by MindSpore data processing for MindInsight visualization (data non-sinking scenarios only)
aicore_intermediate_*_detail.csv AI Core operator data
aicore_intermediate_*_type.csv AI Core operator calling counts and time taken statistics
aicpu_intermediate_*.csv Time taken data after AI CPU operator information parsing
flops_*.txt Record the number of floating-point calculations (FLOPs), floating-point calculations per second (FLOPS) for AI Core operators
flops_summary_*.json Record total FLOPs for all operators, average FLOPs for all operators, average FLOPS_Utilization
ascend_timeline_display_*.json Timeline file for visualization in MindStudio Insight
output_timeline_data_*.txt Operator timeline data, only if AI Core operator data exists
cpu_ms_memory_record_*.txt Raw files for memory profiling
operator_memory_*.csv Operator-level memory information
minddata_cpu_utilization_*.json CPU utilization rate
cpu_op_detail_info_*.csv CPU operator time taken data (mode=GRAPH only)
cpu_op_type_info_*.csv Class-specific CPU operator time taken statistics (mode=GRAPH only)
cpu_op_execute_timestamp_*.txt CPU operator execution start time and time taken (mode=GRAPH only)
cpu_framework_*.txt CPU operator time taken in heterogeneous scenarios (mode=GRAPH only)
ascend_cluster_analyse_model-xxx.csv Data related to computation and communication, etc. in model-parallel or pipeline-parallel modes (mode=GRAPH only)
hccl_raw_*.csv Card-based communication time and communication wait time (mode=GRAPH only)
parallel_strategy_*.json Intermediate file of the operator parallel strategy dumped to disk for MindInsight visualization
profiler_info_*.json Profiler Configuration and other info
- The complete name of ascend_cluster_analyse_model-xxx_*.csv should be ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv, such as ascend_cluster_analyse_model-parallel_1_8_0.csv
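The naming pattern above can be unpacked with a small regex sketch (the helper name is hypothetical):

```python
import re

# Matches ascend_cluster_analyse_model-{mode}_{stage_num}_{rank_size}_{rank_id}.csv
PATTERN = re.compile(
    r"ascend_cluster_analyse_model-(?P<mode>[a-z_]+)_(?P<stage_num>\d+)"
    r"_(?P<rank_size>\d+)_(?P<rank_id>\d+)\.csv"
)

def parse_cluster_file(name):
    m = PATTERN.fullmatch(name)
    return m.groupdict() if m else None
```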
@@ -128,20 +128,3 @@ detailed information from ``Memory Usage``, including:
:alt: memory_graphics.png
*Figure: Memory Statistics*
Host side memory usage
~~~~~~~~~~~~~~~~~~~~~~~~~
If the host side memory collection function is enabled, the memory usage is saved in the specified directory after training completes. For example, when the Profiler is created with ``output_path="/XXX/profiler_output"``, the file containing host side memory data is saved in the "/XXX/profiler_output/profile/host_info" directory. The file is in csv format, with the prefix ``host_memory_`` and the rank_id as suffix. The meaning of each header field is as follows:
- tid: The thread ID of the current thread when collecting host side memory.
- pid: The process ID of the current process when collecting host side memory.
- parent_pid: The process ID of the parent of the current process when collecting the host side memory.
- module_name: Name of the module that collects host side memory; a module may include one or more events.
- event: Name of the event that collects the host side memory; an event may include one or more stages.
- stage: Name of the stage that collects the host side memory.
- level: 0 means used by framework developers, and 1 means used by users (algorithm engineers).
- start_end: The mark for the start or end of the stage, where 0 represents the start mark, 1 represents the end mark, and 2 represents an indistinguishable start or end.
- custom_info: The component customization information used by framework developers to locate performance issues, possibly empty.
- memory_usage: Host-side memory usage in kB, and 0 means no memory data is collected at the current stage.
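A sketch of reading one record with the header fields listed above (the sample values and the exact column order are assumptions for illustration):

```python
import csv
import io

# One hypothetical record of a host_memory_*.csv file.
text = (
    "tid,pid,parent_pid,module_name,event,stage,level,start_end,custom_info,memory_usage\n"
    "5321,5300,5299,dataset,epoch,prefetch,1,0,,2048\n"
)
rows = list(csv.DictReader(io.StringIO(text)))
usage_kb = int(rows[0]["memory_usage"])  # memory usage in kB
```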
@@ -94,9 +94,9 @@ This subsection describes how the `ParallelMode.SEMI_AUTO_PARALLEL` semi-automat
Semi-automatic parallelism supports the automatic mixing of multiple parallel modes, namely:
**Operator-level parallelism**: Operator parallelism takes the operators in a neural network and slices the input tensor to multiple devices for computation. In this way, data samples and model parameters can be distributed among different devices to train large-scale deep learning models and use cluster resources for parallel computing to improve the overall speed. The user can set the shard strategy for each operator, and the framework will model the slice of each operator and its input tensor according to the shard strategy of the operator to maintain mathematical equivalence. This approach can effectively reduce the load on individual devices and improve computational efficiency, and is suitable for training large-scale deep neural networks. For more details, please refer to [operator-level parallelism](https://www.mindspore.cn/tutorials/experts/en/master/parallel/operator_parallel.html).
**Operator-level parallelism**: Operator parallelism takes the operators in a neural network and slices the input tensor to multiple devices for computation. In this way, data samples and model parameters can be distributed among different devices to train large-scale deep learning models and use cluster resources for parallel computing to improve the overall speed. The user can set the shard strategy for each operator, and the framework will model the slice of each operator and its input tensor according to the shard strategy of the operator to maintain mathematical equivalence. This approach can effectively reduce the load on individual devices and improve computational efficiency, and is suitable for training large-scale deep neural networks. For more details, please refer to [operator-level parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/operator_parallel.html).
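The mathematical equivalence that operator-level parallelism maintains can be checked with a toy example: slicing the weight matrix by columns across two hypothetical "devices" and concatenating the partial products reproduces the full result (plain Python lists stand in for tensors).

```python
def matmul(a, b):
    # Naive matrix multiply over nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

x = [[1, 2], [3, 4]]                 # input tensor, replicated on both devices
w = [[5, 6, 7, 8], [9, 10, 11, 12]]  # weight tensor, sliced by columns

w_dev0 = [row[:2] for row in w]      # columns 0-1 held on device 0
w_dev1 = [row[2:] for row in w]      # columns 2-3 held on device 1

full = matmul(x, w)
# Each device computes its partial product; concatenating per row
# recovers the unsliced result.
parallel = [r0 + r1 for r0, r1 in zip(matmul(x, w_dev0), matmul(x, w_dev1))]
```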
**Pipeline parallelism**: When the number of cluster devices is large, if only operator parallelism is used, communication is required over the communication domain of the entire cluster, which may make communication inefficient and reduce overall performance. Pipeline parallelism slices the neural network structure into multiple stages, each running on a part of the devices, which limits the communication domain of collective communication to that part of the devices, while point-to-point communication is used between stages. The advantages of pipeline parallelism are improved communication efficiency and easy handling of neural network structures stacked by layers. The disadvantage is that some nodes may be idle at times. For detailed information, refer to [pipeline parallelism](https://www.mindspore.cn/tutorials/experts/en/master/parallel/pipeline_parallel.html).
**Pipeline parallelism**: When the number of cluster devices is large, if only operator parallelism is used, communication is required over the communication domain of the entire cluster, which may make communication inefficient and reduce overall performance. Pipeline parallelism slices the neural network structure into multiple stages, each running on a part of the devices, which limits the communication domain of collective communication to that part of the devices, while point-to-point communication is used between stages. The advantages of pipeline parallelism are improved communication efficiency and easy handling of neural network structures stacked by layers. The disadvantage is that some nodes may be idle at times. For detailed information, refer to [pipeline parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/pipeline_parallel.html).
**MoE parallelism**: MoE parallelism distributes the experts to different workers, and each worker takes on a different batch of training data. For non-MoE layers, expert parallelism is the same as data parallelism. In the MoE layer, the tokens in the sequence are sent via all-to-all communication to the workers holding their matching experts. After the corresponding expert finishes its computation, the tokens are passed back to the original workers by another all-to-all and reorganized into the original sequence for the computation of the next layer. Since MoE models usually have a large number of experts, expert parallelism scales with the model size more than model parallelism does.
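The all-to-all dispatch and return described above can be sketched in plain Python (the expert functions and routing table are toy stand-ins, not a MindSpore API):

```python
def moe_layer(tokens, expert_ids, experts):
    # Dispatch: each token is routed to its assigned expert (the
    # "all-to-all" send), computed there, and written back into its
    # original sequence position (the "all-to-all" return).
    outputs = [None] * len(tokens)
    for expert, fn in enumerate(experts):
        for i, t in enumerate(tokens):
            if expert_ids[i] == expert:
                outputs[i] = fn(t)  # expert computation on its worker
    return outputs  # original sequence order is restored

tokens = [1.0, 2.0, 3.0]
expert_ids = [0, 1, 0]                          # router assignments
experts = [lambda t: t + 10, lambda t: t * 100]  # two toy experts
out = moe_layer(tokens, expert_ids, experts)
```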
@@ -106,7 +106,7 @@ This subsection describes how the `ParallelMode.SEMI_AUTO_PARALLEL` semi-automat
**Optimizer Parallelism**: When training in data parallelism or operator parallelism, the same copy of the model parameters may exist on multiple devices, which allows the optimizer to have redundant computations across multiple devices when updating that weight. In this case, the computation of the optimizer can be spread over multiple devices by optimizer parallelism. Its advantages are: reducing static memory consumption, and the amount of computation within the optimizer. The disadvantages are: increasing communication overhead. For detailed information, refer to [Optimizer Parallelism](https://www.mindspore.cn/tutorials/experts/en/master/parallel/optimizer_parallel.html).
**Optimizer Parallelism**: When training in data parallelism or operator parallelism, the same copy of the model parameters may exist on multiple devices, which allows the optimizer to have redundant computations across multiple devices when updating that weight. In this case, the computation of the optimizer can be spread over multiple devices by optimizer parallelism. Its advantages are: reducing static memory consumption, and the amount of computation within the optimizer. The disadvantages are: increasing communication overhead. For detailed information, refer to [Optimizer Parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/optimizer_parallel.html).
### Semi-automatic Parallel Code
@@ -143,11 +143,11 @@ In fact, the hybrid parallel strategy generation module is responsible for findi
Fully automatic parallelism is very difficult to implement, and MindSpore divides the provided strategy generation algorithm into L1 level and L2 level according to the degree of user intervention required (here we assume that the manually configured full graph strategy SEMI_AUTO is L0 level, and the scheme that does not require user participation is L3 level).
The strategy generation algorithm at the L1 level is called Strategy Broadcast (Sharding Propagation). In this mode, the user only needs to manually define the strategies for a few key operators, and the strategies for the remaining operators in the computational graph are automatically generated by the algorithm. Because the strategy of the key operator has been defined, the cost model of the algorithm mainly describes the redistribution cost between the operators, and the optimization objective is to minimize the redistribution cost of the whole graph. Because the main operator strategy has been defined, which is equivalent to a compressed search space, the search time of this scheme is shorter and its strategy performance depends on the definition of the key operator strategy, so it still requires the user to have the ability to analyze the defined strategy. Refer to [Sharding Propagation](https://www.mindspore.cn/tutorials/experts/en/master/parallel/sharding_propagation.html) for detailed information.
The strategy generation algorithm at the L1 level is called Strategy Broadcast (Sharding Propagation). In this mode, the user only needs to manually define the strategies for a few key operators, and the strategies for the remaining operators in the computational graph are automatically generated by the algorithm. Because the strategy of the key operator has been defined, the cost model of the algorithm mainly describes the redistribution cost between the operators, and the optimization objective is to minimize the redistribution cost of the whole graph. Because the main operator strategy has been defined, which is equivalent to a compressed search space, the search time of this scheme is shorter and its strategy performance depends on the definition of the key operator strategy, so it still requires the user to have the ability to analyze the defined strategy. Refer to [Sharding Propagation](https://www.mindspore.cn/docs/en/master/model_train/parallel/sharding_propagation.html) for detailed information.
There are two types of L2-level strategy generation algorithms: Dynamic Programming and the Symbolic Automatic Parallel Planner (SAPP for short). Both methods have their advantages and disadvantages. The dynamic programming algorithm is able to search for the optimal strategy described by the cost model, but it takes a long time to search for parallel strategies for huge networks. The SAPP algorithm is able to generate optimal strategies instantaneously for huge networks and large-scale partitions.
The core idea of the dynamic programming algorithm is to build a cost model of the full graph, including computation cost and communication cost, to describe the absolute time delay in the distributed training process, and to compress the search time using equivalent methods such as edge elimination and point elimination, but the search space actually grows exponentially with the number of devices and operators, so it is not efficient for large clusters with large models.
SAPP is modeled on the parallelism principle by creating an abstract machine to describe the hardware cluster topology and optimizing the cost model by symbolic simplification. Its cost model compares not the predicted absolute latency but the relative cost of different parallel strategies, so it can greatly compress the search space and guarantee minute-level search times for 100-card clusters. Refer to [Distributed Parallel Training Mode](https://mindspore.cn/tutorials/experts/en/master/parallel/overview.html) for detailed information.
SAPP is modeled on the parallelism principle by creating an abstract machine to describe the hardware cluster topology and optimizing the cost model by symbolic simplification. Its cost model compares not the predicted absolute latency but the relative cost of different parallel strategies, so it can greatly compress the search space and guarantee minute-level search times for 100-card clusters. Refer to [Distributed Parallel Training Mode](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html) for detailed information.
Sharding Propagation and SAPP currently support manual definition of Pipeline + automatic operator parallelism, and can be used in conjunction with optimizations such as recomputation, optimizer parallelism, etc. Dynamic Programming algorithms only support automatic operator parallelism.
@@ -362,7 +362,7 @@ When the EmbeddingTable reaches T level and the single machine memory cannot be
Parameter Server encapsulates heterogeneous processes, and users only need to configure parameters to use PS. For the detailed configuration process, refer to [Parameter Server training process](https://www.mindspore.cn/tutorials/experts/en/master/parallel/parameter_server_training.html).
Parameter Server encapsulates heterogeneous processes, and users only need to configure parameters to use PS. For the detailed configuration process, refer to [Parameter Server training process](https://www.mindspore.cn/docs/en/master/model_train/parallel/parameter_server_training.html).
In addition, the process of using PS is also available in the wide&deep network and can be found at: <https://gitee.com/mindspore/models/tree/master/official/recommend/Wide_and_Deep>.
@@ -163,189 +163,4 @@ Similarly for the input y derivation, the same procedure can be used for the der
### Control Flow in PyNative Mode
In the PyNative mode, scripts are executed according to the Python syntax, so in MindSpore, there is no special treatment for the control flow syntax, which is directly expanded and executed according to the Python syntax, and automatic differentiation is performed on the expanded execution operator. For example, for a for loop, the statements in the for loop are continuously executed under PyNative and automatic differentiation is performed on the operators according to the specific number of loops.
## Dynamic and Static Unification
### Overview
The industry currently supports both dynamic and static graph modes. Dynamic graphs are executed by interpretation, with good affinity for dynamic syntax and flexible expression, while static graphs are executed with JIT compilation optimization, lean toward static syntax, and impose more restrictions on syntax. For the dynamic and static graph modes, MindSpore first unifies the API expression, using the same APIs in both modes, and second unifies the underlying differentiation mechanism of dynamic and static graphs.
### Interconversion of Dynamic and Static Graphs
In MindSpore, we can switch between dynamic and static graph execution by setting the mode input parameter. For example:
```python
import mindspore as ms

ms.set_context(mode=ms.PYNATIVE_MODE)
```
Since there are restrictions on Python syntax under static graphs, switching from dynamic to static graphs requires compliance with the syntax restrictions of static graphs in order to execute correctly by using static graphs. For more syntax restrictions for static graphs, refer to [Static Graph Syntax Restrictions](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
### Combination of Static and Dynamic Graphs
MindSpore supports mixed execution using static compilation under dynamic graphs. Function objects that need to be executed as static graphs are decorated with jit, and in this way mixed execution of dynamic and static graphs can be achieved. For more uses of jit, refer to the [jit documentation](https://www.mindspore.cn/tutorials/en/master/beginner/accelerate_with_static_graph.html#decorator-based-startup-method).
In the MindSpore static graph mode, users need to follow MindSpore [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html) when writing programs, and there are constraints on the use of syntax. In dynamic graph mode, Python script code will be executed according to Python syntax, and users can use any Python syntax. It can be seen that the syntax constraints of static and dynamic graphs are different.
JIT Fallback approaches the unification of static and dynamic graphs from the perspective of static graphs. When unsupported syntax is found during compilation, that syntax falls back to the Python interpreter for interpreted execution. Through the JIT Fallback feature, static graphs can support as much dynamic graph syntax as possible, so that static graphs provide a syntax experience close to that of dynamic graphs, achieving dynamic and static unification.
In the graph mode scenario, the MindSpore framework reports an error when it encounters unsupported syntax or symbols during graph compilation, mostly in the type inference stage. During graph compilation, the Python source code written by the user is parsed, followed by static analysis, type derivation, optimization, and other steps. Therefore, the JIT Fallback feature needs to detect unsupported syntax in advance. Common unsupported syntax includes calling methods of third-party libraries, calling class names to create objects, and calling unsupported Python built-in functions; such statements fall back to the Python interpreter for interpreted execution. Since graph mode uses [MindSpore IR (MindIR)](https://www.mindspore.cn/docs/en/master/design/all_scenarios.html#mindspore-ir-mindir), the interpreted statements must be converted into the intermediate representation, recording the information required by the interpreter.
The following mainly introduces the static graph syntax supported via the JIT Fallback extension. The JIT syntax support level option `jit_syntax_level` defaults to `LAX`, which extends the static graph syntax with the JIT Fallback capability.
#### Calling the Third-party Libraries
Static graph mode fully supports third-party libraries such as NumPy and SciPy. It supports many third-party data types such as `np.ndarray` and their operations, supports obtaining properties and calling methods of third-party libraries, and supports interacting with third-party libraries such as NumPy through methods such as Tensor's `asnumpy()`. In other words, in static graph mode users can call MindSpore's own interfaces and operators, directly call third-party library interfaces, or use both together.
- Supporting data types of third-party libraries (such as NumPy and SciPy), allowing calling and returning objects of third-party libraries.
- Supporting calling methods of third-party libraries.
- Supporting creating Tensor instances by using the data types of the third-party library NumPy.
- The assignment of subscripts for data types in third-party libraries is not currently supported.
For more usage, please refer to the [Calling the Third-party Libraries](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#calling-the-third-party-libraries) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
#### Supporting the Use of Custom Classes
Static graph mode supports custom classes that are not decorated with `@jit_class` and do not inherit from `nn.Cell`. Through the JIT Fallback technical solution, static graph mode allows creating and referencing instances of custom classes, directly obtaining and calling the properties and methods of custom class instances, and modifying properties (inplace operations).
For more usage, please refer to the [Supporting the Use of Custom Classes](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-the-use-of-custom-classes) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
#### Basic Operators Support More Data Types
In the syntax of graph mode, the following basic operators are overloaded: ['+', '-', '*', '/', '//', '%', '**', '<<', '>>', '&', '|', '^', 'not', '==', '!=', '<', '>', '<=', '>=', 'in', 'not in', 'y=x[0]']. For more details, please refer to [Operators](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax/operators.html). When an operator receives an input type it does not support natively, the extended static graph syntax takes over, making the output consistent with that in PyNative mode.
For more usage, please refer to the [Basic Operators Support More Data Type](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#basic-operators-support-more-data-type) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
#### Base Type
Use the JIT Fallback feature to extend support for Python's native data types `List`, `Dictionary`, and `None`. For more usage, please refer to the [Base Type](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#base-type) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
##### Supporting List Inplace Modification Operations
- Support for getting the original `List` object from a global variable.
- Inplace operations on input `List` objects are not supported.
- Support for in-place modification of some `List` built-in functions.
##### Supporting the High-Level Usage of Dictionary
- Supporting returning a `Dictionary` from the top graph.
- Supporting `Dictionary` index-based value retrieval and assignment.
##### Supporting the Usage of None
`None` is a special value in Python that represents null and can be assigned to any variable. A function that has no return statement is considered to return `None`. `None` is also supported as an input parameter or return value of the top graph or subgraphs, and as a slice subscript when indexing `List`, `Tuple`, and `Dictionary` inputs.
#### Built-in Functions Support More Data Types
Extended support for built-in functions: Python built-in functions support more input types, such as third-party library data types. More details on built-in function support can be found in the [Python built-in functions](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax/python_builtin_functions.html) section.
#### Supporting Control Flow
To improve support for Python standard syntax and realize the unification of dynamic and static graphs, support for more data types is extended in control flow statements. Control flow statements are flow control statements such as `if`, `for`, and `while`. In theory, syntax supported by the extension is also supported in control flow scenarios. For more usage, please refer to the [Supporting Control Flow](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-control-flow) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
#### Supporting Property Setting and Modification
More types of inplace operations are supported. Earlier versions only supported value modification of the `Parameter` type through the inplace operator; since MindSpore 2.1, static graph mode supports setting the properties of custom classes, `Cell` subclasses, and `jit_class` classes. In addition to changing the properties of the class `self` object and global variables, inplace operations of the `List` type such as `extend()`, `reverse()`, `insert()`, and `pop()` are supported. For more usage, please refer to the [Supporting Property Setting and Modification](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-property-setting-and-modification) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
- Setting and modifying properties of custom class objects and third-party types.
- Making changes to the `Cell`'s `self` object.
- Setting and modifying `Cell` objects and `jit_class` objects in the static graph.
#### Supporting Derivation
The static graph syntax supported by JIT Fallback also supports its use in derivation. For more usage, please refer to the [Supporting Derivation](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-derivation) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
#### Annotation Type
For syntax supported by the runtime extensions, nodes whose type cannot be derived are generated; these are called `Any` types. Since the correct type cannot be derived at compile time, `Any` values are operated on with a default maximum precision of `Float64` to prevent loss of precision. To optimize performance, it is recommended to minimize the generation of `Any` types. When the user knows exactly what type a statement will produce through the extension, it is recommended to use the `Annotation @jit.typing:` comment to specify the corresponding Python statement type, thereby determining the type of the interpretation node and avoiding the generation of `Any` types. For more usage, please refer to the [Annotation Type](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#annotation-type) section in [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
#### Instructions for Use
When using the extended static graph syntax, note the following points:
1. The extended syntax must stay within the support capability of dynamic graph mode; that is, it must be within the scope of dynamic graph syntax, including but not limited to data types.
2. When the extended static graph syntax is used, more syntax is supported, but execution performance may be affected and is not optimal.
3. When the extended static graph syntax is used, more syntax is supported, but MindIR import and export cannot be used because Python objects are involved.
4. Repeated definition of global variables with the same name across Python files, where these global variables are used in the network, is not currently supported.
### Conversion Technique from Dynamic Graph to Static Graph
MindSpore provides PIJit, a feature that directly converts a user's dynamic graph code into a static graph without code changes. It balances performance and ease of use, removes the cost of switching between static and dynamic modes, and truly unifies the two. Based on analysis of Python bytecode, PIJit captures the execution flow of the Python code: subgraphs that can run as static graphs are run as static graphs, subgraphs whose Python syntax is not supported are run as dynamic graphs, and the bytecode is modified and adjusted to link the static graphs together, achieving mixed static/dynamic execution that improves performance while preserving ease of use.
#### PIJit Includes the Following Features
1. Graph Capture: pre-processes bytecode, dynamically traces interpreted execution, recognizes operations that MindSpore can place in a graph, and splits graphs to ensure the correctness of the function (bytecode) behavior.
2. Bytecode Support: currently supports Python 3.7, 3.8, 3.9, and 3.10 bytecode.
3. Graph Optimization: optimizes the bytecode generated in the graph, including branch pruning, bytecode filtering, function bytecode inlining, constant folding, and other functions.
4. Exception Capture Mechanism: supports the `with` and `try-except` syntax.
5. Loop Processing: implements features such as graph capture and graph splitting by simulating the bytecode operand stack.
6. UD Analysis: use-def chain analysis of variables solves the problem that some parameter types (Function, Bool, None) cannot be used as return values of static graphs, reduces useless parameters, improves graph execution efficiency, and reduces data copying.
7. Side Effect Analysis and Processing: compensates for the weakness of static graphs in handling side effects. Depending on the scenario, the variables and bytecode that produce side effects are collected and recorded, and the side effects are processed outside the static graphs while preserving program semantics.
8. Guard: records the conditions that the inputs must satisfy for a subgraph/optimization to apply, and checks whether the inputs are suitable for the corresponding subgraph optimization.
9. Cache: the graph manager caches the correspondence between subgraphs/optimizations and Guards.
10. Dynamic Shape and Symbolic Shape: uses `input_signature` to support Dynamic Shape and Symbolic Shape for Tensor/Tensor List/Tensor Tuple as input hints, and also supports automatic recognition of Dynamic Shape after multiple runs.
11. Compiling by Trace: supports operator and other type derivations during tracing and bytecode analysis.
12. Automatic Mixed Precision: supports the automatic mixed precision capability of the native `mindspore.nn.Cell`.
The original jit function uses `mode="PSJit"`; the new PIJit feature uses `mode="PIJit"`. `jit_config` passes a dictionary of parameters that provide optimization and debugging options. For example, `print_after_all` prints the bytecode of the graph and split-graph information, `loop_unrolling` enables loop unrolling, and `enable_dynamic_shape` applies dynamic shape.
#### Limitations
- Running a function decorated with `@jit(mode="PIJit")` in static graph mode is not supported; in this case the `@jit(mode="PIJit")` decorator is considered invalid.
- Calling functions decorated with `@jit(mode="PIJit")` inside functions decorated with `@jit(mode="PIJit")` is not supported; the inner `@jit(mode="PIJit")` decorator is considered invalid.
In PyNative mode, scripts are executed according to Python syntax, so MindSpore gives control flow syntax no special treatment: it is directly expanded and executed according to Python syntax, and automatic differentiation is performed on the expanded operators. For example, for a `for` loop, the statements in the loop body are executed repeatedly under PyNative, and automatic differentiation is performed on the operators according to the actual number of iterations.
@@ -38,7 +38,7 @@ A: You can refer to the following steps to reduce CPU consumption (mainly due to
## Q: Why is there no difference between `shuffle=True` and `shuffle=False` for the `shuffle` parameter in `GeneratorDataset` when the task is run?
A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `getitem` method). If data is returned in `yield` mode in the user-defined `Dataset`, random access is not supported. For details, see the [Loading Dataset Overview](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html) section in the tutorial.
A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `getitem` method). If data is returned in `yield` mode in the user-defined `Dataset`, random access is not supported. For details, see the [GeneratorDataset example](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html).
<br/>
@@ -160,9 +160,7 @@ A: You can refer to the usage of YOLOv3 which contains the resizing of different
A: [build_seg_data.py](https://gitee.com/mindspore/models/blob/master/research/cv/FCN8s/src/data/build_seg_data.py) is the script of MindRecords generated by the dataset. You can directly use or adapt it to your dataset. Alternatively, you can use `GeneratorDataset` to customize the dataset loading if you want to implement the dataset reading by yourself.
[GeneratorDataset API description](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html#mindspore.dataset.GeneratorDataset)
A: The data schema can be defined as follows:`cv_schema_json = {"label": {"type": "int32", "shape": [-1]}, "data": {"type": "bytes"}}`
Note: A label is an array of the numpy type, where label values 1, 1, 0, 1, 0, 1 are stored. These label values correspond to the same data, that is, the binary value of the same image.
For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/tutorials/en/master/advanced/dataset/record.html#converting-dataset-to-mindrecord).
For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/docs/en/master/model_train/dataset/record.html#Converting-Dataset-to-Record-Format).
<br/>
@@ -203,7 +201,7 @@ A: The MNIST gray scale image dataset is used for MindSpore training. Therefore,
## Q: Can you introduce the data processing framework in MindSpore?
A: The MindSpore Dataset module makes it easy for users to define data preprocessing pipelines and transform samples efficiently with multiprocessing or multithreading. MindSpore Dataset also provides various APIs for users to load and process datasets; for more introduction, refer to [MindSpore Dataset](https://mindspore.cn/docs/en/master/api_python/mindspore.dataset.html#introduction-to-data-processing-pipeline). If you want to further study the performance optimization of the dataset pipeline, please read [Optimizing Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html).
A: The MindSpore Dataset module makes it easy for users to define data preprocessing pipelines and transform samples efficiently with multiprocessing or multithreading. MindSpore Dataset also provides various APIs for users to load and process datasets; for more introduction, refer to [MindSpore Dataset](https://mindspore.cn/docs/en/master/api_python/mindspore.dataset.html#introduction-to-data-processing-pipeline). If you want to further study the performance optimization of the dataset pipeline, please read [Optimizing Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html).
<br/>
@@ -314,7 +312,7 @@ dataset3 = dataset2.map(***)
## Q: What is the API corresponding to DataLoader in MindSpore?
A: If DataLoader is considered an API for receiving user-defined datasets, the GeneratorDataset in the MindSpore data processing API is similar to DataLoader and can receive user-defined datasets. For details about how to use the GeneratorDataset, see the [Loading Dataset Overview](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html), and for details about the differences, see the [API Mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html).
A: If DataLoader is considered an API for receiving user-defined datasets, the GeneratorDataset in the MindSpore data processing API is similar to DataLoader and can receive user-defined datasets. For details about how to use the GeneratorDataset, see the [GeneratorDataset example](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html), and for details about the differences, see the [API Mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html).
<br/>
@@ -500,7 +498,7 @@ A: When using the data sinking mode (where `data preprocessing` -> `sending queu
Improvement method: View the time difference between the last item of `push_end_time` and GetNext error reporting time. If the default GetNext timeout is exceeded (default: 1900s, and can be modified through `mindspore.set_context(op_timeout=xx)`), it indicates poor data preprocessing performance. Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) to improve data preprocessing performance.
Improvement method: View the time difference between the last item of `push_end_time` and GetNext error reporting time. If the default GetNext timeout is exceeded (default: 1900s, and can be modified through `mindspore.set_context(op_timeout=xx)`), it indicates poor data preprocessing performance. Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) to improve data preprocessing performance.
4. When the log output is similar to the following, it indicates that data preprocessing has generated 182 batches of data and the 183rd batch is being sent to the device, and `device_queue` shows that there is sufficient data cached on the device side.
@@ -550,7 +548,7 @@ A: When using the data sinking mode (where `data preprocessing` -> `sending queu
2022-05-09-14:31:04.064.571 ->
```
Improvement method: Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html) to improve data preprocessing performance.
Improvement method: Please refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html) to improve data preprocessing performance.
@@ -49,7 +49,7 @@ Solution: Manually `kill` the training process and then restart the training tas
[CRITICAL] DISTRIBUTED [mindspore/ccsrc/distributed/cluster/cluster_context.cc:130] InitNodeRole] Role name is invalid...
```
A: In the case where the user does not start processes using `mpirun` but still calls the `init()` method, MindSpore requires the user to configure several environment variables and verify them according to the [dynamic cluster startup methods](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/dynamic_cluster.html). If they are not configured, MindSpore may display the above error message. Therefore, it is suggested that `mindspore.communication.init` be called only when performing distributed training, and that, when `mpirun` is not used, the correct environment variables be configured according to the documentation before starting distributed training.
A: In the case where the user does not start processes using `mpirun` but still calls the `init()` method, MindSpore requires the user to configure several environment variables and verify them according to the [dynamic cluster startup methods](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/dynamic_cluster.html). If they are not configured, MindSpore may display the above error message. Therefore, it is suggested that `mindspore.communication.init` be called only when performing distributed training, and that, when `mpirun` is not used, the correct environment variables be configured according to the documentation before starting distributed training.
@@ -50,7 +50,7 @@ A: The formats of `ckpt` of MindSpore and `ckpt`of TensorFlow are not generic.
## Q: How do I use models trained by MindSpore on Atlas 200/300/500 inference product? Can they be converted to models used by HiLens Kit?
A: Yes. HiLens Kit uses the Atlas 200/300/500 inference product as its inference core, so the two questions are essentially the same: both require conversion to an OM model. The Atlas 200/300/500 inference product requires a dedicated OM model. Use MindSpore to export an ONNX model and convert it into an OM model supported by the Atlas 200/300/500 inference product. For details, see [Multi-platform Inference](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html).
A: Yes. HiLens Kit uses the Atlas 200/300/500 inference product as its inference core, so the two questions are essentially the same: both require conversion to an OM model. The Atlas 200/300/500 inference product requires a dedicated OM model. Use MindSpore to export an ONNX model and convert it into an OM model supported by the Atlas 200/300/500 inference product. For details, see [Multi-platform Inference](https://www.mindspore.cn/docs/en/master/model_infer/overview.html).
## Q: When MindSpore is used for model training, there are four input parameters for `CTCLoss`: `inputs`, `labels_indices`, `labels_values`, and `sequence_length`. How do I use `CTCLoss` for model training?
A: The `dataset` received by the defined `model.train` API can consist of multiple pieces of data, for example, (`data1`, `data2`, `data3`, ...). Therefore, the `dataset` can contain `inputs`, `labels_indices`, `labels_values`, and `sequence_length` information. You only need to define the dataset in the corresponding format and transfer it to `model.train`. For details, see [Data Processing API](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html).
A: The `dataset` received by the defined `model.train` API can consist of multiple pieces of data, for example, (`data1`, `data2`, `data3`, ...). Therefore, the `dataset` can contain `inputs`, `labels_indices`, `labels_values`, and `sequence_length` information. You only need to define the dataset in the corresponding format and transfer it to `model.train`. For details, see [Data Processing API](https://www.mindspore.cn/docs/en/master/model_train/index.html).
## Q: What is the set of syntaxes supported by static graph mode?
A: Static graph mode can support a subset of common Python syntax to support the construction and training of neural networks. Some Python syntax is not supported yet. For more detailed supported syntax set, please refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html). In order to facilitate users to choose whether to extend the static graph syntax, the static graph mode provides JIT syntax support level options. For some network scenarios, it is recommended to use basic syntax (nn/ops, etc.) rather than extended syntax (such as numpy third-party library). In addition, it is recommended to use [Advanced Programming Techniques with Static Graphs](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html) to optimize compilation performance.
A: Static graph mode can support a subset of common Python syntax to support the construction and training of neural networks. Some Python syntax is not supported yet. For more detailed supported syntax set, please refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html). In order to facilitate users to choose whether to extend the static graph syntax, the static graph mode provides JIT syntax support level options. For some network scenarios, it is recommended to use basic syntax (nn/ops, etc.) rather than extended syntax (such as numpy third-party library). In addition, it is recommended to use [Advanced Programming Techniques with Static Graphs](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html) to optimize compilation performance.
<br/>
@@ -531,7 +531,7 @@ net = Net()
out = net(Tensor(x))
```
3) If a function decorated with the @jit decorator is called in a custom class, an error will be reported. In this scenario, it is recommended to add the @jit_class decorator to the custom classes in the network and avoid the JIT Fallback feature. For more uses of custom classes, please refer to [Supporting the Use of Custom Classes](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#supporting-the-use-of-custom-classes). For the use of the jit_class decorator, refer to [Use jit_class](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html#using-jit-class).
3) If a function decorated with the @jit decorator is called in a custom class, an error will be reported. In this scenario, it is recommended to add the @jit_class decorator to the custom classes in the network and avoid the JIT Fallback feature. For more uses of custom classes, please refer to [Supporting the Use of Custom Classes](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html#supporting-the-use-of-custom-classes). For the use of the jit_class decorator, refer to [Use jit_class](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html#using-jit-class).
```python
import mindspore as ms
@@ -772,7 +772,7 @@ A: The following scenarios will trigger recompilation:
## Q: How to determine how many graphs there are in static graph mode? When will the subgraph be divided? What is the impact of multiple subgraphs? How to avoid multiple subgraphs?
A: 1. The number of subgraphs can be obtained by viewing the IR file and searching for "Total subgraphs". For how to view and analyze IR files, please refer to [MindSpore IR Introduction](https://www.mindspore.cn/tutorials/en/master/advanced/error_analysis/mindir.html)
A: 1. The number of subgraphs can be obtained by viewing the IR file and searching for "Total subgraphs". For how to view and analyze IR files, please refer to [MindSpore IR Introduction](https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/mindir.html)
2. Subgraph segmentation in static graph mode is common in control flow scenarios, such as if/while. In addition to manual writing by users, the control flow syntax within the MindSpore may also lead to dividing into multiple subgraphs.
@@ -59,7 +59,7 @@ In MindSpore, you can manually initialize the weight corresponding to the `paddi
## Q: When the `Tile` operator in operations executes `__infer__`, the `value` is `None`. Why is the value lost?
A: The `multiples` input of the `Tile` operator must be a constant (the value cannot directly or indirectly come from the input of the graph). Otherwise, `None` will be obtained during graph composition, because graph inputs are passed only during graph execution and cannot be obtained during graph composition. For detailed information, refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html).
A: The `multiples` input of the `Tile` operator must be a constant (the value cannot directly or indirectly come from the input of the graph). Otherwise, `None` will be obtained during graph composition, because graph inputs are passed only during graph execution and cannot be obtained during graph composition. For detailed information, refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html).
## Q: Has MindSpore implemented the anti-pooling operation similar to `nn.MaxUnpool2d`?
A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/tutorials/experts/en/master/operation/op_custom.html).
A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html).
Frequently asked questions and answers, including installation, data processing, compilation and execution, debugging and tuning, distributed parallelism, inference, etc.
| Core Frameworks | Toolset |Domain Suites and Extension Packages | Scientific Computing | Foundation Model |
@@ -99,11 +99,11 @@ See [TroubleShooter application scenarios](https://gitee.com/mindspore/toolkits/
MindSpore provides Dump function, used to model training in the graph and operator input and output data saved to disk files, generally used for network migration complex problem location (eg: operator overflow, etc). It can be dumped out of the operator level data.
For getting Dump data, refer to: [Synchronous Dump Step](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#dump-step) and [Asynchronous Dump Step](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html##dump-step-1).
For getting Dump data, refer to: [Synchronous Dump Step](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#dump-step) and [Asynchronous Dump Step](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html##dump-step-1).
For analyzig Dump data, refer to: [Synchronous Dump Data Analysis Sample](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#data-analysis-sample) and [Asynchronous Dump Data Analysis Sample](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#data-analysis-sample-1)
For analyzing Dump data, refer to: [Synchronous Dump Data Analysis Sample](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#data-analysis-sample) and [Asynchronous Dump Data Analysis Sample](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html#data-analysis-sample-1).
See [Dump](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html) for details.
### Performance Issues
Currently, there are two execution modes of a mainstream deep learning framework: the static graph mode and the dynamic graph mode.
- In dynamic graph mode, the program is executed line by line according to the code writing sequence. In the forward execution process, the backward execution graph is dynamically generated according to the backward propagation principle. In this mode, the compiler delivers the operators in the neural network to the device one by one for computing, facilitating users to build and debug the neural network model.
### [Calling the Custom Class](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html#using-jit-class)
In static graph mode, you can use `jit_class` to decorate a custom class. You can create and call an instance of the custom class, and obtain its attributes and methods.
Automatic differentiation can calculate the derivative value of a function at a certain point and is a generalization of the backward propagation algorithm. The main problem solved by automatic differentiation is decomposing a complex mathematical operation into a series of simple basic operations. This function shields a large number of derivative details and processes from users, greatly reducing the threshold for using the framework.
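The decomposition into basic operations can be illustrated with a minimal scalar reverse-mode autodiff sketch (plain Python; the `Var` class is hypothetical and not a MindSpore API):

```python
# Minimal scalar reverse-mode autodiff: each Var records how to push
# gradients back to its inputs; backward() replays those rules in reverse.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # (input Var, local derivative) pairs

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value,
                   parents=((self, 1.0), (other, 1.0)))

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Each complex expression is reduced to multiplications and additions whose local derivatives are known, which is exactly the decomposition described above.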
Generally, when a neural network model is trained, the default data type is FP32. In recent years, to accelerate training, reduce the memory occupied during training, and store a trained model with the same precision, more and more mixed-precision training methods have been proposed in the industry. Mixed-precision training here means that both single precision (FP32) and half precision (FP16) are used in the training process.
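The idea can be sketched framework-free with NumPy (illustrative only; real mixed-precision training relies on the framework's own cast and loss-scale machinery):

```python
import numpy as np

# Sketch: FP32 "master" weights, FP16 compute. Names are illustrative,
# not MindSpore APIs.
rng = np.random.default_rng(0)
w_master = rng.standard_normal((4, 4)).astype(np.float32)  # FP32 master copy
x = rng.standard_normal((2, 4)).astype(np.float32)

# The forward pass runs in half precision to save memory and time...
y_fp16 = x.astype(np.float16) @ w_master.astype(np.float16)

# ...while the weight update is applied to the FP32 master weights,
# so small gradient values are not rounded away in FP16.
grad = np.ones_like(w_master) * 1e-4   # stand-in gradient
w_master -= 0.1 * grad                 # update stays in FP32

print(y_fp16.dtype, w_master.dtype)    # float16 float32
```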
MindSpore not only allows you to customize data augmentation, but also provides an automatic data augmentation mode to automatically perform data augmentation on images based on specific policies.
Gradient accumulation is a method of splitting the data samples used for training a neural network into several small batches and then computing them in sequence. Its purpose is to solve the out-of-memory (OOM) problem in which, due to insufficient memory, the neural network cannot be trained or the network model cannot be loaded.
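A minimal pure-Python sketch of the idea, with a stand-in `grad_fn` (not a MindSpore API):

```python
# Gradient accumulation sketch: process a large batch as several
# micro-batches, summing gradients, and update weights once per cycle.
def grad_fn(w, batch):            # stand-in per-batch gradient: mean error
    return sum(w - x for x in batch) / len(batch)

w = 5.0
data = [1.0, 2.0, 3.0, 4.0]       # one "large" batch of 4 samples
accum_steps = 2                   # split into micro-batches of 2
lr = 0.5

micro_size = len(data) // accum_steps
accum = 0.0
for step in range(accum_steps):
    micro = data[step * micro_size:(step + 1) * micro_size]
    accum += grad_fn(w, micro)    # accumulate instead of updating
w -= lr * (accum / accum_steps)   # single update with the averaged gradient
print(w)                          # 3.75, same as one full-batch step
```

Because the micro-batch gradients are averaged before the single update, the result matches a full-batch step while only one micro-batch needs to fit in memory at a time.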
[](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/migration_guide/debug_and_tune.md)
## FAQs and Solutions
- The following common problems may be encountered during the accuracy debugging phase:
- The first loss and the benchmark are not aligned:
The problem is mainly caused by the network backward pass. This can be checked with the help of [TroubleShooter comparing MindSpore to PyTorch ckpt/pth](https://gitee.com/mindspore/toolkits/blob/master/troubleshooter/docs/migrator.md#%E5%BA%94%E7%94%A8%E5%9C%BA%E6%99%AF2%E6%AF%94%E5%AF%B9mindspore%E4%B8%8Epytorch%E7%9A%84ckptpth), which verifies the results of the backward update by comparing the values of the corresponding parameters in the ckpt and pth files.
- Loss appears NAN/INF:
[TroubleShooter obtains INF/NAN value throw points](https://gitee.com/mindspore/toolkits/blob/master/troubleshooter/docs/tracker.md#%E5%BA%94%E7%94%A8%E5%9C%BA%E6%99%AF2%E8%8E%B7%E5%8F%96infnan%E5%80%BC%E6%8A%9B%E5%87%BA%E7%82%B9) is used to identify the first location in the network where a NAN or INF appears.
Overflow operator detection is also available via the [Dump](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html) tool.
- The following common problems may be encountered during the performance debugging phase:
- The first step is time-consuming
This phase mainly completes operations such as graph conversion, graph fusion, and graph optimization, which together generate the executable model. Refer to [How to Optimize Compilation Performance](https://www.mindspore.cn/tutorials/en/master/advanced/static_graph_expert_programming.html#how-to-optimize-compilation-performance).
- Iteration gap is time-consuming
Most of the time consumption in this phase comes from data acquisition, see [Data Processing Performance Optimization](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html).
- Forward and reverse computation is time-consuming
This phase mainly executes the forward and reverse operators in the network and carries the main computational work of an iteration. Information such as operator time consumption during training can be recorded to a file via [Profiler](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling.html). The performance data provides the performance data of the framework host execution and operator execution, which can also be viewed and analyzed by users through the [MindInsight](https://www.mindspore.cn/mindinsight/docs/en/master/index.html) visualization interface, helping users to debug neural network performance more efficiently.
- Iteration trailing is time-consuming
This phase is time-consuming mainly because of collective communication; you can set the fusion policy to optimize it. Refer to [all_reduce_fusion_config set allreduce fusion policy](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.set_auto_parallel_context.html).
- The following common problems may be encountered during the device memory debugging phase:
- Malloc device memory failed:
MindSpore failed to allocate memory on the device side, usually because the device is occupied by another process. You can check the running processes with `ps -ef | grep "python"`.
### Function Debugging
During network migration, you are advised to use the PyNative mode for debugging. In PyNative mode, you can perform debugging, and log printing is user-friendly. After the debugging is complete, switch to the graph mode, which delivers better execution performance and can also expose some network compilation problems, for example, gradient truncation caused by third-party operators.
For details, see [Error Analysis](https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/error_scenario_analysis.html).
### Precision Tuning
The precision tuning process is as follows:
model = Model(network=train_net)
```
- Check whether overflow occurs. When loss scale is added, overflow detection is added by default to monitor the overflow result. If overflow occurs continuously, you are advised to use the [dump data](https://mindspore.cn/docs/en/master/model_train/debug/dump.html) of MindSpore Insight to check why overflow occurs.
```python
import numpy as np
The mixed precision training method accelerates the deep neural network training process by mixing the single-precision floating-point data format and the half-precision floating-point data format without compromising the network accuracy. Mixed precision training can accelerate the computing process, reduce memory usage and retrieval, and enable a larger model or batch size to be trained on specific hardware.
For details, see [Mixed Precision Tutorial](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html).
- Enabling Graph Kernel Fusion
For details about data performance problems, see [Data Preparation Performance Analysis](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) of MindSpore Insight. This describes common data performance problems and solutions.
For more performance debugging methods, see [Performance Optimization](https://www.mindspore.cn/docs/en/master/model_train/train_process/train_optimize.html).
**Q: When using GeneratorDataset or map to load/process data, there may be syntax errors, calculation overflow and other issues that cause data errors, how to troubleshoot and debug?**
A: Observe the error stack information, locate the erroneous code block from it, and add a print or debugging point near that block for further debugging. For details, please refer to `Data Processing Debugging Method 1 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-1-errors-in-data-processing-execution,-print-logs-or-add-debug-points-to-code-debugging>`_ .
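As a generic illustration of this debugging style (plain Python; the generator and data are hypothetical, not actual user code):

```python
# Sketch: wrap a data-generating function so the failing sample index
# is printed before the error propagates (no MindSpore needed to reproduce).
data = ["3", "7", "oops", "9"]

def gen():
    for i, item in enumerate(data):
        try:
            yield int(item)          # the "processing" step that may fail
        except ValueError:
            print(f"bad sample at index {i}: {item!r}")  # debug point
            raise

out = []
try:
    for v in gen():
        out.append(v)
except ValueError:
    pass
print(out)  # samples processed before the failure: [3, 7]
```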
**Q: How to test each data processing operator in the map operation if a data-enhanced map operation reports an error?**
A: The map operation can be debugged by executing its operators individually or through the data pipeline debugging mode. For details, please refer to `Data Processing Debugging Method 2 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-2-data-enhanced-map-operation-error,-testing-the-each-data-processing-operator-in-the-map-operation>`_ .
**Q: While training, we will get very many WARNINGs suggesting that our dataset performance is slow, how should we handle this?**
A: You can iterate through the dataset individually and check the processing time of each piece of data to determine whether dataset performance is the bottleneck. For details, please refer to `Data Processing Debugging Method 3 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-3-testing-data-processing-performance>`_ .
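A framework-free sketch of such per-item timing (`slow_source` is a stand-in iterable, not a MindSpore dataset):

```python
import time

# Time each item yielded by the source to spot slow samples.
def slow_source():
    for i in range(5):
        time.sleep(0.01 * (i == 3))   # sample 3 is artificially slow
        yield i

timings = []
start = time.perf_counter()
for item in slow_source():
    now = time.perf_counter()
    timings.append(now - start)       # per-item processing time
    start = now

slowest = max(range(len(timings)), key=timings.__getitem__)
print(f"slowest sample: {slowest} ({timings[slowest] * 1000:.1f} ms)")
```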
**Q: In the process of processing data, if abnormal result values are generated due to computational errors, numerical overflow, etc., resulting in operator computation overflow and weight update anomalies during network training, how should we troubleshoot them?**
A: Turn off shuffling and fix random seeds to ensure reproducibility, and then use tools such as NumPy to quickly verify the results. For details, please refer to `Data Processing Debugging Method 4 <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#method-4-checking-for-exception-data-in-data-processing>`_ .
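A minimal NumPy sketch of this check (the `process` function and data are stand-ins for a real pipeline):

```python
import numpy as np

np.random.seed(0)                        # fixed seed so anomalies reproduce

def process(batch):                      # stand-in for the pipeline step
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(batch)             # log of 0 or a negative -> inf/nan

batches = [np.array([1.0, 2.0]), np.array([0.0, 3.0]), np.array([4.0, -1.0])]

bad = []
for i, b in enumerate(batches):
    out = process(b)
    if not np.isfinite(out).all():       # flag batches with NaN/INF values
        bad.append(i)
        print(f"batch {i} produced non-finite values: {out}")
print(bad)  # [1, 2]
```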
For more common data processing problems, please refer to `Analyzing Common Data Processing Problems <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/minddata_debug.html#analyzing-common-data-processing-problems>`_ , and for differences in data processing during migration, please refer to `Data Pre-Processing Differences Between MindSpore And PyTorch <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/dataset.html#comparison-of-data-processing-differences>`_ .
- Gradient Derivation
**Q: How can I implement the backward computation of an operator?**
A: MindSpore provides an automated interface for gradient derivation, a feature that shields the user from a great deal of the details and process of derivation. However, if there are some special scenarios where the user needs to manually control the calculation of its backward computation, the user can also define its backward computation through the Cell.bprop interface. For details, please refer to `Customize Cell reverse <https://www.mindspore.cn/docs/en/master/model_train/custom_program/network_custom.html#custom-cell-reverse>`_ .
**Q: How to deal with training instability due to gradient overflow?**
A: Network overflow usually manifests as a loss of NAN/INF, or the loss suddenly becoming very large. MindSpore provides `dump data <https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html>`_ to obtain information about the overflow operator. When there is gradient underflow in the network, use loss scale to support gradient derivation; for details, please refer to `loss scale <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html#loss-scale>`_ . When the network has gradient explosion, consider adding gradient clipping; for details, please refer to `gradient cropping <https://www.mindspore.cn/docs/en/master/migration_guide/model_development/gradient.html#gradient-cropping>`_ .
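The loss-scale mechanism can be sketched in plain Python (constants and the `step` function are illustrative, not the MindSpore loss-scale API):

```python
# Dynamic loss scale sketch: scale the loss up before backward so tiny FP16
# gradients survive, unscale before the update, and back off on overflow.
scale = 2.0 ** 16
growth, backoff = 2.0, 0.5
FP16_MAX = 65504.0

def step(raw_grad):
    """One optimizer step with dynamic loss scaling (illustrative)."""
    global scale
    scaled = raw_grad * scale        # gradients are scaled up before backward
    if scaled > FP16_MAX:            # would overflow in FP16: skip this update
        scale *= backoff             # and back off the scale
        return None
    unscaled = scaled / scale        # divide the scale back out before updating
    scale *= growth                  # healthy step: try a larger scale next time
    return unscaled

r1 = step(1e-7)     # tiny gradient survives scaling
r2 = step(10.0)     # overflow detected: update skipped, scale halved
print(r1, r2, scale)  # 1e-07 None 65536.0
```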
- Debugging and Tuning
`MindSpore Model Accuracy Tuning Practice (3): Common Accuracy Problems <https://www.hiascend.com/forum/thread-0235121941523411032-1-1.html>`_.
For more debugging and tuning FAQs, please refer to `Tuning FAQs and Solutions <https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#debugging-tools>`_ .
**Q: During model training, the first step takes a long time, how to optimize it?**
A: During the model training process, the first step contains the network compilation time. If you want to optimize the performance of the first step, you can analyze whether the model compilation can be optimized. For details, please refer to `Static graph network compilation performance optimization <https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph_syntax/static_graph_expert_programming.html>`_.
**Q: The non-first step takes a long time during model training, how to optimize it?**
loss = loss/response_gt
return loss
See `Static graph syntax support <https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html>`_ for details.
**Q: What can I do if the error "RuntimeError: Launch kernel failed, name:Default/..." is reported during training?**
A: There are many reasons for static graph errors, and in general the failure information is printed in the log. If you cannot intuitively get the error information from the log, you can set ``export GLOG_v=1`` to specify the log level and obtain more detailed information about the error.
Meanwhile, when the compilation of computational graphs reports errors, it will automatically save the file analyze_failed.ir, which can help to analyze the location of the error code. For more details, please refer to `Static Graph Mode Error Analysis <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/error_scenario_analysis.html>`_.
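A typical sequence, sketched in shell (`train.py` is a placeholder for your own script):

```shell
# Re-run the failing script with more verbose framework logs.
# GLOG_v levels: 0=DEBUG, 1=INFO, 2=WARNING (default), 3=ERROR.
export GLOG_v=1
python train.py 2>&1 | tee train.log    # keep a copy of the log for analysis

# After a graph-compilation failure, check for the automatically saved IR:
ls analyze_failed.ir 2>/dev/null || echo "analyze_failed.ir not found here"
```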
**Q: Out Of Memory error is reported during Graph mode static graph training, what should I do?**
When there is not enough memory, try lowering the batch_size; analyze the memory usage to see whether too many communication operators result in low overall memory reuse.
For more details, please refer to `Analysis of the problem of insufficient resources <https://www.mindspore.cn/docs/en/master/model_train/debug/error_analysis/mindrt_debug.html#insufficient-resources>`_ .
See `Execution Issues <https://www.mindspore.cn/docs/en/master/faq/implement_problem.html>`_ for more tuning FAQs.
| [MindSpore Dev Toolkit](https://www.mindspore.cn/devtoolkit/docs/en/master/index.html) | MindSpore Dev Toolkit is a development kit supporting the cross-platform Python IDE plug-in developed by MindSpore, and provides functions such as Project creation, intelligent supplement, API search, and Document search. | With capabilities such as API search, it is possible to improve the efficiency of users network migration development. |
| [TroubleShooter](https://gitee.com/mindspore/toolkits/tree/master/troubleshooter) | TroubleShooter is a MindSpore web development debugging toolkit designed to provide convenient, easy-to-use debugging capabilities. | Network debugging toolset (e.g., network weight migration, accuracy comparison, code tracing, error reporting analysis, execution tracking and other functions) to help users improve migration debugging efficiency. |
| [Profiler](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling.html) | Profiler can record information such as operator time consumption during the training process into a file, which can be viewed and analyzed by the user through a visual interface, helping the user to debug neural network performance more efficiently. | After the network migration, if the execution performance is not good, you can use Profiler to analyze the performance. Profiler provides Profiler analysis of the host execution of the framework, as well as the execution of the operator. |
| [Dump](https://www.mindspore.cn/docs/en/master/model_train/debug/dump.html) | The Dump function is provided to save the graphs from model training and the input and output data of the operators to a disk file. | Generally used for network migration complex problem localization (eg: operator overflow, etc.) and can dump out the operator-level data. |
## Examples of Network Migration Tool Applications
The final error is less than 1e-5, which is a reasonable accuracy error.
## 3. Customize operators
When existing APIs cannot be used for packaging, or the performance of cell encapsulation is poor, you need to customize operators. For details, see [Custom Operators](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html).
In addition to migrating APIs, you can also use the `aot` development mode of the `Custom` operator to call the PyTorch Aten operator for quick verification. For details, see [Using Third-party Operator Libraries Based on Customized Interfaces](https://www.mindspore.cn/docs/en/master/migration_guide/use_third_party_op.html).
[Lightweight Data Processing](https://mindspore.cn/docs/en/master/model_train/dataset/eager.html)
[Optimizing the Data Processing](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html)
## Comparison of Data Processing Differences
### Processing Common Datasets
MindSpore provides [interfaces](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.html) for loading common datasets from many different domains.
In addition to the above commonly used datasets in the industry, MindSpore has also developed the MindRecord data format for efficient reading and for storing and reading massive data; you can refer to [MindRecord](https://www.mindspore.cn/docs/en/master/model_train/dataset/record.html). Since this article introduces similar APIs and the differences in writing style, we have selected one of the more classic dataset APIs as an example for migration comparison. For other dataset interface differences, please refer to the [torchaudio](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchaudio), [torchtext](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchtext), and [torchvision](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html#torchvision) modules of the PyTorch and MindSpore API mapping table.
Here is an example of FashionMnistDataset. The following figure shows how to use the PyTorch API (left part), and how to use the MindSpore API (right part). The main reading process is: use FashionMnist API to load the source dataset, then use transforms to transform the data content, and finally according to the batch operation on the dataset. The key parts of the code on both sides are marked with color boxes.
## Automatic Differentiation Interfaces
After the forward network is constructed, MindSpore provides an interface to [automatic differentiation](https://mindspore.cn/tutorials/en/master/beginner/autograd.html) to calculate the gradient results of the model.
In the tutorial of [automatic derivation](https://mindspore.cn/docs/en/master/model_train/train_process/derivation.html), some descriptions of various gradient calculation scenarios are given.
### mindspore.grad
Since gradient overflow may be encountered when computing gradients in mixed precision scenarios, we generally use loss scale together with gradient derivation.
> On Ascend, because operators such as Conv, Sort, and TopK can only be float16, and MatMul is preferably float16 due to performance issues, it is recommended that loss scale operations be used as standard for network training. See the [list of operators that only support float16 on Ascend](https://www.mindspore.cn/docs/en/master/migration_guide/acc_debug.html#4-training-accuracy).
>
> The overflow can obtain overflow operator information via MindSpore Insight [dump data](https://mindspore.cn/docs/en/master/model_train/debug/dump.html).
>
> General overflow manifests itself as loss Nan/INF, loss suddenly becomes large, etc.
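When gradient explosion rather than underflow is the problem, clipping by global norm (the idea behind `ops.clip_by_global_norm`) is the usual remedy. A NumPy sketch of the idea (not the MindSpore implementation):

```python
import numpy as np

# Global-norm clipping: rescale all gradients jointly when their
# combined L2 norm exceeds a threshold, preserving their direction.
def clip_by_global_norm(grads, clip_norm=1.0):
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > clip_norm:
        grads = [g * (clip_norm / global_norm) for g in grads]
    return grads, global_norm

grads = [np.array([3.0, 4.0]), np.array([0.0, 0.0])]   # global norm = 5
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
print(norm)        # 5.0
print(clipped[0])  # [0.6 0.8] -> new global norm is 1
```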
Gradient accumulation is a way of training a neural network in which the data samples are split by batch into several small batches and then calculated in sequence, in order to solve the out-of-memory (OOM) problem where, due to insufficient memory, the batch size is too large for the neural network to be trained, or the network model is too large to load.
For details, refer to [Gradient Accumulation](https://www.mindspore.cn/docs/en/master/model_train/train_process/optimize/gradient_accumulation.html).
[](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/migration_guide/model_development/learning_rate_and_optimizer.md)
Before reading this chapter, please read the official MindSpore tutorial [Optimizer](https://mindspore.cn/docs/en/master/model_train/custom_program/optimizer.html).
Here is an introduction to some special ways of using MindSpore optimizer and the principle of learning rate decay strategy.
[](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/migration_guide/model_development/loss_function.md)
Before reading this chapter, please first read the official MindSpore tutorial [Loss Function](https://www.mindspore.cn/docs/en/master/model_train/custom_program/loss.html).
The MindSpore official website tutorial on loss functions explains built-in, custom, and multi-label loss functions, as well as their use in model training. Here is a list of the differences in functionality and interface between MindSpore's loss functions and PyTorch's.
It can be seen that PyTorch and MindSpore generally require network definition, forward computation, backward computation, and gradient update steps in the implementation process.
- Network definition: In the network definition, the desired forward network, loss function, and optimizer are generally defined. To define the forward network in Net(), PyTorch network inherits from nn.Module; similarly, MindSpore network inherits from nn.Cell. In MindSpore, the loss function and optimizers can be customized in addition to using those provided in MindSpore. You can refer to [Model Module Customization](https://mindspore.cn/docs/en/master/model_train/index.html). Interfaces such as functional/nn can be used to splice the required forward networks, loss functions and optimizers.
- Forward computation: Run the instantiated network to get the logit, and use the logit and target as inputs to calculate the loss. It should be noted that if the forward function has more than one output, you need to pay attention to the effect of more than one output on the result when calculating the backward function.
Generally, the high-level API encapsulated by MindSpore initializes parameters by default. Sometimes, the initialization distribution is inconsistent with the required initialization or with PyTorch's initialization. In this case, you need to customize the initialization. [Initializing Network Arguments](https://mindspore.cn/docs/en/master/model_train/custom_program/initializer.html#customized-parameter-initialization) describes a method of initializing parameters via API attributes. This section describes a method of initializing parameters by using Cell.
For details about the parameters, see [Network Parameters](https://mindspore.cn/docs/zh-CN/master/model_train/custom_program/initializer.html). This section uses `Cell` as an example to describe how to obtain all parameters in `Cell` and how to initialize the parameters in `Cell`.
> Note that the method described in this section cannot be performed in `construct`. To change the value of a parameter on the network, use [assign](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.assign.html).
For `Cell`, MindSpore provides two execution modes: `GRAPH_MODE` (static graph) and `PYNATIVE_MODE` (dynamic graph).
The **inference** behavior of the model in `PyNative` mode is the same as that of common Python code. However, during training, **once a tensor is converted into NumPy for other operations, the gradient of the network is truncated, which is equivalent to detach of PyTorch**.
When `GRAPH_MODE` is used, syntax restrictions usually occur. In this case, graph compilation needs to be performed on the Python code. However, MindSpore does not support the complete Python syntax set. Therefore, there are some restrictions on compiling the `construct` function. For details about the restrictions, see [MindSpore Static Graph Syntax](https://www.mindspore.cn/docs/en/master/model_train/program_form/static_graph.html).
Compared with the detailed syntax description, the common restrictions are as follows:
Now, let's see how to [customize backward network construction](https://www.mindspore.cn/docs/en/master/model_train/custom_program/network_custom.html#custom-cell-reverse).
On the GPU, you can select which devices to use via `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`. Specifying the device ID is not currently supported on Ascend.
Please refer to [Distributed Case](https://www.mindspore.cn/docs/en/master/model_train/parallel/distributed_case.html) for more details.
## Offline Inference
In addition to the possibility of online inference, MindSpore provides many offline inference methods for different environments. Please refer to [Model Inference](https://www.mindspore.cn/docs/en/master/model_infer/overview.html) for details.
After obtaining the reference code, you need to reproduce the accuracy of the reference implementation:
- Obtain the loss decrease trend to check whether the training convergence trend on MindSpore is normal.
- Obtain the parameter file for conversion and inference verification. For details, see [Inference and Training Process](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/training_and_evaluation.html).
- Obtain the performance baseline for performance tuning. For details, see [Debugging and Tuning](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html).
MindSpore has three methods to use mixed precision:
1. Use `Cast` to convert the network input `cast` into `float16` and the loss input `cast` into `float32`.
2. Use the `to_float` method of `Cell`. For details, see [Network Construction](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_cell.html).
3. Use the `amp_level` interface of the `Model` to perform mixed precision. For details, see [Automatic Mixed-Precision](https://www.mindspore.cn/tutorials/en/master/beginner/mixed_precision.html#automatic-mix-precision).
Use the third method to set `amp_level` in `Model` to `O3` and check the profiler result.
If most of the data queues are empty, you need to optimize the data performance.
In the queue of each data processing operation, the last operator and the `batch` operator are empty for a long time. In this case, you can increase the degree of parallelism of the `batch` operator. For details, see [Data Processing Performance Tuning](https://www.mindspore.cn/docs/en/master/model_train/dataset/optimize.html).
The code required for ResNet migration can be obtained from [code](https://gitee.com/mindspore/docs/tree/master/docs/mindspore/source_zh_cn/migration_guide/code).
A [sparse tensor](https://matteding.github.io/2019/04/25/sparse-matrices/) is a tensor in which most of the elements are zero.
In some scenarios (such as recommendation systems, molecular dynamics, graph neural networks), the data is sparse. If you use common dense tensors to represent the data, you may introduce many unnecessary calculations, storage, and communication costs. In this case, it is better to use sparse tensor to represent the data.
MindSpore now supports the most commonly used [CSR and COO data formats](https://www.mindspore.cn/tutorials/en/master/beginner/tensor.html#sparse-tensor). Currently, only a limited number of sparse operators are supported, and most sparse features are restricted. In this case, you are advised to check whether the corresponding operator supports sparse computing. If the operator does not support sparse computing, convert it into a common operator.
After the operator is converted into a dense operator, the device memory used increases. Therefore, the batch size of the reference implementation may no longer be usable for training. In this case, you can use [Gradient Accumulation](https://www.mindspore.cn/docs/en/master/model_train/train_process/optimize/gradient_accumulation.html) to simulate large-batch training.
When lacking of the built-in operators during developing a network, you can use the primitive in [Custom](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.Custom.html#mindspore-ops-custom) to easily and quickly define and use different types of customized operators.
Developers can choose different customized operator development methods according to their needs. For details, please refer to the [Usage Guide](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html) of Custom operator.
One of the development methods for customized operators, the `aot` method, has a special use: it can call corresponding `cpp`/`cuda` functions by loading a pre-compiled `so`. Therefore, when a third-party library provides an API for a `cpp`/`cuda` function, you can try to call its function interface in the `so`, which is described below by taking the `ATen` library in PyTorch as an example.
MindSpore can execute inference tasks on different hardware platforms based on trained models.
# Cell and Parameter
Cell, as the basic unit of neural network construction, corresponds to the concept of a neural network layer. Beyond the basic Tensor computation flow, a neural network layer also provides functions such as parameter management and state management, so this abstract encapsulation represents the neural network structure more accurately and clearly. Parameter is the core of neural network training and is usually held as an internal member variable of a neural network layer. In this section, we systematically introduce parameters, neural network layers, and their related usage.
## Parameter
Parameter is a special class of Tensor: a variable whose value can be updated during model training. MindSpore provides the `mindspore.Parameter` class for Parameter construction. To distinguish between Parameters used for different purposes, two categories of Parameter are defined below:
- Trainable parameter: a Tensor that is updated with the gradient obtained by the backward propagation algorithm during model training; `requires_grad` needs to be set to `True`.
- Untrainable parameter: a Tensor that does not participate in backward propagation but whose value still needs to be updated (e.g., the `mean` and `var` variables in BatchNorm); `requires_grad` needs to be set to `False`.
> Parameter is set to `requires_grad=True` by default.
We construct a simple fully-connected layer as follows:
In the `__init__` method of `Cell`, we define two parameters `w` and `b` and configure `name` for namespace management. Use `self.attr` in the `construct` method to call directly to participate in Tensor operations.
### Obtaining Parameter
After constructing the neural network layer by using Cell+Parameter, we can use various methods to obtain the Parameter managed by Cell.
#### Obtaining a Single Parameter
To get a particular parameter individually, just call a member variable of a Python class directly.
```python
print(net.b.asnumpy())
```
```text
[-1.2192779 -0.36789745 0.0946381 ]
```
#### Obtaining a Trainable Parameter
Trainable parameters can be obtained by using the `Cell.trainable_params` method, and this interface is usually called when configuring the optimizer.
Use the `Cell.get_parameters()` method to get all parameters, at which point a Python iterator will be returned.
```python
print(type(net.get_parameters()))
```
```text
<class 'generator'>
```
Or you can call `Cell.parameters_and_names` to return the parameter names and parameters.
```python
for name, param in net.parameters_and_names():
print(f"{name}:\n{param.asnumpy()}")
```
```text
w:
[[ 4.15680408e-02 -1.20311625e-01 5.02573885e-02]
[ 1.22175144e-04 -1.34980649e-01 1.17642188e+00]
[ 7.57667869e-02 -1.74758151e-01 -5.19092619e-01]
[-1.67846107e+00 3.27240258e-01 -2.06452996e-01]
[ 5.72323874e-02 -8.27963874e-02 5.94243526e-01]]
b:
[-1.2192779 -0.36789745 0.0946381 ]
```
### Modifying the Parameter
#### Modifying Parameter Values Directly
Parameter is a special kind of Tensor, so its value can be modified by using the Tensor index modification.
```python
net.b[0] = 1.
print(net.b.asnumpy())
```
```text
[ 1. -0.36789745 0.0946381 ]
```
#### Overriding the Modified Parameter Values
The `Parameter.set_data` method can be called to override the Parameter by using a Tensor with the same Shape. This method is commonly used for [Cell traversal initialization](https://www.mindspore.cn/tutorials/en/master/advanced/modules/initializer.html) by using Initializer.
```python
net.b.set_data(Tensor([3, 4, 5]))
print(net.b.asnumpy())
```
```text
[3. 4. 5.]
```
#### Modifying Parameter Values During Runtime
The main role of parameters is to update their values during model training, which involves parameter modification during runtime after backward propagation to obtain gradients, or when untrainable parameters need to be updated. Due to the compiled design of MindSpore's [Accelerating with Static Graphs](https://www.mindspore.cn/tutorials/en/master/beginner/accelerate_with_static_graph.html), it is necessary at this point to use the `mindspore.ops.assign` interface to assign parameters. This method is commonly used in [Custom Optimizer](https://www.mindspore.cn/tutorials/en/master/advanced/modules/optimizer.html) scenarios. The following is a simple sample modification of parameter values during runtime:
```python
import mindspore as ms
from mindspore import ops

@ms.jit
def modify_parameter():
    b_hat = ms.Tensor([7, 8, 9])
    ops.assign(net.b, b_hat)
    return True

modify_parameter()
print(net.b.asnumpy())
```
```text
[7. 8. 9.]
```
### Parameter Tuple
`ParameterTuple`, a variable tuple, is used to store multiple Parameters. It inherits from `tuple` and provides a cloning function.
The following example provides the ParameterTuple creation method:
```python
import numpy as np
import mindspore as ms
from mindspore import Parameter, ParameterTuple
from mindspore.common.initializer import initializer

# Creation
x = Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name="x")
y = Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))
# Clone from params and change the name to "params_copy"
params_copy = params.clone("params_copy")
```
Some Tensor operations in neural networks do not behave the same during training and inference. For example, `nn.Dropout` performs random dropout during training but not during inference, and `nn.BatchNorm` updates the `mean` and `var` variables during training but keeps their values fixed during inference. Therefore, we can set the state of the neural network through the `Cell.set_train` interface.
When `set_train` is set to True, the neural network state is `train`, and the default value of `set_train` interface is `True`:
```python
net.set_train()
print(net.phase)
```
```text
train
```
When `set_train` is set to False, the neural network state is `predict`:
```python
net.set_train(False)
print(net.phase)
```
```text
predict
```
## Custom Neural Network Layers
Normally, the neural network layer interfaces and function interfaces provided by MindSpore can meet model construction requirements. However, since the AI field is constantly evolving, you may encounter new network structures without built-in modules. In that case, we can customize a neural network layer through the function interfaces and Primitive operators provided by MindSpore, and we can use the `Cell.bprop` method to customize the backward computation. The details of each of these three customization methods follow.
### Constructing Neural Network Layers by Using the Function Interface
MindSpore provides a large number of basic function interfaces, which can be used to construct complex Tensor operations, encapsulated as neural network layers. The following is an example of `Threshold` with the following equation:
$$
y =\begin{cases}
x, &\text{ if } x > \text{threshold} \\
\text{value}, &\text{ otherwise }
\end{cases}
$$
It can be seen that `Threshold` determines whether the value of the Tensor is greater than the `threshold` value, keeps the value whose judgment result is `True`, and replaces the value whose judgment result is `False`. Therefore, the corresponding implementation is as follows:
```python
from mindspore import nn, ops

class Threshold(nn.Cell):
    def __init__(self, threshold, value):
        super().__init__()
        self.threshold = threshold
        self.value = value

    def construct(self, inputs):
        cond = ops.gt(inputs, self.threshold)
        value = ops.fill(inputs.dtype, inputs.shape, self.value)
        return ops.select(cond, inputs, value)
```
Here `ops.gt`, `ops.fill`, and `ops.select` are used to implement the comparison, filling, and replacement respectively, completing the custom `Threshold` layer.
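As a quick illustration of the same select logic in plain NumPy (not the MindSpore API), with the illustrative values `threshold=0.1` and `value=20`:

```python
import numpy as np

threshold, value = 0.1, 20.0
inputs = np.array([0.1, 0.2, 0.3], dtype=np.float32)

# keep elements strictly greater than the threshold, replace the rest
result = np.where(inputs > threshold, inputs, np.float32(value))
# inputs[0] equals the threshold (not strictly greater), so it becomes 20
```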
It can be seen that `inputs[0] = threshold`, so it is replaced with `20`.
### Custom Cell Reverse
In special scenarios, we not only need to customize the forward logic of the neural network layer, but also want to manually control the computation of its reverse, which we can define through the `Cell.bprop` interface. The function will be used in scenarios such as new neural network structure design and backward propagation speed optimization. In the following, we take `Dropout2d` as an example to introduce custom Cell reverse.
```python
import mindspore
from mindspore import nn, ops

class Dropout2d(nn.Cell):
    def __init__(self, keep_prob):
        super().__init__()
        self.keep_prob = keep_prob
        self.dropout2d = ops.Dropout2D(keep_prob)

    def construct(self, x):
        return self.dropout2d(x)

    def bprop(self, x, out, dout):
        _, mask = out
        dy, _ = dout
        if self.keep_prob != 0:
            dy = dy * (1 / self.keep_prob)
        dy = mask.astype(mindspore.float32) * dy
        return (dy.astype(x.dtype),)

dropout_2d = Dropout2d(0.8)
dropout_2d.bprop_debug = True
```
The `bprop` method has three separate input parameters:
- *x*: Forward input. When there are multiple forward inputs, the same number of inputs are required.
- *out*: Forward output.
- *dout*: The gradient passed back to the current Cell during backward propagation, i.e., the backward result of the previous layer.
Generally we need to calculate the reverse result according to the reverse derivative formula based on the forward output and the reverse result of the front layer, and return it. The reverse calculation of `Dropout2d` requires masking the reverse result of the front layer based on the `mask` matrix of the forward output, and then scaling according to `keep_prob`. The final implementation can get the correct calculation result.
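The mask-and-scale rule in this `bprop` can be checked with a standalone NumPy sketch (illustrative only; as in Dropout2D, whole channels are kept or dropped):

```python
import numpy as np

keep_prob = 0.8
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))              # NCHW input
mask = rng.random((2, 3, 1, 1)) < keep_prob    # channel-wise keep/drop mask

# forward: zero the dropped channels and rescale the kept ones
out = np.where(mask, x / keep_prob, 0.0)

# backward (the bprop logic above): mask the incoming gradient, then rescale
dout = np.ones_like(x)
dx = dout * mask / keep_prob
```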
When customizing the backward of a Cell, extended writing is supported in PyNative mode, and the weights inside the Cell can be differentiated. A concrete example is as follows:
The `bprop` method supports *args as an input parameter, where the last element, `args[-1]`, is the gradient returned to the Cell. The weights to be differentiated are set through `self.internal_params`, and the `bprop` function returns a tuple and a dictionary: the tuple corresponds to the gradients of the inputs, and the dictionary maps each weight (key) to its gradient (value).
## Hook Function
Debugging deep learning networks is a major task for every practitioner in the field. Because a deep learning network hides the input and output data as well as the backward gradients of the intermediate-layer operators, providing only the gradients of the network input data (features and weights), it is impossible to accurately observe the data changes in the intermediate layers, which reduces debugging efficiency. To help users debug deep learning networks accurately and quickly, MindSpore designed the Hook function in dynamic graph mode. **Using the Hook function can capture the input and output data of intermediate-layer operators as well as their backward gradients**.
Currently, five forms of Hook functions are provided in dynamic graph mode: HookBackward operator and register_forward_pre_hook, register_forward_hook, register_backward_pre_hook, register_backward_hook functions registered on Cell objects.
## HookBackward Operator
HookBackward implements the Hook function in the form of an operator. The user initializes a HookBackward operator and places it at the location in the deep learning network where the gradient needs to be captured. In the forward execution of the network, the HookBackward operator outputs the input data as is without any modification. When the network back propagates the gradient, the Hook function registered on HookBackward will capture the gradient back propagated to this point. The user can customize the operation on the gradient in the Hook function, such as printing the gradient, or returning a new gradient.
For more descriptions of the HookBackward operator, refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.HookBackward.html).
## register_forward_pre_hook Function in Cell Object
The user can use the `register_forward_pre_hook` function on the Cell object to register a custom Hook function to capture data that is passed to that Cell object. This function does not work in static graph mode and inside functions modified with `@jit`. The `register_forward_pre_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_forward_pre_hook` function returns a different `handle` object. Hook functions should be defined in the following way.
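A minimal shape for such a Hook function, sketched from the parameters described here (the body is illustrative, not prescribed):

```python
def forward_pre_hook_fn(cell, inputs):
    # `cell`: the Cell object; `inputs`: tuple of data about to enter the Cell
    print("forward pre hook, inputs:", inputs)
    # Returning None leaves the inputs unchanged; returning a value replaces them
```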
To avoid running failure when scripts switch to graph mode, it is not recommended to call the `register_forward_pre_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object.
For more information about the `register_forward_pre_hook` function of the Cell object, refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_forward_pre_hook).
## register_forward_hook Function of Cell Object
The user can use the `register_forward_hook` function on the Cell object to register a custom Hook function that captures the data passed forward to the Cell object and the output data of the Cell object. This function does not work in static graph mode and inside functions modified with `@jit`. The `register_forward_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_forward_hook` function returns a different `handle` object. Hook functions should be defined in the following way.
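A minimal shape for this kind of Hook function, sketched from the parameters described here (the body is illustrative, not prescribed):

```python
def forward_hook_fn(cell, inputs, outputs):
    # `inputs`: the Cell's forward inputs; `outputs`: its forward outputs
    print("forward hook, inputs:", inputs, "outputs:", outputs)
```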
To avoid running failure when the script switches to graph mode, it is not recommended to call the `register_forward_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object.
For more information about the `register_forward_hook` function of the Cell object, please refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_forward_hook).
## register_backward_pre_hook Function of Cell Object
The user can use the `register_backward_pre_hook` function on the Cell object to register a custom Hook function that captures the gradient associated with the Cell object when the network is back propagated. This function does not work in graph mode or inside functions modified with `@jit`. The `register_backward_pre_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_backward_pre_hook` function will return a different `handle` object.
Unlike the custom Hook function used by the HookBackward operator, the input of the Hook function used by `register_backward_pre_hook` contains `cell`, which represents the information of the Cell object, as well as the gradient passed backward to the Cell object.
Here `cell` is the information of the Cell object, `grad_output` is the gradient passed to the Cell object when the network is back-propagated. Therefore, the user can use the `register_backward_pre_hook` function to capture the backward input gradients of a particular Cell object in the network. The user can customize the operations on the gradient in the Hook function, such as viewing, printing the gradient, or returning the new input gradient. If you need to return the new input gradient in the Hook function, the return value must be in the form of `tuple`.
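Sketched from the parameters described here, such a Hook function can take the following minimal shape (illustrative body):

```python
def backward_pre_hook_fn(cell, grad_output):
    # `grad_output`: gradient flowing back into the Cell;
    # return a tuple to replace it, or None to leave it unchanged
    print("backward pre hook, grad_output:", grad_output)
```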
To avoid running failure when the scripts switch to graph mode, it is not recommended to call the `register_backward_pre_hook` function and the `remove()` function of the `handle` object in the `construct` function of the Cell object. In PyNative mode, if the `register_backward_pre_hook` function is called in the `construct` function of the Cell object, the Cell object will register a new Hook function every time it runs.
For more information about the `register_backward_pre_hook` function of the Cell object, please refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_backward_pre_hook).
## register_backward_hook Function of Cell Object
The user can use the `register_backward_hook` function on the Cell object to register a custom Hook function that captures the gradient associated with the Cell object when the network is back propagated. This function does not work in graph mode or inside functions modified with `@jit`. The `register_backward_hook` function takes the Hook function as an input and returns a `handle` object that corresponds to the Hook function. The user can remove the corresponding Hook function by calling the `remove()` function of the `handle` object. Each call to the `register_backward_hook` function will return a different `handle` object.
Unlike the custom Hook function used by the HookBackward operator, the inputs of the Hook function used by `register_backward_hook` contains `cell`, which represents the information of the Cell object, the gradient passed to the Cell object in reverse, and the gradient of the reverse output of the Cell object.
Here `cell` is the information of the Cell object, `grad_input` is the gradient of the reverse output of the Cell object. `grad_output` is the gradient passed to the Cell object when the network is back-propagated, which corresponds to the reverse output gradient of the next operator in the forward process. Therefore, the user can use the `register_backward_hook` function to capture the backward input and backward output gradients of a particular Cell object in the network. The user can customize the operations on the gradient in the Hook function, such as viewing, printing the gradient, or returning the new output gradient. If you need to return the new output gradient in the Hook function, the return value must be in the form of `tuple`.
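Sketched from the parameters described here, this Hook function takes one more argument than the backward pre hook (illustrative body):

```python
def backward_hook_fn(cell, grad_input, grad_output):
    # `grad_input`: gradient of the Cell's backward output;
    # `grad_output`: gradient passed back to the Cell during back propagation
    print("backward hook:", grad_input, grad_output)
```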
To avoid failures when the script is switched to graph mode, it is not recommended to call the `register_backward_hook` function or the `remove()` function of the `handle` object inside the `construct` function of a Cell object. In PyNative mode, if the `register_backward_hook` function is called in the `construct` function, the Cell object registers a new Hook function every time it runs.
For more information about the `register_backward_hook` function of the Cell object, please refer to the [API documentation](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_backward_hook).
## Using Multiple Hook Functions on a Cell Object
When the `register_backward_pre_hook`, `register_backward_hook`, `register_forward_pre_hook`, and `register_forward_hook` functions act on the same Cell object at the same time, any extra operators added by the `register_forward_pre_hook` and `register_forward_hook` functions for data processing participate in the forward calculation before or after the execution of the Cell object, but the backward gradients of these extra operators are not captured by the `register_backward_pre_hook` or `register_backward_hook` functions. The Hook function registered with `register_backward_pre_hook` captures only the input gradients of the original Cell object, and the Hook function registered with `register_backward_hook` captures only the input and output gradients of the original Cell object.
Here `grad_output` is the gradient passed to `self.relu` when the gradient is back-propagated, not the gradient of the new `Add` operator in the `forward_hook_fn` function. Here `grad_input` is the reverse output gradient of the `self.relu` when the gradient is back-propagated, not the reverse output gradient of the new `Add` operator in the `forward_pre_hook_fn` function. The `register_forward_pre_hook` and `register_forward_hook` functions work before and after the execution of the Cell object and do not affect the gradient capture range of the reverse Hook function on the Cell object.
# Parameter Initialization
As the basic unit of neural network construction, Cell corresponds to the concept of a neural network layer. Its abstract encapsulation of Tensor compute operations allows the neural network structure to be represented more accurately and clearly. In addition to defining the basic Tensor compute flow, a neural network layer provides functions such as parameter management and state management. Parameters are the core of neural network training and are usually internal member variables of a neural network layer. This section systematically introduces Parameters, neural network layers, and their usage.
## Parameter
Parameter is a special class of Tensor: a variable whose value can be updated during model training. MindSpore provides the `mindspore.Parameter` class for constructing Parameters. To distinguish Parameters used for different purposes, two categories are defined below:
- Trainable parameters. Tensors whose values are updated by the gradients obtained from the backpropagation algorithm during model training; `requires_grad` needs to be set to `True`.
- Untrainable parameters. Tensors that do not participate in backpropagation but whose values still need to be updated (e.g., the `mean` and `var` variables in BatchNorm); `requires_grad` needs to be set to `False`.
> Parameter sets `requires_grad=True` by default.
We construct a simple fully-connected layer as follows:
In the `__init__` method of the `Cell`, we define the two parameters `w` and `b` and configure their `name` for namespace management. In the `construct` method, they are used directly via `self.attr` to participate in Tensor operations.
### Obtaining Parameter
After constructing the neural network layer by using Cell+Parameter, we can use various methods to obtain the Parameter managed by Cell.
#### Obtaining a Single Parameter
To get a particular parameter individually, just call a member variable of a Python class directly.
```python
print(net.b.asnumpy())
```
```text
[-1.2192779 -0.36789745 0.0946381 ]
```
#### Obtaining a Trainable Parameter
Trainable parameters can be obtained by using the `Cell.trainable_params` method, and this interface is usually called when configuring the optimizer.
#### Obtaining All Parameters

Use the `Cell.get_parameters()` method to get all parameters; a Python iterator is returned.
```python
print(type(net.get_parameters()))
```
```text
<class 'generator'>
```
Or you can call `Cell.parameters_and_names` to return the parameter names and parameters.
```python
for name, param in net.parameters_and_names():
    print(f"{name}:\n{param.asnumpy()}")
```
```text
w:
[[ 4.15680408e-02 -1.20311625e-01 5.02573885e-02]
[ 1.22175144e-04 -1.34980649e-01 1.17642188e+00]
[ 7.57667869e-02 -1.74758151e-01 -5.19092619e-01]
[-1.67846107e+00 3.27240258e-01 -2.06452996e-01]
[ 5.72323874e-02 -8.27963874e-02 5.94243526e-01]]
b:
[-1.2192779 -0.36789745 0.0946381 ]
```
### Modifying the Parameter
#### Modifying Parameter Values Directly
Parameter is a special kind of Tensor, so its value can be modified by Tensor indexing.
```python
net.b[0] = 1.
print(net.b.asnumpy())
```
```text
[ 1. -0.36789745 0.0946381 ]
```
#### Overriding the Modified Parameter Values
The `Parameter.set_data` method can be called to override the Parameter by using a Tensor with the same Shape. This method is commonly used for [Cell traversal initialization](https://www.mindspore.cn/docs/en/master/model_train/custom_program/initializer.html) by using Initializer.
```python
net.b.set_data(Tensor([3, 4, 5]))
print(net.b.asnumpy())
```
```text
[3. 4. 5.]
```
#### Modifying Parameter Values During Runtime
The main role of parameters is to update their values during model training, which involves parameter modification during runtime after backward propagation to obtain gradients, or when untrainable parameters need to be updated. Due to the compiled design of MindSpore's [Accelerating with Static Graphs](https://www.mindspore.cn/tutorials/en/master/beginner/accelerate_with_static_graph.html), it is necessary at this point to use the `mindspore.ops.assign` interface to assign parameters. This method is commonly used in [Custom Optimizer](https://www.mindspore.cn/docs/en/master/model_train/custom_program/optimizer.html) scenarios. The following is a simple sample modification of parameter values during runtime:
```python
import mindspore as ms
from mindspore import ops

@ms.jit
def modify_parameter():
    b_hat = ms.Tensor([7, 8, 9])
    ops.assign(net.b, b_hat)
    return True

modify_parameter()
print(net.b.asnumpy())
```
```text
[7. 8. 9.]
```
### Parameter Tuple
ParameterTuple is a container for storing multiple Parameters. It inherits from the Python tuple and provides a clone function.
The following example provides the ParameterTuple creation method:
```python
import numpy as np
import mindspore as ms
from mindspore import Parameter, ParameterTuple
from mindspore.common.initializer import initializer

# Creation
x = Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name="x")
y = Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))
# Clone from params and change the name to "params_copy"
params_copy = params.clone("params_copy")
print(params)
print(params_copy)
```

### Training State Conversion
Some Tensor operations in neural networks do not behave the same during training and inference: for example, `nn.Dropout` performs random dropout during training but not during inference, and `nn.BatchNorm` updates the `mean` and `var` variables during training but keeps their values fixed during inference. We can therefore set the state of the neural network through the `Cell.set_train` interface.
When `set_train` is set to True, the neural network state is `train`, and the default value of `set_train` interface is `True`:
```python
net.set_train()
print(net.phase)
```
```text
train
```
When `set_train` is set to False, the neural network state is `predict`:
# Loss Function
A loss function, also called an objective function, measures the difference between a predicted value and an actual value.
In deep learning, model training is a process of reducing the loss function value through iteration.
The `mindspore.nn` module provides many [general loss functions](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#loss-function), but these functions cannot meet all requirements. In many cases, you need to customize the required loss functions. The following describes how to customize loss functions.
Normally, the neural network layer interfaces and function interfaces provided by MindSpore can meet model construction requirements, but since the AI field keeps evolving, it is possible to encounter new network structures that have no built-in modules. In that case, we can customize a neural network layer through the function interfaces provided by MindSpore or through Primitive operators, and we can use the `Cell.bprop` method to customize the backward computation. The three customization methods are detailed below.
## Constructing Neural Network Layers by Using the Function Interface
MindSpore provides a large number of basic function interfaces, which can be used to construct complex Tensor operations, encapsulated as neural network layers. The following is an example of `Threshold` with the following equation:
$$
y =\begin{cases}
x, &\text{ if } x > \text{threshold} \\
\text{value}, &\text{ otherwise }
\end{cases}
$$
It can be seen that `Threshold` checks whether each value of the Tensor is greater than `threshold`, keeps the values for which the comparison is `True`, and replaces those for which it is `False` with `value`. The corresponding implementation is as follows:
```python
import mindspore
from mindspore import nn, ops

class Threshold(nn.Cell):
    def __init__(self, threshold, value):
        super().__init__()
        self.threshold = threshold
        self.value = value

    def construct(self, inputs):
        cond = ops.gt(inputs, self.threshold)
        value = ops.fill(inputs.dtype, inputs.shape, self.value)
        return ops.select(cond, inputs, value)
```
Here `ops.gt`, `ops.fill`, and `ops.select` implement the comparison, the construction of the replacement values, and the element-wise selection, respectively. The custom `Threshold` layer is used as follows:
It can be seen that `inputs[0] = threshold`, so it is replaced with `20`.
## Custom Cell Reverse
In special scenarios, we not only need to customize the forward logic of a neural network layer but may also want to control the computation of its backward pass manually, which we can do through the `Cell.bprop` interface. This is useful in scenarios such as designing new network structures and optimizing backpropagation speed. Below, we take `Dropout2d` as an example to introduce a custom Cell backward pass.
```python
import mindspore
from mindspore import nn, ops

class Dropout2d(nn.Cell):
    def __init__(self, keep_prob):
        super().__init__()
        self.keep_prob = keep_prob
        self.dropout2d = ops.Dropout2D(keep_prob)

    def construct(self, x):
        return self.dropout2d(x)

    def bprop(self, x, out, dout):
        _, mask = out
        dy, _ = dout
        if self.keep_prob != 0:
            dy = dy * (1 / self.keep_prob)
        dy = mask.astype(mindspore.float32) * dy
        return (dy.astype(x.dtype),)

dropout_2d = Dropout2d(0.8)
dropout_2d.bprop_debug = True
```
The `bprop` method has three input parameters:
- *x*: Forward input. When there are multiple forward inputs, the same number of parameters is required.
- *out*: Forward output.
- *dout*: The gradient passed back to the current Cell during backpropagation, i.e., the backward result of the preceding step.
Generally, we need to compute the backward result from the forward output and the gradient passed from the following layer, according to the derivative formula, and return it. The backward computation of `Dropout2d` masks the incoming gradient with the `mask` matrix from the forward output and then scales it according to `keep_prob`, which yields the correct result.
When customizing the backward pass of a Cell, an extended form is supported in PyNative mode that can also differentiate the weights inside the Cell. A concrete example is as follows:
The `bprop` method supports `*args` as its input parameters, where the last element `args[-1]` is the gradient passed back to the Cell. The weights to be differentiated are declared through `self.internal_params`, and the `bprop` function returns a tuple and a dictionary: the tuple holds the gradients of the inputs, and the dictionary maps each weight (key) to that weight's gradient (value).
## Overview
JIT (Just In Time) refers to operators compiled directly by the framework during network runtime.
A custom operator of Hybrid type is the default type of custom operator. With a Hybrid-type custom operator, users can describe the operator calculation logic in Python-like syntax without attending to the engineering details required by the MindSpore framework, allowing them to focus on the algorithm itself.
Custom operators of Hybrid type use [MindSpore Hybrid DSL](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/ms_kernel.html#syntax-specification) to describe the implementation of the calculation logic inside the operator. Functions defined with MindSpore Hybrid DSL can be parsed by the [AKG Operator Compiler](https://gitee.com/mindspore/akg) for JIT compilation, generating efficient operators for training and inference of large-scale models. At the same time, a function defined with MindSpore Hybrid DSL can be called directly as a `numpy` function, which is convenient for debugging and for flexibly switching to a [pyfunc type custom operator](#the-introduction-to-custom-operator-an-example), so that a custom operator, once developed, can be reused across multiple modes, platforms, and scenarios.
The following example (test_custom_hybrid.py) shows how to write a custom operator of the hybrid type. The operator computes the sum of two tensors.
Notice that custom operators of Hybrid type use the source to source transformation method to connect the graph compiler and the operator compiler. Users can use the keywords of MindSpore Hybrid DSL directly in the script, such as `output_tensor` below, without importing any Python modules. For more information about the keywords, refer to [MindSpore Hybrid DSL Keywords](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/ms_kernel.html#keywords).
Operator output shape and data type inference can be realized by defining Python functions to describe the inference logic of operator output shape and data type.
If the operator contains attributes or only supports specific input and output data types or data formats, operator information needs to be registered, and for how to generate operator information, see [Registering the Operator Information](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_adv.html#registering-the-operator-information). If the operator information is not registered, when operator selection and mapping are made in the backend, the operator information is derived from the input of the current operator.
The following is an example of the development process of a custom operator of type akg in test_custom_akg.py, where the custom operator implements the addition of two input tensors.
## AOT-Compiled Custom Operator
An AOT-type custom operator is one that the user compiles into a binary file ahead of time and then integrates into the network. Usually, users optimize their implementations in programming languages such as C/C++/CUDA and compile the operators into dynamic libraries to accelerate MindSpore networks. As a result, users can apply deep optimizations to their operators and exploit the performance of the corresponding backend hardware. Here we introduce some basics of AOT-type custom operators; for more advanced usage and functionality, please refer to [Advanced Usage of AOT Type Custom Operators](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_aot.html).
### Defining Custom Operator of aot Type
Operator output shape and data type inference can be realized by defining Python functions to describe the inference logic.
If the operator only supports some specific input and output data types, the operator information needs to be registered. For the creation of operator information, please refer to [Registering the Operator Information](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_adv.html#registering-the-operator-information).
The following examples introduce the development process of aot type custom operator on GPU platform and CPU platform, where the custom operator implements the function of adding two input tensors.
The custom operator of julia type uses Julia to describe the internal calculation logic of the operator.
Operator output shape and data type inference can be realized by defining Python functions to describe the inference logic of the operator output shape and the data type.
If the custom operator only supports specific input and output data types, you need to define the operator information. For the creation of operator information, please refer to [Registering the Operator Information](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom_adv.html#registering-the-operator-information).
The following takes the addition of two input tensors as an example to introduce how to define a custom operator of julia type.
## Overview
aot-type custom operators use a pre-compilation approach, which requires developers to write the source code files for the corresponding function based on a specific interface, and compile the source code files in advance into a dynamic link library.
Then, during network runtime, the framework will automatically call and execute the function in the dynamic link library.
aot-type custom operators support CUDA language for GPU platforms and C and C++ languages for CPU platforms. For basic knowledge of developing aot-type custom operators, please refer to [basic tutorial](https://www.mindspore.cn/docs/en/master/model_train/custom_program/operation/op_custom.html#defining-custom-operator-of-aot-type).
In this tutorial, we will demonstrate advanced features of aot-type custom operators, including: