Questions about using the AMD VitisAI EP: how can I run my model on the AMD NPU? #24214

Open
yunhaolsh opened this issue Mar 27, 2025 · 1 comment
Labels
ep:VitisAI issues related to Vitis AI execution provider

Comments

@yunhaolsh

Describe the issue

Could anyone tell me which CPU models can run a model on the NPU via the onnxruntime VitisAI EP?
I have a Ryzen 7 260 w. According to the official website, the Ryzen 7 260 w is supported by Ryzen AI Software, but I wonder whether it is also supported by the onnxruntime VitisAI EP. The onnxruntime EP page says that only the AMD Ryzen 7040U and 7040HS are supported on Windows.
I followed the tutorial https://ryzenai.docs.amd.com/en/latest/getstartex.html to quantize the model and ran it, but everything still ran on the CPU.

Thanks.

To reproduce

logs:

model name:D:\RyzenAISW\RyzenAI-SW\tutorial\getting_started_resnet\cpp\build\Release\resnet_quantized.onnx
ep:npu
Found total 8 core(s) from windows system:

core 1 consist of logical processors: 1 2
core 2 consist of logical processors: 3 4
core 3 consist of logical processors: 5 6
core 4 consist of logical processors: 7 8
core 5 consist of logical processors: 9 10
core 6 consist of logical processors: 11 12
core 7 consist of logical processors: 13 14
core 8 consist of logical processors: 15 16

Detected L2 cache size: 1048576 bytes
Available Execution Providers:

VitisAIExecutionProvider
DmlExecutionProvider
CPUExecutionProvider
Session Options { execution_mode:0 execution_order:DEFAULT enable_profiling:1 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:D:\RyzenAISW\RyzenAI-SW\tutorial\getting_started_resnet\cpp\test.json session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: { } }
Flush-to-zero and denormal-as-zero are off
Creating and using per session threadpools since use_per_session_threads_ is true
Dynamic block base set to 0
EP Context cache enabled: 0
EP context cache embed mode: 1
User specified EP context cache path:
Initializing session.
Adding default CPU execution provider.
Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0
Creating 21 bins of max chunk size 256 to 268435456
This model does not have any local functions defined. AOT Inlining is not performed
GraphTransformer EnsureUniqueDQForNodeUnit modified: 1 with status: OK
GraphTransformer Level1_RuleBasedTransformer modified: 1 with status: OK
Removing initializer '/avgpool/GlobalAveragePool_output_0_Scale'. It is no longer used by any node.
GraphTransformer DoubleQDQPairsRemover modified: 0 with status: OK
Total shared scalar initializer count: 174
GraphTransformer ConstantSharing modified: 1 with status: OK
GraphTransformer CommonSubexpressionElimination modified: 0 with status: OK
GraphTransformer ConstantFolding modified: 0 with status: OK
GraphTransformer MatMulAddFusion modified: 0 with status: OK
GraphTransformer ReshapeFusion modified: 0 with status: OK
GraphTransformer FreeDimensionOverrideTransformer modified: 0 with status: OK
GraphTransformer QDQPropagationTransformer modified: 0 with status: OK
GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK
GraphTransformer RocmBlasAltImpl modified: 0 with status: OK
GraphTransformer TransposeOptimizer modified: 0 with status: OK
GraphTransformer Level1_RuleBasedTransformer modified: 0 with status: OK
GraphTransformer DoubleQDQPairsRemover modified: 0 with status: OK
GraphTransformer CommonSubexpressionElimination modified: 0 with status: OK
GraphTransformer ConstantFolding modified: 0 with status: OK
GraphTransformer MatMulAddFusion modified: 0 with status: OK
GraphTransformer ReshapeFusion modified: 0 with status: OK
GraphTransformer FreeDimensionOverrideTransformer modified: 0 with status: OK
GraphTransformer QDQPropagationTransformer modified: 0 with status: OK
GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK
GraphTransformer RocmBlasAltImpl modified: 0 with status: OK
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250327 08:58:37.283396 9064 vitisai_compile_model.cpp:1046] Vitis AI EP Load ONNX Model Success
I20250327 08:58:37.283396 9064 vitisai_compile_model.cpp:1047] Graph Input Node Name/Shape (1)
I20250327 08:58:37.283396 9064 vitisai_compile_model.cpp:1051] input : [-1x3x32x32]
I20250327 08:58:37.283396 9064 vitisai_compile_model.cpp:1057] Graph Output Node Name/Shape (1)
I20250327 08:58:37.283396 9064 vitisai_compile_model.cpp:1061] output : [-1x10]
[Vitis AI EP] No. of Operators : CPU 400
GraphTransformer Level2_RuleBasedTransformer modified: 0 with status: OK
GraphTransformer TransposeOptimizer_CPUExecutionProvider modified: 0 with status: OK
GraphTransformer QDQS8ToU8Transformer modified: 0 with status: OK
Matched MaxPool
Matched GlobalAveragePool
GraphTransformer QDQSelectorActionTransformer modified: 1 with status: OK
GraphTransformer GemmActivationFusion modified: 1 with status: OK
GraphTransformer MatMulIntegerToFloatFusion modified: 0 with status: OK
GraphTransformer DynamicQuantizeMatMulFusion modified: 0 with status: OK
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
GraphTransformer ConvActivationFusion modified: 1 with status: OK
GraphTransformer GeluFusion modified: 0 with status: OK
GraphTransformer LayerNormFusion modified: 0 with status: OK
GraphTransformer SimplifiedLayerNormFusion modified: 0 with status: OK
GraphTransformer AttentionFusion modified: 0 with status: OK
GraphTransformer EmbedLayerNormFusion modified: 0 with status: OK
GraphTransformer GatherSliceToSplitFusion modified: 0 with status: OK
GraphTransformer GatherToSliceFusion modified: 0 with status: OK
GraphTransformer MatmulTransposeFusion modified: 0 with status: OK
GraphTransformer BiasGeluFusion modified: 0 with status: OK
GraphTransformer SkipLayerNormFusion modified: 0 with status: OK
GraphTransformer FastGeluFusion modified: 0 with status: OK
GraphTransformer QuickGeluFusion modified: 0 with status: OK
GraphTransformer BiasSoftmaxFusion modified: 0 with status: OK
GraphTransformer BiasDropoutFusion modified: 0 with status: OK
GraphTransformer MatMulScaleFusion modified: 0 with status: OK
GraphTransformer MatMulActivationFusion modified: 0 with status: OK
GraphTransformer MatMulNBitsFusion modified: 0 with status: OK
GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK
GraphTransformer Level2_RuleBasedTransformer modified: 0 with status: OK
GraphTransformer QDQS8ToU8Transformer modified: 0 with status: OK
GraphTransformer QDQSelectorActionTransformer modified: 0 with status: OK
GraphTransformer GemmActivationFusion modified: 0 with status: OK
GraphTransformer MatMulIntegerToFloatFusion modified: 0 with status: OK
GraphTransformer DynamicQuantizeMatMulFusion modified: 0 with status: OK
GraphTransformer ConvActivationFusion modified: 0 with status: OK
GraphTransformer GeluFusion modified: 0 with status: OK
GraphTransformer LayerNormFusion modified: 0 with status: OK
GraphTransformer SimplifiedLayerNormFusion modified: 0 with status: OK
GraphTransformer AttentionFusion modified: 0 with status: OK
GraphTransformer EmbedLayerNormFusion modified: 0 with status: OK
GraphTransformer GatherSliceToSplitFusion modified: 0 with status: OK
GraphTransformer GatherToSliceFusion modified: 0 with status: OK
GraphTransformer MatmulTransposeFusion modified: 0 with status: OK
GraphTransformer BiasGeluFusion modified: 0 with status: OK
GraphTransformer SkipLayerNormFusion modified: 0 with status: OK
GraphTransformer FastGeluFusion modified: 0 with status: OK
GraphTransformer QuickGeluFusion modified: 0 with status: OK
GraphTransformer BiasSoftmaxFusion modified: 0 with status: OK
GraphTransformer BiasDropoutFusion modified: 0 with status: OK
GraphTransformer MatMulScaleFusion modified: 0 with status: OK
GraphTransformer MatMulActivationFusion modified: 0 with status: OK
GraphTransformer MatMulNBitsFusion modified: 0 with status: OK
GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK
GraphTransformer NchwcTransformer modified: 0 with status: OK
GraphTransformer NhwcTransformer modified: 0 with status: OK
GraphTransformer ConvAddActivationFusion modified: 0 with status: OK
GraphTransformer RemoveDuplicateCastTransformer modified: 0 with status: OK
GraphTransformer CastFloat16Transformer modified: 0 with status: OK
GraphTransformer MemcpyTransformer modified: 0 with status: OK
Node placements
All nodes placed on [CPUExecutionProvider]. Number of nodes: 362
SaveMLValueNameIndexMapping
Done saving OrtValue mappings.
Use DeviceBasedPartition as default
Saving initialized tensors.
Extending BFCArena for Cpu. bin_num:0 (requested) num_bytes: 4 (actual) rounded_bytes:256
Extended allocation by 1048576 bytes.
Total allocated bytes: 1048576
Allocated memory at 000001D7C17C0080 to 000001D7C18C0080
Extending BFCArena for Cpu. bin_num:11 (requested) num_bytes: 589824 (actual) rounded_bytes:589824
Extended allocation by 2097152 bytes.
Total allocated bytes: 3145728
Allocated memory at 000001D7C18DF080 to 000001D7C1ADF080
Extending BFCArena for Cpu. bin_num:10 (requested) num_bytes: 262144 (actual) rounded_bytes:262144
Extended allocation by 4194304 bytes.
Total allocated bytes: 7340032
Allocated memory at 000001D7C3A1E080 to 000001D7C3E1E080
Extending BFCArena for Cpu. bin_num:10 (requested) num_bytes: 262144 (actual) rounded_bytes:262144
Extended allocation by 8388608 bytes.
Total allocated bytes: 15728640
Allocated memory at 000001D7C54DC080 to 000001D7C5CDC080
Extending BFCArena for Cpu. bin_num:0 (requested) num_bytes: 1 (actual) rounded_bytes:256
Extended allocation by 16777216 bytes.
Total allocated bytes: 32505856
Allocated memory at 000001D7C5CE5080 to 000001D7C6CE5080
Done saving initialized tensors
Session successfully initialized.
Writing profiler data to file D:\RyzenAISW\RyzenAI-SW\tutorial\getting_started_resnet\cpp\test.json_2025-03-27_08-58-37.json
Session Options { execution_mode:0 execution_order:DEFAULT enable_profiling:1 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:D:\RyzenAISW\RyzenAI-SW\tutorial\getting_started_resnet\cpp\test.json session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: { } }
Creating and using per session threadpools since use_per_session_threads_ is true
Dynamic block base set to 0
EP Context cache enabled: 0
EP context cache embed mode: 1
User specified EP context cache path:
Initializing session.
Adding default CPU execution provider.
Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0
Creating 21 bins of max chunk size 256 to 268435456
This model does not have any local functions defined. AOT Inlining is not performed
GraphTransformer EnsureUniqueDQForNodeUnit modified: 1 with status: OK
GraphTransformer Level1_RuleBasedTransformer modified: 1 with status: OK
Removing initializer '/avgpool/GlobalAveragePool_output_0_Scale'. It is no longer used by any node.
GraphTransformer DoubleQDQPairsRemover modified: 0 with status: OK
Total shared scalar initializer count: 174
GraphTransformer ConstantSharing modified: 1 with status: OK
GraphTransformer CommonSubexpressionElimination modified: 0 with status: OK
GraphTransformer ConstantFolding modified: 0 with status: OK
GraphTransformer MatMulAddFusion modified: 0 with status: OK
GraphTransformer ReshapeFusion modified: 0 with status: OK
GraphTransformer FreeDimensionOverrideTransformer modified: 0 with status: OK
GraphTransformer QDQPropagationTransformer modified: 0 with status: OK
GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK
GraphTransformer RocmBlasAltImpl modified: 0 with status: OK
GraphTransformer TransposeOptimizer modified: 0 with status: OK
GraphTransformer Level1_RuleBasedTransformer modified: 0 with status: OK
GraphTransformer DoubleQDQPairsRemover modified: 0 with status: OK
GraphTransformer CommonSubexpressionElimination modified: 0 with status: OK
GraphTransformer ConstantFolding modified: 0 with status: OK
GraphTransformer MatMulAddFusion modified: 0 with status: OK
GraphTransformer ReshapeFusion modified: 0 with status: OK
GraphTransformer FreeDimensionOverrideTransformer modified: 0 with status: OK
GraphTransformer QDQPropagationTransformer modified: 0 with status: OK
GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK
GraphTransformer RocmBlasAltImpl modified: 0 with status: OK
[Vitis AI EP] No. of Operators : CPU 400
GraphTransformer Level2_RuleBasedTransformer modified: 0 with status: OK
GraphTransformer TransposeOptimizer_CPUExecutionProvider modified: 0 with status: OK
GraphTransformer QDQS8ToU8Transformer modified: 0 with status: OK
Matched MaxPool
Matched GlobalAveragePool
GraphTransformer QDQSelectorActionTransformer modified: 1 with status: OK
GraphTransformer GemmActivationFusion modified: 1 with status: OK
GraphTransformer MatMulIntegerToFloatFusion modified: 0 with status: OK
GraphTransformer DynamicQuantizeMatMulFusion modified: 0 with status: OK
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
Matched Conv
GraphTransformer ConvActivationFusion modified: 1 with status: OK
GraphTransformer GeluFusion modified: 0 with status: OK
GraphTransformer LayerNormFusion modified: 0 with status: OK
GraphTransformer SimplifiedLayerNormFusion modified: 0 with status: OK
GraphTransformer AttentionFusion modified: 0 with status: OK
GraphTransformer EmbedLayerNormFusion modified: 0 with status: OK
GraphTransformer GatherSliceToSplitFusion modified: 0 with status: OK
GraphTransformer GatherToSliceFusion modified: 0 with status: OK
GraphTransformer MatmulTransposeFusion modified: 0 with status: OK
GraphTransformer BiasGeluFusion modified: 0 with status: OK
GraphTransformer SkipLayerNormFusion modified: 0 with status: OK
GraphTransformer FastGeluFusion modified: 0 with status: OK
GraphTransformer QuickGeluFusion modified: 0 with status: OK
GraphTransformer BiasSoftmaxFusion modified: 0 with status: OK
GraphTransformer BiasDropoutFusion modified: 0 with status: OK
GraphTransformer MatMulScaleFusion modified: 0 with status: OK
GraphTransformer MatMulActivationFusion modified: 0 with status: OK
GraphTransformer MatMulNBitsFusion modified: 0 with status: OK
GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK
GraphTransformer Level2_RuleBasedTransformer modified: 0 with status: OK
GraphTransformer QDQS8ToU8Transformer modified: 0 with status: OK
GraphTransformer QDQSelectorActionTransformer modified: 0 with status: OK
GraphTransformer GemmActivationFusion modified: 0 with status: OK
GraphTransformer MatMulIntegerToFloatFusion modified: 0 with status: OK
GraphTransformer DynamicQuantizeMatMulFusion modified: 0 with status: OK
GraphTransformer ConvActivationFusion modified: 0 with status: OK
GraphTransformer GeluFusion modified: 0 with status: OK
GraphTransformer LayerNormFusion modified: 0 with status: OK
GraphTransformer SimplifiedLayerNormFusion modified: 0 with status: OK
GraphTransformer AttentionFusion modified: 0 with status: OK
GraphTransformer EmbedLayerNormFusion modified: 0 with status: OK
GraphTransformer GatherSliceToSplitFusion modified: 0 with status: OK
GraphTransformer GatherToSliceFusion modified: 0 with status: OK
GraphTransformer MatmulTransposeFusion modified: 0 with status: OK
GraphTransformer BiasGeluFusion modified: 0 with status: OK
GraphTransformer SkipLayerNormFusion modified: 0 with status: OK
GraphTransformer FastGeluFusion modified: 0 with status: OK
GraphTransformer QuickGeluFusion modified: 0 with status: OK
GraphTransformer BiasSoftmaxFusion modified: 0 with status: OK
GraphTransformer BiasDropoutFusion modified: 0 with status: OK
GraphTransformer MatMulScaleFusion modified: 0 with status: OK
GraphTransformer MatMulActivationFusion modified: 0 with status: OK
GraphTransformer MatMulNBitsFusion modified: 0 with status: OK
GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK
GraphTransformer NchwcTransformer modified: 0 with status: OK
GraphTransformer NhwcTransformer modified: 0 with status: OK
GraphTransformer ConvAddActivationFusion modified: 0 with status: OK
GraphTransformer RemoveDuplicateCastTransformer modified: 0 with status: OK
GraphTransformer CastFloat16Transformer modified: 0 with status: OK
GraphTransformer MemcpyTransformer modified: 0 with status: OK
Node placements
All nodes placed on [CPUExecutionProvider]. Number of nodes: 362
SaveMLValueNameIndexMapping
Done saving OrtValue mappings.
Use DeviceBasedPartition as default
Saving initialized tensors.
Extending BFCArena for Cpu. bin_num:0 (requested) num_bytes: 1 (actual) rounded_bytes:256
Extended allocation by 1048576 bytes.
Total allocated bytes: 1048576
Allocated memory at 000001D7C17CF080 to 000001D7C18CF080
Extending BFCArena for Cpu. bin_num:13 (requested) num_bytes: 2359296 (actual) rounded_bytes:2359296
Extended allocation by 4194304 bytes.
Total allocated bytes: 5242880
Allocated memory at 000001D7C3A2C080 to 000001D7C3E2C080
Extending BFCArena for Cpu. bin_num:0 (requested) num_bytes: 1 (actual) rounded_bytes:256
Extended allocation by 4194304 bytes.
Total allocated bytes: 9437184
Allocated memory at 000001D7C3E35080 to 000001D7C4235080
Extending BFCArena for Cpu. bin_num:0 (requested) num_bytes: 1 (actual) rounded_bytes:256
Extended allocation by 8388608 bytes.
Total allocated bytes: 17825792
Allocated memory at 000001D7C54D8080 to 000001D7C5CD8080
Extending BFCArena for Cpu. bin_num:13 (requested) num_bytes: 2359296 (actual) rounded_bytes:2359296
Extended allocation by 16777216 bytes.
Total allocated bytes: 34603008
Allocated memory at 000001D7C757E080 to 000001D7C857E080
Done saving initialized tensors
Session successfully initialized.
Input Node Name/Shape (1):
input : -1x3x32x32
output : -1x10
Extending BFCArena for Cpu. bin_num:15 (requested) num_bytes: 9437184 (actual) rounded_bytes:9437184
Extended allocation by 33554432 bytes.
Total allocated bytes: 68157440
Allocated memory at 000001D7C858C080 to 000001D7CA58C080
Final results:
Predicted label is tiger shark, Galeocerdo cuvieri, and actual label is tiger shark, Galeocerdo cuvieri,
Predicted label is hen, and actual label is hen,
Predicted label is hen, and actual label is hen,
Predicted label is tench, Tinca tinca, and actual label is tench, Tinca tinca,
Predicted label is stingray, and actual label is stingray,
Predicted label is stingray, and actual label is stingray,
Predicted label is ostrich, Struthio camelus, and actual label is goldfish, Carassius auratus,
Predicted label is stingray, and actual label is stingray,
Predicted label is tiger shark, Galeocerdo cuvieri, and actual label is tiger shark, Galeocerdo cuvieri,
Predicted label is goldfish, Carassius auratus, and actual label is goldfish, Carassius auratus,
Writing profiler data to file D:\RyzenAISW\RyzenAI-SW\tutorial\getting_started_resnet\cpp\test.json_2025-03-27_08-58-37.json
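
The two lines in the log above that show where the operators ended up are `[Vitis AI EP] No. of Operators : CPU 400` and `All nodes placed on [CPUExecutionProvider]. Number of nodes: 362`. For anyone triaging a similar verbose log, here is a minimal sketch that pulls just those lines out of a saved log. The function name `summarize_placement` is hypothetical, and the assumption that an offloaded run lists further device names in the same `NAME count` shape on the Vitis AI summary line is not confirmed by this log, which only ever shows `CPU`.

```python
import re

# Patterns copied from the two summary lines that appear in the log above.
VITIS_SUMMARY = re.compile(r"\[Vitis AI EP\] No\. of Operators :\s*(.*)")
PLACEMENT = re.compile(r"All nodes placed on \[(\w+)\]\. Number of nodes: (\d+)")
# Assumption: the summary line is a sequence of "<device name> <count>" pairs.
COUNT = re.compile(r"([A-Za-z]+)\s+(\d+)")


def summarize_placement(log_text):
    """Collect the Vitis AI EP operator counts and the ORT node placement lines."""
    vitis_counts = []
    placements = []
    for line in log_text.splitlines():
        match = VITIS_SUMMARY.search(line)
        if match:
            # One dict per session, e.g. {"CPU": 400}
            vitis_counts.append(
                {name: int(count) for name, count in COUNT.findall(match.group(1))}
            )
            continue
        match = PLACEMENT.search(line)
        if match:
            placements.append((match.group(1), int(match.group(2))))
    return vitis_counts, placements


# Sample lines taken verbatim from the log in this issue.
sample = """\
[Vitis AI EP] No. of Operators : CPU 400
Node placements
All nodes placed on [CPUExecutionProvider]. Number of nodes: 362
"""
vitis_counts, placements = summarize_placement(sample)
print(vitis_counts)
print(placements)
```

Run against the full log above, this should report the same CPU-only result once per session, since both summary lines appear twice.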

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Vitis AI

Execution Provider Library Version

RyzenAI 1.3.1

github-actions bot added the ep:VitisAI label (issues related to Vitis AI execution provider) on Mar 27, 2025
@yunhaolsh
Author

Does anyone know anything about this?
