Hi, I am trying to rebuild the TRT plan on a 4090 GPU. The command I am using is:
trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan
...and the output is:
&&&& RUNNING TensorRT.trtexec [TensorRT v100001] # trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan [01/18/2025-11:51:12] [I] === Model Options === [01/18/2025-11:51:12] [I] Format: ONNX [01/18/2025-11:51:12] [I] Model: pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx [01/18/2025-11:51:12] [I] Output: [01/18/2025-11:51:12] [I] === Build Options === [01/18/2025-11:51:12] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default [01/18/2025-11:51:12] [I] avgTiming: 8 [01/18/2025-11:51:12] [I] Precision: FP32 [01/18/2025-11:51:12] [I] LayerPrecisions: [01/18/2025-11:51:12] [I] Layer Device Types: [01/18/2025-11:51:12] [I] Calibration: [01/18/2025-11:51:12] [I] Refit: Disabled [01/18/2025-11:51:12] [I] Strip weights: Disabled [01/18/2025-11:51:12] [I] Version Compatible: Disabled [01/18/2025-11:51:12] [I] ONNX Plugin InstanceNorm: Disabled [01/18/2025-11:51:12] [I] TensorRT runtime: full [01/18/2025-11:51:12] [I] Lean DLL Path: [01/18/2025-11:51:12] [I] Tempfile Controls: { in_memory: allow, temporary: allow } [01/18/2025-11:51:12] [I] Exclude Lean Runtime: Disabled [01/18/2025-11:51:12] [I] Sparsity: Disabled [01/18/2025-11:51:12] [I] Safe mode: Disabled [01/18/2025-11:51:12] [I] Build DLA standalone loadable: Disabled [01/18/2025-11:51:12] [I] Allow GPU fallback for DLA: Disabled [01/18/2025-11:51:12] [I] DirectIO mode: Disabled [01/18/2025-11:51:12] [I] Restricted mode: Disabled [01/18/2025-11:51:12] [I] Skip inference: Disabled [01/18/2025-11:51:12] [I] Save engine: pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan [01/18/2025-11:51:12] [I] Load engine: [01/18/2025-11:51:12] [I] Profiling verbosity: 0 [01/18/2025-11:51:12] [I] Tactic sources: Using default tactic sources [01/18/2025-11:51:12] [I] timingCacheMode: local [01/18/2025-11:51:12] [I] timingCacheFile: [01/18/2025-11:51:12] [I] Enable Compilation Cache: Enabled [01/18/2025-11:51:12] [I] errorOnTimingCacheMiss: Disabled [01/18/2025-11:51:12] [I] Preview Features: Use default preview flags. 
[01/18/2025-11:51:12] [I] MaxAuxStreams: -1 [01/18/2025-11:51:12] [I] BuilderOptimizationLevel: -1 [01/18/2025-11:51:12] [I] Calibration Profile Index: 0 [01/18/2025-11:51:12] [I] Weight Streaming: Disabled [01/18/2025-11:51:12] [I] Debug Tensors: [01/18/2025-11:51:12] [I] Input(s)s format: fp32:CHW [01/18/2025-11:51:12] [I] Output(s)s format: fp32:CHW [01/18/2025-11:51:12] [I] Input build shapes: model [01/18/2025-11:51:12] [I] Input calibration shapes: model [01/18/2025-11:51:12] [I] === System Options === [01/18/2025-11:51:12] [I] Device: 0 [01/18/2025-11:51:12] [I] DLACore: [01/18/2025-11:51:12] [I] Plugins: [01/18/2025-11:51:12] [I] setPluginsToSerialize: [01/18/2025-11:51:12] [I] dynamicPlugins: [01/18/2025-11:51:12] [I] ignoreParsedPluginLibs: 0 [01/18/2025-11:51:12] [I] [01/18/2025-11:51:12] [I] === Inference Options === [01/18/2025-11:51:12] [I] Batch: Explicit [01/18/2025-11:51:12] [I] Input inference shapes: model [01/18/2025-11:51:12] [I] Iterations: 10 [01/18/2025-11:51:12] [I] Duration: 3s (+ 200ms warm up) [01/18/2025-11:51:12] [I] Sleep time: 0ms [01/18/2025-11:51:12] [I] Idle time: 0ms [01/18/2025-11:51:12] [I] Inference Streams: 1 [01/18/2025-11:51:12] [I] ExposeDMA: Disabled [01/18/2025-11:51:12] [I] Data transfers: Enabled [01/18/2025-11:51:12] [I] Spin-wait: Disabled [01/18/2025-11:51:12] [I] Multithreading: Disabled [01/18/2025-11:51:12] [I] CUDA Graph: Disabled [01/18/2025-11:51:12] [I] Separate profiling: Disabled [01/18/2025-11:51:12] [I] Time Deserialize: Disabled [01/18/2025-11:51:12] [I] Time Refit: Disabled [01/18/2025-11:51:12] [I] NVTX verbosity: 0 [01/18/2025-11:51:12] [I] Persistent Cache Ratio: 0 [01/18/2025-11:51:12] [I] Optimization Profile Index: 0 [01/18/2025-11:51:12] [I] Weight Streaming Budget: Disabled [01/18/2025-11:51:12] [I] Inputs: [01/18/2025-11:51:12] [I] Debug Tensor Save Destinations: [01/18/2025-11:51:12] [I] === Reporting Options === [01/18/2025-11:51:12] [I] Verbose: Disabled [01/18/2025-11:51:12] [I] Averages: 10 inferences [01/18/2025-11:51:12] [I] Percentiles: 90,95,99 [01/18/2025-11:51:12] [I] Dump refittable layers:Disabled [01/18/2025-11:51:12] [I] Dump output: Disabled [01/18/2025-11:51:12] [I] Profile: Disabled [01/18/2025-11:51:12] [I] Export timing to JSON file: [01/18/2025-11:51:12] [I] Export output to JSON file: [01/18/2025-11:51:12] [I] Export profile to JSON file: [01/18/2025-11:51:12] [I] [01/18/2025-11:51:12] [I] === Device Information === [01/18/2025-11:51:12] [I] Available Devices: [01/18/2025-11:51:12] [I] Device 0: "NVIDIA GeForce RTX 4090" UUID: GPU-2ef010d0-11a4-b12b-c05b-027a9f6ec187 [01/18/2025-11:51:12] [I] Device 1: "NVIDIA GeForce RTX 4090" UUID: GPU-d3f842b8-770d-ee14-8817-4764e3f405dc [01/18/2025-11:51:12] [I] Device 2: "NVIDIA GeForce RTX 4090" UUID: GPU-4c447716-f03b-bd64-77dc-d42daecf1418 [01/18/2025-11:51:12] [I] Device 3: "NVIDIA GeForce RTX 4090" UUID: GPU-0111000f-de93-24c9-4d9b-0e0b917c79d6 [01/18/2025-11:51:12] [I] Selected Device: NVIDIA GeForce RTX 4090 [01/18/2025-11:51:12] [I] Selected Device ID: 0 [01/18/2025-11:51:12] [I] Selected Device UUID: GPU-2ef010d0-11a4-b12b-c05b-027a9f6ec187 [01/18/2025-11:51:12] [I] Compute Capability: 8.9 [01/18/2025-11:51:12] [I] SMs: 128 [01/18/2025-11:51:12] [I] Device Global Memory: 24217 MiB [01/18/2025-11:51:12] [I] Shared Memory per SM: 100 KiB [01/18/2025-11:51:12] [I] Memory Bus Width: 384 bits (ECC disabled) [01/18/2025-11:51:12] [I] Application Compute Clock Rate: 2.52 GHz [01/18/2025-11:51:12] [I] Application Memory Clock Rate: 10.501 GHz 
[01/18/2025-11:51:12] [I] [01/18/2025-11:51:12] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at. [01/18/2025-11:51:12] [I] [01/18/2025-11:51:12] [I] TensorRT version: 10.0.1 [01/18/2025-11:51:12] [I] Loading standard plugins [01/18/2025-11:51:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 16, GPU 12850 (MiB) [01/18/2025-11:51:15] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1774, GPU +312, now: CPU 1926, GPU 13162 (MiB) [01/18/2025-11:51:15] [I] Start parsing network model. [01/18/2025-11:51:15] [I] [TRT] ---------------------------------------------------------------- [01/18/2025-11:51:15] [I] [TRT] Input filename: pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx [01/18/2025-11:51:15] [I] [TRT] ONNX IR version: 0.0.8 [01/18/2025-11:51:15] [I] [TRT] Opset version: 17 [01/18/2025-11:51:15] [I] [TRT] Producer name: pytorch [01/18/2025-11:51:15] [I] [TRT] Producer version: 2.3.1 [01/18/2025-11:51:15] [I] [TRT] Domain: [01/18/2025-11:51:15] [I] [TRT] Model version: 0 [01/18/2025-11:51:15] [I] [TRT] Doc string: [01/18/2025-11:51:15] [I] [TRT] ---------------------------------------------------------------- [01/18/2025-11:51:16] [I] Finished parsing network model. Parse time: 0.820438 [01/18/2025-11:51:16] [W] Dynamic dimensions required for input: x, but no shapes were provided. Automatically overriding shape to: 2x80x1 [01/18/2025-11:51:16] [I] Set shape of input tensor x for optimization profile 0 to: MIN=2x80x1 OPT=2x80x1 MAX=2x80x1 [01/18/2025-11:51:16] [W] Dynamic dimensions required for input: mask, but no shapes were provided. Automatically overriding shape to: 2x1x1 [01/18/2025-11:51:16] [I] Set shape of input tensor mask for optimization profile 0 to: MIN=2x1x1 OPT=2x1x1 MAX=2x1x1 [01/18/2025-11:51:16] [W] Dynamic dimensions required for input: mu, but no shapes were provided. Automatically overriding shape to: 2x80x1 [01/18/2025-11:51:16] [I] Set shape of input tensor mu for optimization profile 0 to: MIN=2x80x1 OPT=2x80x1 MAX=2x80x1 [01/18/2025-11:51:16] [W] Dynamic dimensions required for input: cond, but no shapes were provided. Automatically overriding shape to: 2x80x1 [01/18/2025-11:51:16] [I] Set shape of input tensor cond for optimization profile 0 to: MIN=2x80x1 OPT=2x80x1 MAX=2x80x1 [01/18/2025-11:51:16] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [01/18/2025-11:52:02] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called. [01/18/2025-11:52:02] [I] [TRT] Detected 6 inputs and 1 output network tensors. [01/18/2025-11:52:04] [I] [TRT] Total Host Persistent Memory: 417360 [01/18/2025-11:52:04] [I] [TRT] Total Device Persistent Memory: 3146240 [01/18/2025-11:52:04] [I] [TRT] Total Scratch Memory: 18874880 [01/18/2025-11:52:04] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 353 steps to complete. [01/18/2025-11:52:04] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 15.7201ms to assign 11 blocks to 353 nodes requiring 18897408 bytes. [01/18/2025-11:52:04] [I] [TRT] Total Activation Memory: 18896384 [01/18/2025-11:52:04] [I] [TRT] Total Weights Memory: 285212224 [01/18/2025-11:52:04] [I] [TRT] Engine generation completed in 47.3271 seconds. 
[01/18/2025-11:52:04] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 272 MiB [01/18/2025-11:52:04] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3059 MiB [01/18/2025-11:52:04] [I] Engine built in 48.2243 sec. [01/18/2025-11:52:04] [I] Created engine with size: 286.258 MiB [01/18/2025-11:52:05] [I] [TRT] Loaded engine size: 286 MiB [01/18/2025-11:52:05] [I] Engine deserialized in 0.291931 sec. [01/18/2025-11:52:05] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 0, GPU 293 (MiB) [01/18/2025-11:52:05] [I] Setting persistentCacheLimit to 0 bytes. [01/18/2025-11:52:05] [I] Created execution context with device memory size: 18.021 MiB [01/18/2025-11:52:05] [I] Using random values for input x [01/18/2025-11:52:05] [I] Input binding for x with dimensions 2x80x1 is created. [01/18/2025-11:52:05] [I] Using random values for input mask [01/18/2025-11:52:05] [I] Input binding for mask with dimensions 2x1x1 is created. [01/18/2025-11:52:05] [I] Using random values for input mu [01/18/2025-11:52:05] [I] Input binding for mu with dimensions 2x80x1 is created. [01/18/2025-11:52:05] [I] Using random values for input t [01/18/2025-11:52:05] [I] Input binding for t with dimensions 2 is created. [01/18/2025-11:52:05] [I] Using random values for input spks [01/18/2025-11:52:05] [I] Input binding for spks with dimensions 2x80 is created. [01/18/2025-11:52:05] [I] Using random values for input cond [01/18/2025-11:52:05] [I] Input binding for cond with dimensions 2x80x1 is created. [01/18/2025-11:52:05] [I] Output binding for dphi_dt with dimensions 2x80x1 is created. [01/18/2025-11:52:05] [I] Starting inference [01/18/2025-11:52:09] [I] Warmup completed 90 queries over 200 ms [01/18/2025-11:52:09] [I] Timing trace has 1384 queries over 3.00522 s [01/18/2025-11:52:09] [I] [01/18/2025-11:52:09] [I] === Trace details === [01/18/2025-11:52:09] [I] Trace averages of 10 runs: [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12661 ms - Host latency: 2.15258 ms (enqueue 2.12436 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12116 ms - Host latency: 2.14765 ms (enqueue 2.11872 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12306 ms - Host latency: 2.14832 ms (enqueue 2.11936 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12483 ms - Host latency: 2.15106 ms (enqueue 2.12301 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1176 ms - Host latency: 2.14309 ms (enqueue 2.11544 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12877 ms - Host latency: 2.15436 ms (enqueue 2.12661 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11874 ms - Host latency: 2.14407 ms (enqueue 2.11649 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1258 ms - Host latency: 2.15309 ms (enqueue 2.12383 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1255 ms - Host latency: 2.15047 ms (enqueue 2.12329 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11781 ms - Host latency: 2.14308 ms (enqueue 2.11576 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12307 ms - Host latency: 2.15019 ms (enqueue 2.12084 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12646 ms - Host latency: 2.15181 ms (enqueue 2.12444 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13363 ms - Host latency: 
2.15887 ms (enqueue 2.13131 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12795 ms - Host latency: 2.15281 ms (enqueue 2.12585 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13575 ms - Host latency: 2.16146 ms (enqueue 2.13359 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12522 ms - Host latency: 2.15087 ms (enqueue 2.12328 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12084 ms - Host latency: 2.1466 ms (enqueue 2.11885 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12695 ms - Host latency: 2.15226 ms (enqueue 2.12504 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14701 ms - Host latency: 2.17227 ms (enqueue 2.14489 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13661 ms - Host latency: 2.16226 ms (enqueue 2.13466 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12032 ms - Host latency: 2.145 ms (enqueue 2.1185 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12571 ms - Host latency: 2.15048 ms (enqueue 2.1234 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12784 ms - Host latency: 2.15264 ms (enqueue 2.12567 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12178 ms - Host latency: 2.14659 ms (enqueue 2.11985 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12708 ms - Host latency: 2.15439 ms (enqueue 2.12512 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11148 ms - Host latency: 2.13685 ms (enqueue 2.10953 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11747 ms - Host latency: 2.14238 ms (enqueue 2.11548 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11729 ms - Host latency: 2.14282 ms (enqueue 2.1151 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12158 ms - Host latency: 2.14609 ms (enqueue 2.11965 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12279 ms - Host latency: 2.14864 ms (enqueue 2.12072 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11662 ms - Host latency: 2.14172 ms (enqueue 2.11476 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1163 ms - Host latency: 2.14063 ms (enqueue 2.11422 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11937 ms - Host latency: 2.14461 ms (enqueue 2.11734 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11581 ms - Host latency: 2.14034 ms (enqueue 2.11368 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12011 ms - Host latency: 2.1447 ms (enqueue 2.1184 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11886 ms - Host latency: 2.14375 ms (enqueue 2.11673 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12225 ms - Host latency: 2.14686 ms (enqueue 2.11992 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11774 ms - Host latency: 2.14262 ms (enqueue 2.11566 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12878 ms - Host latency: 2.15368 ms (enqueue 2.12668 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1207 ms - Host latency: 2.14625 ms (enqueue 2.11862 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12682 ms - Host latency: 2.15305 ms (enqueue 2.12479 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11893 ms - Host latency: 2.1444 ms (enqueue 2.1171 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11796 ms - Host 
latency: 2.14232 ms (enqueue 2.11575 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11998 ms - Host latency: 2.14492 ms (enqueue 2.11781 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11877 ms - Host latency: 2.14423 ms (enqueue 2.11671 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13579 ms - Host latency: 2.16085 ms (enqueue 2.13369 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12563 ms - Host latency: 2.15126 ms (enqueue 2.12352 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15004 ms - Host latency: 2.17537 ms (enqueue 2.14795 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12207 ms - Host latency: 2.14727 ms (enqueue 2.11952 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1243 ms - Host latency: 2.14938 ms (enqueue 2.12239 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11882 ms - Host latency: 2.14395 ms (enqueue 2.11688 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12448 ms - Host latency: 2.14967 ms (enqueue 2.1224 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13065 ms - Host latency: 2.15562 ms (enqueue 2.12866 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12024 ms - Host latency: 2.14487 ms (enqueue 2.11813 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12733 ms - Host latency: 2.15367 ms (enqueue 2.12513 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12642 ms - Host latency: 2.1522 ms (enqueue 2.12434 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12942 ms - Host latency: 2.15387 ms (enqueue 2.12745 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12333 ms - Host latency: 2.14806 ms (enqueue 2.12141 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12017 ms - Host latency: 2.14554 ms (enqueue 2.11735 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11353 ms - Host latency: 2.13795 ms (enqueue 2.11158 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1267 ms - Host latency: 2.15125 ms (enqueue 2.12457 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13115 ms - Host latency: 2.15604 ms (enqueue 2.12915 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12421 ms - Host latency: 2.14938 ms (enqueue 2.12223 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1181 ms - Host latency: 2.14348 ms (enqueue 2.11595 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.10984 ms - Host latency: 2.13479 ms (enqueue 2.10765 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11971 ms - Host latency: 2.14521 ms (enqueue 2.1177 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11877 ms - Host latency: 2.14482 ms (enqueue 2.11647 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11548 ms - Host latency: 2.14062 ms (enqueue 2.1134 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11283 ms - Host latency: 2.13875 ms (enqueue 2.11079 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11995 ms - Host latency: 2.1453 ms (enqueue 2.11763 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12115 ms - Host latency: 2.14644 ms (enqueue 2.11919 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11285 ms - Host latency: 2.13779 ms (enqueue 2.11073 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.10731 ms 
- Host latency: 2.1324 ms (enqueue 2.10519 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11097 ms - Host latency: 2.13715 ms (enqueue 2.10883 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11067 ms - Host latency: 2.13617 ms (enqueue 2.10862 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12876 ms - Host latency: 2.15474 ms (enqueue 2.12664 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12552 ms - Host latency: 2.15145 ms (enqueue 2.12354 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13806 ms - Host latency: 2.16545 ms (enqueue 2.13585 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12737 ms - Host latency: 2.15391 ms (enqueue 2.1255 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12969 ms - Host latency: 2.15553 ms (enqueue 2.1275 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12889 ms - Host latency: 2.15465 ms (enqueue 2.12675 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12046 ms - Host latency: 2.1459 ms (enqueue 2.11849 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13052 ms - Host latency: 2.1558 ms (enqueue 2.12837 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12131 ms - Host latency: 2.14712 ms (enqueue 2.11927 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14629 ms - Host latency: 2.17253 ms (enqueue 2.14429 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12396 ms - Host latency: 2.14949 ms (enqueue 2.12174 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13228 ms - Host latency: 2.15742 ms (enqueue 2.13005 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12117 ms - Host latency: 2.14575 ms (enqueue 2.11899 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12109 ms - Host latency: 2.14553 ms (enqueue 2.11924 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14695 ms - Host latency: 2.17151 ms (enqueue 2.14441 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12253 ms - Host latency: 2.14705 ms (enqueue 2.12014 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13162 ms - Host latency: 2.15642 ms (enqueue 2.1293 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11834 ms - Host latency: 2.14431 ms (enqueue 2.11646 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13293 ms - Host latency: 2.15806 ms (enqueue 2.13084 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1231 ms - Host latency: 2.14919 ms (enqueue 2.12114 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12075 ms - Host latency: 2.14526 ms (enqueue 2.11877 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12458 ms - Host latency: 2.15134 ms (enqueue 2.12236 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12959 ms - Host latency: 2.15503 ms (enqueue 2.12764 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14263 ms - Host latency: 2.16736 ms (enqueue 2.14065 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12393 ms - Host latency: 2.14856 ms (enqueue 2.12197 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14741 ms - Host latency: 2.17231 ms (enqueue 2.14556 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12173 ms - Host latency: 2.14797 ms (enqueue 2.1197 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 
2.13237 ms - Host latency: 2.15781 ms (enqueue 2.13013 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12041 ms - Host latency: 2.14756 ms (enqueue 2.11819 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11731 ms - Host latency: 2.14277 ms (enqueue 2.11516 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16917 ms - Host latency: 2.19475 ms (enqueue 2.16658 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16716 ms - Host latency: 2.19377 ms (enqueue 2.16438 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16995 ms - Host latency: 2.19619 ms (enqueue 2.16775 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15891 ms - Host latency: 2.18523 ms (enqueue 2.15645 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16753 ms - Host latency: 2.19355 ms (enqueue 2.16533 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14878 ms - Host latency: 2.17466 ms (enqueue 2.14648 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15544 ms - Host latency: 2.18259 ms (enqueue 2.15305 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12834 ms - Host latency: 2.15544 ms (enqueue 2.12646 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11301 ms - Host latency: 2.13809 ms (enqueue 2.11104 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11716 ms - Host latency: 2.14331 ms (enqueue 2.11514 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11753 ms - Host latency: 2.1428 ms (enqueue 2.11567 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12375 ms - Host latency: 2.14907 ms (enqueue 2.12163 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11836 ms - Host latency: 2.14343 ms (enqueue 2.11646 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11873 ms - Host latency: 2.14409 ms (enqueue 2.11667 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12786 ms - Host latency: 2.1532 ms (enqueue 2.12571 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11641 ms - Host latency: 2.14146 ms (enqueue 2.11462 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12397 ms - Host latency: 2.14973 ms (enqueue 2.12192 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12588 ms - Host latency: 2.15156 ms (enqueue 2.12383 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1283 ms - Host latency: 2.15356 ms (enqueue 2.12629 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11834 ms - Host latency: 2.14353 ms (enqueue 2.11626 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12664 ms - Host latency: 2.15212 ms (enqueue 2.12483 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11736 ms - Host latency: 2.14453 ms (enqueue 2.11548 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11875 ms - Host latency: 2.14443 ms (enqueue 2.11685 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12131 ms - Host latency: 2.1469 ms (enqueue 2.11912 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15535 ms - Host latency: 2.18101 ms (enqueue 2.15315 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.18926 ms - Host latency: 2.2145 ms (enqueue 2.18716 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.21328 ms - Host latency: 2.24062 ms (enqueue 2.21079 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - 
GPU latency: 2.13293 ms - Host latency: 2.15908 ms (enqueue 2.13057 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.35378 ms - Host latency: 2.38013 ms (enqueue 2.35144 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13975 ms - Host latency: 2.1634 ms (enqueue 2.13787 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12556 ms - Host latency: 2.151 ms (enqueue 2.12375 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1208 ms - Host latency: 2.14438 ms (enqueue 2.1186 ms) [01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12668 ms - Host latency: 2.15051 ms (enqueue 2.12461 ms) [01/18/2025-11:52:09] [I] [01/18/2025-11:52:09] [I] === Performance summary === [01/18/2025-11:52:09] [I] Throughput: 460.532 qps [01/18/2025-11:52:09] [I] Latency: min = 2.1167 ms, max = 4.2041 ms, mean = 2.1545 ms, median = 2.14783 ms, percentile(90%) = 2.18115 ms, percentile(95%) = 2.19379 ms, percentile(99%) = 2.26636 ms [01/18/2025-11:52:09] [I] Enqueue Time: min = 2.08948 ms, max = 4.15747 ms, mean = 2.12695 ms, median = 2.12012 ms, percentile(90%) = 2.15259 ms, percentile(95%) = 2.16455 ms, percentile(99%) = 2.23877 ms [01/18/2025-11:52:09] [I] H2D Latency: min = 0.0166016 ms, max = 0.0350037 ms, mean = 0.0180589 ms, median = 0.0178223 ms, percentile(90%) = 0.0184631 ms, percentile(95%) = 0.0187988 ms, percentile(99%) = 0.0253906 ms [01/18/2025-11:52:09] [I] GPU Compute Time: min = 2.09106 ms, max = 4.16382 ms, mean = 2.12906 ms, median = 2.1223 ms, percentile(90%) = 2.15552 ms, percentile(95%) = 2.16772 ms, percentile(99%) = 2.24048 ms [01/18/2025-11:52:09] [I] D2H Latency: min = 0.00537109 ms, max = 0.0267639 ms, mean = 0.00738133 ms, median = 0.00723267 ms, percentile(90%) = 0.00796509 ms, percentile(95%) = 0.00828552 ms, percentile(99%) = 0.0129395 ms [01/18/2025-11:52:09] [I] Total Host Walltime: 3.00522 s [01/18/2025-11:52:09] [I] Total GPU Compute Time: 2.94662 s [01/18/2025-11:52:09] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized. [01/18/2025-11:52:09] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput. [01/18/2025-11:52:09] [W] * GPU compute time is unstable, with coefficient of variance = 2.8653%. [01/18/2025-11:52:09] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability. [01/18/2025-11:52:09] [I] Explanations of the performance metrics are printed in the verbose logs. [01/18/2025-11:52:09] [I] &&&& PASSED TensorRT.trtexec [TensorRT v100001] # trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan
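(As a sanity check, reloading the saved plan with trtexec should reprint the input bindings it was serialized with; based on the warnings in the build log I expect it to report the same fixed 2x80x1 / 2x1x1 shapes:)

# Just deserializes the existing plan and runs a timing pass; no rebuild happens here.
trtexec --loadEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan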
Then I encountered the following errors while using the rebuilt plan for inference.
failed to import ttsfrd, use WeTextProcessing instead 2025-01-18 11:52:31,129 INFO load cosyvoice2: load_jit True, load_trt True, fp16 False /home/ubuntu/miniconda3/lib/python3.12/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`. deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message) 2025-01-18 11:52:34,623 INFO input frame rate=25 /home/ubuntu/miniconda3/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm. warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.") /data/codes/CosyVoice/runtime/python/grpc/../../../cosyvoice/dataset/processor.py:24: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call. torchaudio.set_audio_backend('soundfile') /home/ubuntu/miniconda3/lib/python3.12/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:115: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider' warnings.warn( 2025-01-18 11:52:37,007 WETEXT INFO building fst for zh_normalizer ... 2025-01-18 11:52:37,007 INFO building fst for zh_normalizer ... 2025-01-18 11:53:06,924 WETEXT INFO done 2025-01-18 11:53:06,924 INFO done 2025-01-18 11:53:06,924 WETEXT INFO fst path: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_tagger.fst 2025-01-18 11:53:06,924 INFO fst path: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_tagger.fst 2025-01-18 11:53:06,924 WETEXT INFO /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst 2025-01-18 11:53:06,924 INFO /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst 2025-01-18 11:53:06,929 WETEXT INFO found existing fst: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_tagger.fst 2025-01-18 11:53:06,929 INFO found existing fst: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_tagger.fst 2025-01-18 11:53:06,929 WETEXT INFO /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst 2025-01-18 11:53:06,929 INFO /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst 2025-01-18 11:53:06,929 WETEXT INFO skip building fst for en_normalizer ... 2025-01-18 11:53:06,929 INFO skip building fst for en_normalizer ... 
2025-01-18 11:53:08,026 INFO reset cosyvoice2: load_jit True, load_trt True, fp16 False 2025-01-18 11:53:10,334 INFO load jit model 2025-01-18 11:53:10,613 INFO load trt model [01/18/2025-11:53:10] [TRT] [I] Loaded engine size: 286 MiB [01/18/2025-11:53:11] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 0, GPU 293 (MiB) 2025-01-18 11:53:11,214 INFO grpc service initialized 2025-01-18 11:53:11,216 INFO server listening on 0.0.0.0:50000 2025-01-18 11:53:15,749 INFO get instruct inference request 2025-01-18 11:53:15,749 INFO inference instruct 1.4543533325195312e-05 2025-01-18 11:53:15,749 INFO send inference response 0%| | 0/1 [00:00<?, ?it/s]2025-01-18 11:53:15,764 INFO instruct_text 你好,我是通义实验室语音合成大模型。 2025-01-18 11:53:15,771 INFO instruct_text 你能用粤语的口音说吗?<|endofprompt|> 2025-01-18 11:53:15,772 INFO synthesis text 你好,我是通义实验室语音合成大模型。 We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) 
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) 
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) 
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.) [01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
I have no idea why this error occurred. Could someone kindly assist me? Thank you!
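One thing I do notice in the build log is the warning "Dynamic dimensions required for input: x, but no shapes were provided. Automatically overriding shape to: 2x80x1". In other words, because I passed no shape ranges, trtexec froze every dynamic time dimension to length 1, so the engine only accepts that exact shape at runtime. Is the fix simply to rebuild with explicit optimization profiles, roughly like the sketch below? The min/opt/max lengths are placeholders I made up, not values from the CosyVoice export scripts, so they would need to be replaced with whatever ranges the official conversion uses.

# Sketch only: keep the time axis of x / mask / mu / cond dynamic by providing min/opt/max shapes.
# The lengths 4, 200 and 3000 are my own guesses, not values taken from the CosyVoice repo.
trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan \
    --minShapes=x:2x80x4,mask:2x1x4,mu:2x80x4,cond:2x80x4 \
    --optShapes=x:2x80x200,mask:2x1x200,mu:2x80x200,cond:2x80x200 \
    --maxShapes=x:2x80x3000,mask:2x1x3000,mu:2x80x3000,cond:2x80x3000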
First, verify the instruct_text usage against the example in the README.