Error encountered while using TensorRT for inference #901

Open
oshindow opened this issue Jan 18, 2025 · 1 comment

Comments

@oshindow

Hi, I am trying to rebuild the TRT plan on a 4090 GPU. The command I am using is:

trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan

...and the output is:

&&&& RUNNING TensorRT.trtexec [TensorRT v100001] # trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan
[01/18/2025-11:51:12] [I] === Model Options ===
[01/18/2025-11:51:12] [I] Format: ONNX
[01/18/2025-11:51:12] [I] Model: pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx
[01/18/2025-11:51:12] [I] Output:
[01/18/2025-11:51:12] [I] === Build Options ===
[01/18/2025-11:51:12] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/18/2025-11:51:12] [I] avgTiming: 8
[01/18/2025-11:51:12] [I] Precision: FP32
[01/18/2025-11:51:12] [I] LayerPrecisions: 
[01/18/2025-11:51:12] [I] Layer Device Types: 
[01/18/2025-11:51:12] [I] Calibration: 
[01/18/2025-11:51:12] [I] Refit: Disabled
[01/18/2025-11:51:12] [I] Strip weights: Disabled
[01/18/2025-11:51:12] [I] Version Compatible: Disabled
[01/18/2025-11:51:12] [I] ONNX Plugin InstanceNorm: Disabled
[01/18/2025-11:51:12] [I] TensorRT runtime: full
[01/18/2025-11:51:12] [I] Lean DLL Path: 
[01/18/2025-11:51:12] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/18/2025-11:51:12] [I] Exclude Lean Runtime: Disabled
[01/18/2025-11:51:12] [I] Sparsity: Disabled
[01/18/2025-11:51:12] [I] Safe mode: Disabled
[01/18/2025-11:51:12] [I] Build DLA standalone loadable: Disabled
[01/18/2025-11:51:12] [I] Allow GPU fallback for DLA: Disabled
[01/18/2025-11:51:12] [I] DirectIO mode: Disabled
[01/18/2025-11:51:12] [I] Restricted mode: Disabled
[01/18/2025-11:51:12] [I] Skip inference: Disabled
[01/18/2025-11:51:12] [I] Save engine: pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan
[01/18/2025-11:51:12] [I] Load engine: 
[01/18/2025-11:51:12] [I] Profiling verbosity: 0
[01/18/2025-11:51:12] [I] Tactic sources: Using default tactic sources
[01/18/2025-11:51:12] [I] timingCacheMode: local
[01/18/2025-11:51:12] [I] timingCacheFile: 
[01/18/2025-11:51:12] [I] Enable Compilation Cache: Enabled
[01/18/2025-11:51:12] [I] errorOnTimingCacheMiss: Disabled
[01/18/2025-11:51:12] [I] Preview Features: Use default preview flags.
[01/18/2025-11:51:12] [I] MaxAuxStreams: -1
[01/18/2025-11:51:12] [I] BuilderOptimizationLevel: -1
[01/18/2025-11:51:12] [I] Calibration Profile Index: 0
[01/18/2025-11:51:12] [I] Weight Streaming: Disabled
[01/18/2025-11:51:12] [I] Debug Tensors: 
[01/18/2025-11:51:12] [I] Input(s)s format: fp32:CHW
[01/18/2025-11:51:12] [I] Output(s)s format: fp32:CHW
[01/18/2025-11:51:12] [I] Input build shapes: model
[01/18/2025-11:51:12] [I] Input calibration shapes: model
[01/18/2025-11:51:12] [I] === System Options ===
[01/18/2025-11:51:12] [I] Device: 0
[01/18/2025-11:51:12] [I] DLACore: 
[01/18/2025-11:51:12] [I] Plugins:
[01/18/2025-11:51:12] [I] setPluginsToSerialize:
[01/18/2025-11:51:12] [I] dynamicPlugins:
[01/18/2025-11:51:12] [I] ignoreParsedPluginLibs: 0
[01/18/2025-11:51:12] [I] 
[01/18/2025-11:51:12] [I] === Inference Options ===
[01/18/2025-11:51:12] [I] Batch: Explicit
[01/18/2025-11:51:12] [I] Input inference shapes: model
[01/18/2025-11:51:12] [I] Iterations: 10
[01/18/2025-11:51:12] [I] Duration: 3s (+ 200ms warm up)
[01/18/2025-11:51:12] [I] Sleep time: 0ms
[01/18/2025-11:51:12] [I] Idle time: 0ms
[01/18/2025-11:51:12] [I] Inference Streams: 1
[01/18/2025-11:51:12] [I] ExposeDMA: Disabled
[01/18/2025-11:51:12] [I] Data transfers: Enabled
[01/18/2025-11:51:12] [I] Spin-wait: Disabled
[01/18/2025-11:51:12] [I] Multithreading: Disabled
[01/18/2025-11:51:12] [I] CUDA Graph: Disabled
[01/18/2025-11:51:12] [I] Separate profiling: Disabled
[01/18/2025-11:51:12] [I] Time Deserialize: Disabled
[01/18/2025-11:51:12] [I] Time Refit: Disabled
[01/18/2025-11:51:12] [I] NVTX verbosity: 0
[01/18/2025-11:51:12] [I] Persistent Cache Ratio: 0
[01/18/2025-11:51:12] [I] Optimization Profile Index: 0
[01/18/2025-11:51:12] [I] Weight Streaming Budget: Disabled
[01/18/2025-11:51:12] [I] Inputs:
[01/18/2025-11:51:12] [I] Debug Tensor Save Destinations:
[01/18/2025-11:51:12] [I] === Reporting Options ===
[01/18/2025-11:51:12] [I] Verbose: Disabled
[01/18/2025-11:51:12] [I] Averages: 10 inferences
[01/18/2025-11:51:12] [I] Percentiles: 90,95,99
[01/18/2025-11:51:12] [I] Dump refittable layers:Disabled
[01/18/2025-11:51:12] [I] Dump output: Disabled
[01/18/2025-11:51:12] [I] Profile: Disabled
[01/18/2025-11:51:12] [I] Export timing to JSON file: 
[01/18/2025-11:51:12] [I] Export output to JSON file: 
[01/18/2025-11:51:12] [I] Export profile to JSON file: 
[01/18/2025-11:51:12] [I] 
[01/18/2025-11:51:12] [I] === Device Information ===
[01/18/2025-11:51:12] [I] Available Devices: 
[01/18/2025-11:51:12] [I]   Device 0: "NVIDIA GeForce RTX 4090" UUID: GPU-2ef010d0-11a4-b12b-c05b-027a9f6ec187
[01/18/2025-11:51:12] [I]   Device 1: "NVIDIA GeForce RTX 4090" UUID: GPU-d3f842b8-770d-ee14-8817-4764e3f405dc
[01/18/2025-11:51:12] [I]   Device 2: "NVIDIA GeForce RTX 4090" UUID: GPU-4c447716-f03b-bd64-77dc-d42daecf1418
[01/18/2025-11:51:12] [I]   Device 3: "NVIDIA GeForce RTX 4090" UUID: GPU-0111000f-de93-24c9-4d9b-0e0b917c79d6
[01/18/2025-11:51:12] [I] Selected Device: NVIDIA GeForce RTX 4090
[01/18/2025-11:51:12] [I] Selected Device ID: 0
[01/18/2025-11:51:12] [I] Selected Device UUID: GPU-2ef010d0-11a4-b12b-c05b-027a9f6ec187
[01/18/2025-11:51:12] [I] Compute Capability: 8.9
[01/18/2025-11:51:12] [I] SMs: 128
[01/18/2025-11:51:12] [I] Device Global Memory: 24217 MiB
[01/18/2025-11:51:12] [I] Shared Memory per SM: 100 KiB
[01/18/2025-11:51:12] [I] Memory Bus Width: 384 bits (ECC disabled)
[01/18/2025-11:51:12] [I] Application Compute Clock Rate: 2.52 GHz
[01/18/2025-11:51:12] [I] Application Memory Clock Rate: 10.501 GHz
[01/18/2025-11:51:12] [I] 
[01/18/2025-11:51:12] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/18/2025-11:51:12] [I] 
[01/18/2025-11:51:12] [I] TensorRT version: 10.0.1
[01/18/2025-11:51:12] [I] Loading standard plugins
[01/18/2025-11:51:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 16, GPU 12850 (MiB)
[01/18/2025-11:51:15] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1774, GPU +312, now: CPU 1926, GPU 13162 (MiB)
[01/18/2025-11:51:15] [I] Start parsing network model.
[01/18/2025-11:51:15] [I] [TRT] ----------------------------------------------------------------
[01/18/2025-11:51:15] [I] [TRT] Input filename:   pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx
[01/18/2025-11:51:15] [I] [TRT] ONNX IR version:  0.0.8
[01/18/2025-11:51:15] [I] [TRT] Opset version:    17
[01/18/2025-11:51:15] [I] [TRT] Producer name:    pytorch
[01/18/2025-11:51:15] [I] [TRT] Producer version: 2.3.1
[01/18/2025-11:51:15] [I] [TRT] Domain:           
[01/18/2025-11:51:15] [I] [TRT] Model version:    0
[01/18/2025-11:51:15] [I] [TRT] Doc string:       
[01/18/2025-11:51:15] [I] [TRT] ----------------------------------------------------------------
[01/18/2025-11:51:16] [I] Finished parsing network model. Parse time: 0.820438
[01/18/2025-11:51:16] [W] Dynamic dimensions required for input: x, but no shapes were provided. Automatically overriding shape to: 2x80x1
[01/18/2025-11:51:16] [I] Set shape of input tensor x for optimization profile 0 to: MIN=2x80x1 OPT=2x80x1 MAX=2x80x1
[01/18/2025-11:51:16] [W] Dynamic dimensions required for input: mask, but no shapes were provided. Automatically overriding shape to: 2x1x1
[01/18/2025-11:51:16] [I] Set shape of input tensor mask for optimization profile 0 to: MIN=2x1x1 OPT=2x1x1 MAX=2x1x1
[01/18/2025-11:51:16] [W] Dynamic dimensions required for input: mu, but no shapes were provided. Automatically overriding shape to: 2x80x1
[01/18/2025-11:51:16] [I] Set shape of input tensor mu for optimization profile 0 to: MIN=2x80x1 OPT=2x80x1 MAX=2x80x1
[01/18/2025-11:51:16] [W] Dynamic dimensions required for input: cond, but no shapes were provided. Automatically overriding shape to: 2x80x1
[01/18/2025-11:51:16] [I] Set shape of input tensor cond for optimization profile 0 to: MIN=2x80x1 OPT=2x80x1 MAX=2x80x1
[01/18/2025-11:51:16] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/18/2025-11:52:02] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[01/18/2025-11:52:02] [I] [TRT] Detected 6 inputs and 1 output network tensors.
[01/18/2025-11:52:04] [I] [TRT] Total Host Persistent Memory: 417360
[01/18/2025-11:52:04] [I] [TRT] Total Device Persistent Memory: 3146240
[01/18/2025-11:52:04] [I] [TRT] Total Scratch Memory: 18874880
[01/18/2025-11:52:04] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 353 steps to complete.
[01/18/2025-11:52:04] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 15.7201ms to assign 11 blocks to 353 nodes requiring 18897408 bytes.
[01/18/2025-11:52:04] [I] [TRT] Total Activation Memory: 18896384
[01/18/2025-11:52:04] [I] [TRT] Total Weights Memory: 285212224
[01/18/2025-11:52:04] [I] [TRT] Engine generation completed in 47.3271 seconds.
[01/18/2025-11:52:04] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 272 MiB
[01/18/2025-11:52:04] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3059 MiB
[01/18/2025-11:52:04] [I] Engine built in 48.2243 sec.
[01/18/2025-11:52:04] [I] Created engine with size: 286.258 MiB
[01/18/2025-11:52:05] [I] [TRT] Loaded engine size: 286 MiB
[01/18/2025-11:52:05] [I] Engine deserialized in 0.291931 sec.
[01/18/2025-11:52:05] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 0, GPU 293 (MiB)
[01/18/2025-11:52:05] [I] Setting persistentCacheLimit to 0 bytes.
[01/18/2025-11:52:05] [I] Created execution context with device memory size: 18.021 MiB
[01/18/2025-11:52:05] [I] Using random values for input x
[01/18/2025-11:52:05] [I] Input binding for x with dimensions 2x80x1 is created.
[01/18/2025-11:52:05] [I] Using random values for input mask
[01/18/2025-11:52:05] [I] Input binding for mask with dimensions 2x1x1 is created.
[01/18/2025-11:52:05] [I] Using random values for input mu
[01/18/2025-11:52:05] [I] Input binding for mu with dimensions 2x80x1 is created.
[01/18/2025-11:52:05] [I] Using random values for input t
[01/18/2025-11:52:05] [I] Input binding for t with dimensions 2 is created.
[01/18/2025-11:52:05] [I] Using random values for input spks
[01/18/2025-11:52:05] [I] Input binding for spks with dimensions 2x80 is created.
[01/18/2025-11:52:05] [I] Using random values for input cond
[01/18/2025-11:52:05] [I] Input binding for cond with dimensions 2x80x1 is created.
[01/18/2025-11:52:05] [I] Output binding for dphi_dt with dimensions 2x80x1 is created.
[01/18/2025-11:52:05] [I] Starting inference
[01/18/2025-11:52:09] [I] Warmup completed 90 queries over 200 ms
[01/18/2025-11:52:09] [I] Timing trace has 1384 queries over 3.00522 s
[01/18/2025-11:52:09] [I] 
[01/18/2025-11:52:09] [I] === Trace details ===
[01/18/2025-11:52:09] [I] Trace averages of 10 runs:
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12661 ms - Host latency: 2.15258 ms (enqueue 2.12436 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12116 ms - Host latency: 2.14765 ms (enqueue 2.11872 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12306 ms - Host latency: 2.14832 ms (enqueue 2.11936 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12483 ms - Host latency: 2.15106 ms (enqueue 2.12301 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1176 ms - Host latency: 2.14309 ms (enqueue 2.11544 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12877 ms - Host latency: 2.15436 ms (enqueue 2.12661 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11874 ms - Host latency: 2.14407 ms (enqueue 2.11649 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1258 ms - Host latency: 2.15309 ms (enqueue 2.12383 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1255 ms - Host latency: 2.15047 ms (enqueue 2.12329 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11781 ms - Host latency: 2.14308 ms (enqueue 2.11576 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12307 ms - Host latency: 2.15019 ms (enqueue 2.12084 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12646 ms - Host latency: 2.15181 ms (enqueue 2.12444 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13363 ms - Host latency: 2.15887 ms (enqueue 2.13131 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12795 ms - Host latency: 2.15281 ms (enqueue 2.12585 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13575 ms - Host latency: 2.16146 ms (enqueue 2.13359 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12522 ms - Host latency: 2.15087 ms (enqueue 2.12328 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12084 ms - Host latency: 2.1466 ms (enqueue 2.11885 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12695 ms - Host latency: 2.15226 ms (enqueue 2.12504 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14701 ms - Host latency: 2.17227 ms (enqueue 2.14489 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13661 ms - Host latency: 2.16226 ms (enqueue 2.13466 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12032 ms - Host latency: 2.145 ms (enqueue 2.1185 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12571 ms - Host latency: 2.15048 ms (enqueue 2.1234 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12784 ms - Host latency: 2.15264 ms (enqueue 2.12567 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12178 ms - Host latency: 2.14659 ms (enqueue 2.11985 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12708 ms - Host latency: 2.15439 ms (enqueue 2.12512 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11148 ms - Host latency: 2.13685 ms (enqueue 2.10953 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11747 ms - Host latency: 2.14238 ms (enqueue 2.11548 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11729 ms - Host latency: 2.14282 ms (enqueue 2.1151 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12158 ms - Host latency: 2.14609 ms (enqueue 2.11965 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12279 ms - Host latency: 2.14864 ms (enqueue 2.12072 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11662 ms - Host latency: 2.14172 ms (enqueue 2.11476 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1163 ms - Host latency: 2.14063 ms (enqueue 2.11422 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11937 ms - Host latency: 2.14461 ms (enqueue 2.11734 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11581 ms - Host latency: 2.14034 ms (enqueue 2.11368 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12011 ms - Host latency: 2.1447 ms (enqueue 2.1184 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11886 ms - Host latency: 2.14375 ms (enqueue 2.11673 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12225 ms - Host latency: 2.14686 ms (enqueue 2.11992 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11774 ms - Host latency: 2.14262 ms (enqueue 2.11566 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12878 ms - Host latency: 2.15368 ms (enqueue 2.12668 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1207 ms - Host latency: 2.14625 ms (enqueue 2.11862 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12682 ms - Host latency: 2.15305 ms (enqueue 2.12479 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11893 ms - Host latency: 2.1444 ms (enqueue 2.1171 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11796 ms - Host latency: 2.14232 ms (enqueue 2.11575 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11998 ms - Host latency: 2.14492 ms (enqueue 2.11781 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11877 ms - Host latency: 2.14423 ms (enqueue 2.11671 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13579 ms - Host latency: 2.16085 ms (enqueue 2.13369 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12563 ms - Host latency: 2.15126 ms (enqueue 2.12352 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15004 ms - Host latency: 2.17537 ms (enqueue 2.14795 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12207 ms - Host latency: 2.14727 ms (enqueue 2.11952 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1243 ms - Host latency: 2.14938 ms (enqueue 2.12239 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11882 ms - Host latency: 2.14395 ms (enqueue 2.11688 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12448 ms - Host latency: 2.14967 ms (enqueue 2.1224 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13065 ms - Host latency: 2.15562 ms (enqueue 2.12866 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12024 ms - Host latency: 2.14487 ms (enqueue 2.11813 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12733 ms - Host latency: 2.15367 ms (enqueue 2.12513 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12642 ms - Host latency: 2.1522 ms (enqueue 2.12434 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12942 ms - Host latency: 2.15387 ms (enqueue 2.12745 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12333 ms - Host latency: 2.14806 ms (enqueue 2.12141 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12017 ms - Host latency: 2.14554 ms (enqueue 2.11735 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11353 ms - Host latency: 2.13795 ms (enqueue 2.11158 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1267 ms - Host latency: 2.15125 ms (enqueue 2.12457 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13115 ms - Host latency: 2.15604 ms (enqueue 2.12915 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12421 ms - Host latency: 2.14938 ms (enqueue 2.12223 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1181 ms - Host latency: 2.14348 ms (enqueue 2.11595 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.10984 ms - Host latency: 2.13479 ms (enqueue 2.10765 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11971 ms - Host latency: 2.14521 ms (enqueue 2.1177 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11877 ms - Host latency: 2.14482 ms (enqueue 2.11647 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11548 ms - Host latency: 2.14062 ms (enqueue 2.1134 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11283 ms - Host latency: 2.13875 ms (enqueue 2.11079 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11995 ms - Host latency: 2.1453 ms (enqueue 2.11763 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12115 ms - Host latency: 2.14644 ms (enqueue 2.11919 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11285 ms - Host latency: 2.13779 ms (enqueue 2.11073 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.10731 ms - Host latency: 2.1324 ms (enqueue 2.10519 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11097 ms - Host latency: 2.13715 ms (enqueue 2.10883 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11067 ms - Host latency: 2.13617 ms (enqueue 2.10862 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12876 ms - Host latency: 2.15474 ms (enqueue 2.12664 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12552 ms - Host latency: 2.15145 ms (enqueue 2.12354 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13806 ms - Host latency: 2.16545 ms (enqueue 2.13585 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12737 ms - Host latency: 2.15391 ms (enqueue 2.1255 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12969 ms - Host latency: 2.15553 ms (enqueue 2.1275 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12889 ms - Host latency: 2.15465 ms (enqueue 2.12675 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12046 ms - Host latency: 2.1459 ms (enqueue 2.11849 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13052 ms - Host latency: 2.1558 ms (enqueue 2.12837 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12131 ms - Host latency: 2.14712 ms (enqueue 2.11927 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14629 ms - Host latency: 2.17253 ms (enqueue 2.14429 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12396 ms - Host latency: 2.14949 ms (enqueue 2.12174 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13228 ms - Host latency: 2.15742 ms (enqueue 2.13005 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12117 ms - Host latency: 2.14575 ms (enqueue 2.11899 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12109 ms - Host latency: 2.14553 ms (enqueue 2.11924 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14695 ms - Host latency: 2.17151 ms (enqueue 2.14441 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12253 ms - Host latency: 2.14705 ms (enqueue 2.12014 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13162 ms - Host latency: 2.15642 ms (enqueue 2.1293 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11834 ms - Host latency: 2.14431 ms (enqueue 2.11646 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13293 ms - Host latency: 2.15806 ms (enqueue 2.13084 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1231 ms - Host latency: 2.14919 ms (enqueue 2.12114 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12075 ms - Host latency: 2.14526 ms (enqueue 2.11877 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12458 ms - Host latency: 2.15134 ms (enqueue 2.12236 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12959 ms - Host latency: 2.15503 ms (enqueue 2.12764 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14263 ms - Host latency: 2.16736 ms (enqueue 2.14065 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12393 ms - Host latency: 2.14856 ms (enqueue 2.12197 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14741 ms - Host latency: 2.17231 ms (enqueue 2.14556 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12173 ms - Host latency: 2.14797 ms (enqueue 2.1197 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13237 ms - Host latency: 2.15781 ms (enqueue 2.13013 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12041 ms - Host latency: 2.14756 ms (enqueue 2.11819 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11731 ms - Host latency: 2.14277 ms (enqueue 2.11516 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16917 ms - Host latency: 2.19475 ms (enqueue 2.16658 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16716 ms - Host latency: 2.19377 ms (enqueue 2.16438 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16995 ms - Host latency: 2.19619 ms (enqueue 2.16775 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15891 ms - Host latency: 2.18523 ms (enqueue 2.15645 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.16753 ms - Host latency: 2.19355 ms (enqueue 2.16533 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.14878 ms - Host latency: 2.17466 ms (enqueue 2.14648 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15544 ms - Host latency: 2.18259 ms (enqueue 2.15305 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12834 ms - Host latency: 2.15544 ms (enqueue 2.12646 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11301 ms - Host latency: 2.13809 ms (enqueue 2.11104 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11716 ms - Host latency: 2.14331 ms (enqueue 2.11514 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11753 ms - Host latency: 2.1428 ms (enqueue 2.11567 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12375 ms - Host latency: 2.14907 ms (enqueue 2.12163 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11836 ms - Host latency: 2.14343 ms (enqueue 2.11646 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11873 ms - Host latency: 2.14409 ms (enqueue 2.11667 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12786 ms - Host latency: 2.1532 ms (enqueue 2.12571 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11641 ms - Host latency: 2.14146 ms (enqueue 2.11462 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12397 ms - Host latency: 2.14973 ms (enqueue 2.12192 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12588 ms - Host latency: 2.15156 ms (enqueue 2.12383 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1283 ms - Host latency: 2.15356 ms (enqueue 2.12629 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11834 ms - Host latency: 2.14353 ms (enqueue 2.11626 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12664 ms - Host latency: 2.15212 ms (enqueue 2.12483 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11736 ms - Host latency: 2.14453 ms (enqueue 2.11548 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.11875 ms - Host latency: 2.14443 ms (enqueue 2.11685 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12131 ms - Host latency: 2.1469 ms (enqueue 2.11912 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.15535 ms - Host latency: 2.18101 ms (enqueue 2.15315 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.18926 ms - Host latency: 2.2145 ms (enqueue 2.18716 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.21328 ms - Host latency: 2.24062 ms (enqueue 2.21079 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13293 ms - Host latency: 2.15908 ms (enqueue 2.13057 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.35378 ms - Host latency: 2.38013 ms (enqueue 2.35144 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.13975 ms - Host latency: 2.1634 ms (enqueue 2.13787 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12556 ms - Host latency: 2.151 ms (enqueue 2.12375 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.1208 ms - Host latency: 2.14438 ms (enqueue 2.1186 ms)
[01/18/2025-11:52:09] [I] Average on 10 runs - GPU latency: 2.12668 ms - Host latency: 2.15051 ms (enqueue 2.12461 ms)
[01/18/2025-11:52:09] [I] 
[01/18/2025-11:52:09] [I] === Performance summary ===
[01/18/2025-11:52:09] [I] Throughput: 460.532 qps
[01/18/2025-11:52:09] [I] Latency: min = 2.1167 ms, max = 4.2041 ms, mean = 2.1545 ms, median = 2.14783 ms, percentile(90%) = 2.18115 ms, percentile(95%) = 2.19379 ms, percentile(99%) = 2.26636 ms
[01/18/2025-11:52:09] [I] Enqueue Time: min = 2.08948 ms, max = 4.15747 ms, mean = 2.12695 ms, median = 2.12012 ms, percentile(90%) = 2.15259 ms, percentile(95%) = 2.16455 ms, percentile(99%) = 2.23877 ms
[01/18/2025-11:52:09] [I] H2D Latency: min = 0.0166016 ms, max = 0.0350037 ms, mean = 0.0180589 ms, median = 0.0178223 ms, percentile(90%) = 0.0184631 ms, percentile(95%) = 0.0187988 ms, percentile(99%) = 0.0253906 ms
[01/18/2025-11:52:09] [I] GPU Compute Time: min = 2.09106 ms, max = 4.16382 ms, mean = 2.12906 ms, median = 2.1223 ms, percentile(90%) = 2.15552 ms, percentile(95%) = 2.16772 ms, percentile(99%) = 2.24048 ms
[01/18/2025-11:52:09] [I] D2H Latency: min = 0.00537109 ms, max = 0.0267639 ms, mean = 0.00738133 ms, median = 0.00723267 ms, percentile(90%) = 0.00796509 ms, percentile(95%) = 0.00828552 ms, percentile(99%) = 0.0129395 ms
[01/18/2025-11:52:09] [I] Total Host Walltime: 3.00522 s
[01/18/2025-11:52:09] [I] Total GPU Compute Time: 2.94662 s
[01/18/2025-11:52:09] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[01/18/2025-11:52:09] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[01/18/2025-11:52:09] [W] * GPU compute time is unstable, with coefficient of variance = 2.8653%.
[01/18/2025-11:52:09] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[01/18/2025-11:52:09] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/18/2025-11:52:09] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v100001] # trtexec --onnx=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx --saveEngine=pretrained_models/CosyVoice2-0.5B/flow.decoder.estimator.fp32.4090.plan
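
A note on the four [W] lines in the log above: because no shapes were passed, trtexec fixed x, mask, mu and cond to a length of 1 (MIN=OPT=MAX), so the saved engine would only accept inputs of exactly that shape. This may be related to the problem at inference time. trtexec accepts explicit ranges through --minShapes, --optShapes and --maxShapes. The sketch below pulls the affected inputs out of such a log and composes those flags. It assumes that only the last dimension of each input is dynamic, and the lengths 4 / 200 / 3000 are placeholder values chosen for illustration, not values taken from this project; the helper names are hypothetical as well.

```python
import re

# Matches the trtexec warning emitted for each dynamic input that had no shape given.
WARNING_RE = re.compile(
    r"Dynamic dimensions required for input: (\w+), but no shapes were provided\. "
    r"Automatically overriding shape to: ([\dx]+)"
)


def dynamic_inputs(log_text):
    """Return {input_name: [dims]} for every input trtexec had to override."""
    return {
        name: [int(d) for d in shape.split("x")]
        for name, shape in WARNING_RE.findall(log_text)
    }


def shape_flag(flag, inputs, last_dim):
    """Build one trtexec shape flag, replacing only the last dimension.

    ASSUMPTION: the last dimension is the only dynamic one.
    """
    specs = []
    for name, dims in inputs.items():
        new_dims = dims[:-1] + [last_dim]
        specs.append(name + ":" + "x".join(str(d) for d in new_dims))
    return "--" + flag + "=" + ",".join(specs)


# The four warning lines from the build log, with the timestamps dropped.
LOG = """\
[W] Dynamic dimensions required for input: x, but no shapes were provided. Automatically overriding shape to: 2x80x1
[W] Dynamic dimensions required for input: mask, but no shapes were provided. Automatically overriding shape to: 2x1x1
[W] Dynamic dimensions required for input: mu, but no shapes were provided. Automatically overriding shape to: 2x80x1
[W] Dynamic dimensions required for input: cond, but no shapes were provided. Automatically overriding shape to: 2x80x1
"""

inputs = dynamic_inputs(LOG)

# ASSUMPTION: placeholder lengths; pick values that cover the real inputs.
flags = [
    shape_flag("minShapes", inputs, 4),
    shape_flag("optShapes", inputs, 200),
    shape_flag("maxShapes", inputs, 3000),
]
print(" \\\n    ".join(flags))
```

The printed flags would be appended to the trtexec command shown earlier. The inputs t and spks did not trigger the warning, so they are left out.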

Then I encountered the following errors while using the rebuilt plan for inference:

failed to import ttsfrd, use WeTextProcessing instead
2025-01-18 11:52:31,129 INFO load cosyvoice2: load_jit True, load_trt True, fp16 False
/home/ubuntu/miniconda3/lib/python3.12/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-01-18 11:52:34,623 INFO input frame rate=25
/home/ubuntu/miniconda3/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/data/codes/CosyVoice/runtime/python/grpc/../../../cosyvoice/dataset/processor.py:24: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend('soundfile')
/home/ubuntu/miniconda3/lib/python3.12/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:115: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(
2025-01-18 11:52:37,007 WETEXT INFO building fst for zh_normalizer ...
2025-01-18 11:52:37,007 INFO building fst for zh_normalizer ...
2025-01-18 11:53:06,924 WETEXT INFO done
2025-01-18 11:53:06,924 INFO done
2025-01-18 11:53:06,924 WETEXT INFO fst path: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_tagger.fst
2025-01-18 11:53:06,924 INFO fst path: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_tagger.fst
2025-01-18 11:53:06,924 WETEXT INFO           /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst
2025-01-18 11:53:06,924 INFO           /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst
2025-01-18 11:53:06,929 WETEXT INFO found existing fst: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_tagger.fst
2025-01-18 11:53:06,929 INFO found existing fst: /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_tagger.fst
2025-01-18 11:53:06,929 WETEXT INFO                     /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst
2025-01-18 11:53:06,929 INFO                     /home/ubuntu/miniconda3/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst
2025-01-18 11:53:06,929 WETEXT INFO skip building fst for en_normalizer ...
2025-01-18 11:53:06,929 INFO skip building fst for en_normalizer ...
2025-01-18 11:53:08,026 INFO reset cosyvoice2: load_jit True, load_trt True, fp16 False
2025-01-18 11:53:10,334 INFO load jit model
2025-01-18 11:53:10,613 INFO load trt model
[01/18/2025-11:53:10] [TRT] [I] Loaded engine size: 286 MiB
[01/18/2025-11:53:11] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 0, GPU 293 (MiB)
2025-01-18 11:53:11,214 INFO grpc service initialized
2025-01-18 11:53:11,216 INFO server listening on 0.0.0.0:50000
2025-01-18 11:53:15,749 INFO get instruct inference request
2025-01-18 11:53:15,749 INFO inference instruct 1.4543533325195312e-05
2025-01-18 11:53:15,749 INFO send inference response
  0%|          | 0/1 [00:00<?, ?it/s]
2025-01-18 11:53:15,764 INFO instruct_text 你好,我是通义实验室语音合成大模型。
2025-01-18 11:53:15,771 INFO instruct_text 你能用粤语的口音说吗?<|endofprompt|>
2025-01-18 11:53:15,772 INFO synthesis text 你好,我是通义实验室语音合成大模型。
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:18] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[01/18/2025-11:53:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)

I have no idea why this error occurred. Could someone kindly assist me? Thank you!

@aluminumbox

First, verify that the instruct_text example code from our README runs correctly.
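
For context on the error itself: the repeated `setInputShape` message in the log above states its own condition, `engineDims.d[i] == dims.d[i]`. As a rough illustration (not CosyVoice's or TensorRT's actual code), the sketch below mimics that check in plain Python. The shapes and the helper name `static_mismatches` are hypothetical; the only TensorRT convention assumed is that a dynamic dimension is reported as -1. One possible cause worth ruling out, offered as an assumption rather than a diagnosis: the `trtexec` command above passes no `--minShapes`/`--optShapes`/`--maxShapes`, so any dynamic axes in the ONNX model may have been frozen to fixed sizes at build time.

```python
from typing import List, Sequence, Tuple

DYNAMIC = -1  # TensorRT reports a dynamic dimension as -1


def static_mismatches(
    engine_dims: Sequence[int], requested: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (axis, engine_value, requested_value) for every static axis that differs."""
    if len(engine_dims) != len(requested):
        raise ValueError("rank mismatch between engine and requested shape")
    return [
        (axis, e, r)
        for axis, (e, r) in enumerate(zip(engine_dims, requested))
        if e != DYNAMIC and e != r  # dynamic axes accept any size within the profile
    ]


# Hypothetical shapes, chosen only to illustrate the two cases.
fixed_engine = (1, 80, 1)      # every axis frozen at build time
dynamic_engine = (2, 80, -1)   # last axis left dynamic
request = (2, 80, 200)         # shape the caller tries to set at inference time

print(static_mismatches(fixed_engine, request))    # [(0, 1, 2), (2, 1, 200)]
print(static_mismatches(dynamic_engine, request))  # []
```

A non-empty result corresponds to the situation the error describes: the caller asked for a size on an axis that the engine treats as fixed. Comparing the engine's reported input shapes against the shapes passed at inference time should show which axis is involved.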
