diff --git "a/context_encoding_model/_tp0_bk4/log-neuron-cc.txt" "b/context_encoding_model/_tp0_bk4/log-neuron-cc.txt" new file mode 100644--- /dev/null +++ "b/context_encoding_model/_tp0_bk4/log-neuron-cc.txt" @@ -0,0 +1,9559 @@ +2025-11-04T21:38:33Z INFO 8685 [root]: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/bin/neuronx-cc compile --framework=XLA /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.hlo_module.pb --output /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.neff --target=trn2 --auto-cast=none --model-type=transformer '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma' --lnc=2 -O1 '--internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true' --logfile=/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/log-neuron-cc.txt --verbose=35 +2025-11-04T21:38:33Z INFO 8685 [root]: NeuronX Compiler version 2.21.33363.0+82129205 Python version 3.10.12 HWM version 2.21.0.33363+82129205 NumPy version 1.26.4 Running on AMI ami-00632e4ca97ea8199 Running in region usw2-az2 +2025-11-04T21:38:33Z INFO 8698 [root]: XLA detected +2025-11-04T21:38:33Z INFO 8698 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper +2025-11-04T21:38:34Z INFO 8698 [root]: Intermediate files stored in /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e, output in /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4 +2025-11-04T21:38:34Z INFO 8698 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1 +2025-11-04T21:38:34Z INFO 8698 [pipeline.Pipeline.0]: Processing input #0 +2025-11-04T21:38:34Z INFO 8698 [pipeline.Pipeline.0]: Running pipeline Pipeline.0 +2025-11-04T21:38:34Z INFO 8698 [pipeline.Pipeline.0]: Starting job 
job.HLOToTensorizer.0 +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1 +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: Processing input #0 +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.hlo_module.pb --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --verbose=error --logfile=/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/log-neuron-cc.txt --logfile-verbose=info --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --modular-flow-mac-threshold=10 --verify-hlo=true --native-to-custom-softmax --partitioner-opts='--transformer' +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: +Pre-Partition Pre-Opt Histogram: +total HLO instructions: 8312 + reshape 1912 23.00% ################################################################ + broadcast 1123 13.51% ##################################### + transpose 1072 12.90% ################################### + convert 945 11.37% ############################### + constant 636 7.65% ##################### + parameter 371 4.46% ############ + slice 347 4.17% ########### + add 284 3.42% ######### + get-tuple-element 259 3.12% ######## + multiply 255 3.07% ######## + dot 198 2.38% ###### + call 174 2.09% ##### + compare 173 2.08% ##### + select 170 2.05% ##### + concatenate 116 1.40% ### + tuple 57 0.69% # + scatter 57 0.69% # + negate 56 0.67% # + all-reduce 56 0.67% # + divide 29 0.35% + gather 6 0.07% + iota 5 0.06% + all-gather 3 0.04% + reduce 3 0.04% + custom-call 2 0.02% + sine 1 0.01% + cosine 1 0.01% + maximum 
1 0.01% + + +Pre-Partition Post-Op Histogram: +total HLO instructions: 5437 + reshape 1421 26.14% ################################################################ + transpose 817 15.03% #################################### + convert 720 13.24% ################################ + constant 443 8.15% ################### + parameter 371 6.82% ################ + broadcast 266 4.89% ########### + dot 197 3.62% ######## + custom-call 175 3.22% ####### + multiply 171 3.15% ####### + add 171 3.15% ####### + get-tuple-element 147 2.70% ###### + slice 115 2.12% ##### + concatenate 114 2.10% ##### + compare 59 1.09% ## + select 58 1.07% ## + scatter 57 1.05% ## + negate 56 1.03% ## + all-reduce 56 1.03% ## + gather 6 0.11% + all-gather 3 0.06% + iota 3 0.06% + reduce 3 0.06% + pad 2 0.04% + sine 1 0.02% + divide 1 0.02% + tuple 1 0.02% + maximum 1 0.02% + rng 1 0.02% + cosine 1 0.02% + +Potential split-points stats: #CC 59 #AR 56 #AG 3 #BN 0 nClamp 0 +ModuleSplitter initial partitioning... #parts 59 +ModuleSplitter initial partitioning... Done. + 0 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 57 58 +New disjoint wave: start 2 len 54 NumReps: 27 macs 1507533520896 +First non-zero-mac/used part from the end is 58 +Not enough zero-mac parts. skip +ModuleSplitter initial partitioning... #parts 29 +ModuleSplitter initial partitioning... Done. 
+Remat: gather-iota 0 matches, 0 ops rematted +Wrote HLO netlist to hlo_netlist.json +Wrote graph partitions in debug_info_hlo_partitions.json +Processing partition 0 +Replaced 0 dropout sequences with OffloadedDropout +HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert cosine custom-call dot gather get-tuple-element multiply negate parameter reshape scatter select sine slice transpose tuple +Invoking RemoveOptimizationBarriers pass +Processing partition 1 +Replaced 0 dropout sequences with OffloadedDropout +HLO Ops used in computation: add all-reduce broadcast compare concatenate constant convert custom-call dot get-tuple-element multiply negate parameter reshape scatter select slice transpose tuple +Invoking RemoveOptimizationBarriers pass +Processing partition 2 +Replaced 0 dropout sequences with OffloadedDropout +HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert custom-call divide dot gather get-tuple-element iota maximum multiply pad parameter reduce reshape rng scatter select slice transpose tuple +Invoking RemoveOptimizationBarriers pass + +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: IR signature: 66a95a93f4019d420bf017fa5e43303ad25f3e0a31011f48fad97ade9028ee76 for sg0000/HLOToTensorizer +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: IR signature: bdabb093663dc2324f935e932f22345ab4111086fe33706a3c2e0f7ba61b67a0 for sg0001/HLOToTensorizer +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: IR signature: b450695497ff1fc37081039a148fbf215cafeb494d658b63147a29e8e8488685 for sg0002/HLOToTensorizer +2025-11-04T21:38:34Z INFO 8698 [job.HLOToTensorizer.0]: Job #0 finished +2025-11-04T21:38:34Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0 +2025-11-04T21:38:34Z INFO 8698 [pipeline.Pipeline.0]: Starting job job.Frontend.0 +2025-11-04T21:38:34Z INFO 8698 [job.Frontend.0]: Job Frontend len(in_states) 1 
+2025-11-04T21:38:34Z INFO 8698 [job.Frontend.0]: Processing input #0 +2025-11-04T21:38:34Z INFO 8698 [job.Frontend.0]: Start model loading +2025-11-04T21:38:34Z INFO 8698 [job.Frontend.0]: Start tensorization +2025-11-04T21:38:34Z INFO 8698 [job.Frontend.0]: Num jobs: 12 +2025-11-04T21:38:34Z USER 8698 [root/Tensorizer/Tensorizer]: Running Tensorizer +2025-11-04T21:38:34Z INFO 8698 [Tensorizer]: Max workers: 3 +2025-11-04T21:38:34Z INFO 8739 [Tensorizer]: Building model from Penguin script "penguin.py.000001"... +2025-11-04T21:38:34Z INFO 8738 [Tensorizer]: Building model from Penguin script "penguin.py.000000"... +2025-11-04T21:38:34Z INFO 8740 [Tensorizer]: Building model from Penguin script "penguin.py.000002"... +2025-11-04T21:38:34Z INFO 8738 [Tensorizer]: Allocate SB of shape (128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:34Z INFO 8738 [Tensorizer]: Allocate PSUM of shape (8, 128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:34Z INFO 8739 [Tensorizer]: Allocate SB of shape (128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:34Z INFO 8739 [Tensorizer]: Allocate PSUM of shape (8, 128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:34Z INFO 8739 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=2 --num-neuroncores-per-sengine=2 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable 
--enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-spill-reload-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-11-04T21:38:34Z INFO 8738 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=2 --num-neuroncores-per-sengine=2 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-spill-reload-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/LegalizeOpLevelAlias]: 
LegalizeOpLevelAlias finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:34Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-11-04T21:38:34Z INFO 8738 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: 
AliasDependencyInduction finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.029 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.032 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:35Z INFO 8739 
[sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.011 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.065 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.016 seconds +2025-11-04T21:38:35Z INFO 8740 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=2 --num-neuroncores-per-sengine=2 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 
--enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-spill-reload-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.010 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8740 
[sg0002/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.077 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.013 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LowerTensorOp]: Finished 
(changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.028 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.024 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.022 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.046 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 
[sg0002/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 
[sg0001/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.017 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Running 
GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.013 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:35Z INFO 8740 
[sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:35Z INFO 8738 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 
8739 [sg0001/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_0 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_0 finished after 0.029 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_1 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: 
Running TensorOpTransform_iteration_0 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.033 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LateLowerTensorOp]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_0 finished after 0.054 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_1 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_1 finished after 0.008 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.031 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-11-04T21:38:35Z INFO 8740 
[sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_0 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.062 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_0 finished after 0.049 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_1 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_1 finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.055 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.035 seconds +2025-11-04T21:38:35Z INFO 8740 
[sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.013 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8739 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.030 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.066 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Rematerialization]: Running Rematerialization +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Rematerialization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.085 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Rematerialization]: Rematerialization finished after 0.011 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.003 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: 
Simplifier_iteration_0 finished after 0.013 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.032 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.035 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.024 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.043 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.029 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.089 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished 
(changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.003 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.005 seconds +2025-11-04T21:38:36Z INFO 8738 
[sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.095 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_0 finished after 0.047 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_1 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_0 finished after 0.309 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_1 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_1 finished after 0.012 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_1 finished after 0.008 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8740 
[sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.062 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.322 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.025 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_2 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_2 finished after 0.008 seconds 
+2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.011 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.073 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Rematerialization]: Running Rematerialization +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Rematerialization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.018 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.033 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.102 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-11-04T21:38:36Z INFO 
8738 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Rematerialization]: Rematerialization finished after 0.015 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.040 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.015 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.017 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.040 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Delinearization]: 
Running Delinearization +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.025 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.027 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.013 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.017 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/PadElimination]: Running PadElimination +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/PadElimination]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.018 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8740 
[sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.019 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.029 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_0 finished after 0.283 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_1 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_1 finished after 0.017 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8739 
[sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.201 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.301 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.034 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.007 
seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.016 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_2 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_2 finished after 0.008 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LICM]: LICM finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.019 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.008 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom_iteration_0 +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8740 
[sg0002/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.019 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8740 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.017 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.020 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: 
LICM finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.010 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Recompute]: Running Recompute +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Recompute]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.006 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Recompute]: Recompute finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8740 [Tensorizer]: After optimization: 39 statements +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:37Z INFO 8740 
[sg0002/Tensorizer/MutateDataType]: Running MutateDataType +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/MutateDataType]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.015 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: LICM finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/PadElimination]: Running PadElimination +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/PadElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: Running 
TileCCOps +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `All gather output tensor check failed` +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: in float32 (512,) %'all_gather.2' = AllGatherOp-162 AllGather_add(float32 (256,) %'add.11', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.6459 | hlo_id: 108 | , id = 162 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=2048 is not above min_allgather_tile_size_in_bytes=8388608` +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: in uint32 (512,) %'all_gather.3' = AllGatherOp-178 AllGather_add(uint32 (256,) %'add.12', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.6596 | hlo_id: 117 | , id = 178 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/TileCCOps]: TileCCOps finished after 0.012 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.005 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished 
after 0.004 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.012 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished 
(changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/LICM]: LICM finished after 0.006 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.022 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.010 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.011 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.009 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict 
+2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/ResolveAccessConflict]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/ResolveAccessConflict]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.009 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.022 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.004 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/RecognizeOpIdiom]: Running 
RecognizeOpIdiom_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.019 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.014 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.013 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.099 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Rematerialization]: 
Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.024 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.070 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.018 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.013 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.013 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.111 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:37Z 
INFO 8739 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.090 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Recompute]: Running Recompute +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Recompute]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Recompute]: Recompute finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8739 [Tensorizer]: After optimization: 32 statements +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/MutateDataType]: Running MutateDataType +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/MutateDataType]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished 
(changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.004 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.043 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.020 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.042 seconds +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.021 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/TileCCOps]: Running TileCCOps +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.021 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:37Z INFO 8740 [sg0002/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/TileCCOps]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/TileCCOps]: TileCCOps finished after 0.026 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Finished 
(changed=True) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.059 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.014 seconds +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.106 seconds +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.012 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.032 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.003 seconds 
+2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.006 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.188 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.012 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.018 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.292 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/DelinearizeSPMD]: Running DelinearizeSPMD +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.007 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.012 seconds 
+2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/DelinearizeSPMD]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/DelinearizeSPMD]: DelinearizeSPMD finished after 0.036 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/ShardingPropagationAnalysis]: Running ShardingPropagationAnalysis +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.012 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.011 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.046 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/ResolveAccessConflict]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/ResolveAccessConflict]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.007 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.018 seconds 
+2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.006 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/ShardingPropagationAnalysis]: ShardingPropagationAnalysis finished after 0.107 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/InferShardAxis]: Running InferShardAxis +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.014 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.006 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 
seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/PadElimination]: Running PadElimination +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.010 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.019 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running 
Simplifier_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.013 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.016 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.006 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.097 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.008 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/TCTransform]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 
[sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.027 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.010 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom_iteration_0 +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.016 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom_iteration_0 finished after 0.017 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.017 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.010 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:38Z INFO 
8739 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.007 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/ShardResult]: =================== Dumping Debug Info ===================== +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/ShardResult]: ------------------ Sharding summary ------------------ +total number of dags: 36 +total number of sharded dags: 13 + +total bytes transferred from input, output, non local tensors: 391205666 +total bytes transferred from input, output, non local tensors with 2x bandwidths: 366017296 +% bytes transferred with 2x bandwidths: 93.56 + +NC0 FLOPs: 181850210 +NC1 FLOPs: 181842016 +% FLOPs sharded: 100.00 + + +Shard dim: 2048, Number of dags: 7 +Matmuls sharded with this dim: +[2048(s),2,6,2,128] @ [2,6,2,128,8,2,128] = [2048(s),8,2,128] (stationary-streaming swapped) Number of occurrences: 1 +[2048(s),2,8,128] @ [2,8,128,2,6,2,128] = [2048(s),2,6,2,128] Number of occurrences: 2 + + +Shard dim: 256, Number of dags: 5 +Matmuls sharded with this dim: + + +Shard dim: 75968, Number of dags: 1 +Matmuls sharded with this dim: +[2,8,128] @ [2,8,128,75968(s)] = [75968(s)] Number of occurrences: 1 + + + +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.082 seconds +2025-11-04T21:38:38Z INFO 8739 [sg0001/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.071 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Recompute]: Running Recompute +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Recompute]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.000 
seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.004 seconds +2025-11-04T21:38:38Z INFO 8738 [Tensorizer]: After optimization: 32 statements +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.003 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: 
Simplifier_iteration_0 finished after 0.008 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.010 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/RemoveShardedPartitionAxes]: Running RemoveShardedPartitionAxes +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/RemoveShardedPartitionAxes]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.008 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/TileCCOps]: Running TileCCOps +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/TileCCOps]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/RemoveShardedPartitionAxes]: RemoveShardedPartitionAxes finished after 0.015 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/InferShardAxis]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/TileCCOps]: TileCCOps finished after 0.007 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/InferShardAxis]: InferShardAxis finished after 0.578 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.004 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.029 
seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.005 seconds +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:38Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.008 seconds +2025-11-04T21:38:38Z INFO 8740 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.027 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.012 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.008 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PGTiling]: Running PGTiling +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.004 seconds +2025-11-04T21:38:39Z INFO 8738 
[sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 598 of IO tensor {'CrossPassTensor': ''}bfloat16 %input367|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 599 of IO tensor {'CrossPassTensor': ''}bfloat16 %input368|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 600 of IO tensor {'CrossPassTensor': ''}bfloat16 %input366|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 601 of IO tensor {'CrossPassTensor': ''}bfloat16 %input365|NC|(8, 2, 128, 6, 2, 2, 128) is not sorted, index list (w/ AG ids): [(18, 'AG85'), (25, 'AG82'), (19, 'AG84'), (24, 'AG83')] +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 602 of IO tensor {'CrossPassTensor': ''}bfloat16 %input370|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 553 of IO tensor {'CrossPassTensor': ''}bfloat16 %input369|NC|(2, 37984, 2, 8, 128) is not sorted, index list (w/ AG ids): [(3, 'AG94'), (23, 'AG93'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.378 seconds +2025-11-04T21:38:39Z INFO 8739 
[sg0001/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.048 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.012 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.017 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/ResolveAccessConflict]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.009 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/ResolveAccessConflict]: DeadCodeElimination_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:39Z INFO 8738 
[sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.085 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.008 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.001 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.064 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PGTiling]: 
PGTiling finished after 0.263 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.044 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.040 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.024 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertOffloadedTransposes]: OffloadedTranspose inserted: 0 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.015 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.008 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-11-04T21:38:39Z 
INFO 8740 [sg0002/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 1.852 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 9504: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 9504: matmul_128x128x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 96: simd128x512 +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.006 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 64: simd128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8740 
[sg0002/Tensorizer/TilingBottleneck]: 4: reduce512x1x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 4: simd1x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 4: reduce512x1x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 2: indirect_load128x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.025 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.073 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.016 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_0 +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.323 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InsertLocalTransposes]: Running 
InsertLocalTransposes +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.012 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.137 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_0 finished after 0.053 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_1 +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.018 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.377 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/DelinearizeSPMD]: Running DelinearizeSPMD +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.057 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: 
NeuronSimplifier_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.006 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/DelinearizeSPMD]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/DelinearizeSPMD]: DelinearizeSPMD finished after 0.032 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/ShardingPropagationAnalysis]: Running ShardingPropagationAnalysis +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.010 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.004 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/ShardingPropagationAnalysis]: ShardingPropagationAnalysis finished after 0.029 seconds +2025-11-04T21:38:39Z INFO 8739 [sg0001/Tensorizer/InferShardAxis]: Running InferShardAxis +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: 
FlattenMacroLoop finished after 0.007 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.165 seconds +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.016 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-11-04T21:38:39Z INFO 8738 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.170 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 9504: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 9504: matmul_128x128x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 594: transpose_128x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 
256: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 96: simd128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 64: simd128x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x1024 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 24: dma128x2048 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 24: dma128x2048 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 4: reduce512x1x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 4: simd1x512 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 4: reduce512x1x1 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.009 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 
0.014 seconds +2025-11-04T21:38:39Z INFO 8740 [sg0002/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.027 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/InsertImplicitShardAxisBeforeISel]: Running InsertImplicitShardAxisBeforeISel +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/InsertImplicitShardAxisBeforeISel]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/InsertImplicitShardAxisBeforeISel]: InsertImplicitShardAxisBeforeISel finished after 0.013 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.035 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_1 +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_1 finished after 0.021 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.056 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.006 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/ShardResult]: =================== Dumping Debug Info 
===================== +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/ShardResult]: ------------------ Sharding summary ------------------ +total number of dags: 32 +total number of sharded dags: 25 + +total bytes transferred from input, output, non local tensors: 119546884 +total bytes transferred from input, output, non local tensors with 2x bandwidths: 85987328 +% bytes transferred with 2x bandwidths: 71.93 + +NC0 FLOPs: 36893488143169486851 +NC1 FLOPs: 36893488143169486848 +% FLOPs sharded: 100.00 + + +Shard dim: 2048, Number of dags: 24 +Matmuls sharded with this dim: +[2048(s),2,6,2,128] @ [2,6,2,128,8,2,128] = [2048(s),8,2,128] (stationary-streaming swapped) Number of occurrences: 1 +[2048(s),2,8,128] @ [2,8,128,2,2,2,2,64] = [2048(s),2,2,2,2,64] Number of occurrences: 1 +[2048(s),2,8,128] @ [2,8,128,2,6,2,128] = [2048(s),2,6,2,128] Number of occurrences: 2 +[2048(s),2,8,128] @ [2,8,128,4,128] = [2048(s),4,128] Number of occurrences: 1 +[2048(s),2,8,128] @ [2,8,128,4,2,64] = [2048(s),4,2,64] Number of occurrences: 1 + + +Shard dim: 2, Number of dags: 1 +Matmuls sharded with this dim: +[2048,4,2,128] @ [4,2,128,2(s),2,4,128] = [2048,2(s),2,4,128] (stationary-streaming swapped) Number of occurrences: 1 + + + +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.021 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.008 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/ReshapeWeights]: 
Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.002 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.021 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/RemoveShardedPartitionAxes]: Running RemoveShardedPartitionAxes +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.005 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/RemoveShardedPartitionAxes]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/RemoveShardedPartitionAxes]: RemoveShardedPartitionAxes finished after 0.047 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/InferShardAxis]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/InferShardAxis]: InferShardAxis finished after 0.605 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.555 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/InsertLocalTransposes]: 
Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.032 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/InferInitValue]: Running InferInitValue +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.011 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.649 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/DelinearizeSPMD]: Running DelinearizeSPMD +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.005 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.020 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/DelinearizeSPMD]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/InferInitValue]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/DelinearizeSPMD]: DelinearizeSPMD finished after 0.037 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/ShardingPropagationAnalysis]: Running ShardingPropagationAnalysis +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/InferInitValue]: InferInitValue finished after 0.084 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8740 
[sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.021 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.011 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.022 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/ShardingPropagationAnalysis]: ShardingPropagationAnalysis finished after 0.040 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/InferShardAxis]: Running InferShardAxis +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyTensor]: DeadCodeElimination_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyTensor]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.018 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.015 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/PGTiling]: Running PGTiling +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LICM]: LICM finished after 0.006 seconds +2025-11-04T21:38:40Z INFO 8740 
[sg0002/Tensorizer/SundaISel]: Running SundaISel +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 655 of IO tensor {'CrossPassTensor': ''}bfloat16 %input70|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 656 of IO tensor {'CrossPassTensor': ''}bfloat16 %input71|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 657 of IO tensor {'CrossPassTensor': ''}bfloat16 %input69|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 658 of IO tensor {'CrossPassTensor': ''}bfloat16 %input68|NC|(8, 2, 128, 6, 2, 2, 128) is not sorted, index list (w/ AG ids): [(24, 'AG119'), (30, 'AG116'), (25, 'AG118'), (28, 'AG117')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 659 of IO tensor {'CrossPassTensor': ''}bfloat16 %input74|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 660 of IO tensor {'CrossPassTensor': ''}bfloat16 %input78|NC|(2, 2, 128, 8, 2, 2, 2, 64) is not sorted, index list (w/ AG ids): [(27, 'AG113'), (31, 'AG111'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 661 of IO tensor {'CrossPassTensor': ''}bfloat16 %input77|N|(64, 2) is not sorted, index list (w/ AG ids): [(13, 'AG123'), (9, 'AG124')] +2025-11-04T21:38:40Z INFO 8739 
[sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 662 of IO tensor {'CrossPassTensor': ''}bfloat16 %input76|NC|(2, 128, 8, 4, 2, 64) is not sorted, index list (w/ AG ids): [(27, 'AG113'), (31, 'AG111'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 663 of IO tensor {'CrossPassTensor': ''}bfloat16 %input75|N|(64, 2) is not sorted, index list (w/ AG ids): [(18, 'AG128'), (14, 'AG129')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 664 of IO tensor {'CrossPassTensor': ''}bfloat16 %input73|NC|(2, 128, 8, 4, 128) is not sorted, index list (w/ AG ids): [(27, 'AG113'), (31, 'AG111'), (29, 'AG112')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 444 of IO tensor {'CrossPassTensor': ''}bfloat16 %input72|NC|(2, 2, 128, 4, 2, 4, 128) is not sorted, index list (w/ AG ids): [(20, 'AG135'), (12, 'AG137'), (17, 'AG136')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 694 of IO tensor non_local bfloat16 %reshape.68(4, 2, 2, 64, 2, 1024) is not sorted, index list (w/ AG ids): [(10, 'AG130'), (15, 'AG131'), (7, 'AG115'), (26, 'AG114')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 644 of IO tensor non_local bfloat16 %reshape.73(4, 2, 2, 1024, 128) is not sorted, index list (w/ AG ids): [(11, 'AG133'), (16, 'AG134'), (7, 'AG115'), (19, 'AG132')] +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.102 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SundaISel]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/StaticTransposeLocalTensor]: 
Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/SundaISel]: SundaISel finished after 0.103 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.001 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.022 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.008 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.006 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 
[sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.014 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.039 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.011 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_0 +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.009 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_0 finished after 0.035 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_1 +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.006 seconds +2025-11-04T21:38:40Z INFO 8739 [sg0001/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_1 finished after 0.007 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_2 +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: 
NeuronLoopFusion_iteration_2 finished after 0.011 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/ShardResult]: =================== Dumping Debug Info ===================== +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_3 +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/ShardResult]: ------------------ Sharding summary ------------------ +total number of dags: 32 +total number of sharded dags: 25 + +total bytes transferred from input, output, non local tensors: 68180998 +total bytes transferred from input, output, non local tensors with 2x bandwidths: 59791360 +% bytes transferred with 2x bandwidths: 87.70 + +NC0 FLOPs: 36893488143150284803 +NC1 FLOPs: 36893488143150284800 +% FLOPs sharded: 100.00 + + +Shard dim: 2048, Number of dags: 23 +Matmuls sharded with this dim: +[2048(s),2,2,4,128] @ [2,2,4,128,2,2,2,2,64] = [2048(s),2,2,2,2,64] Number of occurrences: 1 +[2048(s),2,2,4,128] @ [2,2,4,128,4,128] = [2048(s),4,128] Number of occurrences: 1 +[2048(s),2,2,4,128] @ [2,2,4,128,4,2,64] = [2048(s),4,2,64] Number of occurrences: 1 +[64] @ [2048(s)] = [64,2048(s)] Number of occurrences: 1 + + +Shard dim: 2, Number of dags: 1 +Matmuls sharded with this dim: +[2048,4,2,128] @ [4,2,128,2(s),2,4,128] = [2048,2(s),2,4,128] (stationary-streaming swapped) Number of occurrences: 1 + + +Shard dim: 512, Number of dags: 1 +Matmuls sharded with this dim: + + + +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_3 finished after 0.009 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.073 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 
+2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.003 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.023 seconds +2025-11-04T21:38:40Z INFO 8740 [sg0002/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.026 seconds +2025-11-04T21:38:40Z INFO 8738 [sg0000/Tensorizer/RemoveShardedPartitionAxes]: Running RemoveShardedPartitionAxes +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.037 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/RemoveShardedPartitionAxes]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/RemoveShardedPartitionAxes]: RemoveShardedPartitionAxes finished after 0.042 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InferShardAxis]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InferShardAxis]: InferShardAxis finished after 0.544 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:41Z INFO 8739 
[sg0001/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.171 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.059 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/PGTiling]: PGTiling finished after 0.533 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.012 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.073 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.015 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.004 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished 
after 0.014 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.074 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.020 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/PGTiling]: Running PGTiling +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.018 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertOffloadedTransposes]: OffloadedTranspose inserted: 0 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.046 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.035 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-11-04T21:38:41Z INFO 8740 
[sg0002/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.006 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_0 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 633 of IO tensor {'CrossPassTensor': ''}bfloat16 %input63|N|(128, 2, 2, 4) is not sorted, index list (w/ AG ids): [(30, 'AG95'), (24, 'AG98'), (21, 'AG97'), (27, 'AG96')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 634 of IO tensor {'CrossPassTensor': ''}bfloat16 %input67|NC|(2, 2, 128, 2, 4, 2, 2, 2, 64) is not sorted, index list (w/ AG ids): [(24, 'AG98'), (30, 'AG95'), (21, 'AG97'), (27, 'AG96')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 635 of IO tensor {'CrossPassTensor': ''}bfloat16 %input66|N|(64, 2) is not sorted, index list (w/ AG ids): [(25, 'AG101'), (22, 'AG104')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 636 of IO tensor {'CrossPassTensor': ''}bfloat16 %input65|NC|(2, 128, 2, 4, 4, 2, 64) is not sorted, index list (w/ AG ids): [(24, 'AG98'), (30, 'AG95'), (21, 'AG97'), (27, 'AG96')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 637 of IO tensor {'CrossPassTensor': ''}bfloat16 %input64|N|(64, 2) is not sorted, index list (w/ AG ids): [(25, 'AG101'), (18, 'AG108')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 638 of IO tensor {'CrossPassTensor': ''}bfloat16 %input62|NC|(2, 128, 2, 4, 4, 128) is not sorted, index list (w/ AG ids): [(24, 'AG98'), (30, 'AG95'), (21, 'AG97'), (27, 'AG96')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of 
loadstore 419 of IO tensor {'CrossPassTensor': ''}bfloat16 %input61|NC|(2, 2, 128, 4, 2, 4, 128) is not sorted, index list (w/ AG ids): [(28, 'AG114'), (23, 'AG116'), (26, 'AG115')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 631 of IO tensor non_local bfloat16 %all_gather.1(2, 2, 4, 128, 2, 1024) is not sorted, index list (w/ AG ids): [(21, 'AG97'), (24, 'AG98'), (27, 'AG96'), (1, 'AG100'), (29, 'AG99')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 520 of IO tensor {'IntermediateTensor': ''}bfloat16 %intermediate0(2, 1024, 2, 2, 4, 128) is not sorted, index list (w/ AG ids): [(1, 'AG100'), (29, 'AG99'), (24, 'AG98'), (21, 'AG97'), (27, 'AG96')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 582 of IO tensor non_local bfloat16 %reshape.16(2, 2, 2, 2, 64, 2, 1024) is not sorted, index list (w/ AG ids): [(6, 'AG107'), (13, 'AG106'), (17, 'AG105'), (22, 'AG104'), (25, 'AG101'), (1, 'AG100')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 676 of IO tensor non_local bfloat16 %reshape.24(4, 2, 2, 64, 2, 1024) is not sorted, index list (w/ AG ids): [(7, 'AG109'), (14, 'AG110'), (18, 'AG108'), (25, 'AG101'), (1, 'AG100')] +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 614 of IO tensor non_local bfloat16 %reshape.29(4, 2, 2, 1024, 128) is not sorted, index list (w/ AG ids): [(8, 'AG112'), (15, 'AG113'), (1, 'AG100'), (19, 'AG111')] +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 
8738 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.062 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.022 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 2.710 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 512: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 512: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 96: simd128x512 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: simd128x512 
+2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: generic_store128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 64: generic_store128x128 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.011 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.022 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.006 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.019 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: Running 
InferNeuronTensor_iteration_0 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.024 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.008 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.003 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/DeConcat]: Running DeConcat +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/DeConcat]: Running DeConcat_iteration_0 +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/DeConcat]: DeConcat_iteration_0 finished after 0.011 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/DeConcat]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.008 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/DeConcat]: DeConcat finished after 0.011 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_0 finished after 0.090 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_1 +2025-11-04T21:38:41Z INFO 8740 
[sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_1 finished after 0.005 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.002 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.097 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion_iteration_0 +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.026 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.026 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialSimdFusion]: PartialSimdFusion_iteration_0 finished after 0.023 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.024 
seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/LICM]: LICM finished after 0.004 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.003 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.012 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8739 [sg0001/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.173 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.462 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.146 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:41Z INFO 8740 
[sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.047 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.047 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.078 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.029 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion_iteration_0 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertOffloadedTransposes]: OffloadedTranspose inserted: 0 +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.036 seconds +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialLoopFusion]: PartialLoopFusion_iteration_0 finished after 0.056 seconds +2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=False) 
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.057 seconds
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.035 seconds
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True)
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 2.516 seconds
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.030 seconds
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LowerTranspose]: Running LowerTranspose
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]:
+20 MACROS WITH LARGEST INSTRUCTION COUNTS:
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 512: matmul_128x128x512
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 512: matmul_128x128x512
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 256: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: indirect_load128x256
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: simd128x512
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: generic_store128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 64: generic_store128x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x256
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x256
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x512
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False)
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.043 seconds
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LowerTranspose]: Finished (changed=True)
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.047 seconds
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LowerBroadcast]: Running LowerBroadcast
+2025-11-04T21:38:41Z INFO 8740 [sg0002/Tensorizer/LowerBroadcast]: Finished (changed=False)
+2025-11-04T21:38:41Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.005 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.028 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_0
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.028 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_1
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_1 finished after 0.013 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.043 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/SplitAccGrp]: Running SplitAccGrp
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/SplitAccGrp]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.003 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/SpillPSum]: Running SpillPSum
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/DataLocalityOpt]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_0 finished after 0.088 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_1
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/SpillPSum]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.454 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_1 finished after 0.006 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]:
+20 MACROS WITH LARGEST INSTRUCTION COUNTS:
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 512: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 512: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: dma128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 96: simd128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: simd128x512
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: generic_store128x128
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/DMATilingProfiler]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.095 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.010 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.019 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.019 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.022 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.024 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.005 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LegalizeSundaMacro]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.002 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.024 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/InsertImplicitShardAxisBeforeISel]: Running InsertImplicitShardAxisBeforeISel
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/InsertImplicitShardAxisBeforeISel]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.018 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.006 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/InsertImplicitShardAxisBeforeISel]: InsertImplicitShardAxisBeforeISel finished after 0.013 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/SpillPSum]: SpillPSum finished after 0.034 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.050 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.051 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LowerIntrinsics]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.008 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.043 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InlineNativeKernels]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.002 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LegalizeType]: Running LegalizeType
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LegalizeType]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LegalizeType]: LegalizeType finished after 0.014 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.041 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/RewriteWeights]: Running RewriteWeights
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/RewriteWeights]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.012 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/ReshapeWeights]: Running ReshapeWeights
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/ReshapeWeights]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.002 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.027 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.011 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.025 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_1
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.021 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/InferInitValue]: Running InferInitValue
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_1 finished after 0.027 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.259 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]:
+20 MACROS WITH LARGEST INSTRUCTION COUNTS:
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 512: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 512: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: dma128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: indirect_load128x256
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: dma128x512
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: dma128x512
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: generic_store128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: generic_store128x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x256
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.007 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.052 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/WeightCoalescing]: Running WeightCoalescing
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.019 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/WeightCoalescing]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/InferInitValue]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.020 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/InferInitValue]: InferInitValue finished after 0.068 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.017 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InsertImplicitShardAxisBeforeISel]: Running InsertImplicitShardAxisBeforeISel
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InsertImplicitShardAxisBeforeISel]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.025 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/InsertImplicitShardAxisBeforeISel]: InsertImplicitShardAxisBeforeISel finished after 0.008 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.021 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.025 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyTensor]: Running SimplifyTensor
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyTensor]: Running DeadCodeElimination_iteration_0
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.021 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyTensor]: DeadCodeElimination_iteration_0 finished after 0.004 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyTensor]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.015 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LICM]: Running LICM
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LICM]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LICM]: LICM finished after 0.006 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SundaISel]: Running SundaISel
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.003 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.013 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.007 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SundaISel]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/SundaISel]: SundaISel finished after 0.068 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.002 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LegalizeSundaAccess]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.028 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LowerComplexBroadcast]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.005 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.006 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.006 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.002 seconds
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
+2025-11-04T21:38:42Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.046 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/RelaxPredicates]: Running RelaxPredicates
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/RelaxPredicates]: Finished (changed=False)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_0
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.006 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/TensorInitialization]: Running TensorInitialization
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_0 finished after 0.023 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_1
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/TensorInitialization]: Finished (changed=True)
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_1 finished after 0.008 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_2
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.011 seconds
+2025-11-04T21:38:42Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_2 finished after 0.007 seconds
+2025-11-04T21:38:42Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.041 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.014 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/ExpandISAMacro]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.006 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.004 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.013 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMALocalityOpt]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DataStreaming]: Running DataStreaming
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DataStreaming]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.020 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DataStreaming]: DataStreaming finished after 0.007 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: Running SFKVectorizer
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/FactorizeBlkDims]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.038 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.008 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.010 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.099 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.007 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.114 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronValueNumbering]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.047 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_1
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.006 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_1 finished after 0.008 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.008 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.009 seconds
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc
+2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/InferSharedMemLoc]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 0.085 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.192 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.020 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/LateLegalizeInst]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.021 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyTensor]: Running DeadCodeElimination_iteration_0
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyTensor]: DeadCodeElimination_iteration_0 finished after 0.003 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.012 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/CoalesceCCOp]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.013 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/LICM]: Running LICM
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/LICM]: Finished (changed=False)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.008 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimpleAllReduceTiling]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/LICM]: LICM finished after 0.006 seconds
+2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SundaISel]: Running SundaISel
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.004 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/InsertCoreBarrier]: Finished (changed=True)
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.008 seconds
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Running DMAProfiler
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs:
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 1.523ms (300.000MiB, est bw: 206.549GB/s, 56.654% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 297, 128, 2048) %'992.1591'[i31_0,4i31_1_0_0+i31_1_0_1,i0.128,i1.128+128i2.16] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 37984, 16, 128) %'input369'[i31_0,i0.128+512i31_1_0_0+128i31_1_0_1,i2.16,i1.128] # id=1590, src_id=None, , instances=600 # dl = tensor_op_name: input369_pftranspose_992 | hlo_id: 95 | if -i0.128-512i31_1_0_0-128i31_1_0_1+37983 >= 0 and -4i31_1_0_0-i31_1_0_1+296 >= 0 [[i0.128];[i1.128, i2.16]] -> [[i0.128];[i1.128, i2.16]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 244.771us (48.000MiB, est bw: 205.627GB/s, 9.105% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[5] bfloat16 (2, 2, 6, 2, 2, 128, 2048) %1532[i11_0,i11_1_0,2i10_0_0_1_0+i10_0_0_1_1,i10_0_0_0,c2_1041,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input366'[i10_0_0_0,2i10_0_0_1_0+i10_0_0_1_1,i0.128,c2_1041,i1.2048] # id=1367, src_id=None, , instances=96 # dl = tensor_op_name: _dot.197 | hlo_id: 52 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 244.771us (48.000MiB, est bw: 205.627GB/s, 9.105% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[5] bfloat16 (2, 2, 6, 2, 2, 128, 2048) %1530[i16_0_1076,i13_1_0,2i12_0_0_1_0+i12_0_0_1_1,i12_0_0_0,c2_1052,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input368'[i12_0_0_0,2i12_0_0_1_0+i12_0_0_1_1,i0.128,c2_1052,i1.2048] # id=1370, src_id=None, , instances=96 # dl = tensor_op_name: _dot.198 | hlo_id: 42 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 198.539us (24.000MiB, est bw: 126.755GB/s, 7.385% of tot. time) for bfloat16<128 x 512> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 6, 128, 2, 512) %'input365_local_1070'[i16_0_1076,i15_0_0_0_1,i15_0_0_0_0,c1_1062_2054,c2_1063_2054,i0.128,i3.2,i1.128+128i2.2+256p_1701_2054] = load bfloat16<128 x 512> {'CrossPassTensor': ''}bfloat16 (4, 2, 2, 128, 6, 2, 2, 128) %'input365'[i15_0_0_0_1+2i15_0_0_0_0,p_1701_2054,c1_1062_2054,i0.128,c2_1063_2054,i3.2,i2.2,i1.128] # id=1376, src_id=None, , instances=192 # dl = tensor_op_name: _dot.199 | hlo_id: 63 | [[i0.128];[i1.128, i2.2, i3.2]] -> [[i0.128];[i1.128, i2.2, i3.2]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 193.732us (300.000KiB, est bw: 1.586GB/s, 7.207% of tot. time) for float32<1 x 128> {'no_delinear': '0'}non_local float32 (1, 2, 37984) %'convert.55'[0,i31_0,i0.128+512i31_1_0_0+128i31_1_0_1] = store float32<1 x 128> TongaSB partitions[2] float32 (2, 297, 1, 128) %'dot.200.1601'[i31_0,4i31_1_0_0+i31_1_0_1,0,i0.128] # id=1599, src_id=None, , instances=600 # dl = tensor_op_name: _dot.200 | hlo_id: 95 | if -i0.128-512i31_1_0_0-128i31_1_0_1+37983 >= 0 and -4i31_1_0_0-i31_1_0_1+296 >= 0 [[];[i0.128]] -> [[];[i0.128]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 1.558% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %'996.1675'[i11_0,i11_1_0,T_i2_0_2052,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2, 2, 512, 2048) %'add.9'[i11_0,i11_1_0,i0.128+128T_i2_0_2052,i1.2048] # id=1565, src_id=None, , instances=16 # dl = tensor_op_name: add.9_pftranspose_996 | hlo_id: 27 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 1.558% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2, 2, 512) %'_reload_1526'[i16_0_1076,i13_1_0,i4_0_1_1529_2053_0,i0.128,i3.2,i2.2,i1.512] = load bfloat16<128 x 2048> DRAM3DBlk partitions[3] bfloat16 (4, 2, 2, 128, 2048) %'_spill_1523'[i4_0_1_1529_2053_0,i16_0_1076,i13_1_0,i0.128,i1.512+1024i2.2+512i3.2] # id=1528, src_id=None, , instances=16 # dl = tensor_op_name: _dot.198 | hlo_id: 42 | [[i0.128];[i1.512, i2.2, i3.2]] -> [[i0.128];[i1.512, i2.2, i3.2]]
+2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 1.558% of tot.
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %'1000.1680'[T_i20_0_1008,T_i20_1_0_1008,T_i2_0_2055,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (4194304,) %'all_reduce.3-buffer-2076'[2097152T_i20_0_1008+2048i0.128+1048576T_i20_1_0_1008+i1.2048+262144T_i2_0_2055] # id=1574, src_id=None, , instances=16 # dl = tensor_op_name: all_reduce.3_pftranspose_1000 | hlo_id: 66 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 25.532us (8.000MiB, est bw: 328.547GB/s, 0.950% of tot. time) for bfloat16<128 x 2048> DRAM3DBlk partitions[3] bfloat16 (4, 2, 2, 128, 2048) %'_spill_1523'[i2_0_1_1639_2057_0,i11_0,i11_1_0,i0.128,i1.2048] = store bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %1014[i11_0,i11_1_0,i2_0_1_1639_2057_0,i0.128,i1.2048] # id=1525, src_id=None, , instances=16 # dl = tensor_op_name: _custom-call.348 | hlo_id: 34 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 25.532us (8.000MiB, est bw: 328.547GB/s, 0.950% of tot. 
time) for bfloat16<128 x 2048> non_local bfloat16 (4194304,) %'dot.14-buffer-2074'[2097152i16_0_1076+2048i0.128+1048576i16_1_0_0_1076_1531+i1.2048+262144i16_1_0_1_1076_1531] = store bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %1077[i16_0_1076,i16_1_0_0_1076_1531,i16_1_0_1_1076_1531,i0.128,i1.2048] # id=1379, src_id=None, , instances=16 # dl = tensor_op_name: _dot.199 | hlo_id: 63 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.010 seconds +2025-11-04T21:38:43Z INFO 8740 [sg0002/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SundaISel]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.065 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.001 seconds +2025-11-04T21:38:43Z INFO 8738 
[sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.023 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/DoNothing]: DoNothing finished after 0.002 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.011 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.013 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.005 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_0 +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished 
after 0.037 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_0 +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_0 finished after 0.045 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_1 +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_1 finished after 0.011 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.044 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.056 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:43Z INFO 8738 
[sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.020 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.006 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.009 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.003 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/DeConcat]: Running DeConcat +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/DeConcat]: Running DeConcat_iteration_0 +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/DeConcat]: DeConcat_iteration_0 finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/DeConcat]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/DeConcat]: DeConcat finished after 0.010 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.071 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:43Z INFO 8740 
[topk/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.053 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion_iteration_0 +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.013 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.019 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.020 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.061 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.014 seconds +2025-11-04T21:38:43Z INFO 8740 
[topk/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/PartialSimdFusion]: PartialSimdFusion_iteration_0 finished after 0.103 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.014 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.104 seconds +2025-11-04T21:38:43Z INFO 8739 [sg0001/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.011 seconds +2025-11-04T21:38:43Z INFO 8740 [topk/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.089 seconds +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:43Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.008 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.108 seconds +2025-11-04T21:38:44Z INFO 8738 
[sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SpillPSum]: SpillPSum finished after 0.042 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.004 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.004 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_0 +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_0 finished after 0.005 seconds 
+2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_1 +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_1 finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LegalizeType]: LegalizeType finished after 0.016 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.008 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.169 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.015 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.015 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8738 
[sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.036 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.016 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/DeConcat]: Running DeConcat +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/DeConcat]: Running DeConcat_iteration_0 +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/DeConcat]: DeConcat_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/DeConcat]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.026 seconds +2025-11-04T21:38:44Z INFO 8740 
[topk/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.038 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.005 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion_iteration_0 finished after 0.059 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.060 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.043 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion 
+2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/PartialLoopFusion]: PartialLoopFusion_iteration_0 finished after 0.050 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.051 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.012 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.133 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.015 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 
[topk/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.005 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.003 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.013 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DataStreaming]: DataStreaming finished after 0.012 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.014 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.003 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.120 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion 
+2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.050 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/SpillPSum]: SpillPSum finished after 0.049 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.014 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.037 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.038 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.009 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.005 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier 
+2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.030 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 4.177us (296.750KiB, est bw: 72.741GB/s, 20.220% of tot. time) for float32<32 x 2374> TongaSB partitions[0] float32 (32, 2630) %4(init=0.0)[i0.32,i1.2374] = load float32<32 x 2374> float32 (32, 2374) %6[i0.32,i1.2374] # id=7, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.2374]] -> [[i0.32];[i1.2374]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 4.177us (296.750KiB, est bw: 72.741GB/s, 20.220% of tot. time) for float32<32 x 2374> TongaSB partitions[0] float32 (32, 2374) %10[i0.32,i1.2374] = load float32<32 x 2374> float32 (1, 75968) %'inp'[i0.32,i1.2374] # id=9, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.2374]] -> [[i0.32];[i1.2374]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.965us (4.000KiB, est bw: 2.085GB/s, 9.509% of tot. 
time) for float32<32 x 32> TongaSB partitions[0] float32 (32, 32) %485[i0.32,i1.32] = load float32<32 x 32> float32 (32, 32) %3[i0.32,i1.32] # id=13, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.32]] -> [[i0.32];[i1.32]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 9.301% of tot. time) for float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %316[0,i0.256] = load float32<1 x 256> float32 (32, 8) %304[0,i0.256] # id=306, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 9.301% of tot. time) for uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %319[0,i0.256] = load float32<1 x 256> float32 (32, 8) %307[0,i0.256] # id=309, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 7.936% of tot. time) for uint32<1 x 256> uint32 (1, 256) %'topk_indices'[0,i0.256] = store uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %'global_id_buf'(init=0.0)[0,i0.256] # id=322, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 7.936% of tot. 
time) for float32<1 x 256> float32 (1, 256) %'topk_values'[0,i0.256] = store float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %'val_buf'(init=0.0)[0,i0.256] # id=324, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 7.789% of tot. time) for float32<32 x 8> float32 (32, 8) %304[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %296[i0.32,i1.8] # id=305, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 7.789% of tot. 
time) for float32<32 x 8> float32 (32, 8) %307[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %517[i0.32,i1.8] # id=308, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion_iteration_0 finished after 0.045 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.013 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.046 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.062 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.005 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) 
+2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LegalizeType]: LegalizeType finished after 0.012 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.061 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.018 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.022 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8739 
[sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.046 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_1 +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.014 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_1 finished after 0.039 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.042 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_1 +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.086 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_1 finished after 0.009 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.054 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.004 seconds +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LegalizeSundaAccess]: Running 
LegalizeSundaAccess +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.003 seconds +2025-11-04T21:38:44Z INFO 8738 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:44Z INFO 8739 [sg0001/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.033 seconds +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:44Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.026 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.008 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.029 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/TensorInitialization]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.043 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.014 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 
[sg0001/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.045 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/ExpandISAMacro]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.006 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 
0.001 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.011 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.012 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DataStreaming]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DataStreaming]: DataStreaming finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.042 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.003 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.008 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: Running 
NeuronLICM +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.023 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.046 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.017 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.020 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.052 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_1 +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_1 finished after 0.042 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 
[sg0001/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/SpillPSum]: SpillPSum finished after 0.059 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.095 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.008 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.009 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LegalizeType]: LegalizeType finished after 0.026 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.028 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.166 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_1 +2025-11-04T21:38:45Z INFO 8738 
[sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_1 finished after 0.025 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.135 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.009 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.441 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.014 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.018 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:45Z INFO 8739 
[sg0001/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.175 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.017 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimpleAllReduceTiling]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/InsertCoreBarrier]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.009 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.180 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 
8740 [topk/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.034 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.008 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 244.771us (48.000MiB, est bw: 205.627GB/s, 17.910% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[5] bfloat16 (2, 2, 6, 2, 2, 128, 2048) %1783[i11_0,i11_1_0,2i10_0_0_1_0+i10_0_0_1_1,i10_0_0_0,c2_1397,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input69'[i10_0_0_0,2i10_0_0_1_0+i10_0_0_1_1,i0.128,c2_1397,i1.2048] # id=1658, src_id=None, , instances=96 # dl = tensor_op_name: _dot.4 | hlo_id: 40 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 244.771us (48.000MiB, est bw: 205.627GB/s, 17.910% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[5] bfloat16 (2, 2, 6, 2, 2, 128, 2048) %1781[i16_0_1432,i13_1_0,2i12_0_0_1_0+i12_0_0_1_1,i12_0_0_0,c2_1408,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input71'[i12_0_0_0,2i12_0_0_1_0+i12_0_0_1_1,i0.128,c2_1408,i1.2048] # id=1661, src_id=None, , instances=96 # dl = tensor_op_name: _dot.5 | hlo_id: 30 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/DataStreaming]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 198.539us (24.000MiB, est bw: 126.755GB/s, 14.527% of tot. time) for bfloat16<128 x 512> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 6, 128, 2, 512) %'input68_local_1426'[i16_0_1432,i15_0_0_0_1,i15_0_0_0_0,c1_1418_2435,c2_1419_2435,i0.128,i3.2,i1.128+128i2.2+256p_1957_2435] = load bfloat16<128 x 512> {'CrossPassTensor': ''}bfloat16 (4, 2, 2, 128, 6, 2, 2, 128) %'input68'[i15_0_0_0_1+2i15_0_0_0_0,p_1957_2435,c1_1418_2435,i0.128,c2_1419_2435,i3.2,i2.2,i1.128] # id=1667, src_id=None, , instances=192 # dl = tensor_op_name: _dot.6 | hlo_id: 51 | [[i0.128];[i1.128, i2.2, i3.2]] -> [[i0.128];[i1.128, i2.2, i3.2]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 82.457us (16.000MiB, est bw: 203.466GB/s, 6.033% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 2, 128, 2048) %1784[i37_0,i37_1_0,i38_0_0,c1_1442,c2_1443,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 2, 128, 4096) %'input78'[i38_0_0,c1_1442,i0.128,i1.2048+2048c2_1443] # id=1681, src_id=None, , instances=32 # dl = tensor_op_name: _dot.9 | hlo_id: 71 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 3.064% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %'1350.1923'[i11_0,i11_1_0,T_i2_0_2433,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2, 2, 512, 2048) %'add.4'[i11_0,i11_1_0,i0.128+128T_i2_0_2433,i1.2048] # id=1796, src_id=None, , instances=16 # dl = tensor_op_name: add.4_pftranspose_1350 | hlo_id: 15 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 3.064% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_reload_1777'[i16_0_1432,i13_1_0,i4_0_0_711_1780_2434,i4_0_1_1780_0_2434,i0.128,i1.2048] = load bfloat16<128 x 2048> DRAM3DBlk partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_spill_1774'[i4_0_0_711_1780_2434,i4_0_1_1780_0_2434,i16_0_1432,i13_1_0,i0.128,i1.2048] # id=1779, src_id=None, , instances=16 # dl = tensor_op_name: _dot.5 | hlo_id: 30 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 3.064% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %'1354.1928'[i37_0,i37_1_0,T_i2_0_2436,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (4194304,) %'all_reduce.1-buffer-2494'[2097152i37_0+2048i0.128+1048576i37_1_0+i1.2048+262144T_i2_0_2436] # id=1805, src_id=None, , instances=16 # dl = tensor_op_name: all_reduce.1_pftranspose_1354 | hlo_id: 54 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 3.064% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_reload_1788'[i67_0,i67_1_0_0,i51_0_0_1791,i51_0_1_0_1791,i0.128,i1.2048] = load bfloat16<128 x 2048> DRAM3DBlk partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_spill_1785'[i51_0_0_1791,i51_0_1_0_1791,i67_0,i67_1_0_0,i0.128,i1.2048] # id=1790, src_id=None, , instances=16 # dl = tensor_op_name: _dot.8 | hlo_id: 114 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 3.064% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_reload_1788_reload_1794'[i2_0_1518,i2_1_0_1518_0,i51_0_0_1791_1793,i51_0_1_0_1791_1793,i0.128,i1.2048] = load bfloat16<128 x 2048> DRAM3DBlk partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_spill_1785'[i51_0_0_1791_1793,i51_0_1_0_1791_1793,i2_0_1518,i2_1_0_1518_0,i0.128,i1.2048] # id=1792, src_id=None, , instances=16 # dl = tensor_op_name: _dot.8 | hlo_id: 114 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 3.064% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 16, 128) %'get_tuple_element.2_local_1526'[i98_0_0_0_1543,c0_1520_0,c0_1520_1,c1_1521,i0.128,i1.16,i2.128] = load bfloat16<128 x 2048> non_local bfloat16 (4, 2, 128, 16, 128) %'get_tuple_element.2'[2c0_1520_0+c0_1520_1,c1_1521,i0.128,i1.16,i2.128] # id=1733, src_id=None, , instances=16 # dl = tensor_op_name: _dot.10 | hlo_id: 173 | [[i0.128];[i2.128, i1.16]] -> [[i0.128];[i2.128, i1.16]] +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.036 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.012 seconds +2025-11-04T21:38:45Z INFO 8739 [sg0001/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.024 seconds +2025-11-04T21:38:45Z INFO 8738 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.008 seconds +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: Running 
SimplifyNeuronTensor +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/DoNothing]: DoNothing finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running 
NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:45Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:45Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: 
LateNeuronInstComb finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LegalizeType]: LegalizeType finished after 0.003 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.004 seconds +2025-11-04T21:38:46Z INFO 8739 
[attention_isa_kernel/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Finished 
(changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.111 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.005 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMALocalityOpt]: 
DMALocalityOpt finished after 0.014 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DataStreaming]: DataStreaming finished after 0.032 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/DMAProfiler]: 
DMAProfiler finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8739 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/OptimizeNKIKernels]: Allocate SB of shape (128, 60284) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/OptimizeNKIKernels]: Allocate PSUM of shape (8, 128, 2048) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.484 seconds +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.231 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_1 +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_1 finished after 0.020 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.095 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8738 
[sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.068 seconds +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.543 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.023 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.013 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.012 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.012 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.007 seconds 
+2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.005 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/InsertCoreBarrier]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.007 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 2.014us (2.000KiB, est bw: 1.017GB/s, 12.329% of tot. time) for float32<32 x 16> TongaSB partitions[0] float32 (32, 272) %4(init=0.0)[i0.32,i1.16] = load float32<32 x 16> float32 (32, 16) %6[i0.32,i1.16] # id=7, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.16]] -> [[i0.32];[i1.16]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 2.014us (2.000KiB, est bw: 1.017GB/s, 12.329% of tot. time) for float32<32 x 16> TongaSB partitions[0] float32 (32, 16) %10[i0.32,i1.16] = load float32<32 x 16> float32 (1, 512) %'inp'[i0.32,i1.16] # id=9, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.16]] -> [[i0.32];[i1.16]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.965us (4.000KiB, est bw: 2.085GB/s, 12.028% of tot. 
time) for float32<32 x 32> TongaSB partitions[0] float32 (32, 32) %485[i0.32,i1.32] = load float32<32 x 32> float32 (32, 32) %3[i0.32,i1.32] # id=13, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.32]] -> [[i0.32];[i1.32]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 11.765% of tot. time) for float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %316[0,i0.256] = load float32<1 x 256> float32 (32, 8) %304[0,i0.256] # id=306, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 11.765% of tot. time) for uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %319[0,i0.256] = load float32<1 x 256> float32 (32, 8) %307[0,i0.256] # id=309, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 10.038% of tot. time) for uint32<1 x 256> uint32 (1, 256) %'topk_indices'[0,i0.256] = store uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %'global_id_buf'(init=0.0)[0,i0.256] # id=322, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. 
DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 10.038% of tot. time) for float32<1 x 256> float32 (1, 256) %'topk_values'[0,i0.256] = store float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %'val_buf'(init=0.0)[0,i0.256] # id=324, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 9.852% of tot. time) for float32<32 x 8> float32 (32, 8) %304[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %296[i0.32,i1.8] # id=305, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 9.852% of tot. time) for float32<32 x 8> float32 (32, 8) %307[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %517[i0.32,i1.8] # id=308, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.008 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 82.457us (16.000MiB, est bw: 203.466GB/s, 12.767% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 2, 128, 2048) %1881[i34_0,i34_1_0_0,i35_0_0,c1_1532,c2_1533,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 2, 128, 2, 2048) %'input67'[i35_0_0,c1_1532,i0.128,c2_1533,i1.2048] # id=1742, src_id=None, , instances=32 # dl = tensor_op_name: _dot.2 | hlo_id: 32 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 51.143us (4.000MiB, est bw: 82.011GB/s, 7.919% of tot. time) for bfloat16<128 x 256> TongaSB partitions[3] bfloat16 (2, 2, 16, 128, 256) %'transpose.1_pftranspose_1450'[T_i2_1_0_1454,T_i2_0_1454,i3_0,i0.128,i1.256] = indirect_load bfloat16<128 x 256> {'CrossPassTensor': ''}bfloat16 (151936, 2, 2, 256) %'input60'[i0.128,T_i2_0_1454,T_i2_1_0_1454,i1.256] generic generic_dims:[0] generic_addrs: int32<128 x 1> TongaSB partitions[1] int32 (2, 128, 16, 1) %'input0_local_1493'[T_i2_1_0_1454,i0.128,i3_0,0] # id=1698, src_id=None, , attrs={'mode': OOBMode.ERROR}, instances=64 # dl = tensor_op_name: _gather.41 | hlo_id: 12 | [[i0.128];[i1.256]] -> [[i0.128];[i1.256]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 6.484% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2, 2, 512) %'intermediate0_pftranspose_1455'[i0_0,i1_1_0,i1_1_1_0,i0.128,i3.2,i2.2,i1.512] = load bfloat16<128 x 2048> DRAM2DBlk partitions[1] bfloat16 (2, 1, 2, 4, 128, 2, 2, 512) %'all_gather.1'[i1_1_0,0,i3.2,i1_1_1_0,i0.128,i0_0,i2.2,i1.512] # id=1701, src_id=None, , instances=16 # dl = tensor_op_name: UnnamedModule | hlo_id: 1 | [[i0.128];[i1.512, i2.2, i3.2]] -> [[i0.128];[i1.512, i2.2, i3.2]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 6.484% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2, 2, 512) %'custom-call.177.1878'[i17_0_1521_1880,i16_0_1_0_1521_1880,i16_0_1_1_1521_1880,i0.128,i3.2,i2.2,i1.512] = load bfloat16<128 x 2048> DRAM2DBlk partitions[1] bfloat16 (2, 1, 2, 4, 128, 2, 2, 512) %'all_gather.1'[i16_0_1_0_1521_1880,0,i3.2,i16_0_1_1_1521_1880,i0.128,i17_0_1521_1880,i2.2,i1.512] # id=1737, src_id=None, , instances=16 # dl = tensor_op_name: _custom-call.177 | hlo_id: 24 | [[i0.128];[i1.512, i2.2, i3.2]] -> [[i0.128];[i1.512, i2.2, i3.2]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 6.484% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_reload_1885'[i64_0,i64_1_0_0,i48_0_1_0_1888,i48_0_0_1888,i0.128,i1.2048] = load bfloat16<128 x 2048> DRAM3DBlk partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_spill_1882'[i48_0_1_0_1888,i48_0_0_1888,i64_0,i64_1_0_0,i0.128,i1.2048] # id=1887, src_id=None, , instances=16 # dl = tensor_op_name: _dot.1 | hlo_id: 88 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 6.484% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_reload_1885_reload_1891'[i2_0_1578,i2_1_0_1578_0,i48_0_1_0_1888_1890,i48_0_0_1888_1890,i0.128,i1.2048] = load bfloat16<128 x 2048> DRAM3DBlk partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'_spill_1882'[i48_0_1_0_1888_1890,i48_0_0_1888_1890,i2_0_1578,i2_1_0_1578_0,i0.128,i1.2048] # id=1889, src_id=None, , instances=16 # dl = tensor_op_name: _dot.1 | hlo_id: 88 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 6.484% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 16, 128) %'get_tuple_element.1_local_1586'[i95_0_0_0_1603,c0_1580_0,c0_1580_1,c1_1581,i0.128,i1.16,i2.128] = load bfloat16<128 x 2048> non_local bfloat16 (4, 2, 128, 16, 128) %'get_tuple_element.1'[2c0_1580_0+c0_1580_1,c1_1581,i0.128,i1.16,i2.128] # id=1842, src_id=None, , instances=16 # dl = tensor_op_name: _dot.3 | hlo_id: 147 | [[i0.128];[i2.128, i1.16]] -> [[i0.128];[i2.128, i1.16]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 4.726% of tot. time) for bfloat16<128 x 1024> {'IntermediateTensor': ''}bfloat16 (2, 2, 512, 2, 2, 512) %'intermediate0'(init=0.0)[T_i0_0_1459,T_i0_1_0_1459,i0.128+128T_i0_1_1_1459_0,i2.2,T_i1_1_0_1459,i1.512] = store bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 1024) %'1455.2023'[T_i0_0_1459,T_i1_1_0_1459,T_i0_1_0_1459,T_i0_1_1_1459_0,i0.128,i1.512+512i2.2] # id=2021, src_id=None, , instances=32 # dl = tensor_op_name: intermediate0_pftranspose_1455 | hlo_id: 1 | [[i0.128];[i1.512, i2.2]] -> [[i0.128];[i1.512, i2.2]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 4.726% of tot. time) for bfloat16<128 x 1024> non_local bfloat16 (4194304,) %'dot.4-buffer-2754'[1024i95_0_0_0_1603+2048i0.128+262144i96_0_1603+i1.1024] = store bfloat16<128 x 1024> TongaSB partitions[2] bfloat16 (2, 16, 128, 1024) %1604[i95_0_0_0_1603,i96_0_1603,i0.128,i1.1024] # id=1846, src_id=None, , instances=32 # dl = tensor_op_name: _dot.3 | hlo_id: 147 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 26.236us (2.000MiB, est bw: 79.934GB/s, 4.062% of tot. 
time) for bfloat16<128 x 128> bfloat16 (8, 4, 4096, 128) %'output2'[i0.128,i1.128] generic, generic_dims:[0] generic_addrs: int32<128 x 1> TongaSB partitions[5] int32 (2, 2, 2, 2, 4, 128, 1) %'scatter.6719.2278'[i111_0,i105_0,i105_1,i104_1_0_0,i104_1_0_1,i0.128,0] = indirect_save bfloat16<128 x 128> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 4, 2, 128) %'transpose.19'[i111_0,i104_1_0_0,i105_0,i0.128,i104_1_0_1,i105_1,i1.128] # id=1860, src_id=None, , attrs={'mode': OOBMode.ERROR}, instances=64 # dl = tensor_op_name: _scatter.6719 | hlo_id: 187 | [[i0.128];[i1.128]] -> [[i0.128];[i1.128]] +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.008 seconds +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.007 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:46Z INFO 8738 
[attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 
8738 [attention_isa_kernel/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SpillPSum]: SpillPSum finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 
[attention_isa_kernel/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 
[attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 
+2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z 
INFO 8738 [attention_isa_kernel/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/OptimizeNKIKernels]: Allocate SB of shape (128, 60284) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/OptimizeNKIKernels]: Allocate PSUM of shape (8, 128, 2048) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.372 seconds +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.068 seconds +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8740 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.005 seconds +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.014 seconds +2025-11-04T21:38:46Z INFO 8739 [sg0001/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-11-04T21:38:46Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.074 seconds +2025-11-04T21:38:46Z 
INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.076 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.021 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.053 seconds +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.011 seconds +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.045 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.007 seconds 
+2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LowerShardAxis]: Running LowerShardAxis +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LowerShardAxis]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.008 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 
[cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.013 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LowerShardAxis]: Running LowerShardAxis +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LowerShardAxis]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LowerShardAxis]: LowerShardAxis finished after 0.014 seconds +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: Running 
LateNeuronInstComb_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LowerShardAxis]: LowerShardAxis finished after 0.015 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.066 seconds +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.071 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.065 seconds +2025-11-04T21:38:47Z INFO 8740 
[cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.065 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.007 seconds 
+2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.007 seconds +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LowerToSendRecv]: Running LowerToSendRecv +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LowerToSendRecv]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.008 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.015 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LowerToSendRecv]: Running LowerToSendRecv +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LowerToSendRecv]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8739 [sg0001/Tensorizer/LowerToSendRecv]: LowerToSendRecv finished after 0.006 seconds +2025-11-04T21:38:47Z INFO 8739 
[sg0001/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/LowerToSendRecv]: LowerToSendRecv finished after 0.006 seconds +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.006 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8739 
[sg0001/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.012 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. 
time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/FactorizeBlkDims]: Finished 
(changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.003 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast 
+2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8738 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.227 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 
0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.004 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running 
DeadCodeElimination_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.004 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.009 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.002 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:47Z 
INFO 8740 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:47Z INFO 8740 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.000 seconds +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. 
time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8738 [Tensorizer]: BirCodeGen estimate #instances=2497 in sg0000 +2025-11-04T21:38:48Z INFO 8738 [Tensorizer]: IR signature: 5b877131f2ef8acfc34e97e3867e07024be6f656e97d33a85d081e6375f4e2da for nc00/sg0000/TensorizerBIR +2025-11-04T21:38:48Z INFO 8738 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.000 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 4.612 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:48Z INFO 8739 [sg0001/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.086 seconds +2025-11-04T21:38:48Z INFO 8738 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8738 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.160 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.121 seconds +2025-11-04T21:38:48Z INFO 
8740 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.122 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-11-04T21:38:48Z INFO 8739 [Tensorizer]: BirCodeGen estimate #instances=5010 in sg0001 +2025-11-04T21:38:48Z INFO 8739 [Tensorizer]: IR signature: ae3cf2eeac56439f4dbfa3195a8a1557e2d9206a5ca521c2c69b0869d598cc3d for nc00/sg0001/TensorizerBIR +2025-11-04T21:38:48Z INFO 8739 [sg0001/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.041 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-11-04T21:38:48Z INFO 8739 [sg0001/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8739 [sg0001/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.058 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8738 [Tensorizer]: BirCodeGen estimate #instances=2497 in sg0000 +2025-11-04T21:38:48Z INFO 8738 [Tensorizer]: IR signature: 59e1c733b0daccd547db5862709f43e810d5628a4d5446ad5ca95c693a22ceb7 for nc01/sg0000/TensorizerBIR +2025-11-04T21:38:48Z INFO 8738 [Tensorizer]: Weights total number of bytes: 262402 +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.097 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-11-04T21:38:48Z INFO 8738 [Tensorizer]: Successfully built model. 
+2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.024 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:48Z INFO 8739 [Tensorizer]: BirCodeGen estimate #instances=5010 in sg0001 +2025-11-04T21:38:48Z INFO 8739 [Tensorizer]: IR signature: a1b6adb35cb835694160853d7728d7a2a42f91918c31b0588938b0c674982ff8 for nc01/sg0001/TensorizerBIR +2025-11-04T21:38:48Z INFO 8739 [Tensorizer]: Weights total number of bytes: 262146 +2025-11-04T21:38:48Z INFO 8739 [Tensorizer]: Successfully built model. +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.026 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LowerShardAxis]: Running LowerShardAxis +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LowerShardAxis]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LowerShardAxis]: LowerShardAxis finished after 0.033 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.085 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.085 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/DumpGraphAndMetadata]: 
DumpGraphAndMetadata finished after 0.034 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LowerToSendRecv]: Running LowerToSendRecv +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LowerToSendRecv]: Finished (changed=True) +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/LowerToSendRecv]: LowerToSendRecv finished after 0.034 seconds +2025-11-04T21:38:48Z INFO 8740 [sg0002/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:49Z INFO 8740 [sg0002/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:49Z INFO 8740 [sg0002/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.419 seconds +2025-11-04T21:38:49Z INFO 8740 [Tensorizer]: BirCodeGen estimate #instances=27737 in sg0002 +2025-11-04T21:38:49Z INFO 8740 [Tensorizer]: IR signature: 40b0e2826a5aa877434cdd2afa7ab0c573e5ec22d18ed88beaee2cb56969f285 for nc00/sg0002/TensorizerBIR +2025-11-04T21:38:49Z INFO 8740 [sg0002/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:49Z INFO 8740 [sg0002/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:49Z INFO 8740 [sg0002/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.291 seconds +2025-11-04T21:38:50Z INFO 8740 [Tensorizer]: BirCodeGen estimate #instances=27737 in sg0002 +2025-11-04T21:38:50Z INFO 8740 [Tensorizer]: IR signature: f488c29018b2708cbfbc5b1a6d95ed80e62df8a6ac43f19858b47ac5d0410655 for nc01/sg0002/TensorizerBIR +2025-11-04T21:38:50Z INFO 8740 [Tensorizer]: Weights total number of bytes: 410376 +2025-11-04T21:38:50Z INFO 8740 [Tensorizer]: Successfully built model. 
+2025-11-04T21:38:50Z USER 8698 [root/Tensorizer/Tensorizer]: Tensorizer finished after 15.430 seconds +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: End tensorization +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input60 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input0 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input63 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input67 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input66 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input1 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input65 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input64 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input62 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input61 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input4 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input2 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input5 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input70 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input71 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input69 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input68 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input74 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input78 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input77 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input76 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input75 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input73 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input72 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input6 
+2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input2 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input7 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input367 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input368 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input366 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input365 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input370 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input1 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input369 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Network input: input3 +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:50Z INFO 8698 [job.Frontend.0]: Job #0 finished +2025-11-04T21:38:50Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.Frontend.0 +2025-11-04T21:38:50Z INFO 8698 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 +2025-11-04T21:38:50Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 +2025-11-04T21:38:50Z INFO 8698 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 +2025-11-04T21:38:50Z 
INFO 8698 [job.WalrusDriver.0]: BackendDriver has 6 states with 2 core LNC +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: BackendDriver VNC cwd: /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: BackendDriver: found partitions within VNC, using VNC + MT modular flow. +2025-11-04T21:38:50Z INFO 8698 [job.BIRLinker.1]: Creating directory nc00/sgLnk/sg00 +2025-11-04T21:38:50Z INFO 8698 [job.BIRLinker.2]: Creating directory nc01/sgLnk/sg00 +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: BackendDriver in_state.num_states 6 with 2 core LNC +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: Executing /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/log-neuron-cc.txt -o walrus_bir.out.json --enable-call-graph --enable-mt-backend --link-subgraphs nc00/sg00,nc01/sg00,nc00/sg01,nc01/sg01,nc00/sg02,nc01/sg02 --link-dir sgLnk/sg00 --vnc-nc-per-sengine 2 --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-lower-bound 0.14 --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --print-format json --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json 
/opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen3/dve_info.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --enable-internal-partitioner --dge-levels scalar_dynamic_offset,io,spill_reload,vector_dynamic_offsets --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.neff +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: Working directory is /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: propagate_exit=True +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: use_logger=False +2025-11-04T21:38:50Z INFO 8698 [job.WalrusDriver.0]: expose_stderr=True +2025-11-04T21:38:50Z INFO 9072 [Logging]: Logging to ../log-neuron-cc.txt at level 'INFO' +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: max_allowed_parallelism=12 +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Loading module from nc00/sg00/bir.json +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Loading module from nc01/sg01/bir.json +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Loading module from nc01/sg00/bir.json +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Loading module from nc00/sg01/bir.json +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Loading module from nc00/sg02/bir.json +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Loading module from nc01/sg02/bir.json +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Backend driver mtBackend: true numModules: 6 Cwd: "/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e" 
+2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: DynamicDMA is enabled +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: DynamicDMA levels being enabled: io, spill_reload, scalar_dynamic_offset, vector_dynamic_offsets, +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Modular flow call graph is enabled +2025-11-04T21:38:50Z INFO 9072 [BackendDriver]: Internal partitioner is enabled +2025-11-04T21:38:50Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:50Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=1926 blocks=6 instructions=1664 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z 
INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 159 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 159 memory location(s), 1 block(s), and 72 instruction(s). 
Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 212 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 592 memory location(s), 1 block(s), and 688 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 592 memory location(s), 1 block(s), and 688 instruction(s). 
Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {convert.232.2310}@SB<0,0>(1x2)#Internal DebugInfo: +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: do_nothing finished after 0.004 seconds +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 212 memory location(s), 1 block(s), and 72 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {convert.232.2310}@SB<0,0>(1x2)#Internal DebugInfo: +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: birverifier finished after 0.020 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 110mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 212 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: birverifier finished after 0.050 seconds +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 132mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 212 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: birverifier finished after 0.094 seconds +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 162mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 159 memory location(s), 1 block(s), and 72 instruction(s). 
Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: birverifier finished after 0.099 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 167mb, ru_maxrss: 215mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 159 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: birverifier finished after 0.218 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 221mb, ru_maxrss: 221mb (delta=6mb) +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 592 memory location(s), 1 block(s), and 688 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: birverifier finished after 0.233 seconds +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=7mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 592 memory location(s), 1 block(s), and 688 instruction(s). 
Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:50Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.236 seconds +2025-11-04T21:38:50Z INFO 9072 [BackendPassManager]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=7mb) +2025-11-04T21:38:50Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:50Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=1926 blocks=6 instructions=1664 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:50Z USER 9072 (sg01) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:50Z USER 9072 (sg02) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:50Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=424 blocks=2 instructions=144 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=318 blocks=2 instructions=144 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=0mb) +2025-11-04T21:38:50Z USER 9072 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 424 memory location(s), 2 block(s), and 144 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=1184 blocks=2 instructions=1376 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 318 memory location(s), 2 block(s), and 144 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 1184 memory location(s), 2 block(s), and 1376 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:50Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9072 [BackendPassManager]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=0mb) +2025-11-04T21:38:50Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:50Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=1926 blocks=6 instructions=1664 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ExpandReplication]: Found 0 replicated 
matmults +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 222mb, ru_maxrss: 222mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 592 memory location(s), 1 block(s), and 688 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running unroll +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 592 memory location(s), 1 block(s), and 688 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running unroll +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=592 blocks=1 instructions=688 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg02) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg02) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ExpandReplication]: Found 0 replicated 
matmults +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 224mb, ru_maxrss: 224mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 212 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running unroll +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg00) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 225mb, ru_maxrss: 225mb (delta=0mb) +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 226mb, ru_maxrss: 226mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 159 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running unroll +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg01) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 228mb, ru_maxrss: 228mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 159 memory location(s), 1 block(s), and 72 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running unroll +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=159 blocks=1 instructions=72 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:50Z INFO 9072 (nc01/sg01) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 212 memory location(s), 1 block(s), and 72 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running unroll +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=212 blocks=1 instructions=72 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:50Z INFO 9072 (nc00/sg00) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:50 2025 + +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: sg0000 Instruction count after Unroll: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Total count: 2495 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Matmult: 1281 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: TensorScalarPtr: 340 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: TensorTensor: 268 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: GenericCopy: 222 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Activation: 110 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: DMACopy: 96 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Load: 81 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Save: 70 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Memset: 10 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: CollectiveCompute: 3 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Select: 1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [Unroll]: Unrolled DGE count with Dynamic AP: 96 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: unroll finished after 0.105 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 307mb (delta=83mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 2485 memory location(s), 1 block(s), and 2495 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=2485 blocks=1 instructions=2495 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: dead_code_elim_o1 finished after 0.010 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 311mb (delta=4mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:50 2025 + +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: sg0000 Instruction count after Unroll: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Total count: 2497 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Matmult: 1281 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: TensorScalarPtr: 340 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: TensorTensor: 268 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: GenericCopy: 222 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Activation: 110 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: DMACopy: 97 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Load: 81 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Save: 71 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Memset: 10 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: CollectiveCompute: 3 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Select: 1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [Unroll]: Unrolled DGE count with Dynamic AP: 96 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: unroll finished after 0.136 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 321mb, ru_maxrss: 321mb (delta=90mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2485 memory location(s), 1 block(s), and 2497 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=2485 blocks=1 instructions=2497 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: dead_code_elim_o1 finished after 0.014 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 322mb, ru_maxrss: 322mb (delta=1mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:50 2025 + +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: sg0001 Instruction count after Unroll: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Total count: 5010 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Matmult: 3656 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Load: 284 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: TensorScalarPtr: 254 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: GenericCopy: 245 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: TensorTensor: 240 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Activation: 164 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Save: 73 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: DMACopy: 66 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Memset: 12 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: CollectiveCompute: 2 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Select: 1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [Unroll]: Unrolled DGE count with Dynamic AP: 64 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: unroll finished after 0.276 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 391mb, ru_maxrss: 391mb (delta=165mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2712 memory location(s), 1 block(s), and 5010 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:50 2025 + +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: sg0001 Instruction count after Unroll: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Total count: 5008 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Matmult: 3656 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Load: 284 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: TensorScalarPtr: 254 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: GenericCopy: 245 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: TensorTensor: 240 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Activation: 164 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Save: 72 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: DMACopy: 65 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Memset: 12 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: CollectiveCompute: 2 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Select: 1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [Unroll]: Unrolled DGE count with Dynamic AP: 64 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: unroll finished after 0.284 seconds +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 391mb (delta=163mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=2712 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 2712 memory location(s), 1 block(s), and 5008 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=2712 blocks=1 instructions=5008 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: dead_code_elim_o1 finished after 0.019 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 376mb, ru_maxrss: 391mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: dead_code_elim_o1 finished after 0.018 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 391mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:50 2025 + +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: sg0002 Instruction count after Unroll: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Total count: 15907 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Matmult: 12682 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: GenericCopy: 1516 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Load: 551 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Save: 331 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: TensorTensor: 159 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Gather: 131 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Max: 128 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: MaxIndexAndMatchReplace: 128 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Activation: 119 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: TensorScalarPtr: 86 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Memset: 26 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: CoreBarrier: 13 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: TensorReduce: 13 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: CollectiveCompute: 8 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: StreamShuffle: 4 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Select: 4 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Iota: 3 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Reciprocal: 3 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: DMACopy: 2 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [Unroll]: Unrolled DGE count with Dynamic AP: 1 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: unroll finished after 0.493 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 437mb, ru_maxrss: 437mb (delta=215mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 6470 memory location(s), 1 block(s), and 15907 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=6470 blocks=1 instructions=15907 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: dead_code_elim_o1 finished after 0.039 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 397mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:50 2025 + +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: sg0002 Instruction count after Unroll: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Total count: 15918 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Matmult: 12682 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: GenericCopy: 1516 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Load: 551 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Save: 342 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: TensorTensor: 159 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Gather: 131 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Max: 128 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: MaxIndexAndMatchReplace: 128 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Activation: 119 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: TensorScalarPtr: 86 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Memset: 26 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: CoreBarrier: 13 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: TensorReduce: 13 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: CollectiveCompute: 8 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: StreamShuffle: 4 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Select: 4 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Iota: 3 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Reciprocal: 3 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: DMACopy: 2 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [Unroll]: Unrolled DGE count with Dynamic AP: 1 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: unroll finished after 0.598 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 411mb, ru_maxrss: 437mb (delta=215mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 6470 memory location(s), 1 block(s), and 15918 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=6470 blocks=1 instructions=15918 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: dead_code_elim_o1 finished after 0.038 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.644 seconds +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: curr_vmrss: 363mb, ru_maxrss: 437mb (delta=215mb) +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=11479 blocks=6 instructions=46041 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (sg00) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:51Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=2383 blocks=2 instructions=4989 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (sg00) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (sg01) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:51Z USER 9072 (sg02) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:51Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=2667 blocks=2 instructions=10017 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=6429 blocks=2 instructions=31035 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (sg01) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2667 
memory location(s), 2 block(s), and 10017 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2379 memory location(s), 2 block(s), and 4989 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (sg02) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6429 memory location(s), 2 block(s), and 31035 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.007 seconds +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: curr_vmrss: 363mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=11475 blocks=6 instructions=46041 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {divide.1_1267_i1}@SB<0,0>(1x1024)#Internal DebugInfo: 
+2025-11-04T21:38:51Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {select.5_1272_i1}@SB<0,0>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: birverifier finished after 0.024 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: birverifier finished after 0.022 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: birverifier finished after 0.031 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: birverifier finished after 0.041 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: birverifier finished after 0.097 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 376mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: birverifier finished after 0.105 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.106 seconds +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=11475 blocks=6 instructions=46041 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:51Z USER 9072 (sg01) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:51Z USER 9072 (sg02) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:51Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=2667 blocks=2 instructions=10017 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=2379 blocks=2 instructions=4989 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=6429 blocks=2 instructions=31035 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.002 seconds +2025-11-04T21:38:51Z USER 9072 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.001 
seconds +2025-11-04T21:38:51Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2379 memory location(s), 2 block(s), and 4989 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2667 memory location(s), 2 block(s), and 10017 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.009 seconds +2025-11-04T21:38:51Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6429 memory location(s), 2 block(s), and 31035 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.013 seconds +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:51Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=11475 blocks=6 instructions=46041 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) 
[ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: psum_legalization finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: psum_legalization finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: psum_legalization finished after 0.001 seconds +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: psum_legalization finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z WARNING 9072 (nc00/sg00) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 8 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.004 seconds +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreOpts]: Skipped. 
No pre-opt passes enabled +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z WARNING 9072 (nc01/sg00) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z WARNING 9072 (nc00/sg01) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: instruction_reorder finished after 0.007 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 8 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: instruction_reorder finished after 0.004 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 16 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.006 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ShrinkDN]: INFO (ShrinkDN): Shrunk 1 nodes. Total savings 448 bytes/partition +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ShrinkDN]: INFO (ShrinkDN): Shrunk 1 nodes. 
Total savings 448 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.001 seconds +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: vn_splitter finished after 0.003 seconds +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: psum_legalization finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.002 seconds +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: vn_splitter finished after 0.008 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: psum_legalization finished after 0.006 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: legalize_cce_dma finished after 0.005 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z WARNING 9072 (nc01/sg01) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 16 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy 
removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.007 seconds +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: vn_splitter finished after 0.010 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) 
[VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.005 seconds +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: vn_splitter finished after 0.008 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 
9072 (nc01/sg00) [ModuleForkPass]: constant_propagate finished after 0.015 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: input_dma_coalescing finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: constant_propagate finished after 0.019 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: input_dma_coalescing finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.024 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: remat_optimization finished after 0.008 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Start... 
+2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Found 1 Splits CCs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: Grouped CCs to 1 clusters. +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: legalize_cce_dma finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Start split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: No split opportunities: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: End split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreOpts]: Skipped. 
No pre-opt passes enabled +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z WARNING 9072 (nc01/sg02) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 8 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: remat_optimization finished after 0.008 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Found 1 Splits CCs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: Grouped CCs to 1 clusters. +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Done. 
+2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: Start split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: No split opportunities: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: End split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: End remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Start DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: End remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: Start DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys 
+2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: End DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.040 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [build_flow_deps]: Start build fdeps. Invocation: 1Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [build_flow_deps]: Allocs: 1189 instructions: 2493 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: constant_propagate finished after 0.034 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: End DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [build_flow_deps]: Start build fdeps. Invocation: 2Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [build_flow_deps]: Allocs: 1190 instructions: 2496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: legalize_cce_dma finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z WARNING 9072 (nc00/sg02) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 15 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: constant_propagate finished after 0.037 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: input_dma_coalescing finished after 0.015 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_ac finished after 0.009 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.003 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.007 seconds +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: vn_splitter finished after 0.029 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: input_dma_coalescing finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) 
[RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: remat_optimization finished after 0.003 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: Grouped CCs to 2 clusters. +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Done. 
+2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Start split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: No split opportunities: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: End split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ShrinkDN]: INFO (ShrinkDN): Shrunk 2 nodes. Total savings 14336 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: End remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Start DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [build_flow_deps]: Build fdeps inserted 6900 edges +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [build_flow_deps]: Done build fdeps 6900 Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [RematOpt]: Removed 0 remat 
instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: End build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Start remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove_useless_insts +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: remat_optimization finished after 0.003 seconds +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:51Z INFO 9072 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.007 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.009 seconds +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: vn_splitter finished after 0.022 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: End remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: pre_sched finished after 0.044 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.004 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1189 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=1189 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=1190 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: End DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: size = 290 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: found 708 edges +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: mean: 4.88276 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: median: 5.98549 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: adjacency vectors require 5664 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: Grouped CCs to 2 clusters. +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Done. 
+2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Start split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: No split opportunities: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: End split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: lo = 290 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: total = 290 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 3Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: no more spills +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: coloring_allocator_psum finished after 0.006 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 379mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: dma_optimization_psum finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [build_flow_deps]: Allocs: 1334 instructions: 5010 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 57 PSUM Banks +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: address_rotation_psum finished after 0.006 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 35742468 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 3715 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 21495808 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2399 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4243456 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 172 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: allocating SB +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: size = 855 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: find partners +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: found 148 accumulation groups +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [build_flow_deps]: Build fdeps inserted 6902 edges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [build_flow_deps]: Done build fdeps 6902 Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: End build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 
(nc00/sg00) [PreSched]: Start remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove_useless_insts +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: End remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Start DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: End remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: pre_sched finished after 0.070 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: largest = _dot.3-t1658_i61 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: tensors = 10 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: requires 40960 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: expanding partners +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 []: find first defs for local +2025-11-04T21:38:51Z INFO 9072 []: find first defs for global +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.012 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) 
+2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1190 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=1190 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [build_flow_deps]: Build fdeps inserted 15159 edges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [build_flow_deps]: Done build fdeps 15159 Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: End build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Start remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove_useless_insts +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=1191 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: size = 290 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: End DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: remove Useless Instructions: 0 
+2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: End remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [build_flow_deps]: Start build fdeps. Invocation: 4Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: pre_sched finished after 0.067 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5010 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1334 blocks=1 instructions=5010 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: found 708 edges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: mean: 4.88276 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: median: 5.98549 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: adjacency vectors require 5664 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [build_flow_deps]: Allocs: 1333 instructions: 5007 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [TensorCopyElim]: Tensor CP elimination: 1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: constant_propagate finished after 0.063 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 
block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_ac finished after 0.003 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.008 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5009 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=1333 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5009 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=1334 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 380mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: lo = 290 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: total = 290 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: no more spills +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: coloring_allocator_psum finished after 0.025 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 381mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: size = 326 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: found 1087 edges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: mean: 6.66871 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: median: 6.99997 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: adjacency vectors require 8696 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: find loads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: input_dma_coalescing finished after 0.010 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 381mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: 2 pin count +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) 
[SB_Allocator]: 72 remat count +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: build interference graph +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Num intervals 855 Num locations 855 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: edge: 20447 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: mean: 47.8292 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: median: 43.9454 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: dma_optimization_psum finished after 0.011 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 382mb, ru_maxrss: 437mb 
(delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: safe = 767 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: unsafe = 74 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: inf = 12 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: total = 853 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 855 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Total: 853 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Allocated: 1.000 (853) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Rover zone: 0.931 (794) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Pre-rover zone: 0.009 (8) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: 
Post-rover zone: 0.060 (51) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Blocks nothing: 0.001 (1) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Blocks tall: 0.999 (852) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Visited until tall blocking (mean): 0.994 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: Success +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 
(nc00/sg01) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: lo = 326 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: total = 326 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 35742468 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 3715 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 21495808 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2399 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4243456 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 172 bytes +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: coloring_allocator_sb finished after 0.063 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 382mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: no more spills +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: coloring_allocator_psum finished after 0.019 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 382mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.003 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 382mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1191 memory location(s), 1 block(s), and 2493 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=1191 blocks=1 instructions=2493 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 57238276, 25.8061% input load, 8.24377% output write, 65.9502% spill/reload [sg0000] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: dma_optimization_psum finished after 0.005 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 7864320, 13.7396% out of total dma traffic(1.47709e+07) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: remat_optimization finished after 0.018 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 57 PSUM Banks +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: address_rotation_psum finished after 0.015 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 35742468 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 3715 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 21495810 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2398 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4243456 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 172 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [build_flow_deps]: Build fdeps inserted 15157 edges +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [build_flow_deps]: Done build fdeps 15157 Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: End build flow dependencies Tue Nov 4 21:38:51 2025 
+2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Start remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove_useless_insts +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 24 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 24 spill/reload memory locations +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: allocating SB +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: size = 856 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 12582912, 33.3333% out of total spill/reload dma traffic 
+2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: find partners +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: found 148 accumulation groups +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: largest = _dot.3-t1658_i31 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: tensors = 10 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: requires 40960 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: expanding partners +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: infer_stream_ids finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15908 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=3439 blocks=1 instructions=15908 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations 
+2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: End remove useless insts Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 []: find first defs for local +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: Grouped CCs to 2 clusters. 
+2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 []: find first defs for global +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: constant_propagate finished after 0.115 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: address_rotation_psum finished after 0.015 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: average loaded DMA size 3448 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: average saved DMA size 2180 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 103916036 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2889 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 27262978 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2957 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2129920 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 19489540 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 3448 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 17301504 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average 
saved DMA size 2180 bytes +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: Start split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 20447232, 35.723% out of total dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 36791044, 28.7479% input load, 12.8254% output write, 58.4267% spill/reload [sg0000] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 19489540 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 3448 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 17301504 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2180 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 4243456 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 172 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1072 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: INFO: Finished 
set_spill_canreadUninit(module); +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: dma_optimization_sb finished after 0.020 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1151 memory location(s), 1 block(s), and 2454 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1151 blocks=1 instructions=2454 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: allocating SB +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: size = 952 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: find loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: 2 pin count +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: 72 remat count +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: build interference graph +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: find partners +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: Num_Splits: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: End split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 
9072 (nc00/sg02) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: found 282 accumulation groups +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: largest = _dot.6-t1592_i7 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: expanding partners +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Num intervals 856 Num locations 856 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: edge: 20461 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: mean: 47.8061 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: median: 44.1866 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 15 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_ac finished after 0.010 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 384mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 []: find first defs for local +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_redundant_memsets: 5 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 []: find first defs for global +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: safe = 768 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: unsafe = 74 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: inf = 12 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: total = 854 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 856 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Total: 854 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Allocated: 1.000 (854) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Rover zone: 0.931 (795) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Pre-rover zone: 0.009 (8) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Post-rover zone: 0.060 (51) +2025-11-04T21:38:51Z INFO 
9072 (nc00/sg00) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Blocks nothing: 0.001 (1) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Blocks tall: 0.999 (853) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Visited until tall blocking (mean): 0.993 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: Success +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: pre_sched finished after 0.114 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 
(nc00/sg02) [PreSched]: End remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: Start DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5007 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 35742468 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 3715 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 21495810 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2398 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4243456 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 172 bytes +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: coloring_allocator_sb finished after 0.032 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1333 blocks=1 instructions=5007 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 43 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [TensorCopyElim]: Tensor CP elimination: 1 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.006 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1192 memory location(s), 1 block(s), and 2496 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=1192 blocks=1 instructions=2496 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 57238278, 25.8061% input load, 8.24377% output write, 65.9502% spill/reload [sg0000] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 48 Sb address +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: input_dma_coalescing finished after 0.018 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.029 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1151 memory location(s), 1 block(s), and 2454 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=1151 blocks=1 instructions=2454 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: reserved space = 196864 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 
9072 (nc01/sg00) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:51Z INFO 9072 []: find first defs for local +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:51Z INFO 9072 []: find first defs for global +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: DRAM hwm after allocation: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: coloring_allocator_dram finished after 0.004 
seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1151 memory location(s), 1 block(s), and 2454 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=1151 blocks=1 instructions=2454 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DRAM hwm before rotation 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 7864320, 13.7396% out of total dma traffic(1.47709e+07) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: allreduce hwm 8388608 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: Real CC buffer size 8388608 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DRAM hwm after rotation 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: address_rotation_dram finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1151 memory location(s), 1 block(s), and 2454 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=1151 blocks=1 instructions=2454 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [TensorCopyAccel::Impl]: Accelerated 52 out of 230 tensorcopy in Function: sg0000 average acceleration factor: 1 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1151 memory location(s), 1 block(s), and 2454 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=1151 blocks=1 instructions=2454 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: peephole_opts finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 385mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1151 memory location(s), 1 block(s), and 2455 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=1151 blocks=1 instructions=2455 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Start of kernel lowering pass, number of insts: 2455, number of allocs: 1151 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2769-0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Scan BKs time (s): 0.000213 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Input/output shapes for Kernel inst [I-2769-0] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: input0: [ 4 128 2048 ] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: input1: [ 4 128 2048 ] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: input2: [ 4 2048 128 ] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: input3: ap +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: output0: [ 4 128 2048 ] +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 1048576 +Memory Location: {reshape.16}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 1048576 +Memory Location: {reshape.24}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) 
[LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 2048 / 2048 = 1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Scratch sbuf for kernel I-2769-0: [105472, 165756) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 24 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 24 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z USER 9072 
(nc01/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.025 seconds +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 12582912, 33.3333% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: End DCE Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [LowerKernel]: Lower BKs time (s): 0.034601 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_kernel finished after 0.011 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 394mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2294 memory location(s), 1 block(s), and 4458 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2294 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2294 memory location(s), 1 block(s), and 4458 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2294 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2294 memory location(s), 1 block(s), and 4458 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2294 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1332 memory location(s), 1 block(s), and 5006 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=1332 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1333 memory location(s), 1 block(s), and 5006 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=1333 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 392mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 5Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: find loads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg02) [build_flow_deps]: Allocs: 3439 instructions: 15903 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: 2 pin count +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: 179 remat count +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: build interference graph +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: main loop +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Num intervals 952 Num locations 952 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: average loaded DMA size 3448 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: average saved DMA size 2179 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 19489540 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 3448 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 17301506 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post 
DMA coalescing average saved DMA size 2179 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: size = 326 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 20447232, 35.723% out of total dma traffic +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 36791046, 28.7479% input load, 12.8254% output write, 58.4267% spill/reload [sg0000] +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 19489540 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 3448 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 17301506 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2179 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 4243456 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 172 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1072 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: dma_optimization_sb finished after 0.044 seconds 
+2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 394mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1152 memory location(s), 1 block(s), and 2457 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1152 blocks=1 instructions=2457 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: remat_optimization finished after 0.043 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: found 1087 edges +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: mean: 6.66871 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: median: 6.99997 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: adjacency vectors require 8696 bytes +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: edge: 34912 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: mean: 73.3445 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: median: 62.8963 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) 
[SB_Allocator]: find costs +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 394mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 15 Sb address +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 395mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: infer_stream_ids finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 395mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15127 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=2990 blocks=1 instructions=15127 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: safe = 594 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: unsafe = 283 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: inf = 73 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: total = 950 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 194 #Pinned 0 #Safe 0 minCost 0.00452202 maxCost 0.0359378 locations 952 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: new candidates = 55 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: Grouped CCs to 2 clusters. 
+2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.040 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 396mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2310 memory location(s), 1 block(s), and 4458 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=2310 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Total: 950 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Allocated: 1.000 (950) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Rover zone: 0.774 (735) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Pre-rover zone: 0.013 (12) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Post-rover zone: 0.214 (203) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Blocks nothing: 0.001 (1) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Blocks tall: 0.999 (949) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Visited until tall blocking (mean): 0.995 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: 
Success +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:51Z INFO 9072 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreSched]: Start split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 396mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2310 memory location(s), 1 block(s), and 4458 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2310 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:51Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg00) Non 
- output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:51Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreSched]: Num_Splits: 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreSched]: End split live ranges Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg02) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 103916036 +2025-11-04T21:38:51Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2889 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 27262978 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2957 bytes +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2129920 +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: coloring_allocator_sb finished after 0.105 seconds +2025-11-04T21:38:51Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: 
{I-2769-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 396mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: lo = 326 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: total = 326 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: select ranges +2025-11-04T21:38:51Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: no more spills +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: coloring_allocator_psum finished after 0.052 seconds +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 396mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:51Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:51Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_redundant_memsets: 1 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_redundant_loads +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: dma_optimization_psum finished after 0.018 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 398mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 43 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 48 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.031 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 399mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1335 memory location(s), 1 block(s), and 5009 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=1335 blocks=1 instructions=5009 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 131179014, 60.0326% input load, 3.19739% output write, 36.77% spill/reload [sg0001] +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: birverifier finished after 0.042 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 399mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2310 memory location(s), 1 block(s), and 4458 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=2310 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: End remove redundncies Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: Start DCE Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: dynamic_dma_scan finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2310 memory location(s), 1 block(s), and 4458 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=2310 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 6Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.071 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1152 memory location(s), 1 block(s), and 2457 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=1152 blocks=1 instructions=2457 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [build_flow_deps]: Allocs: 2310 instructions: 4458 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: reserved space = 196864 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:52Z INFO 9072 []: find first defs for local +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: adjusted 0 DMACopy 
remat +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-11-04T21:38:52Z INFO 9072 []: find first defs for global +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: sub-graph will get execute 27 times +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: DRAM hwm after allocation: 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Load 
Merging + Load shrink] reduced input/const loading DMA traffic 9961472, 7.5938% out of total dma traffic(7.87502e+07) +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: coloring_allocator_dram finished after 0.011 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1152 memory location(s), 1 block(s), and 2457 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=1152 blocks=1 instructions=2457 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DRAM hwm before rotation 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: allreduce hwm 8388608 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: Real CC buffer size 8388608 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DRAM hwm after rotation 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: address_rotation_dram finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1152 memory location(s), 1 block(s), and 2457 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=1152 blocks=1 instructions=2457 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [TensorCopyAccel::Impl]: Accelerated 52 out of 231 tensorcopy in Function: sg0000 average acceleration factor: 1 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: tensorcopy_accel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 401mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1152 memory location(s), 1 block(s), and 2457 instruction(s). Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=1152 blocks=1 instructions=2457 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: peephole_opts finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 401mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1152 memory location(s), 1 block(s), and 2458 instruction(s). 
Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=1152 blocks=1 instructions=2458 Max writers: 32 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Start of kernel lowering pass, number of insts: 2458, number of allocs: 1152 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2769-0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Scan BKs time (s): 0.002549 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Input/output shapes for Kernel inst [I-2769-0] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: input0: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: input1: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: input2: [ 4 2048 128 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: input3: ap +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: output0: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 0 +Memory Location: {reshape.16}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 0 +Memory Location: {reshape.24}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) 
[LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 2048 / 2048 = 1 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Scratch sbuf for kernel I-2769-0: [105472, 165756) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 12 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 12 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: address_rotation_psum finished after 0.044 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 404mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 103916036 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2889 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 27262976 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2958 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2129920 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 2 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 2 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: allocating SB +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: main loop +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Creating 
identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: size = 951 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 7340032, 15.2174% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: find partners +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [build_flow_deps]: Build fdeps inserted 52134 edges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [build_flow_deps]: Done build fdeps 52134 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: End build flow dependencies Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: Start remove useless insts Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: found 282 accumulation groups +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [build_flow_deps]: Build fdeps inserted 10913 edges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [build_flow_deps]: Done build fdeps 10913 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: build_fdeps finished after 0.044 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: largest = _dot.6-t1592_i40 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 408mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) 
[SB_Allocator]: expanding partners +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2310 memory location(s), 1 block(s), and 4458 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=2310 blocks=1 instructions=4458 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 []: find first defs for local +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: End remove useless insts Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: remove_redundancies finished after 0.011 seconds 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 412mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2298 memory location(s), 1 block(s), and 4430 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2298 blocks=1 instructions=4430 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 []: find first defs for global +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [LowerKernel]: Lower BKs time (s): 0.098138 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_kernel finished after 0.035 seconds 
+2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 417mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2295 memory location(s), 1 block(s), and 4461 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2295 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 416mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2295 memory location(s), 1 block(s), and 4461 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2295 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 416mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2295 memory location(s), 1 block(s), and 4461 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2295 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: End DCE Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 7Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: average loaded DMA size 2755 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: average saved DMA size 2908 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 88187396 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2755 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 25690114 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 2908 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [build_flow_deps]: Allocs: 2990 instructions: 15126 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 17301504, 13.1892% out of total dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 113877510, 60.4059% input load, 3.68317% output write, 35.9109% spill/reload [sg0001] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 88187396 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2755 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 25690114 
+2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2908 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 2129920 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 130 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 2027 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: dma_optimization_sb finished after 0.093 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 417mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1300 memory location(s), 1 block(s), and 4975 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1300 blocks=1 instructions=4975 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: find loads +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: 2 pin count +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: 179 remat count +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: build interference graph +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 13 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Num intervals 951 Num locations 951 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.051 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 421mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2311 memory location(s), 1 block(s), and 4461 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=2311 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 421mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2311 memory location(s), 1 block(s), and 4461 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2311 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2769-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,114948>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: IntervalTree readback Done 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: edge: 34898 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: mean: 73.3922 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: median: 63.0205 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: find costs +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.087 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 423mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2298 memory location(s), 1 block(s), and 4430 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2298 blocks=1 instructions=4430 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: safe = 593 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: unsafe = 283 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: inf = 73 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: total = 949 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 194 #Pinned 0 #Safe 0 minCost 0.00452202 maxCost 0.0359378 locations 951 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: new candidates = 55 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation 
rotated 98 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Total: 949 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Allocated: 1.000 (949) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Rover zone: 0.774 (735) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Pre-rover zone: 0.012 (11) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Post-rover zone: 0.214 (203) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Blocks nothing: 0.001 (1) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Blocks tall: 0.999 (948) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Visited until tall blocking (mean): 0.996 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: Success +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: birverifier finished after 0.032 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 422mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [TensorCopyElim]: Tensor CP elimination: 64 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: pre_sched finished after 0.305 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 422mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2311 memory location(s), 1 block(s), and 4461 
instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3439 memory location(s), 1 block(s), and 15903 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=3439 blocks=1 instructions=15903 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=2311 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: dynamic_dma_scan finished after 0.003 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 422mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2311 memory location(s), 1 block(s), and 4461 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=2311 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 8Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [build_flow_deps]: Allocs: 2311 instructions: 4461 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.019 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 103916036 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2889 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 27262976 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2958 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2129920 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: coloring_allocator_sb finished after 0.150 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: 
curr_vmrss: 424mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 424mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4366 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2233 blocks=1 instructions=4366 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [TensorCopyElim]: Tensor CP elimination: 63 +2025-11-04T21:38:52Z USER 9072 (nc01/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.010 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 425mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4366 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.015 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 425mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1334 memory location(s), 1 block(s), and 5006 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=1334 blocks=1 instructions=5006 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 131179012, 60.0326% input load, 3.19739% output write, 36.77% spill/reload [sg0001] +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [build_flow_deps]: Build fdeps inserted 40220 edges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [build_flow_deps]: Done build fdeps 40220 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: End build flow dependencies Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: Start remove useless insts Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [build_flow_deps]: Build fdeps inserted 10915 edges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [build_flow_deps]: Done build fdeps 10915 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: build_fdeps finished after 0.037 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 424mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: 
Output has 1 module(s), 1 function(s), 2311 memory location(s), 1 block(s), and 4461 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=2311 blocks=1 instructions=4461 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: remove_redundancies finished after 0.004 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 424mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2299 memory location(s), 1 block(s), and 4433 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2299 blocks=1 instructions=4433 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: End remove useless insts Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 54 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: sub-graph will get execute 27 times +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 9961472, 7.5938% out of total dma traffic(7.87502e+07) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 12 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 12 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 8 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 2 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 2 spill/reload memory locations +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.053 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 430mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2299 memory location(s), 1 block(s), and 4433 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.166 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1300 memory location(s), 1 block(s), and 4975 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=1300 blocks=1 instructions=4975 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2299 blocks=1 instructions=4433 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: reserved space = 196608 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: spill space = 6815744 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: aligned spill space = 6815744 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: size = 13 +2025-11-04T21:38:52Z INFO 9072 []: find first defs for local +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 7340032, 15.2174% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 []: find first defs for global 
+2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [TensorCopyElim]: Tensor CP elimination: 64 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Num intervals 13 Num locations 13 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) 
[DRAM_Allocator]: lo = 13 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: total = 13 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: DRAM hwm after allocation: 4194304 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: coloring_allocator_dram finished after 0.017 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1300 memory location(s), 1 block(s), and 4975 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=1300 blocks=1 instructions=4975 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: pre_sched finished after 0.336 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2990 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.027 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 427mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DRAM hwm before rotation 4194304 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4369 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2234 blocks=1 instructions=4369 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: average loaded DMA size 2755 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: average saved DMA size 2908 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 88187396 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2755 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 25690112 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 2908 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: 
allreduce buffer size 524288000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: allreduce hwm 8388608 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 17301504, 13.1892% out of total dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 113877508, 60.4059% input load, 3.68317% output write, 35.9109% spill/reload [sg0001] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: Real CC buffer size 8388608 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 88187396 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2755 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 25690112 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2908 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 2129920 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 130 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 2027 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: DRAM hwm after rotation 4194304 +2025-11-04T21:38:52Z INFO 9072 
(nc00/sg01) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: dma_optimization_sb finished after 0.093 seconds +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: address_rotation_dram finished after 0.012 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 427mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 427mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1299 memory location(s), 1 block(s), and 4972 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1300 memory location(s), 1 block(s), and 4975 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1299 blocks=1 instructions=4972 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.005 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 427mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=1300 blocks=1 instructions=4975 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:52Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4369 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [TensorCopyAccel::Impl]: Accelerated 36 out of 256 tensorcopy in Function: sg0001 average acceleration factor: 1 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: tensorcopy_accel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1300 memory location(s), 1 block(s), and 4975 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=1300 blocks=1 instructions=4975 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [PeepholeOpts]: PeepholeOpts enabled? 
Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: peephole_opts finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1300 memory location(s), 1 block(s), and 4976 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=1300 blocks=1 instructions=4976 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Start of kernel lowering pass, number of insts: 4976, number of allocs: 1300 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2513-0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Scan BKs time (s): 0.002254 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Input/output shapes for Kernel inst [I-2513-0] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: input0: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: input1: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: input2: [ 4 2048 128 ] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: input3: ap +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: output0: [ 4 128 2048 ] 
+2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 0 +Memory Location: {reshape.60}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 0 +Memory Location: {reshape.68}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 2048 / 2048 = 1 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Scratch sbuf for kernel I-2513-0: [61440, 121724) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 13 Sb address +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.136 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 429mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3376 memory location(s), 1 block(s), and 15840 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=3376 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3377 memory location(s), 1 block(s), and 15840 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=3377 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 428mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 430mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 430mb, ru_maxrss: 437mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 98 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [LowerKernel]: Lower BKs time (s): 0.064573 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_kernel finished after 0.029 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 439mb, ru_maxrss: 439mb (delta=2mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2443 memory location(s), 1 block(s), and 6979 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2443 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 439mb, ru_maxrss: 439mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2443 memory location(s), 1 block(s), and 6979 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2443 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 439mb, ru_maxrss: 439mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2443 memory location(s), 1 block(s), and 6979 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2443 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.056 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 440mb, ru_maxrss: 440mb (delta=3mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2990 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=2990 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 439mb, ru_maxrss: 440mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2991 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=2991 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 439mb, ru_maxrss: 440mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 439mb, ru_maxrss: 440mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 440mb, ru_maxrss: 440mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: main loop +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: size = 1278 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: main loop +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.051 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 442mb, ru_maxrss: 442mb (delta=3mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: renumber locations 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: size = 1154 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2459 memory location(s), 1 block(s), and 6979 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=2459 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 442mb, ru_maxrss: 442mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2459 memory location(s), 1 block(s), and 6979 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2459 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: found 1645 edges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: mean: 2.57433 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: median: 1.68169 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: adjacency vectors require 13160 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: find costs +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:52Z INFO 9072 
(nc01/sg02) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: found 1583 edges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: mean: 2.7435 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: median: 1.98285 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: adjacency vectors require 12664 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: find costs +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 54 Sb address +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: birverifier finished after 0.024 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 443mb, ru_maxrss: 443mb (delta=1mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2459 memory location(s), 1 block(s), and 6979 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=2459 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: lo = 1204 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: hi = 74 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: total = 1278 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: select ranges +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 443mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2459 memory location(s), 1 block(s), and 6979 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=2459 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 9Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [build_flow_deps]: Allocs: 2459 instructions: 6979 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: no more spills +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: coloring_allocator_psum finished after 0.109 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 443mb, ru_maxrss: 443mb (delta=6mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 8 Sb address +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: lo = 1080 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: hi = 74 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: total = 1154 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: simplify 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: no more spills +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: coloring_allocator_psum finished after 0.073 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 442mb, ru_maxrss: 443mb (delta=3mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.136 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 443mb (delta=6mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1299 memory location(s), 1 block(s), and 4972 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=1299 blocks=1 instructions=4972 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: reserved space = 196608 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: spill space = 6815744 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: aligned spill space = 6815744 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: size = 13 +2025-11-04T21:38:52Z INFO 9072 []: find first defs for local +2025-11-04T21:38:52Z INFO 9072 []: find first defs for global +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: dma_optimization_psum finished after 0.027 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [build_flow_deps]: Build fdeps inserted 19244 edges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [build_flow_deps]: Done build fdeps 19244 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: build_fdeps finished after 0.039 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2459 memory location(s), 1 block(s), and 6979 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=2459 blocks=1 instructions=6979 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: Num intervals 13 Num locations 13 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: lo = 13 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:52Z 
INFO 9072 (nc01/sg01) [DRAM_Allocator]: total = 13 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: DRAM hwm after allocation: 4194304 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: coloring_allocator_dram finished after 0.021 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1299 memory location(s), 1 block(s), and 4972 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=1299 blocks=1 instructions=4972 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: remove_redundancies finished after 0.007 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DRAM hwm before rotation 4194304 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2447 memory location(s), 1 block(s), and 6951 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: dma_optimization_psum finished after 0.028 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2447 blocks=1 instructions=6951 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: allreduce hwm 8388608 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: Real CC buffer size 8388608 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DRAM hwm after rotation 4194304 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: address_rotation_dram finished after 0.013 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 443mb, ru_maxrss: 443mb (delta=0mb) +2025-11-04T21:38:52Z INFO 
9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1299 memory location(s), 1 block(s), and 4972 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=1299 blocks=1 instructions=4972 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [TensorCopyAccel::Impl]: Accelerated 36 out of 255 tensorcopy in Function: sg0001 average acceleration factor: 1 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: tensorcopy_accel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 445mb, ru_maxrss: 445mb (delta=2mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1299 memory location(s), 1 block(s), and 4972 instruction(s). Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=1299 blocks=1 instructions=4972 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: peephole_opts finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 446mb, ru_maxrss: 446mb (delta=1mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1299 memory location(s), 1 block(s), and 4973 instruction(s). 
Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=1299 blocks=1 instructions=4973 Max writers: 32 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Start of kernel lowering pass, number of insts: 4973, number of allocs: 1299 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2513-0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Scan BKs time (s): 0.001451 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Input/output shapes for Kernel inst [I-2513-0] +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: input0: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: input1: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: input2: [ 4 2048 128 ] +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: input3: ap +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: output0: [ 4 128 2048 ] +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 1048576 +Memory Location: {reshape.60}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Legalized inp_ap=[[262144,4],[2048,128],[1,2048]] +Offset: 1048576 +Memory Location: {reshape.68}@DRAM(2097152x2)#Internal DebugInfo: +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) 
[LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 2048 / 2048 = 1 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Scratch sbuf for kernel I-2513-0: [61440, 121724) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: seq_len=2048, seq_len2=2048, complete_seq_len2=2048 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [LowerKernel]: Lower BKs time (s): 0.100282 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_kernel finished after 0.042 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 463mb, ru_maxrss: 463mb (delta=17mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2442 memory location(s), 1 block(s), and 6976 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2442 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 464mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2442 memory location(s), 1 block(s), and 6976 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2442 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 464mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2442 memory location(s), 1 block(s), and 6976 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2442 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 62 PSUM Banks +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.084 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 463mb, ru_maxrss: 464mb (delta=21mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2447 memory location(s), 1 block(s), and 6951 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2447 blocks=1 instructions=6951 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [TensorCopyElim]: Tensor CP elimination: 64 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.051 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 463mb, ru_maxrss: 464mb (delta=0mb) 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2458 memory location(s), 1 block(s), and 6976 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=2458 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 462mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2458 memory location(s), 1 block(s), and 6976 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2458 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2513-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,70916>(128x4)#Internal DebugInfo: +2025-11-04T21:38:52Z USER 9072 
(nc00/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.037 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 462mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6887 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2382 blocks=1 instructions=6887 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: birverifier finished after 0.016 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 463mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2458 memory location(s), 1 block(s), and 6976 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=2458 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 463mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2458 memory location(s), 1 block(s), and 6976 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=2458 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [build_flow_deps]: Start build fdeps. Invocation: 10Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc00/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.010 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 463mb, ru_maxrss: 464mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6887 instruction(s). Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [build_flow_deps]: Allocs: 2458 instructions: 6976 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [build_flow_deps]: Build fdeps inserted 19242 edges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [build_flow_deps]: Done build fdeps 19242 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: build_fdeps finished after 0.027 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 465mb, ru_maxrss: 465mb (delta=1mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2458 memory location(s), 1 block(s), and 6976 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=2458 blocks=1 instructions=6976 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 6 PSUM Banks +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: address_rotation_psum finished after 0.177 seconds +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 465mb, ru_maxrss: 465mb (delta=22mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: remove_redundancies finished after 0.005 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 465mb, ru_maxrss: 465mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2446 memory location(s), 1 block(s), and 6948 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2446 blocks=1 instructions=6948 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 6 PSUM Banks +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: address_rotation_psum finished after 0.172 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 466mb, ru_maxrss: 466mb (delta=23mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 231771806 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 3479 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 12751371 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 3384 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 231136402 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 3500 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 12736000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 3776 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z 
INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: allocating SB +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: main loop +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: size = 2046 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: allocating SB +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: main loop +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: size = 1794 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: find partners +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.048 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 470mb, ru_maxrss: 470mb (delta=5mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2446 memory location(s), 1 block(s), and 6948 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2446 blocks=1 instructions=6948 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: find partners +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: found 1271 accumulation groups +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: largest = _dot.199-t1193_i2 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: expanding partners +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: found 1147 accumulation groups +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: largest = _dot.199-t1193_i48 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: expanding partners +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [TensorCopyElim]: Tensor CP elimination: 64 +2025-11-04T21:38:52Z INFO 9072 []: find first defs for local +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 []: find first defs for global +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.029 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 469mb, ru_maxrss: 470mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6884 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2381 blocks=1 instructions=6884 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 []: find first defs for local +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: find loads +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: 2 pin count +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: 432 remat count +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: build interference graph +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:52Z USER 9072 (nc01/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.015 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 471mb, ru_maxrss: 471mb (delta=1mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6884 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:52Z INFO 9072 []: find first defs for global +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Num intervals 1794 Num locations 1794 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: edge: 30203 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: mean: 33.6711 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: median: 26.4587 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: find costs +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: safe = 1519 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: unsafe = 214 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: inf = 59 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: total = 1792 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 196 #Pinned 0 #Safe 0 minCost 0.00452202 maxCost 1.13113 locations 1794 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: new candidates = 55 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Total: 1792 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Allocated: 1.000 
(1792) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Rover zone: 0.900 (1613) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Pre-rover zone: 0.014 (25) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Post-rover zone: 0.086 (154) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Blocks nothing: 0.015 (26) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Blocks medium: 0.001 (2) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Visited until medium blocking (mean): 0.716 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Visited until medium blocking (median): 0.714 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Visited until medium blocking (p95): 0.714 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Blocks tall: 0.984 (1764) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Visited until tall blocking (mean): 0.816 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Visited until tall blocking (median): 0.999 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: Success +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: find loads +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: 2 pin count +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: 442 remat count +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: build interference graph +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 231136402 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 3500 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 12736000 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 3776 bytes +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: coloring_allocator_sb finished after 0.128 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 475mb, ru_maxrss: 475mb (delta=9mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Num intervals 2046 Num locations 2046 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: edge: 31789 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: mean: 31.0743 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: median: 23.3305 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: find costs +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.028 seconds +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 473mb, ru_maxrss: 475mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2992 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=2992 blocks=1 instructions=15126 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 243872402, 89.6167% input load, 0% output write, 10.3833% spill/reload [sg0002] +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: safe = 1769 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: unsafe = 216 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: inf = 59 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: total = 2044 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 196 #Pinned 0 #Safe 0 minCost 0.00452202 maxCost 1.13113 locations 2046 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: new candidates = 55 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Total: 2044 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Allocated: 1.000 (2044) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Rover zone: 0.886 (1811) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Pre-rover zone: 0.033 (67) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Post-rover zone: 0.079 (162) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Slice zone: 0.002 (4) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Blocks nothing: 0.057 (116) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Blocks medium: 0.006 (12) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Visited 
until medium blocking (mean): 0.588 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Visited until medium blocking (median): 0.612 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Visited until medium blocking (p95): 0.842 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Blocks tall: 0.937 (1916) +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Visited until tall blocking (mean): 0.742 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Visited until tall blocking (median): 0.981 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: Success +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 5246976, 2.15153% out of total dma traffic(2.1855e+08) +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 
+2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:52Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: 16392 bytes/partition 
(100%) successfully pinned +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:52Z INFO 9072 (nc00/sg02) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 231771806 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 3479 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 12751371 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 3384 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: coloring_allocator_sb finished after 0.352 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 473mb, ru_maxrss: 475mb (delta=10mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: average loaded DMA size 3488 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: average saved DMA size 3776 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 225889426 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 3488 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 12736000 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 3776 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.028 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 465mb, ru_maxrss: 475mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3378 memory location(s), 1 block(s), and 15840 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=3378 blocks=1 instructions=15840 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 244523177, 89.5074% input load, 1.63584e-06% output write, 10.4926% spill/reload [sg0002] +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 5246976, 2.15153% out of total dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 238625426, 89.3883% input load, 0% output write, 10.6117% spill/reload [sg0002] +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 225889426 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 3488 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 12736000 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 3776 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes 
DMAcopyed 4100 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 241 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 3501 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: dma_optimization_sb finished after 0.249 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 466mb, ru_maxrss: 475mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15115 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2980 blocks=1 instructions=15115 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 11 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 5242880, 2.14412% out of total dma traffic(2.18866e+08) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [spill 
optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 177 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: average 
loaded DMA size 3467 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: average saved DMA size 3384 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 226528926 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 3467 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 12751371 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 3384 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 5242880, 2.14412% out of total dma traffic +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 239280297, 89.2775% input load, 1.67168e-06% output write, 10.7225% spill/reload [sg0002] +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 226528926 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 3467 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 12751371 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 3384 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 4100 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average 
DMAcopyed DMA size 241 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 3461 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: dma_optimization_sb finished after 0.158 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 468mb, ru_maxrss: 475mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15830 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3367 blocks=1 instructions=15830 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 15 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 194 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 23 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 60 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 31 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 124 Sb address +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 
+2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.454 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 472mb, ru_maxrss: 475mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15115 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=2980 blocks=1 instructions=15115 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: reserved space = 32768 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: spill space = 4194304 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: aligned spill space = 4194304 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: size = 8 +2025-11-04T21:38:53Z INFO 9072 []: find first defs for local +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.362 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 473mb, ru_maxrss: 475mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15830 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=3367 blocks=1 instructions=15830 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:53Z INFO 9072 []: find first defs for global +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: reserved space = 34824 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: spill space = 4201476 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: aligned spill space = 4222976 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: size = 15 +2025-11-04T21:38:53Z INFO 9072 []: find first defs for local +2025-11-04T21:38:53Z INFO 9072 []: find first defs for global +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: Num intervals 8 Num locations 8 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:53Z INFO 9072 
(nc01/sg02) [DRAM_Allocator]: lo = 8 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: total = 8 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: simplify +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: select ranges +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: DRAM hwm after allocation: 4194304 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: coloring_allocator_dram finished after 0.072 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Num intervals 15 Num locations 15 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 477mb, ru_maxrss: 477mb (delta=2mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15115 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=2980 blocks=1 instructions=15115 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: lo = 15 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: total = 15 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: simplify +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: select ranges +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: DRAM hwm after allocation: 4194304 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: coloring_allocator_dram finished after 0.042 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=2mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15830 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=3367 blocks=1 instructions=15830 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DRAM hwm before rotation 4194304 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DRAM hwm before rotation 4194304 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: allreduce hwm 8388608 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: Real CC buffer size 8388608 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DRAM hwm after rotation 4194304 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: address_rotation_dram finished after 0.022 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 472mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15830 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=3367 blocks=1 instructions=15830 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [TensorCopyAccel::Impl]: Accelerated 609 out of 1468 tensorcopy in Function: sg0002 average acceleration factor: 1 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: tensorcopy_accel finished after 0.007 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15830 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=3367 blocks=1 instructions=15830 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [PeepholeOpts]: PeepholeOpts enabled? 
Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: allreduce hwm 8388608 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: Real CC buffer size 8388608 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: peephole_opts finished after 0.006 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 472mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DRAM hwm after rotation 4194304 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: address_rotation_dram finished after 0.040 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 472mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [LowerKernel]: Start of kernel lowering pass, number of insts: 15834, number of allocs: 3367 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15115 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=2980 blocks=1 instructions=15115 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [LowerKernel]: Scan BKs time (s): 0.0016 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [LowerKernel]: Lower BKs time (s): 3e-06 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_kernel finished after 0.002 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 472mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 473mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [TensorCopyAccel::Impl]: Accelerated 609 out of 1329 tensorcopy in Function: sg0002 average acceleration factor: 1 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: tensorcopy_accel finished after 0.012 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 475mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15115 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=2980 blocks=1 instructions=15115 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: peephole_opts finished after 0.013 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [LowerKernel]: Start of kernel lowering pass, number of insts: 15119, number of allocs: 2980 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [LowerKernel]: Scan BKs time (s): 0.003007 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [LowerKernel]: Lower BKs time (s): 4e-06 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_kernel finished after 0.002 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 
instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.045 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.004 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.009 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.002 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.013 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.003 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {divide.1_1267_i1}@SB<32,16384>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:53Z WARNING 9072 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {select.5_1272_i1}@SB<96,17536>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: birverifier finished after 0.037 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: dynamic_dma_scan finished after 0.003 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 474mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [build_flow_deps]: Start build fdeps. Invocation: 11Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [build_flow_deps]: Allocs: 3367 instructions: 15834 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: birverifier finished after 0.038 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 475mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: dynamic_dma_scan finished after 0.008 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 477mb, ru_maxrss: 477mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 12Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [build_flow_deps]: Allocs: 2980 instructions: 15119 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [build_flow_deps]: Build fdeps inserted 52076 edges +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [build_flow_deps]: Done build fdeps 52076 Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: build_fdeps finished after 0.089 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 484mb, ru_maxrss: 484mb (delta=7mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [build_flow_deps]: Build fdeps inserted 40225 edges +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [build_flow_deps]: Done build fdeps 40225 Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: build_fdeps finished after 0.064 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 484mb, ru_maxrss: 484mb (delta=7mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: remove_redundancies finished after 0.020 seconds +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 484mb, ru_maxrss: 484mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:53Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: remove_redundancies finished after 0.008 seconds +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 484mb, ru_maxrss: 484mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 
instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:53Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.196 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 522mb, ru_maxrss: 522mb (delta=38mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.194 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 501mb, ru_maxrss: 522mb (delta=38mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.038 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 502mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=3367 blocks=1 instructions=15834 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.014 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 501mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3367 memory location(s), 1 block(s), and 15834 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.068 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 501mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2980 blocks=1 instructions=15119 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.013 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 499mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2980 memory location(s), 1 block(s), and 15119 instruction(s). 
Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 2.427 seconds +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: curr_vmrss: 499mb, ru_maxrss: 522mb (delta=85mb) +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=15577 blocks=6 instructions=53459 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=6347 blocks=2 instructions=30953 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=4763 blocks=2 instructions=13771 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4763 memory location(s), 2 block(s), and 13771 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: Running lower_local_collectives +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6347 memory location(s), 2 block(s), and 30953 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: Running lower_local_collectives +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=2 functions=2 allocs=4763 blocks=2 instructions=13771 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=2 functions=2 allocs=6347 blocks=2 instructions=30953 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: lower_local_collectives finished after 0.004 seconds +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4763 memory location(s), 2 block(s), and 13775 instruction(s). 
Max writers: 65 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=4467 blocks=2 instructions=8735 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=2 functions=2 allocs=4763 blocks=2 instructions=13775 Max writers: 65 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4467 memory location(s), 2 block(s), and 8735 instruction(s). Max writers: 65 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: Running lower_local_collectives +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=2 functions=2 allocs=4467 blocks=2 instructions=8735 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: lower_local_collectives finished after 0.010 seconds +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4467 memory location(s), 2 block(s), and 8741 instruction(s). 
Max writers: 65 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=2 functions=2 allocs=4467 blocks=2 instructions=8741 Max writers: 65 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: lower_local_collectives finished after 0.024 seconds +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.022 seconds +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4763 memory location(s), 2 block(s), and 13779 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6353 memory location(s), 2 block(s), and 30971 instruction(s). Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=2 functions=2 allocs=6353 blocks=2 instructions=30971 Max writers: 298 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.029 seconds +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4467 memory location(s), 2 block(s), and 8745 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.069 seconds +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6353 memory location(s), 2 block(s), and 30975 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.104 seconds +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: curr_vmrss: 495mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53499 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ColoringAllocator::Rep]: linearize and 
check +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=2983 blocks=1 instructions=15130 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: allocating 
spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: reserved space = 164096 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: spill space = 46137344 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: aligned spill space = 46137344 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [DRAM_Allocator]: Skipping shared tensor allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.014 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 498mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: reserved space = 164096 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: spill space = 46137344 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: aligned spill space = 46137344 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: size = 10 +2025-11-04T21:38:54Z INFO 9072 []: find first defs for local +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: reserved space = 6979584 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: spill space = 58720256 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: aligned spill space = 58720256 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: size = 9 +2025-11-04T21:38:54Z INFO 9072 []: find first defs for local +2025-11-04T21:38:54Z INFO 9072 []: find first defs for global +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: reserved space = 6979584 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: spill space = 58720256 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: aligned spill space = 58720256 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [DRAM_Allocator]: Skipping shared tensor 
allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.027 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 498mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: reserved space = 4236300 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: spill space = 33872898 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: aligned spill space = 33918976 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:54Z INFO 9072 []: find first defs for global +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: size = 19 +2025-11-04T21:38:54Z INFO 9072 []: find first defs for local +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: Num intervals 10 Num locations 10 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:54Z INFO 9072 []: find first defs for global +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: lo = 10 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: 
hi = 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: total = 10 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: simplify +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: Fall back to default allocation strategy [Core0 Local, Shared] +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: select ranges +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: allreduce_dram_hwm 29360128 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: Real CC buffer size 29360128 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: DRAM hwm after allocation: 46137344 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.054 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 497mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Num intervals 9 Num locations 9 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: reserved space = 4227072 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: spill space = 33872898 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: aligned spill space = 33918976 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [DRAM_Allocator]: Skipping shared tensor allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.065 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 497mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15130 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: lo = 9 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: total = 9 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: simplify +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Already used DRAM hwm: 4194304 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Fall back to default allocation strategy [Core0 Local, Shared] +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Already used DRAM hwm: 4194304 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: select ranges +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: allreduce_dram_hwm 37748736 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: Real CC buffer size 37748736 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: DRAM hwm after allocation: 58720256 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.080 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 498mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Num intervals 19 Num locations 19 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: lo = 19 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: total = 19 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: simplify +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Already used DRAM hwm: 4194304 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Fall back to default allocation strategy [Core0 Local, Shared] +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Already used DRAM hwm: 4194304 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: select ranges +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: allreduce_dram_hwm 20987904 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: Real CC buffer size 20987904 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: DRAM hwm after allocation: 29691904 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.090 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 497mb, ru_maxrss: 522mb 
(delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.093 seconds +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: curr_vmrss: 494mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53499 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=2 allocs=4763 blocks=2 instructions=13779 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=2 allocs=6353 blocks=2 instructions=30975 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:54Z USER 9072 (sg02) [SubgraphForkPass]: sync_shared_allocations finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 494mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=2 allocs=4467 blocks=2 instructions=8745 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6353 memory location(s), 2 block(s), and 30975 
instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (sg01) [SubgraphForkPass]: sync_shared_allocations finished after 0.006 seconds +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 493mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 (sg00) [SubgraphForkPass]: sync_shared_allocations finished after 0.006 seconds +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 493mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4467 memory location(s), 2 block(s), and 8745 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4763 memory location(s), 2 block(s), and 13779 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.010 seconds +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53499 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to 
anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=2983 blocks=1 instructions=15130 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 
9072 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.019 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.011 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.022 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.031 seconds +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.029 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.036 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15130 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.040 seconds +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: Inputs to nc_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53499 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_shared +2025-11-04T21:38:54Z USER 9072 (nc01) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_shared +2025-11-04T21:38:54Z INFO 9072 (nc00) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_shared: modules=3 functions=3 allocs=7986 blocks=3 instructions=27110 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_shared: modules=3 functions=3 allocs=7597 blocks=3 instructions=26389 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_shared finished after 0.123 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 506mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00) [CoreForkPass]: Output has 3 module(s), 3 function(s), 7986 memory location(s), 3 block(s), and 27110 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_shared finished after 0.126 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 500mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01) [CoreForkPass]: Output has 3 module(s), 3 function(s), 7597 memory location(s), 3 block(s), and 26389 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: nc_parallel_pass finished after 0.132 seconds +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: curr_vmrss: 496mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:54Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53499 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=2983 blocks=1 instructions=15130 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb 
(delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15130 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:54Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=2983 blocks=1 instructions=15130 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:54Z INFO 9072 
(nc00/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 492mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:54Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware simulation time: 1456630 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z USER 9072 
(nc00/sg00) [ModuleForkPass]: post_sched finished after 0.190 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 509mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 506mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc00/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.005 seconds +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 507mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware simulation time: 59031477 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: post_sched finished after 0.221 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 508mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: expand_scheduling_units finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 505mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z USER 9072 (nc01/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.009 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 505mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware simulation time: 1431952 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: post_sched finished after 0.246 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 507mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 505mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z USER 9072 (nc01/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.005 seconds +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 505mb, ru_maxrss: 522mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:54Z INFO 9072 [post_scheduler]: Time-aware simulation time: 59697783 +2025-11-04T21:38:55Z INFO 9072 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: post_sched finished after 0.493 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 533mb, ru_maxrss: 533mb (delta=11mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: expand_scheduling_units finished after 0.005 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 531mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.008 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 531mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 [post_scheduler]: Time-aware simulation time: 2180498 +2025-11-04T21:38:55Z INFO 9072 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z USER 9072 (nc01/sg02) [ModuleForkPass]: post_sched finished after 0.611 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 533mb, ru_maxrss: 533mb (delta=11mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15130 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=2983 blocks=1 instructions=15130 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc01/sg02) [ModuleForkPass]: expand_scheduling_units finished after 0.003 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 527mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15130 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2983 blocks=1 instructions=15130 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc01/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.016 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 527mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z INFO 9072 [post_scheduler]: Time-aware simulation time: 2339050 +2025-11-04T21:38:55Z INFO 9072 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z USER 9072 (nc00/sg02) [ModuleForkPass]: post_sched finished after 0.815 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 533mb (delta=11mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc00/sg02) [ModuleForkPass]: expand_scheduling_units finished after 0.002 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 522mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc00/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.017 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 522mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:55Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.847 seconds +2025-11-04T21:38:55Z INFO 9072 [BackendPassManager]: curr_vmrss: 522mb, ru_maxrss: 533mb (delta=11mb) +2025-11-04T21:38:55Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:55Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53495 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (sg01) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:55Z INFO 9072 (sg01) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=4763 blocks=2 instructions=13779 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (sg01) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:55Z INFO 9072 (sg01) [SubgraphForkPass]: curr_vmrss: 521mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4763 memory location(s), 2 block(s), and 13779 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (sg00) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:55Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=4467 blocks=2 instructions=8745 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (sg02) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:55Z USER 9072 (sg00) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:55Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 521mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (sg02) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=6353 blocks=2 instructions=30971 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 4467 memory location(s), 2 block(s), and 8745 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (sg02) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:55Z INFO 9072 (sg02) [SubgraphForkPass]: curr_vmrss: 520mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 6353 memory location(s), 2 block(s), and 30971 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:55Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.011 seconds +2025-11-04T21:38:55Z INFO 9072 [BackendPassManager]: curr_vmrss: 520mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:55Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53495 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:55Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2983 blocks=1 instructions=15126 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max 
Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 305 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 233 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 53 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 233 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 258 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 84 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 35 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 53 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 258 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 33 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 263 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 139 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 305 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 139 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 33 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 88 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) 
[DMAOptimizationBase]: PSUM Rotation rotated 84 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 89 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 131 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 35 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 744 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 263 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 177 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 105 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 35 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 33 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 177 Sb address +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.302 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 522mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 166 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 131 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.056 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 526mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.357 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 524mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 17 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 821 PSUM Banks +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.022 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 525mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [build_flow_deps]: Start build fdeps. Invocation: 13Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [build_flow_deps]: Allocs: 2234 instructions: 4374 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.387 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 524mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 105 Sb address +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.075 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 
block(s), and 4371 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [build_flow_deps]: Build fdeps inserted 10802 edges +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [build_flow_deps]: Done build fdeps 10802 Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: dep_opt finished after 0.077 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 527mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 32 │ 9957277696 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 73984 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10502660 │ +│ Load │ Internal │ 161 │ 15204352 │ +│ Save │ Internal │ 108 │ 14680064 │ +│ Save │ Internal -> Output │ 19 │ 4718594 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 1 │ +│ 64 │ 1 │ +│ 128 │ 2 │ +│ 256 │ 194 │ +│ 512 │ 1 │ +│ 1024 │ 16 │ +│ 2048 │ 90 │ +│ 4096 │ 42 │ +│ 1048576 │ 64 │ +│ 8388608 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ReportStats]: MM Stats: #MatMults 2145 #MatMult-Transposes 449 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ReportStats]: IO Tensor size combined: 457986564 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input4 │ ExternalInput │ bfloat16 │ 
33554432 │ +│ output1 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input61 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input67 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input62 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input65 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input1 │ ExternalInput │ int32 │ 8192 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate3 │ Output │ bfloat16 │ 8388608 │ +│ intermediate0 │ Output │ bfloat16 │ 8388608 │ +│ intermediate3-buffer-2756 │ Internal │ bfloat16 │ 8388608 │ +│ dot.4-buffer-2754 │ Internal │ bfloat16 │ 8388608 │ +│ get_tuple_element.1 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1_i0 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.16 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1_i1 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.24 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.29 │ Internal │ bfloat16 │ 4194304 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z USER 9072 (nc00/sg00) [ModuleForkPass]: report_stats finished after 0.002 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 17 PSUM Banks +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.107 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 178 PSUM Banks +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.079 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 526mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 166 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [build_flow_deps]: Start build fdeps. Invocation: 14Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [build_flow_deps]: Allocs: 2233 instructions: 4371 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [build_flow_deps]: Build fdeps inserted 10802 edges +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [build_flow_deps]: Done build fdeps 10802 Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.055 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 525mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 15Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [build_flow_deps]: Allocs: 2382 instructions: 6891 +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: dep_opt finished after 0.042 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 525mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 35 Sb address +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 32 │ 9957277696 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 73984 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10502660 │ +│ Load │ Internal │ 161 │ 15204352 │ +│ Save │ Internal │ 108 │ 14680064 │ +│ Save │ Internal -> Output │ 18 │ 4718592 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 1 │ +│ 64 │ 1 │ +│ 128 │ 2 │ +│ 256 │ 194 │ +│ 512 │ 1 │ +│ 1024 │ 16 │ +│ 2048 │ 90 │ +│ 4096 │ 42 │ +│ 1048576 │ 64 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 
9072 (nc01/sg00) [ReportStats]: MM Stats: #MatMults 2145 #MatMult-Transposes 449 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ReportStats]: IO Tensor size combined: 457986564 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output1 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input61 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input67 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input62 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input65 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input1 │ ExternalInput │ int32 │ 8192 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate3 │ Output │ bfloat16 │ 8388608 │ +│ intermediate0 │ Output │ bfloat16 │ 8388608 │ +│ intermediate3-buffer-2756 │ Internal │ bfloat16 │ 8388608 │ +│ dot.4-buffer-2754 │ Internal │ bfloat16 │ 8388608 │ +│ get_tuple_element.1 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1_i0 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.16 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1_i1 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.24 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.29 │ Internal │ bfloat16 │ 4194304 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z USER 9072 (nc01/sg00) [ModuleForkPass]: report_stats finished after 0.002 seconds 
+2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 525mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4371 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 362 PSUM Banks +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:55Z USER 9072 (nc01/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.587 seconds +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 525mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:55Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [build_flow_deps]: Build fdeps inserted 19109 edges +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [build_flow_deps]: Done build fdeps 19109 Tue Nov 4 21:38:55 2025 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: dep_opt finished after 0.069 seconds +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 526mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 
block(s), and 6891 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=2382 blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 98304 │ +│ Load │ ExternalInput -> Internal │ 209 │ 68166148 │ +│ Load │ Input -> Internal │ 2 │ 524288 │ +│ Load │ Internal │ 181 │ 25690112 │ +│ Save │ Internal │ 125 │ 23592960 │ +│ Save │ Internal -> Output │ 9 │ 4194306 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9072 (nc00/sg01) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 193 │ +│ 1024 │ 112 │ +│ 2048 │ 42 │ +│ 4096 │ 172 │ +│ 1048576 │ 64 │ +│ 4194304 │ 3 │ +│ 8388608 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ReportStats]: MM Stats: #MatMults 4520 #MatMult-Transposes 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ReportStats]: IO Tensor size combined: 184558084 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ output4 │ ExternalOutput │ 
bfloat16 │ 33554432 │ +│ input6 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input7 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input68 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input71 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input69 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input72 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input78 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input76 │ ExternalInput │ bfloat16 │ 2097152 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate3 │ Input │ bfloat16 │ 8388608 │ +│ dot.7-buffer-2492 │ Internal │ bfloat16 │ 8388608 │ +│ dot.11-buffer-2497 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate5 │ Output │ bfloat16 │ 8388608 │ +│ intermediate0 │ Input │ bfloat16 │ 8388608 │ +│ all_reduce.1-buffer-2494 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate6-buffer-2499 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate6 │ Output │ bfloat16 │ 8388608 │ +│ add.4 │ Internal │ bfloat16 │ 8388608 │ +│ reshape.60 │ Internal │ bfloat16 │ 4194304 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: report_stats finished after 0.010 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 527mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6891 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 40 Sb address +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.080 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 50 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 10 Sb address +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.045 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 527mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [build_flow_deps]: Start build fdeps. Invocation: 16Tue Nov 4 21:38:56 2025 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [build_flow_deps]: Allocs: 2381 instructions: 6888 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 4 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [build_flow_deps]: Build fdeps inserted 19106 edges +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [build_flow_deps]: Done build fdeps 19106 Tue Nov 4 21:38:56 2025 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: dep_opt finished after 0.050 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 526mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 98304 │ +│ Load │ ExternalInput -> Internal │ 209 │ 68166148 │ +│ Load │ Input -> Internal │ 2 │ 524288 │ +│ Load │ Internal │ 181 │ 25690112 │ +│ Save │ Internal │ 125 │ 23592960 │ +│ Save │ Internal -> Output │ 8 │ 4194304 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 193 │ +│ 1024 │ 112 │ +│ 2048 │ 42 │ +│ 4096 │ 172 │ +│ 1048576 │ 64 │ +│ 4194304 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ReportStats]: MM Stats: #MatMults 4520 #MatMult-Transposes 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ReportStats]: IO Tensor size combined: 184558084 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ output4 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input6 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input7 │ ExternalInput │ 
bfloat16 │ 33554432 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input68 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input71 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input69 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input72 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input78 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input76 │ ExternalInput │ bfloat16 │ 2097152 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate3 │ Input │ bfloat16 │ 8388608 │ +│ dot.7-buffer-2492 │ Internal │ bfloat16 │ 8388608 │ +│ dot.11-buffer-2497 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate5 │ Output │ bfloat16 │ 8388608 │ +│ intermediate0 │ Input │ bfloat16 │ 8388608 │ +│ all_reduce.1-buffer-2494 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate6-buffer-2499 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate6 │ Output │ bfloat16 │ 8388608 │ +│ add.4 │ Internal │ bfloat16 │ 8388608 │ +│ reshape.60 │ Internal │ bfloat16 │ 4194304 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: report_stats finished after 0.003 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 526mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6888 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 63 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 36 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.888 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 526mb, ru_maxrss: 533mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2983 blocks=1 instructions=15126 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 74 Sb address +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.114 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 541mb, ru_maxrss: 541mb (delta=8mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 
function(s), 2983 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2983 blocks=1 instructions=15126 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.021 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 541mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=2983 blocks=1 instructions=15126 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: address_rotation_sb finished after 1.027 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 541mb (delta=8mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [build_flow_deps]: Start build fdeps. Invocation: 17Tue Nov 4 21:38:56 2025 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [build_flow_deps]: Allocs: 2983 instructions: 15126 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [build_flow_deps]: Build fdeps inserted 40226 edges +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [build_flow_deps]: Done build fdeps 40226 Tue Nov 4 21:38:56 2025 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: dep_opt finished after 0.092 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 539mb, ru_maxrss: 541mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15126 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=2983 blocks=1 instructions=15126 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬───────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼───────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal │ 2 │ 8388608 │ +│ Load │ Const -> Internal │ 1 │ 32768 │ +│ Load │ ExternalInput -> Internal │ 486 │ 213270540 │ +│ Load │ Internal │ 32 │ 12586118 │ +│ Save │ Internal │ 324 │ 12736000 │ +└─────────────┴───────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 2 │ +│ 4 │ 4 │ +│ 32 │ 2 │ +│ 128 │ 2 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 97 │ +│ 4096 │ 433 │ +│ 4194304 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ReportStats]: MM Stats: #MatMults 12554 #MatMult-Transposes 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ReportStats]: IO Tensor size combined: 348930064 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input365 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input368 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input366 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input1 │ ExternalInput │ int32 │ 8192 │ +│ input370 │ ExternalInput │ bfloat16 │ 4096 │ +│ 
input367 │ ExternalInput │ bfloat16 │ 4096 │ +│ input3 │ ExternalInput │ float32 │ 12 │ +│ output0 │ ExternalOutput │ int32 │ 4 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ReportStats]: Large (Internal) Tensor Statistics: +┌────────────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────────────────┼──────────┼──────────┼──────────────┤ +│ convert.53 │ Internal │ bfloat16 │ 8388608 │ +│ all_reduce.3-buffer-2076 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate84 │ Input │ bfloat16 │ 8388608 │ +│ dot.14-buffer-2074 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate83 │ Input │ bfloat16 │ 8388608 │ +│ add.9 │ Internal │ bfloat16 │ 8388608 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ add.9_pftranspose_996-t1615_i7 │ Internal │ bfloat16 │ 1048576 │ +│ add.9_pftranspose_996-t1615_i6 │ Internal │ bfloat16 │ 1048576 │ +│ add.9_pftranspose_996-t1615_i5 │ Internal │ bfloat16 │ 1048576 │ +└────────────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: report_stats finished after 0.030 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 541mb, ru_maxrss: 541mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15126 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.195 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 542mb, ru_maxrss: 542mb (delta=1mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.022 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 18Tue Nov 4 21:38:56 2025 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [build_flow_deps]: Allocs: 3370 instructions: 15845 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [build_flow_deps]: Build fdeps inserted 50557 edges +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [build_flow_deps]: Done build fdeps 50557 Tue Nov 4 21:38:56 2025 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: dep_opt finished after 0.085 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal │ 4 │ 8388608 │ +│ Load │ Const -> Internal │ 8 │ 348936 │ +│ Load │ ExternalInput -> Internal │ 487 │ 213274636 │ +│ Load │ Internal │ 46 │ 12905354 │ +│ Save │ Internal │ 341 │ 12751367 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +└─────────────┴────────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 1 │ 1 │ +│ 2 │ 3 │ +│ 4 │ 9 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 32 │ 6 │ +│ 64 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 113 │ +│ 2048 │ 1 │ +│ 4096 │ 434 │ +│ 9496 
│ 2 │ +│ 4194304 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ReportStats]: MM Stats: #MatMults 12678 #MatMult-Transposes 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ReportStats]: IO Tensor size combined: 348930064 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input365 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input368 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input366 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input1 │ ExternalInput │ int32 │ 8192 │ +│ input370 │ ExternalInput │ bfloat16 │ 4096 │ +│ input367 │ ExternalInput │ bfloat16 │ 4096 │ +│ input3 │ ExternalInput │ float32 │ 12 │ +│ output0 │ ExternalOutput │ int32 │ 4 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ReportStats]: Large (Internal) Tensor Statistics: +┌──────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├──────────────────────────┼──────────┼──────────┼──────────────┤ +│ add.9 │ Internal │ bfloat16 │ 8388608 │ +│ convert.53 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate84 │ Input │ bfloat16 │ 8388608 │ +│ dot.14-buffer-2074 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate83 │ Input │ bfloat16 │ 8388608 │ +│ all_reduce.3-buffer-2076 │ Internal │ bfloat16 │ 8388608 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ -t3069 │ Internal │ float32 │ 1048576 │ +│ -t3063 │ Internal │ float32 │ 1048576 │ +│ -t3058 │ Internal │ float32 │ 1048576 │ +└──────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: report_stats finished after 0.006 seconds 
+2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15845 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 1.341 seconds +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=9mb) +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: Running assign_trigger_engine +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Inputs to assign_trigger_engine: modules=6 functions=6 allocs=15583 blocks=6 instructions=53495 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [AssignTriggerEngine]: Assigned trigger engine for 118 DMA instructions. Moved 10 DMA instructions to CC's engines. +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [AssignTriggerEngine]: Assigned trigger engine for 117 DMA instructions. Moved 9 DMA instructions to CC's engines. +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [AssignTriggerEngine]: Assigned trigger engine for 134 DMA instructions. Moved 9 DMA instructions to CC's engines. +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [AssignTriggerEngine]: Assigned trigger engine for 133 DMA instructions. Moved 8 DMA instructions to CC's engines. +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [AssignTriggerEngine]: Assigned trigger engine for 352 DMA instructions. Moved 11 DMA instructions to CC's engines. +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [AssignTriggerEngine]: Assigned trigger engine for 333 DMA instructions. Moved 9 DMA instructions to CC's engines. 
+2025-11-04T21:38:56Z INFO 9072 [AssignTriggerEngine]: Limiting IO queue to SP only +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: assign_trigger_engine finished after 0.031 seconds +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Output has 6 module(s), 6 function(s), 15583 memory location(s), 6 block(s), and 53495 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53495 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=3370 blocks=1 instructions=15845 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=2983 blocks=1 instructions=15126 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=2233 blocks=1 instructions=4371 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=2382 
blocks=1 instructions=6891 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=2381 blocks=1 instructions=6888 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=2234 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: sync_before_global_cc finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: sync_before_global_cc finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4377 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: sync_before_global_cc finished after 0.002 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: sync_before_global_cc finished after 0.002 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6893 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6890 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: sync_before_global_cc finished after 0.007 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15848 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: sync_before_global_cc finished after 0.011 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15129 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.013 seconds +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: Running assign_hwdge_engine +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=6 functions=6 allocs=15583 blocks=6 instructions=53511 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: assign_hwdge_engine finished after 0.013 seconds +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Output has 6 module(s), 6 function(s), 15583 memory location(s), 6 block(s), and 53511 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53511 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=2233 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=2382 blocks=1 
instructions=6893 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=3370 blocks=1 instructions=15848 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=2983 blocks=1 instructions=15129 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=2381 blocks=1 instructions=6890 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 3 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 64 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 186 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 283 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 125 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 2 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 64 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 186 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 282 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 125 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: alloc_queues finished after 0.002 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) 
[ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: alloc_queues finished after 0.002 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6890 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6893 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=2382 blocks=1 instructions=6893 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=2381 blocks=1 instructions=6890 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 6 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 15 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 3 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 298 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 1 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 17 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 482 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 24 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + 
+2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 8 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 32 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 9 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 10 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 301 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 22 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 482 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 24 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: alloc_queues finished after 0.003 seconds +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: alloc_queues finished after 0.003 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 2 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 1 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 64 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 165 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 140 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 108 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: alloc_queues finished after 0.003 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and
4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=2233 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=2234 blocks=1 instructions=4377 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15129 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=2983 blocks=1 instructions=15129 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: chain_dma_transposes finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15848 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=3370 blocks=1 instructions=15848 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: chain_dma_transposes finished after 0.006 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6890 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 3 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 1 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 64 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 165 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 141 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 108 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: alloc_queues finished after 0.008 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4377 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=2234 blocks=1 instructions=4377 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: chain_dma_transposes finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4377 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: chain_dma_transposes finished after 0.009 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6893 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: chain_dma_transposes finished after 0.007 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15129 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: chain_dma_transposes finished after 0.007 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15848 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.019 seconds +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Inputs to nc_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53511 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00) [CoreForkPass]: Running insert_dma_switch_queue_instance +2025-11-04T21:38:56Z USER 9072 (nc01) [CoreForkPass]: Running insert_dma_switch_queue_instance +2025-11-04T21:38:56Z INFO 9072 (nc01) [CoreForkPass]: Inputs to insert_dma_switch_queue_instance: modules=3 functions=3 allocs=7597 blocks=3 instructions=26393 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01) [CoreForkPass]: insert_dma_switch_queue_instance finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00) [CoreForkPass]: Inputs to insert_dma_switch_queue_instance: modules=3 functions=3 allocs=7986 blocks=3 instructions=27118 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc00) [CoreForkPass]: insert_dma_switch_queue_instance finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01) [CoreForkPass]: Output has 3 module(s), 3 function(s), 7597 memory location(s), 3 block(s), and 26393 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00) [CoreForkPass]: Output has 3 module(s), 3 function(s), 7986 memory location(s), 3 block(s), and 27118 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 6, Failed: 0 +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: nc_parallel_pass finished after 0.002 seconds +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:56Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53511 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=2381 blocks=1 instructions=6890 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=2983 blocks=1 instructions=15129 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6890 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15129 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=2381 blocks=1 instructions=6890 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=2983 blocks=1 instructions=15129 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=2382 blocks=1 instructions=6893 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6893 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=2382 blocks=1 instructions=6893 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=2233 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=2234 blocks=1 instructions=4377 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=3370 blocks=1 instructions=15848 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 
function(s), 2233 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4377 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15848 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=2233 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=2234 blocks=1 instructions=4377 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=3370 blocks=1 instructions=15848 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: lower_control finished after 0.007 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4374 instruction(s). 
Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=2233 blocks=1 instructions=4374 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: lower_control finished after 0.009 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4377 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=2234 blocks=1 instructions=4377 Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: lower_control finished after 0.022 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 528mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6890 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=2381 blocks=1 instructions=6890 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Processing async instrs... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: lower_control finished after 0.027 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 529mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6893 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=2382 blocks=1 instructions=6893 Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Processing async instrs... +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Processing secondary edges per engine, Done. 
Num edges removed 5259 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Processing redundant descendants, Done. Num edges removed 5708 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Processing async instrs, Done. Num edges removed 5708 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 5265 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Processing async instrs... +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Processing redundant descendants, Done. Num edges removed 5711 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Processing async instrs, Done. Num edges removed 5711 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: lower_control finished after 0.047 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 531mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15129 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc01/sg02) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=2983 blocks=1 instructions=15129 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: lower_control finished after 0.042 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 531mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15848 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z USER 9072 (nc00/sg02) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=3370 blocks=1 instructions=15848 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Processing async instrs... +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 9325 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 9333 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Processing redundant descendants, Done. 
Num edges removed 9954 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Processing async instrs, Done. Num edges removed 9954 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Processing redundant descendants, Done. Num edges removed 9963 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Processing async instrs, Done. Num edges removed 9963 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Processing async instrs... +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Processing async instrs... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Finished dependency reduction: 20989 removed, new total 2299 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:56Z USER 9072 (nc01/sg00) [ModuleForkPass]: dep_reduction finished after 0.087 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 542mb, ru_maxrss: 542mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2233 memory location(s), 1 block(s), and 4374 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 15321 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Processing secondary edges per engine, Done. 
Num edges removed 14763 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Finished dependency reduction: 20853 removed, new total 2300 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:56Z USER 9072 (nc00/sg00) [ModuleForkPass]: dep_reduction finished after 0.112 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 545mb, ru_maxrss: 545mb (delta=3mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2234 memory location(s), 1 block(s), and 4377 instruction(s). Max writers: 66 Max Readers: 448 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Processing redundant descendants, Done. Num edges removed 16538 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg02) [DepReduction]: Processing async instrs, Done. Num edges removed 16538 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Processing redundant descendants, Done. Num edges removed 15614 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg02) [DepReduction]: Processing async instrs, Done. Num edges removed 15614 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Finished dependency reduction: 37741 removed, new total 2634 +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:56Z USER 9072 (nc00/sg01) [ModuleForkPass]: dep_reduction finished after 0.136 seconds +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 547mb, ru_maxrss: 547mb (delta=5mb) +2025-11-04T21:38:56Z INFO 9072 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2382 memory location(s), 1 block(s), and 6893 instruction(s). 
Max writers: 66 Max Readers: 496 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Finished dependency reduction: 37730 removed, new total 2632 +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:56Z USER 9072 (nc01/sg01) [ModuleForkPass]: dep_reduction finished after 0.165 seconds +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 547mb, ru_maxrss: 547mb (delta=5mb) +2025-11-04T21:38:56Z INFO 9072 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2381 memory location(s), 1 block(s), and 6890 instruction(s). Max writers: 66 Max Readers: 496 +2025-11-04T21:38:57Z INFO 9072 (nc00/sg02) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sg02) [DepReduction]: Finished dependency reduction: 83724 removed, new total 4715 +2025-11-04T21:38:57Z INFO 9072 (nc00/sg02) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:57Z USER 9072 (nc00/sg02) [ModuleForkPass]: dep_reduction finished after 0.233 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 549mb, ru_maxrss: 549mb (delta=7mb) +2025-11-04T21:38:57Z INFO 9072 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3370 memory location(s), 1 block(s), and 15848 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sg02) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sg02) [DepReduction]: Finished dependency reduction: 69964 removed, new total 3879 +2025-11-04T21:38:57Z INFO 9072 (nc01/sg02) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:57Z USER 9072 (nc01/sg02) [ModuleForkPass]: dep_reduction finished after 0.256 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 549mb, ru_maxrss: 549mb (delta=7mb) +2025-11-04T21:38:57Z INFO 9072 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2983 memory location(s), 1 block(s), and 15129 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:57Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.320 seconds +2025-11-04T21:38:57Z INFO 9072 [BackendPassManager]: curr_vmrss: 544mb, ru_maxrss: 549mb (delta=7mb) +2025-11-04T21:38:57Z USER 9072 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:57Z INFO 9072 [BackendPassManager]: Inputs to nc_parallel_pass: modules=6 functions=6 allocs=15583 blocks=6 instructions=53511 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running bir_linker +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to bir_linker: modules=3 functions=3 allocs=7986 blocks=3 instructions=27118 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: bir_linker cwd: +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Num intermediates 86 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Num Module Definitions 3 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Linking to a call-graph structure +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running bir_linker +2025-11-04T21:38:57Z INFO 9072 
(nc01) [CoreForkPass]: Inputs to bir_linker: modules=3 functions=3 allocs=7597 blocks=3 instructions=26393 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: bir_linker cwd: +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: DMA Descriptor ReUse Enabled. +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: Num intermediates 86 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: Num Module Definitions 3 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: Linking to a call-graph structure +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Added a new SpillReload Que qSPPIOParam0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: tensor_map verification successful. +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: Writing updated tensor_map /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e/nc01/sgLnk/sg00/tensor_map.json +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: tensor_map verification successful. 
+2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Writing updated tensor_map /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e/nc00/sgLnk/sg00/tensor_map.json +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: PostLink Stats: #MatMults 136739 #MatMult-Transposes 19275 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: Total Intermediate MMTs 432 #out: 0 #inp: 432 #symmetric: 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: Total Intermediate IOs with MMTs: 2 #out: 0 #inp: 2 #both: 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: releasing pre-link modules +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: PostLink Stats: #MatMults 136863 #MatMult-Transposes 19275 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Total Intermediate MMTs 432 #out: 0 #inp: 432 #symmetric: 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: Total Intermediate IOs with MMTs: 2 #out: 0 #inp: 2 #both: 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: releasing pre-link modules +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [BirLinker]: linking Done. +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: bir_linker finished after 0.549 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 852mb, ru_maxrss: 852mb (delta=303mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running postlnk_dma_report +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to postlnk_dma_report: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DMAReport]: DMA Report: Bytes loaded or saved 406071194, 72.0732% input load, 2.19491% output write, 25.7318% spill/reload +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: postlnk_dma_report finished after 0.006 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running report_stats +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to report_stats: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: main +┌─────────────┬──────┬───────┬───────┐ +│ Instruction │ Kind │ Count │ Bytes │ +└─────────────┴──────┴───────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 32 │ 9957277696 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 73984 │ +│ Load │ ExternalInput -> 
Internal │ 28 │ 10502660 │ +│ Load │ Internal │ 161 │ 15204352 │ +│ Save │ Internal │ 108 │ 14680064 │ +│ Save │ Internal -> Output │ 18 │ 4718592 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 1 │ +│ 64 │ 1 │ +│ 128 │ 2 │ +│ 256 │ 194 │ +│ 512 │ 1 │ +│ 1024 │ 16 │ +│ 2048 │ 90 │ +│ 4096 │ 42 │ +│ 1048576 │ 64 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 98304 │ +│ Load │ ExternalInput -> Internal │ 209 │ 68166148 │ +│ Load │ Input -> Internal │ 2 │ 524288 │ +│ Load │ Internal │ 181 │ 25690112 │ +│ Save │ Internal │ 125 │ 23592960 │ +│ Save │ Internal -> Output │ 8 │ 4194304 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 193 │ +│ 1024 │ 112 │ +│ 2048 │ 42 │ +│ 4096 │ 172 │ +│ 1048576 │ 64 │ +│ 4194304 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬───────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼───────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal │ 2 
│ 8388608 │ +│ Load │ Const -> Internal │ 1 │ 32768 │ +│ Load │ ExternalInput -> Internal │ 486 │ 213270540 │ +│ Load │ Internal │ 32 │ 12586118 │ +│ Save │ Internal │ 324 │ 12736000 │ +└─────────────┴───────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 2 │ +│ 4 │ 4 │ +│ 32 │ 2 │ +│ 128 │ 2 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 97 │ +│ 4096 │ 433 │ +│ 4194304 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: MM Stats: #MatMults 19219 #MatMult-Transposes 6379 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: IO Tensor size combined: 6781451308 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60_sg0000 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369_sg0002 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output7 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output11 │ ExternalOutput │ bfloat16 │ 33554432 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ReportStats]: Large (Internal) Tensor Statistics: +┌─────────────────┬───────────────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├─────────────────┼───────────────────┼──────────┼──────────────┤ +│ intermediate3 │ InternalInterface │ bfloat16 │ 
8388608 │ +│ intermediate0 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate20 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate11 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate5 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate14 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate26 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate23 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate17 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate8 │ InternalInterface │ bfloat16 │ 8388608 │ +└─────────────────┴───────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: report_stats finished after 0.011 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running coloring_allocator_dram_post_lnk +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Local +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes 
+2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:57Z INFO 9072 []: find first defs for local +2025-11-04T21:38:57Z INFO 9072 []: find first defs for global +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: simplify +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: select ranges +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: Real CC buffer size 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and 
check +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: coloring_allocator_dram_post_lnk finished after 0.042 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running coloring_allocator_dram_shared_post_lnk +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to coloring_allocator_dram_shared_post_lnk: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Shared +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: spill space = 470810680 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: aligned spill space = 470925312 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [DRAM_Allocator]: Skipping shared tensor allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: 
Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [BirLinker]: linking Done. +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: bir_linker finished after 0.654 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=303mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running postlnk_dma_report +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to postlnk_dma_report: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DMAReport]: DMA Report: Bytes loaded or saved 406726069, 72.0359% input load, 2.19138% output write, 25.7727% spill/reload +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: postlnk_dma_report finished after 0.006 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running report_stats +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to report_stats: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: main +┌─────────────┬──────┬───────┬───────┐ +│ Instruction │ Kind │ Count │ Bytes │ +└─────────────┴──────┴───────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 32 │ 9957277696 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 73984 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10502660 │ +│ Load │ Internal │ 161 │ 15204352 │ +│ Save │ Internal │ 108 │ 14680064 │ +│ Save │ Internal -> Output │ 19 │ 4718594 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 1 │ +│ 64 │ 1 │ +│ 128 │ 2 │ +│ 256 │ 194 │ +│ 512 │ 1 │ +│ 1024 │ 16 │ +│ 2048 │ 90 │ +│ 4096 │ 42 │ +│ 1048576 │ 64 │ +│ 8388608 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes
│ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 2147483648 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ DMACopy (Spill) │ Internal │ 64 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 98304 │ +│ Load │ ExternalInput -> Internal │ 209 │ 68166148 │ +│ Load │ Input -> Internal │ 2 │ 524288 │ +│ Load │ Internal │ 181 │ 25690112 │ +│ Save │ Internal │ 125 │ 23592960 │ +│ Save │ Internal -> Output │ 9 │ 4194306 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 193 │ +│ 1024 │ 112 │ +│ 2048 │ 42 │ +│ 4096 │ 172 │ +│ 1048576 │ 64 │ +│ 4194304 │ 3 │ +│ 8388608 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 12582912 │ +│ DMACopy │ Internal │ 4 │ 8388608 │ +│ Load │ Const -> Internal │ 8 │ 348936 │ +│ Load │ ExternalInput -> Internal │ 487 │ 213274636 │ +│ Load │ Internal │ 46 │ 12905354 │ +│ Save │ Internal │ 341 │ 12751367 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +└─────────────┴────────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 1 │ 1 │ +│ 2 │ 3 │ +│ 4 │ 9 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 32 │ 6 │ +│ 64 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 113 │ +│ 2048 │ 1 │ +│ 4096 │ 434 │ +│ 9496 │ 2 │ +│ 4194304 │ 3 │ 
+└─────────────────────┴───────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: MM Stats: #MatMults 19343 #MatMult-Transposes 6379 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: IO Tensor size combined: 6781451308 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60_sg0000 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369_sg0002 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output7 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output11 │ ExternalOutput │ bfloat16 │ 33554432 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ReportStats]: Large (Internal) Tensor Statistics: +┌─────────────────┬───────────────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├─────────────────┼───────────────────┼──────────┼──────────────┤ +│ intermediate3 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate0 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate20 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate11 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate5 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate14 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate26 │ InternalInterface │ bfloat16 │ 
8388608 │ +│ intermediate23 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate17 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate8 │ InternalInterface │ bfloat16 │ 8388608 │ +└─────────────────┴───────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: report_stats finished after 0.012 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running coloring_allocator_dram_post_lnk +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Local +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:57Z INFO 9072 []: find first defs for local +2025-11-04T21:38:57Z INFO 9072 []: find first defs for global +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: Num intervals 0 Num locations 0 
+2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: simplify +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: select ranges +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: Real CC buffer size 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: coloring_allocator_dram_shared_post_lnk finished after 0.041 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: 
curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: coloring_allocator_dram_post_lnk finished after 0.034 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running coloring_allocator_dram_shared_post_lnk +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to coloring_allocator_dram_shared_post_lnk: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Shared +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: spill space = 470810680 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: aligned spill space = 470925312 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: size = 86 
+2025-11-04T21:38:57Z INFO 9072 []: find first defs for local +2025-11-04T21:38:57Z INFO 9072 []: find first defs for global +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: Num intervals 86 Num locations 86 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: lo = 86 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: total = 86 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: simplify +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: select ranges +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: Real CC buffer size 58720256 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 93335552 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) 
[ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: coloring_allocator_dram_shared_post_lnk finished after 0.026 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 6, Failed: 0 +2025-11-04T21:38:57Z USER 9072 [BackendPassManager]: nc_parallel_pass finished after 0.742 seconds +2025-11-04T21:38:57Z INFO 9072 [BackendPassManager]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=303mb) +2025-11-04T21:38:57Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:57Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=2 functions=8 allocs=16611 blocks=8 instructions=53595 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (sg00) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:57Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=8 allocs=16611 blocks=8 instructions=53595 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (sg00) [SubgraphForkPass]: sync_shared_allocations finished after 0.001 seconds +2025-11-04T21:38:57Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 8 function(s), 16611 memory location(s), 8 block(s), and 53595 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 1, Passed: 1, Failed: 0 +2025-11-04T21:38:57Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.004 seconds +2025-11-04T21:38:57Z INFO 9072 [BackendPassManager]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z USER 9072 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:57Z INFO 9072 [BackendPassManager]: Inputs to nc_parallel_pass: modules=2 functions=8 allocs=16611 blocks=8 instructions=53595 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_post_lnk +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_post_lnk +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_post_lnk finished after 0.023 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running lower_dynamic_dma +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_post_lnk finished after 0.024 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running lower_dynamic_dma +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: lower_dynamic_dma finished after 0.006 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running legalize_dynamic_dma +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: lower_dynamic_dma finished after 0.006 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running legalize_dynamic_dma +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [LegalizeDynamicDMA]: +┌───────────┬───────────────────────────────┬────────────────────────────┐ +│ Sub-Pass │ Illegal Instructions Detected │ New Instructions Generated │ +├───────────┼───────────────────────────────┼────────────────────────────┤ +│ Peeling │ 0 │ 0 │ +│ Unrolling │ 0 │ 0 │ +│ Splitting │ 0 │ 0 │ +└───────────┴───────────────────────────────┴────────────────────────────┘ + +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: legalize_dynamic_dma finished after 0.014 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions +2025-11-04T21:38:57Z INFO 
9072 (nc01/sgLnk) [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [LegalizeDynamicDMA]: +┌───────────┬───────────────────────────────┬────────────────────────────┐ +│ Sub-Pass │ Illegal Instructions Detected │ New Instructions Generated │ +├───────────┼───────────────────────────────┼────────────────────────────┤ +│ Peeling │ 0 │ 0 │ +│ Unrolling │ 0 │ 0 │ +│ Splitting │ 0 │ 0 │ +└───────────┴───────────────────────────────┴────────────────────────────┘ + +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: legalize_dynamic_dma finished after 0.013 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27160 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running optimize_queue_switch +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26435 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running optimize_queue_switch +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to optimize_queue_switch: modules=1 functions=4 allocs=8500 blocks=4 instructions=27160 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to optimize_queue_switch: modules=1 functions=4 allocs=8111 blocks=4 instructions=26435 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [OptimizeQueueSwitch]: Optimize queue switch has replaced 7 total SQI Instructions with RQI +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: optimize_queue_switch finished after 0.003 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [OptimizeQueueSwitch]: Optimize queue switch has replaced 7 total SQI Instructions with RQI +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: optimize_queue_switch finished after 0.003 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26442 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running lower_dma +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27167 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running lower_dma +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to lower_dma: modules=1 functions=4 allocs=8111 blocks=4 instructions=26442 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to lower_dma: modules=1 functions=4 allocs=8500 blocks=4 instructions=27167 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc01/sgLnk) [LowerDMA]: lower_dma metrics start + IO + Copy (DGE/DMA) + 128 partition : 6240/6240 (100% DGE) + power-of-2 partition : 6241/6274 (99.474% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 6242/6275 (99.4741% DGE) + Cast (DGE/DMA) + 128 partition : 57/57 (100% DGE) + power-of-2 partition : 169/170 (99.4118% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 169/170 (99.4118% DGE) + Spill/Reload + Copy (DGE/DMA) + 128 partition : 8711/8719 (99.9082% DGE) + power-of-2 partition : 8711/9028 (96.4887% DGE) + > 3 dimensional : 0/8 (0% DGE) + non-integer desc size : 0/0 + total : 8711/9028 (96.4887% DGE) + Cast (DGE/DMA) + 128 partition : 0/0 + power-of-2 partition : 0/0 + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/0 + CopyMode + CCE : 29 + Transpose : 1792 + Replicate : 0 + Dynamic (DGE/DMA) + scalar : 1/1 (100% DGE) + vector : 1824/1824 (100% DGE) + Opcode + ReadVarAddr : 0 + IndirectLoad : 0 + IndirectSave : 0 + IndirectSaveAccumulate : 0 + DstReduceDGE : 0 +lower_dma metrics end +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: lower_dma finished after 0.069 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26442 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running expand_all_engine +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=4 allocs=8111 blocks=4 instructions=26442 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z INFO 9072 (nc00/sgLnk) [LowerDMA]: lower_dma metrics start + IO + Copy (DGE/DMA) + 128 partition : 6240/6240 (100% DGE) + power-of-2 partition : 6269/6332 (99.0051% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 6270/6333 (99.0052% DGE) + Cast (DGE/DMA) + 128 partition : 57/57 (100% DGE) + power-of-2 partition : 169/170 (99.4118% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 169/170 (99.4118% DGE) + Spill/Reload + Copy (DGE/DMA) + 128 partition : 8716/8724 (99.9083% DGE) + power-of-2 partition : 8716/9066 (96.1394% DGE) + > 3 dimensional : 0/8 (0% DGE) + non-integer desc size : 0/0 + total : 8716/9066 (96.1394% DGE) + Cast (DGE/DMA) + 128 partition : 0/0 + power-of-2 partition : 0/2 (0% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/2 (0% DGE) + CopyMode + CCE : 29 + Transpose : 1792 + Replicate : 0 + Dynamic (DGE/DMA) + scalar : 1/1 (100% DGE) + vector : 1824/1824 (100% DGE) + Opcode + ReadVarAddr : 0 + IndirectLoad : 0 + IndirectSave : 0 + IndirectSaveAccumulate : 0 + DstReduceDGE : 0 +lower_dma metrics end +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: lower_dma finished after 0.073 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27167 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running expand_all_engine +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=4 allocs=8500 blocks=4 instructions=27167 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: expand_all_engine finished after 0.007 seconds +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26442 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc01) [CoreForkPass]: Running alloc_semaphores +2025-11-04T21:38:57Z INFO 9072 (nc01) [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=4 allocs=8111 blocks=4 instructions=26442 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: expand_all_engine finished after 0.008 seconds +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27167 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:57Z USER 9072 (nc00) [CoreForkPass]: Running alloc_semaphores +2025-11-04T21:38:57Z INFO 9072 (nc00) [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=4 allocs=8500 blocks=4 instructions=27167 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: alloc_semaphores finished after 0.034 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26442 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running expand_inst_late +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=4 allocs=8111 blocks=4 instructions=26442 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: alloc_semaphores finished after 0.036 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27167 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running expand_inst_late +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=4 allocs=8500 blocks=4 instructions=27167 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: expand_inst_late finished after 0.038 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26741 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running seq_inst_opt +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=4 allocs=8111 blocks=4 instructions=26741 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [SeqInstOpt]: Removing 160 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [SeqInstOpt]: Removing 129 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: seq_inst_opt finished after 0.006 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: expand_inst_late finished after 0.039 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 26452 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running lower_sync +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to lower_sync: modules=1 functions=4 allocs=8111 blocks=4 instructions=26452 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27466 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running seq_inst_opt +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=4 allocs=8500 blocks=4 instructions=27466 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [SeqInstOpt]: Removing 160 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [SeqInstOpt]: Removing 129 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: seq_inst_opt finished after 0.005 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 27177 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running lower_sync +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to lower_sync: modules=1 functions=4 allocs=8500 blocks=4 instructions=27177 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: lower_sync finished after 0.018 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28370 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running lower_act +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to lower_act: modules=1 functions=4 allocs=8111 blocks=4 instructions=28370 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: lower_act finished after 0.005 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 624mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28385 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running lower_dve +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to lower_dve: modules=1 functions=4 allocs=8111 blocks=4 instructions=28385 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen3/dve_info.json +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: lower_sync finished after 0.019 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 625mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29262 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running lower_act +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to lower_act: modules=1 functions=4 allocs=8500 blocks=4 instructions=29262 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: lower_act finished after 0.005 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 625mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29278 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running lower_dve +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to lower_dve: modules=1 functions=4 allocs=8500 blocks=4 instructions=29278 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen3/dve_info.json +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: lower_dve finished after 0.102 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 639mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28385 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running lower_ap +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to lower_ap: modules=1 functions=4 allocs=8111 blocks=4 instructions=28385 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: lower_dve finished after 0.103 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 639mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: lower_ap finished after 0.009 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 639mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29278 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running lower_ap +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28385 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: Running coloring_allocator_reg +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to lower_ap: modules=1 functions=4 allocs=8500 blocks=4 instructions=29278 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=4 allocs=8111 blocks=4 instructions=28385 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for local reg +2025-11-04T21:38:58Z INFO 9072 []: find first defs for global reg +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: lower_ap finished after 0.009 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 639mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29278 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: Running coloring_allocator_reg +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=4 allocs=8500 blocks=4 instructions=29278 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: REG score = 0 (lower is 
better) +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for local reg +2025-11-04T21:38:58Z INFO 9072 []: find first defs for global reg +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: size = 3 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for local reg +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:58Z INFO 
9072 (nc00/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 []: find first defs for global reg +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: size = 3 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for local reg +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:58Z INFO 9072 []: find first defs for global reg +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: lo = 3 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: total = 3 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:58Z INFO 
9072 (nc01/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: lo = 3 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: total = 3 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: renumber registers 
+2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for local reg +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for global reg +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:58Z INFO 9072 []: find first defs for local reg +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:58Z INFO 9072 []: find first defs for global reg +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:58Z USER 9072 (nc01) [CoreForkPass]: 
coloring_allocator_reg finished after 0.078 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28385 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:58Z USER 9072 (nc00) [CoreForkPass]: coloring_allocator_reg finished after 0.080 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29278 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 2, Failed: 0 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: nc_parallel_pass finished after 0.438 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running vnc_remote_addr_map +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to vnc_remote_addr_map: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: vnc_remote_addr_map finished after 0.002 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Output has 2 module(s), 8 function(s), 16611 memory location(s), 8 block(s), and 57663 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running vnc_link +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to vnc_link: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 [VncLink]: Found 0 remote updates +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: vnc_link finished after 0.001 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Output has 2 module(s), 8 function(s), 16611 memory location(s), 8 block(s), and 57663 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00/sgLnk) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=4 allocs=8500 blocks=4 instructions=29278 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=4 allocs=8111 blocks=4 instructions=28385 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00/sgLnk) [ModuleForkPass]: birverifier finished after 0.077 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29278 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [ModuleForkPass]: birverifier finished after 0.079 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28385 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 2, Passed: 2, Failed: 0 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.083 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:58Z INFO 9072 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.015 seconds +2025-11-04T21:38:58Z INFO 9072 (sg00) [SubgraphForkPass]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (sg00) [SubgraphForkPass]: Output has 2 module(s), 8 function(s), 16611 memory location(s), 8 block(s), and 57663 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [SubgraphForkPass]: Compilation status: Total subgraphs: 1, Passed: 1, Failed: 0 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: subgraph_parallel_pass finished after 0.017 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 644mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to mod_parallel_pass: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc00/sgLnk) [ModuleForkPass]: Running codegen +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [ModuleForkPass]: Running codegen +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ModuleForkPass]: Inputs to codegen: modules=1 functions=4 allocs=8500 blocks=4 instructions=29278 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ModuleForkPass]: Inputs to codegen: modules=1 functions=4 allocs=8111 blocks=4 instructions=28385 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: Total un-allocated DRAM tensors by kind: +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: +┌────────────────┬─────────────┐ +│ TensorKind │ Size (GB) │ +├────────────────┼─────────────┤ +│ ExternalInput │ 1.89234 │ +│ ExternalOutput │ 1.75 │ +│ Const │ 0.000626575 │ +└────────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: Total un-allocated DRAM tensors by kind: +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: +┌────────────────┬────────────┐ +│ TensorKind │ Size (GB) │ +├────────────────┼────────────┤ +│ ExternalInput │ 1.89234 │ +│ ExternalOutput │ 1.75 │ +│ Const │ 0.00062466 │ +└────────────────┴────────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: Instruction Stats: +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: 
+┌─────────────────────┬───────┐ +│ Opcode │ Count │ +├─────────────────────┼───────┤ +│ MATMUL │ 19683 │ +│ LDWEIGHTS │ 19580 │ +│ EVENT_SEMAPHORE │ 2085 │ +│ UNKNOWN(0xd4) │ 1536 │ +│ ACTIVATE │ 1162 │ +│ COPY │ 979 │ +│ CAST │ 942 │ +│ TENSOR_TENSOR │ 923 │ +│ PSEUDO_DMA_TRIGGER │ 495 │ +│ UNKNOWN(0x9b) │ 320 │ +│ UNKNOWN(0x9a) │ 320 │ +│ GATHER │ 291 │ +│ POOL_BUFFER_LOAD │ 291 │ +│ TENSOR_SCALAR_ADDR │ 287 │ +│ MEMSET │ 177 │ +│ UNKNOWN(0xda) │ 169 │ +│ UNKNOWN(0xd3) │ 145 │ +│ TENSOR_REDUCE │ 141 │ +│ UNKNOWN(0x92) │ 136 │ +│ RECIPROCAL │ 131 │ +│ DVE_READ_INDICES │ 128 │ +│ UNKNOWN(0x24) │ 128 │ +│ MATCH_REPLACE8 │ 128 │ +│ MATCH_VALUE_LOAD │ 128 │ +│ MAX8 │ 128 │ +│ TENSOR_SCALAR │ 69 │ +│ UNKNOWN(0xd8) │ 52 │ +│ PSEUDO_BRANCH_LABEL │ 20 │ +│ LOAD_MASK_SELECT │ 20 │ +│ STREAM_SHUFFLE │ 20 │ +│ ACT_TABLE_LOAD │ 16 │ +│ UNKNOWN(0xd2) │ 15 │ +│ UNKNOWN(0xd9) │ 8 │ +│ PSEUDO_DMA_REARM │ 7 │ +│ UNKNOWN(0xcf) │ 7 │ +│ MOVE │ 7 │ +│ UNKNOWN(0xe8) │ 6 │ +│ IOTA │ 3 │ +│ UNKNOWN(0xe5) │ 2 │ +│ ALU_OP │ 2 │ +│ PSEUDO_TENSOR_LOAD │ 1 │ +│ TENSOR_SCALAR │ 1 │ +│ RNG │ 1 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: +┌────────────┬───────┐ +│ Engine │ Count │ +├────────────┼───────┤ +│ Unassigned │ 0 │ +│ GPSIMD │ 3209 │ +│ Scalar │ 4320 │ +│ Tensor │ 39586 │ +│ SyncDMA │ 0 │ +│ Vector │ 3070 │ +│ Sync │ 525 │ +│ All │ 0 │ +└────────────┴───────┘ + +2025-11-04T21:38:58Z USER 9072 (nc00/sgLnk) [Codegen]: isa_gen finished after 0.248 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: Number of DMA descriptors on each queue instance: +┌───────────────────────────┬────────────────┐ +│ Queue Instance │ RT Descriptors │ +├───────────────────────────┼────────────────┤ +│ qActSpillReload0_defId_2 │ 602 │ +│ qDVESpillReload0_defId_2 │ 142 │ +│ qPoolSpillReload0_defId_0 │ 163840 │ +│ qPoolSpillReload0_defId_1 │ 163840 │ +│ qPoolSpillReload0_defId_2 │ 207 │ +│ qSPIO0 │ 86088 │ +│ qSPPIOParam0 │ 56 │ +│
qSPSpillReload0_defId_0 │ 2 │ +│ qSPSpillReload0_defId_2 │ 8550 │ +└───────────────────────────┴────────────────┘ + +Total descriptors: 423327 (0.00630806 GB) +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: Number of DMA engines used by each queue: +┌───────────────────┬──────────────────────┐ +│ Queue │ DMA Engines │ +├───────────────────┼──────────────────────┤ +│ qSPDynamicHW │ 16 │ +│ qPoolDynamic │ 16 │ +│ qSPSpillReload0 │ 16 │ +│ qSPIO0 │ 16 │ +│ qActDynamicHW │ 16 │ +│ qPoolSpillReload0 │ 16 │ +│ qDVESpillReload0 │ 16 │ +│ qActSpillReload0 │ 16 │ +│ qSPPIOParam0 │ 16 │ +├───────────────────┼──────────────────────┤ +│ TOTAL │ 144 (must be <= 176) │ +└───────────────────┴──────────────────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: Tensors with largest descriptor count: +┌───────────────────────────────────────────────────────┬───────────────┬──────────┬──────────────────┐ +│ Tensor Name │ Kind │ Src Type │ Descriptor Count │ +├───────────────────────────────────────────────────────┼───────────────┼──────────┼──────────────────┤ +│ I-2513-0_grp_14_sec_0_mhlo_exponential_6_b0_i0_sg0001 │ Internal │ bfloat16 │ 16 │ +│ I-2513-0_b3_grp_15_s0_tile0_exp_tp_sbuf_sg0001 │ Internal │ bfloat16 │ 16 │ +│ I-2513-0_b0_grp_14_s0_tile0_exp_tp_sbuf_sg0001 │ Internal │ bfloat16 │ 16 │ +│ I-2769-0_grp_14_sec_0_mhlo_exponential_6_b1_i0_sg0000 │ Internal │ bfloat16 │ 16 │ +│ I-2513-0_grp_12_sec_0_mhlo_exponential_6_b2_i0_sg0001 │ Internal │ bfloat16 │ 16 │ +│ add.4_sg0001 │ Internal │ bfloat16 │ 27 │ +│ all-reduce.465.2514_sg0001 │ Internal │ bfloat16 │ 27 │ +│ compare.2.1760_sg0001 │ Internal │ int32 │ 27 │ +│ input2 │ ExternalInput │ int32 │ 28 │ +│ convert.55_sg0002 │ Internal │ float32 │ 298 │ +└───────────────────────────────────────────────────────┴───────────────┴──────────┴──────────────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: Instruction Stats: +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: 
+┌─────────────────────┬───────┐ +│ Opcode │ Count │ +├─────────────────────┼───────┤ +│ MATMUL │ 19435 │ +│ LDWEIGHTS │ 19332 │ +│ EVENT_SEMAPHORE │ 1918 │ +│ UNKNOWN(0xd4) │ 1529 │ +│ ACTIVATE │ 1155 │ +│ CAST │ 942 │ +│ TENSOR_TENSOR │ 921 │ +│ COPY │ 851 │ +│ PSEUDO_DMA_TRIGGER │ 456 │ +│ UNKNOWN(0x9a) │ 320 │ +│ UNKNOWN(0x9b) │ 320 │ +│ TENSOR_SCALAR_ADDR │ 287 │ +│ UNKNOWN(0xda) │ 169 │ +│ MEMSET │ 163 │ +│ UNKNOWN(0xd3) │ 145 │ +│ UNKNOWN(0x92) │ 136 │ +│ TENSOR_REDUCE │ 136 │ +│ RECIPROCAL │ 129 │ +│ UNKNOWN(0x24) │ 128 │ +│ TENSOR_SCALAR │ 67 │ +│ UNKNOWN(0xd8) │ 52 │ +│ PSEUDO_BRANCH_LABEL │ 20 │ +│ LOAD_MASK_SELECT │ 16 │ +│ STREAM_SHUFFLE │ 16 │ +│ UNKNOWN(0xd2) │ 15 │ +│ ACT_TABLE_LOAD │ 15 │ +│ UNKNOWN(0xd9) │ 8 │ +│ PSEUDO_DMA_REARM │ 7 │ +│ UNKNOWN(0xcf) │ 7 │ +│ MOVE │ 7 │ +│ UNKNOWN(0xe8) │ 6 │ +│ IOTA │ 3 │ +│ ALU_OP │ 2 │ +│ PSEUDO_TENSOR_LOAD │ 1 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: +┌────────────┬───────┐ +│ Engine │ Count │ +├────────────┼───────┤ +│ Unassigned │ 0 │ +│ GPSIMD │ 2569 │ +│ Scalar │ 4169 │ +│ Tensor │ 39085 │ +│ SyncDMA │ 0 │ +│ Vector │ 2432 │ +│ Sync │ 479 │ +│ All │ 0 │ +└────────────┴───────┘ + +2025-11-04T21:38:58Z USER 9072 (nc00/sgLnk) [Codegen]: dma_desc_gen finished after 0.036 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [Codegen]: Generating debug info +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [Codegen]: isa_gen finished after 0.289 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: Number of DMA descriptors on each queue instance: +┌───────────────────────────┬────────────────┐ +│ Queue Instance │ RT Descriptors │ +├───────────────────────────┼────────────────┤ +│ qActSpillReload0_defId_2 │ 596 │ +│ qDVESpillReload0_defId_2 │ 2 │ +│ qPoolSpillReload0_defId_0 │ 163840 │ +│ qPoolSpillReload0_defId_1 │ 163840 │ +│ qPoolSpillReload0_defId_2 │ 7 │ +│ qSPIO0 │ 86084 │ +│ qSPSpillReload0_defId_0 │ 2 │ +│ qSPSpillReload0_defId_2 │ 8206 │ 
+└───────────────────────────┴────────────────┘ + +Total descriptors: 422577 (0.00629689 GB) +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: Number of DMA engines used by each queue: +┌───────────────────┬──────────────────────┐ +│ Queue │ DMA Engines │ +├───────────────────┼──────────────────────┤ +│ qSPDynamicHW │ 16 │ +│ qPoolDynamic │ 16 │ +│ qSPSpillReload0 │ 16 │ +│ qSPIO0 │ 16 │ +│ qActDynamicHW │ 16 │ +│ qPoolSpillReload0 │ 16 │ +│ qActSpillReload0 │ 16 │ +│ qDVESpillReload0 │ 16 │ +├───────────────────┼──────────────────────┤ +│ TOTAL │ 128 (must be <= 176) │ +└───────────────────┴──────────────────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: Tensors with largest descriptor count: +┌───────────────────────────────────────────────────────┬───────────────┬──────────┬──────────────────┐ +│ Tensor Name │ Kind │ Src Type │ Descriptor Count │ +├───────────────────────────────────────────────────────┼───────────────┼──────────┼──────────────────┤ +│ I-2513-0_grp_12_sec_0_mhlo_exponential_6_b3_i0_sg0001 │ Internal │ bfloat16 │ 16 │ +│ I-2769-0_grp_12_sec_0_mhlo_exponential_6_b1_i0_sg0000 │ Internal │ bfloat16 │ 16 │ +│ I-2513-0_b2_grp_13_s0_tile0_exp_tp_sbuf_sg0001 │ Internal │ bfloat16 │ 16 │ +│ I-2769-0_grp_15_sec_0_mhlo_exponential_6_b1_i0_sg0000 │ Internal │ bfloat16 │ 16 │ +│ I-2769-0_b1_grp_12_s0_tile0_exp_tp_sbuf_sg0000 │ Internal │ bfloat16 │ 16 │ +│ I-2513-0_grp_13_sec_0_mhlo_exponential_6_b2_i0_sg0001 │ Internal │ bfloat16 │ 16 │ +│ compare.2.1760_sg0001 │ Internal │ int32 │ 27 │ +│ add.4_sg0001 │ Internal │ bfloat16 │ 27 │ +│ input2 │ ExternalInput │ int32 │ 28 │ +│ convert.55_sg0002 │ Internal │ float32 │ 297 │ +└───────────────────────────────────────────────────────┴───────────────┴──────────┴──────────────────┘ + +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [Codegen]: dma_desc_gen finished after 0.035 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [Codegen]: Generating debug info +2025-11-04T21:38:58Z USER 9072 
(nc00/sgLnk) [Codegen]: debug_info_gen finished after 0.062 seconds +2025-11-04T21:38:58Z USER 9072 (nc00/sgLnk) [ModuleForkPass]: codegen finished after 0.363 seconds +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ModuleForkPass]: curr_vmrss: 673mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 8500 memory location(s), 4 block(s), and 29278 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [Codegen]: debug_info_gen finished after 0.050 seconds +2025-11-04T21:38:58Z USER 9072 (nc01/sgLnk) [ModuleForkPass]: codegen finished after 0.389 seconds +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ModuleForkPass]: curr_vmrss: 673mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 8111 memory location(s), 4 block(s), and 28385 instruction(s). Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [ModuleForkPass]: Compilation status: Total modules: 2, Passed: 2, Failed: 0 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: mod_parallel_pass finished after 0.393 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 673mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running hbm_usage +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to hbm_usage: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 (nc00/sgLnk) [HBMUsage]: +┌───────────────┬──────────┬───────────────────┐ +│ DMA Ring Type │ I/O Size │ Spill/Reload Size │ +├───────────────┼──────────┼───────────────────┤ +│ Copy │ 1.125KB │ 101.312KB │ +│ CCE │ 1.312MB │ 48.000B │ +│ Transpose │ 0.000B │ 5.000MB │ +│ Replicate │ 0.000B │ 0.000B │ +│ Overhead │ 16.000KB │ 127.250KB │ +└───────────────┴──────────┴───────────────────┘ + +2025-11-04T21:38:58Z INFO 
9072 (nc00/sgLnk) [HBMUsage]: +┌─────────────────────┬───────────┐ +│ DRAM Memory Usage │ Size │ +├─────────────────────┼───────────┤ +│ Total: │ 3.739GB │ +│ Model Code │ 3.095MB │ +│ Model Constants │ 657.012KB │ +│ Unallocated Tensors │ 3.642GB │ +│ Allocated Tensors │ 89.008MB │ +│ DMA Ring IO │ 1.329MB │ +│ DMA Ring Spill │ 5.223MB │ +└─────────────────────┴───────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [HBMUsage]: +┌───────────────┬──────────┬───────────────────┐ +│ DMA Ring Type │ I/O Size │ Spill/Reload Size │ +├───────────────┼──────────┼───────────────────┤ +│ Copy │ 1.062KB │ 89.656KB │ +│ CCE │ 1.312MB │ 48.000B │ +│ Transpose │ 0.000B │ 5.000MB │ +│ Replicate │ 0.000B │ 0.000B │ +│ Overhead │ 15.500KB │ 111.500KB │ +└───────────────┴──────────┴───────────────────┘ + +2025-11-04T21:38:58Z INFO 9072 (nc01/sgLnk) [HBMUsage]: +┌─────────────────────┬───────────┐ +│ DRAM Memory Usage │ Size │ +├─────────────────────┼───────────┤ +│ Total: │ 3.707GB │ +│ Model Code │ 2.974MB │ +│ Model Constants │ 655.004KB │ +│ Unallocated Tensors │ 3.642GB │ +│ Allocated Tensors │ 56.000MB │ +│ DMA Ring IO │ 1.329MB │ +│ DMA Ring Spill │ 5.196MB │ +└─────────────────────┴───────────┘ + +2025-11-04T21:38:58Z INFO 9072 [HBMUsage]: Total estimated HBM usage is: 3.803GB +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: hbm_usage finished after 0.006 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 673mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Output has 2 module(s), 8 function(s), 16611 memory location(s), 8 block(s), and 57663 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: Running neff_packager +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Inputs to neff_packager: modules=2 functions=8 allocs=16611 blocks=8 instructions=57663 Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.7_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.9-1688_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.3-1605-1690_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.2-1616-1692_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2015_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2002_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.15_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1545-1641_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1556-1643_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1567-1645_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1577-1647_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_identity_1799_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.24_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.25_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.26_CRSM.npy 
+2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.28_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.29_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.27-1134-1355_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_identity_1568_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: Const File de-dup saved 0 KB of memory footprint +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.7_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.9-1688_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.3-1605-1690_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.2-1616-1692_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2015_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2002_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.15_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1545-1641_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1556-1643_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1567-1645_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1577-1647_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0001_identity_1799_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found 
value_sg0002_constant.26_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.28_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.29_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: FileDeDuper file not found value_sg0002_identity_1568_CRSM.npy +2025-11-04T21:38:58Z INFO 9072 [NeffPackager]: Const File de-dup saved 0 KB of memory footprint +2025-11-04T21:38:58Z WARNING 9072 [NeffFileWriter]: writeKelp missing file /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e/metrics.json +2025-11-04T21:38:58Z WARNING 9072 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.207535.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json +2025-11-04T21:38:58Z INFO 9072 [NeffFileWriter]: Neff will be written to: /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.neff +2025-11-04T21:38:58Z INFO 9072 [NeffFileWriter]: IR signature: 1ad0472a9e7631754b31a760a7d927aa for neff artifacts +2025-11-04T21:38:58Z USER 9072 [BackendPassManager]: neff_packager finished after 0.137 seconds +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: curr_vmrss: 673mb, ru_maxrss: 852mb (delta=0mb) +2025-11-04T21:38:58Z INFO 9072 [BackendPassManager]: Output has 2 module(s), 8 function(s), 16611 memory location(s), 8 block(s), and 57663 instruction(s). 
Max writers: 299 Max Readers: 5434 +2025-11-04T21:38:58Z INFO 9072 [BackendDriver]: HBM scratchpad usage summary (post-allocation): +┌──────┬───────────┬────────────────────────────────────────────────────────────┬─────────────┐ +│ Core │ Subgraph │ Description │ Value │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ nc00 │ sg00 │ Peak scratchpad usage: local │ 0.000000 GB │ +│ nc00 │ sg00 │ Peak scratchpad usage: local and shared │ 0.042969 GB │ +│ nc00 │ sg00 │ Total size of allocated tensors: local │ 0.000000 GB │ +│ nc00 │ sg00 │ Total size of allocated tensors: shared │ 0.042969 GB │ +│ nc00 │ sg01 │ Peak scratchpad usage: local │ 0.003906 GB │ +│ nc00 │ sg01 │ Peak scratchpad usage: local and shared │ 0.054688 GB │ +│ nc00 │ sg01 │ Total size of allocated tensors: local │ 0.006348 GB │ +│ nc00 │ sg01 │ Total size of allocated tensors: shared │ 0.054688 GB │ +│ nc00 │ sg02 │ Peak scratchpad usage: local │ 0.003906 GB │ +│ nc00 │ sg02 │ Peak scratchpad usage: local and shared │ 0.027653 GB │ +│ nc00 │ sg02 │ Total size of allocated tensors: local │ 0.003933 GB │ +│ nc00 │ sg02 │ Total size of allocated tensors: shared │ 0.031590 GB │ +│ nc00 │ Max │ Peak scratchpad usage: local │ 0.003906 GB │ +│ nc00 │ Max │ Peak scratchpad usage: local and shared │ 0.054688 GB │ +│ nc00 │ Post-link │ Peak scratchpad usage after intermediate tensor allocation │ 0.086926 GB │ +│ nc00 │ Post-link │ Total size of allocated intermediate tensors │ 0.438583 GB │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ nc01 │ sg00 │ Peak scratchpad usage: local │ 0.000000 GB │ +│ nc01 │ sg00 │ Total size of allocated tensors: local │ 0.000000 GB │ +│ nc01 │ sg01 │ Peak scratchpad usage: local │ 0.003906 GB │ +│ nc01 │ sg01 │ Total size of allocated tensors: local │ 0.006348 GB │ +│ nc01 │ sg02 │ Peak scratchpad usage: local │ 0.003906 GB │ +│ nc01 │ sg02 │ Total size of allocated tensors: 
local │ 0.003906 GB │ +│ nc01 │ Max │ Peak scratchpad usage: local │ 0.003906 GB │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ Max │ Max │ Peak scratchpad usage │ 0.086926 GB │ +│ Max │ Max │ Peak scratchpad usage (page-aligned) │ 0.500000 GB │ +└──────┴───────────┴────────────────────────────────────────────────────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 [BackendDriver]: Largest tensors at peak scratchpad usage, core=nc00, subgraph=sg00, addr_space=shared (complete data located at nc00/sg00/memory_analysis_after_coloring_allocator_dram_shared_DRAM_Shared_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ dot.4 │ bfloat16 │ 1 │ 8.000000 MB │ +│ get_tuple_element.1 │ bfloat16 │ 1 │ 4.000000 MB │ +│ reshape.16 │ bfloat16 │ 1 │ 4.000000 MB │ +│ reshape.24 │ bfloat16 │ 1 │ 4.000000 MB │ +│ reshape.29 │ bfloat16 │ 1 │ 4.000000 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 [BackendDriver]: Largest tensors at peak scratchpad usage, core=nc00, subgraph=sg02, addr_space=local (complete data located at nc00/sg02/memory_analysis_after_coloring_allocator_dram_shared_DRAM_Local_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ _spill_1774 │ bfloat16 │ 1 │ 0.000008 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 
[BackendDriver]: Largest intermediate tensors at peak scratchpad usage, core=nc00 (complete data located at nc00//sgLnk/sg00/memory_analysis_after_coloring_allocator_dram_post_lnk_DRAM_Shared_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ intermediate0 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate3 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate5 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate6 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate1 │ bfloat16 │ 1 │ 0.500000 MB │ +│ intermediate2 │ bfloat16 │ 1 │ 0.500000 MB │ +│ intermediate4 │ bfloat16 │ 1 │ 0.003906 MB │ +│ intermediate7 │ bfloat16 │ 1 │ 0.003906 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 [BackendDriver]: Largest tensors at peak scratchpad usage, core=nc01, subgraph=sg02, addr_space=local (complete data located at nc01/sg02/memory_analysis_after_coloring_allocator_dram_shared_DRAM_Local_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ _spill_1785 │ bfloat16 │ 3 │ 0.011719 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 [BackendDriver]: Largest intermediate tensors at peak scratchpad usage, core=nc01 (complete data located at nc01//sgLnk/sg00/memory_analysis_after_coloring_allocator_dram_post_lnk_DRAM_Shared_hwm_allocations.csv): 
+┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ intermediate0 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate3 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate5 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate6 │ bfloat16 │ 1 │ 8.000000 MB │ +│ intermediate1 │ bfloat16 │ 1 │ 0.500000 MB │ +│ intermediate2 │ bfloat16 │ 1 │ 0.500000 MB │ +│ intermediate4 │ bfloat16 │ 1 │ 0.003906 MB │ +│ intermediate7 │ bfloat16 │ 1 │ 0.003906 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:58Z INFO 9072 [BackendDriver]: Backend completed successfully, tearing down. +2025-11-04T21:38:59Z INFO 8698 [job.WalrusDriver.0]: VNCBackend: completed successfully. +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 +2025-11-04T21:38:59Z INFO 8698 [job.BIRLinker.0]: Replay this job by calling: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/bin/neuronx-cc compile --framework XLA --state '{"model": ["/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "cached_wavegraph": "walrus_bir.out.json", "state_dir": "/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e/nc00/sg00", "state_id": "nc00/sg00"}' --pipeline BIRLinker +2025-11-04T21:38:59Z INFO 8698 [job.BIRLinker.0]: BIRLinker cwd: /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e
+2025-11-04T21:38:59Z INFO 8698 [job.BIRLinker.0]: Linking already done. +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Starting job job.Kelper.0 +2025-11-04T21:38:59Z INFO 8698 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.Kelper.0 +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 +2025-11-04T21:38:59Z INFO 8698 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 +2025-11-04T21:38:59Z INFO 8698 [job.NeffWrapper.0]: Processing input #0 +2025-11-04T21:38:59Z INFO 8698 [job.NeffWrapper.0]: Start NeffWrapper +2025-11-04T21:38:59Z INFO 8698 [job.NeffWrapper.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.hlo_module.pb --neff /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/model.MODULE_95ef7ca73cc0a6161be2+96be3c33.neff --io_transposes /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e/io_transposes.json --output /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/wrapped_neff.hlo --netlist /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk4/neuronxcc-yihckw_e/hlo_netlist.json +2025-11-04T21:38:59Z INFO 8698 [job.NeffWrapper.0]: There are no io transposes nor zero-sized parameters. Output will not be produced. +Hlo neff wrapper finished successfully. 
Have a wonderful day :D + +2025-11-04T21:38:59Z INFO 8698 [job.NeffWrapper.0]: Job #0 finished +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Finished pipeline Pipeline +2025-11-04T21:38:59Z INFO 8698 [pipeline.Pipeline.0]: Job #0 finished +2025-11-04T21:38:59Z INFO 8685 [root]: Subcommand returned with exitcode=0