diff --git "a/context_encoding_model/_tp0_bk3/log-neuron-cc.txt" "b/context_encoding_model/_tp0_bk3/log-neuron-cc.txt"
new file mode 100644
--- /dev/null
+++ "b/context_encoding_model/_tp0_bk3/log-neuron-cc.txt"
@@ -0,0 +1,9555 @@
+2025-11-04T21:38:32Z INFO 8576 [root]: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/bin/neuronx-cc compile --framework=XLA /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.hlo_module.pb --output /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.neff --target=trn2 --auto-cast=none --model-type=transformer '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma' --lnc=2 -O1 '--internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true' --logfile=/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/log-neuron-cc.txt --verbose=35
+2025-11-04T21:38:32Z INFO 8576 [root]: NeuronX Compiler version 2.21.33363.0+82129205 Python version 3.10.12 HWM version 2.21.0.33363+82129205 NumPy version 1.26.4 Running on AMI ami-00632e4ca97ea8199 Running in region usw2-az2
+2025-11-04T21:38:32Z INFO 8594 [root]: XLA detected
+2025-11-04T21:38:32Z INFO 8594 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper
+2025-11-04T21:38:32Z INFO 8594 [root]: Intermediate files stored in /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg, output in /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3
+2025-11-04T21:38:32Z INFO 8594 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
+2025-11-04T21:38:32Z INFO 8594 [pipeline.Pipeline.0]: Processing input #0
+2025-11-04T21:38:32Z INFO 8594 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
+2025-11-04T21:38:32Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0
+2025-11-04T21:38:32Z INFO 8594 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1
+2025-11-04T21:38:32Z INFO 8594 [job.HLOToTensorizer.0]: Processing input #0
+2025-11-04T21:38:32Z INFO 8594 [job.HLOToTensorizer.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.hlo_module.pb --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --verbose=error --logfile=/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/log-neuron-cc.txt --logfile-verbose=info --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --modular-flow-mac-threshold=10 --verify-hlo=true --native-to-custom-softmax --partitioner-opts='--transformer'
+2025-11-04T21:38:33Z INFO 8594 [job.HLOToTensorizer.0]:
+Pre-Partition Pre-Opt Histogram:
+total HLO instructions: 8312
+  reshape            1912  23.00%  ################################################################
+  broadcast          1123  13.51%  #####################################
+  transpose          1072  12.90%  ###################################
+  convert             945  11.37%  ###############################
+  constant            636   7.65%  #####################
+  parameter           371   4.46%  ############
+  slice               347   4.17%  ###########
+  add                 284   3.42%  #########
+  get-tuple-element   259   3.12%  ########
+  multiply            255   3.07%  ########
+  dot                 198   2.38%  ######
+  call                174   2.09%  #####
+  compare             173   2.08%  #####
+  select              170   2.05%  #####
+  concatenate         116   1.40%  ###
+  tuple                57   0.69%  #
+  scatter              57   0.69%  #
+  negate               56   0.67%  #
+  all-reduce           56   0.67%  #
+  divide               29   0.35%
+  gather                6   0.07%
+  iota                  5   0.06%
+  all-gather            3   0.04%
+  reduce                3   0.04%
+  custom-call           2   0.02%
+  sine                  1   0.01%
+  cosine                1   0.01%
+  maximum               1   0.01%
+
+
+Pre-Partition Post-Op Histogram:
+total HLO instructions: 5437
+  reshape            1421  26.14%  ################################################################
+  transpose           817  15.03%  ####################################
+  convert             720  13.24%  ################################
+  constant            443   8.15%  ###################
+  parameter           371   6.82%  ################
+  broadcast           266   4.89%  ###########
+  dot                 197   3.62%  ########
+  custom-call         175   3.22%  #######
+  multiply            171   3.15%  #######
+  add                 171   3.15%  #######
+  get-tuple-element   147   2.70%  ######
+  slice               115   2.12%  #####
+  concatenate         114   2.10%  #####
+  compare              59   1.09%  ##
+  select               58   1.07%  ##
+  scatter              57   1.05%  ##
+  negate               56   1.03%  ##
+  all-reduce           56   1.03%  ##
+  gather                6   0.11%
+  all-gather            3   0.06%
+  iota                  3   0.06%
+  reduce                3   0.06%
+  pad                   2   0.04%
+  sine                  1   0.02%
+  divide                1   0.02%
+  tuple                 1   0.02%
+  maximum               1   0.02%
+  rng                   1   0.02%
+  cosine                1   0.02%
+
+Potential split-points stats: #CC 59 #AR 56 #AG 3 #BN 0 nClamp 0
+ModuleSplitter initial partitioning... #parts 59
+ModuleSplitter initial partitioning... Done.
+ 0 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 57 58
+New disjoint wave: start 2 len 54 NumReps: 27 macs 724775731200
+First non-zero-mac/used part from the end is 58
+Not enough zero-mac parts. skip
+ModuleSplitter initial partitioning... #parts 29
+ModuleSplitter initial partitioning... Done.
+Remat: gather-iota 0 matches, 0 ops rematted
+Wrote HLO netlist to hlo_netlist.json
+Wrote graph partitions in debug_info_hlo_partitions.json
+Processing partition 0
+Replaced 0 dropout sequences with OffloadedDropout
+HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert cosine custom-call dot gather get-tuple-element multiply negate parameter reshape scatter select sine slice transpose tuple
+Invoking RemoveOptimizationBarriers pass
+Processing partition 1
+Replaced 0 dropout sequences with OffloadedDropout
+HLO Ops used in computation: add all-reduce broadcast compare concatenate constant convert custom-call dot get-tuple-element multiply negate parameter reshape scatter select slice transpose tuple
+Invoking RemoveOptimizationBarriers pass
+Processing partition 2
+Replaced 0 dropout sequences with OffloadedDropout
+HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert custom-call divide dot gather get-tuple-element iota maximum multiply pad parameter reduce reshape rng scatter select slice transpose tuple
+Invoking RemoveOptimizationBarriers pass
+
+2025-11-04T21:38:33Z INFO 8594 [job.HLOToTensorizer.0]: IR signature: 86247b71fdb68182914f06dcd53871dafb9196589a2268ca003589535514de57 for sg0000/HLOToTensorizer
+2025-11-04T21:38:33Z INFO 8594 [job.HLOToTensorizer.0]: IR signature: d06bf201ec237c2793e6f9f6befbb43fe986d2b05cbf3fd38077014348d4b362 for sg0001/HLOToTensorizer
+2025-11-04T21:38:33Z INFO 8594 [job.HLOToTensorizer.0]: IR signature: 44e4d964e525fd3b8d5dfaf970931f8b9d7fa6a97ad605e177799846a0eca67f for sg0002/HLOToTensorizer
+2025-11-04T21:38:33Z INFO 8594 [job.HLOToTensorizer.0]: Job #0 finished
+2025-11-04T21:38:33Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0
+2025-11-04T21:38:33Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.Frontend.0
+2025-11-04T21:38:33Z INFO 8594 [job.Frontend.0]: Job Frontend len(in_states) 1
+2025-11-04T21:38:33Z INFO 8594 [job.Frontend.0]: Processing input #0 +2025-11-04T21:38:33Z INFO 8594 [job.Frontend.0]: Start model loading +2025-11-04T21:38:33Z INFO 8594 [job.Frontend.0]: Start tensorization +2025-11-04T21:38:33Z INFO 8594 [job.Frontend.0]: Num jobs: 12 +2025-11-04T21:38:33Z USER 8594 [root/Tensorizer/Tensorizer]: Running Tensorizer +2025-11-04T21:38:33Z INFO 8594 [Tensorizer]: Max workers: 3 +2025-11-04T21:38:33Z INFO 8680 [Tensorizer]: Building model from Penguin script "penguin.py.000000"... +2025-11-04T21:38:33Z INFO 8682 [Tensorizer]: Building model from Penguin script "penguin.py.000002"... +2025-11-04T21:38:33Z INFO 8681 [Tensorizer]: Building model from Penguin script "penguin.py.000001"... +2025-11-04T21:38:33Z INFO 8680 [Tensorizer]: Allocate SB of shape (128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:33Z INFO 8680 [Tensorizer]: Allocate PSUM of shape (8, 128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:33Z INFO 8681 [Tensorizer]: Allocate SB of shape (128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:33Z INFO 8681 [Tensorizer]: Allocate PSUM of shape (8, 128, 0) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:33Z INFO 8680 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=2 --num-neuroncores-per-sengine=2 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable 
--enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-spill-reload-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8681 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=2 --num-neuroncores-per-sengine=2 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-spill-reload-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LegalizeOpLevelAlias]: 
LegalizeOpLevelAlias finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.003 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.002 seconds 
+2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.012 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.022 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.044 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8680 
[sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.040 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.009 seconds +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-11-04T21:38:33Z INFO 8680 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.017 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.047 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8682 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=2 --num-neuroncores-per-sengine=2 --internal_dynamic_dma_scratch_size_per_partition=16384 
--disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-spill-reload-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.010 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain 
+2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.001 seconds +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:33Z 
INFO 8681 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:33Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.006 seconds +2025-11-04T21:38:33Z INFO 8682 [sg0002/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.022 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.009 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.009 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.013 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.030 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: 
Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.056 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.011 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.053 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:34Z INFO 8681 
[sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.006 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 
[sg0002/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_0 finished after 0.031 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_1 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.001 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.037 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:34Z INFO 
8681 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.007 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.026 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.001 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.013 seconds 
+2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_0 finished after 0.123 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_1 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_1 finished after 0.004 seconds 
+2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.130 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.013 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.006 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_2 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_0 finished after 0.032 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_1 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_2 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: 
TensorOpTransform_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.007 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.038 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LateLowerTensorOp]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: Finished 
(changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.006 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.028 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_0 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_0 finished after 0.031 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_1 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.006 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished 
after 0.002 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.012 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.036 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.025 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 
8682 [sg0002/Tensorizer/LoopFusion]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.035 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Rematerialization]: Running Rematerialization +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Rematerialization]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.023 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Rematerialization]: Running Rematerialization +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Rematerialization]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Rematerialization]: Rematerialization finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Rematerialization]: Rematerialization finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.011 seconds +2025-11-04T21:38:34Z 
INFO 8681 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.009 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.016 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 
0.003 seconds +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-11-04T21:38:34Z INFO 8680 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.052 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8681 
[sg0001/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.008 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.001 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.009 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.009 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:34Z 
INFO 8682 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.031 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/PadElimination]: Running PadElimination +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/PadElimination]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.006 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:34Z INFO 
8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:34Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.005 seconds +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-11-04T21:38:34Z INFO 8682 [sg0002/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running 
Simplifier +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-11-04T21:38:35Z INFO 8680 
[sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_0 +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.008 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_0 finished after 0.046 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform_iteration_1 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.050 seconds +2025-11-04T21:38:35Z INFO 8680 
[sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.011 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom_iteration_0 +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.006 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ValueNumbering]: 
ValueNumbering finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/PadElimination]: Running PadElimination +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/PadElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8682 
[sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.006 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.088 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.008 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds 
+2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_0 finished after 0.112 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination_iteration_1 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.060 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Recompute]: Running Recompute 
+2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Recompute]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.117 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_2 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_2 finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_1 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Recompute]: Recompute finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:35Z INFO 8681 
[sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.024 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Rematerialization]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_2 +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_2 finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.004 seconds +2025-11-04T21:38:35Z INFO 
8681 [Tensorizer]: After optimization: 32 statements +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/MutateDataType]: Running MutateDataType +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/MutateDataType]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/TileCCOps]: Running TileCCOps +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/TileCCOps]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Recompute]: Running Recompute +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Recompute]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Recompute]: Recompute finished after 
0.000 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [Tensorizer]: After optimization: 39 statements +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/MutateDataType]: Running MutateDataType +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/MutateDataType]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 
[sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: Running TileCCOps +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `All gather output tensor check failed` +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: in float32 (512,) %'all_gather.2' = AllGatherOp-162 AllGather_add(float32 (256,) %'add.11', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.6459 | hlo_id: 108 | , id = 162 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=2048 is not above min_allgather_tile_size_in_bytes=8388608` +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: in uint32 (512,) %'all_gather.3' = AllGatherOp-178 AllGather_add(uint32 (256,) %'add.12', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.6596 | hlo_id: 117 | , id = 178 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/TileCCOps]: TileCCOps finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/TileCCOps]: TileCCOps finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 
[sg0001/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.029 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.009 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.005 seconds +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-11-04T21:38:35Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.002 seconds +2025-11-04T21:38:35Z INFO 8682 
[sg0002/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.008 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ResolveAccessConflict]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ResolveAccessConflict]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.015 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Finished (changed=True) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.023 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.007 seconds +2025-11-04T21:38:35Z INFO 8680 [sg0000/Tensorizer/DeadStoreElimination]: Running 
DeadStoreElimination +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.028 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.013 seconds +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-11-04T21:38:35Z INFO 8682 [sg0002/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.019 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.102 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.154 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 
finished after 0.022 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.013 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.025 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.012 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: LICM finished after 0.019 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.033 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.233 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InferNonlocalTensors]: 
prefer_non_broadcast_par: True +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.022 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.057 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.025 seconds +2025-11-04T21:38:36Z INFO 
8681 [sg0001/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/ResolveAccessConflict]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/ResolveAccessConflict]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.003 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_1 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_1 finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.024 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:36Z INFO 8682 
[sg0002/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.012 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.089 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.015 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.022 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.162 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/DelinearizeSPMD]: Running DelinearizeSPMD +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/PadElimination]: Running PadElimination +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/Delinearization]: 
Delinearization finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/DelinearizeSPMD]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/DelinearizeSPMD]: DelinearizeSPMD finished after 0.044 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/ShardingPropagationAnalysis]: Running ShardingPropagationAnalysis +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.022 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.008 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.132 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.015 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 
seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.024 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.014 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.022 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.007 seconds 
+2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/TCTransform]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.008 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom_iteration_0 +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.125 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.014 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8682 
[sg0002/Tensorizer/ShardingPropagationAnalysis]: ShardingPropagationAnalysis finished after 0.280 seconds +2025-11-04T21:38:36Z INFO 8682 [sg0002/Tensorizer/InferShardAxis]: Running InferShardAxis +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.009 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.027 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.237 seconds +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.071 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Recompute]: Running Recompute +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Recompute]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8681 [sg0001/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.001 seconds +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:36Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:36Z INFO 8680 
[sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8680 [Tensorizer]: After optimization: 32 statements +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.059 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.006 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Running Simplifier_iteration_0 +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Simplifier_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 
8680 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.013 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/TileCCOps]: Running TileCCOps +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=4194304 is not above min_allgather_tile_size_in_bytes=8388608` +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/TileCCOps]: in bfloat16 (2048, 1024) %'all_gather.1' = AllGatherOp-34 AllGather_add(bfloat16 (1024, 1024) %'transpose.1', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((2048, 1024), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.47 | hlo_id: 15 | , id = 34 +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/TileCCOps]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/TileCCOps]: TileCCOps finished after 0.011 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.025 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.006 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.016 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/ShardResult]: =================== Dumping Debug Info ===================== 
+2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/ShardResult]: ------------------ Sharding summary ------------------ +total number of dags: 36 +total number of sharded dags: 13 + +total bytes transferred from input, output, non local tensors: 370225954 +total bytes transferred from input, output, non local tensors with 2x bandwidths: 345041680 +% bytes transferred with 2x bandwidths: 93.20 + +NC0 FLOPs: 55340232214854943330 +NC1 FLOPs: 55340232214854936160 +% FLOPs sharded: 100.00 + + +Shard dim: 1024, Number of dags: 7 +Matmuls sharded with this dim: +[1024(s),2,6,2,128] @ [2,6,2,128,8,2,128] = [1024(s),8,2,128] (stationary-streaming swapped) Number of occurrences: 1 +[1024(s),2,8,128] @ [2,8,128,2,6,2,128] = [1024(s),2,6,2,128] Number of occurrences: 2 + + +Shard dim: 256, Number of dags: 5 +Matmuls sharded with this dim: + + +Shard dim: 75968, Number of dags: 1 +Matmuls sharded with this dim: +[2,8,128] @ [2,8,128,75968(s)] = [75968(s)] Number of occurrences: 1 + + + +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-11-04T21:38:37Z INFO 8682 
[sg0002/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.009 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/RemoveShardedPartitionAxes]: Running RemoveShardedPartitionAxes +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/RemoveShardedPartitionAxes]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.016 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/ResolveAccessConflict]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/ResolveAccessConflict]: DeadCodeElimination_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/RemoveShardedPartitionAxes]: RemoveShardedPartitionAxes finished after 0.015 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InferShardAxis]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InferShardAxis]: InferShardAxis finished after 0.460 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8680 
[sg0000/Tensorizer/LICM]: LICM finished after 0.017 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.336 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.014 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.030 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.014 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.404 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/DelinearizeSPMD]: Running DelinearizeSPMD +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.011 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:37Z INFO 8682 
[sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.008 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/DelinearizeSPMD]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/DelinearizeSPMD]: DelinearizeSPMD finished after 0.034 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/ShardingPropagationAnalysis]: Running ShardingPropagationAnalysis +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.007 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/PGTiling]: Running PGTiling +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/ShardingPropagationAnalysis]: ShardingPropagationAnalysis finished after 0.031 seconds +2025-11-04T21:38:37Z INFO 8681 [sg0001/Tensorizer/InferShardAxis]: Running InferShardAxis +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.013 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 598 of IO tensor {'CrossPassTensor': ''}bfloat16 %input367|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 599 of IO tensor {'CrossPassTensor': ''}bfloat16 %input368|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] 
+2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 600 of IO tensor {'CrossPassTensor': ''}bfloat16 %input366|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 601 of IO tensor {'CrossPassTensor': ''}bfloat16 %input365|NC|(8, 2, 128, 6, 2, 2, 128) is not sorted, index list (w/ AG ids): [(18, 'AG85'), (25, 'AG82'), (19, 'AG84'), (24, 'AG83')] +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 602 of IO tensor {'CrossPassTensor': ''}bfloat16 %input370|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(26, 'AG77'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 553 of IO tensor {'CrossPassTensor': ''}bfloat16 %input369|NC|(2, 37984, 2, 8, 128) is not sorted, index list (w/ AG ids): [(3, 'AG94'), (23, 'AG93'), (21, 'AG79'), (22, 'AG78')] +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.014 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.058 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/StaticTransposeLocalTensor]: 
StaticTransposeLocalTensor finished after 0.023 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.011 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.016 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.113 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.003 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.001 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.019 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.172 seconds +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-11-04T21:38:37Z INFO 8680 
[sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.128 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/PGTiling]: PGTiling finished after 0.363 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-11-04T21:38:37Z INFO 8680 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.068 seconds +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InsertOffloadedTransposes]: OffloadedTranspose inserted: 0 +2025-11-04T21:38:37Z INFO 8682 [sg0002/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.018 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.013 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 2.054 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 9504: 
transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 9504: matmul_128x128x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 48: simd128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 32: simd128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: reduce512x1x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: reduce512x1x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: indirect_load128x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/TilingProfiler]: TilingProfiler finished 
after 0.029 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.311 seconds +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/ShardResult]: =================== Dumping Debug Info ===================== +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.021 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_0 +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/ShardResult]: ------------------ Sharding summary ------------------ +total number of dags: 32 +total number of sharded dags: 25 + +total bytes transferred from input, output, non local tensors: 84943876 +total bytes transferred from input, output, non local tensors with 2x bandwidths: 51384320 +% bytes transferred with 2x bandwidths: 60.49 + +NC0 FLOPs: 92233720359980498947 +NC1 FLOPs: 92233720359980498944 +% FLOPs sharded: 100.00 + + +Shard dim: 1024, Number of dags: 24 +Matmuls sharded with this dim: +[1024(s),2,6,2,128] @ [2,6,2,128,8,2,128] = [1024(s),8,2,128] (stationary-streaming swapped) Number of occurrences: 1 +[1024(s),2,8,128] @ [2,8,128,2,2,2,2,64] = [1024(s),2,2,2,2,64] Number of occurrences: 1 +[1024(s),2,8,128] @ [2,8,128,2,6,2,128] = [1024(s),2,6,2,128] Number of occurrences: 2 +[1024(s),2,8,128] @ [2,8,128,4,128] = [1024(s),4,128] Number of occurrences: 1 +[1024(s),2,8,128] @ [2,8,128,4,2,64] = [1024(s),4,2,64] Number of occurrences: 1 + + +Shard dim: 
2, Number of dags: 1 +Matmuls sharded with this dim: +[1024,4,2,128] @ [4,2,128,2(s),2,4,128] = [1024,2(s),2,4,128] (stationary-streaming swapped) Number of occurrences: 1 + + + +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_0 finished after 0.055 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_1 finished after 0.002 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.058 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.020 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.021 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/RemoveShardedPartitionAxes]: Running RemoveShardedPartitionAxes +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.021 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:38Z 
INFO 8681 [sg0001/Tensorizer/RemoveShardedPartitionAxes]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.009 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/RemoveShardedPartitionAxes]: RemoveShardedPartitionAxes finished after 0.033 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/InferShardAxis]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.006 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/InferShardAxis]: InferShardAxis finished after 0.753 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.009 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.007 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates 
finished after 0.025 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.022 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.019 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/PGTiling]: Running PGTiling +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 655 of IO tensor {'CrossPassTensor': ''}bfloat16 %input70|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 656 of IO tensor {'CrossPassTensor': ''}bfloat16 %input71|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 657 of IO tensor {'CrossPassTensor': ''}bfloat16 %input69|NC|(2, 6, 128, 2, 8, 2, 128) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 658 of IO tensor {'CrossPassTensor': ''}bfloat16 %input68|NC|(8, 2, 128, 6, 2, 2, 128) is not sorted, index list (w/ AG ids): [(24, 'AG119'), (30, 'AG116'), (25, 'AG118'), (28, 'AG117')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 659 of IO tensor 
{'CrossPassTensor': ''}bfloat16 %input74|N|(128, 2, 8) is not sorted, index list (w/ AG ids): [(31, 'AG111'), (27, 'AG113'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 660 of IO tensor {'CrossPassTensor': ''}bfloat16 %input78|NC|(2, 2, 128, 8, 2, 2, 2, 64) is not sorted, index list (w/ AG ids): [(27, 'AG113'), (31, 'AG111'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 661 of IO tensor {'CrossPassTensor': ''}bfloat16 %input77|N|(64, 2) is not sorted, index list (w/ AG ids): [(13, 'AG123'), (9, 'AG124')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 662 of IO tensor {'CrossPassTensor': ''}bfloat16 %input76|NC|(2, 128, 8, 4, 2, 64) is not sorted, index list (w/ AG ids): [(27, 'AG113'), (31, 'AG111'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 663 of IO tensor {'CrossPassTensor': ''}bfloat16 %input75|N|(64, 2) is not sorted, index list (w/ AG ids): [(18, 'AG128'), (14, 'AG129')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 664 of IO tensor {'CrossPassTensor': ''}bfloat16 %input73|NC|(2, 128, 8, 4, 128) is not sorted, index list (w/ AG ids): [(27, 'AG113'), (31, 'AG111'), (29, 'AG112')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 444 of IO tensor {'CrossPassTensor': ''}bfloat16 %input72|NC|(2, 2, 128, 4, 2, 4, 128) is not sorted, index list (w/ AG ids): [(20, 'AG135'), (12, 'AG137'), (17, 'AG136')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 694 of IO tensor non_local bfloat16 %reshape.68(4, 2, 2, 64, 2, 512) is not sorted, index list (w/ AG ids): [(10, 'AG130'), (15, 'AG131'), (7, 'AG115'), (26, 
'AG114')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 644 of IO tensor non_local bfloat16 %reshape.73(4, 2, 2, 512, 128) is not sorted, index list (w/ AG ids): [(11, 'AG133'), (16, 'AG134'), (7, 'AG115'), (19, 'AG132')] +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.119 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.015 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.021 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.003 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.280 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: 
+2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 9504: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 9504: matmul_128x128x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 594: transpose_128x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 48: simd128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 24: dma128x2048 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 24: dma128x2048 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 16: dma128x1024 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: reduce512x1x1 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: dma1x512 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: simd1x512 +2025-11-04T21:38:38Z INFO 8682 
[sg0002/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.008 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.012 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.029 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.029 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.645 seconds +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.041 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InsertImplicitShardAxisBeforeISel]: Running InsertImplicitShardAxisBeforeISel +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InsertImplicitShardAxisBeforeISel]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes 
finished after 0.012 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/InsertImplicitShardAxisBeforeISel]: InsertImplicitShardAxisBeforeISel finished after 0.015 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.019 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_1 +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.696 seconds +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/DelinearizeSPMD]: Running DelinearizeSPMD +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_1 finished after 0.018 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.129 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.010 seconds +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/DelinearizeSPMD]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/DelinearizeSPMD]: DelinearizeSPMD finished after 0.031 seconds +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/ShardingPropagationAnalysis]: Running ShardingPropagationAnalysis +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.038 seconds 
+2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/ShardingPropagationAnalysis]: ShardingPropagationAnalysis finished after 0.016 seconds +2025-11-04T21:38:38Z INFO 8680 [sg0000/Tensorizer/InferShardAxis]: Running InferShardAxis +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/PGTiling]: PGTiling finished after 0.437 seconds +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.004 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8681 [sg0001/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.014 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.005 seconds +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-11-04T21:38:38Z INFO 8682 [sg0002/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.002 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop 
finished after 0.005 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.040 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.033 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/InferInitValue]: Running InferInitValue +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InsertOffloadedTransposes]: OffloadedTranspose inserted: 0 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.039 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.019 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 2.524 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:39Z 
INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 48: simd128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: simd128x512 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: generic_store128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 32: generic_store128x128 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingBottleneck]: 16: rmsnorm128x512x128 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/InferInitValue]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/InferInitValue]: InferInitValue finished after 0.101 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:39Z INFO 8682 
[sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.022 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.018 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.019 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/ShardResult]: =================== Dumping Debug Info ===================== +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/ShardResult]: ------------------ Sharding summary ------------------ +total number of dags: 32 +total number of sharded dags: 25 + +total bytes transferred from input, output, non local tensors: 40385542 +total bytes transferred from input, output, non local tensors with 2x bandwidths: 31995904 +% bytes transferred with 2x bandwidths: 79.23 + +NC0 FLOPs: 36893488145284694019 +NC1 FLOPs: 36893488145284694016 +% FLOPs sharded: 100.00 + + +Shard dim: 1024, Number of dags: 24 +Matmuls sharded with this dim: +[1024(s),2,8,128] @ [2,8,128,2,2,2,2,64] = [1024(s),2,2,2,2,64] Number of occurrences: 1 +[1024(s),2,8,128] @ [2,8,128,4,128] = [1024(s),4,128] Number of occurrences: 1 +[1024(s),2,8,128] @ [2,8,128,4,2,64] = [1024(s),4,2,64] Number of occurrences: 1 +[64] @ [1024(s)] = [64,1024(s)] Number of occurrences: 1 + + +Shard dim: 2, Number of dags: 1 +Matmuls sharded with this dim: +[1024,4,2,128] @ [4,2,128,2(s),2,4,128] = [1024,2(s),2,4,128] (stationary-streaming swapped) Number of occurrences: 1 + + + 
+2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.021 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_0 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyTensor]: DeadCodeElimination_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyTensor]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.015 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LICM]: LICM finished after 0.008 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SundaISel]: Running SundaISel +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.017 seconds +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/RemoveShardedPartitionAxes]: Running RemoveShardedPartitionAxes +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_0 finished after 0.099 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_1 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_1 finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 
8680 [sg0000/Tensorizer/RemoveShardedPartitionAxes]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SundaISel]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.103 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/SundaISel]: SundaISel finished after 0.071 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.002 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.027 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) 
+2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.009 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.008 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_0 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_0 finished after 0.028 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_1 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.015 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_1 finished after 0.017 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_2 +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/RemoveShardedPartitionAxes]: RemoveShardedPartitionAxes finished after 0.043 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_2 finished after 0.009 
seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_3 +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/InferShardAxis]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_3 finished after 0.007 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.007 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.063 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/InferShardAxis]: InferShardAxis finished after 0.638 seconds +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.004 seconds +2025-11-04T21:38:39Z INFO 8680 
[sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.021 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.005 seconds +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.045 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.011 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.007 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished 
(changed=False) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.012 seconds +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/PGTiling]: Running PGTiling +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.008 seconds +2025-11-04T21:38:39Z INFO 8681 [sg0001/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.057 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.011 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.024 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 633 of IO tensor {'CrossPassTensor': ''}bfloat16 %input63|N|(128, 
2, 8) is not sorted, index list (w/ AG ids): [(28, 'AG88'), (23, 'AG90'), (26, 'AG89')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 634 of IO tensor {'CrossPassTensor': ''}bfloat16 %input67|NC|(2, 2, 128, 8, 2, 2, 2, 64) is not sorted, index list (w/ AG ids): [(23, 'AG90'), (28, 'AG88'), (26, 'AG89')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 635 of IO tensor {'CrossPassTensor': ''}bfloat16 %input66|N|(64, 2) is not sorted, index list (w/ AG ids): [(24, 'AG93'), (21, 'AG96')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 636 of IO tensor {'CrossPassTensor': ''}bfloat16 %input65|NC|(2, 128, 8, 4, 2, 64) is not sorted, index list (w/ AG ids): [(23, 'AG90'), (28, 'AG88'), (26, 'AG89')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 637 of IO tensor {'CrossPassTensor': ''}bfloat16 %input64|N|(64, 2) is not sorted, index list (w/ AG ids): [(24, 'AG93'), (17, 'AG100')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 638 of IO tensor {'CrossPassTensor': ''}bfloat16 %input62|NC|(2, 128, 8, 4, 128) is not sorted, index list (w/ AG ids): [(23, 'AG90'), (28, 'AG88'), (26, 'AG89')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 419 of IO tensor {'CrossPassTensor': ''}bfloat16 %input61|NC|(2, 2, 128, 4, 2, 4, 128) is not sorted, index list (w/ AG ids): [(27, 'AG106'), (22, 'AG108'), (25, 'AG107')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 631 of IO tensor non_local bfloat16 %all_gather.1(2, 8, 128, 2, 512) is not sorted, index list (w/ AG ids): [(23, 'AG90'), (26, 'AG89'), (28, 'AG88'), (1, 'AG92')] +2025-11-04T21:38:39Z INFO 8680 
[sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 582 of IO tensor non_local bfloat16 %reshape.16(2, 2, 2, 2, 64, 2, 512) is not sorted, index list (w/ AG ids): [(7, 'AG99'), (12, 'AG98'), (16, 'AG97'), (21, 'AG96'), (24, 'AG93'), (1, 'AG92')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 676 of IO tensor non_local bfloat16 %reshape.24(4, 2, 2, 64, 2, 512) is not sorted, index list (w/ AG ids): [(8, 'AG101'), (13, 'AG102'), (17, 'AG100'), (24, 'AG93'), (1, 'AG92')] +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 614 of IO tensor non_local bfloat16 %reshape.29(4, 2, 2, 512, 128) is not sorted, index list (w/ AG ids): [(9, 'AG104'), (14, 'AG105'), (1, 'AG92'), (18, 'AG103')] +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.102 seconds +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.012 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_0 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_0 finished after 0.005 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.015 seconds +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/PComputeCutting]: 
Running PComputeCutting +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.006 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8680 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.013 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/DeConcat]: Running DeConcat +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/DeConcat]: Running DeConcat_iteration_0 +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/DeConcat]: DeConcat_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/DeConcat]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/DeConcat]: DeConcat finished after 0.002 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.003 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion_iteration_0 +2025-11-04T21:38:39Z INFO 8682 
[sg0002/Tensorizer/PartialSimdFusion]: PartialSimdFusion_iteration_0 finished after 0.014 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.014 seconds +2025-11-04T21:38:39Z INFO 8682 [sg0002/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.029 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.005 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.002 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-11-04T21:38:40Z WARNING 8681 [sg0001/Tensorizer/DataLocalityOpt]: Generated 128x1 DMA for macro: + dma128x1:free_axes={};partition_axes={i2_1_1_1516=[0:128:1]};#instances=8192 { + for (i2_1_1_1516: range(0, 128, 1)) { # indent=16 + bfloat16 $1515[i2_0_1516, i0_0_1516, i2_1_0_1516, i0_1_1516, i1_1516, i3_1516, i2_1_1_1516] = load TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2, 2, 128) %1517[i0_0_1516, i2_0_1516, i2_1_0_1516, i2_1_1_1516, i0_1_1516, i1_1516, i3_1516] # dl = tensor_op_name: _reshape.335 | hlo_id: 160 | + non_local bfloat16 (2, 2, 2, 2, 4, 128, 128) %'reshape.73'[i0_0_1516, i0_1_1516, i1_1516, i2_0_1516, i2_1_0_1516, i2_1_1_1516, i3_1516] = store bfloat16 $1515[i2_0_1516, 
i0_0_1516, i2_1_0_1516, i0_1_1516, i1_1516, i3_1516, i2_1_1_1516] # dl = tensor_op_name: _reshape.335 | hlo_id: 160 | , id = 644 + } + } +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.090 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.360 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 8192: dma128x1 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 768: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8681 
[sg0001/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 48: simd128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x512 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x512 +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.036 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.026 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.037 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.038 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 
8680 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.120 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.493 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.019 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion_iteration_0 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.039 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/PartialLoopFusion]: PartialLoopFusion_iteration_0 finished after 0.047 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.050 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertOffloadedTransposes]: OffloadedTranspose inserted: 0 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.031 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/InsertImplicitShardAxisBeforeISel]: Running 
InsertImplicitShardAxisBeforeISel +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/InsertImplicitShardAxisBeforeISel]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.050 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/InsertImplicitShardAxisBeforeISel]: InsertImplicitShardAxisBeforeISel finished after 0.011 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.016 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.019 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.020 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.015 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 
[sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.007 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.016 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 2.859 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: 
FlattenMacroLoop finished after 0.030 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 32: generic_store128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 32: generic_store128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 16: indirect_load128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 16: rmsnorm128x512x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 16: simd128x256 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 16: simd128x256 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingBottleneck]: 16: simd128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.012 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.030 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.002 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.028 seconds +2025-11-04T21:38:40Z INFO 
8680 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_0 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.012 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.022 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/InferInitValue]: Running InferInitValue +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_0 finished after 0.064 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor_iteration_1 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor_iteration_1 finished after 0.002 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.066 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.023 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.024 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished 
(changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.022 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/LICM]: LICM finished after 0.010 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/InferInitValue]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.002 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.010 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.004 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/InferInitValue]: InferInitValue finished after 0.090 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.005 
seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.023 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_1 +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_1 finished after 0.008 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.048 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.033 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.056 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyTensor]: DeadCodeElimination_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyTensor]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.006 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SimplifyTensor]: 
SimplifyTensor finished after 0.015 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/LICM]: LICM finished after 0.009 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SundaISel]: Running SundaISel +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/SpillPSum]: SpillPSum finished after 0.031 seconds +2025-11-04T21:38:40Z INFO 8682 [sg0002/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:40Z WARNING 8680 [sg0000/Tensorizer/DataLocalityOpt]: Generated 128x1 DMA for macro: + dma128x1:free_axes={};partition_axes={i2_1_1_1626=[0:128:1]};#instances=8192 { + for (i2_1_1_1626: range(0, 128, 1)) { # indent=16 + bfloat16 $1625[i2_0_1626, i0_0_1626, i2_1_0_1626, i0_1_1626, i1_1626, i3_1626, i2_1_1_1626] = load TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2, 2, 128) %1627[i0_0_1626, i2_0_1626, i2_1_0_1626, i2_1_1_1626, i0_1_1626, i1_1626, i3_1626] # dl = tensor_op_name: _reshape.90 | hlo_id: 134 | + non_local bfloat16 (2, 2, 2, 2, 4, 128, 128) %'reshape.29'[i0_0_1626, i0_1_1626, i1_1626, i2_0_1626, i2_1_0_1626, i2_1_1_1626, i3_1626] = store bfloat16 $1625[i2_0_1626, i0_0_1626, i2_1_0_1626, i0_1_1626, i1_1626, i3_1626, i2_1_1_1626] # dl = tensor_op_name: _reshape.90 | hlo_id: 134 | , id = 614 + } + } +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.228 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 8192: 
dma128x1 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: generic_store128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: generic_store128x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: indirect_load128x512 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: rmsnorm128x512x128 +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False) 
+2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SundaISel]: Finished (changed=True) +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.017 seconds +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:40Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/SundaISel]: SundaISel finished after 0.091 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-11-04T21:38:40Z INFO 8681 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.001 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.043 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.023 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/LowerComplexBroadcast]: Finished 
(changed=False) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.005 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.004 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.044 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.098 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_0 +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.042 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/InsertImplicitShardAxisBeforeISel]: Running InsertImplicitShardAxisBeforeISel +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_0 
finished after 0.031 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_1 +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/InsertImplicitShardAxisBeforeISel]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.007 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_1 finished after 0.010 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_2 +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_2 finished after 0.007 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/InsertImplicitShardAxisBeforeISel]: InsertImplicitShardAxisBeforeISel finished after 0.014 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LegalizeType]: LegalizeType finished after 0.010 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.052 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished after 0.040 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) 
+2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.041 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.028 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.021 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_1 +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.003 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.004 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_1 finished after 0.023 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.018 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-11-04T21:38:41Z INFO 8681 
[sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.017 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.008 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.005 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.036 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.015 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.045 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:41Z 
INFO 8680 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.011 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.005 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.114 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 0.073 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier_iteration_0 +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.017 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.139 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.081 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier_iteration_0 finished 
after 0.042 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.008 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.043 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.023 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.023 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyTensor]: DeadCodeElimination_iteration_0 finished after 0.007 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.024 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LICM]: Running LICM +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/TensorInitialization]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.021 seconds 
+2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LICM]: LICM finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SundaISel]: Running SundaISel +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.029 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.024 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.015 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.026 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_0 +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_0 finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds 
+2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.010 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.045 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SundaISel]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.003 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/DeConcat]: Running DeConcat +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/DeConcat]: Running DeConcat_iteration_0 +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/DeConcat]: DeConcat_iteration_0 finished after 0.006 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/DeConcat]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/DeConcat]: DeConcat finished after 0.007 seconds +2025-11-04T21:38:41Z INFO 8681 
[sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/DataStreaming]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/DataStreaming]: DataStreaming finished after 0.018 seconds +2025-11-04T21:38:41Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.008 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion_iteration_0 +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.159 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.001 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.021 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LowerComplexBroadcast]: 
Running LowerComplexBroadcast +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.009 seconds +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/PartialSimdFusion]: PartialSimdFusion_iteration_0 finished after 0.079 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8680 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.080 seconds +2025-11-04T21:38:41Z INFO 8681 [sg0001/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.012 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.009 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion_iteration_0 finished after 0.026 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion_iteration_1 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: 
NeuronLoopFusion_iteration_1 finished after 0.013 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.040 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.003 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.088 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_1 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_1 finished after 0.010 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.044 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.288 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.183 seconds 
+2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.013 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.034 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.023 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.036 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.020 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.008 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/InsertCoreBarrier]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/VectorizeMatMult]: VectorizeMatMult 
finished after 0.042 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion_iteration_0 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.092 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.011 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.008 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 1.523ms (300.000MiB, est bw: 206.549GB/s, 64.924% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 297, 128, 2048) %'992.1586'[i31_0,4i31_1_0_0+i31_1_0_1,i0.128,i1.128+128i2.16] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 37984, 16, 128) %'input369'[i31_0,i0.128+512i31_1_0_0+128i31_1_0_1,i2.16,i1.128] # id=1585, src_id=None, , instances=600 # dl = tensor_op_name: input369_pftranspose_992 | hlo_id: 95 | if -i0.128-512i31_1_0_0-128i31_1_0_1+37983 >= 0 and -4i31_1_0_0-i31_1_0_1+296 >= 0 [[i0.128];[i1.128, i2.16]] -> [[i0.128];[i1.128, i2.16]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 198.539us (24.000MiB, est bw: 126.755GB/s, 8.463% of tot. 
time) for bfloat16<128 x 512> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 6, 128, 2, 512) %'input365_local_1070'[i16_0_1076,i15_0_0_0_1,i15_0_0_0_0,c1_1062,c2_1063,i0.128,i3.2,i1.128+128i2.2+256p_1696] = load bfloat16<128 x 512> {'CrossPassTensor': ''}bfloat16 (4, 2, 2, 128, 6, 2, 2, 128) %'input365'[i15_0_0_0_1+2i15_0_0_0_0,p_1696,c1_1062,i0.128,c2_1063,i3.2,i2.2,i1.128] # id=1376, src_id=None, , instances=192 # dl = tensor_op_name: _dot.199 | hlo_id: 63 | [[i0.128];[i1.128, i2.2, i3.2]] -> [[i0.128];[i1.128, i2.2, i3.2]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 193.732us (300.000KiB, est bw: 1.586GB/s, 8.259% of tot. time) for float32<1 x 128> {'no_delinear': '0'}non_local float32 (1, 2, 37984) %'convert.55'[0,i31_0,i0.128+512i31_1_0_0+128i31_1_0_1] = store float32<1 x 128> TongaSB partitions[2] float32 (2, 297, 1, 128) %'dot.200.1596'[i31_0,4i31_1_0_0+i31_1_0_1,0,i0.128] # id=1594, src_id=None, , instances=600 # dl = tensor_op_name: _dot.200 | hlo_id: 95 | if -i0.128-512i31_1_0_0-128i31_1_0_1+37983 >= 0 and -4i31_1_0_0-i31_1_0_1+296 >= 0 [[];[i0.128]] -> [[];[i0.128]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 123.036us (24.000MiB, est bw: 204.541GB/s, 5.245% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 6, 2, 2, 128, 2048) %'input366_local_1047'[i11_0,2i10_0_0_1_0+i10_0_0_1_1,i10_0_0_0,c2_1041,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input366'[i10_0_0_0,2i10_0_0_1_0+i10_0_0_1_1,i0.128,c2_1041,i1.2048] # id=1367, src_id=None, , instances=48 # dl = tensor_op_name: _dot.197 | hlo_id: 52 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 123.036us (24.000MiB, est bw: 204.541GB/s, 5.245% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 6, 2, 2, 128, 2048) %'input368_local_1058'[i16_0_1076,2i12_0_0_1_0+i12_0_0_1_1,i12_0_0_0,c2_1052,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input368'[i12_0_0_0,2i12_0_0_1_0+i12_0_0_1_1,i0.128,c2_1052,i1.2048] # id=1370, src_id=None, , instances=48 # dl = tensor_op_name: _dot.198 | hlo_id: 42 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 0.920% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2048) %'996.1670'[i11_0,T_i1_0,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2, 512, 2048) %'add.9'[i11_0,i0.128+128T_i1_0,i1.2048] # id=1560, src_id=None, , instances=8 # dl = tensor_op_name: add.9_pftranspose_996 | hlo_id: 27 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 0.920% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2, 2, 512) %'_reload_1523'[i16_0_1076,i4_0_1_1526_0,i0.128,i3.2,i2.2,i1.512] = load bfloat16<128 x 2048> DRAM3DBlk partitions[2] bfloat16 (4, 2, 128, 2048) %'_spill_1520'[i4_0_1_1526_0,i16_0_1076,i0.128,i1.512+1024i2.2+512i3.2] # id=1525, src_id=None, , instances=8 # dl = tensor_op_name: _dot.198 | hlo_id: 42 | [[i0.128];[i1.512, i2.2, i3.2]] -> [[i0.128];[i1.512, i2.2, i3.2]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 0.920% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2048) %'1000.1675'[T_i20_0_1008,T_i1_0,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2097152,) %'all_reduce.3-buffer-2033'[1048576T_i20_0_1008+2048i0.128+262144T_i1_0+i1.2048] # id=1569, src_id=None, , instances=8 # dl = tensor_op_name: all_reduce.3_pftranspose_1000 | hlo_id: 66 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 13.416us (4.000MiB, est bw: 312.630GB/s, 0.572% of tot. time) for bfloat16<128 x 2048> DRAM3DBlk partitions[2] bfloat16 (4, 2, 128, 2048) %'_spill_1520'[i2_0_1_1634_2011_0,i11_0,i0.128,i1.2048] = store bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2048) %1014[i11_0,i2_0_1_1634_2011_0,i0.128,i1.2048] # id=1522, src_id=None, , instances=8 # dl = tensor_op_name: _custom-call.348 | hlo_id: 34 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 13.416us (4.000MiB, est bw: 312.630GB/s, 0.572% of tot. 
time) for bfloat16<128 x 2048> non_local bfloat16 (2097152,) %'dot.14-buffer-2031'[1048576i16_0_1076+2048i0.128+262144i16_1_0_1076_1527+i1.2048] = store bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2048) %1077[i16_0_1076,i16_1_0_1076_1527,i0.128,i1.2048] # id=1379, src_id=None, , instances=8 # dl = tensor_op_name: _dot.199 | hlo_id: 63 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.108 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/PartialLoopFusion]: PartialLoopFusion_iteration_0 finished after 0.045 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.022 seconds +2025-11-04T21:38:42Z INFO 8682 [sg0002/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.046 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.018 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: Finished 
(changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.012 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.023 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.010 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.023 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.003 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.010 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 
[sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.013 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.011 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.006 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_0 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_0 finished after 0.004 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA_iteration_1 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA_iteration_1 finished after 0.001 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.022 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.018 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.003 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/SpillPSum]: 
Running SpillPSum +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.017 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.007 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/SpillPSum]: SpillPSum finished after 0.035 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.037 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.006 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 
0.015 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.003 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/DeConcat]: Running DeConcat +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/DeConcat]: Running DeConcat_iteration_0 +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/DeConcat]: DeConcat_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/DeConcat]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.016 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.003 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.006 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.004 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.013 seconds +2025-11-04T21:38:42Z INFO 8682 
[topk/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.005 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.014 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion_iteration_0 finished after 0.043 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.045 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/SpillPSum]: SpillPSum finished after 0.035 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.007 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.045 seconds +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LegalizeType]: Finished 
(changed=True) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LegalizeType]: LegalizeType finished after 0.018 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.006 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.014 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.012 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/LegalizeType]: LegalizeType finished after 0.013 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.013 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8680 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.017 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:42Z INFO 8681 
[sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.007 seconds +2025-11-04T21:38:42Z INFO 8682 [topk/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.036 seconds +2025-11-04T21:38:42Z INFO 8681 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_1 +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.077 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.033 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_1 finished after 0.030 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.029 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 
[sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.031 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.066 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.004 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.020 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion_iteration_0 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.024 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.006 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion_iteration_0 finished after 0.040 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 
[sg0001/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.005 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/TensorInitialization]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.041 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.005 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.004 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/ExpandISAMacro]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.006 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.083 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt 
+2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.050 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.027 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.012 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.024 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DataStreaming]: DataStreaming finished after 0.013 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.004 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: 
Running LateNeuronInstComb +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.027 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_1 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.004 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_1 finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DataStreaming]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DataStreaming]: DataStreaming finished after 0.009 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.037 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.003 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.060 seconds 
+2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.017 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.007 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.006 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.037 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.010 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 4.177us (296.750KiB, est bw: 72.741GB/s, 20.220% of tot. 
time) for float32<32 x 2374> TongaSB partitions[0] float32 (32, 2630) %4(init=0.0)[i0.32,i1.2374] = load float32<32 x 2374> float32 (32, 2374) %6[i0.32,i1.2374] # id=7, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.2374]] -> [[i0.32];[i1.2374]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 4.177us (296.750KiB, est bw: 72.741GB/s, 20.220% of tot. time) for float32<32 x 2374> TongaSB partitions[0] float32 (32, 2374) %10[i0.32,i1.2374] = load float32<32 x 2374> float32 (1, 75968) %'inp'[i0.32,i1.2374] # id=9, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.2374]] -> [[i0.32];[i1.2374]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.965us (4.000KiB, est bw: 2.085GB/s, 9.509% of tot. time) for float32<32 x 32> TongaSB partitions[0] float32 (32, 32) %485[i0.32,i1.32] = load float32<32 x 32> float32 (32, 32) %3[i0.32,i1.32] # id=13, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.32]] -> [[i0.32];[i1.32]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 9.301% of tot. time) for float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %316[0,i0.256] = load float32<1 x 256> float32 (32, 8) %304[0,i0.256] # id=306, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. 
DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 9.301% of tot. time) for uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %319[0,i0.256] = load float32<1 x 256> float32 (32, 8) %307[0,i0.256] # id=309, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 7.936% of tot. time) for uint32<1 x 256> uint32 (1, 256) %'topk_indices'[0,i0.256] = store uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %'global_id_buf'(init=0.0)[0,i0.256] # id=322, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 7.936% of tot. time) for float32<1 x 256> float32 (1, 256) %'topk_values'[0,i0.256] = store float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %'val_buf'(init=0.0)[0,i0.256] # id=324, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 7.789% of tot. 
time) for float32<32 x 8> float32 (32, 8) %304[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %296[i0.32,i1.8] # id=305, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 7.789% of tot. time) for float32<32 x 8> float32 (32, 8) %307[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %517[i0.32,i1.8] # id=308, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.014 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.011 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.068 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.009 seconds 
+2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.017 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.098 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_1 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_1 finished after 0.021 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.032 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.323 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.030 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.069 seconds 
+2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_1 +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/DoNothing]: DoNothing finished after 0.005 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.022 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_1 finished after 0.040 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.110 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.031 seconds +2025-11-04T21:38:43Z INFO 8682 
[topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.006 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.031 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_1 +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/InsertCoreBarrier]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_1 finished after 0.012 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 198.539us (24.000MiB, est bw: 126.755GB/s, 24.815% of tot. time) for bfloat16<128 x 512> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 6, 128, 2, 512) %'input68_local_1426'[i16_0_1432,i15_0_0_0_1,i15_0_0_0_0,c1_1418,c2_1419,i0.128,i3.2,i1.128+128i2.2+256p_1943] = load bfloat16<128 x 512> {'CrossPassTensor': ''}bfloat16 (4, 2, 2, 128, 6, 2, 2, 128) %'input68'[i15_0_0_0_1+2i15_0_0_0_0,p_1943,c1_1418,i0.128,c2_1419,i3.2,i2.2,i1.128] # id=1665, src_id=None, , instances=192 # dl = tensor_op_name: _dot.6 | hlo_id: 51 | [[i0.128];[i1.128, i2.2, i3.2]] -> [[i0.128];[i1.128, i2.2, i3.2]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. 
DMA time: 123.036us (24.000MiB, est bw: 204.541GB/s, 15.378% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 6, 2, 2, 128, 2048) %'input69_local_1403'[i11_0,2i10_0_0_1_0+i10_0_0_1_1,i10_0_0_0,c2_1397,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input69'[i10_0_0_0,2i10_0_0_1_0+i10_0_0_1_1,i0.128,c2_1397,i1.2048] # id=1656, src_id=None, , instances=48 # dl = tensor_op_name: _dot.4 | hlo_id: 40 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 123.036us (24.000MiB, est bw: 204.541GB/s, 15.378% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 6, 2, 2, 128, 2048) %'input71_local_1414'[i16_0_1432,2i12_0_0_1_0+i12_0_0_1_1,i12_0_0_0,c2_1408,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 6, 128, 2, 2048) %'input71'[i12_0_0_0,2i12_0_0_1_0+i12_0_0_1_1,i0.128,c2_1408,i1.2048] # id=1659, src_id=None, , instances=48 # dl = tensor_op_name: _dot.5 | hlo_id: 30 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 5.234% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'input78_local_1450'[i37_0,i38_0_0,c1_1442,c2_1443,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 2, 128, 4096) %'input78'[i38_0_0,c1_1442,i0.128,i1.2048+2048c2_1443] # id=1679, src_id=None, , instances=16 # dl = tensor_op_name: _dot.9 | hlo_id: 71 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 2.698% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2048) %'1350.1909'[i11_0,T_i1_0,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2, 512, 2048) %'add.4'[i11_0,i0.128+128T_i1_0,i1.2048] # id=1780, src_id=None, , instances=8 # dl = tensor_op_name: add.4_pftranspose_1350 | hlo_id: 15 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 2.698% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'_reload_1775'[i16_0_1432,i4_0_0_711_1778,i4_0_1_1778_0,i0.128,i1.2048] = load bfloat16<128 x 2048> DRAM3DBlk partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'_spill_1772'[i4_0_0_711_1778,i4_0_1_1778_0,i16_0_1432,i0.128,i1.2048] # id=1777, src_id=None, , instances=8 # dl = tensor_op_name: _dot.5 | hlo_id: 30 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 2.698% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 4, 128, 2048) %'1354.1914'[i37_0,T_i1_0,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2097152,) %'all_reduce.1-buffer-2416'[1048576i37_0+2048i0.128+262144T_i1_0+i1.2048] # id=1789, src_id=None, , instances=8 # dl = tensor_op_name: all_reduce.1_pftranspose_1354 | hlo_id: 54 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 2.698% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'input76_local_1471'[i67_0,c0_1464,c1_1465,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 128, 4096) %'input76'[c0_1464,i0.128,i1.2048+2048c1_1465] # id=1702, src_id=None, , instances=8 # dl = tensor_op_name: _dot.8 | hlo_id: 114 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 2.698% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'input73_local_1510'[i2_0_1516,c0_1503,c1_1504,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 128, 4096) %'input73'[c0_1503,i0.128,i1.2048+2048c1_1504] # id=1725, src_id=None, , instances=8 # dl = tensor_op_name: _dot.7 | hlo_id: 155 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 2.698% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2, 8, 128) %'get_tuple_element.2_local_1524'[i98_0_0_0_1541,c0_1518_0,c0_1518_1,i0.128,i3.2,i2.8,i1.128] = load bfloat16<128 x 2048> non_local bfloat16 (4, 2, 128, 8, 128) %'get_tuple_element.2'[2c0_1518_0+c0_1518_1,i3.2,i0.128,i2.8,i1.128] # id=1731, src_id=None, , instances=8 # dl = tensor_op_name: _dot.10 | hlo_id: 173 | [[i0.128];[i1.128, i2.8, i3.2]] -> [[i0.128];[i1.128, i2.8, i3.2]] +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.045 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.076 seconds +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-11-04T21:38:43Z INFO 8680 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.008 seconds +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:43Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.015 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8680 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.008 seconds +2025-11-04T21:38:44Z INFO 8680 [sg0000/Tensorizer/TensorInitialization]: Running 
TensorInitialization +2025-11-04T21:38:44Z INFO 8680 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.017 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.010 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.015 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:44Z 
INFO 8682 [topk/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.017 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SpillPSum]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8681 
[attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds +2025-11-04T21:38:44Z INFO 8681 
[attention_isa_kernel/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SpillPSum]: SpillPSum finished after 0.034 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:44Z INFO 8681 
[attention_isa_kernel/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-11-04T21:38:44Z INFO 8681 
[attention_isa_kernel/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Running 
SimpleAllReduceTiling +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8681 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/OptimizeNKIKernels]: Allocate SB of shape (128, 35676) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/OptimizeNKIKernels]: Allocate PSUM of shape (8, 128, 2048) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.427 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: 
Running CCOpFusion +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.058 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LegalizeType]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.058 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.004 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LegalizeType]: LegalizeType finished after 0.022 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.042 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.024 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor 
+2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.006 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.022 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LowerShardAxis]: Running LowerShardAxis +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LowerShardAxis]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.023 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LowerShardAxis]: LowerShardAxis finished after 0.010 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.009 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.047 seconds +2025-11-04T21:38:44Z INFO 8681 
[sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.028 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.047 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.008 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.009 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LowerToSendRecv]: Running LowerToSendRecv +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LowerToSendRecv]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/LowerToSendRecv]: LowerToSendRecv finished after 0.006 seconds 
+2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.102 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.007 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/DataStreaming]: DataStreaming finished after 0.013 seconds +2025-11-04T21:38:44Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:44Z INFO 8681 [sg0001/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.124 seconds +2025-11-04T21:38:45Z INFO 8681 [Tensorizer]: BirCodeGen estimate #instances=2582 in sg0001 +2025-11-04T21:38:45Z INFO 8681 [Tensorizer]: IR signature: 40b8410f3e3a61bf7fd45b7e0e94e207d7397ffe04416c5629eaa4110af8acc8 for nc00/sg0001/TensorizerBIR +2025-11-04T21:38:45Z INFO 8681 [sg0001/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 
+2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.120 seconds +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:45Z INFO 8681 [sg0001/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.015 seconds +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8680 
[sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.021 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/DataStreaming]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8681 [sg0001/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.046 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.010 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:45Z INFO 8681 [Tensorizer]: BirCodeGen estimate #instances=2582 in sg0001 +2025-11-04T21:38:45Z INFO 8681 [Tensorizer]: IR signature: 6fff65cb59e68a9aca9ef846bc6e9ebfb21128d363ecc5824bb9a3f8b2f8bc22 for nc01/sg0001/TensorizerBIR +2025-11-04T21:38:45Z INFO 8681 [Tensorizer]: Weights total number of bytes: 196610 +2025-11-04T21:38:45Z INFO 8681 [Tensorizer]: Successfully built model. 
+2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.017 seconds +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.007 seconds +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 2.014us (2.000KiB, est bw: 1.017GB/s, 12.329% of tot. time) for float32<32 x 16> TongaSB partitions[0] float32 (32, 272) %4(init=0.0)[i0.32,i1.16] = load float32<32 x 16> float32 (32, 16) %6[i0.32,i1.16] # id=7, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.16]] -> [[i0.32];[i1.16]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 2.014us (2.000KiB, est bw: 1.017GB/s, 12.329% of tot. time) for float32<32 x 16> TongaSB partitions[0] float32 (32, 16) %10[i0.32,i1.16] = load float32<32 x 16> float32 (1, 512) %'inp'[i0.32,i1.16] # id=9, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.16]] -> [[i0.32];[i1.16]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.965us (4.000KiB, est bw: 2.085GB/s, 12.028% of tot. 
time) for float32<32 x 32> TongaSB partitions[0] float32 (32, 32) %485[i0.32,i1.32] = load float32<32 x 32> float32 (32, 32) %3[i0.32,i1.32] # id=13, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.32]] -> [[i0.32];[i1.32]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 11.765% of tot. time) for float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %316[0,i0.256] = load float32<1 x 256> float32 (32, 8) %304[0,i0.256] # id=306, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.922us (1.000KiB, est bw: 0.533GB/s, 11.765% of tot. time) for uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %319[0,i0.256] = load float32<1 x 256> float32 (32, 8) %307[0,i0.256] # id=309, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 10.038% of tot. time) for uint32<1 x 256> uint32 (1, 256) %'topk_indices'[0,i0.256] = store uint32<1 x 256> TongaSB partitions[0] uint32 (1, 256) %'global_id_buf'(init=0.0)[0,i0.256] # id=322, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. 
DMA time: 1.640us (1.000KiB, est bw: 0.625GB/s, 10.038% of tot. time) for float32<1 x 256> float32 (1, 256) %'topk_values'[0,i0.256] = store float32<1 x 256> TongaSB partitions[0] float32 (1, 256) %'val_buf'(init=0.0)[0,i0.256] # id=324, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[];[i0.256]] -> [[];[i0.256]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 9.852% of tot. time) for float32<32 x 8> float32 (32, 8) %304[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %296[i0.32,i1.8] # id=305, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Est. DMA time: 1.609us (1.000KiB, est bw: 0.636GB/s, 9.852% of tot. 
time) for float32<32 x 8> float32 (32, 8) %307[i0.32,i1.8] = store float32<32 x 8> TongaSB partitions[0] float32 (32, 8) %517[i0.32,i1.8] # id=308, src_id=None, , instances=1 # dl = tensor_op_name: | /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/nki/_pre_prod_kernels/topk/topk.py:45:0 | [[i0.32];[i1.8]] -> [[i0.32];[i1.8]] +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.008 seconds +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [topk/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.011 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8682 
[cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:45Z INFO 8682 
[cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.206 seconds +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_1 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.003 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_1 finished after 0.017 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.003 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.001 seconds 
+2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.005 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.003 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.006 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running 
DeadCodeElimination_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.006 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:45Z 
INFO 8682 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. 
time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.000 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 
0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.002 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:45Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.459 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.008 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 
[cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.021 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.003 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.025 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 
0.004 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/InsertCoreBarrier]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.013 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 12.020% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 2, 2, 128, 2048) %'input67_local_1588'[i34_0,i35_0_0,c1_1580,c2_1581,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 2, 128, 4096) %'input67'[i35_0_0,c1_1580,i0.128,i1.2048+2048c2_1581] # id=1791, src_id=None, , instances=16 # dl = tensor_op_name: _dot.2 | hlo_id: 32 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 34.173us (4.000MiB, est bw: 122.737GB/s, 9.809% of tot. time) for bfloat16<128 x 512> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 512) %'1499.2155'[T_i4,T_i0,T_i1,2T_i2_0+T_i2_1,i0.128,i1.512] = load bfloat16<128 x 512> non_local bfloat16 (2, 2, 4, 128, 2, 512) %'all_gather.1'[T_i0,T_i1,2T_i2_0+T_i2_1,i0.128,T_i4,i1.512] # id=2057, src_id=None, , instances=32 # dl = tensor_op_name: all_gather.1_pftranspose_1499 | hlo_id: 15 | [[i0.128];[i1.512]] -> [[i0.128];[i1.512]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. 
DMA time: 34.173us (4.000MiB, est bw: 122.737GB/s, 9.809% of tot. time) for bfloat16<128 x 512> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 512) %'custom-call.177.2131'[i34_0,i16_0_0_1569,i16_0_1_0_1569,i16_0_1_1_1569,i0.128,i1.512] = load bfloat16<128 x 512> non_local bfloat16 (2, 2, 4, 128, 2, 512) %'all_gather.1'[i16_0_0_1569,i16_0_1_0_1569,i16_0_1_1_1569,i0.128,i34_0,i1.512] # id=1786, src_id=None, , instances=32 # dl = tensor_op_name: _custom-call.177 | hlo_id: 24 | [[i0.128];[i1.512]] -> [[i0.128];[i1.512]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 6.197% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'input65_local_1604'[i64_0,c0_1597,c1_1598,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 128, 4096) %'input65'[c0_1597,i0.128,i1.2048+2048c1_1598] # id=1838, src_id=None, , instances=8 # dl = tensor_op_name: _dot.1 | hlo_id: 88 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 6.197% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'input62_local_1620'[i2_0_1626,c0_1613,c1_1614,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 128, 4096) %'input62'[c0_1613,i0.128,i1.2048+2048c1_1614] # id=1885, src_id=None, , instances=8 # dl = tensor_op_name: _dot | hlo_id: 129 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 6.197% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2, 8, 128) %'get_tuple_element.1_local_1634'[i95_0_0_0_1651,c0_1628_0,c0_1628_1,i0.128,i3.2,i2.8,i1.128] = load bfloat16<128 x 2048> non_local bfloat16 (4, 2, 128, 8, 128) %'get_tuple_element.1'[2c0_1628_0+c0_1628_1,i3.2,i0.128,i2.8,i1.128] # id=1891, src_id=None, , instances=8 # dl = tensor_op_name: _dot.3 | hlo_id: 147 | [[i0.128];[i1.128, i2.8, i3.2]] -> [[i0.128];[i1.128, i2.8, i3.2]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 6.197% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 2, 128, 2048) %'input61_local_1645'[i95_0_0_0_1651,i95_0_0_1,c2_1639_0_2709,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 2, 128, 4096) %'input61'[i95_0_0_0_1651,i95_0_0_1,i0.128,i1.2048+2048c2_1639_0_2709] # id=1892, src_id=None, , instances=8 # dl = tensor_op_name: _dot.3 | hlo_id: 147 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 17.737us (2.000MiB, est bw: 118.239GB/s, 5.091% of tot. time) for bfloat16<128 x 512> TongaSB partitions[2] bfloat16 (2, 8, 128, 512) %'transpose.1_pftranspose_1494'[T_i2_0_1498,c0_1533_1930,i0.128,i1.512] = indirect_load bfloat16<128 x 512> {'CrossPassTensor': ''}bfloat16 (151936, 2, 512) %'input60'[i0.128,T_i2_0_1498,i1.512] generic generic_dims:[0] generic_addrs: int32<128 x 1> TongaSB partitions[1] int32 (2, 128, 8, 1) %'gather.41.1928'[T_i2_0_1498,i0.128,c0_1533_1930,0] # id=1746, src_id=None, , attrs={'mode': OOBMode.ERROR}, instances=16 # dl = tensor_op_name: _gather.41 | hlo_id: 12 | [[i0.128];[i1.512]] -> [[i0.128];[i1.512]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 15.912us (4.000MiB, est bw: 263.593GB/s, 4.567% of tot. 
time) for bfloat16<128 x 1024> non_local bfloat16 (2097152,) %'dot.4-buffer-2752'[1024i95_0_0_0_1651+2048i0.128+262144i96_0_1651+i1.1024] = store bfloat16<128 x 1024> TongaSB partitions[2] bfloat16 (2, 8, 128, 1024) %1652[i95_0_0_0_1651,i96_0_1651,i0.128,i1.1024] # id=1895, src_id=None, , instances=16 # dl = tensor_op_name: _dot.3 | hlo_id: 147 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 13.768us (1.000MiB, est bw: 76.160GB/s, 3.952% of tot. time) for bfloat16<128 x 128> bfloat16 (8, 4, 4096, 128) %'output2'[i0.128,i1.128] generic, generic_dims:[0] generic_addrs: int32<128 x 1> TongaSB partitions[4] int32 (2, 2, 2, 4, 128, 1) %'scatter.6719.2318'[i111_0,i105_0,i105_1,i104_1_0,i0.128,0] = indirect_save bfloat16<128 x 128> TongaSB partitions[2] bfloat16 (2, 2, 128, 4, 2, 128) %'transpose.19'[i111_0,i105_0,i0.128,i104_1_0,i105_1,i1.128] # id=1909, src_id=None, , attrs={'mode': OOBMode.ERROR}, instances=32 # dl = tensor_op_name: _scatter.6719 | hlo_id: 187 | [[i0.128];[i1.128]] -> [[i0.128];[i1.128]] +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/WeightCoalescing]: Finished 
(changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.003 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.007 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 
[cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.015 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DoNothing]: Running DoNothing +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DoNothing]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.009 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DoNothing]: DoNothing finished after 0.004 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-11-04T21:38:46Z INFO 8680 
[attention_isa_kernel/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Running NeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 
8680 [attention_isa_kernel/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb_iteration_0 +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SpillPSum]: Running SpillPSum +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SpillPSum]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SpillPSum]: SpillPSum finished after 0.003 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LegalizeType]: Running LegalizeType +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LegalizeType]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 
[attention_isa_kernel/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Running InferPSumTensor_iteration_0 +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 
[attention_isa_kernel/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Running DeadCodeElimination_iteration_0 +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: DeadCodeElimination_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DataStreaming]: Running DataStreaming +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DataStreaming]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Running VectorizeLoop_iteration_0 
+2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SFKVectorizer]: VectorizeLoop_iteration_0 finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.002 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z 
INFO 8680 [attention_isa_kernel/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [attention_isa_kernel/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/OptimizeNKIKernels]: Allocate SB of shape (128, 35676) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/OptimizeNKIKernels]: Allocate PSUM of shape (8, 128, 2048) for CausalAttentionMMSoftmaxMMWithoutSwap +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.332 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds 
+2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InsertCoreBarrier]: Running InsertCoreBarrier +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InsertCoreBarrier]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.051 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InsertCoreBarrier]: InsertCoreBarrier finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. 
time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.051 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [cumsum/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.008 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 4.392 seconds +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.040 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LateLegalizePostSplit]: Running 
LateLegalizePostSplit +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.006 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/InferSharedMemLoc]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.062 seconds +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.007 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LowerShardAxis]: Running LowerShardAxis +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LowerShardAxis]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.063 seconds +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-11-04T21:38:46Z WARNING 8682 [sg0002/Tensorizer/StaticProfiler]: matmul-based transposes inserted by penguin takes up 51.13 percent of all matmul computation +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LowerShardAxis]: LowerShardAxis finished after 0.013 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.021 seconds +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 
0.037 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.038 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.009 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LowerToSendRecv]: Running LowerToSendRecv +2025-11-04T21:38:46Z INFO 8682 [sg0002/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-11-04T21:38:46Z INFO 8680 [sg0000/Tensorizer/LowerToSendRecv]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.097 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/LowerToSendRecv]: LowerToSendRecv finished after 0.019 seconds +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.020 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/InferSharedMemLoc]: Running InferSharedMemLoc +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/InferSharedMemLoc]: Finished 
(changed=True) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/InferSharedMemLoc]: InferSharedMemLoc finished after 0.040 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LowerShardAxis]: Running LowerShardAxis +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LowerShardAxis]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LowerShardAxis]: LowerShardAxis finished after 0.024 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion_iteration_0 +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion_iteration_0 finished after 0.054 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.055 seconds +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.174 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.076 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LowerToSendRecv]: Running LowerToSendRecv +2025-11-04T21:38:47Z INFO 8680 [Tensorizer]: BirCodeGen estimate #instances=1305 in sg0000 
+2025-11-04T21:38:47Z INFO 8680 [Tensorizer]: IR signature: 98daca180c0ec3f47dd29ac8a5821c14c620605cb9f684d9efa077642378433a for nc00/sg0000/TensorizerBIR +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LowerToSendRecv]: Finished (changed=True) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/LowerToSendRecv]: LowerToSendRecv finished after 0.028 seconds +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8680 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.072 seconds +2025-11-04T21:38:47Z INFO 8680 [Tensorizer]: BirCodeGen estimate #instances=1305 in sg0000 +2025-11-04T21:38:47Z INFO 8680 [Tensorizer]: IR signature: 2f3cd749ef44ac56048698881c3043500020f4b3a8872c0592b618a21b3a290d for nc01/sg0000/TensorizerBIR +2025-11-04T21:38:47Z INFO 8680 [Tensorizer]: Weights total number of bytes: 229634 +2025-11-04T21:38:47Z INFO 8680 [Tensorizer]: Successfully built model. 
+2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.310 seconds +2025-11-04T21:38:47Z INFO 8682 [Tensorizer]: BirCodeGen estimate #instances=26049 in sg0002 +2025-11-04T21:38:47Z INFO 8682 [Tensorizer]: IR signature: d02a6a5b8b788c8805aa29f851a6db72e11bae39ebe34ad3abe1a75d0126c11d for nc00/sg0002/TensorizerBIR +2025-11-04T21:38:47Z INFO 8682 [sg0002/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-11-04T21:38:48Z INFO 8682 [sg0002/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-11-04T21:38:48Z INFO 8682 [sg0002/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.517 seconds +2025-11-04T21:38:48Z INFO 8682 [Tensorizer]: BirCodeGen estimate #instances=26049 in sg0002 +2025-11-04T21:38:48Z INFO 8682 [Tensorizer]: IR signature: cabc1abb7e92515618d6f8337386e63f4deff5b3549d4d74e54363188aae28da for nc01/sg0002/TensorizerBIR +2025-11-04T21:38:48Z INFO 8682 [Tensorizer]: Weights total number of bytes: 410376 +2025-11-04T21:38:48Z INFO 8682 [Tensorizer]: Successfully built model. 
+2025-11-04T21:38:48Z USER 8594 [root/Tensorizer/Tensorizer]: Tensorizer finished after 15.210 seconds +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: End tensorization +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input60 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input0 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input63 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input67 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input66 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input1 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input65 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input64 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input62 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input61 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input4 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input2 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input5 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input70 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input71 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input69 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input68 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input74 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input78 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input77 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input76 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input75 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input73 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input72 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input6 
+2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input2 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input7 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input367 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input368 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input366 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input365 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input370 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input1 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input369 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Network input: input3 +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote bir.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: wrote tensor_map.json +2025-11-04T21:38:48Z INFO 8594 [job.Frontend.0]: Job #0 finished +2025-11-04T21:38:48Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.Frontend.0 +2025-11-04T21:38:48Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 +2025-11-04T21:38:48Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 +2025-11-04T21:38:48Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 +2025-11-04T21:38:48Z 
INFO 8594 [job.WalrusDriver.0]: BackendDriver has 6 states with 2 core LNC +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: BackendDriver VNC cwd: /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: BackendDriver: found partitions within VNC, using VNC + MT modular flow. +2025-11-04T21:38:48Z INFO 8594 [job.BIRLinker.1]: Creating directory nc00/sgLnk/sg00 +2025-11-04T21:38:48Z INFO 8594 [job.BIRLinker.2]: Creating directory nc01/sgLnk/sg00 +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: BackendDriver in_state.num_states 6 with 2 core LNC +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: Executing /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/log-neuron-cc.txt -o walrus_bir.out.json --enable-call-graph --enable-mt-backend --link-subgraphs nc00/sg00,nc01/sg00,nc00/sg01,nc01/sg01,nc00/sg02,nc01/sg02 --link-dir sgLnk/sg00 --vnc-nc-per-sengine 2 --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-lower-bound 0.14 --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --print-format json --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json 
/opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen3/dve_info.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --enable-internal-partitioner --dge-levels scalar_dynamic_offset,vector_dynamic_offsets,spill_reload,io --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.neff +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: Working directory is /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: propagate_exit=True +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: use_logger=False +2025-11-04T21:38:48Z INFO 8594 [job.WalrusDriver.0]: expose_stderr=True +2025-11-04T21:38:48Z INFO 9044 [Logging]: Logging to ../log-neuron-cc.txt at level 'INFO' +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: max_allowed_parallelism=12 +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: Loading module from nc01/sg01/bir.json +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: Loading module from nc00/sg02/bir.json +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: Loading module from nc01/sg02/bir.json +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: Loading module from nc00/sg00/bir.json +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: Loading module from nc00/sg01/bir.json +2025-11-04T21:38:48Z INFO 9044 [BackendDriver]: Loading module from nc01/sg00/bir.json +2025-11-04T21:38:49Z INFO 9044 [BackendDriver]: Backend driver mtBackend: true numModules: 6 Cwd: "/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg" 
+2025-11-04T21:38:49Z INFO 9044 [BackendDriver]: DynamicDMA is enabled +2025-11-04T21:38:49Z INFO 9044 [BackendDriver]: DynamicDMA levels being enabled: io, spill_reload, scalar_dynamic_offset, vector_dynamic_offsets, +2025-11-04T21:38:49Z INFO 9044 [BackendDriver]: Modular flow call graph is enabled +2025-11-04T21:38:49Z INFO 9044 [BackendDriver]: Internal partitioner is enabled +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=1914 blocks=6 instructions=1776 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 92mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 210 memory location(s), 1 block(s), and 101 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: do_nothing finished after 0.001 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 92mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {convert.232.2350}@SB<0,0>(1x2)#Internal DebugInfo: +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 591 memory location(s), 1 block(s), and 703 instruction(s). 
Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running do_nothing +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 92mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 210 memory location(s), 1 block(s), and 101 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 92mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 156 memory location(s), 1 block(s), and 84 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 92mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 156 memory location(s), 1 block(s), and 84 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 92mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 591 memory location(s), 1 block(s), and 703 instruction(s). 
Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {convert.232.2350}@SB<0,0>(1x2)#Internal DebugInfo: +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: birverifier finished after 0.022 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 210 memory location(s), 1 block(s), and 101 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: birverifier finished after 0.026 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 102mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 210 memory location(s), 1 block(s), and 101 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: birverifier finished after 0.038 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 113mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 156 memory location(s), 1 block(s), and 84 instruction(s). 
Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: birverifier finished after 0.049 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 123mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 156 memory location(s), 1 block(s), and 84 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: birverifier finished after 0.159 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 183mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 591 memory location(s), 1 block(s), and 703 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: birverifier finished after 0.188 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 591 memory location(s), 1 block(s), and 703 instruction(s). 
Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.197 seconds +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=1914 blocks=6 instructions=1776 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:49Z USER 9044 (sg02) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:49Z USER 9044 (sg01) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=312 blocks=2 instructions=168 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=420 blocks=2 instructions=202 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=1182 blocks=2 instructions=1406 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 420 memory location(s), 2 block(s), and 202 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 312 memory location(s), 2 block(s), and 168 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 1182 memory location(s), 2 block(s), and 1406 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.004 seconds +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=1914 blocks=6 instructions=1776 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max 
writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 156 memory location(s), 1 block(s), and 84 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 210 memory location(s), 1 block(s), and 101 instruction(s). 
Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running unroll +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running unroll +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:49 2025 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running expand_replication +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to expand_replication: modules=1 
functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 591 memory location(s), 1 block(s), and 703 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running unroll +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 210 memory location(s), 1 block(s), and 101 instruction(s). Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 156 memory location(s), 1 block(s), and 84 instruction(s). Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running unroll +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running unroll +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ExpandReplication]: Found 0 replicated matmults +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=210 blocks=1 instructions=101 Max writers: 4 Max Readers: 9 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=156 blocks=1 instructions=84 Max writers: 4 Max Readers: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:49 2025 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:49 2025 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 187mb, ru_maxrss: 213mb (delta=0mb) +2025-11-04T21:38:49Z 
INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:49 2025 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 591 memory location(s), 1 block(s), and 703 instruction(s). Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running unroll +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=591 blocks=1 instructions=703 Max writers: 65 Max Readers: 64 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:49 2025 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: INFO (Unroll) Start unrolling at Tue Nov 4 21:38:49 2025 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:49 2025 + +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: sg0000 Instruction count after Unroll: +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Total count: 1303 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Matmult: 641 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: TensorScalarPtr: 171 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: TensorTensor: 134 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: GenericCopy: 128 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Load: 69 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Activation: 55 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Save: 40 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: DMACopy: 40 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Memset: 9 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) 
[Unroll]: CollectiveCompute: 2 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Select: 1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [Unroll]: Unrolled DGE count with Dynamic AP: 40 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: unroll finished after 0.067 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 248mb, ru_maxrss: 248mb (delta=35mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1411 memory location(s), 1 block(s), and 1303 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=1411 blocks=1 instructions=1303 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:49 2025 + +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: sg0000 Instruction count after Unroll: +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Total count: 1305 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Matmult: 641 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: TensorScalarPtr: 171 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: TensorTensor: 134 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: GenericCopy: 128 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Load: 69 
+2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Activation: 55 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Save: 41 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: DMACopy: 41 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Memset: 9 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: CollectiveCompute: 2 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Select: 1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [Unroll]: Unrolled DGE count with Dynamic AP: 40 +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: dead_code_elim_o1 finished after 0.004 seconds +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: unroll finished after 0.084 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 252mb, ru_maxrss: 252mb (delta=3mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 252mb, ru_maxrss: 252mb (delta=39mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1411 memory location(s), 1 block(s), and 1305 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=1411 blocks=1 instructions=1305 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: dead_code_elim_o1 finished after 0.010 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 256mb, ru_maxrss: 256mb (delta=3mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:49 2025 + +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: sg0001 Instruction count after Unroll: +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Total count: 2582 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Matmult: 1828 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Load: 198 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: TensorScalarPtr: 128 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: GenericCopy: 121 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: TensorTensor: 120 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Activation: 82 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Save: 45 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: DMACopy: 34 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Memset: 10 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: CollectiveCompute: 2 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Select: 1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [Unroll]: Unrolled DGE count with Dynamic AP: 32 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: unroll finished after 0.145 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 270mb, ru_maxrss: 270mb (delta=57mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:49 2025 + +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: sg0001 Instruction count after Unroll: +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Total count: 2580 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Matmult: 1828 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Load: 198 +2025-11-04T21:38:49Z 
INFO 9044 (nc01/sg01) [Unroll]: TensorScalarPtr: 128 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: GenericCopy: 121 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: TensorTensor: 120 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Activation: 82 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Save: 44 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: DMACopy: 33 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Memset: 10 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1527 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=1527 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: StreamShuffle: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: CoreBarrier: 4 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: CollectiveCompute: 2 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Select: 1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: BIRKernel: 1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [Unroll]: Unrolled DGE count with Dynamic AP: 32 +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: unroll finished after 0.155 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 261mb, ru_maxrss: 270mb (delta=57mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1527 memory location(s), 1 block(s), and 2580 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=1527 blocks=1 instructions=2580 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: dead_code_elim_o1 finished after 0.006 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 261mb, ru_maxrss: 270mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: dead_code_elim_o1 finished after 0.016 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 261mb, ru_maxrss: 270mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:49 2025 + +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: sg0002 Instruction count after Unroll: +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Total count: 14230 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Matmult: 11306 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: GenericCopy: 1452 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Load: 490 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Save: 330 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Gather: 131 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Max: 128 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: MaxIndexAndMatchReplace: 128 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: TensorTensor: 83 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Activation: 61 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: TensorScalarPtr: 53 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Memset: 23 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: CoreBarrier: 13 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: TensorReduce: 10 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: CollectiveCompute: 8 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: StreamShuffle: 4 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Select: 3 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Reciprocal: 3 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Iota: 2 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: DMACopy: 2 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [Unroll]: Unrolled DGE count with Dynamic AP: 1 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: unroll finished after 0.467 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 373mb (delta=160mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 5769 memory location(s), 1 block(s), and 14230 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=5769 blocks=1 instructions=14230 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: dead_code_elim_o1 finished after 0.041 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 341mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: INFO (Unroll) DONE unrolling Tue Nov 4 21:38:49 2025 + +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: sg0002 Instruction count after Unroll: +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Total count: 14219 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Matmult: 11306 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: GenericCopy: 1452 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Load: 490 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Save: 319 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Gather: 131 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Max: 128 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: MaxIndexAndMatchReplace: 128 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: TensorTensor: 83 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Activation: 61 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: TensorScalarPtr: 53 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Memset: 23 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: CoreBarrier: 13 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: TensorReduce: 10 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: CollectiveCompute: 8 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: StreamShuffle: 4 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Select: 3 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Reciprocal: 3 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Iota: 2 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: DMACopy: 2 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [Unroll]: Unrolled DGE count with Dynamic AP: 1 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: unroll finished after 0.532 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 341mb, ru_maxrss: 373mb (delta=160mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 5769 memory location(s), 1 block(s), and 14219 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dead_code_elim_o1 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o1: modules=1 functions=1 allocs=5769 blocks=1 instructions=14219 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: dead_code_elim_o1 finished after 0.049 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.593 seconds +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=160mb) +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=8521 blocks=6 instructions=35425 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (sg00) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:49Z USER 9044 (sg01) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:49Z USER 9044 (sg02) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=1307 blocks=2 instructions=2605 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=1485 blocks=2 instructions=5161 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=5729 blocks=2 instructions=27659 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (sg00) [SubgraphForkPass]: localize_shared_memory finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 (sg01) [SubgraphForkPass]: localize_shared_memory finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 1307 memory 
location(s), 2 block(s), and 2605 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 1485 memory location(s), 2 block(s), and 5161 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (sg02) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5729 memory location(s), 2 block(s), and 27659 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.009 seconds +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=8521 blocks=6 instructions=35425 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) 
[ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: birverifier finished after 0.004 seconds +2025-11-04T21:38:49Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {divide.1_1267_i1}@SB<0,0>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:49Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {select.5_1272_i1}@SB<0,0>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:49Z USER 9044 (nc01/sg00) [ModuleForkPass]: birverifier finished after 0.007 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 298mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 299mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc01/sg01) [ModuleForkPass]: birverifier finished after 0.010 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 299mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (nc00/sg01) [ModuleForkPass]: birverifier finished after 0.015 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 299mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (nc01/sg02) [ModuleForkPass]: birverifier finished after 0.069 seconds +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 305mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: birverifier finished after 0.076 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.077 seconds +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=8521 blocks=6 instructions=35425 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (sg01) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:49Z USER 9044 (sg02) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:49Z USER 9044 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=1485 blocks=2 instructions=5161 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=1307 blocks=2 instructions=2605 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=2 allocs=5729 blocks=2 instructions=27659 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 1307 memory location(s), 2 block(s), and 2605 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 1485 memory location(s), 2 block(s), and 5161 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:49Z USER 9044 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.004 seconds +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5729 memory location(s), 2 block(s), and 27659 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.006 seconds +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:49Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=8521 blocks=6 instructions=35425 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: instruction_reorder finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: psum_legalization finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.001 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z WARNING 9044 (nc00/sg00) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0 +2025-11-04T21:38:49Z INFO 9044 (nc00/sg00) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:49Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:49Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: instruction_reorder finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: instruction_reorder finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: psum_legalization finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreOpts]: Skipped. 
No pre-opt passes enabled +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z WARNING 9044 (nc01/sg00) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ShrinkDN]: INFO (ShrinkDN): Shrunk 1 nodes. 
Total savings 480 bytes/partition +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [VNSplitterPass]: INFO (VerticalFusion) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [VNSplitterPass]: INFO (ShrinkDN) Time: 0 seconds +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: vn_splitter finished after 0.001 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running instruction_reorder +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output 
has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: psum_legalization finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: psum_legalization finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ShrinkDN]: INFO (ShrinkDN): Shrunk 1 nodes. 
Total savings 480 bytes/partition +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [VNSplitterPass]: INFO (VerticalFusion) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.01 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: vn_splitter finished after 0.012 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: instruction_reorder finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z WARNING 9044 (nc01/sg01) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 4 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running psum_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: psum_legalization finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) 
[ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: constant_propagate finished after 0.004 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: constant_propagate finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. 
+2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 306mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: psum_legalization finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: input_dma_coalescing finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.002 seconds +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: vn_splitter finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z WARNING 9044 (nc00/sg01) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 4 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: remat_optimization finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Start... 
+2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Found 1 Splits CCs +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: Grouped CCs to 1 clusters. +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Start split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: No split opportunities: +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: End split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: End remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Start DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [VNSplitterPass]: INFO (VerticalFusion) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.001 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: vn_splitter finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: 
eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: End DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Start build fdeps. Invocation: 1Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Allocs: 654 instructions: 1304 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: input_dma_coalescing finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: remat_optimization finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: constant_propagate finished after 0.010 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory 
location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Found 1 Splits CCs +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: Grouped CCs to 1 clusters. +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. 
+2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Start split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: No split opportunities: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: End split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: End remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Start DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) 
[ModuleForkPass]: input_dma_coalescing finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: remat_optimization finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Start... 
+2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: Grouped CCs to 2 clusters. +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Start split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: No split opportunities: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: End split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: End remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Start DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Build fdeps inserted 3264 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Done build fdeps 3264 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: End build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Start remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: End remove useless insts Tue Nov 4 21:38:50 2025 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: pre_sched finished after 0.022 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 307mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: 
remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: End DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Start build fdeps. Invocation: 2Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Allocs: 653 instructions: 1301 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.006 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=654 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=655 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: size = 150 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: found 313 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: mean: 4.17333 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: median: 4.15772 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: adjacency vectors require 2504 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: End DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: lo = 150 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: total = 150 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: no more spills +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: coloring_allocator_psum finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 3Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Allocs: 742 instructions: 2579 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Build fdeps inserted 3262 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Done build fdeps 3262 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: End build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Start remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.038 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: constant_propagate finished after 0.032 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_ac finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.043 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: input_dma_coalescing finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running legalize_cce_dma +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: remat_optimization finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: End remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: pre_sched finished after 0.031 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: dma_optimization_psum finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 15 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: Grouped CCs to 2 clusters. +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: legalize_cce_dma finished after 0.009 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Done. +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Start split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z WARNING 9044 (nc01/sg02) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 4 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: No split opportunities: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 5 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: End split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_redundant_memsets: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: End remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Start DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 51 PSUM Banks +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: address_rotation_psum finished 
after 0.007 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: legalize_cce_dma finished after 0.012 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 16841476 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2085 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 8650754 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2117632 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 206 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running pre_opts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreOpts]: Skipped. No pre-opt passes enabled +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.010 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 653 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running error_injector +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=653 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 654 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=654 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:50Z WARNING 9044 (nc00/sg02) [ErrorInjector]: Unrecognized injected error value "0" +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running vn_splitter +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 308mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 11 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: End DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: 
allocating SB +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: size = 470 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: find partners +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: found 75 accumulation groups +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: largest = custom-call.177.2122_i0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: tensors = 17 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: requires 33280 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: expanding partners +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: size = 150 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: found 313 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: mean: 4.17333 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: 
median: 4.15772 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: adjacency vectors require 2504 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: lo = 150 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: total = 150 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: find loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: 2 pin count +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: 60 remat count +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: build interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Num intervals 470 Num locations 470 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: info.neighbors init Done 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: no more spills +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: coloring_allocator_psum finished after 0.012 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: dma_optimization_psum finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Start build fdeps. Invocation: 4Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Allocs: 743 instructions: 2582 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Build fdeps inserted 7317 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Done build fdeps 7317 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: End build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Start remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: End remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 15 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: edge: 11448 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: mean: 48.7149 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: median: 39.2429 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) 
[SB_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: pre_sched finished after 0.057 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: safe = 424 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: unsafe = 43 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: inf = 1 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: total = 468 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 470 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2579 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=742 blocks=1 instructions=2579 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: Tensor CP elimination: 1 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Total: 468 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Allocated: 1.000 (468) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Rover zone: 0.904 (423) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Pre-rover zone: 0.017 (8) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Post-rover zone: 0.079 (37) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Blocks nothing: 0.002 (1) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Blocks tall: 0.998 (467) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Visited until tall blocking (mean): 0.988 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: Success +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: remats = 0 tensors 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 16841476 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2085 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 8650754 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2117632 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 206 bytes +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: coloring_allocator_sb finished after 0.022 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 5 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds 
+2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.011 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.011 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 51 PSUM Banks +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: address_rotation_psum finished after 0.009 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 16841476 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2085 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 8650752 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2117632 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 206 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.004 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 741 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=741 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2578 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=742 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: allocating SB +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: size = 469 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: find partners +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: found 75 accumulation groups +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: largest = custom-call.177.2122_i1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: tensors = 17 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: requires 33280 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: expanding partners +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: size = 168 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: found 507 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) 
[PSUM_Allocator]: mean: 6.03571 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: median: 6.98448 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: adjacency vectors require 4056 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 656 memory location(s), 1 block(s), and 1304 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=656 blocks=1 instructions=1304 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 25492230, 41.3852% input load, 9.25497% output write, 49.3598% spill/reload [sg0000] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: lo = 168 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: total = 168 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: select ranges +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: vn_splitter finished after 0.027 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), 
and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Build fdeps inserted 7319 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Done build fdeps 7319 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: End build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Start remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: no more spills +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: coloring_allocator_psum finished after 0.009 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: End remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: dma_optimization_psum finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: pre_sched finished after 0.042 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 309mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2582 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=743 blocks=1 instructions=2582 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ShrinkDN]: INFO (ShrinkDN): Shrunk 2 nodes. 
Total savings 14336 bytes/partition +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: find loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: 2 pin count +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: 60 remat count +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: build interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Num intervals 469 Num locations 469 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: edge: 11440 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: mean: 48.7846 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: median: 39.5271 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: 
simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: safe = 423 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: unsafe = 43 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: inf = 1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: total = 467 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 469 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 1835008, 7.1983% out of total dma traffic(1.055e+07) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: address_rotation_psum finished after 0.007 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 
1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 57221636 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2292 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 11534336 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2048 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 1064960 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 
9044 (nc00/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: allocating SB +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.015 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.02 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: vn_splitter finished after 0.044 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: size = 530 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running constant_propagate +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: find partners +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: Tensor CP elimination: 1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: found 142 accumulation groups +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: largest = _dot.6-t1590_i16 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: expanding partners +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Total: 467 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Allocated: 1.000 (467) +2025-11-04T21:38:50Z INFO 9044 
(nc01/sg00) [SB_Allocator]: Rover zone: 0.906 (423) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Pre-rover zone: 0.015 (7) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Post-rover zone: 0.079 (37) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Blocks nothing: 0.002 (1) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Blocks tall: 0.998 (466) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Visited until tall blocking (mean): 0.990 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: Success +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 16841476 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) 
[ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2085 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 8650752 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2117632 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 206 bytes +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: coloring_allocator_sb finished after 0.026 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: average loaded DMA size 2388 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 15006468 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2388 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: 
INFO: Post DMA coalescing DRAM bytes saved 8650754 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 1835008, 7.1983% out of total dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 23657222, 44.5953% input load, 9.97284% output write, 45.4318% spill/reload [sg0000] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 15006468 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2388 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 8650754 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 2117632 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 206 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1188 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: 
dma_optimization_sb finished after 0.024 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.018 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 742 memory location(s), 1 block(s), and 2581 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=742 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2581 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=743 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: size = 168 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 655 memory location(s), 1 block(s), and 1301 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=655 blocks=1 instructions=1301 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 25492228, 41.3852% input load, 9.25496% output write, 49.3598% spill/reload [sg0000] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: find loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: 2 pin count +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: 93 remat count +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: build interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced 
input/const loading DMA traffic 1835008, 7.1983% out of total dma traffic(1.055e+07) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 310mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 641 memory location(s), 1 block(s), and 1290 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=641 blocks=1 instructions=1290 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Num intervals 530 Num locations 530 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: edge: 14181 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: mean: 53.5132 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: median: 44.0241 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 
instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: safe = 382 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: unsafe = 127 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: inf = 19 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: total = 528 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 530 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 4 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Total: 528 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Allocated: 1.000 (528) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Rover zone: 0.892 (471) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Pre-rover zone: 0.011 (6) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Post-rover zone: 0.097 (51) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Blocks nothing: 0.002 (1) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: 
Blocks tall: 0.998 (527) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Visited until tall blocking (mean): 0.994 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: Success +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: found 507 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: mean: 6.03571 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: median: 6.98448 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: adjacency vectors require 4056 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:50Z 
INFO 9044 (nc00/sg01) [PSUM_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 57221636 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2292 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 11534336 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2048 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 1064960 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: coloring_allocator_sb finished after 0.025 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 743 memory location(s), 1 block(s), and 2578 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=743 blocks=1 instructions=2578 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 68755972, 71.0237% input load, 3.05014% output write, 25.9262% spill/reload [sg0001] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: sub-graph will get execute 27 times +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(4.8833e+07) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 8 spill/reload instructions +2025-11-04T21:38:50Z INFO 
9044 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 8 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: lo = 168 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: total = 168 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: no more spills +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: coloring_allocator_psum finished after 0.023 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 4194304, 23.5294% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions 
+2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: average loaded DMA size 2388 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 15006468 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2388 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 8650752 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average 
saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 1835008, 7.1983% out of total dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 23657220, 44.5953% input load, 9.97284% output write, 45.4318% spill/reload [sg0000] +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: constant_propagate finished after 0.051 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 15006468 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2388 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 8650752 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 1689 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 2117632 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 206 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Post DMA optimization 
average DMA size 1188 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: dma_optimization_sb finished after 0.026 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 640 memory location(s), 1 block(s), and 1287 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=640 blocks=1 instructions=1287 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_ac finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: dma_optimization_psum finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: input_dma_coalescing finished after 0.006 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: average loaded DMA size 2254 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: average saved DMA size 1843 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 55124484 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2254 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 9437184 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 32 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 1843 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 4194304, 6.10028% out of total dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 64561668, 75.6378% input load, 3.24829% output write, 21.1139% spill/reload [sg0001] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 55124484 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2254 
bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 9437184 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 1843 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 1064960 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 130 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1737 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: dma_optimization_sb finished after 0.024 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 734 memory location(s), 1 block(s), and 2570 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=734 blocks=1 instructions=2570 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 48 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 21 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: remat_optimization finished after 0.014 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.044 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: address_rotation_psum finished after 0.022 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 57221636 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2292 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 11534338 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2047 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 1064960 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: infer_stream_ids finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13439 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=2640 blocks=1 instructions=13439 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 641 memory location(s), 1 block(s), and 1290 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=641 blocks=1 instructions=1290 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: Grouped CCs to 2 clusters. 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: reserved space = 164096 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Done. 
+2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Start split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: DRAM hwm after allocation: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: coloring_allocator_dram finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 311mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 641 memory location(s), 1 block(s), and 1290 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=641 blocks=1 instructions=1290 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DRAM hwm before rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: allreduce hwm 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: Real CC buffer size 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DRAM hwm after rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: address_rotation_dram finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: allocating SB +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: size = 531 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Num_Splits: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: End split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: find partners +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: found 142 
accumulation groups +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: largest = _dot.6-t1590_i15 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: expanding partners +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 32 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_redundant_memsets: 1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 312mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 641 memory location(s), 1 block(s), and 1290 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=641 blocks=1 instructions=1290 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyAccel::Impl]: Accelerated 34 out of 136 tensorcopy in Function: sg0000 average acceleration factor: 1 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 312mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 641 memory location(s), 1 block(s), and 1290 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=641 blocks=1 instructions=1290 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: peephole_opts finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 312mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 641 memory location(s), 1 block(s), and 1291 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=641 blocks=1 instructions=1291 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Start of kernel lowering pass, number of insts: 1291, number of allocs: 641 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2766-0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Scan BKs time (s): 0.000117 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Input/output shapes for Kernel inst [I-2766-0] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: input0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: input1: [ 4 128 
1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: input2: [ 4 1024 128 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: input3: ap +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: output0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 0 +Memory Location: {reshape.16}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 0 +Memory Location: {reshape.24}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 1024 / 1024 = 1 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Scratch sbuf for kernel I-2766-0: [80128, 115804) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: End remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Start DCE Tue Nov 4 
21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 21 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 16 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.049 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 312mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 640 memory location(s), 1 block(s), and 1287 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=640 blocks=1 instructions=1287 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: reserved space = 164096 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [LowerKernel]: Lower BKs time (s): 0.03333 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_kernel finished after 0.011 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 313mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: find loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1184 memory location(s), 1 block(s), and 2086 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1184 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: 2 pin count +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: 93 remat count +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: build interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 313mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1184 memory location(s), 1 block(s), and 2086 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1184 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 313mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1184 memory location(s), 1 block(s), and 2086 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1184 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Num intervals 531 Num locations 531 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: edge: 14189 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: mean: 53.4426 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: median: 43.6911 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 28 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:50Z 
INFO 9044 (nc01/sg00) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: DRAM hwm after allocation: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: coloring_allocator_dram finished after 0.006 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 313mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 640 memory location(s), 1 block(s), and 1287 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=640 blocks=1 instructions=1287 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DRAM hwm before rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: allreduce hwm 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: Real CC buffer size 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DRAM hwm after rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: address_rotation_dram finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 313mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 640 memory location(s), 1 block(s), and 1287 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=640 blocks=1 instructions=1287 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyAccel::Impl]: Accelerated 34 out of 135 tensorcopy in Function: sg0000 average acceleration factor: 1 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 313mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 640 memory location(s), 1 block(s), and 1287 instruction(s). Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=640 blocks=1 instructions=1287 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: peephole_opts finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 314mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 640 memory location(s), 1 block(s), and 1288 instruction(s). 
Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=640 blocks=1 instructions=1288 Max writers: 16 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Start of kernel lowering pass, number of insts: 1288, number of allocs: 640 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2766-0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Scan BKs time (s): 8.8e-05 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Input/output shapes for Kernel inst [I-2766-0] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: input0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: input1: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: input2: [ 4 1024 128 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: input3: ap +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: output0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 524288 +Memory Location: {reshape.16}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 524288 +Memory Location: {reshape.24}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) 
[LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 1024 / 1024 = 1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Scratch sbuf for kernel I-2766-0: [80128, 115804) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 316mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1200 memory location(s), 1 block(s), and 2086 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=1200 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.050 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 316mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 316mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 734 memory location(s), 1 block(s), and 2570 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=734 blocks=1 instructions=2570 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1200 memory location(s), 1 block(s), and 2086 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1200 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: safe = 383 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: unsafe = 127 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: inf = 19 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: total = 529 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 531 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: select ranges +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 
(nc00/sg01) [SB_Allocator]: Total: 529 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Allocated: 1.000 (529) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Rover zone: 0.890 (471) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Pre-rover zone: 0.013 (7) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Post-rover zone: 0.096 (51) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Blocks nothing: 0.002 (1) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Blocks tall: 0.998 (528) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Visited until tall blocking (mean): 0.993 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: Success +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 
9044 (nc01/sg00) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: reserved space = 131072 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [LowerKernel]: Lower BKs time (s): 0.029906 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_kernel finished after 0.009 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1183 memory location(s), 1 block(s), and 2083 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1183 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 57221636 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2292 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 11534338 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2047 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 1064960 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1183 memory location(s), 1 block(s), and 2083 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1183 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1183 memory location(s), 1 block(s), and 2083 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1183 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: coloring_allocator_sb finished after 0.043 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: birverifier finished after 0.009 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1200 memory location(s), 1 block(s), and 2086 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=1200 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: dynamic_dma_scan finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1200 memory location(s), 1 block(s), and 2086 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=1200 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 5Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: DRAM hwm after allocation: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: coloring_allocator_dram finished after 0.010 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Allocs: 1200 instructions: 2086 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 734 memory location(s), 1 block(s), and 2570 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=734 blocks=1 instructions=2570 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DRAM hwm before rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: allreduce hwm 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: Real CC buffer size 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DRAM hwm after rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: address_rotation_dram finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 
(nc01/sg01) [ModuleForkPass]: curr_vmrss: 317mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 734 memory location(s), 1 block(s), and 2570 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=734 blocks=1 instructions=2570 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyAccel::Impl]: Accelerated 18 out of 129 tensorcopy in Function: sg0001 average acceleration factor: 1 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 318mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 734 memory location(s), 1 block(s), and 2570 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=734 blocks=1 instructions=2570 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [PeepholeOpts]: PeepholeOpts enabled? 
Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: peephole_opts finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 318mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 734 memory location(s), 1 block(s), and 2571 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=734 blocks=1 instructions=2571 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Start of kernel lowering pass, number of insts: 2571, number of allocs: 734 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: constant_propagate finished after 0.115 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 318mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_ac +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.005 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 318mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 744 memory location(s), 1 block(s), and 2581 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=744 blocks=1 instructions=2581 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 68755974, 71.0237% input load, 3.05014% output write, 25.9262% spill/reload [sg0001] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2433-0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Scan BKs time (s): 0.01566 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Input/output shapes for Kernel inst [I-2433-0] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: input0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: input1: [ 4 128 1024 ] 
+2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: input2: [ 4 1024 128 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: input3: ap +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: output0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 524288 +Memory Location: {reshape.60}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 524288 +Memory Location: {reshape.68}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 1024 / 1024 = 1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Scratch sbuf for kernel I-2433-0: [65024, 100700) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: non_ssa_legalization finished after 0.010 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: 
remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 318mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1199 memory location(s), 1 block(s), and 2083 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=1199 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 318mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1199 memory location(s), 1 block(s), and 2083 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1199 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg00) Non - output memory location with no reader: {I-2766-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,85508>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. 
+2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_ac finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 321mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running input_dma_coalescing +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: sub-graph will get execute 27 times +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Build fdeps inserted 4668 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [build_flow_deps]: Done build fdeps 4668 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: build_fdeps finished after 0.014 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 321mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1200 memory location(s), 1 block(s), and 2086 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=1200 blocks=1 instructions=2086 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: End DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(4.8833e+07) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [LowerKernel]: Lower BKs time (s): 0.033175 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_kernel finished after 0.012 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 322mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1277 memory location(s), 1 block(s), and 3366 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1277 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 322mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1277 memory location(s), 1 block(s), and 3366 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1277 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 322mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1277 memory location(s), 1 block(s), and 3366 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1277 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 8 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 8 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: remove_redundancies finished after 0.005 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 322mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1188 memory location(s), 1 block(s), and 2058 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1188 blocks=1 instructions=2058 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 4194304, 23.5294% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.005 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 323mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1293 memory location(s), 1 block(s), and 3366 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=1293 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 323mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1293 memory location(s), 1 block(s), and 3366 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1293 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg01) Non - output memory location with no reader: 
{I-2433-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: input_dma_coalescing finished after 0.020 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 324mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running remat_optimization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [build_flow_deps]: Start build fdeps. Invocation: 6Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: birverifier finished after 0.027 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 324mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1199 memory location(s), 1 block(s), and 2083 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=1199 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [build_flow_deps]: Allocs: 2640 instructions: 13438 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1199 memory location(s), 1 block(s), and 2083 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=1199 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 7Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Allocs: 1199 instructions: 2083 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: average loaded DMA size 2254 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: average saved DMA size 1842 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 55124484 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2254 bytes +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.022 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 9437186 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 1842 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1188 memory location(s), 1 block(s), and 2058 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1188 blocks=1 instructions=2058 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: Tensor CP elimination: 32 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 4194304, 6.10028% out of total dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 64561670, 75.6378% input load, 3.2483% output write, 21.1139% spill/reload [sg0001] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 55124484 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2254 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 9437186 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 1842 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 1064960 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 130 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1737 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: INFO: Finished 
set_spill_canreadUninit(module); +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: dma_optimization_sb finished after 0.041 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 735 memory location(s), 1 block(s), and 2573 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=735 blocks=1 instructions=2573 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Build fdeps inserted 4666 edges +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: birverifier finished after 0.023 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.010 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2026 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1293 memory location(s), 1 block(s), and 3366 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1155 blocks=1 instructions=2026 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=1293 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [build_flow_deps]: Done build fdeps 4666 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: build_fdeps finished after 0.015 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1293 memory location(s), 1 block(s), and 3366 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=1293 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 52 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1199 memory location(s), 1 block(s), and 2083 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=1199 blocks=1 instructions=2083 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 8Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Allocs: 1293 instructions: 3366 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:50Z USER 9044 (nc00/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.004 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: remove_redundancies finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 325mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2026 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1187 memory location(s), 1 block(s), and 2055 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1187 blocks=1 instructions=2055 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [RematOpt]: Removed 0 remat instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: remat_optimization finished after 0.035 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 328mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 329mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running infer_stream_ids +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: infer_stream_ids finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 330mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14220 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running pre_sched +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=3089 blocks=1 instructions=14220 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Start PRE scheduling 2 cores: 1 at: Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Start... +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: Grouped CCs to 2 clusters. +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-11-04T21:38:50Z INFO 9044 [LayerSpiller]: LayerSpill: Done. 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Start split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.022 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 331mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1187 memory location(s), 1 block(s), and 2055 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1187 blocks=1 instructions=2055 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Build fdeps inserted 8735 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [build_flow_deps]: Done build fdeps 8735 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: build_fdeps finished after 0.030 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 333mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1293 memory location(s), 1 block(s), and 3366 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=1293 blocks=1 instructions=3366 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Num_Splits: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: Tensor CP elimination: 32 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: End split live ranges Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Strt remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_redundant_memsets +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: remove_redundancies finished after 0.004 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 333mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1281 memory location(s), 1 block(s), and 3338 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1281 blocks=1 instructions=3338 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.010 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 333mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2023 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1154 blocks=1 instructions=2023 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z USER 9044 (nc01/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 335mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2023 instruction(s). 
Max writers: 33 Max Readers: 224 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_redundant_memsets: 5 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_redundant_loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 16 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: End remove redundncies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Start DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 29 Sb address +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.064 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 337mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 735 memory location(s), 1 block(s), and 2573 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=735 blocks=1 instructions=2573 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: reserved space = 131072 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.026 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 336mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1281 memory location(s), 1 block(s), and 3338 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1281 blocks=1 instructions=3338 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: DRAM hwm after allocation: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: coloring_allocator_dram finished after 0.005 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: 
curr_vmrss: 336mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 735 memory location(s), 1 block(s), and 2573 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=735 blocks=1 instructions=2573 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [build_flow_deps]: Build fdeps inserted 35144 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [build_flow_deps]: Done build fdeps 35144 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: End build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Start remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DRAM hwm before rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: allreduce hwm 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: Real CC buffer size 4194304 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DRAM hwm after rotation 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: address_rotation_dram finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 336mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 
function(s), 735 memory location(s), 1 block(s), and 2573 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=735 blocks=1 instructions=2573 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyAccel::Impl]: Accelerated 18 out of 130 tensorcopy in Function: sg0001 average acceleration factor: 1 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 336mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: Tensor CP elimination: 32 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 735 memory location(s), 1 block(s), and 2573 instruction(s). Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=735 blocks=1 instructions=2573 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: peephole_opts finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 336mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 735 memory location(s), 1 block(s), and 2574 instruction(s). 
Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=735 blocks=1 instructions=2574 Max writers: 24 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Start of kernel lowering pass, number of insts: 2574, number of allocs: 735 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Found InstBIRKernel: [CausalAttentionMMSoftmaxMMWithoutSwap]I-2433-0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Scan BKs time (s): 0.002154 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Set architecture: gen3 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Input/output shapes for Kernel inst [I-2433-0] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: input0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: input1: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: input2: [ 4 1024 128 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: input3: ap +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: output0: [ 4 128 1024 ] +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: do_input1_tp=false +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: do_out_tp=true +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 0 +Memory Location: {reshape.60}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Legalized inp_ap=[[131072,4],[1024,128],[1,1024]] +Offset: 0 +Memory Location: {reshape.68}@DRAM(1048576x2)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) 
[LowerKernel]: AP of Q indicates standalone Q tensor. +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: parallel_split_n = input1_ap[1].getStep() / input1_ap[2].getNum() = 1024 / 1024 = 1 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Sharding/tiling split_i=0, split_n=1 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Flash attention has been disabled +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Scratch sbuf for kernel I-2433-0: [65024, 100700) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: End remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: seq_len=1024, seq_len2=1024, complete_seq_len2=1024 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Creating identity matrices with AffineSelect +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [LowerKernel]: Lower BKs time (s): 0.016752 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_kernel finished after 0.006 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) 
[PreSched]: End scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1278 memory location(s), 1 block(s), and 3369 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=1278 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1278 memory location(s), 1 block(s), and 3369 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=1278 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1278 memory location(s), 1 block(s), and 3369 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=1278 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.014 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3306 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1248 blocks=1 instructions=3306 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc01/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3306 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: pre_sched finished after 0.191 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2640 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: End DCE Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [NonSSALeg]: [Non-SSA legalization]created 32 memorylocations +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: non_ssa_legalization finished after 0.033 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 339mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1294 memory location(s), 1 block(s), and 3369 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=1294 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 339mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1294 memory location(s), 1 block(s), and 3369 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1294 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b0}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b1}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b2}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z WARNING 9044 [birverifier::InstVisitor]: (nc00/sg01) Non - output memory location with no reader: {I-2433-0_s0_aten__mul_broadcast.7-t210_b3}@SB<0,70404>(128x4)#Internal DebugInfo: +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Start build flow dependencies Tue Nov 4 21:38:50 2025 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [build_flow_deps]: Start build fdeps. Invocation: 9Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: birverifier finished after 0.008 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 339mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1294 memory location(s), 1 block(s), and 3369 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=1294 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 339mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1294 memory location(s), 1 block(s), and 3369 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=1294 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 10Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Allocs: 1294 instructions: 3369 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [build_flow_deps]: Allocs: 3089 instructions: 14215 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Build fdeps inserted 8737 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [build_flow_deps]: Done build fdeps 8737 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: build_fdeps finished after 0.016 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1294 memory location(s), 1 block(s), and 3369 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=1294 blocks=1 instructions=3369 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [RemoveRedundancies]: remove Useless Instructions: 28 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: 
tensor_copy_elim finished after 0.046 seconds +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: remove_redundancies finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1282 memory location(s), 1 block(s), and 3341 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1282 blocks=1 instructions=3341 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2640 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=2640 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 339mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=2641 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 339mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 340mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: size = 1062 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: found 1273 edges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: mean: 2.39736 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: median: 
1.98918 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: adjacency vectors require 10184 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: find costs +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.050 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 345mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1282 memory location(s), 1 block(s), and 3341 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: lo = 988 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: hi = 74 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: total = 1062 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: simplify +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=1282 blocks=1 instructions=3341 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: no more spills +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) 
[PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: coloring_allocator_psum finished after 0.052 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 346mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: Tensor CP elimination: 32 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.030 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 346mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3309 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1249 blocks=1 instructions=3309 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z USER 9044 (nc00/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.003 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [build_flow_deps]: Build fdeps inserted 47058 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 346mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [build_flow_deps]: Done build fdeps 47058 Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: End build flow dependencies Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Start remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove_useless_insts +2025-11-04T21:38:50Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3309 instruction(s). Max writers: 33 Max Readers: 248 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: dma_optimization_psum finished after 0.033 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 346mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: remove Useless Instructions: 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: End remove useless insts Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: Start scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: End scratchpad optimization Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PreSched]: DONE PRE scheduling Tue Nov 4 21:38:50 2025 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: pre_sched finished after 0.261 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 347mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3089 memory location(s), 1 block(s), and 14215 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=3089 blocks=1 instructions=14215 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [TensorCopyElim]: Tensor CP elimination: 63 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: address_rotation_psum finished after 0.111 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 346mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 199672978 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 3422 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 6444544 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 3510 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.069 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 342mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3026 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dynamic_dma_setup +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=3026 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 342mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running runtime_memory_reservation +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=3027 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 342mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 342mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.002 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 342mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: allocating SB +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: size = 1540 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: find partners +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: found 1057 accumulation groups +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: largest = _dot.199-t1193_i18 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: expanding partners +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: allocating PSUM +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: size = 1186 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: find loads +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: 2 pin count +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: 371 remat count +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) 
[SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: build interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: build_no_bitmap start +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: found 1335 edges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: mean: 2.25126 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: median: 1.82863 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: adjacency vectors require 10680 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: build_no_bitmap done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Num intervals 1540 Num locations 1540 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: edge: 16008 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: mean: 20.7896 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: median: 14.7752 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 
(nc00/sg02) [PSUM_Allocator]: initialize low and high +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: lo = 1112 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: hi = 74 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: inf = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: total = 1186 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: select ranges +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: safe = 1415 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: unsafe = 106 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: inf = 17 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: total = 1538 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 1540 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: no more spills +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: select ranges +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: coloring_allocator_psum finished after 0.116 seconds 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 349mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dma_optimization_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Total: 1538 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Allocated: 1.000 (1538) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Rover zone: 0.964 (1482) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Pre-rover zone: 0.012 (19) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Post-rover zone: 0.024 (37) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Slice zone: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Blocks nothing: 0.015 (23) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Blocks medium: 0.001 (2) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Visited until medium blocking (mean): 0.716 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Visited until medium blocking (median): 0.714 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Visited until medium blocking (p95): 0.714 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Blocks tall: 0.984 (1513) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Visited until tall blocking (mean): 0.789 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Visited until tall blocking (median): 0.998 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) 
[SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: Success +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: dma_optimization_psum finished after 0.037 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 346mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running address_rotation_psum +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 199672978 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 3422 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 6444544 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 3510 bytes +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: coloring_allocator_sb finished after 0.175 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 342mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 62 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.049 seconds +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 344mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2642 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=2642 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 206117522, 93.8194% input load, 0% output write, 6.18055% spill/reload [sg0002] +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 3 PSUM Banks +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: 
address_rotation_psum finished after 0.084 seconds +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 345mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_sb +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 200308382 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 3398 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 6459915 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2894 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: allocating SB +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: main loop +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: renumber locations +2025-11-04T21:38:50Z INFO 9044 
(nc00/sg02) [SB_Allocator]: size = 1792 +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: find partners +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: found 1181 accumulation groups +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: largest = _dot.199-t1193_i3 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: tensors = 36 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: requires 49152 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: expanding partners +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(1.93378e+08) +2025-11-04T21:38:50Z INFO 9044 []: find first defs for local +2025-11-04T21:38:50Z INFO 9044 []: find first defs for global +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: find loads +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: 2 pin count +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: 381 remat count +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: 2 pinned tensors will require about 16392 bytes/partition +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: build interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: pass 1 int-tree +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Num intervals 1792 Num locations 1792 
+2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: IntervalTree Build Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: info.neighbors init Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: info.neighbors partners Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: IntervalTree readback Done +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: edge: 17592 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: mean: 19.6339 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: median: 13.0344 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: find costs +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: simplify interference graph +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: initialize safe & unsafe +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:50Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: safe = 1665 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: unsafe = 108 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: inf = 17 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: total = 1790 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: simplify +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 1792 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: new candidates = 0 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: select 
ranges +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Total: 1790 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Spilled: 0.000 (0) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Allocated: 1.000 (1790) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Rover zone: 0.938 (1679) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Pre-rover zone: 0.035 (62) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Post-rover zone: 0.025 (45) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Slice zone: 0.002 (4) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Blocks nothing: 0.063 (113) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Blocks medium: 0.007 (12) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Visited until medium blocking (mean): 0.588 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Visited until medium blocking (median): 0.612 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Visited until medium blocking (p95): 0.842 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Blocks tall: 0.930 (1665) +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Visited until tall blocking (mean): 0.709 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Visited until tall blocking (median): 0.975 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-11-04T21:38:50Z INFO 9044 (nc00/sg02) [SB_Allocator]: Success +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) 
[DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: average loaded DMA size 3422 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: average saved DMA size 3510 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 199672978 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 3422 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 6444544 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 3510 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 0, 0% out of total dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 206117522, 93.8194% input load, 0% output write, 6.18055% spill/reload [sg0002] +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization 
DRAM bytes loaded 199672978 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 3422 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 6444544 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 3510 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 4100 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 241 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 3423 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: dma_optimization_sb finished after 0.242 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 351mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2641 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 197 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: SB spills = 0 tensors +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: remats = 0 tensors +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: unpinned = 0 tensors +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: size = 0 bytes/partition +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: SB score = 0 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: spilling from SB cost about 0 cycles +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: 16392 bytes/partition (100%) successfully pinned +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: pinning saved approximately 8300 cycles +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [SB_Allocator]: 0% SB utilization after allocation +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 200308382 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 3398 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 6459915 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2894 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 4100 +2025-11-04T21:38:51Z 
INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 241 bytes +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: coloring_allocator_sb finished after 0.291 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 350mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.021 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 347mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3028 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dma_optimization_sb +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=3028 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 206768297, 93.6771% input load, 1.93453e-06% output write, 6.32293% spill/reload [sg0002] +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: removed 0 identical load +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(1.93694e+08) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Allocation 
optimization]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 17 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: average loaded DMA size 3398 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: average saved DMA size 2894 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 200308382 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 3398 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 6459915 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: 
INFO: Post DMA coalescing average saved DMA size 2894 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 0, 0% out of total dma traffic +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 206768297, 93.6771% input load, 1.93453e-06% output write, 6.32293% spill/reload [sg0002] +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 200308382 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 3398 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 6459915 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2894 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 4100 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 241 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 3378 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: dma_optimization_sb finished after 0.180 
seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 349mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3027 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 72 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 6 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.343 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 348mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=2641 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: reserved space = 32768 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: spill space = 2097152 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: aligned spill space = 2097152 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: size = 4 +2025-11-04T21:38:51Z INFO 9044 []: find first defs for local +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 210 Sb address +2025-11-04T21:38:51Z INFO 9044 []: find first defs for global +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: Num intervals 4 Num locations 4 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: lo = 4 
+2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: total = 4 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: DRAM hwm after allocation: 2097152 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: coloring_allocator_dram finished after 0.060 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 353mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=2641 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DRAM hwm before rotation 2097152 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: allreduce hwm 4194304 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: Real CC buffer size 4194304 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DRAM hwm after rotation 2097152 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: address_rotation_dram finished after 0.019 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 351mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13438 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=2641 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [TensorCopyAccel::Impl]: Accelerated 601 out of 1262 tensorcopy in Function: sg0002 average acceleration factor: 1 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: tensorcopy_accel finished after 0.007 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 353mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13438 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=2641 blocks=1 instructions=13438 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: peephole_opts finished after 0.006 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 352mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [LowerKernel]: Start of kernel lowering pass, number of insts: 13441, number of allocs: 2641 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [LowerKernel]: Scan BKs time (s): 0.001417 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [LowerKernel]: Lower BKs time (s): 1e-06 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_kernel finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 352mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 352mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 352mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.012 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 353mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 352mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {divide.1_1267_i1}@SB<32,16384>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:51Z WARNING 9044 [birverifier::InstVisitor]: (nc01/sg02) Non - output memory location with no reader: {select.5_1272_i1}@SB<96,17536>(1x1024)#Internal DebugInfo: +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: birverifier finished after 0.032 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 353mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: dynamic_dma_scan finished after 0.022 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 353mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [build_flow_deps]: Start build fdeps. Invocation: 11Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [build_flow_deps]: Allocs: 2641 instructions: 13441 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 46 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [build_flow_deps]: Build fdeps inserted 35148 edges +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [build_flow_deps]: Done build fdeps 35148 Tue Nov 4 21:38:51 2025 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: build_fdeps finished after 0.076 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 356mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: remove_redundancies finished after 0.010 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 356mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 4 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 164 Sb address +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.397 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 368mb, ru_maxrss: 373mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_dram +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=3027 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: reserved space = 34824 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: spill space = 2104324 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: aligned spill space = 2125824 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: size = 11 +2025-11-04T21:38:51Z INFO 9044 []: find first defs for local +2025-11-04T21:38:51Z INFO 9044 []: find first defs for global +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Num intervals 11 Num locations 11 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: lo = 11 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) 
[DRAM_Allocator]: total = 11 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: simplify +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: select ranges +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: allreduce_dram_hwm 0 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Real CC buffer size 0 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: DRAM hwm after allocation: 2097152 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: coloring_allocator_dram finished after 0.091 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 377mb (delta=4mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14152 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running address_rotation_dram +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=3027 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: Runtime page size at 512MB +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DRAM hwm before rotation 2097152 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.134 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 377mb (delta=4mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: allreduce hwm 4194304 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: Real CC buffer size 4194304 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DRAM hwm after rotation 2097152 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: address_rotation_dram finished after 0.055 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 365mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running tensorcopy_accel +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=3027 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.030 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 365mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2641 blocks=1 instructions=13441 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [TensorCopyAccel::Impl]: Accelerated 601 out of 1401 tensorcopy in Function: sg0002 average acceleration factor: 1 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: tensorcopy_accel finished after 0.014 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 365mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14152 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running peephole_opts +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=3027 blocks=1 instructions=14152 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-11-04T21:38:51Z USER 9044 (nc01/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.012 seconds +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2641 memory location(s), 1 block(s), and 13441 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: peephole_opts finished after 0.017 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_kernel +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [LowerKernel]: Started running LowerKernel +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [LowerKernel]: BIR SB coloring allocator is disabled +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [LowerKernel]: Start of kernel lowering pass, number of insts: 14155, number of allocs: 3027 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [LowerKernel]: Scan BKs time (s): 0.001539 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [LowerKernel]: Lower BKs time (s): 1e-06 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_kernel finished after 0.002 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 362mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_klir_kernel +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_klir_kernel: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_klir_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 362mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 362mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running non_ssa_legalization +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to non_ssa_legalization: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [NonSSALeg]: remove_redundant_loads +2025-11-04T21:38:51Z INFO 9044 (nc00/sg02) [NonSSALeg]: remove_redundant_loads: 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [NonSSALeg]: [Non-SSA legalization]created 0 memorylocations +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: non_ssa_legalization finished after 0.046 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.003 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: birverifier finished after 0.049 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dynamic_dma_scan +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: dynamic_dma_scan finished after 0.003 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running build_fdeps +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [build_flow_deps]: Start build fdeps. Invocation: 12Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [build_flow_deps]: Allocs: 3027 instructions: 14155 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [build_flow_deps]: Build fdeps inserted 46999 edges +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [build_flow_deps]: Done build fdeps 46999 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: build_fdeps finished after 0.111 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running remove_redundancies +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [RemoveRedundancies]: remove_clobbered_writes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [RemoveRedundancies]: remove_useless_insts +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: remove_redundancies finished after 0.015 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 377mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.116 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 383mb (delta=6mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.038 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). 
Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=3027 blocks=1 instructions=14155 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.013 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3027 memory location(s), 1 block(s), and 14155 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 2.366 seconds +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=10mb) +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=10474 blocks=6 instructions=38260 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=2309 blocks=2 instructions=4049 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=2497 blocks=2 instructions=6615 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (sg02) 
[SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=5668 blocks=2 instructions=27596 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2309 memory location(s), 2 block(s), and 4049 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: Running lower_local_collectives +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=2 functions=2 allocs=2309 blocks=2 instructions=4049 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5668 memory location(s), 2 block(s), and 27596 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: Running lower_local_collectives +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2497 memory location(s), 2 block(s), and 6615 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=2 functions=2 allocs=5668 blocks=2 instructions=27596 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: Running lower_local_collectives +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=2 functions=2 allocs=2497 blocks=2 instructions=6615 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: lower_local_collectives finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2309 memory location(s), 2 block(s), and 4053 instruction(s). Max writers: 33 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=2 functions=2 allocs=2309 blocks=2 instructions=4053 Max writers: 33 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: lower_local_collectives finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2497 memory location(s), 2 block(s), and 6619 instruction(s). 
Max writers: 33 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=2 functions=2 allocs=2497 blocks=2 instructions=6619 Max writers: 33 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.006 seconds +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2309 memory location(s), 2 block(s), and 4057 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: lower_local_collectives finished after 0.017 seconds +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5674 memory location(s), 2 block(s), and 27614 instruction(s). Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=2 functions=2 allocs=5674 blocks=2 instructions=27614 Max writers: 298 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.022 seconds +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2497 memory location(s), 2 block(s), and 6623 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.084 seconds +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5674 memory location(s), 2 block(s), and 27618 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.108 seconds +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: curr_vmrss: 372mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38298 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: Allocating 
functions +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: reserved space = 131328 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: spill space = 23068672 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: aligned spill space = 23068672 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [DRAM_Allocator]: Skipping shared tensor allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.003 seconds +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running coloring_allocator_dram_shared +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) 
[ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram_shared: modules=1 functions=1 allocs=2644 blocks=1 instructions=13452 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: reserved space = 131328 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: spill space = 23068672 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: aligned spill space = 23068672 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: size = 8 +2025-11-04T21:38:52Z INFO 9044 []: find first defs for local +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: reserved space = 98304 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) 
[DRAM_Allocator]: spill space = 29360128 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: aligned spill space = 29360128 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: size = 9 +2025-11-04T21:38:52Z INFO 9044 []: find first defs for local +2025-11-04T21:38:52Z INFO 9044 []: find first defs for global +2025-11-04T21:38:52Z INFO 9044 []: find first defs for global +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: reserved space = 98304 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: spill space = 29360128 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: aligned spill space = 29360128 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [DRAM_Allocator]: Skipping shared tensor allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.011 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: Num intervals 8 Num locations 8 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: lo = 8 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: total = 8 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: Fall back to default allocation strategy [Core0 Local, Shared] +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: allreduce_dram_hwm 14680064 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: Real CC buffer size 14680064 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: reserved space = 2129920 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: spill space = 17095682 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: aligned spill space = 17141760 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [DRAM_Allocator]: Skipping shared tensor 
allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: DRAM hwm after allocation: 23068672 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.025 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: Num intervals 9 Num locations 9 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.021 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13452 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Shared +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: lo = 9 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: total = 9 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: Fall back to default allocation strategy [Core0 Local, Shared] +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: allreduce_dram_hwm 16777216 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: Real CC buffer size 16777216 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: reserved space = 2139148 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: spill space = 17095682 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: aligned spill space = 17141760 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: size = 19 +2025-11-04T21:38:52Z INFO 9044 []: find first defs for local +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [DRAM_Allocator]: DRAM hwm after allocation: 27262976 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) 
[DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.042 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 []: find first defs for global +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Num intervals 19 Num locations 19 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: lo = 19 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: total = 19 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: simplify +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Already used DRAM hwm: 2097152 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Fall back to default allocation strategy [Core0 Local, Shared] +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Already used DRAM hwm: 2097152 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: select ranges +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: allreduce_dram_hwm 
10502144 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: Real CC buffer size 10502144 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: DRAM hwm after allocation: 15011840 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: coloring_allocator_dram_shared finished after 0.094 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 375mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.097 seconds +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: curr_vmrss: 371mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38298 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=2 allocs=2309 blocks=2 instructions=4057 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=2 allocs=2497 blocks=2 instructions=6623 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (sg01) [SubgraphForkPass]: sync_shared_allocations finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 
9044 (sg01) [SubgraphForkPass]: curr_vmrss: 371mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=2 allocs=5674 blocks=2 instructions=27618 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2497 memory location(s), 2 block(s), and 6623 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (sg02) [SubgraphForkPass]: sync_shared_allocations finished after 0.001 seconds +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5674 memory location(s), 2 block(s), and 27618 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (sg00) [SubgraphForkPass]: sync_shared_allocations finished after 0.007 seconds +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2309 memory location(s), 2 block(s), and 4057 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.009 seconds +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38298 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=2644 blocks=1 instructions=13452 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running 
anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer_post_shared_dram +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer_post_shared_dram: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM} +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.003 
seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.004 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.004 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.014 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13452 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.007 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer_post_shared_dram finished after 0.026 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.028 seconds +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: Inputs to nc_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38298 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc01) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_shared +2025-11-04T21:38:52Z INFO 9044 (nc01) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_shared: modules=3 functions=3 allocs=5046 blocks=3 instructions=18789 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_shared +2025-11-04T21:38:52Z INFO 9044 (nc00) [CoreForkPass]: Inputs to 
memory_analysis_after_coloring_allocator_dram_shared: modules=3 functions=3 allocs=5434 blocks=3 instructions=19509 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_shared finished after 0.081 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00) [CoreForkPass]: Output has 3 module(s), 3 function(s), 5434 memory location(s), 3 block(s), and 19509 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc01) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_shared finished after 0.144 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01) [CoreForkPass]: Output has 3 module(s), 3 function(s), 5046 memory location(s), 3 block(s), and 18789 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 6, Failed: 0 +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: nc_parallel_pass finished after 0.149 seconds +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:52Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38298 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 
+2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=2644 blocks=1 instructions=13452 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13452 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:52Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=2644 blocks=1 instructions=13452 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 371mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) 
[ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 371mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 371mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 371mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running post_sched +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected modules.size() == 1; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [PostSched]: Detected --lnc_aware_scheduler=false; running LNC=1 post_sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Start PosT ScheD 3 gen3 Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware simulation time: 29678805 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: post_sched finished after 0.067 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 376mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z 
INFO 9044 [post_scheduler]: Time-aware simulation time: 698745 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc00/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.003 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: post_sched finished after 0.073 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware simulation time: 685465 +2025-11-04T21:38:52Z USER 9044 (nc00/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.006 seconds +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: post_sched finished after 0.087 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z USER 9044 (nc01/sg00) [ModuleForkPass]: dead_code_elim_o0 finished after 0.002 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 374mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware simulation time: 29334582 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:52 2025 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: post_sched finished after 0.127 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 378mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 376mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z USER 9044 (nc01/sg01) [ModuleForkPass]: dead_code_elim_o0 finished after 0.003 seconds +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 377mb, ru_maxrss: 383mb (delta=0mb) +2025-11-04T21:38:52Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:52Z INFO 9044 [post_scheduler]: Time-aware hwm post-sched +2025-11-04T21:38:53Z INFO 9044 [post_scheduler]: Time-aware simulation time: 1747874 +2025-11-04T21:38:53Z INFO 9044 [post_scheduler]: Time-aware simulation time: 1589382 +2025-11-04T21:38:53Z INFO 9044 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z USER 9044 (nc01/sg02) [ModuleForkPass]: post_sched finished after 0.637 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 413mb, ru_maxrss: 413mb (delta=30mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13452 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=2644 blocks=1 instructions=13452 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc01/sg02) [ModuleForkPass]: expand_scheduling_units finished after 0.002 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13452 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=2644 blocks=1 instructions=13452 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z INFO 9044 [post_scheduler]: Done PosT ScheD Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z USER 9044 (nc00/sg02) [ModuleForkPass]: post_sched finished after 0.641 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=30mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running expand_scheduling_units +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc00/sg02) [ModuleForkPass]: expand_scheduling_units finished after 0.006 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dead_code_elim_o0 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dead_code_elim_o0: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc01/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.015 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13448 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc00/sg02) [ModuleForkPass]: dead_code_elim_o0 finished after 0.017 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:53Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.671 seconds +2025-11-04T21:38:53Z INFO 9044 [BackendPassManager]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=30mb) +2025-11-04T21:38:53Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:53Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38294 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (sg00) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:53Z USER 9044 (sg01) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:53Z INFO 9044 (sg01) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=2497 blocks=2 instructions=6623 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (sg01) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:53Z INFO 9044 (sg01) [SubgraphForkPass]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (sg01) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2497 memory location(s), 2 block(s), and 6623 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (sg02) [SubgraphForkPass]: Running localize_shared_memory +2025-11-04T21:38:53Z INFO 9044 (sg02) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=5674 blocks=2 instructions=27614 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to localize_shared_memory: modules=2 functions=2 allocs=2309 blocks=2 instructions=4057 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (sg02) [SubgraphForkPass]: localize_shared_memory finished after 0.001 seconds +2025-11-04T21:38:53Z INFO 9044 (sg02) [SubgraphForkPass]: curr_vmrss: 399mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (sg02) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 5674 memory location(s), 2 block(s), and 27614 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (sg00) [SubgraphForkPass]: localize_shared_memory finished after 0.003 seconds +2025-11-04T21:38:53Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 399mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 2 function(s), 2309 memory location(s), 2 block(s), and 4057 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-11-04T21:38:53Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.007 seconds +2025-11-04T21:38:53Z INFO 9044 [BackendPassManager]: curr_vmrss: 399mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:53Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38294 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=2644 blocks=1 instructions=13448 Max writers: 299 Max 
Readers: 5242 +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running address_rotation_sb +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 109 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 33 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 121 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 109 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 18 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 33 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 131 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: PSUM Rotation rotated 121 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 70 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 20 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 41 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 62 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 136 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 70 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 101 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: 
SB Rotation rotated 62 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 18 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 42 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 108 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 106 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.092 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 69 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: PSUM Rotation rotated 103 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: address_rotation_sb finished after 0.101 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 400mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 16 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 45 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 70 Sb address +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.030 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.026 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 63 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 106 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.022 seconds 
+2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [build_flow_deps]: Start build fdeps. Invocation: 13Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [build_flow_deps]: Allocs: 1154 instructions: 2027 +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.028 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 14Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [build_flow_deps]: Allocs: 1155 instructions: 2030 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.163 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 107 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [build_flow_deps]: Build fdeps inserted 4629 edges +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [build_flow_deps]: Done build fdeps 4629 Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [build_flow_deps]: Build fdeps inserted 4632 edges +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [build_flow_deps]: Done build fdeps 4632 Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: dep_opt finished after 0.021 seconds +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: dep_opt finished after 0.018 seconds 
+2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 8 │ 2489319424 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy │ Internal -> Output │ 1 │ 8388608 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 53504 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10496516 │ +│ Load │ Internal │ 94 │ 7602176 │ +│ Save │ Internal │ 66 │ 7340032 │ +│ Save │ Internal -> Output │ 7 │ 2359298 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 2 │ +│ 256 │ 98 │ +│ 512 │ 1 │ +│ 1024 │ 52 │ +│ 2048 │ 17 │ +│ 4096 │ 30 │ +│ 1048576 │ 32 │ +│ 4194304 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ReportStats]: MM Stats: #MatMults 913 #MatMult-Transposes 225 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ReportStats]: IO Tensor size combined: 457978372 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ReportStats]: IO Tensor Statistics: 
+┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ output1 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input61 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input67 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input62 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input65 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input0 │ ExternalInput │ int32 │ 4096 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate0 │ Output │ bfloat16 │ 4194304 │ +│ intermediate3 │ Output │ bfloat16 │ 4194304 │ +│ intermediate3-buffer-2754 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1 │ Internal │ bfloat16 │ 4194304 │ +│ dot.4-buffer-2752 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.29 │ Internal │ bfloat16 │ 2097152 │ +│ reshape.24 │ Internal │ bfloat16 │ 2097152 │ +│ transpose.1 │ Internal │ bfloat16 │ 2097152 │ +│ get_tuple_element.1 │ Internal │ bfloat16 │ 2097152 │ +│ reshape.16 │ Internal │ bfloat16 │ 2097152 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z USER 9044 (nc00/sg00) [ModuleForkPass]: report_stats finished after 0.001 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2030 
instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 402mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 8 │ 2489319424 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 53504 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10496516 │ +│ Load │ Internal │ 94 │ 7602176 │ +│ Save │ Internal │ 66 │ 7340032 │ +│ Save │ Internal -> Output │ 6 │ 2359296 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 2 │ +│ 256 │ 98 │ +│ 512 │ 1 │ +│ 1024 │ 52 │ +│ 2048 │ 17 │ +│ 4096 │ 30 │ +│ 1048576 │ 32 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ReportStats]: MM Stats: #MatMults 913 #MatMult-Transposes 225 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ReportStats]: IO Tensor size combined: 457978372 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ReportStats]: IO Tensor Statistics: 
+┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ output1 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input61 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input67 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input62 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input65 │ ExternalInput │ bfloat16 │ 2097152 │ +│ input0 │ ExternalInput │ int32 │ 4096 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate0 │ Output │ bfloat16 │ 4194304 │ +│ intermediate3 │ Output │ bfloat16 │ 4194304 │ +│ intermediate3-buffer-2754 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1 │ Internal │ bfloat16 │ 4194304 │ +│ dot.4-buffer-2752 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.29 │ Internal │ bfloat16 │ 2097152 │ +│ reshape.24 │ Internal │ bfloat16 │ 2097152 │ +│ transpose.1 │ Internal │ bfloat16 │ 2097152 │ +│ get_tuple_element.1 │ Internal │ bfloat16 │ 2097152 │ +│ reshape.16 │ Internal │ bfloat16 │ 2097152 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z USER 9044 (nc01/sg00) [ModuleForkPass]: report_stats finished after 0.001 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 403mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2027 
instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: address_rotation_sb finished after 0.198 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 403mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.046 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 405mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.040 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 406mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.035 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 406mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [build_flow_deps]: Start build fdeps. Invocation: 15Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [build_flow_deps]: Allocs: 1248 instructions: 3310 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.014 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 406mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 16Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [build_flow_deps]: Allocs: 1249 instructions: 3313 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 697 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [build_flow_deps]: Build fdeps inserted 8664 edges +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [build_flow_deps]: Done build fdeps 8664 Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 786 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [build_flow_deps]: Build fdeps inserted 8661 edges +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [build_flow_deps]: Done build fdeps 8661 Tue Nov 4 21:38:53 2025 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: dep_opt finished after 0.032 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 406mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy │ Internal -> Output │ 1 │ 8388608 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 65536 │ +│ Load │ ExternalInput -> Internal │ 171 │ 48243204 │ +│ Load │ Input -> Internal │ 6 │ 524288 │ +│ Load │ Internal │ 84 │ 9437184 │ +│ Save │ Internal │ 68 │ 8388608 │ +│ Save │ Internal -> Output │ 5 │ 2097154 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 4 │ +│ 256 │ 97 │ +│ 512 │ 4 │ +│ 1024 │ 130 │ +│ 2048 │ 8 │ +│ 4096 │ 88 │ +│ 1048576 │ 32 │ +│ 2097152 │ 3 │ +│ 4194304 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ReportStats]: MM Stats: #MatMults 2100 #MatMult-Transposes 248 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ReportStats]: IO Tensor size combined: 184558084 +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ output4 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input6 │ 
ExternalInput │ bfloat16 │ 33554432 │ +│ input7 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input68 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input71 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input69 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input72 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input78 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input76 │ ExternalInput │ bfloat16 │ 2097152 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate3 │ Input │ bfloat16 │ 4194304 │ +│ dot.7-buffer-2414 │ Internal │ bfloat16 │ 4194304 │ +│ dot.11-buffer-2419 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate5 │ Output │ bfloat16 │ 4194304 │ +│ intermediate0 │ Input │ bfloat16 │ 4194304 │ +│ all_reduce.1-buffer-2416 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate6-buffer-2421 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate6 │ Output │ bfloat16 │ 4194304 │ +│ add.4 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.60 │ Internal │ bfloat16 │ 2097152 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z USER 9044 (nc00/sg01) [ModuleForkPass]: report_stats finished after 0.008 seconds +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 404mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3313 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: dep_opt finished after 0.043 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 404mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 65536 │ +│ Load │ ExternalInput -> Internal │ 171 │ 48243204 │ +│ Load │ Input -> Internal │ 6 │ 524288 │ +│ Load │ Internal │ 84 │ 9437184 │ +│ Save │ Internal │ 68 │ 8388608 │ +│ Save │ Internal -> Output │ 4 │ 2097152 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 4 │ +│ 256 │ 97 │ +│ 512 │ 4 │ +│ 1024 │ 130 │ +│ 2048 │ 8 │ +│ 4096 │ 88 │ +│ 1048576 │ 32 │ +│ 2097152 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ReportStats]: MM Stats: #MatMults 2100 #MatMult-Transposes 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ReportStats]: IO Tensor size combined: 
184558084 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ output4 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input6 │ ExternalInput │ bfloat16 │ 33554432 │ +│ input7 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input68 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input71 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input69 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input72 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input78 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input76 │ ExternalInput │ bfloat16 │ 2097152 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate3 │ Input │ bfloat16 │ 4194304 │ +│ dot.7-buffer-2414 │ Internal │ bfloat16 │ 4194304 │ +│ dot.11-buffer-2419 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate5 │ Output │ bfloat16 │ 4194304 │ +│ intermediate0 │ Input │ bfloat16 │ 4194304 │ +│ all_reduce.1-buffer-2416 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate6-buffer-2421 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate6 │ Output │ bfloat16 │ 4194304 │ +│ add.4 │ Internal │ bfloat16 │ 4194304 │ +│ reshape.60 │ Internal │ bfloat16 │ 2097152 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:53Z USER 9044 (nc01/sg01) [ModuleForkPass]: report_stats finished after 0.002 seconds +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 404mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:53Z INFO 9044 (nc01/sg01) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3310 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 13 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 13 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 46 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: PSUM Rotation rotated 97 PSUM Banks +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 35 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 55 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 38 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 9 Sb address +2025-11-04T21:38:53Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 37 Sb address +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 63 Sb address +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.618 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 405mb, ru_maxrss: 413mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13448 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2644 blocks=1 instructions=13448 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 64 Sb address +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.088 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 419mb, ru_maxrss: 419mb (delta=6mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: address_rotation_sb finished after 0.720 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 414mb, ru_maxrss: 419mb (delta=6mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13448 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=2644 blocks=1 instructions=13448 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.020 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 408mb, ru_maxrss: 419mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13448 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=2644 blocks=1 instructions=13448 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [build_flow_deps]: Start build fdeps. Invocation: 17Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [build_flow_deps]: Allocs: 2644 instructions: 13448 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [build_flow_deps]: Build fdeps inserted 35064 edges +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [build_flow_deps]: Done build fdeps 35064 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: dep_opt finished after 0.102 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 421mb, ru_maxrss: 421mb (delta=2mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13448 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=2644 blocks=1 instructions=13448 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬───────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼───────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal │ 2 │ 4194304 │ +│ Load │ Const -> Internal │ 1 │ 32768 │ +│ Load │ ExternalInput -> Internal │ 448 │ 193345548 │ +│ Load │ Internal │ 20 │ 6294662 │ +│ Save │ Internal │ 312 │ 6444544 │ +└─────────────┴───────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 2 │ +│ 4 │ 4 │ +│ 32 │ 2 │ +│ 128 │ 2 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 97 │ +│ 2048 │ 1 │ +│ 4096 │ 370 │ +│ 2097152 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ReportStats]: MM Stats: #MatMults 11178 #MatMult-Transposes 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ReportStats]: IO Tensor size combined: 348925968 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input368 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input365 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input366 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input370 │ ExternalInput │ bfloat16 │ 4096 │ +│ input367 │ ExternalInput │ 
bfloat16 │ 4096 │ +│ input1 │ ExternalInput │ int32 │ 4096 │ +│ input3 │ ExternalInput │ float32 │ 12 │ +│ output0 │ ExternalOutput │ int32 │ 4 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ReportStats]: Large (Internal) Tensor Statistics: +┌────────────────────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────────────────────────┼──────────┼──────────┼──────────────┤ +│ convert.53 │ Internal │ bfloat16 │ 4194304 │ +│ all_reduce.3-buffer-2033 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate84 │ Input │ bfloat16 │ 4194304 │ +│ dot.14-buffer-2031 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate83 │ Input │ bfloat16 │ 4194304 │ +│ add.9 │ Internal │ bfloat16 │ 4194304 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ all_reduce.3_pftranspose_1000-t1614_i3 │ Internal │ bfloat16 │ 1048576 │ +│ all_reduce.3_pftranspose_1000-t1614_i2 │ Internal │ bfloat16 │ 1048576 │ +│ add.9_pftranspose_996-t1610_i3 │ Internal │ bfloat16 │ 1048576 │ +└────────────────────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: report_stats finished after 0.016 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 420mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13448 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.150 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 416mb, ru_maxrss: 421mb (delta=2mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [AntiDependencyAnalyzer]: DRAM size: 25769803776 num-bins: 24 bin-size: 1073741824 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.030 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dep_opt +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 18Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [build_flow_deps]: Allocs: 3030 instructions: 14166 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [build_flow_deps]: Build fdeps inserted 45387 edges +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [build_flow_deps]: Done build fdeps 45387 Tue Nov 4 21:38:54 2025 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: dep_opt finished after 0.127 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 410mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running report_stats +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal │ 4 │ 4194304 │ +│ Load │ Const -> Internal │ 8 │ 348936 │ +│ Load │ ExternalInput -> Internal │ 448 │ 193345548 │ +│ Load │ Internal │ 34 │ 6613898 │ +│ Save │ Internal │ 329 │ 6459911 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +└─────────────┴────────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 1 │ 1 │ +│ 2 │ 3 │ +│ 4 │ 9 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 32 │ 6 │ +│ 64 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 113 │ +│ 2048 │ 2 │ +│ 4096 │ 370 │ +│ 9496 │ 2 
│ +│ 2097152 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ReportStats]: MM Stats: #MatMults 11302 #MatMult-Transposes 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ReportStats]: IO Tensor size combined: 348925968 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input368 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input365 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input366 │ ExternalInput │ bfloat16 │ 12582912 │ +│ input370 │ ExternalInput │ bfloat16 │ 4096 │ +│ input367 │ ExternalInput │ bfloat16 │ 4096 │ +│ input1 │ ExternalInput │ int32 │ 4096 │ +│ input3 │ ExternalInput │ float32 │ 12 │ +│ output0 │ ExternalOutput │ int32 │ 4 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ReportStats]: Large (Internal) Tensor Statistics: +┌──────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├──────────────────────────┼──────────┼──────────┼──────────────┤ +│ add.9 │ Internal │ bfloat16 │ 4194304 │ +│ convert.53 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate84 │ Input │ bfloat16 │ 4194304 │ +│ dot.14-buffer-2031 │ Internal │ bfloat16 │ 4194304 │ +│ intermediate83 │ Input │ bfloat16 │ 4194304 │ +│ all_reduce.3-buffer-2033 │ Internal │ bfloat16 │ 4194304 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ -t3025 │ Internal │ float32 │ 1048576 │ +│ -t3019 │ Internal │ float32 │ 1048576 │ +│ -t3014 │ Internal │ float32 │ 1048576 │ +└──────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: report_stats finished after 0.014 seconds 
+2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14166 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 1.055 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=8mb) +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running assign_trigger_engine +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to assign_trigger_engine: modules=6 functions=6 allocs=10480 blocks=6 instructions=38294 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [AssignTriggerEngine]: Assigned trigger engine for 85 DMA instructions. Moved 19 DMA instructions to CC's engines. +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [AssignTriggerEngine]: Assigned trigger engine for 84 DMA instructions. Moved 18 DMA instructions to CC's engines. +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [AssignTriggerEngine]: Assigned trigger engine for 73 DMA instructions. Moved 5 DMA instructions to CC's engines. +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [AssignTriggerEngine]: Assigned trigger engine for 72 DMA instructions. Moved 4 DMA instructions to CC's engines. +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [AssignTriggerEngine]: Assigned trigger engine for 336 DMA instructions. Moved 7 DMA instructions to CC's engines. +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [AssignTriggerEngine]: Assigned trigger engine for 317 DMA instructions. Moved 5 DMA instructions to CC's engines. 
+2025-11-04T21:38:54Z INFO 9044 [AssignTriggerEngine]: Limiting IO queue to SP only +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: assign_trigger_engine finished after 0.020 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Output has 6 module(s), 6 function(s), 10480 memory location(s), 6 block(s), and 38294 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38294 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=2644 blocks=1 instructions=13448 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=3030 blocks=1 instructions=14166 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: sync_before_global_cc finished after 0.003 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13451 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: sync_before_global_cc finished after 0.017 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14169 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=1154 blocks=1 instructions=2027 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: sync_before_global_cc finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2029 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=1249 blocks=1 instructions=3313 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: sync_before_global_cc finished after 0.001 seconds +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running sync_before_global_cc +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3315 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=1248 blocks=1 instructions=3310 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to sync_before_global_cc: modules=1 functions=1 allocs=1155 blocks=1 instructions=2030 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: sync_before_global_cc finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2032 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: sync_before_global_cc finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3312 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.027 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running assign_hwdge_engine +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=6 functions=6 allocs=10480 blocks=6 instructions=38308 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: assign_hwdge_engine finished after 0.006 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Output has 6 module(s), 6 function(s), 10480 memory location(s), 6 block(s), and 38308 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38308 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=3030 blocks=1 instructions=14169 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=2644 blocks=1 instructions=13451 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 6 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 11 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 3 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 298 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 1 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 9 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 444 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 12 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: alloc_queues finished after 0.003 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine 
│ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 7 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 28 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 9 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 10 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 301 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 14 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 444 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 12 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13451 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: alloc_queues finished after 0.003 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=2644 blocks=1 instructions=13451 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14169 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=3030 blocks=1 instructions=14169 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: chain_dma_transposes finished after 0.008 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13451 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running alloc_queues +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=1155 blocks=1 instructions=2032 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=1154 blocks=1 instructions=2029 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=1249 blocks=1 instructions=3315 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=1248 blocks=1 instructions=3312 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues 
│ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 4 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 1 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 32 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 98 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 72 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 66 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 3 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 1 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 32 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 98 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 71 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 66 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: alloc_queues finished after 0.001 seconds +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: alloc_queues finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2029 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2032 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=1155 blocks=1 instructions=2032 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=1154 blocks=1 instructions=2029 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 3 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 32 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 89 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 213 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 68 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: alloc_queues finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3315 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=1249 blocks=1 instructions=3315 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: chain_dma_transposes finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2029 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: chain_dma_transposes finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3315 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 2 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 32 │ +│ qSPDynamicHW │ dynamic │ SP │ 16 │ 89 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 212 │ +│ qActDynamicHW │ dynamic │ Activation │ 16 │ 68 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: alloc_queues finished after 0.004 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3312 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running chain_dma_transposes +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=1248 blocks=1 instructions=3312 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: chain_dma_transposes finished after 0.019 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: chain_dma_transposes finished after 0.001 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3312 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14169 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: chain_dma_transposes finished after 0.010 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2032 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.028 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to nc_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38308 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00) [CoreForkPass]: Running insert_dma_switch_queue_instance +2025-11-04T21:38:54Z INFO 9044 (nc00) [CoreForkPass]: Inputs to insert_dma_switch_queue_instance: modules=3 functions=3 allocs=5434 blocks=3 instructions=19516 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00) [CoreForkPass]: insert_dma_switch_queue_instance finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00) [CoreForkPass]: Output has 3 module(s), 3 function(s), 5434 memory location(s), 3 block(s), and 19516 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01) [CoreForkPass]: Running insert_dma_switch_queue_instance +2025-11-04T21:38:54Z INFO 9044 (nc01) [CoreForkPass]: Inputs to insert_dma_switch_queue_instance: modules=3 functions=3 allocs=5046 blocks=3 instructions=18792 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01) [CoreForkPass]: insert_dma_switch_queue_instance finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01) [CoreForkPass]: Output has 3 module(s), 3 function(s), 5046 memory location(s), 3 block(s), and 18792 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: nc_parallel_pass finished after 0.002 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38308 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=1155 blocks=1 instructions=2032 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: 
curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=1248 blocks=1 instructions=3312 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2032 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=2644 blocks=1 instructions=13451 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=1155 blocks=1 instructions=2032 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13451 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3312 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=1249 blocks=1 instructions=3315 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=3030 blocks=1 instructions=14169 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=2644 blocks=1 instructions=13451 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=1154 blocks=1 instructions=2029 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2029 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14169 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=1154 blocks=1 instructions=2029 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=3030 blocks=1 instructions=14169 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=1248 blocks=1 instructions=3312 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: lower_control finished after 0.006 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2032 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=1155 blocks=1 instructions=2032 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3315 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running lower_control +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=1249 blocks=1 instructions=3315 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Processing async instrs... +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 1964 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Processing redundant descendants, Done. Num edges removed 2158 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Processing async instrs, Done. 
Num edges removed 2158 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: lower_control finished after 0.009 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3315 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=1249 blocks=1 instructions=3315 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: lower_control finished after 0.018 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2029 instruction(s). 
Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=1154 blocks=1 instructions=2029 Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: lower_control finished after 0.023 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 409mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3312 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=1248 blocks=1 instructions=3312 Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Processing async instrs... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 1971 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Processing redundant descendants, Done. 
Num edges removed 2163 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Processing async instrs, Done. Num edges removed 2163 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Processing async instrs... +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: lower_control finished after 0.038 seconds +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: lower_control finished after 0.039 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 410mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14169 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=3030 blocks=1 instructions=14169 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Processing async instrs... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 410mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13451 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: Running dep_reduction +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=2644 blocks=1 instructions=13451 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Start Dependency Reduction +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Cacheing dependencies for debug info +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Finished dependency reduction: 8804 removed, new total 1134 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:54Z USER 9044 (nc00/sg00) [ModuleForkPass]: dep_reduction finished after 0.044 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: curr_vmrss: 410mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1155 memory location(s), 1 block(s), and 2032 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 3497 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 3525 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Processing redundant descendants, Done. Num edges removed 3837 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Processing async instrs, Done. Num edges removed 3837 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Processing redundant descendants, Done. Num edges removed 3864 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Processing async instrs, Done. 
Num edges removed 3864 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Finished dependency reduction: 8680 removed, new total 1132 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:54Z USER 9044 (nc01/sg00) [ModuleForkPass]: dep_reduction finished after 0.080 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: curr_vmrss: 414mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1154 memory location(s), 1 block(s), and 2029 instruction(s). Max writers: 34 Max Readers: 224 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Processing async instrs... +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Processing async instrs... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Processing secondary edges per engine... +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 12501 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 13037 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Processing redundant descendants, Done. Num edges removed 14186 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Processing async instrs, Done. 
Num edges removed 14186 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Finished dependency reduction: 17210 removed, new total 1392 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:54Z USER 9044 (nc00/sg01) [ModuleForkPass]: dep_reduction finished after 0.145 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: curr_vmrss: 418mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1249 memory location(s), 1 block(s), and 3315 instruction(s). Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Processing redundant descendants, Done. Num edges removed 13281 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Processing async instrs, Done. Num edges removed 13281 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Finished dependency reduction: 17306 removed, new total 1381 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:54Z USER 9044 (nc01/sg01) [ModuleForkPass]: dep_reduction finished after 0.146 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: curr_vmrss: 417mb, ru_maxrss: 421mb (delta=0mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1248 memory location(s), 1 block(s), and 3312 instruction(s). 
Max writers: 34 Max Readers: 248 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Finished dependency reduction: 77784 removed, new total 4256 +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:54Z USER 9044 (nc00/sg02) [ModuleForkPass]: dep_reduction finished after 0.279 seconds +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: curr_vmrss: 424mb, ru_maxrss: 424mb (delta=3mb) +2025-11-04T21:38:54Z INFO 9044 (nc00/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3030 memory location(s), 1 block(s), and 14169 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Num Async removed: 0 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Finished dependency reduction: 60385 removed, new total 3432 +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [DepReduction]: Finished Dependency Reduction +2025-11-04T21:38:54Z USER 9044 (nc01/sg02) [ModuleForkPass]: dep_reduction finished after 0.295 seconds +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: curr_vmrss: 423mb, ru_maxrss: 424mb (delta=3mb) +2025-11-04T21:38:54Z INFO 9044 (nc01/sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 2644 memory location(s), 1 block(s), and 13451 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 6, Passed: 6, Failed: 0 +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.359 seconds +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: curr_vmrss: 420mb, ru_maxrss: 424mb (delta=3mb) +2025-11-04T21:38:54Z USER 9044 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:54Z INFO 9044 [BackendPassManager]: Inputs to nc_parallel_pass: modules=6 functions=6 allocs=10480 blocks=6 instructions=38308 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z USER 9044 (nc00) [CoreForkPass]: Running bir_linker +2025-11-04T21:38:54Z USER 9044 (nc01) [CoreForkPass]: Running bir_linker +2025-11-04T21:38:54Z INFO 9044 (nc00) [CoreForkPass]: Inputs to bir_linker: modules=3 functions=3 allocs=5434 blocks=3 instructions=19516 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01) [CoreForkPass]: Inputs to bir_linker: modules=3 functions=3 allocs=5046 blocks=3 instructions=18792 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:54Z INFO 9044 (nc01/sgLnk) [BirLinker]: bir_linker cwd: +2025-11-04T21:38:54Z INFO 9044 (nc00/sgLnk) [BirLinker]: bir_linker cwd: +2025-11-04T21:38:54Z INFO 9044 (nc01/sgLnk) [BirLinker]: Num intermediates 86 +2025-11-04T21:38:54Z INFO 9044 (nc01/sgLnk) [BirLinker]: Num Module Definitions 3 +2025-11-04T21:38:54Z INFO 9044 (nc01/sgLnk) [BirLinker]: Linking to a call-graph structure +2025-11-04T21:38:54Z INFO 9044 (nc00/sgLnk) [BirLinker]: Num intermediates 86 +2025-11-04T21:38:54Z INFO 9044 (nc00/sgLnk) [BirLinker]: Num Module Definitions 3 +2025-11-04T21:38:54Z INFO 9044 (nc00/sgLnk) [BirLinker]: Linking to a call-graph structure +2025-11-04T21:38:54Z INFO 9044 (nc00/sgLnk) [BirLinker]: Added a new SpillReload Que qSPPIOParam0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: tensor_map verification successful. 
+2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: Writing updated tensor_map /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg/nc00/sgLnk/sg00/tensor_map.json +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: PostLink Stats: #MatMults 68915 #MatMult-Transposes 12163 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: Total Intermediate MMTs 216 #out: 0 #inp: 216 #symmetric: 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: Total Intermediate IOs with MMTs: 2 #out: 0 #inp: 2 #both: 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: releasing pre-link modules +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: tensor_map verification successful. +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: Writing updated tensor_map /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg/nc01/sgLnk/sg00/tensor_map.json +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: PostLink Stats: #MatMults 68791 #MatMult-Transposes 12163 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: Total Intermediate MMTs 216 #out: 0 #inp: 216 #symmetric: 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: Total Intermediate IOs with MMTs: 2 #out: 0 #inp: 2 #both: 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: releasing pre-link modules +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [BirLinker]: linking Done. +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: bir_linker finished after 0.393 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 627mb, ru_maxrss: 627mb (delta=203mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running postlnk_dma_report +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to postlnk_dma_report: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [BirLinker]: linking Done. +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: bir_linker finished after 0.422 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=203mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DMAReport]: DMA Report: Bytes loaded or saved 302725018, 83.4954% input load, 1.47211% output write, 15.0325% spill/reload +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: postlnk_dma_report finished after 0.004 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running report_stats +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to report_stats: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: main +┌─────────────┬──────┬───────┬───────┐ +│ Instruction │ Kind │ Count │ Bytes │ +└─────────────┴──────┴───────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 8 │ 2489319424 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 53504 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10496516 │ +│ Load │ Internal │ 94 │ 7602176 │ +│ Save │ Internal │ 66 │ 7340032 │ +│ Save │ Internal -> Output │ 6 │ 2359296 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 2 │ +│ 256 │ 98 │ +│ 512 │ 1 │ +│ 1024 │ 52 │ +│ 2048 │ 17 │ +│ 4096 │ 30 │ +│ 1048576 │ 32 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ 
Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 65536 │ +│ Load │ ExternalInput -> Internal │ 171 │ 48243204 │ +│ Load │ Input -> Internal │ 6 │ 524288 │ +│ Load │ Internal │ 84 │ 9437184 │ +│ Save │ Internal │ 68 │ 8388608 │ +│ Save │ Internal -> Output │ 4 │ 2097152 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 4 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 4 │ +│ 256 │ 97 │ +│ 512 │ 4 │ +│ 1024 │ 130 │ +│ 2048 │ 8 │ +│ 4096 │ 88 │ +│ 1048576 │ 32 │ +│ 2097152 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running postlnk_dma_report +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to postlnk_dma_report: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬───────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼───────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal │ 2 │ 4194304 │ +│ Load │ Const -> Internal │ 1 │ 32768 │ +│ Load │ ExternalInput -> Internal │ 448 │ 193345548 │ +│ Load │ Internal │ 20 │ 6294662 │ +│ Save │ Internal │ 312 │ 6444544 │ +└─────────────┴───────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 2 │ +│ 4 │ 4 │ +│ 32 │ 2 │ +│ 128 │ 2 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 97 │ +│ 2048 │ 1 │ +│ 4096 │ 370 │ +│ 2097152 │ 3 │ 
+└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: MM Stats: #MatMults 14191 #MatMult-Transposes 5715 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: IO Tensor size combined: 6781430828 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60_sg0000 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369_sg0002 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ output3 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output7 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output11 │ ExternalOutput │ bfloat16 │ 33554432 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ReportStats]: Large (Internal) Tensor Statistics: +┌─────────────────┬───────────────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├─────────────────┼───────────────────┼──────────┼──────────────┤ +│ intermediate3 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate0 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate20 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate11 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate5 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate14 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate26 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate23 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate17 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate8 │ InternalInterface │ bfloat16 │ 
4194304 │ +└─────────────────┴───────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: report_stats finished after 0.010 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DMAReport]: DMA Report: Bytes loaded or saved 303375797, 83.4205% input load, 1.46896% output write, 15.1106% spill/reload +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: postlnk_dma_report finished after 0.008 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running report_stats +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running coloring_allocator_dram_post_lnk +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Local +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to report_stats: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: main +┌─────────────┬──────┬───────┬───────┐ +│ Instruction │ Kind │ Count │ Bytes │ +└─────────────┴──────┴───────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:55Z INFO 9044 []: find first defs for local +2025-11-04T21:38:55Z INFO 9044 []: find first defs for global +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: IntervalTree 
Build Done +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: simplify +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: select ranges +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: Real CC buffer size 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 8 │ 2489319424 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy │ Internal -> Output │ 1 │ 8388608 │ 
+│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 53504 │ +│ Load │ ExternalInput -> Internal │ 28 │ 10496516 │ +│ Load │ Internal │ 94 │ 7602176 │ +│ Save │ Internal │ 66 │ 7340032 │ +│ Save │ Internal -> Output │ 7 │ 2359298 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 2 │ +│ 256 │ 98 │ +│ 512 │ 1 │ +│ 1024 │ 52 │ +│ 2048 │ 17 │ +│ 4096 │ 30 │ +│ 1048576 │ 32 │ +│ 4194304 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: sg0001 +┌─────────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal -> ExternalOutput │ 32 │ 1073741824 │ +│ DMACopy │ Internal -> Output │ 1 │ 8388608 │ +│ DMACopy (Spill) │ Internal │ 32 │ 0 │ +│ Load │ Const -> Internal │ 5 │ 65536 │ +│ Load │ ExternalInput -> Internal │ 171 │ 48243204 │ +│ Load │ Input -> Internal │ 6 │ 524288 │ +│ Load │ Internal │ 84 │ 9437184 │ +│ Save │ Internal │ 68 │ 8388608 │ +│ Save │ Internal -> Output │ 5 │ 2097154 │ +└─────────────────┴────────────────────────────┴───────┴────────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 5 │ +│ 4 │ 1 │ +│ 32 │ 2 │ +│ 64 │ 4 │ +│ 256 │ 97 │ +│ 512 │ 4 │ +│ 1024 │ 130 │ +│ 2048 │ 8 │ +│ 4096 │ 88 │ +│ 1048576 │ 32 │ +│ 2097152 │ 3 │ +│ 4194304 │ 2 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: Data Movement Statistics: sg0002 +┌─────────────┬────────────────────────────┬───────┬───────────┐ 
+│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 6291456 │ +│ DMACopy │ Internal │ 4 │ 4194304 │ +│ Load │ Const -> Internal │ 8 │ 348936 │ +│ Load │ ExternalInput -> Internal │ 448 │ 193345548 │ +│ Load │ Internal │ 34 │ 6613898 │ +│ Save │ Internal │ 329 │ 6459911 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +└─────────────┴────────────────────────────┴───────┴───────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 1 │ 1 │ +│ 2 │ 3 │ +│ 4 │ 9 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 32 │ 6 │ +│ 64 │ 2 │ +│ 128 │ 4 │ +│ 256 │ 1 │ +│ 384 │ 1 │ +│ 512 │ 302 │ +│ 1024 │ 113 │ +│ 2048 │ 2 │ +│ 4096 │ 370 │ +│ 9496 │ 2 │ +│ 2097152 │ 3 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: MM Stats: #MatMults 14315 #MatMult-Transposes 5715 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: IO Tensor size combined: 6781430828 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input60_sg0000 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369_sg0002 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input60 │ ExternalInput │ bfloat16 │ 311164928 │ +│ input369 │ ExternalInput │ bfloat16 │ 311164928 │ +│ output3 │ ExternalOutput │ bfloat16 │ 
33554432 │ +│ output2 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input5 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output7 │ ExternalOutput │ bfloat16 │ 33554432 │ +│ input4 │ ExternalInput │ bfloat16 │ 33554432 │ +│ output11 │ ExternalOutput │ bfloat16 │ 33554432 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ReportStats]: Large (Internal) Tensor Statistics: +┌─────────────────┬───────────────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├─────────────────┼───────────────────┼──────────┼──────────────┤ +│ intermediate3 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate0 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate20 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate11 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate5 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate14 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate26 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate23 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate17 │ InternalInterface │ bfloat16 │ 4194304 │ +│ intermediate8 │ InternalInterface │ bfloat16 │ 4194304 │ +└─────────────────┴───────────────────┴──────────┴──────────────┘ + +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: report_stats finished after 0.019 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running coloring_allocator_dram_post_lnk +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Local +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: spill space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: aligned spill space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: renumber locations +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: size = 0 +2025-11-04T21:38:55Z INFO 9044 []: find first defs for local +2025-11-04T21:38:55Z INFO 9044 []: find first defs for global +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: Num intervals 0 Num locations 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: lo = 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:55Z INFO 9044 
(nc00/sgLnk) [DRAM_Allocator]: total = 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: simplify +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: select ranges +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: Real CC buffer size 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: coloring_allocator_dram_post_lnk finished after 0.036 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running coloring_allocator_dram_shared_post_lnk +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to coloring_allocator_dram_shared_post_lnk: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Shared +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: spill space = 235405368 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: aligned spill space = 235520000 bytes +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [DRAM_Allocator]: Skipping shared tensor allocations on core 1, marking as remoteLocalTarget instead +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions 
+2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: coloring_allocator_dram_shared_post_lnk finished after 0.038 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: coloring_allocator_dram_post_lnk finished after 0.091 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running coloring_allocator_dram_shared_post_lnk +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to coloring_allocator_dram_shared_post_lnk: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Shared +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: reserved space = 0 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: spill space = 235405368 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: aligned spill space = 235520000 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: renumber locations 
+2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: size = 86 +2025-11-04T21:38:55Z INFO 9044 []: find first defs for local +2025-11-04T21:38:55Z INFO 9044 []: find first defs for global +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: Num intervals 86 Num locations 86 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree Build Done +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: simplify interference graph +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: initialize low and high +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: lo = 86 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: hi = 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: total = 86 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: simplify +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: select ranges +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: Real CC buffer size 27262976 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 44576768 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: 
Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: coloring_allocator_dram_shared_post_lnk finished after 0.073 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 6, Failed: 0 +2025-11-04T21:38:55Z USER 9044 [BackendPassManager]: nc_parallel_pass finished after 0.633 seconds +2025-11-04T21:38:55Z INFO 9044 [BackendPassManager]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=203mb) +2025-11-04T21:38:55Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:55Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=2 functions=8 allocs=11508 blocks=8 instructions=38392 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (sg00) [SubgraphForkPass]: Running sync_shared_allocations +2025-11-04T21:38:55Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to sync_shared_allocations: modules=2 functions=8 allocs=11508 blocks=8 instructions=38392 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (sg00) [SubgraphForkPass]: sync_shared_allocations finished after 0.006 seconds +2025-11-04T21:38:55Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 8 function(s), 11508 memory location(s), 8 block(s), and 38392 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 1, Passed: 1, Failed: 0 +2025-11-04T21:38:55Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.008 seconds +2025-11-04T21:38:55Z INFO 9044 [BackendPassManager]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z USER 9044 [BackendPassManager]: Running nc_parallel_pass +2025-11-04T21:38:55Z INFO 9044 [BackendPassManager]: Inputs to nc_parallel_pass: modules=2 functions=8 allocs=11508 blocks=8 instructions=38392 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_post_lnk +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_post_lnk +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_post_lnk finished after 0.037 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running lower_dynamic_dma +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_post_lnk finished after 0.044 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running lower_dynamic_dma +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: lower_dynamic_dma finished after 0.006 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running legalize_dynamic_dma +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: lower_dynamic_dma finished after 0.022 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running legalize_dynamic_dma +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [LegalizeDynamicDMA]: +┌───────────┬───────────────────────────────┬────────────────────────────┐ +│ Sub-Pass │ Illegal Instructions Detected │ New Instructions Generated │ +├───────────┼───────────────────────────────┼────────────────────────────┤ +│ Peeling │ 0 │ 0 │ +│ Unrolling │ 0 │ 0 │ +│ Splitting │ 0 │ 0 │ +└───────────┴───────────────────────────────┴────────────────────────────┘ + +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: legalize_dynamic_dma finished after 0.029 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19558 
instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running optimize_queue_switch +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to optimize_queue_switch: modules=1 functions=4 allocs=5948 blocks=4 instructions=19558 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [OptimizeQueueSwitch]: Optimize queue switch has replaced 7 total SQI Instructions with RQI +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: optimize_queue_switch finished after 0.006 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19565 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running lower_dma +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to lower_dma: modules=1 functions=4 allocs=5948 blocks=4 instructions=19565 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [LegalizeDynamicDMA]: +┌───────────┬───────────────────────────────┬────────────────────────────┐ +│ Sub-Pass │ Illegal Instructions Detected │ New Instructions Generated │ +├───────────┼───────────────────────────────┼────────────────────────────┤ +│ Peeling │ 0 │ 0 │ +│ Unrolling │ 0 │ 0 │ +│ Splitting │ 0 │ 0 │ +└───────────┴───────────────────────────────┴────────────────────────────┘ + +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: legalize_dynamic_dma finished after 0.029 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) 
+2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18834 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running optimize_queue_switch +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to optimize_queue_switch: modules=1 functions=4 allocs=5560 blocks=4 instructions=18834 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [OptimizeQueueSwitch]: Optimize queue switch has replaced 7 total SQI Instructions with RQI +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: optimize_queue_switch finished after 0.006 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 432mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18841 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running lower_dma +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to lower_dma: modules=1 functions=4 allocs=5560 blocks=4 instructions=18841 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [LowerDMA]: lower_dma metrics start + IO + Copy (DGE/DMA) + 128 partition : 5164/5164 (100% DGE) + power-of-2 partition : 5192/5255 (98.8011% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 5193/5256 (98.8014% DGE) + Cast (DGE/DMA) + 128 partition : 57/57 (100% DGE) + power-of-2 partition : 169/170 (99.4118% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 169/170 (99.4118% DGE) + Spill/Reload + Copy (DGE/DMA) + 128 partition : 4429/4433 (99.9098% DGE) + power-of-2 partition : 4429/4775 (92.7539% DGE) + > 3 dimensional : 0/4 (0% DGE) + non-integer desc size : 0/0 + total : 4429/4775 (92.7539% DGE) + Cast (DGE/DMA) + 128 partition : 0/0 + power-of-2 partition 
: 0/2 (0% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/2 (0% DGE) + CopyMode + CCE : 29 + Transpose : 896 + Replicate : 0 + Dynamic (DGE/DMA) + scalar : 1/1 (100% DGE) + vector : 904/904 (100% DGE) + Opcode + ReadVarAddr : 0 + IndirectLoad : 0 + IndirectSave : 0 + IndirectSaveAccumulate : 0 + DstReduceDGE : 0 +lower_dma metrics end +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: lower_dma finished after 0.096 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19565 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running expand_all_engine +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=4 allocs=5948 blocks=4 instructions=19565 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: expand_all_engine finished after 0.009 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19565 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running alloc_semaphores +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=4 allocs=5948 blocks=4 instructions=19565 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: alloc_semaphores finished after 0.045 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19565 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running expand_inst_late +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=4 allocs=5948 blocks=4 instructions=19565 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [LowerDMA]: lower_dma metrics start + IO + Copy (DGE/DMA) + 128 partition : 5164/5164 (100% DGE) + power-of-2 partition : 5164/5198 (99.3459% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 5165/5199 (99.346% DGE) + Cast (DGE/DMA) + 128 partition : 57/57 (100% DGE) + power-of-2 partition : 169/170 (99.4118% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 169/170 (99.4118% DGE) + Spill/Reload + Copy (DGE/DMA) + 128 partition : 4424/4428 (99.9097% DGE) + power-of-2 partition : 4424/4737 (93.3924% DGE) + > 3 dimensional : 0/4 (0% DGE) + non-integer desc size : 0/0 + total : 4424/4737 (93.3924% DGE) + Cast (DGE/DMA) + 128 partition : 0/0 + power-of-2 partition : 0/0 + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/0 + CopyMode + CCE : 29 + Transpose : 896 + Replicate : 0 + Dynamic (DGE/DMA) + scalar : 1/1 (100% DGE) + vector : 904/904 (100% DGE) + Opcode + ReadVarAddr : 0 + IndirectLoad : 0 + IndirectSave : 0 + IndirectSaveAccumulate 
: 0 + DstReduceDGE : 0 +lower_dma metrics end +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: lower_dma finished after 0.146 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18841 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running expand_all_engine +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=4 allocs=5560 blocks=4 instructions=18841 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: expand_all_engine finished after 0.015 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18841 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running alloc_semaphores +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=4 allocs=5560 blocks=4 instructions=18841 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: expand_inst_late finished after 0.035 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19712 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running seq_inst_opt +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=4 allocs=5948 blocks=4 instructions=19712 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [SeqInstOpt]: Removing 72 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [SeqInstOpt]: Removing 65 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: seq_inst_opt finished after 0.006 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 19575 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running lower_sync +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to lower_sync: modules=1 functions=4 allocs=5948 blocks=4 instructions=19575 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: lower_sync finished after 0.021 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21099 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running lower_act +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to lower_act: modules=1 functions=4 allocs=5948 blocks=4 instructions=21099 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: lower_act finished after 0.006 seconds +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21113 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc00) [CoreForkPass]: Running lower_dve +2025-11-04T21:38:55Z INFO 9044 (nc00) [CoreForkPass]: Inputs to lower_dve: modules=1 functions=4 allocs=5948 blocks=4 instructions=21113 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc00/sgLnk) [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen3/dve_info.json +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: alloc_semaphores finished after 0.054 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 433mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18841 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running expand_inst_late +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=4 allocs=5560 blocks=4 instructions=18841 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: expand_inst_late finished after 0.053 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 435mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18988 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running seq_inst_opt +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=4 allocs=5560 blocks=4 instructions=18988 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [SeqInstOpt]: Removing 72 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [SeqInstOpt]: Removing 65 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z INFO 9044 (nc01/sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: seq_inst_opt finished after 0.009 seconds +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 436mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 18851 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:55Z USER 9044 (nc01) [CoreForkPass]: Running lower_sync +2025-11-04T21:38:55Z INFO 9044 (nc01) [CoreForkPass]: Inputs to lower_sync: modules=1 functions=4 allocs=5560 blocks=4 instructions=18851 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: lower_sync finished after 0.030 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 436mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20220 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: Running lower_act +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Inputs to lower_act: modules=1 functions=4 allocs=5560 blocks=4 instructions=20220 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: lower_act finished after 0.012 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 436mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20233 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: Running lower_dve +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Inputs to lower_dve: modules=1 functions=4 allocs=5560 blocks=4 instructions=20233 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen3/dve_info.json +2025-11-04T21:38:56Z USER 9044 (nc00) [CoreForkPass]: lower_dve finished after 0.161 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 437mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21113 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc00) [CoreForkPass]: Running lower_ap +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: Inputs to lower_ap: modules=1 functions=4 allocs=5948 blocks=4 instructions=21113 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc00) [CoreForkPass]: lower_ap finished after 0.011 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 437mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21113 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc00) [CoreForkPass]: Running coloring_allocator_reg +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=4 allocs=5948 blocks=4 instructions=21113 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:56Z INFO 9044 []: find first defs for local reg +2025-11-04T21:38:56Z INFO 9044 []: find first defs for global reg +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:56Z INFO 9044 
(nc00/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: size = 3 +2025-11-04T21:38:56Z INFO 9044 []: find first defs for local reg +2025-11-04T21:38:56Z INFO 9044 []: find first defs for global reg +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: lo = 3 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: total = 3 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:56Z INFO 9044 
(nc00/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:56Z INFO 9044 []: find first defs for local reg +2025-11-04T21:38:56Z INFO 9044 []: find first defs for global reg +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: lower_dve finished after 0.179 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 438mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20233 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: Running lower_ap +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Inputs to lower_ap: modules=1 functions=4 allocs=5560 blocks=4 instructions=20233 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: lower_ap finished after 0.010 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 440mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20233 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: Running coloring_allocator_reg +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=4 allocs=5560 blocks=4 instructions=20233 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:56Z INFO 9044 []: find first defs for local reg +2025-11-04T21:38:56Z INFO 9044 []: find first defs for global reg 
+2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: simplify 
+2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:56Z USER 9044 (nc00) [CoreForkPass]: coloring_allocator_reg finished after 0.168 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: curr_vmrss: 440mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:56Z INFO 9044 (nc00) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21113 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: size = 3 +2025-11-04T21:38:56Z INFO 9044 []: find first defs for local reg +2025-11-04T21:38:56Z INFO 9044 []: find first defs for global reg +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: lo = 3 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: total = 3 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: allocating REG +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: main loop iteration 1 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: renumber registers +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: size = 4 +2025-11-04T21:38:56Z INFO 9044 []: find first defs for 
local reg +2025-11-04T21:38:56Z INFO 9044 []: find first defs for global reg +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: live range analysis +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: find costs +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: simplify interference graph +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: initialize low and high +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: lo = 4 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: hi = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: inf = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: total = 4 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: simplify +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: new candidates = 0 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: select ranges +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: no more spills +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-11-04T21:38:56Z USER 9044 (nc01) [CoreForkPass]: coloring_allocator_reg finished after 0.093 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01) [CoreForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20233 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [CoreForkPass]: Compilation status: Total modules: 2, Passed: 2, Failed: 0 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: nc_parallel_pass finished after 0.749 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running vnc_remote_addr_map +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to vnc_remote_addr_map: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: vnc_remote_addr_map finished after 0.002 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Output has 2 module(s), 8 function(s), 11508 memory location(s), 8 block(s), and 41346 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running vnc_link +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to vnc_link: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 [VncLink]: Found 0 remote updates +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: vnc_link finished after 0.001 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Output has 2 module(s), 8 function(s), 11508 memory location(s), 8 block(s), and 41346 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01/sgLnk) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [ModuleForkPass]: Running birverifier +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=4 allocs=5560 blocks=4 instructions=20233 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=4 allocs=5948 blocks=4 instructions=21113 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [ModuleForkPass]: birverifier finished after 0.095 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21113 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc01/sgLnk) [ModuleForkPass]: birverifier finished after 0.141 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ModuleForkPass]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20233 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 2, Passed: 2, Failed: 0 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.145 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running subgraph_parallel_pass +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-11-04T21:38:56Z INFO 9044 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.018 seconds +2025-11-04T21:38:56Z INFO 9044 (sg00) [SubgraphForkPass]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (sg00) [SubgraphForkPass]: Output has 2 module(s), 8 function(s), 11508 memory location(s), 8 block(s), and 41346 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [SubgraphForkPass]: Compilation status: Total subgraphs: 1, Passed: 1, Failed: 0 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: subgraph_parallel_pass finished after 0.026 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 441mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running mod_parallel_pass +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to mod_parallel_pass: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [ModuleForkPass]: Running codegen +2025-11-04T21:38:56Z USER 9044 (nc01/sgLnk) [ModuleForkPass]: Running codegen +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ModuleForkPass]: Inputs to codegen: modules=1 functions=4 allocs=5948 blocks=4 instructions=21113 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ModuleForkPass]: Inputs to codegen: modules=1 functions=4 allocs=5560 blocks=4 instructions=20233 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: Total un-allocated DRAM tensors by kind: +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: Total un-allocated DRAM tensors by kind: +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: +┌────────────────┬─────────────┐ +│ TensorKind │ Size (GB) │ +├────────────────┼─────────────┤ +│ ExternalInput │ 1.89233 │ +│ ExternalOutput │ 1.75 │ +│ Const │ 0.000533108 │ +└────────────────┴─────────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: +┌────────────────┬─────────────┐ +│ TensorKind │ Size (GB) │ +├────────────────┼─────────────┤ +│ ExternalInput │ 1.89233 │ +│ ExternalOutput │ 1.75 │ +│ Const │ 0.000535022 │ +└────────────────┴─────────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: Instruction Stats: +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: 
+┌─────────────────────┬───────┐ +│ Opcode │ Count │ +├─────────────────────┼───────┤ +│ MATMUL │ 14299 │ +│ LDWEIGHTS │ 14281 │ +│ EVENT_SEMAPHORE │ 1369 │ +│ UNKNOWN(0xd4) │ 1069 │ +│ CAST │ 770 │ +│ COPY │ 740 │ +│ ACTIVATE │ 515 │ +│ TENSOR_TENSOR │ 463 │ +│ PSEUDO_DMA_TRIGGER │ 389 │ +│ UNKNOWN(0xd3) │ 145 │ +│ TENSOR_SCALAR_ADDR │ 145 │ +│ UNKNOWN(0x9a) │ 96 │ +│ UNKNOWN(0x9b) │ 96 │ +│ MEMSET │ 93 │ +│ UNKNOWN(0xda) │ 80 │ +│ UNKNOWN(0x92) │ 72 │ +│ TENSOR_REDUCE │ 69 │ +│ RECIPROCAL │ 65 │ +│ UNKNOWN(0x24) │ 64 │ +│ UNKNOWN(0xd8) │ 50 │ +│ TENSOR_SCALAR │ 38 │ +│ PSEUDO_BRANCH_LABEL │ 20 │ +│ LOAD_MASK_SELECT │ 16 │ +│ STREAM_SHUFFLE │ 16 │ +│ UNKNOWN(0xd2) │ 15 │ +│ ACT_TABLE_LOAD │ 13 │ +│ PSEUDO_DMA_REARM │ 7 │ +│ UNKNOWN(0xd9) │ 7 │ +│ UNKNOWN(0xcf) │ 7 │ +│ MOVE │ 7 │ +│ UNKNOWN(0xe8) │ 5 │ +│ ALU_OP │ 2 │ +│ IOTA │ 2 │ +│ PSEUDO_TENSOR_LOAD │ 1 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: +┌────────────┬───────┐ +│ Engine │ Count │ +├────────────┼───────┤ +│ Unassigned │ 0 │ +│ GPSIMD │ 1807 │ +│ Scalar │ 2995 │ +│ Tensor │ 28810 │ +│ SyncDMA │ 0 │ +│ Vector │ 1162 │ +│ Sync │ 272 │ +│ All │ 0 │ +└────────────┴───────┘ + +2025-11-04T21:38:56Z USER 9044 (nc01/sgLnk) [Codegen]: isa_gen finished after 0.306 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: Number of DMA descriptors on each queue instance: +┌───────────────────────────┬────────────────┐ +│ Queue Instance │ RT Descriptors │ +├───────────────────────────┼────────────────┤ +│ qActSpillReload0_defId_2 │ 596 │ +│ qDVESpillReload0_defId_2 │ 2 │ +│ qPoolSpillReload0_defId_0 │ 49152 │ +│ qPoolSpillReload0_defId_1 │ 49152 │ +│ qPoolSpillReload0_defId_2 │ 7 │ +│ qSPIO0 │ 43092 │ +│ qSPSpillReload0_defId_0 │ 2 │ +│ qSPSpillReload0_defId_2 │ 4110 │ +└───────────────────────────┴────────────────┘ + +Total descriptors: 146113 (0.00217725 GB) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: Number of DMA engines used by each queue: 
+┌───────────────────┬──────────────────────┐ +│ Queue │ DMA Engines │ +├───────────────────┼──────────────────────┤ +│ qSPDynamicHW │ 16 │ +│ qSPIO0 │ 16 │ +│ qSPSpillReload0 │ 16 │ +│ qPoolDynamic │ 16 │ +│ qActDynamicHW │ 16 │ +│ qPoolSpillReload0 │ 16 │ +│ qActSpillReload0 │ 16 │ +│ qDVESpillReload0 │ 16 │ +├───────────────────┼──────────────────────┤ +│ TOTAL │ 128 (must be <= 176) │ +└───────────────────┴──────────────────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: Tensors with largest descriptor count: +┌──────────────────────────────────────────────────────┬───────────────┬──────────┬──────────────────┐ +│ Tensor Name │ Kind │ Src Type │ Descriptor Count │ +├──────────────────────────────────────────────────────┼───────────────┼──────────┼──────────────────┤ +│ I-2766-0_b2_grp_7_s0_tile0_exp_tp_sbuf_sg0000 │ Internal │ bfloat16 │ 8 │ +│ I-2766-0_grp_7_sec_0_mhlo_exponential_6_b2_i0_sg0000 │ Internal │ bfloat16 │ 8 │ +│ I-2433-0_b0_grp_5_s0_tile0_exp_tp_sbuf_sg0001 │ Internal │ bfloat16 │ 8 │ +│ I-2766-0_grp_6_sec_0_mhlo_exponential_6_b2_i0_sg0000 │ Internal │ bfloat16 │ 8 │ +│ I-2766-0_grp_5_sec_0_mhlo_exponential_6_b0_i0_sg0000 │ Internal │ bfloat16 │ 8 │ +│ I-2766-0_b2_grp_4_s0_tile0_exp_tp_sbuf_sg0000 │ Internal │ bfloat16 │ 8 │ +│ add.4_sg0001 │ Internal │ bfloat16 │ 27 │ +│ compare.2.1758_sg0001 │ Internal │ int32 │ 27 │ +│ input2 │ ExternalInput │ int32 │ 28 │ +│ convert.55_sg0002 │ Internal │ float32 │ 297 │ +└──────────────────────────────────────────────────────┴───────────────┴──────────┴──────────────────┘ + +2025-11-04T21:38:56Z USER 9044 (nc01/sgLnk) [Codegen]: dma_desc_gen finished after 0.027 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [Codegen]: Generating debug info +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: Instruction Stats: +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: +┌─────────────────────┬───────┐ +│ Opcode │ Count │ +├─────────────────────┼───────┤ +│ MATMUL │ 14547 │ +│ LDWEIGHTS 
│ 14529 │ +│ EVENT_SEMAPHORE │ 1524 │ +│ UNKNOWN(0xd4) │ 1076 │ +│ COPY │ 868 │ +│ CAST │ 770 │ +│ ACTIVATE │ 522 │ +│ TENSOR_TENSOR │ 465 │ +│ PSEUDO_DMA_TRIGGER │ 427 │ +│ GATHER │ 291 │ +│ POOL_BUFFER_LOAD │ 291 │ +│ TENSOR_SCALAR_ADDR │ 145 │ +│ UNKNOWN(0xd3) │ 145 │ +│ DVE_READ_INDICES │ 128 │ +│ MAX8 │ 128 │ +│ MATCH_VALUE_LOAD │ 128 │ +│ MATCH_REPLACE8 │ 128 │ +│ MEMSET │ 107 │ +│ UNKNOWN(0x9a) │ 96 │ +│ UNKNOWN(0x9b) │ 96 │ +│ UNKNOWN(0xda) │ 80 │ +│ TENSOR_REDUCE │ 74 │ +│ UNKNOWN(0x92) │ 72 │ +│ RECIPROCAL │ 67 │ +│ UNKNOWN(0x24) │ 64 │ +│ UNKNOWN(0xd8) │ 50 │ +│ TENSOR_SCALAR │ 40 │ +│ PSEUDO_BRANCH_LABEL │ 20 │ +│ STREAM_SHUFFLE │ 20 │ +│ LOAD_MASK_SELECT │ 20 │ +│ UNKNOWN(0xd2) │ 15 │ +│ ACT_TABLE_LOAD │ 14 │ +│ MOVE │ 7 │ +│ PSEUDO_DMA_REARM │ 7 │ +│ UNKNOWN(0xcf) │ 7 │ +│ UNKNOWN(0xd9) │ 7 │ +│ UNKNOWN(0xe8) │ 5 │ +│ UNKNOWN(0xe5) │ 2 │ +│ ALU_OP │ 2 │ +│ IOTA │ 2 │ +│ PSEUDO_TENSOR_LOAD │ 1 │ +│ TENSOR_SCALAR │ 1 │ +│ RNG │ 1 │ +└─────────────────────┴───────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: +┌────────────┬───────┐ +│ Engine │ Count │ +├────────────┼───────┤ +│ Unassigned │ 0 │ +│ GPSIMD │ 2450 │ +│ Scalar │ 3139 │ +│ Tensor │ 29309 │ +│ SyncDMA │ 0 │ +│ Vector │ 1802 │ +│ Sync │ 309 │ +│ All │ 0 │ +└────────────┴───────┘ + +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [Codegen]: isa_gen finished after 0.373 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: Number of DMA descriptors on each queue instance: +┌───────────────────────────┬────────────────┐ +│ Queue Instance │ RT Descriptors │ +├───────────────────────────┼────────────────┤ +│ qActSpillReload0_defId_2 │ 602 │ +│ qDVESpillReload0_defId_2 │ 142 │ +│ qPoolSpillReload0_defId_0 │ 49152 │ +│ qPoolSpillReload0_defId_1 │ 49152 │ +│ qPoolSpillReload0_defId_2 │ 207 │ +│ qSPIO0 │ 43094 │ +│ qSPPIOParam0 │ 56 │ +│ qSPSpillReload0_defId_0 │ 2 │ +│ qSPSpillReload0_defId_2 │ 4454 │ +└───────────────────────────┴────────────────┘ + +Total descriptors: 146861 
(0.0021884 GB) +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: Number of DMA engines used by each queue: +┌───────────────────┬──────────────────────┐ +│ Queue │ DMA Engines │ +├───────────────────┼──────────────────────┤ +│ qSPDynamicHW │ 16 │ +│ qSPIO0 │ 16 │ +│ qSPSpillReload0 │ 16 │ +│ qPoolDynamic │ 16 │ +│ qActDynamicHW │ 16 │ +│ qPoolSpillReload0 │ 16 │ +│ qDVESpillReload0 │ 16 │ +│ qActSpillReload0 │ 16 │ +│ qSPPIOParam0 │ 16 │ +├───────────────────┼──────────────────────┤ +│ TOTAL │ 144 (must be <= 176) │ +└───────────────────┴──────────────────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: Tensors with largest descriptor count: +┌──────────────────────────────────────────────────────┬───────────────┬──────────┬──────────────────┐ +│ Tensor Name │ Kind │ Src Type │ Descriptor Count │ +├──────────────────────────────────────────────────────┼───────────────┼──────────┼──────────────────┤ +│ I-2433-0_b3_grp_6_s0_tile0_exp_tp_sbuf_sg0001 │ Internal │ bfloat16 │ 8 │ +│ I-2766-0_b0_grp_6_s0_tile0_exp_tp_sbuf_sg0000 │ Internal │ bfloat16 │ 8 │ +│ I-2433-0_b3_grp_7_s0_tile0_exp_tp_sbuf_sg0001 │ Internal │ bfloat16 │ 8 │ +│ I-2766-0_grp_5_sec_0_mhlo_exponential_6_b0_i0_sg0000 │ Internal │ bfloat16 │ 8 │ +│ I-2433-0_grp_4_sec_0_mhlo_exponential_6_b2_i0_sg0001 │ Internal │ bfloat16 │ 8 │ +│ all-reduce.465.2434_sg0001 │ Internal │ bfloat16 │ 27 │ +│ compare.2.1758_sg0001 │ Internal │ int32 │ 27 │ +│ add.4_sg0001 │ Internal │ bfloat16 │ 27 │ +│ input2 │ ExternalInput │ int32 │ 28 │ +│ convert.55_sg0002 │ Internal │ float32 │ 298 │ +└──────────────────────────────────────────────────────┴───────────────┴──────────┴──────────────────┘ + +2025-11-04T21:38:56Z USER 9044 (nc01/sgLnk) [Codegen]: debug_info_gen finished after 0.066 seconds +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [Codegen]: dma_desc_gen finished after 0.026 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [Codegen]: Generating debug info +2025-11-04T21:38:56Z USER 9044
(nc01/sgLnk) [ModuleForkPass]: codegen finished after 0.425 seconds +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ModuleForkPass]: curr_vmrss: 462mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 5560 memory location(s), 4 block(s), and 20233 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [Codegen]: debug_info_gen finished after 0.049 seconds +2025-11-04T21:38:56Z USER 9044 (nc00/sgLnk) [ModuleForkPass]: codegen finished after 0.461 seconds +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ModuleForkPass]: curr_vmrss: 462mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [ModuleForkPass]: Output has 1 module(s), 4 function(s), 5948 memory location(s), 4 block(s), and 21113 instruction(s). Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [ModuleForkPass]: Compilation status: Total modules: 2, Passed: 2, Failed: 0 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: mod_parallel_pass finished after 0.465 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 462mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running hbm_usage +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to hbm_usage: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [HBMUsage]: +┌───────────────┬───────────┬───────────────────┐ +│ DMA Ring Type │ I/O Size │ Spill/Reload Size │ +├───────────────┼───────────┼───────────────────┤ +│ Copy │ 1.344KB │ 61.312KB │ +│ CCE │ 672.000KB │ 48.000B │ +│ Transpose │ 0.000B │ 1.500MB │ +│ Replicate │ 0.000B │ 0.000B │ +│ Overhead │ 16.000KB │ 110.250KB │ +└───────────────┴───────────┴───────────────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc00/sgLnk) [HBMUsage]: +┌─────────────────────┬───────────┐ +│ DRAM Memory Usage │ Size │ 
+├─────────────────────┼───────────┤ +│ Total: │ 3.689GB │ +│ Model Code │ 2.259MB │ +│ Model Constants │ 561.012KB │ +│ Unallocated Tensors │ 3.642GB │ +│ Allocated Tensors │ 42.508MB │ +│ DMA Ring IO │ 689.344KB │ +│ DMA Ring Spill │ 1.668MB │ +└─────────────────────┴───────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [HBMUsage]: +┌───────────────┬───────────┬───────────────────┐ +│ DMA Ring Type │ I/O Size │ Spill/Reload Size │ +├───────────────┼───────────┼───────────────────┤ +│ Copy │ 1.312KB │ 49.656KB │ +│ CCE │ 672.000KB │ 48.000B │ +│ Transpose │ 0.000B │ 1.500MB │ +│ Replicate │ 0.000B │ 0.000B │ +│ Overhead │ 15.750KB │ 94.500KB │ +└───────────────┴───────────┴───────────────────┘ + +2025-11-04T21:38:56Z INFO 9044 (nc01/sgLnk) [HBMUsage]: +┌─────────────────────┬───────────┐ +│ DRAM Memory Usage │ Size │ +├─────────────────────┼───────────┤ +│ Total: │ 3.673GB │ +│ Model Code │ 2.139MB │ +│ Model Constants │ 559.004KB │ +│ Unallocated Tensors │ 3.642GB │ +│ Allocated Tensors │ 26.000MB │ +│ DMA Ring IO │ 689.062KB │ +│ DMA Ring Spill │ 1.641MB │ +└─────────────────────┴───────────┘ + +2025-11-04T21:38:56Z INFO 9044 [HBMUsage]: Total estimated HBM usage is: 3.719GB +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: hbm_usage finished after 0.004 seconds +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: curr_vmrss: 462mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Output has 2 module(s), 8 function(s), 11508 memory location(s), 8 block(s), and 41346 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z USER 9044 [BackendPassManager]: Running neff_packager +2025-11-04T21:38:56Z INFO 9044 [BackendPassManager]: Inputs to neff_packager: modules=2 functions=8 allocs=11508 blocks=8 instructions=41346 Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.7_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.9-1736_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.3-1653-1738_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.2-1664-1740_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2051_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2038_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.15_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1543-1639_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1554-1641_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1565-1643_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1575-1645_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_identity_1783_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.24_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.25_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.26_CRSM.npy 
+2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.28_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.29_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.27-1134-1355_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_identity_1563_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: Const File de-dup saved 0 KB of memory footprint +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.7_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.9-1736_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.3-1653-1738_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.2-1664-1740_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2051_CRSM.npy +2025-11-04T21:38:56Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_2038_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.15_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1543-1639_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1554-1641_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.12-1565-1643_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_constant.11-1575-1645_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0001_identity_1783_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found 
value_sg0002_constant.26_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.28_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.29_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: FileDeDuper file not found value_sg0002_identity_1563_CRSM.npy +2025-11-04T21:38:57Z INFO 9044 [NeffPackager]: Const File de-dup saved 0 KB of memory footprint +2025-11-04T21:38:57Z WARNING 9044 [NeffFileWriter]: writeKelp missing file /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg/metrics.json +2025-11-04T21:38:57Z WARNING 9044 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.207535.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json +2025-11-04T21:38:57Z INFO 9044 [NeffFileWriter]: Neff will be written to: /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.neff +2025-11-04T21:38:57Z INFO 9044 [NeffFileWriter]: IR signature: 94edbc79dbeab9e50a2627ad929d62b4 for neff artifacts +2025-11-04T21:38:57Z USER 9044 [BackendPassManager]: neff_packager finished after 0.180 seconds +2025-11-04T21:38:57Z INFO 9044 [BackendPassManager]: curr_vmrss: 463mb, ru_maxrss: 627mb (delta=0mb) +2025-11-04T21:38:57Z INFO 9044 [BackendPassManager]: Output has 2 module(s), 8 function(s), 11508 memory location(s), 8 block(s), and 41346 instruction(s). 
Max writers: 299 Max Readers: 5242 +2025-11-04T21:38:57Z INFO 9044 [BackendDriver]: HBM scratchpad usage summary (post-allocation): +┌──────┬───────────┬────────────────────────────────────────────────────────────┬─────────────┐ +│ Core │ Subgraph │ Description │ Value │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ nc00 │ sg00 │ Peak scratchpad usage: local │ 0.000000 GB │ +│ nc00 │ sg00 │ Peak scratchpad usage: local and shared │ 0.021484 GB │ +│ nc00 │ sg00 │ Total size of allocated tensors: local │ 0.000000 GB │ +│ nc00 │ sg00 │ Total size of allocated tensors: shared │ 0.021484 GB │ +│ nc00 │ sg01 │ Peak scratchpad usage: local │ 0.000000 GB │ +│ nc00 │ sg01 │ Peak scratchpad usage: local and shared │ 0.025391 GB │ +│ nc00 │ sg01 │ Total size of allocated tensors: local │ 0.000000 GB │ +│ nc00 │ sg01 │ Total size of allocated tensors: shared │ 0.027344 GB │ +│ nc00 │ sg02 │ Peak scratchpad usage: local │ 0.001953 GB │ +│ nc00 │ sg02 │ Peak scratchpad usage: local and shared │ 0.013981 GB │ +│ nc00 │ sg02 │ Total size of allocated tensors: local │ 0.001980 GB │ +│ nc00 │ sg02 │ Total size of allocated tensors: shared │ 0.015965 GB │ +│ nc00 │ Max │ Peak scratchpad usage: local │ 0.001953 GB │ +│ nc00 │ Max │ Peak scratchpad usage: local and shared │ 0.025391 GB │ +│ nc00 │ Post-link │ Peak scratchpad usage after intermediate tensor allocation │ 0.041515 GB │ +│ nc00 │ Post-link │ Total size of allocated intermediate tensors │ 0.219345 GB │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ nc01 │ sg00 │ Peak scratchpad usage: local │ 0.000000 GB │ +│ nc01 │ sg00 │ Total size of allocated tensors: local │ 0.000000 GB │ +│ nc01 │ sg01 │ Peak scratchpad usage: local │ 0.000000 GB │ +│ nc01 │ sg01 │ Total size of allocated tensors: local │ 0.000000 GB │ +│ nc01 │ sg02 │ Peak scratchpad usage: local │ 0.001953 GB │ +│ nc01 │ sg02 │ Total size of allocated tensors: 
local │ 0.001953 GB │ +│ nc01 │ Max │ Peak scratchpad usage: local │ 0.001953 GB │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ Max │ Max │ Peak scratchpad usage │ 0.041515 GB │ +│ Max │ Max │ Peak scratchpad usage (page-aligned) │ 0.500000 GB │ +└──────┴───────────┴────────────────────────────────────────────────────────────┴─────────────┘ + +2025-11-04T21:38:57Z INFO 9044 [BackendDriver]: Largest tensors at peak scratchpad usage, core=nc00, subgraph=sg00, addr_space=shared (complete data located at nc00/sg00/memory_analysis_after_coloring_allocator_dram_shared_DRAM_Shared_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ all_gather.1 │ bfloat16 │ 1 │ 4.000000 MB │ +│ reshape.16 │ bfloat16 │ 1 │ 2.000000 MB │ +│ reshape.24 │ bfloat16 │ 1 │ 2.000000 MB │ +│ reshape.29 │ bfloat16 │ 1 │ 2.000000 MB │ +│ transpose.1 │ bfloat16 │ 1 │ 2.000000 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:57Z INFO 9044 [BackendDriver]: Largest tensors at peak scratchpad usage, core=nc00, subgraph=sg02, addr_space=local (complete data located at nc00/sg02/memory_analysis_after_coloring_allocator_dram_shared_DRAM_Local_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬───────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼───────┼───────────────┼─────────────┤ +│ reshape.104 │ int32 │ 1 │ 0.000008 MB │ +└────────────────────────────────────────────────────────────────┴───────┴───────────────┴─────────────┘ + +2025-11-04T21:38:57Z INFO 9044 [BackendDriver]: Largest 
intermediate tensors at peak scratchpad usage, core=nc00 (complete data located at nc00//sgLnk/sg00/memory_analysis_after_coloring_allocator_dram_post_lnk_DRAM_Shared_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ intermediate0 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate3 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate5 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate6 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate1 │ bfloat16 │ 1 │ 0.250000 MB │ +│ intermediate2 │ bfloat16 │ 1 │ 0.250000 MB │ +│ intermediate4 │ bfloat16 │ 1 │ 0.003906 MB │ +│ intermediate7 │ bfloat16 │ 1 │ 0.003906 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:57Z INFO 9044 [BackendDriver]: Largest intermediate tensors at peak scratchpad usage, core=nc01 (complete data located at nc01//sgLnk/sg00/memory_analysis_after_coloring_allocator_dram_post_lnk_DRAM_Shared_hwm_allocations.csv): +┌────────────────────────────────────────────────────────────────┬──────────┬───────────────┬─────────────┐ +│ Tensor Name │ Type │ # Sub-tensors │ Total Size │ +├────────────────────────────────────────────────────────────────┼──────────┼───────────────┼─────────────┤ +│ intermediate0 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate3 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate5 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate6 │ bfloat16 │ 1 │ 4.000000 MB │ +│ intermediate1 │ bfloat16 │ 1 │ 0.250000 MB │ +│ intermediate2 │ bfloat16 │ 1 │ 0.250000 MB │ +│ intermediate4 │ bfloat16 │ 1 │ 0.003906 MB │ +│ intermediate7 │ bfloat16 │ 1 │ 0.003906 MB │ +└────────────────────────────────────────────────────────────────┴──────────┴───────────────┴─────────────┘ + +2025-11-04T21:38:57Z INFO 9044 
[BackendDriver]: Backend completed successfully, tearing down. +2025-11-04T21:38:57Z INFO 8594 [job.WalrusDriver.0]: VNCBackend: completed successfully. +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 +2025-11-04T21:38:57Z INFO 8594 [job.BIRLinker.0]: Replay this job by calling: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/bin/neuronx-cc compile --framework XLA --state '{"model": ["/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "cached_wavegraph": "walrus_bir.out.json", "state_dir": "/home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg/nc00/sg00", "state_id": "nc00/sg00"}' --pipeline BIRLinker +2025-11-04T21:38:57Z INFO 8594 [job.BIRLinker.0]: BIRLinker cwd: /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg +2025-11-04T21:38:57Z INFO 8594 [job.BIRLinker.0]: Linking already done. 
+2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.Kelper.0 +2025-11-04T21:38:57Z INFO 8594 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.Kelper.0 +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 +2025-11-04T21:38:57Z INFO 8594 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 +2025-11-04T21:38:57Z INFO 8594 [job.NeffWrapper.0]: Processing input #0 +2025-11-04T21:38:57Z INFO 8594 [job.NeffWrapper.0]: Start NeffWrapper +2025-11-04T21:38:57Z INFO 8594 [job.NeffWrapper.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_8_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.hlo_module.pb --neff /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/model.MODULE_be035899334776123ed5+d208bdce.neff --io_transposes /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg/io_transposes.json --output /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/wrapped_neff.hlo --netlist /home/ubuntu/qwen3/qwen3-1.7B-TP2-BS8-SEQ4096/context_encoding_model/_tp0_bk3/neuronxcc-miwah3fg/hlo_netlist.json +2025-11-04T21:38:57Z INFO 8594 [job.NeffWrapper.0]: There are no io transposes nor zero-sized parameters. Output will not be produced. +Hlo neff wrapper finished successfully. 
Have a wonderful day :D + +2025-11-04T21:38:57Z INFO 8594 [job.NeffWrapper.0]: Job #0 finished +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Finished pipeline Pipeline +2025-11-04T21:38:57Z INFO 8594 [pipeline.Pipeline.0]: Job #0 finished +2025-11-04T21:38:57Z INFO 8576 [root]: Subcommand returned with exitcode=0