Lmod Warning:
-------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: curl/8.4.0
(required by: htslib/1.16)
-------------------------------------------------------------------------------

The following have been reloaded with a version change:
  1) curl/8.4.0 => curl/8.17.0     2) openssl/3.0.7 => openssl/3.6.0

2026-05-28 11:53:58.994557: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 11:54:15.024940: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2026-05-28 11:54:15.027503: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2026-05-28 11:54:15.506410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:db:00.0 name: NVIDIA H200 computeCapability: 9.0
coreClock: 1.98GHz coreCount: 132 deviceMemorySize: 139.72GiB deviceMemoryBandwidth: 4.47TiB/s
2026-05-28 11:54:15.506484: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 11:54:16.043521: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 11:54:16.043632: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 11:54:16.462881: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2026-05-28 11:54:16.980538: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2026-05-28 11:54:17.528726: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2026-05-28 11:54:17.750690: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2026-05-28 11:54:17.916616: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 11:54:17.921886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2026-05-28 11:54:17.922286: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-05-28 11:54:17.922431: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2026-05-28 11:54:17.926075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:db:00.0 name: NVIDIA H200 computeCapability: 9.0
coreClock: 1.98GHz coreCount: 132 deviceMemorySize: 139.72GiB deviceMemoryBandwidth: 4.47TiB/s
2026-05-28 11:54:17.926117: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 11:54:17.926134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 11:54:17.926144: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 11:54:17.926152: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2026-05-28 11:54:17.926161: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2026-05-28 11:54:17.926169: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2026-05-28 11:54:17.926178: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2026-05-28 11:54:17.926202: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 11:54:17.932461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2026-05-28 11:54:17.932490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:00:33.991983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2026-05-28 12:00:33.992079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2026-05-28 12:00:33.992091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2026-05-28 12:00:33.996405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 133852 MB memory) -> physical GPU (device: 0, name: NVIDIA H200, pci bus id: 0000:db:00.0, compute capability: 9.0)
2026-05-28 12:00:37.792796: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2026-05-28 12:00:37.793283: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2800000000 Hz
2026-05-28 12:00:40.838340: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:03:19.212527: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:03:19.217605: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 12:16:04.213585: W tensorflow/stream_executor/gpu/asm_compiler.cc:63] Running ptxas --version returned 256
2026-05-28 12:16:04.431494: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2026-05-28 12:16:34.339727: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py:595: UserWarning: Input dict contained keys ['coordinates', 'jitters', 'index', 'status', 'rev_comp'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py:595: UserWarning: Input dict contained keys ['coordinates'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
2026-05-28 12:24:05.114329: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2026-05-28 12:24:09.325171: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:24:15.519111: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2026-05-28 12:24:15.520068: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2026-05-28 12:24:16.017946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:db:00.0 name: NVIDIA H200 computeCapability: 9.0
coreClock: 1.98GHz coreCount: 132 deviceMemorySize: 139.72GiB deviceMemoryBandwidth: 4.47TiB/s
2026-05-28 12:24:16.018025: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:24:16.022508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:24:16.022603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:24:16.024635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2026-05-28 12:24:16.025734: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2026-05-28 12:24:16.028887: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2026-05-28 12:24:16.030370: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2026-05-28 12:24:16.031493: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 12:24:16.037618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2026-05-28 12:24:16.037929: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-05-28 12:24:16.038023: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2026-05-28 12:24:16.040631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:db:00.0 name: NVIDIA H200 computeCapability: 9.0
coreClock: 1.98GHz coreCount: 132 deviceMemorySize: 139.72GiB deviceMemoryBandwidth: 4.47TiB/s
2026-05-28 12:24:16.040649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:24:16.040661: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:24:16.040670: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:24:16.040679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2026-05-28 12:24:16.040688: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2026-05-28 12:24:16.040697: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2026-05-28 12:24:16.040706: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2026-05-28 12:24:16.040715: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 12:24:16.047478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2026-05-28 12:24:16.047499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:30:14.277443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2026-05-28 12:30:14.277543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2026-05-28 12:30:14.277553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2026-05-28 12:30:14.282577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 133852 MB memory) -> physical GPU (device: 0, name: NVIDIA H200, pci bus id: 0000:db:00.0, compute capability: 9.0)
/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py:1059: UserWarning: bpnet.model.arch is not loaded, but a Lambda layer uses it. It may cause errors.
  , UserWarning)
batch:   0%|          | 0/32 [00:00<?, ?it/s]
2026-05-28 12:30:16.991512: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2026-05-28 12:30:16.991980: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2800000000 Hz
2026-05-28 12:30:17.440582: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:33:10.544066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:33:10.545427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 12:45:21.572223: W tensorflow/stream_executor/gpu/asm_compiler.cc:63] Running ptxas --version returned 256
2026-05-28 12:45:21.793147: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py:595: UserWarning: Input dict contained keys ['coordinates', 'true_profiles', 'true_logcounts', 'rev_comp'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
batch:   3%|▎         | 1/32 [15:38<8:04:49, 938.36s/it]
batch: 100%|██████████| 32/32 [15:45<00:00, 29.53s/it]
100%|██████████| 1924/1924 [00:01<00:00, 1204.26it/s]
100%|██████████| 1924/1924 [00:01<00:00, 1205.61it/s]
2026-05-28 12:46:15.406299: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:46:22.127267: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2026-05-28 12:46:22.134304: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2026-05-28 12:46:22.609224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:db:00.0 name: NVIDIA H200 computeCapability: 9.0
coreClock: 1.98GHz coreCount: 132 deviceMemorySize: 139.72GiB deviceMemoryBandwidth: 4.47TiB/s
2026-05-28 12:46:22.609323: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:46:22.613534: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:46:22.613644: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:46:22.615685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2026-05-28 12:46:22.616923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2026-05-28 12:46:22.620132: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2026-05-28 12:46:22.621606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2026-05-28 12:46:22.622661: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 12:46:22.628682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2026-05-28 12:46:22.629128: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-05-28 12:46:22.629213: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2026-05-28 12:46:22.631862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:db:00.0 name: NVIDIA H200 computeCapability: 9.0
coreClock: 1.98GHz coreCount: 132 deviceMemorySize: 139.72GiB deviceMemoryBandwidth: 4.47TiB/s
2026-05-28 12:46:22.631907: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:46:22.631923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:46:22.631935: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:46:22.631947: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2026-05-28 12:46:22.631958: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2026-05-28 12:46:22.631969: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2026-05-28 12:46:22.631981: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2026-05-28 12:46:22.631992: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 12:46:22.638076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2026-05-28 12:46:22.638108: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2026-05-28 12:52:23.115781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2026-05-28 12:52:23.115876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2026-05-28 12:52:23.115886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2026-05-28 12:52:23.121156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 133852 MB memory) -> physical GPU (device: 0, name: NVIDIA H200, pci bus id: 0000:db:00.0, compute capability: 9.0)
2026-05-28 12:52:23.159919: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2026-05-28 12:52:23.174941: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2800000000 Hz
2026-05-28 12:52:26.681812: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2026-05-28 12:55:20.596675: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2026-05-28 12:55:20.602020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2026-05-28 13:07:25.985719: W tensorflow/stream_executor/gpu/asm_compiler.cc:63] Running ptxas --version returned 256
2026-05-28 13:07:26.203469: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2026-05-28 13:07:28.648392: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:28.648579: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:28.648759: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:31.290379: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:33.915162: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:36.552295: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:39.204208: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:42.724769: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:46.237946: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:49.759406: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2026-05-28 13:07:53.283106: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py:1059: UserWarning: bpnet.model.arch is not loaded, but a Lambda layer uses it. It may cause errors.
  , UserWarning)
Traceback (most recent call last):
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[{{node gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs}}]]
	 [[gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Sub_grad/Reshape/_65]]
  (1) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[{{node gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/users/shouvikm/miniconda3/envs/bpnet/bin/bpnet-shap", line 8, in <module>
    sys.exit(shap_scores_main())
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/bpnet/cli/shap_scores.py", line 448, in shap_scores_main
    shap_scores(args, shap_scores_dir)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/bpnet/cli/shap_scores.py", line 326, in shap_scores
    counts_shap_inputs, progress_message=100)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 294, in shap_values
    sample_phis = self.run(self.phi_symbolic(feature_ind), self.model_inputs, joint_input)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 322, in run
    return self.session.run(out, feed_dict)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 968, in run
    run_metadata_ptr)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1369, in _do_run
    run_metadata)
  File "/home/users/shouvikm/miniconda3/envs/bpnet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[node gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs (defined at /lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py:494) ]]
	 [[gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Sub_grad/Reshape/_65]]
  (1) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[node gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs (defined at /lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py:494) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs:
 gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/sub (defined at /lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py:489)

Input Source operations connected to node gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs:
 gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/sub (defined at /lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py:489)

Original stack trace for 'gradients/main_logsumexp_counts_bias_0/ReduceLogSumExp/Log_grad/Abs':
  File "/bin/bpnet-shap", line 8, in <module>
    sys.exit(shap_scores_main())
  File "/lib/python3.7/site-packages/bpnet/cli/shap_scores.py", line 448, in shap_scores_main
    shap_scores(args, shap_scores_dir)
  File "/lib/python3.7/site-packages/bpnet/cli/shap_scores.py", line 326, in shap_scores
    counts_shap_inputs, progress_message=100)
  File "/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 294, in shap_values
    sample_phis = self.run(self.phi_symbolic(feature_ind), self.model_inputs, joint_input)
  File "/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 229, in phi_symbolic
    self.phi_symbolics[i] = tf.gradients(out, self.model_inputs)
  File "/lib/python3.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 318, in gradients_v2
    unconnected_gradients)
  File "/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 684, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 340, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 684, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 327, in custom_grad
    return op_handlers[op.type](self, op, *grads)
  File "/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 477, in handler
    return nonlinearity_1d_handler(input_ind, explainer, op, *grads)
  File "/lib/python3.7/site-packages/shap/explainers/deep/deep_tf.py", line 494, in nonlinearity_1d_handler
    tf.tile(tf.abs(delta_in0), dup0) < 1e-6,
  File "/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 401, in abs
    return gen_math_ops._abs(x, name=name)
  File "/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 56, in _abs
    "Abs", x=x, name=name)
  File "/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3536, in _create_op_internal
    op_def=op_def)
  File "/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1990, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'main_logsumexp_counts_bias_0/ReduceLogSumExp/Log', defined at:
  File "/bin/bpnet-shap", line 8, in <module>
    sys.exit(shap_scores_main())
[elided 0 identical lines from previous traceback]
  File "/lib/python3.7/site-packages/bpnet/cli/shap_scores.py", line 448, in shap_scores_main
    shap_scores(args, shap_scores_dir)
  File "/lib/python3.7/site-packages/bpnet/cli/shap_scores.py", line 96, in shap_scores
    model = load_model(args.model, compile=False)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/saving/save.py", line 212, in load_model
    return saved_model_load.load(filepath, compile, options)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 138, in load
    keras_loader.load_layers(compile=compile)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 376, in load_layers
    node_metadata.metadata)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 417, in _load_layer
    obj, setter = self._revive_from_config(identifier, metadata, node_id)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 435, in _revive_from_config
    self._revive_layer_from_config(metadata, node_id))
  File "/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 495, in _revive_layer_from_config
    generic_utils.serialize_keras_class_and_config(class_name, config))
  File "/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 177, in deserialize
    printable_module_name='layer')
  File "/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 358, in deserialize_keras_object
    list(custom_objects.items())))
  File "/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2262, in from_config
    config, custom_objects=custom_objects)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 669, in from_config
    config, custom_objects)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 1285, in reconstruct_from_config
    process_node(layer, node_data)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 1233, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 786, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 917, in call
    result = self.function(inputs, **kwargs)
  File "/lib/python3.7/site-packages/bpnet/model/arch.py", line 446, in <lambda>
    lambda x: tf.math.reduce_logsumexp(x, axis=-1, keepdims=True),

