Centered instance model scales input image (not cropped image) leading to error #872
Comments
Can we just use the output stride of the centroid model to do a limited version of input scaling on the centered instance model (finite set of feasible stride values)? -> This would couple the centroid and centered instance models, though, which we might not want.

Problem Analysis

It seems that the…

Relevant Code
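The "finite set of feasible stride values" idea above can be sketched as a small filter: a candidate input scale is only usable if the scaled crop size remains divisible by the model's maximum stride. This is an illustrative sketch, not SLEAP's actual API; `feasible_scales` and its candidate list are hypothetical.

```python
# Hypothetical sketch (not SLEAP's API): enumerate the limited set of input
# scales that keep a crop's scaled dimensions divisible by the model's max
# stride, which is what ties the centered instance model to the centroid model.

def feasible_scales(crop_size: int, max_stride: int,
                    candidates=(1.0, 0.75, 0.5, 0.25)):
    """Return candidate scales whose scaled crop size is divisible by max_stride."""
    out = []
    for s in candidates:
        scaled = int(round(crop_size * s))
        if scaled % max_stride == 0:
            out.append(s)
    return out

print(feasible_scales(crop_size=192, max_stride=32))  # [1.0, 0.5]
```

With a 192 px crop and a max stride of 32, only scales of 1.0 and 0.5 survive (0.75 gives 144 px and 0.25 gives 48 px, neither divisible by 32), which illustrates why this approach would restrict, and couple, the two models' configurations.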
Follow-up problems
Traceback
I think I am now experiencing this issue too; however, I am not sure why it is coming up for me now during training when I have been training the same model for weeks. I have tried reinstalling SLEAP v1.2.9 and paying for Google Colab's higher compute capability (per discussion #871). Below is the dialogue, which appears similar to what @talmo posted, but the TF error comes up at the
...until I force-stopped the process.
Hi @amblypatty, Originally, we thought this error might be caused by plotting the visualizations (confidence maps overlaid on instances) during training; however, after tracking down the error, we found that the real problem is that our pipeline for the top-down model is not set up to handle input scaling on the second model (the centered instance model). It seems your… Unless I overlooked something, the logs seem to indicate that training has completed the 2nd epoch and is about to head into the 3rd epoch? Some clarifying questions: Are the logs truncated? What behavior are you experiencing? Thanks,
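For readers new to the thread, the "second model" above refers to the second stage of the top-down pipeline: the centroid model finds animals on a (possibly scaled-down) full frame, crops are taken from the full-resolution frame around each centroid, and the centered instance model then locates body parts within each crop. A toy sketch of that cropping step (all names here are illustrative stand-ins, not SLEAP's API):

```python
import numpy as np

# Toy sketch of the top-down pipeline's crop step: after the centroid model
# finds an animal and its centroid is mapped back to full-resolution
# coordinates, a fixed-size crop of the FULL frame is handed to the centered
# instance model. `crop_around` is a hypothetical helper for illustration.

def crop_around(frame: np.ndarray, center, size: int) -> np.ndarray:
    """Take a size x size crop of the full-resolution frame around a centroid."""
    y, x = int(center[0]), int(center[1])
    half = size // 2
    return frame[y - half:y + half, x - half:x + half]

frame = np.zeros((1024, 1280), dtype=np.uint8)   # full-resolution frame
centroid = (512, 640)                            # centroid in full-res coords
crop = crop_around(frame, centroid, 192)
print(crop.shape)  # (192, 192)
```

Because the crop is already small, the centered instance model normally expects an input scaling of 1.0 on it, which is why scaling that second stage is the problematic case in this issue.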
Hi @roomrys, Indeed, I terminated the process after seeing the
The previous result after training the top-down model with this error in each epoch (though the warning shows up earlier) was a predictions.pkg.slp file with 'mean scores' but no instances on the suggested frames when I run:
Where I get a complete prediction (with PredictCost() errors):
...and then merge the predictions in the SLEAP GUI. Additionally, there are no metrics for the centered_instance model. The image above shows, in the background, a suggested frame (313) that has a mean score but no predicted instance on the frame. The foreground shows the evaluation metrics window, where the most recent centered_instance model has empty cells for the evaluation metrics, while the previous centered_instance model shows the metrics (as expected). Thanks for your help,
I am still experiencing this issue, even in the newest 1.3.0a0 release. I have tried redoing this with a few different hyperparameters to try to recover the previously expected behavior, but I am still getting an error in the PredictCost() function. I am afraid I don't really know what it means or how to get around it. I would really appreciate some help on this one. Here is the latest output from my top-down training, first from the centroid model and then the centered instance model:
You will notice that there is still a metrics evaluation, but with PredictCost() errors. I then predict on the suggested frames:
But the problem is that the prediction file is empty, even though it has the same file size (57 kB) as previous prediction files that worked. When I merge the prediction file into my current SLEAP project, nothing happens. When I open the prediction file by itself, nothing shows up either, but that could be because there isn't a video file attached to it. Additionally, as in my previous comment, I am still unable to see a model metric evaluation. Please let me know if there is anything else I can provide to help solve this issue.
Hi @amblypatty, Could you share everything needed to do the training/inference (video, slp, models) and the 230312_144956_predicted_suggestions.slp to lmaree@salk.edu? Sorry, GitHub doesn't notify for reactions, but thanks for bumping this again - it had gotten buried... Let's get you unstuck. Thanks,
One of our labmates also seems to be experiencing this issue. I can send you an example if you want, Liezl, but they're currently running an older SLEAP version.
I think I am experiencing a similar issue. I am very new to this, but reading through this thread, it seems very similar to what happens for me. I have tried optimising the training parameters for my top-down multi-animal model, and when I tweak the input scaling (and the max stride) settings, in some cases I receive an error message in the GUI saying that the training failed. For my centroid model, keeping the input scaling at 0.5 and the max stride at 32 works, but when I increase the input scaling to 1.0 and the max stride to 64, I start seeing this issue. I will keep an eye on this issue; I just thought I would mention that I am experiencing it. Thank you also for an amazing tool - I really like SLEAP.
Hello, I am getting this issue as well, but at an input scaling of 0.5. I need to use 0.5 to get the model to run on my 8GB GPU with 1280x1024 video; by changing that, reducing filters from 64 to 48, and reducing the rate from 2 to 1.5, I was finally able to get the model to run. Error attached below. Is there anything I can do? Thanks for the support.

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 592, 592, 3), found shape=(1, 296, 296, 3)
Hi @smasri09, I know you said that you needed an input scaling of 0.5 to get the model to run on your 8GB GPU, but is there any way you can keep the top-down-id model at an input scaling of 1 and just adjust the centroid model's input scaling? Maybe even lowering it below 0.5? Similar to the centered instance model, the top-down-id model does not support adjusting the input scaling - it relies on the centroid model taking crops of the full image to save on memory, but then keeps full resolution in the crop to accurately locate smaller body parts. Thanks,
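A quick sanity check on the error messages in this thread: the "expected" and "found" shapes differ by exactly the input-scaling factor. Using the numbers from the error above (these specific values are just for illustration):

```python
# The shape mismatch in these errors is exactly the input-scaling factor:
# the model was built for the unscaled crop size, but the crops it receives
# have been resized by the configured input scaling.
crop_size = 592      # size in "expected shape=(None, 592, 592, 3)"
input_scaling = 0.5  # the input scaling set on the model

fed_size = int(crop_size * input_scaling)
print(fed_size)  # 296, matching "found shape=(1, 296, 296, 3)"
```

The same arithmetic explains the later report of `expected shape=(None, 32, 32, 1), found shape=(1, 8, 8, 1)` with input scaling 0.25, so seeing this factor-of-scale mismatch is a reliable symptom of this issue.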
Hi, I ran into the same issue training a LEAP-backbone top-down centered instance model with input scaling at 0.25 (it was just there by default, or maybe from an earlier run). Google brought me here, and switching input scaling to 1 fixed the problem. Here's the log in case it is helpful:

Output Log:
Using already trained model for centroid: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_132820.centroid.n=20/training_config.json
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 32, 32, 1), found shape=(1, 8, 8, 1)
Call arguments received:
Hi! Just want to add that I'm running into this issue as well, with updated SLEAP from conda. I assume it is being worked on, but in the meantime I was curious what other params (other than batch size) to tweak to make centered instance training smaller for our GPU limits. Thanks! Luke

Traceback (most recent call last):
  File "/home/lmeyers/anaconda3/envs/sleap/bin/sleap-train", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.2.8', 'console_scripts', 'sleap-train')())
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1981, in main
    trainer.train()
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 927, in train
    verbose=2,
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1332, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 1312, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 1868, in call
    out = self.keras_model(crops)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1020, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "/home/lmeyers/anaconda3/envs/sleap/lib/python3.7/site-packages/keras/engine/input_spec.py", line 269, in assert_input_compatibility
    ', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 768, 768, 3), found shape=(1, 192, 192, 3)
terminate called without an active exception
train-script.sh: line 2: 32871 Aborted (core dumped) sleap-train centered_instance.json labels.v001.pkg.slp
Not sure, but I think I might be having an issue similar to this: #872 (comment). A single-animal set of training params leads to GPU memory errors if I enable visualizations but doesn't if I don't... Happy to post json files or errors, if desirable.
I think the problem is that we generally expect an input scaling of 1.0 for centered instance models since they're crops already. The training does handle this appropriately, but not the visualization for some reason (it's probably missing the input scaling transformer/preprocessing).
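The missing step described above (the visualization path skipping the input-scaling transformer that the training path applies) can be sketched as a small resize applied to the crop before it is fed to the model. This is a hedged illustration only; `resize_crop` is a hypothetical helper, and the real pipeline uses its own preprocessing transformers.

```python
import numpy as np

# Hypothetical sketch of the input-scaling preprocessing that the visualization
# path appears to skip: resize each (H, W, C) instance crop by the configured
# input scale so its shape matches what the Keras model was built for.
# Nearest-neighbor resampling via index lookup keeps this dependency-free.

def resize_crop(crop: np.ndarray, input_scaling: float) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W, C) crop by a scale factor."""
    h, w = crop.shape[:2]
    new_h = int(round(h * input_scaling))
    new_w = int(round(w * input_scaling))
    rows = (np.arange(new_h) / input_scaling).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / input_scaling).astype(int).clip(0, w - 1)
    return crop[rows][:, cols]

crop = np.zeros((592, 592, 3), dtype=np.uint8)
print(resize_crop(crop, 0.5).shape)  # (296, 296, 3)
```

The point is only that the crop shape and the model's expected input shape must agree before `keras_model(crops)` is called; whichever side is resized, skipping the transform on one path but not the other produces exactly the `expected shape= ... found shape=` errors reported in this thread.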
In general, I think we can solve this by switching to using the InferenceModel classes to generate visualizations so that we're not doing custom inference routines inside of Trainer classes.

Here's the relevant error:
See issue below for more.
Discussed in #871
Originally posted by Shifulai July 29, 2022
Thanks for your attention.
When I try to train the top-down centered instance model, training does not work when the input scaling is not 1.0. Training stays at epoch 1, but the runtime keeps increasing.
Bug report below