How to disable automatic checkpoint loading

December 9, 2021

I'm trying to loop over a set of parameters, create a new network for each parameter, and train it for a few epochs.

Currently my code looks like this:

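import pytorch_lightning as pyli

def optimize_scale(self, epochs=5, comp_scale=100, scale_list=[1, 100]):
    # One Trainer is created once and reused for every scale
    trainer = pyli.Trainer(gpus=1, max_epochs=epochs)

    for scale in scale_list:
        # A fresh model (CustomNN is my own LightningModule) for each scale
        test_model = CustomNN(num_layers=1, scale=scale, lr=1, pad=True, batch_size=1)
        trainer.fit(test_model)
        trainer.test(verbose=True)

        del test_model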

Everything works fine for the first element of scale_list: the network trains for 5 epochs and completes the test, which I can follow in the console. For all following elements of scale_list, however, it does not work: the old network is not replaced, and instead an old checkpoint is loaded automatically when trainer.fit(test_model) is called. The console shows this as:

C:\Users\XXXX\AppData\Roaming\Python\Python39\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:623: UserWarning:
Checkpoint directory D:\XXXX\src\lightning_logs\version_0\checkpoints exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
train_size = 8   val_size = 1    test_size = 1
Restoring states from the checkpoint path at D:\XXXX\src\lightning_logs\version_0\checkpoints\epoch=4-step=39.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from checkpoint at D:\XXXX\src\lightning_logs\version_0\checkpoints\epoch=4-step=39.ckpt

As a consequence, the second test outputs the same result, because the checkpoint of the old network, which had already finished all 5 epochs, was loaded. I thought that adding del test_model would discard the model completely, but that did not work.

While searching, I found a few closely related issues, for example: https://github.com/PyTorchLightning/pytorch-lightning/issues/368. However, I did not manage to fix my problem with them. I assume it has something to do with the fact that the new network, which should replace the old one, has the same name/version and therefore looks for the same checkpoints.

If anyone has an idea or knows how to circumvent this, I would be very grateful.

Solution:

I think that, in your setup, you want to disable automatic checkpointing:

trainer = pyli.Trainer(gpus=1, max_epochs=epochs, enable_checkpointing=False)

You may then need to explicitly save a checkpoint, with a different name, for each training run.

You can manually save a checkpoint via:

trainer.save_checkpoint(f'checkpoint_for_scale_{scale}.pth')

pytorch-lightning

by MR

Published December 09, 2021

