How to deepcopy a model in PyTorch?

To deep copy a model in PyTorch, we can either use copy.deepcopy() or create a new instance of the model and copy its parameters using load_state_dict() and state_dict(). Python's 'copy' module provides the deepcopy() method to create a deep copy, so deepcopy() can be used on any Python object, not just PyTorch models. The load_state_dict()/state_dict() approach, on the other hand, is specific to PyTorch.

Let's understand these two approaches in detail.

Prerequisites/Setup

We need to install PyTorch.
pip install torch
For more details, please visit the official PyTorch installation page for local installation instructions.
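If PyTorch is already installed, you can verify the installation and check the installed version with a minimal check like the one below:
import torch
print(torch.__version__)  # prints the installed PyTorch version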

Approach 1: Using copy.deepcopy()

In this approach, we use the deepcopy() method to deep copy a PyTorch model. The deepcopy() method is available in Python's 'copy' module.

Syntax

copy.deepcopy(model)
Here model is the PyTorch model/neural network that we want to deep copy.

Steps

1. Import required libraries
2. Define a model/ neural network
3. Create a deep copy using copy.deepcopy()
4. Print both models

1. Import required libraries/modules

The first step is to import the necessary libraries/modules. Here we use the torch library and the copy module.
import torch
import copy

2. Define a model/ neural network

Now we define a simple model/neural network. Here we define a linear model with 'in_features' = 2 and 'out_features' = 2.
model = torch.nn.Linear(2, 2)

3. Create a deep copy using copy.deepcopy()

Now use the deepcopy() method to deep copy the PyTorch model defined above. The deep copy is assigned to 'model_copy'.
model_copy = copy.deepcopy(model)

4. Print both models

In the last step, we print both models (their weights).
print(model.weight)
print(model_copy.weight)
Now let's look at the complete Python program with all the steps discussed above.

Example 1

# import required lib/module
import torch
import copy

# define a simple model
model = torch.nn.Linear(2, 2)
# create a deepcopy of the above model
model_copy = copy.deepcopy(model)

# Print both models
print(model.weight)
print(model_copy.weight)

Output

Parameter containing:
tensor([[-0.3875,  0.1497],
        [-0.1765, -0.2011]], requires_grad=True)
Parameter containing:
tensor([[-0.3875,  0.1497],
        [-0.1765, -0.2011]], requires_grad=True)

Looking at the output, both models have the same weights.
Note: You may get different values of the above tensors as the weights are initialized randomly.
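Since deepcopy() creates a fully independent copy, changing the weights of the original model does not affect the copy. Here is a minimal sketch that demonstrates this:

import torch
import copy

model = torch.nn.Linear(2, 2)
model_copy = copy.deepcopy(model)

# modify the original model's weights in place
with torch.no_grad():
    model.weight.fill_(1.0)

# the deep copy still holds the old weights, so the tensors no longer match
print(torch.equal(model.weight, model_copy.weight))  # False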

Approach 2: Using load_state_dict() and state_dict()

The second, PyTorch-specific approach is to create a new instance of the model and then copy the parameters (weights and biases) using load_state_dict() and state_dict(). To create a new instance of the model, we use Python's built-in type() function. See the syntax below:

Syntax

model_copy = type(model)(args)
model_copy.load_state_dict(model.state_dict())
Here model is our model and args are the constructor arguments of the model (here 'in_features' and 'out_features'). We must pass these arguments, otherwise a TypeError is thrown (see Example 3).
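The same pattern works for a custom model class; you simply pass whatever arguments the class's constructor expects. The sketch below uses a hypothetical SmallNet class, introduced only for illustration, to show this:

import torch

# a hypothetical custom model, used only for illustration
class SmallNet(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc = torch.nn.Linear(in_features, out_features)

    def forward(self, x):
        return self.fc(x)

model = SmallNet(2, 2)

# create a new instance with the same constructor arguments,
# then copy the parameters into it
model_copy = type(model)(2, 2)
model_copy.load_state_dict(model.state_dict())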

Steps

1. Import required libraries
2. Define a model/ neural network
3. Get a new instance of the model
4. Copy weights and biases
5. Print both models

1. Import required libraries

We only need torch for this task, so we import it.
import torch

2. Define a model/ neural network

Define a simple model as in the first approach:
model = torch.nn.Linear(2, 2)

3. Get a new instance of the model

Now create a new instance of the above model:
model_copy = type(model)(2, 2)

4. Copy weights and biases

Now we copy the parameters (weights and biases) of the model into the newly created instance 'model_copy'. To get the parameters of the model, we use model.state_dict().
model_copy.load_state_dict(model.state_dict())

5. Print both models

Now, at last, print both models (the weights of the models):
print(model.weight)
print(model_copy.weight)

Example 2

import torch

# define a simple model
model = torch.nn.Linear(2, 2)

# get a new instance of the model
model_copy = type(model)(2, 2)

# copy weights and biases
model_copy.load_state_dict(model.state_dict())

# Print both models
print(model.weight)
print(model_copy.weight)

Output

Parameter containing:
tensor([[-0.0303, -0.6644],
        [ 0.1111,  0.5059]], requires_grad=True)
Parameter containing:
tensor([[-0.0303, -0.6644],
        [ 0.1111,  0.5059]], requires_grad=True)

Note: You may get different values of the above tensors as the weights are initialized randomly.
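You can also verify that the copied parameters have the same values but live in separate tensors, which is what makes the two models independent. Here is a minimal sketch:

import torch

model = torch.nn.Linear(2, 2)
model_copy = type(model)(2, 2)
model_copy.load_state_dict(model.state_dict())

# the copied weights have the same values ...
print(torch.equal(model.weight, model_copy.weight))             # True
# ... but they are stored in separate tensors
print(model.weight.data_ptr() == model_copy.weight.data_ptr())  # False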

Example 3

# import required lib/module
import torch

# define a simple model
model = torch.nn.Linear(2, 2)

# get a new instance of the model
model_copy = type(model)()

# copy weights and biases
model_copy.load_state_dict(model.state_dict())

# Print both models
print(model.weight)
print(model_copy.weight)

Output

TypeError: __init__() missing 2 required positional arguments: 'in_features' and 'out_features'

Notice that we have not passed the arguments in 'model_copy = type(model)()'. That is what causes this error. So don't forget to pass the same arguments as those of the original model.
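A minimal fix for the snippet above (assuming model is the Linear(2, 2) defined in Example 3) is to supply those constructor arguments when creating the new instance:

# pass the same constructor arguments as the original model
model_copy = type(model)(2, 2)
model_copy.load_state_dict(model.state_dict())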

Conclusion

In this post, we discussed two approaches to deep copy a model in PyTorch. The first approach is to use the copy.deepcopy() method. The other approach is to first create a new instance of the model and then copy the model parameters (weights and biases) to the created instance using load_state_dict() and state_dict().
