PyTorch parallelization #5915

Open · wants to merge 25 commits into main · changes shown from 17 commits

Commits (25):
- 6744158: .md file and updated title (parkersarahl, Jan 1, 2025)
- 3818880: updated title (parkersarahl, Jan 1, 2025)
- 1b1cd4f: changed to correct template (parkersarahl, Jan 1, 2025)
- 7f1b7a0: added description (parkersarahl, Jan 2, 2025)
- 791f4a0: first section added (parkersarahl, Jan 2, 2025)
- 412ab91: added import section (parkersarahl, Jan 2, 2025)
- c33a971: Merge branch 'Codecademy:main' into pytorch_parallelization (parkersarahl, Jan 2, 2025)
- 723b593: added example (parkersarahl, Jan 2, 2025)
- 29720aa: Merge branch 'pytorch_parallelization' of https://github.com/parkersa… (parkersarahl, Jan 2, 2025)
- 50a0aa6: added syntax (parkersarahl, Jan 3, 2025)
- f655f99: added output section (parkersarahl, Jan 3, 2025)
- 3cc574b: added CatalogContent (parkersarahl, Jan 3, 2025)
- dc18047: changed syntax to psuedo shell and output to shell (parkersarahl, Jan 3, 2025)
- 7413f97: changed subjects to add data science (parkersarahl, Jan 3, 2025)
- 3ca3ccc: updated the description (parkersarahl, Jan 3, 2025)
- fff2b66: Merge branch 'main' into pytorch_parallelization (parkersarahl, Jan 6, 2025)
- 8342c28: Merge branch 'main' into pytorch_parallelization (parkersarahl, Jan 10, 2025)
- 534210c: Merge branch 'main' into pytorch_parallelization (parkersarahl, Jan 14, 2025)
- 8db9043: updated the description per PR recommendation. (parkersarahl, Jan 14, 2025)
- 911a6e6: changed the wording under the model parallelization paragraph (parkersarahl, Jan 14, 2025)
- a6cb305: Merge branch 'Codecademy:main' into pytorch_parallelization (parkersarahl, Jan 14, 2025)
- e12ae9a: removed the section Setting Up the Environment (parkersarahl, Jan 14, 2025)
- c039a97: changed the location of model defintion in example (parkersarahl, Jan 14, 2025)
- e83e5c3: Update content/pytorch/concepts/parallelizing-models/parallelizing-mo… (parkersarahl, Jan 14, 2025)
- 30c8b2d: changed the code example per recommendations in PR (parkersarahl, Jan 14, 2025)
---
Title: 'Parallelizing Models'
Description: 'Model parallelization is used to train models that require more memory than what is available on a single GPU.'
Subjects:
- 'Computer Science'
- 'Machine Learning'
- 'Data Science'
Tags:
- 'Algorithms'
- 'PyTorch'
- 'Machine Learning'
CatalogContent:
- 'intro-to-py-torch-and-neural-networks'
- 'paths/build-a-machine-learning-model'
---

**Model parallelization** is a technique for training deep learning models that require more memory than is available on a single graphics processing unit (GPU). In PyTorch, the model is split into parts (e.g., layers or modules), and each part is assigned to a different GPU. The GPUs perform their computations simultaneously, allowing large models to be processed faster, and they communicate with one another so that the output of one GPU is passed along as input to the next as needed.
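As a quick sanity check (a minimal sketch added here, not part of the original entry), it can be confirmed that PyTorch sees at least two GPUs before attempting to split a model:

```py
import torch

# Model parallelism across 'cuda:0' and 'cuda:1' requires at least two visible GPUs
num_gpus = torch.cuda.device_count()
if num_gpus < 2:
    print(f"Only {num_gpus} GPU(s) detected; the examples below assume two devices.")
else:
    print(f"{num_gpus} GPUs are available for model parallelism.")
```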

## Setting Up the Environment

First, import PyTorch and its neural network module:

```py
import torch
import torch.nn as nn
```
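The core primitive this entry relies on is `Module.to(device)`, which moves a module's parameters onto a specific GPU. A minimal illustration (the layer shape here is arbitrary):

```py
layer = nn.Linear(1000, 500)  # Parameters are created on the CPU
layer = layer.to('cuda:0')  # Move the weights and bias to the first GPU
print(next(layer.parameters()).device)  # Output: cuda:0
```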

## Syntax

To use model parallelization, wrap the model in a class that inherits from `nn.Module`:

```pseudo
class ModelParallel(nn.Module):
```

## Example

Each layer or module is then assigned to a specific GPU. The following example demonstrates how this can be accomplished:

```py
# Define a model split across two GPUs
class ModelParallel(nn.Module):
    def __init__(self):
        super(ModelParallel, self).__init__()
        self.layer1 = nn.Linear(1000, 500).to('cuda:0')  # First layer on the first GPU
        self.layer2 = nn.Linear(500, 100).to('cuda:1')  # Second layer on the second GPU

    def forward(self, x):
        x = x.to('cuda:0')  # Move the input to the first GPU
        x = self.layer1(x)
        x = x.to('cuda:1')  # Move the intermediate result to the second GPU
        x = self.layer2(x)
        return x

model = ModelParallel()
x = torch.randn(64, 1000)
output = model(x)
print(output)
```
## Output

Running this code prints a tensor of shape `[64, 100]`. The exact values depend on the random initialization of the model weights and on the input data, but the output will look similar to the following:
```shell
tensor([[ 0.1324, -0.2847, ..., 0.5921], # First sample in the batch
[-0.0412, 0.4891, ..., -0.2345], # Second sample in the batch
...
[ 0.2347, -0.1011, ..., 0.4567]]) # 64 rows, each with 100 values
```