AI/ML notes

Ch 2. Pretrained networks

Exploring Pretrained Models in PyTorch

Impact of Deep Learning on Computer Vision

  • Deep learning has revolutionized computer vision, driven by:
    • The need for image classification and interpretation.
    • Availability of large-scale datasets.
    • Advances in convolutional layers and GPU acceleration.
    • Interest from tech giants in understanding user-generated images.

Using Pretrained Models

  • A pretrained neural network functions like a program that maps inputs (images) to outputs (labels, captions, or new images).
  • Benefits of using pretrained models:
    • Leverages expert-designed architectures.
    • Saves computation time—no need to train from scratch.
    • Provides a strong starting point for deep learning projects.

Importance of Running Pretrained Models

  • Useful for evaluating, visualizing, and using deep learning models in real-world applications.
  • Prepares users for working with real data and model outputs, regardless of whether they trained the model themselves.
  • Learning PyTorch Hub helps efficiently access and share models through a unified interface.

Types of Pretrained Models Explored

  1. Image Classification Models – Identify objects in images.
  2. GANs (Generative Adversarial Networks) & CycleGAN – Generate new images from existing ones.
  3. Image Captioning Models – Generate descriptive text from images.

Using Pretrained Networks for Image Recognition in PyTorch

Pretrained Networks and ImageNet

  • Pretrained deep learning models are widely available through repositories, often published alongside research papers.
  • The ImageNet dataset (http://imagenet.stanford.edu) contains 14+ million labeled images and serves as a benchmark for image classification models.
  • The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has driven improvements in:
    • Image classification (identifying object categories).
    • Object localization (detecting object positions in images).
    • Scene classification and parsing (segmenting images into meaningful regions).
  • Models trained on ImageNet classify images into 1,000 categories and return the top 5 predictions ranked by confidence.

Loading Pretrained Networks in PyTorch

Obtaining a Pretrained Model


AlexNet: A Historic Breakthrough in Deep Learning

  • Won the 2012 ILSVRC competition with a top-5 error rate of 15.4% (compared to 26.2% from non-deep learning models).

  • Marked a turning point for deep learning in computer vision.

  • Structure:

    • Five convolutional layers.
    • Fully connected layers converting the image into 1,000 class scores.
  • Loading AlexNet in PyTorch:

    from torchvision import models
    alexnet = models.AlexNet()
    • Note: models.AlexNet() instantiates the architecture with randomly initialized weights; the lowercase models.alexnet(pretrained=True) downloads the trained ImageNet weights instead.

ResNet: Deep Networks with Residual Connections

  • The ResNet family introduced residual (skip) connections, which made it practical to train very deep networks; ResNet-101 is the 101-layer variant.

  • Won several categories of the ILSVRC and COCO 2015 competitions.

  • Loading a Pretrained ResNet Model:

    from torchvision import models
    resnet = models.resnet101(pretrained=True)
  • ResNet-101 has 44.5 million parameters, requiring extensive computation during training.

  • Viewing Model Architecture:

    print(resnet)
    • Modules (layers) include:
      • Conv2d: Convolutional layer.
      • BatchNorm2d: Batch normalization.
      • ReLU: Activation function.
      • MaxPool2d: Pooling layer.
      • fc: Fully connected layer producing 1,000 class scores.

Preprocessing Images for Model Input

  • Input images must be resized, cropped, normalized, and converted into tensors.

  • Using torchvision.transforms for preprocessing:

    from torchvision import transforms
    
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
  • Loading and Preprocessing an Image:

    from PIL import Image
    img = Image.open("../data/p1ch2/bobby.jpg")
    img_t = preprocess(img)
  • Adding a Batch Dimension for Model Input:

    import torch
    batch_t = torch.unsqueeze(img_t, 0)

Running Inference on a Pretrained Model

  • Set the model to evaluation mode before running inference (this switches batch normalization to its stored statistics and disables dropout):

    resnet.eval()
  • Performing a Forward Pass on the Image:

    out = resnet(batch_t)
  • Output: A tensor of 1,000 class scores.


Decoding the Model's Prediction

  • Loading the ImageNet class labels:

    with open('../data/p1ch2/imagenet_classes.txt') as f:
        labels = [line.strip() for line in f.readlines()]
  • Finding the Predicted Label:

    _, index = torch.max(out, 1)
    labels[index[0]]
  • Computing Confidence Scores:

    percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
    labels[index[0]], percentage[index[0]].item()

    Example Output:

    ('golden retriever', 96.29)
  • Retrieving the Top 5 Predictions:

    _, indices = torch.sort(out, descending=True)
    [(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]

    Example Output:

    [
        ('golden retriever', 96.29),
        ('Labrador retriever', 2.80),
        ('cocker spaniel', 0.28),
        ('redbone', 0.20),
        ('tennis ball', 0.11)
    ]
    • Observations: The model correctly identifies the dog breeds but also suggests "tennis ball," likely because dogs often appear alongside tennis balls in the training images.
  • Experimenting with Different Images:

    • The model performs well on common objects seen in training but may misclassify unseen objects.

The GAN Game: Generative Adversarial Networks (GANs)

Understanding GANs

  • GAN (Generative Adversarial Network) consists of two competing neural networks:

    1. Generator (Painter) – Creates realistic images from random noise.
    2. Discriminator (Art Critic) – Distinguishes between real and generated images.
  • Objective:

    • The generator tries to fool the discriminator into believing fake images are real.
    • The discriminator learns to correctly classify real vs. fake images.
    • Over time, both networks improve, leading to highly realistic synthetic images.
  • GANs have powerful applications in:

    • Face generation (e.g., AI-generated human faces).
    • Image translation (e.g., turning sketches into realistic landscapes).
    • Audio synthesis (e.g., deep fake voice cloning).
    • Text generation (e.g., realistic AI-generated writing).
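The generator/discriminator game above can be sketched as two opposing loss computations. The toy networks and shapes below are assumptions for illustration only; real GANs use convolutional architectures:

```python
import torch
from torch import nn

# Toy generator: noise vector -> flat "image"; toy discriminator: image -> realness logit
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))
D = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, 784)          # stand-in for a batch of real images
noise = torch.randn(8, 16)
fake = G(noise)

# Discriminator loss: push real images toward label 1, generated images toward 0
# (detach() stops this loss from updating the generator)
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))

# Generator loss: fool the discriminator into scoring fakes as real
g_loss = bce(D(fake), torch.ones(8, 1))
print(d_loss.item(), g_loss.item())
```

In training, these two losses are minimized in alternation, which is what drives both networks to improve together.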

CycleGAN: Transforming Images Between Domains

  • CycleGAN extends GANs by allowing image-to-image translation without requiring paired examples.
  • Example: Transforming horses into zebras and vice versa.
  • How it works:
    • Two generators: One for horse → zebra, another for zebra → horse.
    • Two discriminators: One for detecting fake zebras, another for detecting fake horses.
    • Cycle consistency: translating an image to the other domain and back should reconstruct the original, which constrains the generators even without paired training data.
  • Key Advantage: No need for perfectly aligned image pairs (e.g., a horse and zebra in the same pose).
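The cycle-consistency idea reduces to penalizing the round-trip reconstruction error. A sketch with linear layers standing in for the two generators (the real ones are convolutional networks):

```python
import torch
from torch import nn

# Stand-ins for the two CycleGAN generators
G_h2z = nn.Linear(784, 784)   # horse -> zebra
G_z2h = nn.Linear(784, 784)   # zebra -> horse
l1 = nn.L1Loss()

horse = torch.randn(4, 784)
zebra = torch.randn(4, 784)

# Translate to the other domain and back; the round trip should recover the input
cycle_loss = l1(G_z2h(G_h2z(horse)), horse) + l1(G_h2z(G_z2h(zebra)), zebra)
print(cycle_loss.item())
```

This term is added to the usual adversarial losses of the two discriminators, which is what lets CycleGAN learn from unpaired horse and zebra photos.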

Running a Pretrained CycleGAN Model (Horse to Zebra)

1. Load a Pretrained CycleGAN Generator

  • Define the generator (ResNetGenerator is a ResNet-style generator class defined in the book's companion code, not part of torchvision):

    netG = ResNetGenerator()
  • Load the pretrained model weights:

    import torch
    
    model_path = '../data/p1ch2/horse2zebra_0.4.0.pth'
    model_data = torch.load(model_path)
    netG.load_state_dict(model_data)
  • Set the model to evaluation mode:

    netG.eval()

2. Preprocess an Input Image

  • Load an image of a horse:

    from PIL import Image
    from torchvision import transforms
    
    img = Image.open("../data/p1ch2/horse.jpg")
  • Apply preprocessing transformations:

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.ToTensor()
    ])
    
    img_t = preprocess(img)
    batch_t = torch.unsqueeze(img_t, 0)

3. Generate the Fake Zebra Image

  • Run the generator on the image:

    batch_out = netG(batch_t)
  • Convert the output tensor back into an image:

    out_t = (batch_out.detach().squeeze() + 1.0) / 2.0  # map from [-1, 1] back to [0, 1]
    out_img = transforms.ToPILImage()(out_t)
    out_img.show()

A Pretrained Network for Scene Description

Introduction to Image Captioning

  • Image captioning models generate natural language descriptions of images.
  • The NeuralTalk2 model by Andrej Karpathy is a popular pretrained image-captioning model.
  • This model takes an input image and produces a coherent sentence describing the scene.

How Image Captioning Works

  • The model consists of two connected halves:
    1. Feature Extraction (CNN-based Network)
      • Converts an image into a numerical representation (detecting objects like "cat," "table," "mouse").
    2. Sentence Generation (Recurrent Neural Network - RNN)
      • Uses the extracted features to generate a sequence of words, forming a caption.
      • Words are generated sequentially, with each word depending on the previous ones.
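The two halves can be sketched as an encoder that turns the image into a feature vector and an RNN that emits one word at a time. The dimensions, the single-layer "CNN," and the start-token id below are made-up assumptions; a real captioner like NeuralTalk2 uses a full CNN and a trained vocabulary:

```python
import torch
from torch import nn

vocab_size, embed_dim, hidden_dim = 100, 32, 64

# Half 1: feature extraction (a single linear layer stands in for the CNN)
encoder = nn.Linear(784, hidden_dim)

# Half 2: RNN decoder that emits one word at a time, conditioned on the image
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.LSTMCell(embed_dim, hidden_dim)
to_vocab = nn.Linear(hidden_dim, vocab_size)

image = torch.randn(1, 784)
h = encoder(image)                      # image features initialize the hidden state
c = torch.zeros_like(h)
word = torch.tensor([0])                # start-of-sentence token (assumed id 0)

caption = []
for _ in range(5):                      # generate a 5-word caption
    h, c = rnn(embed(word), (h, c))
    word = to_vocab(h).argmax(dim=1)    # greedy choice of the next word
    caption.append(word.item())
print(caption)
```

Each step feeds the previously generated word back in, which is how every word ends up depending on the ones before it.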

Running NeuralTalk2 in PyTorch

  • Repository: NeuralTalk2 PyTorch

  • Running the Captioning Model:

    python eval.py --model ./data/FC/fc-model.pth \
                   --infos_path ./data/FC/fc-infos.pkl \
                   --image_folder ./data
  • Example Output for horse.jpg:

    • "A person riding a horse on a beach." (Correct caption)

Testing the Model with a Fake Image

  • Input: zebra.jpg (generated by CycleGAN from horse.jpg).
  • Output: "A group of zebras are standing in a field."
    • The model correctly identified a zebra but mistakenly saw multiple zebras.
    • Possible bias in the dataset: Zebras often appear in groups in training data.
    • The rider was ignored because riders on zebras were not in the training dataset.

Key Takeaways

  • Deep learning enables automated image captioning without hardcoded grammar rules.
  • Training data biases affect model outputs (e.g., assuming zebras appear in groups).
  • General-purpose architectures (CNNs + RNNs) can generate captions without prior domain knowledge.

Torch Hub: A Unified Interface for Pretrained Models

  • Introduced in PyTorch 1.0, Torch Hub provides a standardized way to access pretrained models from GitHub repositories.
  • Goal: Make loading third-party models as easy as loading models from TorchVision.
  • Key Feature: No need to manually clone repositories—PyTorch handles downloading and model loading automatically.

How Torch Hub Works

  • Authors publish a model by adding a hubconf.py file to the root of their GitHub repository.

  • Example hubconf.py structure:

    dependencies = ['torch', 'math']
    
    def some_entry_fn(*args, **kwargs):
        model = build_some_model(*args, **kwargs)
        return model
    
    def another_entry_fn(*args, **kwargs):
        model = build_another_model(*args, **kwargs)
        return model
  • Components of hubconf.py:

    • dependencies: Lists required libraries.
    • Entry functions: Define how to initialize and return the model.
    • Allows multiple entry points for different models or preprocessing steps.

Loading Models with Torch Hub

  • Example: Loading ResNet-18 from TorchVision’s GitHub repository:

    import torch
    from torch import hub
    
    resnet18_model = hub.load('pytorch/vision:master', 'resnet18', pretrained=True)
    • 'pytorch/vision:master' → Specifies repository and branch.
    • 'resnet18' → Calls the resnet18 function from hubconf.py.
    • pretrained=True → Loads pretrained ImageNet weights.
    • Downloads model and stores it in .torch/hub (default directory).

Advantages of Torch Hub

  • Consistent Interface: Any model with hubconf.py can be loaded the same way.
  • No Cloning Required: Models can be downloaded and used directly.
  • Flexible Entry Points:
    • Can return models, preprocessing functions, or end-to-end pipelines.
    • Some repositories may provide custom input transformations or label mappings.

Future of Torch Hub

  • Still growing, but expected to become a key model-sharing platform.
  • More researchers and developers are expected to adopt this format for sharing models.
  • Searching GitHub for hubconf.py can help discover available models.

Torch Hub provides a standardized, flexible, and efficient way to access pretrained models, making it easier than ever to experiment with state-of-the-art deep learning architectures.

Conclusion

One thing that PyTorch does particularly right is providing these building blocks in the form of an essential toolset: from an API perspective, PyTorch is not a very large library, especially when compared with other deep learning frameworks.