A pragmatic guide to training a Mask-RCNN model on your custom dataset
In the field of computer vision, image segmentation refers to classifying the object category and extracting a pixel-by-pixel mask of each object in an image. So in addition to object classification and object localization, we also obtain information about exactly which pixels are part of the object of interest.
Mask-RCNN is a deep neural network (an extension of Faster-RCNN) that carries out instance segmentation; it was released in 2017 by Facebook. This blog post aims to provide brief and pragmatic guidance on the TensorFlow implementation of Mask-RCNN. If you are not familiar with CNNs or Mask-RCNN, I recommend going through this blog or reading the original paper by Facebook AI Research before proceeding further.
In general, Mask R-CNN has two stages after a CNN feature extractor extracts image features:
1. A Region Proposal Network (RPN) scans the feature map and proposes candidate object regions (regions of interest).
2. For each proposal, the network predicts the class label, refines the bounding box, and generates a pixel-level mask of the object.
At the end of this blog, you will be able to do the following tasks:
- scrape images from Google to build your own dataset
- label the images using the VIA annotation tool
- set up the Mask-RCNN project for a custom object class
- train the model and monitor the losses
- run inference to extract masks from unseen images
For ease of labeling, I chose the task of recognizing and extracting masks of footballs in images. The first step involves collecting your dataset if you don’t already have one for your application. Feel free to skip this step if you already have your own dataset.
For scraping images, we need a tool that can simulate a human user manually downloading images from Google. Selenium is an open-source web-based automation tool that can get the job done by connecting Python with your web browser (we will use Google Chrome here). You also need the web driver matching your Google Chrome version (find your version by clicking ‘About Google Chrome’). You can download the driver here. For example, I had Version 85.0.4183.121, so I downloaded the ChromeDriver for Chrome 85.
pip3 install selenium
Unzip the ChromeDriver download to get the executable chromedriver. Place it in the same directory as the Python script scrape_images.py below.
In the script below, Selenium launches a dummy Chrome browser that searches Google for the query words (given by QUERIES in the code), fetches the image URLs from Google, downloads the images, and saves them to your local folder scraped_images. For each query, it downloads about 100 images, set by the parameter NUM_IMAGES. Run the script with python3 scrape_images.py in the terminal and it will save the downloaded images in the folder scraped_images. Remember to change the query words and the number of images you want for each query. Some images may fail with an error because they are inaccessible or break during the download; if you want more images, just add more query words. You can then filter the appropriate images according to the application you are building. Here is a snippet from the code, but you can download the full script here.
---------------
1from selenium import webdriver
2import io, os, time
3import requests
4import hashlib
5from PIL import Image
6
7def search_and_download(query_item, driver_path, max_images_to_fetch=10):
8
9    with webdriver.Chrome(executable_path=driver_path) as mydriver:
10        res = get_image_urls(query_item, max_images_to_fetch, wd=mydriver, sleeptime=0.5)  # helper from the full script
11
12        # download images from the fetched urls
13        for elem in res:
14            download_image(target_folder, elem)  # helper from the full script
15
16if __name__ == '__main__':
17
18    QUERIES = ['football', 'soccer', 'soccer kick']
19    NUM_IMAGES = 100
20
21    curr_dir = os.getcwd()
22    driver_path = os.path.join(curr_dir, 'chromedriver')
23    target_folder = os.path.join(curr_dir, 'scraped_images')
24
25    if not os.path.exists(target_folder):
26        os.makedirs(target_folder)
27
28    for query in QUERIES:
29        search_and_download(query_item=query, driver_path=driver_path, max_images_to_fetch=NUM_IMAGES)
---------------
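For reference, the download_image helper in the full script works roughly as sketched below (a simplified version, not the exact code from the script); it uses the requests, io, hashlib, and PIL imports from the snippet above to fetch the image bytes, decode them, and save them under a content-hash file name so repeated downloads don’t collide.
---------------
def download_image(folder_path, url):
    # fetch the raw image bytes from the url (a sketch; the full script adds error handling)
    image_content = requests.get(url).content

    # decode the bytes and convert to RGB so the image can be saved as JPEG
    image = Image.open(io.BytesIO(image_content)).convert('RGB')

    # name the file by a hash of its content to avoid duplicate file names
    file_name = hashlib.sha1(image_content).hexdigest()[:10] + '.jpg'
    image.save(os.path.join(folder_path, file_name), 'JPEG', quality=90)
---------------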
Although there are a number of labeling tools available, I used VIA as it is very intuitive, lightweight, and easy to use. You can just open it in your browser with this link, upload your dataset, and begin labeling. You can choose different region shapes for labeling, such as circle, ellipse, rectangle, or polygon. Since my object of interest was a football, I chose circles for a few and polygons for the others. You should set the region_attributes (for example class_name, class_id, image_quality, etc.) in the beginning, before you start labeling. Learn a few keyboard shortcuts to speed up the labeling process. I prepared two batches for labeling, one for training and one for validation. You can preview the annotations using Annotations → Preview Annotations and export them as a JSON file.
At the end of the labeling process, my folder structure had two folders, train and val, each with its respective images and an annotations file annot.json.
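For reference, each image entry in the annot.json that VIA exports looks roughly like the simplified, hypothetical example below; the loading code in Step 4 reads exactly the filename, regions, and shape_attributes fields of each entry.
-----------------
{
  "football1.jpg123456": {
    "filename": "football1.jpg",
    "size": 123456,
    "regions": [
      {
        "shape_attributes": {"name": "circle", "cx": 240, "cy": 160, "r": 35},
        "region_attributes": {"class_name": "football"}
      }
    ],
    "file_attributes": {}
  }
}
-----------------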
Now that we have our dataset and the labels ready, we can set up our Mask-RCNN project. A well-known implementation of Mask-RCNN by Matterport can be found here. Although it is only compatible with TensorFlow 1.3 and Keras 2.1.0, many people have come up with versions adapted to work with TensorFlow 2.0 and its newly incorporated Keras. I used a version by TomGross, which you can access by cloning the git repo below.
This will make a folder named Mask_RCNN containing the source files and sample scripts. Copy the folder mrcnn and the COCO weights file mask_rcnn_coco.h5 from inside the Mask_RCNN folder and place them alongside the football_data folder, so that the directory structure looks like the one below, where football_segmentation.ipynb is the Jupyter notebook we will use for training and logs is a folder we create to hold the trained model checkpoints and TensorBoard information.
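Sketched from the folders and files mentioned above, the layout looks like this (the image files inside train and val are your own):
-----------------
.
├── mrcnn/
├── football_data/
│   ├── train/
│   │   ├── annot.json
│   │   └── ...training images
│   └── val/
│       ├── annot.json
│       └── ...validation images
├── logs/
├── mask_rcnn_coco.h5
└── football_segmentation.ipynb
-----------------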
With the directory structure already set up in Step 3, we are ready to train the Mask-RCNN model on the football dataset. In football_segmentation.ipynb, first import the necessary packages and modules from the mrcnn folder, and define the root directory, the COCO weights path, and the logs directory. The modules model and utils from mrcnn provide the functions and classes for constructing the Mask-RCNN architecture.
-----------------
1import os, sys, json, datetime
2import numpy as np
3import skimage.draw, skimage.io  # skimage.io is used below to read images
4
5from mrcnn.config import Config
6from mrcnn import model as modellib, utils
7
8# Root directory of the project
9ROOT_DIR = os.getcwd()
10sys.path.append(ROOT_DIR)
11
12# Path to trained weights file
13COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
14
15# Directory to save logs and model checkpoints
16DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
-----------------
The implementation by Matterport already has a base configuration, which we import in the code above on line 5. However, for our custom dataset, we need to add a new class and override some values of the parent Config class that it inherits from. To see the default values of the hyperparameters in the base configuration (for example, LEARNING_RATE, STEPS_PER_EPOCH, etc.), refer to the parent class in the script config.py. The custom configuration is defined in the FootballConfig class in the code below.
-----------------
1class FootballConfig(Config):
2 """Configuration for training on the football dataset.
3 Derives from the base Config class and overrides some values.
4 """
5 # Give the configuration a recognizable name
6 NAME = "football"
7
8 # Adjust this according to your GPU's memory
9 IMAGES_PER_GPU = 1
10
11 # Number of classes (including background).
12 NUM_CLASSES = 1 + 1 # Background + football
13
14 # Number of training steps per epoch
15 STEPS_PER_EPOCH = 100
16
17 # Skip detections with < 90% confidence
18 DETECTION_MIN_CONFIDENCE = 0.9
19
20class FootballDataset(utils.Dataset):
21
22 def load_Football(self, dataset_dir, subset):
23 """Load a subset of the Football dataset.
24 dataset_dir: Root directory of the dataset.
25 subset: Subset to load: train or val
26 """
27 # Add football class
28 self.add_class("football", 1, "football")
29
30 # Choose from train or validation dataset
31 assert subset in ["train", "val"]
32 dataset_dir = os.path.join(dataset_dir, subset)
33
34 annotations = json.load(open(os.path.join(dataset_dir, "annot.json")))
35 annotations = list(annotations.values()) # don't need the dict keys
36
37 # The VIA tool saves images in the JSON even if they don't have any annotations. Skip unannotated images.
38 annotations = [a for a in annotations if a['regions']]
39
40 # Add images
41 for a in annotations:
42            # Get the x, y coordinates of the points of the polygons that make
43            # up the outline of each object instance. These are stored in the
44            # shape_attributes (see the JSON format above).
45 # The if condition is needed to support VIA versions 1.x and 2.x.
46 if type(a['regions']) is dict:
47 polygons = [r['shape_attributes'] for r in a['regions'].values()]
48 else:
49 polygons = [r['shape_attributes'] for r in a['regions']]
50
51 # load_mask() needs the image size to convert polygons to masks.
52 # Unfortunately, VIA doesn't include it in JSON, so we must read
53            # the image. This is only manageable since the dataset is tiny. Else you could
54            # also add the image sizes to the annotation JSONs separately after VIA labeling.
55 image_path = os.path.join(dataset_dir, a['filename'])
56 image = skimage.io.imread(image_path)
57 height, width = image.shape[:2]
58
59 self.add_image(
60 "football",
61 image_id=a['filename'], # use file name as a unique image id
62 path=image_path,
63 width=width, height=height,
64 polygons=polygons)
65
66 def load_mask(self, image_id):
67 """Generate instance masks for an image.
68 Returns:
69 masks: A bool array of shape [height, width, instance count] with
70 one mask per instance.
71 class_ids: a 1D array of class IDs of the instance masks.
72 """
73 # If not a football dataset image, delegate to parent class.
74 image_info = self.image_info[image_id]
75 if image_info["source"] != "football":
76 return super(self.__class__, self).load_mask(image_id)
77
78 # Convert polygons to a bitmap mask of shape
79 # [height, width, instance_count]
80 info = self.image_info[image_id]
81
82 mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
83 dtype=np.uint8)
84
85 for i, p in enumerate(info["polygons"]):
86            # Get indexes of pixels inside the shape and set them to 1.
87            # If you use any shape other than circle or polygon, handle its format here.
88            if 'r' in p:
89                rr, cc = skimage.draw.circle(p['cy'], p['cx'], p['r'])
90                mask[rr, cc, i] = 1
91            else:
92                rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
93                mask[rr, cc, i] = 1
94
95 # Return mask, and array of class IDs of each instance. Since we have
96 # one class ID only, we return an array of 1s
97 return mask.astype(np.bool), np.ones([mask.shape[-1]], dtype=np.int32)
98
-----------------
You can visualize the mask and the corresponding image using the code below. If you save the masks as image files, remember to replace the mask pixel value of 1 with 255 in lines 90 and 93 above so that the masks are visible.
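Here is a minimal sketch of such a visualization, assuming the FootballDataset class defined above and the football_data folder from Step 3; it displays the first training image next to the union of its instance masks using matplotlib.
-----------------
import matplotlib.pyplot as plt

# load and prepare the training split
dataset = FootballDataset()
dataset.load_Football(os.path.join(ROOT_DIR, "football_data"), "train")
dataset.prepare()

# read the first image and build its instance masks
image_id = dataset.image_ids[0]
image = skimage.io.imread(dataset.image_info[image_id]["path"])
mask, class_ids = dataset.load_mask(image_id)

# show the image and the combined mask of all instances side by side
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(image)
ax2.imshow(mask.max(axis=-1), cmap="gray")
plt.show()
-----------------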
For training, we use the configuration and the FootballDataset class defined above: we load the training and validation datasets and prepare them using the prepare function. The model is created from modellib and initialized with the configuration defined by FootballConfig. Starting from the COCO weights, we train only the head layers of the model (line 16) for 30 epochs with the default learning rate; you can tune these hyperparameters. Default callbacks write TensorBoard summaries and save a checkpoint for each epoch in the logs folder.
1def train(model, dataset):
2 """Train the model."""
3 # Training dataset.
4 dataset_train = FootballDataset()
5 dataset_train.load_Football(dataset, "train")
6 dataset_train.prepare()
7
8 # Validation dataset
9 dataset_val = FootballDataset()
10 dataset_val.load_Football(dataset, "val")
11 dataset_val.prepare()
12
13 # Since we're using a very small dataset, and starting from
14 # COCO trained weights, we don't need to train too long. Also,
15 # no need to train all layers, just the heads should do it.
16 model.train(dataset_train, dataset_val,
17 learning_rate=config.LEARNING_RATE,
18 epochs=30,
19 layers='heads')
20
21config = FootballConfig()
22model = modellib.MaskRCNN(mode="training", config=config, model_dir=DEFAULT_LOGS_DIR)
23
24weights_path = COCO_WEIGHTS_PATH
25dataset = os.path.join(ROOT_DIR , "football_data")
26
27model.load_weights(weights_path, by_name=True, exclude=[
28 "mrcnn_class_logits", "mrcnn_bbox_fc",
29 "mrcnn_bbox", "mrcnn_mask"])
30
31train(model, dataset)
32
The training prints the different loss values at each epoch. Since Mask-RCNN involves an RPN (Region Proposal Network) that predicts region proposals, in addition to the class-label, bounding-box, and mask predictions, there are five different losses to minimize: rpn_class_loss, rpn_bbox_loss, mrcnn_class_loss, mrcnn_bbox_loss, and mrcnn_mask_loss. For more information about the losses, see the Mask-RCNN architecture in this article, read the original paper here, or see the script model.py.
I would suggest training on a GPU, because Mask-RCNN is computationally heavy to train on a CPU. If you have a machine with a GPU, you can start training right away; if you don’t, you can use Google Colab, which provides free GPU access. Just mount your Google Drive into your Colab notebook using the code below. You will be asked to click on a link to authorize the mount; copy the authorization code back into the notebook and you get access to the data in your Google Drive from Google Colab.
from google.colab import drive
drive.mount('/content/gdrive')
The entire training for 30 epochs took about 30 minutes for me (78 training + 21 validation images). If you are training on your own machine, the training can be visualized by running tensorboard --logdir=/path_to_logs in the terminal and then opening localhost:6006 in your browser. For training on Google Colab, you can visualize the losses through TensorBoard using the commands below.
%load_ext tensorboard
%tensorboard --logdir=<path_to_logs> --host=127.0.0.1
For inference, you can load the model weights from one of the checkpoints saved in the logs folder and visualize the predictions for unseen images. The first element of results (line 30 below) is a dictionary with the following prediction keys: rois (bounding boxes of the detected instances), class_ids (their class IDs), masks (their segmentation masks), and scores (their confidence scores).
1import matplotlib.pyplot as plt
2from mrcnn import visualize
3
4ROOT_DIR = os.getcwd()
5MODEL_DIR = os.path.join(ROOT_DIR,"logs")
6weights_path = os.path.join(MODEL_DIR, "mask_rcnn_football_0030.h5")
7
8class InferenceConfig(FootballConfig):
9 # Set batch size to 1 since we'll be running inference on
10 # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
11 GPU_COUNT = 1
12 IMAGES_PER_GPU = 1
13
14config = InferenceConfig()
15# config.display()
16
17# Create model object in inference mode.
18model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
19
20# Load the trained football weights from the checkpoint
21model.load_weights(weights_path, by_name=True)
22
23# Load a test image from a path (replace img.jpg with your own image)
24image = skimage.io.imread(os.path.join(ROOT_DIR, 'img.jpg'))
25
26# prediction
27results = model.detect([image], verbose=1)
28
29# Visualize results
30r = results[0]
31class_names = ['BG', 'football']
32visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
33 class_names, r['scores'],
34 title="Predictions")
35
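If you want the segmented pixels themselves rather than just the overlay, the minimal sketch below (assuming at least one football was detected) uses the boolean masks returned in r['masks'], which have shape [height, width, number of instances].
-----------------
# keep only the pixels that belong to the first detected football
football_mask = r['masks'][:, :, 0]
segmented = image * np.expand_dims(football_mask, axis=-1)

# save the result; all non-football pixels become black
skimage.io.imsave('football_only.jpg', segmented.astype(np.uint8))
-----------------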
The model does a pretty good job of identifying footballs in the images. It can even recognize partially hidden footballs and footballs that are difficult to isolate, like in the image below.
In this tutorial, you learned to collect and label data, set up your Mask-RCNN project, and train a model to perform instance segmentation. The labeled data, the entire code, and the trained weights are available at my GitHub repo.
Hope you enjoyed the post! Leave your comments and suggestions below.
References