Introduction
How many times have you lost the TV remote, and how much time did you spend finding it? It happens to most of us, and it is a frustrating experience. But what if I told you that a computer algorithm could solve the problem in a few milliseconds?
Object Detection is a method to solve such problems. Have you ever tried an Object Detection model using a dataset of your choice?

In this article, we will understand Detectron2 for Object Detection in depth, covering its introduction and origin, how to get started, and its implementation.
Introduction and Origin of Detectron2
Detectron2 is an advanced library launched by Facebook AI Research (FAIR) in 2019 for detection and segmentation problems. It originated from the maskrcnn-benchmark project and relies on CUDA for its heavy computations. It supports numerous tasks such as bounding box detection, keypoint detection, and instance segmentation, and it ships with pre-trained models that you can load easily and use per your requirements.
Its predecessor, Detectron, was implemented in Caffe2; Detectron2 is a ground-up rewrite in PyTorch. The detection models in its model zoo are pre-trained on the COCO dataset.
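To see what loading a pre-trained model looks like in practice, here is a minimal sketch of out-of-the-box inference; it assumes Detectron2 is already installed, and the image path input.jpg and the 0.5 score threshold are illustrative placeholders, not part of the walkthrough below.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
import cv2

# Load a Faster R-CNN config and its COCO-pretrained weights from the model zoo
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # illustrative confidence threshold

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))  # "input.jpg" is a placeholder path
print(outputs["instances"].pred_classes)      # indices of detected COCO classes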
Detectron2 for Object Detection
Now that you have a brief idea of Detectron2, let’s start with Detectron2 for Object Detection.
Getting started with Detectron2
To start with Detectron2, we will install the necessary dependencies, check libraries, and import a few necessary packages.
Installation
We will start by installing a few prerequisites, such as torchvision and the COCO API, and then check whether CUDA is available. Finally, we will install Detectron2 using the following piece of code.
# installing dependencies:
!pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html
!pip install cython pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version
# install detectron2:
!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
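Optionally, as a quick sanity check (not part of the original steps), confirm the installation before moving on:
# verify that Detectron2 installed correctly
import detectron2
print(detectron2.__version__)  # should print 0.1.3 for this install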
Importing a few necessary packages
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
# import some common libraries
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow
# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
Implementation of Detectron2 with Datasets
This module covers the complete steps involved in using Detectron2 for Object Detection.
Step-1: Preparing the Dataset
Some datasets have built-in support in Detectron2 and are listed in its builtin datasets documentation. If you want to use a dataset of your choice, you must register it.
Firstly, we will train a Text Detection Model by fine-tuning a model pre-trained on the COCO dataset. The Text Detection Dataset has three classes: Hindi, English, and Others. There are several formats in which you can annotate data, but here we will feed the model the COCO format: a JSON file that includes all the image and annotation details.
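For illustration, here is a hypothetical sketch of what a single record in such a dataset.json file might look like; the file name, sizes, and box values are made up, and only the fields used by the registration code below are shown.
# Hypothetical example of one record in dataset.json; all values are illustrative
sample_record = {
    "file_name": "img_001.jpg",  # made relative to the split directory by get_board()
    "image_id": 1,
    "height": 720,
    "width": 1280,
    "annotations": [
        {"bbox": [120, 85, 210, 40], "category_id": "1"},   # bbox is [x, y, width, height]
        {"bbox": [400, 300, 150, 35], "category_id": "0"},  # ids map to the three classes
    ],
}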
Step-2: Registering the Dataset
Use the following code to register the dataset.
import json
from detectron2.structures import BoxMode

def get_board(imgdir):
    # load the COCO-format annotations and make the image paths absolute
    json_file = imgdir + "/dataset.json"
    with open(json_file) as file:
        dataset = json.load(file)
    for i in dataset:
        filename = i["file_name"]
        i["file_name"] = imgdir + "/" + filename
        for j in i["annotations"]:
            j["bbox_mode"] = BoxMode.XYWH_ABS  # boxes are [x, y, width, height]
            j["category_id"] = int(j["category_id"])
    return dataset

from detectron2.data import DatasetCatalog, MetadataCatalog

# register the train and validation splits along with their class names
for d in ["train", "val"]:
    DatasetCatalog.register("boardetect_" + d, lambda d=d: get_board("Text_Detection_Dataset_COCO_Format/" + d))
    MetadataCatalog.get("boardetect_" + d).set(thing_classes=["HINDI", "ENGLISH", "OTHER"])

board_metadata = MetadataCatalog.get("boardetect_train")
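Optionally, you can pull the registered split back out of the catalog to confirm the registration worked (a quick sanity check, not part of the original steps):
# fetch the registered training split and confirm it loads
train_dicts = DatasetCatalog.get("boardetect_train")
print(len(train_dicts), "training images")
print(board_metadata.thing_classes)  # ['HINDI', 'ENGLISH', 'OTHER']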
Step-3: Starting with the Training Set
We will randomly pick two pictures from our training set and analyze what the bounding boxes look like. Use the following Python code for the same.
dataset = get_board("Text_Detection_Dataset_COCO_Format/train")
for d in random.sample(dataset, 2):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=board_metadata)  # BGR to RGB
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])
Output:


Step-4: Training the Model
Here, we will just fine-tune our model on the dataset. You can use the example configuration below for your reference.
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2.evaluation import COCOEvaluator
import os

# a trainer that evaluates on the validation set with COCO metrics
class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "eval")
        return COCOEvaluator(dataset_name, cfg, False, output_folder)

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("boardetect_train",)
cfg.DATASETS.TEST = ("boardetect_val",)
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.0125  # pick a good learning rate
cfg.SOLVER.MAX_ITER = 1500  # number of iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3  # classes: HINDI, ENGLISH, OTHER
cfg.TEST.EVAL_PERIOD = 500  # evaluate every 500 iterations

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
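While training runs, you can optionally watch the loss curves; Detectron2 writes TensorBoard event files to cfg.OUTPUT_DIR (./output by default). These Colab magic commands are an optional addition to the walkthrough:
# optional: monitor training metrics with TensorBoard inside Colab
%load_ext tensorboard
%tensorboard --logdir output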
Step-5: Example using the Trained Model
Use the code below to run the trained model on a couple of random validation images.
from detectron2.utils.visualizer import ColorMode

# load the fine-tuned weights and run inference on the validation set
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8  # confidence threshold
cfg.DATASETS.TEST = ("boardetect_val",)
predictor = DefaultPredictor(cfg)

dataset = get_board("Text_Detection_Dataset_COCO_Format/val")
for d in random.sample(dataset, 2):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=board_metadata,
                   scale=0.8,
                   instance_mode=ColorMode.IMAGE)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(v.get_image()[:, :, ::-1])
Output:


Step-6: Evaluation of the Trained Model
Use the reference code to evaluate the trained model.
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# run COCO-style evaluation (AP metrics) on the validation split
evaluator = COCOEvaluator("boardetect_val", cfg, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "boardetect_val")
inference_on_dataset(predictor.model, val_loader, evaluator)
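inference_on_dataset returns a dictionary of COCO metrics (AP averaged over IoU thresholds, plus AP50, AP75, and per-class scores), so you can capture it instead of only printing it. A small optional addition:
# capture the metrics dict for further use
results = inference_on_dataset(predictor.model, val_loader, evaluator)
print(results["bbox"]["AP"])  # mean AP over IoU thresholds 0.5:0.95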



