This repo contains a script to convert the CrowdHuman dataset annotations to COCO format, as well as a dataset class for reading the data.
CrowdHuman is a benchmark dataset for better evaluating detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated, and highly diverse. CrowdHuman contains 15000, 4370, and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances in the train and validation subsets, with an average of 23 persons per image and various kinds of occlusions. Each human instance is annotated with a head bounding box, a human visible-region bounding box, and a human full-body bounding box. We hope this dataset will serve as a solid baseline and help promote future research in human detection tasks.
The script supports `annotation_train.odgt` and `annotation_val.odgt`, which contain the annotations of CrowdHuman.
odgt is a file format in which each line is a JSON object containing the full annotations for the corresponding image. We prefer this format since it is reader-friendly.
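Since each line is an independent JSON object, an `.odgt` file can be parsed with the standard library alone. A minimal sketch (the helper name and the sample record below are illustrative, not part of the repo):

```python
import json

def load_odgt(path):
    """Parse an .odgt file: one JSON record per line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# A single illustrative record (hypothetical ID and box values, trimmed for brevity)
sample_line = (
    '{"ID": "example_image", "gtboxes": [{"tag": "person", '
    '"fbox": [72, 202, 163, 503], "vbox": [72, 202, 163, 398], '
    '"hbox": [171, 208, 62, 83], '
    '"extra": {"box_id": 0, "occ": 1}, '
    '"head_attr": {"ignore": 0, "occ": 1, "unsure": 0}}]}'
)
record = json.loads(sample_line)
print(record["ID"], len(record["gtboxes"]))  # → example_image 1
```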
Annotation format:
```
JSON{
    "ID": image_filename,
    "gtboxes": [gtbox],
}

gtbox{
    "tag": "person" or "mask",
    "vbox": [x, y, w, h],
    "fbox": [x, y, w, h],
    "hbox": [x, y, w, h],
    "extra": extra,
    "head_attr": head_attr,
}

extra{
    "ignore": 0 or 1,
    "box_id": int,
    "occ": int,
}

head_attr{
    "ignore": 0 or 1,
    "unsure": int,
    "occ": int,
}
```

- Keys in `extra` and `head_attr` are optional, which means some of them may not exist.
- `extra`/`head_attr` contain attributes for the person/head.
- A `tag` of `mask` means that the box is a crowd/reflection/something like a person/... and needs to be ignored (the `ignore` in `extra` is 1).
- `vbox`, `fbox`, `hbox` mean visible box, full box, and head box, respectively.
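A minimal sketch of how these attributes are typically consumed, e.g. keeping only full-body boxes of non-ignored `person` instances (the helper name and the sample record are illustrative, not part of the repo):

```python
def keep_person_fboxes(record):
    """From one odgt record, keep full-body boxes of person instances
    that are not marked ignore (extra.ignore != 1)."""
    boxes = []
    for gt in record.get("gtboxes", []):
        if gt.get("tag") != "person":
            continue  # "mask" boxes are crowd/reflection regions to be ignored
        if gt.get("extra", {}).get("ignore", 0) == 1:
            continue
        boxes.append(gt["fbox"])
    return boxes

# Hypothetical record with one kept person box and one ignored mask box
record = {
    "ID": "example_image",
    "gtboxes": [
        {"tag": "person", "fbox": [10, 20, 50, 120], "extra": {"box_id": 0}},
        {"tag": "mask", "fbox": [0, 0, 30, 30], "extra": {"ignore": 1}},
    ],
}
print(keep_person_fboxes(record))  # → [[10, 20, 50, 120]]
```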
Before converting, ensure that the dataset folder looks like this:

```
|- crowdhuman
    |- Images  # contains all train and test images
    |- annotation_train.odgt
    |- annotation_val.odgt
```

You can use this to simply keep full boxes with the `person` tag only:
```
python crowdhuman2coco.py -d /path/to/crowdhuman/dataset -o /path/to/annotation_train.odgt -j /path/to/annotation_train.json
```

For more options, run this to get more detailed information:
```
python crowdhuman2coco.py --help
```

This repo also contains two simple implementations of a CrowdHuman dataset class, in PyTorch and MegEngine.
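To give a feel for what the conversion produces, here is a rough sketch of mapping one odgt record to COCO-style dicts; `crowdhuman2coco.py` is the authoritative implementation, and the helper name, `.jpg` naming assumption, and field choices below are illustrative:

```python
def odgt_record_to_coco(record, image_id, start_ann_id):
    """Convert one odgt record to COCO-style image/annotation dicts.
    COCO bboxes are [x, y, w, h], the same layout as CrowdHuman's fbox.
    Assumes images are .jpg files named by the record's ID."""
    image = {"id": image_id, "file_name": record["ID"] + ".jpg"}
    annotations = []
    ann_id = start_ann_id
    for gt in record.get("gtboxes", []):
        x, y, w, h = gt["fbox"]
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": 1,  # person
            "bbox": [x, y, w, h],
            "area": w * h,
            # mark ignored regions so COCO-style evaluation can skip them
            "iscrowd": 1 if gt.get("extra", {}).get("ignore", 0) else 0,
        })
        ann_id += 1
    return image, annotations
```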
Each time `__getitem__` is called, the dataset returns a tuple containing the annotations you requested, in order.
The supported annotation types are listed in `supported_order`:

```python
class CrowdHuman(VisionDataset):
    supported_order = (
        "image",
        "boxes",
        "vboxes",
        "hboxes",
        "boxes_category",
        "info",
    )
```

You can easily use this to instantiate a CrowdHuman dataset:
```python
crowdhuman_dataset = CrowdHuman(
    root='path/to/CrowdHuman',
    ann_file='path/to/annotations.json',
    remove_images_without_annotations=True,
    order=[
        'image',
        'boxes',
        'boxes_category',
        'info',
    ],
)
```