Neural Data Server - Use Fetched Data for Transfer Learning

Data Preprocessing

This page provides code and tips for using data from the datasets indexed in our Dataset Registry.

JSON file of imageids

After clicking Download from NDS, you will obtain a data.json file containing a list of image IDs (filenames) prefixed by the respective dataset name in the following format:

{
  "imageids": [
    "[DATASET_NAME]/[IMAGEID]",
    ...
    "openimages/2c0134029eeadb66.jpg",
    "openimages/b03d875edb3f74a0.jpg",
    "coco/000000215569.jpg",
    "coco/000000559836.jpg"
    ...
  ]
}

You can read in a list of IDs for each dataset using:

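import json

# Load the list of "<dataset>/<filename>" entries downloaded from NDS.
with open('data.json', 'r') as f:
    d = json.load(f)

# Strip the dataset prefix and the 4-character '.jpg' extension from each entry.
COCO_IDS = [x[len('coco/'):-4] for x in d['imageids'] if x.startswith('coco/')]
OPENIMAGE_IDS = [x[len('openimages/'):-4] for x in d['imageids'] if x.startswith('openimages/')]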
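If your data.json mixes more datasets than the two shown here, grouping by prefix avoids writing one line per dataset. The following is a minimal sketch, not part of NDS: the helper name `group_ids_by_dataset` is hypothetical, and it assumes every entry has the `[DATASET_NAME]/[IMAGEID]` form shown above.

```python
import os
from collections import defaultdict


def group_ids_by_dataset(imageids):
    """Map each dataset name to the list of its image IDs (extension removed)."""
    grouped = defaultdict(list)
    for entry in imageids:
        # Split on the first '/' only: the left part is the dataset name.
        dataset, _, filename = entry.partition('/')
        # Drop the file extension, whatever it is, to get the bare image ID.
        grouped[dataset].append(os.path.splitext(filename)[0])
    return dict(grouped)


# Sample entries copied from the format description above.
sample = [
    "openimages/2c0134029eeadb66.jpg",
    "coco/000000215569.jpg",
    "coco/000000559836.jpg",
]
ids_by_dataset = group_ids_by_dataset(sample)
print(ids_by_dataset["coco"])  # ['000000215569', '000000559836']
```

With a real file, pass `json.load(f)['imageids']` in place of `sample`.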

To download only the images corresponding to the Image IDs rather than the entire dataset, you can use the URLs published by the respective dataset provider. The following script uses the train-images-boxable-with-rotation.csv file provided by OpenImages to map Image IDs to URLs.

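import csv
import wget  # third-party package: pip install wget

# A set makes the membership test below fast for large ID lists.
openimage_ids = set(OPENIMAGE_IDS)

with open('train-images-boxable-with-rotation.csv', newline='') as f:
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        if row['ImageID'] in openimage_ids:
            url = row['OriginalURL']
            try:
                wget.download(url, row['ImageID'] + '.jpg')
            except Exception:
                # Some links may be dead; report the URL and continue.
                print("Download failed: ", url)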

NOTE: Check the respective dataset provider's website for the format of the file that maps Image IDs to URLs. Because some URLs may be broken, there is no guarantee that every image can be downloaded.
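Because some IDs may have no usable URL, it can help to build the ID-to-URL mapping first and inspect what is missing before downloading anything. The sketch below is one possible way to do that. The helper `build_url_map` is hypothetical, and it assumes the `ImageID` and `OriginalURL` column names used in the OpenImages CSV above; other providers may use different columns.

```python
import csv
import io


def build_url_map(csv_file, wanted_ids):
    """Return ({image_id: url}, ids_without_url) for the requested image IDs."""
    wanted = set(wanted_ids)
    url_map = {}
    for row in csv.DictReader(csv_file):
        image_id = row['ImageID']
        # Keep only requested IDs that have a non-empty URL.
        if image_id in wanted and row.get('OriginalURL'):
            url_map[image_id] = row['OriginalURL']
    missing = wanted - set(url_map)
    return url_map, missing


# Stand-in for the real CSV file; the rows here are made up for illustration.
fake_csv = io.StringIO(
    "ImageID,OriginalURL\n"
    "abc,http://example.com/abc.jpg\n"
    "def,http://example.com/def.jpg\n"
)
url_map, missing = build_url_map(fake_csv, ["abc", "zzz"])
print(len(url_map), "URLs found;", len(missing), "IDs without a URL")
```

For real data, open the provider's CSV with `open(path, newline='')` and pass the file object in place of `fake_csv`.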

Combining datasets using COCO format annotations

If you wish to combine multiple datasets, it is often useful to convert them into a unified data format. The following link provides annotations for object detection in COCO Dataset Format (hosted on the dataset providers' websites):

Useful Tools:

Converting Annotations in a Common Format

This script merges the annotations above into a single COCO-style annotation file restricted to the Image IDs listed in your data.json file.
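To illustrate what such a merge involves, here is a rough sketch; it is not the script referred to above. The function `merge_coco` is hypothetical. It assumes each input is an already-loaded COCO-style dict with `images`, `annotations`, and `categories` lists, that each image's `file_name` matches the filename part of an entry in data.json, and that categories with the same `name` should be treated as the same class (real class names often differ between datasets and may need a manual mapping).

```python
def merge_coco(datasets, keep_file_names):
    """Merge COCO-style dicts, keeping only images whose file_name is wanted.

    Image, annotation, and category IDs are reassigned so that they stay
    unique in the merged result.
    """
    keep = set(keep_file_names)
    merged = {'images': [], 'annotations': [], 'categories': []}
    category_ids = {}  # category name -> new category id

    for data in datasets:
        # Map this dataset's category IDs to shared IDs, matching by name.
        cat_map = {}
        for cat in data['categories']:
            name = cat['name']
            if name not in category_ids:
                category_ids[name] = len(category_ids) + 1
                merged['categories'].append(
                    {'id': category_ids[name], 'name': name})
            cat_map[cat['id']] = category_ids[name]

        # Keep the requested images and give each a fresh ID.
        img_map = {}
        for img in data['images']:
            if img['file_name'] in keep:
                new_id = len(merged['images']) + 1
                img_map[img['id']] = new_id
                merged['images'].append({**img, 'id': new_id})

        # Carry over annotations that belong to a kept image.
        for ann in data['annotations']:
            if ann['image_id'] in img_map:
                merged['annotations'].append({
                    **ann,
                    'id': len(merged['annotations']) + 1,
                    'image_id': img_map[ann['image_id']],
                    'category_id': cat_map[ann['category_id']],
                })
    return merged
```

The result can be written out with `json.dump(merged, f)` and loaded by tools that read COCO-format annotations.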

Other annotations

To obtain other annotations, use the Image IDs to look up the desired annotations on the respective dataset provider's website.

Additional Resources

Once you have obtained your data and annotations, you can start training your networks. You may find the following resources useful for getting started: