Data preparation (Local)

Local custom datasets

If your dataset is ready in local storage, you can use it by following the instructions below.

Organize dataset

Create separate directories for images and labels, with train/valid/test splits.

/my_dataset
├── images
│   ├── train
│   ├── valid
│   └── test
└── labels
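
If you are starting from an empty folder, a short script can create this skeleton. A minimal sketch in plain Python (nothing NetsPresso-specific; adjust the root name to your dataset):

# Create the directory skeleton shown above.
import os

for split in ("train", "valid", "test"):
    os.makedirs(os.path.join("my_dataset", "images", split), exist_ok=True)
os.makedirs(os.path.join("my_dataset", "labels"), exist_ok=True)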

Place your images in the proper paths.

/my_dataset
├── images
│   ├── train
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── valid
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── test
│       ├── img1.jpg
│       ├── img2.jpg
│       └── ...
└── labels

Place your labels in the proper paths.

  • For image classification, you may need CSV-format label files (a minimal sketch follows the tree below).
  • For semantic segmentation and object detection, organize your label files (masks or box annotations) in the corresponding folders.
/my_dataset
├── images
│   ├── train
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── valid
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── test
│       ├── img1.jpg
│       ├── img2.jpg
│       └── ...
└── labels
    ├── train directory or file ...
    ├── valid directory or file ...
    └── test directory or file ...
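
For classification, each row of the CSV label file pairs an image with its class. The exact schema NetsPresso Trainer expects is defined in components/data; the column names below are only an illustrative assumption. A minimal sketch:

# Write a hypothetical labels CSV for the train split.
# The header names are an assumption for illustration, not the official schema;
# check components/data for the format NetsPresso Trainer actually expects.
import csv

rows = [("img1.jpg", "cat"), ("img2.jpg", "dog")]

with open("my_dataset/labels/train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_id", "class"])
    writer.writerows(rows)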

If you only run training, the test split may not be needed.

If you only run evaluation or inference, the train and valid splits may not be needed.

Set configuration file

Define the paths to your dataset in the configuration file so that NetsPresso Trainer knows where to find the data. Finally, complete the data configuration by adding metadata such as id_mapping. Here is an example for classification:

data:
  name: my_custom_dataset
  task: classification # This could be another task
  format: local
  path:
    root: ./my_dataset
    train:
      image: images/train
      label: labels/train.csv
    valid:
      image: images/valid
      label: labels/valid.csv
    test:
      image: images/test
      label: labels/test.csv
  id_mapping: [cat, dog, elephant]
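
Before launching training, it can save time to check that every path in the configuration resolves to an existing file or directory. A minimal sketch, assuming the configuration above is saved as my_dataset.yaml (a hypothetical file name) and the PyYAML package is installed:

# Sanity-check the data paths in a config laid out like the example above.
import os
import yaml  # PyYAML

with open("my_dataset.yaml") as f:
    cfg = yaml.safe_load(f)["data"]

root = cfg["path"]["root"]
for split, entries in cfg["path"].items():
    if split == "root":
        continue
    for kind, rel in entries.items():  # kind is "image" or "label"
        target = os.path.join(root, rel)
        if not os.path.exists(target):
            print(f"Missing {split} {kind}: {target}")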

For the detailed definition of the data configuration, please refer to components/data.

Open datasets

If you are interested in using open datasets, follow the instructions below to convert them into the local custom dataset format.

Image classification

CIFAR100

Run the cifar100.py script with your dataset directory as an argument.

The CIFAR100 dataset will be automatically downloaded to ./data/download. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/cifar100.py --dir ./data

ImageNet1K

The ImageNet1K dataset cannot be downloaded automatically. Download the dataset from the ImageNet website and place the downloaded files in ./data/download.

Then, run the imagenet1k.py script with your dataset directory and the paths of the downloaded files as arguments. After running the script, you can use the pre-defined configuration.

(imagenet1k.py requires the scipy library, which is listed in requirements-optional.txt.)

python ./tools/open_dataset_tool/imagenet1k.py --dir ./data --train-images ./data/download/ILSVRC2012_img_train.tar --valid-images ./data/download/ILSVRC2012_img_val.tar --devkit ./data/download/ILSVRC2012_devkit_t12.tar.gz

Semantic segmentation

ADE20K

Run the ade20k.py script with your dataset directory as an argument.

The ADE20K dataset will be automatically downloaded to ./data/download. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/ade20k.py --dir ./data

Cityscapes

The Cityscapes dataset cannot be downloaded automatically. Download the dataset from the Cityscapes website and place the downloaded files in ./data/download.

Then, run the cityscapes.py script with your dataset directory and the paths of the downloaded files as arguments. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/cityscapes.py --dir ./data --images ./data/download/leftImg8bit_trainvaltest.zip --labels ./data/download/gtFine_trainvaltest.zip

PascalVOC 2012

Run the voc2012_seg.py script with your dataset directory as an argument.

The PascalVOC 2012 dataset will be automatically downloaded to ./data/download. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/voc2012_seg.py --dir ./data

Object detection

COCO 2017

Run the coco2017.py script with your dataset directory as an argument.

The COCO 2017 dataset will be automatically downloaded to ./data/download. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/coco2017.py --dir ./data

Objects365

Run the objects365.py script with your dataset directory as an argument.

The Objects365 dataset will be automatically downloaded to ./data/download/objects365. After running the script, you can use the pre-defined configuration. As the dataset is quite large, it is recommended to download it with multiple processes (e.g., --num_process 4).

python ./tools/open_dataset_tool/objects365.py --dir ./data --num_process 4

PascalVOC 2012

Run the voc2012_det.py script with your dataset directory as an argument.

The PascalVOC 2012 dataset will be automatically downloaded to ./data/download. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/voc2012_det.py --dir ./data

Pose estimation

WFLW

Run the wflw.py script with your dataset directory as an argument.

The WFLW dataset will be automatically downloaded to ./data/download. After running the script, you can use the pre-defined configuration.

python ./tools/open_dataset_tool/wflw.py --dir ./data

Run NetsPresso Trainer

Now you can run NetsPresso Trainer with your local dataset!

python train.py --data your_local_dataset_yaml_path.yaml ...