Data preparation (Local)¶
Local custom datasets¶
If your dataset is already in local storage, you can use it by following the instructions below.
Organize dataset¶
Create separate directories for images and labels, each split into train/valid/test.
Place your images under the proper paths.
/my_dataset
├── images
│ ├── train
│ │ ├── img1.jpg
│ │ ├── img2.jpg
│ │ └── ...
│ ├── valid
│ │ ├── img1.jpg
│ │ ├── img2.jpg
│ │ └── ...
│ └── test
│ ├── img1.jpg
│ ├── img2.jpg
│ └── ...
└── labels
Place your labels under the proper paths.
- For image classification, you typically need label files in CSV format (a minimal sketch follows the notes below).
- For semantic segmentation and object detection, organize your label files (masks or box annotations) in the corresponding folders.
/my_dataset
├── images
│ ├── train
│ │ ├── img1.jpg
│ │ ├── img2.jpg
│ │ └── ...
│ ├── valid
│ │ ├── img1.jpg
│ │ ├── img2.jpg
│ │ └── ...
│ └── test
│ ├── img1.jpg
│ ├── img2.jpg
│ └── ...
└── labels
    ├── train directory or file ...
    ├── valid directory or file ...
    └── test directory or file ...
If you only run training, the test split may not be needed.
If you only run evaluation or inference, the train and valid splits may not be needed.
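For illustration, a classification label file could map each image file name to its class, one row per image. This is only a hypothetical sketch; the column names image_id and class are assumptions, not a confirmed NetsPresso Trainer schema (see components/data for the exact format):
image_id,class
img1.jpg,cat
img2.jpg,dog
img3.jpg,elephant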
Set configuration file¶
Define the paths to your dataset in the configuration file so that NetsPresso Trainer knows where to find the data. Finally, complete the data configuration by adding metadata such as id_mapping. Here is an example for classification:
data:
  name: my_custom_dataset
  task: classification # This could be another task
  format: local
  path:
    root: ./my_dataset
    train:
      image: images/train
      label: labels/train.csv
    valid:
      image: images/valid
      label: labels/valid.csv
    test:
      image: images/test
      label: labels/test.csv
  id_mapping: [cat, dog, elephant]
For the detailed definition of the data configuration, please refer to components/data.
Open datasets¶
If you are interested in using open datasets, follow the instructions below to seamlessly convert them into the local custom dataset format.
Image classification¶
CIFAR100¶
Run the cifar100.py script with your dataset directory as an argument.
The CIFAR100 dataset will be automatically downloaded to ./data/download. After executing the script, you can use the pre-defined configuration.
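A plausible invocation, assuming cifar100.py lives under ./tools/open_dataset_tool/ like imagenet1k.py below and accepts the same --dir flag:
python ./tools/open_dataset_tool/cifar100.py --dir ./data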
ImageNet1K¶
ImageNet1K dataset cannot be automatically downloaded. You should download dataset from ImageNet website, and place downloaded files into ./data/download
.
Then, run the imagenet1k.py script with your dataset directory and the paths to the downloaded files as arguments. After executing the script, you can use the pre-defined configuration.
(imagenet1k.py requires the scipy library, which is listed in requirements-optional.txt.)
python ./tools/open_dataset_tool/imagenet1k.py --dir ./data --train-images ./data/download/ILSVRC2012_img_train.tar --valid-images ./data/download/ILSVRC2012_img_val.tar --devkit ./data/download/ILSVRC2012_devkit_t12.tar.gz
Semantic segmentation¶
ADE20K¶
Run the ade20k.py script with your dataset directory as an argument.
The ADE20K dataset will be automatically downloaded to ./data/download. After executing the script, you can use the pre-defined configuration.
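A plausible invocation, under the same script-location and --dir flag assumptions as above:
python ./tools/open_dataset_tool/ade20k.py --dir ./data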
Cityscapes¶
The Cityscapes dataset cannot be downloaded automatically. You should download it from the Cityscapes website and place the downloaded files into ./data/download.
Then, run the cityscapes.py script with your dataset directory and the paths to the downloaded files as arguments. After executing the script, you can use the pre-defined configuration.
python ./tools/open_dataset_tool/cityscapes.py --dir ./data --images ./data/download/leftImg8bit_trainvaltest.zip --labels ./data/download/gtFine_trainvaltest.zip
PascalVOC 2012¶
Run the voc2012_seg.py script with your dataset directory as an argument.
The PascalVOC 2012 dataset will be automatically downloaded to ./data/download. After executing the script, you can use the pre-defined configuration.
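Again assuming the shared script location and --dir flag:
python ./tools/open_dataset_tool/voc2012_seg.py --dir ./data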
Object detection¶
COCO 2017¶
Run the coco2017.py script with your dataset directory as an argument.
The COCO 2017 dataset will be automatically downloaded to ./data/download. After executing the script, you can use the pre-defined configuration.
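Assuming the same script location and flag:
python ./tools/open_dataset_tool/coco2017.py --dir ./data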
Objects365¶
Run the objects365.py script with your dataset directory as an argument.
The Objects365 dataset will be automatically downloaded to ./data/download/objects365. After executing the script, you can use the pre-defined configuration. As the dataset is quite large, it is recommended to use multiple processes when downloading it (e.g., --num_process 4).
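A plausible invocation; the --num_process flag comes from the note above, while the script path and --dir flag are the same assumptions as before:
python ./tools/open_dataset_tool/objects365.py --dir ./data --num_process 4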
PascalVOC 2012¶
Run the voc2012_det.py script with your dataset directory as an argument.
The PascalVOC 2012 dataset will be automatically downloaded to ./data/download. After executing the script, you can use the pre-defined configuration.
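With the same assumptions:
python ./tools/open_dataset_tool/voc2012_det.py --dir ./data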
Pose estimation¶
WFLW¶
Run the wflw.py script with your dataset directory as an argument.
The WFLW dataset will be automatically downloaded to ./data/download. After executing the script, you can use the pre-defined configuration.
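With the same assumptions:
python ./tools/open_dataset_tool/wflw.py --dir ./data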
Run NetsPresso Trainer¶
Now you can run NetsPresso Trainer with your local dataset!
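As a final sketch (an assumption, not the confirmed CLI): if the repository exposes a train.py entry point that accepts the data configuration through a --data flag, the invocation could look like the following. The file name my_custom_dataset.yaml is hypothetical, and any other required flags are omitted here, so check the training documentation for the exact command.
python train.py --data ./config/data/my_custom_dataset.yaml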