# MixTransformer
MixTransformer backbone based on *SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers*.

We provide the MixTransformer encoder (MiT), the backbone of SegFormer, as a freely usable backbone module. The transformer encoder of each stage is configurable, enabling every variant from MiT-b0 to MiT-b5.
## Field list
Field | Description |
---|---|
name | (str) Name must be "mixtransformer" to use the MixTransformer backbone. |
params.ffn_intermediate_expansion_ratio | (int) Expansion factor used to compute the intermediate dimension of the feed-forward network. |
params.ffn_act_type | (str) Activation function for the feed-forward network in the transformer block. Supported activation functions are described in [here]. |
params.ffn_dropout_prob | (float) Dropout probability for the feed-forward network in the transformer block. |
params.attention_dropout_prob | (float) Dropout probability for the attention layer in the transformer block. |
stage_params[n].num_blocks | (int) The number of transformer blocks in the stage. |
stage_params[n].sequence_reduction_ratio | (int) Sequence reduction ratio for the multi-head attention (see the annotated stage entry after this table). |
stage_params[n].attention_channels | (int) Embedding dimension of the transformer blocks in the stage. |
stage_params[n].embedding_patch_sizes | (int) Kernel size of the convolution layer in the overlapping patch embedding. |
stage_params[n].embedding_strides | (int) Stride of the convolution layer in the overlapping patch embedding. |
stage_params[n].num_attention_heads | (int) The number of heads in the multi-head attention. |
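For orientation, the per-stage fields map directly onto the SegFormer encoder design: the feed-forward intermediate dimension is ffn_intermediate_expansion_ratio × attention_channels, and a sequence_reduction_ratio of R downsamples the keys and values by R in each spatial dimension, shrinking the key/value sequence length by a factor of R². The annotated stage entry below is a sketch using the first-stage values from the MiT-b0 example in the next section; the comments describe the SegFormer design these fields correspond to, not framework internals.

```yaml
# One stage entry, annotated. Values match stage 1 of the MiT-b0
# example below; the comments restate the SegFormer design.
stage_params:
  -
    num_blocks: 2                 # transformer blocks stacked in this stage
    sequence_reduction_ratio: 8   # keys/values are downsampled by 8 in each
                                  # spatial dimension before attention, so the
                                  # K/V sequence length shrinks by 8^2
    attention_channels: 32        # token embedding dimension of this stage;
                                  # FFN intermediate dim = 4 * 32 = 128 with
                                  # ffn_intermediate_expansion_ratio: 4
    embedding_patch_sizes: 7      # 7x7 conv kernel in the overlapping
                                  # patch embedding
    embedding_strides: 4          # stride 4, so stage 1 outputs an
                                  # H/4 x W/4 feature map
    num_attention_heads: 1        # heads in the multi-head self-attention
```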
## Model configuration examples
MiT-b0

```yaml
model:
  architecture:
    backbone:
      name: mixtransformer
      params:
        ffn_intermediate_expansion_ratio: 4
        ffn_act_type: "gelu"
        ffn_dropout_prob: 0.0
        attention_dropout_prob: 0.0
      stage_params:
        -
          num_blocks: 2
          sequence_reduction_ratio: 8
          attention_channels: 32
          embedding_patch_sizes: 7
          embedding_strides: 4
          num_attention_heads: 1
        -
          num_blocks: 2
          sequence_reduction_ratio: 4
          attention_channels: 64
          embedding_patch_sizes: 3
          embedding_strides: 2
          num_attention_heads: 2
        -
          num_blocks: 2
          sequence_reduction_ratio: 2
          attention_channels: 160
          embedding_patch_sizes: 3
          embedding_strides: 2
          num_attention_heads: 5
        -
          num_blocks: 2
          sequence_reduction_ratio: 1
          attention_channels: 256
          embedding_patch_sizes: 3
          embedding_strides: 2
          num_attention_heads: 8
```
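The larger variants are configured the same way; only stage_params changes. As an illustration, the following is a sketch of a MiT-b1 configuration assembled from the stage depths, widths, heads, and reduction ratios reported in the SegFormer paper (two blocks per stage, channels 64/128/320/512); treat it as an assumption-based example rather than a shipped preset.

MiT-b1

```yaml
# Sketch of MiT-b1, using the stage settings from the SegFormer paper.
model:
  architecture:
    backbone:
      name: mixtransformer
      params:
        ffn_intermediate_expansion_ratio: 4
        ffn_act_type: "gelu"
        ffn_dropout_prob: 0.0
        attention_dropout_prob: 0.0
      stage_params:
        -
          num_blocks: 2
          sequence_reduction_ratio: 8
          attention_channels: 64
          embedding_patch_sizes: 7
          embedding_strides: 4
          num_attention_heads: 1
        -
          num_blocks: 2
          sequence_reduction_ratio: 4
          attention_channels: 128
          embedding_patch_sizes: 3
          embedding_strides: 2
          num_attention_heads: 2
        -
          num_blocks: 2
          sequence_reduction_ratio: 2
          attention_channels: 320
          embedding_patch_sizes: 3
          embedding_strides: 2
          num_attention_heads: 5
        -
          num_blocks: 2
          sequence_reduction_ratio: 1
          attention_channels: 512
          embedding_patch_sizes: 3
          embedding_strides: 2
          num_attention_heads: 8
```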