mmcls.apis
- mmcls.apis.inference_model(model, img)
Run inference on image(s) with the classifier.
- Parameters
model (nn.Module) – The loaded classifier.
img (str/ndarray) – The image filename or loaded image.
- Returns
The classification results, containing class_name, pred_label and pred_score.
- Return type
result (dict)
- mmcls.apis.init_model(config, checkpoint=None, device='cuda:0', options=None)
Initialize a classifier from a config file.
- Parameters
config (str or mmcv.Config) – Config file path or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.
options (dict) – Options to override some settings in the used config.
- Returns
The constructed classifier.
- Return type
nn.Module
- mmcls.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)
Test model with multiple GPUs.
This method tests the model with multiple GPUs and collects the results in one of two modes, GPU or CPU. With gpu_collect=True, it encodes the results as GPU tensors and uses GPU communication to collect them. In CPU mode, each GPU saves its results to tmpdir and the rank 0 worker collects them.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (torch.utils.data.DataLoader) – PyTorch data loader.
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
- Returns
The prediction results.
- Return type
list
- mmcls.apis.set_random_seed(seed, deterministic=False)
Set random seed.
- Parameters
seed (int) – Seed to be used.
deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
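The seeding behaviour described above can be sketched as follows. This is a minimal illustration, not the mmcls implementation: the name set_random_seed_sketch is hypothetical, and seeding Python's random module and NumPy alongside PyTorch is an assumption. Only the two cuDNN flags come from the parameter description.

```python
import random


def set_random_seed_sketch(seed, deterministic=False):
    """Seed the common RNGs; NumPy and PyTorch are seeded only if installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        # The two flags named in the `deterministic` parameter description.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random draw.
set_random_seed_sketch(0)
first = random.random()
set_random_seed_sketch(0)
second = random.random()
```

Setting deterministic=True trades some speed for reproducibility, since cuDNN can no longer benchmark and pick the fastest kernels.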
- mmcls.apis.show_result_pyplot(model, img, result, fig_size=(15, 10), title='result', wait_time=0)
Visualize the classification results on the image.
- Parameters
model (nn.Module) – The loaded classifier.
img (str or np.ndarray) – Image filename or loaded image.
result (list) – The classification result.
fig_size (tuple) – Figure size of the pyplot figure. Defaults to (15, 10).
title (str) – Title of the pyplot figure. Defaults to ‘result’.
wait_time (int) – How many seconds to display the image. Defaults to 0.
mmcls.core
evaluation
- class mmcls.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, by_epoch=True, **eval_kwargs)
Distributed evaluation hook.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
tmpdir (str, optional) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
- class mmcls.core.evaluation.EvalHook(dataloader, interval=1, by_epoch=True, **eval_kwargs)
Evaluation hook.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
- mmcls.core.evaluation.average_performance(pred, target, thr=None, k=None)
Calculate CP, CR, CF1, OP, OR, OF1, where C stands for per-class average, O stands for overall average, P stands for precision, R stands for recall and F1 stands for F1-score.
- Parameters
pred (torch.Tensor | np.ndarray) – The model prediction with shape (N, C), where C is the number of classes.
target (torch.Tensor | np.ndarray) – The target of each prediction with shape (N, C), where C is the number of classes. 1 stands for positive examples, 0 stands for negative examples and -1 stands for difficult examples.
thr (float) – The confidence threshold. Defaults to None.
k (int) – Top-k performance. Note that if thr and k are both given, k will be ignored. Defaults to None.
- Returns
(CP, CR, CF1, OP, OR, OF1)
- Return type
tuple
- mmcls.core.evaluation.average_precision(pred, target)
Calculate the average precision for a single class.
AP summarizes a precision-recall curve as the weighted mean of maximum precisions obtained for any r' > r, where r is the recall:
\[\text{AP} = \sum_n (R_n - R_{n-1}) P_n\]
Note that no approximation is involved since the curve is piecewise constant.
- Parameters
pred (np.ndarray) – The model prediction with shape (N, ).
target (np.ndarray) – The target of each prediction with shape (N, ).
- Returns
A single float as the average precision value.
- Return type
float
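The AP formula above can be worked through in plain Python. This is a minimal sketch under stated assumptions: the name average_precision_sketch is hypothetical, inputs are plain lists rather than np.ndarray, ties keep their input order, and any handling of difficult examples is left out.

```python
def average_precision_sketch(pred, target):
    """AP = sum_n (R_n - R_{n-1}) * P_n over samples ranked by descending score."""
    # Rank sample indices from the highest score to the lowest.
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    total_pos = sum(1 for t in target if t == 1)
    if total_pos == 0:
        return 0.0
    ap, true_pos, prev_recall = 0.0, 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if target[idx] == 1:
            true_pos += 1
        precision = true_pos / rank       # P_n
        recall = true_pos / total_pos     # R_n
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Positives sit at ranks 1 and 3, so AP = 0.5 * 1 + 0.5 * (2 / 3).
ap = average_precision_sketch([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Only ranks where a positive appears change the recall, so only those ranks contribute to the sum.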
- mmcls.core.evaluation.calculate_confusion_matrix(pred, target)
Calculate confusion matrix according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
- Returns
Confusion matrix of shape (C, C), where C is the number of classes.
- Return type
torch.Tensor
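As an illustration of the shapes involved, the sketch below builds a (C, C) matrix from (N, C) scores and (N,) labels using plain lists. The name confusion_matrix_sketch is hypothetical, and the orientation (rows are true labels, columns are predicted labels) is an assumption, since the description above does not state it.

```python
def confusion_matrix_sketch(pred, target):
    """Count (true label, predicted label) pairs into a C x C matrix."""
    num_classes = len(pred[0])
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for scores, label in zip(pred, target):
        # The predicted class is the index of the highest score.
        predicted = max(range(num_classes), key=lambda c: scores[c])
        matrix[label][predicted] += 1  # assumed layout: row = truth, column = prediction
    return matrix


pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
target = [0, 1, 1]
matrix = confusion_matrix_sketch(pred, target)
```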
- mmcls.core.evaluation.f1_score(pred, target, average_mode='macro', thrs=0.0)
Calculate F1 score according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
- Returns
F1 score.
- Return type
float | np.array | list[float | np.array]
Args                      thrs is number    thrs is tuple
average_mode = "macro"    float             list[float]
average_mode = "none"     np.array          list[np.array]
- mmcls.core.evaluation.mAP(pred, target)
Calculate the mean average precision with respect to classes.
- Parameters
pred (torch.Tensor | np.ndarray) – The model prediction with shape (N, C), where C is the number of classes.
target (torch.Tensor | np.ndarray) – The target of each prediction with shape (N, C), where C is the number of classes. 1 stands for positive examples, 0 stands for negative examples and -1 stands for difficult examples.
- Returns
A single float as mAP value.
- Return type
float
- mmcls.core.evaluation.precision(pred, target, average_mode='macro', thrs=0.0)
Calculate precision according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
- Returns
Precision.
- Return type
float | np.array | list[float | np.array]
Args                      thrs is number    thrs is tuple
average_mode = "macro"    float             list[float]
average_mode = "none"     np.array          list[np.array]
- mmcls.core.evaluation.precision_recall_f1(pred, target, average_mode='macro', thrs=0.0)
Calculate precision, recall and F1 score according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
- Returns
A tuple containing precision, recall and F1 score.
The type of precision, recall and F1 score is one of the following:
Args                      thrs is number    thrs is tuple
average_mode = "macro"    float             list[float]
average_mode = "none"     np.array          list[np.array]
- Return type
tuple
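The two averaging modes and the threshold can be illustrated with a small pure-Python sketch. It is not the mmcls implementation: the name precision_recall_f1_sketch is hypothetical, only a single threshold (thr) is supported, inputs are plain lists, and results are fractions in [0, 1] with no claim about how the library scales them.

```python
def precision_recall_f1_sketch(pred, target, average_mode="macro", thr=0.0):
    """Per-class precision, recall and F1 from (N, C) scores and (N,) labels."""
    num_classes = len(pred[0])
    true_pos = [0] * num_classes
    pred_pos = [0] * num_classes
    gt_pos = [0] * num_classes
    for scores, label in zip(pred, target):
        gt_pos[label] += 1
        top = max(range(num_classes), key=lambda c: scores[c])
        if scores[top] >= thr:  # scores under the threshold count as negative
            pred_pos[top] += 1
            if top == label:
                true_pos[top] += 1
    precision = [t / max(p, 1) for t, p in zip(true_pos, pred_pos)]
    recall = [t / max(g, 1) for t, g in zip(true_pos, gt_pos)]
    f1 = [2 * p * r / (p + r) if p + r > 0 else 0.0
          for p, r in zip(precision, recall)]
    if average_mode == "macro":
        # Unweighted mean over classes.
        return tuple(sum(v) / num_classes for v in (precision, recall, f1))
    if average_mode == "none":
        return precision, recall, f1
    raise ValueError(f"Unsupported average_mode: {average_mode}")


pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
target = [0, 1, 1, 1]
macro_p, macro_r, macro_f1 = precision_recall_f1_sketch(pred, target)
per_class = precision_recall_f1_sketch(pred, target, average_mode="none")
```

With average_mode="none" each returned item has one entry per class, which matches the np.array row of the table above; "macro" collapses each to a single float.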
- mmcls.core.evaluation.recall(pred, target, average_mode='macro', thrs=0.0)
Calculate recall according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
- Returns
Recall.
- Return type
float | np.array | list[float | np.array]
Args                      thrs is number    thrs is tuple
average_mode = "macro"    float             list[float]
average_mode = "none"     np.array          list[np.array]
- mmcls.core.evaluation.support(pred, target, average_mode='macro')
Calculate the total number of occurrences of each label according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted sum. Defaults to ‘macro’.
- Returns
Support. If average_mode is set to macro, the function returns a single float. If average_mode is set to none, the function returns a np.array with shape C.
- Return type
float | np.array
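A small sketch of the two return shapes, using plain lists in place of tensors. The name support_sketch is hypothetical, and reading the number of classes from the width of pred is an assumption made for illustration.

```python
def support_sketch(pred, target, average_mode="macro"):
    """Count how often each label occurs in `target`."""
    num_classes = len(pred[0])  # assumed: pred only supplies the class count
    counts = [0] * num_classes
    for label in target:
        counts[label] += 1
    if average_mode == "macro":
        return float(sum(counts))  # unweighted sum over classes
    if average_mode == "none":
        return counts              # one count per class
    raise ValueError(f"Unsupported average_mode: {average_mode}")


pred = [[0.7, 0.2, 0.1]] * 4
target = [0, 1, 1, 1]
per_class_support = support_sketch(pred, target, average_mode="none")
total_support = support_sketch(pred, target)
```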
mmcls.models
models
classifiers
- class mmcls.models.classifiers.BaseClassifier(init_cfg=None)
Base class for classifiers.
- forward(img, return_loss=True, **kwargs)
Calls either forward_train or forward_test depending on whether return_loss=True.
Note that this setting changes the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]); when return_loss=False, img and img_meta should be double-nested (i.e. List[Tensor] and List[List[dict]]), with the outer list indicating test-time augmentations.
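The dispatch described above can be sketched with a toy class hierarchy. The class names BaseClassifierSketch and ToyClassifier are hypothetical, and plain lists stand in for tensors so the example runs without PyTorch.

```python
from abc import ABC, abstractmethod


class BaseClassifierSketch(ABC):
    """Toy stand-in that routes forward() on return_loss."""

    def forward(self, img, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(img, **kwargs)
        return self.forward_test(img, **kwargs)

    @abstractmethod
    def forward_train(self, imgs, **kwargs):
        """Training path: single-nested input."""

    @abstractmethod
    def forward_test(self, imgs, **kwargs):
        """Test path: the outer list indexes test-time augmentations."""


class ToyClassifier(BaseClassifierSketch):
    def forward_train(self, imgs, **kwargs):
        return {"loss": 0.0, "batch": len(imgs)}

    def forward_test(self, imgs, **kwargs):
        # One result per augmentation; here simply the batch size of each.
        return [len(aug) for aug in imgs]


model = ToyClassifier()
train_out = model.forward([1, 2, 3], return_loss=True)     # single-nested batch
test_out = model.forward([[1, 2, 3]], return_loss=False)   # one augmentation
```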
- forward_test(imgs, **kwargs)
- Parameters
imgs (List[Tensor]) – The outer list indicates test-time augmentations, and each inner Tensor should have shape NxCxHxW and contain all images in the batch.
- abstract forward_train(imgs, **kwargs)
- Parameters
imgs (list[Tensor]) – List of tensors of shape (1, C, H, W). Typically these should be mean centered and std scaled.
kwargs (keyword arguments) – Specific to concrete implementation.
- show_result(img, result, text_color='white', font_scale=0.5, row_width=20, show=False, fig_size=(15, 10), win_name='', wait_time=0, out_file=None)
Draw result over img.
- Parameters
img (str or ndarray) – The image to be displayed.
result (dict) – The classification results to draw over img.
text_color (str or tuple or Color) – Color of texts.
font_scale (float) – Font scale of texts.
row_width (int) – Width between each row of results on the image.
show (bool) – Whether to show the image. Default: False.
fig_size (tuple) – Image show figure size. Defaults to (15, 10).
win_name (str) – The window name.
wait_time (int) – How many seconds to display the image. Defaults to 0.
out_file (str or None) – The filename to write the image. Default: None.
- Returns
Image with overlaid results.
- Return type
img (ndarray)
- train_step(data, optimizer=None, **kwargs)
The iteration step during training.
This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, such as GAN, the whole process, including back propagation and optimizer updating, is also defined in this method.
- Parameters
data (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict, optional) – The optimizer of the runner is passed to train_step(). This argument is unused and reserved.
- Returns
Dict of outputs containing the following fields:
loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.
log_vars (dict): Dict containing all the variables to be sent to the logger.
num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.
- Return type
dict
- val_step(data, optimizer=None, **kwargs)
The iteration step during validation.
This method shares the same signature as train_step(), but is used during validation epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.
- Parameters
data (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict, optional) – The optimizer of the runner is passed to train_step(). This argument is unused and reserved.
- Returns
Dict of outputs containing the following fields:
loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.
log_vars (dict): Dict containing all the variables to be sent to the logger.
num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.
- Return type
dict
- class mmcls.models.classifiers.ImageClassifier(backbone, neck=None, head=None, pretrained=None, train_cfg=None, init_cfg=None)
- forward_train(img, gt_label, **kwargs)
Forward computation during training.
- Parameters
img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.
gt_label (Tensor) – Ground-truth labels of the input images. It should be of shape (N, 1) for a single-label task and of shape (N, C) for a multi-label task.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
backbones
- class mmcls.models.backbones.AlexNet(num_classes=-1)
AlexNet backbone.
The input for AlexNet is a 224x224 RGB image.
- Parameters
num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
- class mmcls.models.backbones.LeNet5(num_classes=-1)
LeNet5 backbone.
The input for LeNet-5 is a 32×32 grayscale image.
- Parameters
num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
- class mmcls.models.backbones.MlpMixer(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_cfg={}, layer_cfgs={}, init_cfg=None)
Mlp-Mixer backbone.
A PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision.
- Parameters
arch (str | dict) – MLP Mixer architecture. Defaults to 'b'.
img_size (int | tuple) – Input image size.
patch_size (int | tuple) – The patch size.
out_indices (Sequence | int) – Output from which layer. Defaults to -1, which means the last layer.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
act_cfg (dict) – The activation config for FFNs. Defaults to GELU.
patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.
layer_cfgs (Sequence | dict) – Configs of each mixer block layer. Defaults to an empty dict.
init_cfg (dict, optional) – Initialization config dict. Defaults to None.
- class mmcls.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])
MobileNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU6').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)
Forward computation.
- Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- make_layer(out_channels, num_blocks, stride, expand_ratio)
Stack InvertedResidual blocks to build a layer for MobileNetV2.
- Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.
- class mmcls.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, out_indices=None, frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d'], 'nonlinearity': 'leaky_relu'}, {'type': 'Normal', 'layer': ['Linear'], 'std': 0.01}, {'type': 'Constant', 'layer': ['BatchNorm2d'], 'val': 1}])
MobileNetV3 backbone.
- Parameters
arch (str) – Architecture of MobileNetV3, from {small, large}. Default: small.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN', eps=0.001, momentum=0.01).
out_indices (None or Sequence[int]) – Output from which stages. Default: None, which means output tensors from final stage.
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- class mmcls.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=None)
RegNet backbone.
More details can be found in the paper.
- Parameters
arch (dict) – The parameter of RegNets. It should contain the following keys:
w0 (int): Initial width.
wa (float): Slope of width.
wm (float): Quantization parameter to quantize the width.
depth (int): Depth of the backbone.
group_w (int): Width of group.
bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Default: 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN', requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Example
>>> from mmcls.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
- adjust_width_group(widths, bottleneck_ratio, groups)
Adjusts the compatibility of widths and groups.
- Parameters
widths (list[int]) – Width of each stage.
bottleneck_ratio (float) – Bottleneck ratio.
groups (int) – number of groups in each stage
- Returns
The adjusted widths and groups of each stage.
- Return type
tuple(list)
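One way to make widths and groups compatible is to round each stage's bottleneck width to a multiple of its group width. The sketch below is an illustration under assumptions, not the library code: the names quantize and adjust_width_group_sketch are hypothetical, and all three arguments are taken as per-stage lists.

```python
def quantize(number, divisor):
    """Round `number` to the nearest multiple of `divisor`."""
    return int(round(number / divisor) * divisor)


def adjust_width_group_sketch(widths, bottleneck_ratios, groups):
    """Make each stage width divisible by its group width (per-stage lists assumed)."""
    bottleneck_widths = [int(w * b) for w, b in zip(widths, bottleneck_ratios)]
    # A group cannot be wider than the bottleneck it splits.
    groups = [min(g, wb) for g, wb in zip(groups, bottleneck_widths)]
    bottleneck_widths = [quantize(wb, g) for wb, g in zip(bottleneck_widths, groups)]
    widths = [int(wb / b) for wb, b in zip(bottleneck_widths, bottleneck_ratios)]
    return widths, groups


# Stage widths chosen for illustration, with group_w=48 and bot_mul=1.0.
adjusted_widths, adjusted_groups = adjust_width_group_sketch(
    [88, 200, 448, 1000], [1.0] * 4, [48] * 4)
```

With these inputs the adjusted widths come out as 96, 192, 432 and 1008, the channel counts printed in the RegNet example above.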
- forward(x)
Forward computation.
- Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)
Generates per block width from RegNet parameters.
- Parameters
initial_width ([int]) – Initial width of the backbone
width_slope ([float]) – Slope of the quantized linear function
width_parameter ([int]) – Parameter used to quantize the width.
depth ([int]) – Depth of the backbone.
divisor (int) – The divisor of channels. Defaults to 8.
- Returns
A tuple containing:
list: Widths of each stage.
int: The number of stages.
- Return type
tuple
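The quantized linear width rule can be sketched in plain Python. This is an illustration under assumptions rather than the library code: the name generate_regnet_sketch is hypothetical, the returned list holds one width per block, and the number of stages is taken to be the number of distinct widths.

```python
import math


def generate_regnet_sketch(initial_width, width_slope, width_parameter, depth, divisor=8):
    """Per-block widths from a linear rule, snapped to powers of width_parameter."""
    widths = []
    for block in range(depth):
        # Continuous width grows linearly with the block index.
        continuous = initial_width + width_slope * block
        # Snap to the nearest power of width_parameter, relative to initial_width.
        exponent = round(math.log(continuous / initial_width) / math.log(width_parameter))
        quantized = initial_width * width_parameter ** exponent
        # Round to a multiple of divisor so channel counts stay hardware friendly.
        widths.append(int(round(quantized / divisor) * divisor))
    num_stages = len(set(widths))  # assumed: one stage per distinct width
    return widths, num_stages


# w0=88, wa=26.31, wm=2.25, depth=25, as in the RegNet example above.
block_widths, num_stages = generate_regnet_sketch(88, 26.31, 2.25, 25)
```

For these parameters the distinct widths are 88, 200, 448 and 1000, giving four stages; a later adjustment to the group width would then change them further.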
- class mmcls.models.backbones.RepVGG(arch, in_channels=3, base_channels=64, out_indices=(3,), strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False, deploy=False, norm_eval=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])
RepVGG backbone.
A PyTorch implementation of RepVGG: Making VGG-style ConvNets Great Again.
- Parameters
arch (str | dict) – The parameter of RepVGG. If it is a dict, it should contain the following keys:
num_blocks (Sequence[int]): Number of blocks in each stage.
width_factor (Sequence[float]): Width deflator in each stage.
group_layer_map (dict | None): RepVGG Block that declares the need to apply group convolution.
se_cfg (dict | None): SE layer config.
in_channels (int) – Number of input image channels. Default: 3.
base_channels (int) – Base channels of RepVGG backbone, work with width_factor together. Default: 64.
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
strides (Sequence[int]) – Strides of the first block of each stage. Default: (2, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
deploy (bool) – Whether to switch the model structure to deployment mode. Default: False.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict.
- class mmcls.models.backbones.Res2Net(scales=4, base_width=26, style='pytorch', deep_stem=True, avg_down=True, init_cfg=None, **kwargs)
Res2Net backbone.
A PyTorch implementation of Res2Net: A New Multi-scale Backbone Architecture.
- Parameters
depth (int) – Depth of Res2Net, choose from {50, 101, 152}.
scales (int) – Scales used in Res2Net. Defaults to 4.
base_width (int) – Basic width of each scale. Defaults to 26.
in_channels (int) – Number of input image channels. Defaults to 3.
num_stages (int) – Number of Res2Net stages. Defaults to 4.
strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. Defaults to (3, ).
style (str) – “pytorch” or “caffe”. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Defaults to “pytorch”.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to True.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottle2neck. Defaults to True.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmcls.models import Res2Net
>>> import torch
>>> model = Res2Net(depth=50,
...                 scales=4,
...                 base_width=26,
...                 out_indices=(0, 1, 2, 3))
>>> model.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = model.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
- class mmcls.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)
ResNeSt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152, 200}.
groups (int) – Groups of conv2 in Bottleneck. Default: 1.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
radix (int) – Radix of SplitAttentionConv2d. Default: 2.
reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- class mmcls.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)
ResNeXt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- class mmcls.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])
ResNet backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Example
>>> from mmcls.models import ResNet
>>> import torch
>>> # Request all four stages; the default out_indices=(3, ) yields only the last one.
>>> self = ResNet(depth=18, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- class mmcls.models.backbones.ResNetV1d(**kwargs)[source]
ResNetV1d backbone.
This variant is described in Bag of Tricks.
Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In the downsampling block, a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.
- class mmcls.models.backbones.ResNet_CIFAR(depth, deep_stem=False, **kwargs)[source]
ResNet backbone for CIFAR.
Compared to the standard ResNet, it uses kernel_size=3 and stride=1 in conv1, and does not apply MaxPooling after the stem. It has been shown to be more efficient than the standard ResNet in other public codebases, e.g., https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py.
- Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – This network has a specifically designed stem, so this is asserted to be False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- class mmcls.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]
SEResNeXt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- class mmcls.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]
SEResNet backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Example
>>> from mmcls.models import SEResNet
>>> import torch
>>> # Request all four stages; the default out_indices=(3, ) yields only the last one.
>>> self = SEResNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
- class mmcls.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=None)[source]
ShuffleNetV1 backbone.
- Parameters
groups (int) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.
widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, )
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)[source]
Forward computation.
- Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- make_layer(out_channels, num_blocks, first_block=False)[source]
Stack ShuffleUnit blocks to make a layer.
- Parameters
out_channels (int) – out_channels of the block.
num_blocks (int) – Number of blocks.
first_block (bool) – Whether this is the first ShuffleUnit in a sequence of ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.
- class mmcls.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=None)[source]
ShuffleNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- class mmcls.models.backbones.SwinTransformer(arch='T', img_size=224, in_channels=3, drop_rate=0.0, drop_path_rate=0.1, out_indices=(3, ), use_abs_pos_embed=False, auto_pad=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, patch_cfg={}, init_cfg=None)[source]
Swin Transformer.
A PyTorch implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Inspired by https://github.com/microsoft/Swin-Transformer
- Parameters
arch (str | dict) – Swin Transformer architecture. Defaults to 'T'.
img_size (int | tuple) – The size of input image. Defaults to 224.
in_channels (int) – The num of input channels. Defaults to 3.
drop_rate (float) – Dropout rate after embedding. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.
auto_pad (bool) – If True, auto pad feature map to fit window_size. Defaults to False.
norm_cfg (dict, optional) – Config dict for the normalization layer at the end of the backbone. Defaults to dict(type='LN').
stage_cfgs (Sequence | dict, optional) – Extra config dict for each stage. Defaults to empty dict.
patch_cfg (dict, optional) – Extra config dict for patch embedding. Defaults to empty dict.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
Examples
>>> from mmcls.models import SwinTransformer
>>> import torch
>>> extra_config = dict(
...     arch='tiny',
...     stage_cfgs=dict(downsample_cfg={'kernel_size': 3,
...                                     'expansion_ratio': 3}),
...     auto_pad=True)
>>> self = SwinTransformer(**extra_config)
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(output.shape)
(1, 2592, 4)
- class mmcls.models.backbones.T2T_ViT(img_size=224, in_channels=3, embed_dims=384, t2t_cfg={}, drop_rate=0.0, num_layers=14, out_indices=-1, layer_cfgs={}, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, final_norm=True, output_cls_token=True, init_cfg=None)[source]
Tokens-to-Token Vision Transformer (T2T-ViT)
A PyTorch implementation of: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet (https://arxiv.org/abs/2101.11986)
- Parameters
img_size (int) – Input image size.
in_channels (int) – Number of input channels.
embed_dims (int) – Embedding dimension.
t2t_cfg (dict) – Extra config of Tokens-to-Token module. Defaults to an empty dict.
drop_rate (float) – Dropout rate after position embedding. Defaults to 0.
num_layers (int) – Num of transformer layers in encoder. Defaults to 14.
out_indices (Sequence | int) – Output from which stages. Defaults to -1, which means the last stage.
layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.
output_cls_token (bool) – Whether to output the cls_token. Defaults to True.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- class mmcls.models.backbones.TIMMBackbone(model_name, pretrained=False, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]
Wrapper to use backbones from the timm library. More details can be found in timm.
- Parameters
model_name (str) – Name of timm model to instantiate.
pretrained (bool) – Load pretrained weights if True.
checkpoint_path (str) – Path of checkpoint to load after model is initialized.
in_channels (int) – Number of input image channels. Default: 3.
init_cfg (dict, optional) – Initialization config dict
**kwargs – Other timm & model specific arguments.
- class mmcls.models.backbones.TNT(arch='b', img_size=224, patch_size=16, in_channels=3, ffn_ratio=4, qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, first_stride=4, num_fcs=2, init_cfg=[{'type': 'TruncNormal', 'layer': 'Linear', 'std': 0.02}, {'type': 'Constant', 'layer': 'LayerNorm', 'val': 1.0, 'bias': 0.0}])[source]
Transformer in Transformer.
A PyTorch implementation of: Transformer in Transformer
Inspired by https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tnt.py
- Parameters
arch (str | dict) – Vision Transformer architecture. Default: 'b'.
img_size (int | tuple) – Input image size. Defaults to 224.
patch_size (int | tuple) – The patch size. Defaults to 16.
in_channels (int) – Number of input channels. Defaults to 3.
ffn_ratio (int) – A ratio to calculate the hidden_dims in the FFN layer. Default: 4.
qkv_bias (bool) – Enable bias for qkv if True. Default: False.
drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Default: 0.
attn_drop_rate (float) – The dropout rate for the attention layer. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Default: 0.
act_cfg (dict) – The activation config for FFNs. Defaults to GELU.
norm_cfg (dict) – Config dict for normalization layer. Defaults to layer normalization.
first_stride (int) – The stride of the conv2d layer. A conv2d layer and an unfold layer are used to implement the image-to-pixel embedding.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
init_cfg (dict, optional) – Initialization config dict.
- class mmcls.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1.0, 'layer': ['_BatchNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]
VGG backbone.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_norm (bool) – Use BatchNorm or not.
num_classes (int) – number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int], optional) – Output from which stages. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), output the last feature map before classifier. If num_classes > 0, the default value is (5, ), output the classification score. Default: None.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.
with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
- class mmcls.models.backbones.VisionTransformer(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', patch_cfg={}, layer_cfgs={}, init_cfg=None)[source]
Vision Transformer.
A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (https://arxiv.org/abs/2010.11929)
- Parameters
arch (str | dict) – Vision Transformer architecture. Default: 'b'.
img_size (int | tuple) – Input image size
patch_size (int | tuple) – The patch size
out_indices (Sequence | int) – Output from which stages. Defaults to -1, which means the last stage.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.
output_cls_token (bool) – Whether to output the cls_token. If set True, with_cls_token must be True. Defaults to True.
interpolate_mode (str) – The interpolation mode used to resize the position embedding vector. Defaults to "bicubic".
patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.
layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.
init_cfg (dict, optional) – Initialization config dict. Defaults to None.
- forward(x)[source]
Forward computation.
- Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- static resize_pos_embed(pos_embed, src_shape, dst_shape, mode='bicubic')[source]
Resize pos_embed weights.
- Parameters
pos_embed (torch.Tensor) – Position embedding weights with shape [1, L, C].
src_shape (tuple) – The resolution of downsampled origin training image.
dst_shape (tuple) – The resolution of downsampled new training image.
mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'bicubic'.
- Returns
The resized pos_embed of shape [1, L_new, C].
- Return type
torch.Tensor
heads
- class mmcls.models.heads.ClsHead(loss={'loss_weight': 1.0, 'type': 'CrossEntropyLoss'}, topk=(1, ), cal_acc=False, init_cfg=None)[source]
Classification head.
- Parameters
loss (dict) – Config of classification loss.
topk (int | tuple) – Top-k accuracy.
cal_acc (bool) – Whether to calculate accuracy during training. If you use Mixup/CutMix or something like that during training, it is not reasonable to calculate accuracy. Defaults to False.
- class mmcls.models.heads.LinearClsHead(num_classes, in_channels, init_cfg={'layer': 'Linear', 'std': 0.01, 'type': 'Normal'}, *args, **kwargs)[source]
Linear classifier head.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
init_cfg (dict, optional) – The extra init config of layers. Defaults to dict(type='Normal', layer='Linear', std=0.01).
- class mmcls.models.heads.MultiLabelClsHead(loss={'loss_weight': 1.0, 'reduction': 'mean', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg=None)[source]
Classification head for multilabel task.
- Parameters
loss (dict) – Config of classification loss.
- class mmcls.models.heads.MultiLabelLinearClsHead(num_classes, in_channels, loss={'loss_weight': 1.0, 'reduction': 'mean', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg={'layer': 'Linear', 'std': 0.01, 'type': 'Normal'})[source]
Linear classification head for multilabel task.
- Parameters
num_classes (int) – Number of categories.
in_channels (int) – Number of channels in the input feature map.
loss (dict) – Config of classification loss.
init_cfg (dict, optional) – The extra init config of layers. Defaults to dict(type='Normal', layer='Linear', std=0.01).
- class mmcls.models.heads.StackedLinearClsHead(num_classes: int, in_channels: int, mid_channels: Sequence, dropout_rate: float = 0.0, norm_cfg: Optional[Dict] = None, act_cfg: Dict = {'type': 'ReLU'}, **kwargs)[source]
Classifier head with several hidden fc layers and an output fc layer.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
mid_channels (Sequence) – Number of channels in the hidden fc layers.
dropout_rate (float) – Dropout rate after each hidden fc layer, except the last layer. Defaults to 0.
norm_cfg (dict, optional) – Config dict of normalization layer after each hidden fc layer, except the last layer. Defaults to None.
act_cfg (dict, optional) – Config dict of activation function after each hidden layer, except the last layer. Defaults to use “ReLU”.
- class mmcls.models.heads.VisionTransformerClsHead(num_classes, in_channels, hidden_dim=None, act_cfg={'type': 'Tanh'}, init_cfg={'layer': 'Linear', 'type': 'Constant', 'val': 0}, *args, **kwargs)[source]
Vision Transformer classifier head.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
hidden_dim (int) – Number of the dimensions for hidden layer. Only available during pre-training. Default None.
act_cfg (dict) – The activation config. Only available during pre-training. Defaults to Tanh.
necks
- class mmcls.models.necks.GlobalAveragePooling(dim=2)[source]
Global Average Pooling neck.
Note that we use view to remove the extra dimensions after pooling. We do not use squeeze, as it would also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.
- Parameters
dim (int) – Dimensions of each sample channel, can be one of {1, 2, 3}. Default: 2
- forward(inputs)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
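The pooling described for GlobalAveragePooling with dim=2 can be illustrated without any framework. The sketch below is a minimal pure-Python illustration, assuming a nested-list input shaped (N, C, H, W); the helper name global_average_pool_2d is hypothetical and is not part of mmcls.

```python
def global_average_pool_2d(feature_map):
    """Average every channel over its spatial positions.

    feature_map is a nested list shaped (N, C, H, W); the result is shaped
    (N, C). The batch dimension is kept even when N == 1, which is the
    behaviour the neck preserves by using view instead of squeeze.
    """
    pooled = []
    for sample in feature_map:
        row = []
        for channel in sample:
            # Flatten the H x W grid of this channel and take its mean.
            values = [v for line in channel for v in line]
            row.append(sum(values) / len(values))
        pooled.append(row)
    return pooled


# One sample, two channels, 2x2 spatial grid.
batch = [[[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]]
pooled = global_average_pool_2d(batch)
```

Here pooled keeps one row per sample, so a batch of size 1 still yields a two-dimensional result.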
losses
- class mmcls.models.losses.AsymmetricLoss(gamma_pos=0.0, gamma_neg=4.0, clip=0.05, reduction='mean', loss_weight=1.0)[source]
Asymmetric loss.
- Parameters
gamma_pos (float) – Positive focusing parameter. Defaults to 0.0.
gamma_neg (float) – Negative focusing parameter. We usually set gamma_neg > gamma_pos. Defaults to 4.0.
clip (float, optional) – Probability margin. Defaults to 0.05.
reduction (str) – The method used to reduce the loss into a scalar.
loss_weight (float) – Weight of loss. Defaults to 1.0.
- class mmcls.models.losses.CrossEntropyLoss(use_sigmoid=False, use_soft=False, reduction='mean', loss_weight=1.0, class_weight=None, pos_weight=None)[source]
Cross entropy loss.
- Parameters
use_sigmoid (bool) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.
use_soft (bool) – Whether to use the soft version of CrossEntropyLoss. Defaults to False.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
loss_weight (float) – Weight of the loss. Defaults to 1.0.
class_weight (List[float], optional) – The weight for each class with shape (C), C is the number of classes. Default None.
pos_weight (List[float], optional) – The positive weight for each class with shape (C), C is the number of classes. Only enabled in BCE loss when use_sigmoid is True. Default None.
- forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcls.models.losses.FocalLoss(gamma=2.0, alpha=0.25, reduction='mean', loss_weight=1.0)[source]
Focal loss.
- Parameters
gamma (float) – Focusing parameter in focal loss. Defaults to 2.0.
alpha (float) – The parameter in balanced form of focal loss. Defaults to 0.25.
reduction (str) – The method used to reduce the loss into a scalar. Options are “none” and “mean”. Defaults to ‘mean’.
loss_weight (float) – Weight of loss. Defaults to 1.0.
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]
Sigmoid focal loss.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
target (torch.Tensor) – The ground truth label of the prediction with shape (N, *), N or (N,1).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.
- Returns
Loss.
- Return type
torch.Tensor
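To make the roles of gamma and alpha concrete, the sketch below evaluates the standard sigmoid focal loss for a single logit in plain Python. It follows the commonly published formula; the function name focal_loss_scalar is hypothetical, and the library's tensor implementation may differ in details such as weighting and reduction.

```python
import math


def focal_loss_scalar(logit, target, gamma=2.0, alpha=0.25):
    """Sigmoid focal loss for one logit and a binary target (0 or 1)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    # p_t is the probability assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    # alpha balances positives against negatives.
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


uncertain = focal_loss_scalar(0.0, 1)   # p = 0.5
easy = focal_loss_scalar(4.0, 1)        # confident and correct
hard = focal_loss_scalar(-4.0, 1)       # confident and wrong
```

With gamma set to 0 the modulating factor disappears and the expression reduces to an alpha-weighted binary cross entropy.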
- class mmcls.models.losses.LabelSmoothLoss(label_smooth_val, num_classes=None, mode=None, reduction='mean', loss_weight=1.0)[source]
Initializer for the label smoothed cross entropy loss.
Refer to Rethinking the Inception Architecture for Computer Vision.
This decreases the gap between output scores and encourages generalization. Labels provided to forward can be one-hot like vectors (NxC) or class indices (Nx1). It also accepts linear combinations of one-hot like labels from mixup or cutmix, except in the multi-label task.
- Parameters
label_smooth_val (float) – The degree of label smoothing.
num_classes (int, optional) – Number of classes. Defaults to None.
mode (str) – The smoothing method; see the notes for details. Options are 'original', 'classy_vision' and 'multi_label'. Defaults to 'classy_vision'.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
loss_weight (float) – Weight of the loss. Defaults to 1.0.
Note
If the mode is "original", this uses the same label smoothing method as the original paper:
\[(1-\epsilon)\delta_{k, y} + \frac{\epsilon}{K}\]
where \(\epsilon\) is the label_smooth_val, \(K\) is the num_classes and \(\delta_{k, y}\) is the Kronecker delta, which equals 1 for k = y and 0 otherwise.
If the mode is "classy_vision", this uses the same label smoothing method as the facebookresearch/ClassyVision repo:
\[\frac{\delta_{k, y} + \epsilon/K}{1+\epsilon}\]
If the mode is "multi_label", this accepts labels from a multi-label task and smooths them as:
\[(1-2\epsilon)\delta_{k, y} + \epsilon\]
- forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]
Label smooth loss.
- Parameters
cls_score (torch.Tensor) – The prediction with shape (N, *).
label (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.
- Returns
Loss.
- Return type
torch.Tensor
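The three smoothing formulas given for LabelSmoothLoss can be checked numerically. The sketch below applies each formula to a single class index in plain Python; the helper smooth_one_hot is hypothetical and only mirrors the formulas in the note, not the library's tensor code.

```python
def smooth_one_hot(label, num_classes, eps, mode="original"):
    """Return the smoothed target vector for one class index."""
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    if mode == "original":
        # (1 - eps) * delta + eps / K
        return [(1 - eps) * d + eps / num_classes for d in one_hot]
    if mode == "classy_vision":
        # (delta + eps / K) / (1 + eps)
        return [(d + eps / num_classes) / (1 + eps) for d in one_hot]
    if mode == "multi_label":
        # (1 - 2 * eps) * delta + eps
        return [(1 - 2 * eps) * d + eps for d in one_hot]
    raise ValueError(f"unknown mode: {mode}")


original = smooth_one_hot(1, num_classes=4, eps=0.1, mode="original")
classy = smooth_one_hot(1, num_classes=4, eps=0.1, mode="classy_vision")
multi = smooth_one_hot(1, num_classes=4, eps=0.1, mode="multi_label")
```

The "original" and "classy_vision" targets each sum to 1, while the "multi_label" targets are smoothed per class and need not sum to 1.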
- class mmcls.models.losses.SeesawLoss(use_sigmoid=False, p=0.8, q=2.0, num_classes=1000, eps=0.01, reduction='mean', loss_weight=1.0)[source]
Implementation of seesaw loss.
Refer to Seesaw Loss for Long-Tailed Instance Segmentation (CVPR 2021).
- Parameters
use_sigmoid (bool) – Whether the prediction uses sigmoid instead of softmax. Only False is supported. Defaults to False.
p (float) – The p in the mitigation factor. Defaults to 0.8.
q (float) – The q in the compensation factor. Defaults to 2.0.
num_classes (int) – The number of classes. Defaults to 1000 for the ImageNet dataset.
eps (float) – The minimal value of the divisor, used to smooth the computation of the compensation factor. Defaults to 1e-2.
reduction (str) – The method that reduces the loss to a scalar. Options are "none", "mean" and "sum". Defaults to "mean".
loss_weight (float) – The weight of the loss. Defaults to 1.0.
- forward(cls_score, labels, weight=None, avg_factor=None, reduction_override=None)[source]
Forward function.
- Parameters
cls_score (torch.Tensor) – The prediction with shape (N, C).
labels (torch.Tensor) – The learning label of the prediction.
weight (torch.Tensor, optional) – Sample-wise loss weight.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss. Options are "none", "mean" and "sum".
- Returns
The calculated loss.
- Return type
torch.Tensor
- mmcls.models.losses.accuracy(pred, target, topk=1, thrs=0.0)[source]
Calculate accuracy according to the prediction and target.
- Parameters
pred (torch.Tensor | np.array) – The model prediction.
target (torch.Tensor | np.array) – The target of each prediction.
topk (int | tuple[int]) – If the predictions in topk match the target, the predictions will be regarded as correct ones. Defaults to 1.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
- Returns
Accuracy.
float: If both topk and thrs are single values.
list[float]: If one of topk or thrs is a tuple.
list[list[float]]: If both topk and thrs are tuples. The first dim is topk and the second dim is thrs.
- Return type
float | list[float] | list[list[float]]
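The interaction between topk and thrs can be sketched in plain Python for a single (k, thr) pair. The helper topk_accuracy is hypothetical; reporting the result as a percentage and using a strict "greater than" comparison against the threshold are assumptions of this sketch rather than facts stated above.

```python
def topk_accuracy(scores, targets, k=1, thr=0.0):
    """Percentage of samples whose target is among the k highest scores.

    A hit only counts when the target's score is above thr, so low-confidence
    predictions are treated as negative.
    """
    correct = 0
    for row, target in zip(scores, targets):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if target in ranked and row[target] > thr:
            correct += 1
    return 100.0 * correct / len(targets)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 2]
top1 = topk_accuracy(scores, targets, k=1)
top2 = topk_accuracy(scores, targets, k=2)
top1_strict = topk_accuracy(scores, targets, k=1, thr=0.6)
```

Raising thr can only lower the accuracy, while raising k can only increase it.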
- mmcls.models.losses.asymmetric_loss(pred, target, weight=None, gamma_pos=1.0, gamma_neg=4.0, clip=0.05, reduction='mean', avg_factor=None)[source]
Asymmetric loss.
Please refer to the paper for details.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
target (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, ). Defaults to None.
gamma_pos (float) – Positive focusing parameter. Defaults to 1.0.
gamma_neg (float) – Negative focusing parameter. We usually set gamma_neg > gamma_pos. Defaults to 4.0.
clip (float, optional) – Probability margin. Defaults to 0.05.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’ , loss is same shape as pred and label. Defaults to ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
- Returns
Loss.
- Return type
torch.Tensor
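The effect of gamma_pos, gamma_neg and clip can be seen on a single logit. The sketch below follows the commonly published form of asymmetric loss, in which negatives have their probability shifted by clip before focusing is applied; the function name asymmetric_loss_scalar and the eps guard are assumptions of this sketch, not the library's code.

```python
import math


def asymmetric_loss_scalar(logit, target, gamma_pos=1.0, gamma_neg=4.0,
                           clip=0.05, eps=1e-8):
    """Asymmetric loss for one logit and a binary target (0 or 1)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    if target == 1:
        p_t = p
        gamma = gamma_pos
    else:
        # Shift the negative probability by the margin; very easy negatives
        # (p <= clip) reach p_t = 1 and contribute no loss at all.
        p_t = min(1.0 - p + clip, 1.0)
        gamma = gamma_neg
    # eps keeps the logarithm finite when p_t underflows to zero.
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))


positive = asymmetric_loss_scalar(0.0, 1)        # p = 0.5, positive label
easy_negative = asymmetric_loss_scalar(-5.0, 0)  # p below clip
hard_negative = asymmetric_loss_scalar(0.0, 0)   # p = 0.5, negative label
```

Setting gamma_neg above gamma_pos, as the parameter description suggests, suppresses easy negatives more strongly than easy positives.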
- mmcls.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, pos_weight=None)[source]
Calculate the binary CrossEntropy loss with logits.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
label (torch.Tensor) – The gt label with shape (N, *).
weight (torch.Tensor, optional) – Element-wise weight of loss with shape (N, ). Defaults to None.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’ , loss is same shape as pred and label. Defaults to ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (torch.Tensor, optional) – The weight for each class with shape (C), C is the number of classes. Default None.
pos_weight (torch.Tensor, optional) – The positive weight for each class with shape (C), C is the number of classes. Default None.
- 返回
The calculated loss
- 返回类型
torch.Tensor
- mmcls.models.losses.convert_to_one_hot(targets: torch.Tensor, classes) → torch.Tensor[源代码]¶
This function converts target class indices to one-hot vectors, given the number of classes.
- 参数
targets (Tensor) – The ground truth label of the prediction with shape (N, 1)
classes (int) – the number of classes.
- 返回
The one-hot encoded targets.
- 返回类型
Tensor
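The conversion described above can be illustrated without any tensor library. The following is a minimal pure-Python sketch, not the library implementation: the helper name `one_hot` and the use of nested lists in place of an (N, 1) tensor are assumptions made for illustration.

```python
def one_hot(targets, classes):
    """Convert class indices to one-hot rows (pure-Python illustration).

    `targets` is a list of N single-element lists, mirroring the (N, 1)
    shape in the description above. `classes` is the number of classes.
    """
    rows = []
    for (index,) in targets:
        if not 0 <= index < classes:
            raise ValueError(f"class index {index} out of range [0, {classes})")
        row = [0] * classes
        row[index] = 1  # mark the position of the target class
        rows.append(row)
    return rows


encoded = one_hot([[2], [0], [1]], classes=3)
print(encoded)
```

Each output row has length `classes` and contains a single 1 at the position of the class index.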
- mmcls.models.losses.cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None)[源代码]¶
Calculate the CrossEntropy loss.
- 参数
pred (torch.Tensor) – The prediction with shape (N, C), C is the number of classes.
label (torch.Tensor) – The gt label of the prediction.
weight (torch.Tensor, optional) – Sample-wise loss weight.
reduction (str) – The method used to reduce the loss.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (torch.Tensor, optional) – The weight for each class with shape (C), C is the number of classes. Default None.
- 返回
The calculated loss
- 返回类型
torch.Tensor
- mmcls.models.losses.reduce_loss(loss, reduction)[源代码]¶
Reduce loss as specified.
- 参数
loss (Tensor) – Elementwise loss tensor.
reduction (str) – Options are “none”, “mean” and “sum”.
- 返回
Reduced loss tensor.
- 返回类型
Tensor
- mmcls.models.losses.sigmoid_focal_loss(pred, target, weight=None, gamma=2.0, alpha=0.25, reduction='mean', avg_factor=None)[源代码]¶
Sigmoid focal loss.
- 参数
pred (torch.Tensor) – The prediction with shape (N, *).
target (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, ). Defaults to None.
gamma (float) – The gamma for calculating the modulating factor. Defaults to 2.0.
alpha (float) – A balanced form for Focal Loss. Defaults to 0.25.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’ , loss is same shape as pred and label. Defaults to ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
- 返回
Loss.
- 返回类型
torch.Tensor
- mmcls.models.losses.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[源代码]¶
Apply element-wise weight and reduce loss.
- 参数
loss (Tensor) – Element-wise loss.
weight (Tensor) – Element-wise weights.
reduction (str) – Same as built-in losses of PyTorch.
avg_factor (float) – Average factor when computing the mean of losses.
- 返回
Processed loss values.
- 返回类型
Tensor
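To make the interaction of weight, reduction and avg_factor concrete, here is a small pure-Python sketch operating on lists instead of tensors. The helper name `weight_reduce` is hypothetical, and the handling of avg_factor (it replaces the element count for "mean" and is rejected for "sum") is an assumption about the intended semantics rather than a statement of the library's code. The input numbers are taken from the weighted_loss example in this section.

```python
def weight_reduce(losses, weight=None, reduction="mean", avg_factor=None):
    """Apply element-wise weights, then reduce (illustrative sketch only)."""
    if weight is not None:
        losses = [l * w for l, w in zip(losses, weight)]
    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        if avg_factor is not None:
            # Assumption: a custom averaging factor makes no sense with "sum".
            raise ValueError('avg_factor cannot be used with reduction="sum"')
        return total
    if reduction == "mean":
        # Assumption: avg_factor, when given, replaces the element count.
        denom = avg_factor if avg_factor is not None else len(losses)
        return total / denom
    raise ValueError(f"unknown reduction: {reduction}")


# |pred - target| for pred = [0, 2, 3] and target = [1, 1, 1]
elementwise = [1.0, 1.0, 2.0]
weight = [1.0, 0.0, 1.0]

print(weight_reduce(elementwise))                       # plain mean
print(weight_reduce(elementwise, weight))               # weighted mean
print(weight_reduce(elementwise, weight, avg_factor=2)) # weighted sum / 2
```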
- mmcls.models.losses.weighted_loss(loss_func)[源代码]¶
Create a weighted version of a given loss function.
To use this decorator, the loss function must have a signature like loss_func(pred, target, **kwargs). The function only needs to compute element-wise loss without any reduction. This decorator will add weight and reduction arguments to the function. The decorated function will have a signature like loss_func(pred, target, weight=None, reduction='mean', avg_factor=None, **kwargs).
- Example
>>> import torch
>>> @weighted_loss
>>> def l1_loss(pred, target):
>>>     return (pred - target).abs()
>>> pred = torch.Tensor([0, 2, 3])
>>> target = torch.Tensor([1, 1, 1])
>>> weight = torch.Tensor([1, 0, 1])
>>> l1_loss(pred, target)
tensor(1.3333)
>>> l1_loss(pred, target, weight)
tensor(1.)
>>> l1_loss(pred, target, reduction='none')
tensor([1., 1., 2.])
>>> l1_loss(pred, target, weight, avg_factor=2)
tensor(1.5000)
utils¶
- class mmcls.models.utils.Augments(augments_cfg)[源代码]¶
Data augments.
We implement some data augmentation methods, such as mixup, cutmix.
- 参数
augments_cfg (list[mmcv.ConfigDict] | mmcv.ConfigDict) – Config dict of augments.
示例
>>> augments_cfg = [
        dict(type='BatchCutMix', alpha=1., num_classes=10, prob=0.5),
        dict(type='BatchMixup', alpha=1., num_classes=10, prob=0.3)
    ]
>>> augments = Augments(augments_cfg)
>>> imgs = torch.randn(16, 3, 32, 32)
>>> label = torch.randint(0, 10, (16, ))
>>> imgs, label = augments(imgs, label)
To decide which augmentation within the Augments block is used, the following rule is applied: we pick one augmentation at random according to the probabilities. In the example above, BatchCutMix is used with probability 0.5 and BatchMixup with probability 0.3. As Identity is not in augments_cfg, Identity is used with probability 1 - 0.5 - 0.3 = 0.2.
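The selection rule can be sketched with the standard library alone. This is an illustration under stated assumptions, not the class's actual code: the helper names `build_choice_table` and `pick_augment` are hypothetical, and only the `type` and `prob` keys of each config are used.

```python
import random


def build_choice_table(augments_cfg):
    """Return (names, probs), appending Identity with the leftover probability."""
    names = [cfg["type"] for cfg in augments_cfg]
    probs = [cfg["prob"] for cfg in augments_cfg]
    total = sum(probs)
    if total > 1.0 + 1e-9:
        raise ValueError("augmentation probabilities must sum to at most 1")
    if "Identity" not in names:
        # Whatever probability mass is left goes to "do nothing".
        names.append("Identity")
        probs.append(max(0.0, 1.0 - total))
    return names, probs


def pick_augment(names, probs, rng=random):
    """Draw one augmentation name according to its probability."""
    return rng.choices(names, weights=probs, k=1)[0]


names, probs = build_choice_table([
    dict(type="BatchCutMix", prob=0.5),
    dict(type="BatchMixup", prob=0.3),
])
print(names, probs)
print(pick_augment(names, probs, random.Random(0)))
```

Exactly one entry is drawn per call, so a batch is processed by at most one of the configured augmentations.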
- class mmcls.models.utils.HybridEmbed(backbone, img_size=224, feature_size=None, in_channels=3, embed_dims=768, conv_cfg=None, init_cfg=None)[源代码]¶
CNN Feature Map Embedding.
Extract feature map from CNN, flatten, project to embedding dim.
- 参数
backbone (nn.Module) – CNN backbone
img_size (int | tuple) – The size of input image. Default: 224
feature_size (int | tuple, optional) – Size of feature map extracted by CNN backbone. Default: None
in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
conv_cfg (dict, optional) – The config dict for conv layers. Default: None.
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.
- forward(x)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcls.models.utils.InvertedResidual(in_channels, out_channels, mid_channels, kernel_size=3, stride=1, se_cfg=None, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False, init_cfg=None)[源代码]¶
Inverted Residual Block.
- 参数
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
mid_channels (int) – The input channels of the depthwise convolution.
kernel_size (int) – The kernel size of the depthwise convolution. Default: 3.
stride (int) – The stride of the depthwise convolution. Default: 1.
se_cfg (dict) – Config dict for se layer. Default: None, which means no se layer.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- 返回
The output tensor.
- 返回类型
Tensor
- forward(x)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcls.models.utils.MultiheadAttention(embed_dims, num_heads, input_dims=None, attn_drop=0.0, proj_drop=0.0, dropout_layer={'drop_prob': 0.0, 'type': 'Dropout'}, qkv_bias=True, qk_scale=None, proj_bias=True, v_shortcut=False, init_cfg=None)[源代码]¶
Multi-head Attention Module.
This module implements multi-head attention that supports different input dims and embed dims. It also supports a shortcut from value, which is useful if input dims is not the same as embed dims.
- 参数
embed_dims (int) – The embedding dimension.
num_heads (int) – Parallel attention heads.
input_dims (int, optional) – The input dimension, and if None, use embed_dims. Defaults to None.
attn_drop (float) – Dropout rate of the dropout layer after the attention calculation of query and key. Defaults to 0.
proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.
dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).
qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.
qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.
proj_bias (bool) – Defaults to True.
v_shortcut (bool) – Add a shortcut from value to output. It’s usually used if input_dims is different from embed_dims. Defaults to False.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- forward(x)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcls.models.utils.PatchEmbed(img_size=224, in_channels=3, embed_dims=768, norm_cfg=None, conv_cfg=None, init_cfg=None)[源代码]¶
Image to Patch Embedding.
We use a conv layer to implement PatchEmbed.
- 参数
img_size (int | tuple) – The size of input image. Default: 224
in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None
conv_cfg (dict, optional) – The config dict for conv layers. Default: None
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None
- forward(x)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcls.models.utils.PatchMerging(input_resolution, in_channels, expansion_ratio, kernel_size=2, stride=None, padding=0, dilation=1, bias=False, norm_cfg={'type': 'LN'}, init_cfg=None)[源代码]¶
Merge patch feature map.
This layer uses nn.Unfold to group the feature map by kernel_size, and uses norm and linear layers to embed the grouped feature map.
- 参数
input_resolution (tuple) – The size of input patch resolution.
in_channels (int) – The num of input channels.
expansion_ratio (Number) – Expansion ratio of output channels. The num of output channels is equal to int(expansion_ratio * in_channels).
kernel_size (int | tuple, optional) – the kernel size in the unfold layer. Defaults to 2.
stride (int | tuple, optional) – The stride of the sliding blocks in the unfold layer. Defaults to None, in which case it is equal to kernel_size.
padding (int | tuple, optional) – zero padding width in the unfold layer. Defaults to 0.
dilation (int | tuple, optional) – dilation parameter in the unfold layer. Defaults to 1.
bias (bool, optional) – Whether to add bias in linear layer or not. Defaults to False.
norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type=’LN’).
init_cfg (dict, optional) – The extra config for initialization. Defaults to None.
- class mmcls.models.utils.SELayer(channels, squeeze_channels=None, ratio=16, divisor=8, bias='auto', conv_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}), init_cfg=None)[源代码]¶
Squeeze-and-Excitation Module.
- 参数
channels (int) – The input (and output) channels of the SE layer.
squeeze_channels (None or int) – The intermediate channel number of SELayer. Default: None, which means the value of squeeze_channels is make_divisible(channels // ratio, divisor).
ratio (int) – Squeeze ratio in SELayer, the intermediate channel will be make_divisible(channels // ratio, divisor). Only used when squeeze_channels is None. Default: 16.
divisor (int) – The divisor that the channel number should be divisible by. Only used when squeeze_channels is None. Default: 8.
conv_cfg (None or dict) – Config dict for convolution layer. Default: None, which means using conv2d.
act_cfg (dict or Sequence[dict]) – Config dict for activation layer. If act_cfg is a dict, both activation layers will be configured by this dict. If act_cfg is a sequence of dicts, the first activation layer will be configured by the first dict and the second activation layer will be configured by the second dict. Default: (dict(type=’ReLU’), dict(type=’Sigmoid’))
- forward(x)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcls.models.utils.ShiftWindowMSA(embed_dims, input_resolution, num_heads, window_size, shift_size=0, qkv_bias=True, qk_scale=None, attn_drop=0, proj_drop=0, dropout_layer={'drop_prob': 0.0, 'type': 'DropPath'}, auto_pad=False, init_cfg=None)[源代码]¶
Shift Window Multihead Self-Attention Module.
- 参数
embed_dims (int) – Number of input channels.
input_resolution (Tuple[int, int]) – The resolution of the input feature map.
num_heads (int) – Number of attention heads.
window_size (int) – The height and width of the window.
shift_size (int, optional) – The shift step of each window towards right-bottom. If zero, act as regular window-msa. Defaults to 0.
qkv_bias (bool, optional) – If True, add a learnable bias to q, k, v. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.
attn_drop (float, optional) – Dropout ratio of attention weight. Defaults to 0.0.
proj_drop (float, optional) – Dropout ratio of output. Defaults to 0.
dropout_layer (dict, optional) – The dropout_layer used before output. Defaults to dict(type=’DropPath’, drop_prob=0.).
auto_pad (bool, optional) – Auto pad the feature map to be divisible by window_size, Defaults to False.
init_cfg (dict, optional) – The extra config for initialization. Default: None.
- forward(query)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmcls.models.utils.channel_shuffle(x, groups)[源代码]¶
Channel Shuffle operation.
This function enables cross-group information flow for multiple groups convolution layers.
- 参数
x (Tensor) – The input tensor.
groups (int) – The number of groups to divide the input tensor in the channel dimension.
- 返回
The output tensor after channel shuffle operation.
- 返回类型
Tensor
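The text above does not spell out how the shuffle is performed. A common formulation (assumed here) views the channel axis as a (groups, channels_per_group) grid, transposes it, and flattens it again. The sketch below computes only the resulting channel order in pure Python; the helper name `channel_shuffle_order` is hypothetical.

```python
def channel_shuffle_order(num_channels, groups):
    """Return the source channel index for each output channel position."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Reading the (groups, per_group) grid column by column interleaves
    # the groups, so neighbouring output channels come from different groups.
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


order = channel_shuffle_order(6, 2)
print(order)
```

With 6 channels and 2 groups, the groups are [0, 1, 2] and [3, 4, 5], and the output alternates between them.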
- mmcls.models.utils.make_divisible(value, divisor, min_value=None, min_ratio=0.9)[源代码]¶
Make divisible function.
This function rounds the channel number to the nearest value that is divisible by the divisor.
- 参数
value (int) – The original channel number.
divisor (int) – The divisor to fully divide the channel number.
min_value (int, optional) – The minimum value of the output channel. Default: None, which means that the minimum value is equal to the divisor.
min_ratio (float) – The minimum ratio of the rounded channel number to the original channel number. Default: 0.9.
- 返回
The modified output channel number
- 返回类型
int
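A minimal sketch of how such a rounding helper is commonly written. It assumes the value is rounded to the nearest multiple of divisor, clamped from below by min_value, and then raised by one divisor if the result fell below min_ratio times the original value. The function body is an assumption for illustration; only the parameter names come from the signature above.

```python
def make_divisible_sketch(value, divisor, min_value=None, min_ratio=0.9):
    """Round `value` to a multiple of `divisor` without shrinking it too much."""
    if min_value is None:
        min_value = divisor
    # Round to the nearest multiple of divisor, but never go below min_value.
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # If rounding removed more than (1 - min_ratio) of the value, step up once.
    if new_value < min_ratio * value:
        new_value += divisor
    return new_value


print(make_divisible_sketch(37, 8))
print(make_divisible_sketch(10, 8))
print(make_divisible_sketch(64 // 16, 8))
```

The last call mirrors the make_divisible(channels // ratio, divisor) expression used for SELayer, with channels=64, ratio=16 and divisor=8.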
mmcls.datasets¶
datasets¶
- class mmcls.datasets.BaseDataset(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
Base dataset.
- 参数
data_prefix (str) – The prefix of data path.
pipeline (list) – A list of dicts, where each element represents an operation defined in mmcls.datasets.pipelines.
ann_file (str | None) – The annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix.
test_mode (bool) – In train mode or test mode.
- property class_to_idx¶
Mapping from class name to class index.
- 返回
mapping from class name to class index.
- 返回类型
dict
- evaluate(results, metric='accuracy', metric_options=None, logger=None)[源代码]¶
Evaluate the dataset.
- 参数
results (list) – Testing results of the dataset.
metric (str | list[str]) – Metrics to be evaluated. Default value is accuracy.
metric_options (dict, optional) – Options for calculating metrics. Allowed keys are ‘topk’, ‘thrs’ and ‘average_mode’. Defaults to None.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.
- 返回
evaluation results
- 返回类型
dict
- get_cat_ids(idx: int) → List[int][源代码]¶
Get category id by index.
- 参数
idx (int) – Index of data.
- 返回
Image category of specified index.
- 返回类型
cat_ids (List[int])
- classmethod get_classes(classes=None)[源代码]¶
Get class names of current dataset.
- 参数
classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.
- 返回
Names of categories of the dataset.
- 返回类型
tuple[str] or list[str]
- class mmcls.datasets.CIFAR10(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
CIFAR10 Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py
- class mmcls.datasets.CIFAR100(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
CIFAR100 Dataset.
- class mmcls.datasets.ClassBalancedDataset(dataset, oversample_thr)[源代码]¶
A wrapper of repeated dataset with repeat factor.
Suitable for training on class imbalanced datasets like LVIS. Following the sampling strategy in 2, in each epoch, an image may appear multiple times based on its “repeat factor”.
The repeat factor for an image is a function of the frequency of the rarest category labeled in that image. The “frequency of category c” in [0, 1] is defined by the fraction of images in the training set (without repeats) in which category c appears.
The dataset needs to implement self.get_cat_ids() to support ClassBalancedDataset.
The repeat factor is computed as follows.
For each category c, compute the fraction \(f(c)\) of images that contain it.
For each category c, compute the category-level repeat factor
\[r(c) = \max(1, \sqrt{\frac{t}{f(c)}})\]
where t is oversample_thr.
For each image I and its labels \(L(I)\), compute the image-level repeat factor
\[r(I) = \max_{c \in L(I)} r(c)\]
引用
- 参数
dataset (CustomDataset) – The dataset to be repeated.
oversample_thr (float) – Frequency threshold below which data is repeated. For categories with f_c >= oversample_thr, there is no oversampling. For categories with f_c < oversample_thr, the degree of oversampling follows the square-root inverse frequency heuristic above.
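The repeat-factor computation described for ClassBalancedDataset can be worked through in plain Python. This sketch takes t in the formula to be oversample_thr. The helper name `image_repeat_factors` is hypothetical, the per-image label lists stand in for what get_cat_ids() would return, and giving unlabeled images a factor of 1.0 is an assumption.

```python
import math
from collections import Counter


def image_repeat_factors(labels_per_image, oversample_thr):
    """Compute r(I) for every image from its list of category ids."""
    num_images = len(labels_per_image)

    # f(c): fraction of images that contain category c.
    counts = Counter()
    for labels in labels_per_image:
        counts.update(set(labels))  # count each category once per image

    # r(c) = max(1, sqrt(t / f(c)))
    category_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / (n / num_images)))
        for c, n in counts.items()
    }

    # r(I) = max over the categories in the image.
    return [
        max((category_repeat[c] for c in labels), default=1.0)
        for labels in labels_per_image
    ]


# Category 0 appears in every image, category 1 in one image out of four.
factors = image_repeat_factors([[0], [0], [0], [0, 1]], oversample_thr=0.5)
print(factors)
```

Only the image containing the rare category gets a factor above 1, because f(1) = 0.25 is below the threshold of 0.5 while f(0) = 1.0 is not.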
- class mmcls.datasets.ConcatDataset(datasets)[源代码]¶
A wrapper of concatenated dataset.
Same as torch.utils.data.dataset.ConcatDataset, but adds a get_cat_ids function.
- 参数
datasets (list[Dataset]) – A list of datasets.
- class mmcls.datasets.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, round_up=True)[源代码]¶
- class mmcls.datasets.FashionMNIST(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
Fashion-MNIST Dataset.
- class mmcls.datasets.ImageNet(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
ImageNet Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/imagenet.py
- class mmcls.datasets.ImageNet21k(data_prefix, pipeline, classes=None, ann_file=None, multi_label=False, recursion_subdir=False, test_mode=False)[源代码]¶
ImageNet21k Dataset.
Since the ImageNet21k dataset is extremely big, containing 21k+ classes and 1.4B files, this class makes the following changes relative to the ImageNet class in order to save memory and time:
Delete the samples attribute.
Use ‘slots’ to create a Data_item that replaces the dict.
Move the setting of the info dict from function load_annotations to function prepare_data.
Use int instead of np.array(…, np.int64).
- 参数
data_prefix (str) – The prefix of data path.
pipeline (list) – A list of dicts, where each element represents an operation defined in mmcls.datasets.pipelines.
ann_file (str | None) – The annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix.
test_mode (bool) – In train mode or test mode.
multi_label (bool) – Use multi label or not.
recursion_subdir (bool) – Whether to use pictures in sub-directories under the category directory that meet the conditions.
- class mmcls.datasets.MNIST(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
MNIST Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py
- class mmcls.datasets.MultiLabelDataset(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[源代码]¶
Multi-label Dataset.
- evaluate(results, metric='mAP', metric_options=None, logger=None, **deprecated_kwargs)[源代码]¶
Evaluate the dataset.
- 参数
results (list) – Testing results of the dataset.
metric (str | list[str]) – Metrics to be evaluated. Default value is ‘mAP’. Options are ‘mAP’, ‘CP’, ‘CR’, ‘CF1’, ‘OP’, ‘OR’ and ‘OF1’.
metric_options (dict, optional) – Options for calculating metrics. Allowed keys are ‘k’ and ‘thr’. Defaults to None
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.
deprecated_kwargs (dict) – Used for containing deprecated arguments.
- 返回
evaluation results
- 返回类型
dict
- class mmcls.datasets.RepeatDataset(dataset, times)[源代码]¶
A wrapper of repeated dataset.
The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.
- 参数
dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
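One way to picture a repeated dataset is as a thin view that multiplies the reported length and wraps indices back into the original range. The class below is a hypothetical stand-in written for illustration (the name `RepeatedView` is not part of the library), and a plain list plays the role of the dataset.

```python
class RepeatedView:
    """Hypothetical stand-in that exposes `dataset` repeated `times` times."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __len__(self):
        # The repeated dataset is `times` times as long as the original.
        return self.times * self._ori_len

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap the index back into the range of the original dataset.
        return self.dataset[idx % self._ori_len]


repeated = RepeatedView(["a", "b", "c"], times=2)
print(len(repeated), [repeated[i] for i in range(len(repeated))])
```

Because one pass over the view covers the original data several times, an epoch boundary (and the loader restart that comes with it) occurs less often.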
- class mmcls.datasets.VOC(**kwargs)[源代码]¶
Pascal VOC Dataset.
- mmcls.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, round_up=True, seed=None, pin_memory=True, persistent_workers=True, **kwargs)[源代码]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- 参数
dataset (Dataset) – A PyTorch dataset.
samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
round_up (bool) – Whether to round up the length of dataset by adding extra samples to make it evenly divisible. Default: True.
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True
persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers’ Dataset instances alive. The argument only takes effect in PyTorch>=1.7.0. Default: True
kwargs – any keyword argument to be used to initialize DataLoader
- 返回
A PyTorch dataloader.
- 返回类型
DataLoader
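The difference between the distributed and non-distributed cases can be sketched as a small calculation of the loader's batch size and worker count. This assumes that a single non-distributed loader serves all GPUs and therefore scales both numbers by num_gpus. That scaling is an assumption inferred from the description above, and the helper name `loader_sizes` is hypothetical.

```python
def loader_sizes(samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for one dataloader."""
    if dist:
        # Each GPU/process owns its own loader, so per-GPU numbers apply.
        return samples_per_gpu, workers_per_gpu
    # Assumption: a single loader feeds every GPU, so both numbers scale.
    return num_gpus * samples_per_gpu, num_gpus * workers_per_gpu


print(loader_sizes(32, 2, num_gpus=4, dist=True))
print(loader_sizes(32, 2, num_gpus=4, dist=False))
```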
pipelines¶
- class mmcls.datasets.pipelines.AutoAugment(policies, hparams={'pad_val': 128})[源代码]¶
Auto augmentation.
This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.
- 参数
policies (list[list[dict]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy, and is composed of several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
- class mmcls.datasets.pipelines.AutoContrast(prob=0.5)[源代码]¶
Auto adjust image contrast.
- 参数
prob (float) – The probability for performing auto contrast, therefore should be in range [0, 1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Brightness(magnitude, prob=0.5, random_negative_prob=0.5)[源代码]¶
Adjust images brightness.
- 参数
magnitude (int | float) – The magnitude used for adjusting brightness. A positive magnitude would enhance the brightness and a negative magnitude would make the image darker. A magnitude=0 returns the original image.
prob (float) – The probability for performing brightness adjusting, therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.CenterCrop(crop_size, efficientnet_style=False, crop_padding=32, interpolation='bilinear', backend='cv2')[源代码]¶
Center crop the image.
- 参数
crop_size (int | tuple) – Expected size after cropping with the format of (h, w).
efficientnet_style (bool) – Whether to use efficientnet style center crop. Defaults to False.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet style is True. Defaults to 32.
interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Only valid if efficientnet_style is True. Defaults to ‘bilinear’.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Only valid if efficientnet style is True. Defaults to cv2.
提示
If the image is smaller than the crop size, return the original image.
If efficientnet_style is set to False, the pipeline would be a simple center crop using the crop_size.
If efficientnet_style is set to True, the pipeline first performs the center crop with crop_size_ as:
\[\text{crop\_size\_} = \frac{\text{crop\_size}}{\text{crop\_size} + \text{crop\_padding}} \times \text{short\_edge}\]
And then the pipeline resizes the img to the input crop size.
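The crop_size_ formula for the efficientnet-style center crop is easy to check numerically. The helper name `efficientnet_crop_size` below is hypothetical, and the result is left as a float because the text above does not say how the value is rounded.

```python
def efficientnet_crop_size(crop_size, crop_padding, short_edge):
    """crop_size_ = crop_size / (crop_size + crop_padding) * short_edge"""
    return crop_size / (crop_size + crop_padding) * short_edge


# With the default crop_padding of 32, a 224 crop keeps 224 / 256 = 87.5%
# of the image's short edge before resizing back to 224.
print(efficientnet_crop_size(224, 32, short_edge=256))
print(efficientnet_crop_size(224, 32, short_edge=512))
```

The cropped fraction depends only on crop_size and crop_padding, so larger images are cropped proportionally and then resized to the requested crop size.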
- class mmcls.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg'))[源代码]¶
Collect data from the loader relevant to the specific task.
This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img” and “gt_label”.
- 参数
keys (Sequence[str]) – Keys of results to be collected in data.
meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: (‘filename’, ‘ori_filename’, ‘ori_shape’, ‘img_shape’, ‘flip’, ‘flip_direction’, ‘img_norm_cfg’)
- 返回
The result dict contains the following keys
keys in self.keys
img_metas if available
- 返回类型
dict
- class mmcls.datasets.pipelines.ColorJitter(brightness, contrast, saturation)[源代码]¶
Randomly change the brightness, contrast and saturation of an image.
- 参数
brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
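The sampling ranges given for ColorJitter all follow the same pattern, which can be sketched with the standard library. The helper name `sample_jitter_factor` is hypothetical. It only draws the factor and does not touch any image.

```python
import random


def sample_jitter_factor(strength, rng=random):
    """Draw a factor uniformly from [max(0, 1 - strength), 1 + strength]."""
    if strength < 0:
        raise ValueError("strength must be non-negative")
    low = max(0.0, 1.0 - strength)
    high = 1.0 + strength
    return rng.uniform(low, high)


rng = random.Random(0)
samples = [sample_jitter_factor(0.4, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

A factor of 1 leaves the property unchanged, and the lower bound is clipped at 0 so that strengths above 1 never produce a negative factor.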
- class mmcls.datasets.pipelines.ColorTransform(magnitude, prob=0.5, random_negative_prob=0.5)[源代码]¶
Adjust images color balance.
- 参数
magnitude (int | float) – The magnitude used for color transform. A positive magnitude would enhance the color and a negative magnitude would make the image grayer. A magnitude=0 returns the original image.
prob (float) – The probability for performing ColorTransform therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Compose(transforms)[源代码]¶
Compose a data pipeline with a sequence of transforms.
- 参数
transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
- class mmcls.datasets.pipelines.Contrast(magnitude, prob=0.5, random_negative_prob=0.5)[源代码]¶
Adjust images contrast.
- 参数
magnitude (int | float) – The magnitude used for adjusting contrast. A positive magnitude would enhance the contrast and a negative magnitude would make the image grayer. A magnitude=0 returns the original image.
prob (float) – The probability for performing contrast adjusting therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Cutout(shape, pad_val=128, prob=0.5)[源代码]¶
Cutout images.
- 参数
shape (int | float | tuple(int | float)) – Expected cutout shape (h, w). If given as a single value, the value will be used for both h and w.
pad_val (int, Sequence[int]) – Pixel value for constant fill. If it is a sequence, it must have the same length as the number of image channels. Defaults to 128.
prob (float) – The probability for performing cutout therefore should be in range [0, 1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Equalize(prob=0.5)[源代码]¶
Equalize the image histogram.
- 参数
prob (float) – The probability for performing equalization, therefore should be in range [0, 1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Invert(prob=0.5)[源代码]¶
Invert images.
- 参数
prob (float) – The probability for performing invert therefore should be in range [0, 1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Lighting(eigval, eigvec, alphastd=0.1, to_rgb=True)[源代码]¶
Adjust images lighting using AlexNet-style PCA jitter.
- 参数
eigval (list) – The eigenvalues of the covariance matrix of pixel values.
eigvec (list[list]) – The eigenvectors of the covariance matrix of pixel values.
alphastd (float) – The standard deviation for distribution of alpha. Defaults to 0.1
to_rgb (bool) – Whether to convert img to rgb. Defaults to True.
- class mmcls.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[源代码]¶
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape) and “img_norm_cfg” (means=0 and stds=1).
- 参数
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class mmcls.datasets.pipelines.Normalize(mean, std, to_rgb=True)[源代码]¶
Normalize the image.
- 参数
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB. Defaults to True.
- class mmcls.datasets.pipelines.Pad(size=None, pad_to_square=False, pad_val=0, padding_mode='constant')[源代码]¶
Pad images.
- 参数
size (tuple[int] | None) – Expected padding size (h, w). Conflicts with pad_to_square. Defaults to None.
pad_to_square (bool) – Pad any image to square shape. Defaults to False.
pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default to 0.
padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default to “constant”.
- class mmcls.datasets.pipelines.Posterize(bits, prob=0.5)
Posterize images (reduce the number of bits for each color channel).
- Parameters
bits (int | float) – Number of bits for each pixel in the output img, which should be less than or equal to 8.
prob (float) – The probability of posterizing the image, which should be in the range [0, 1]. Defaults to 0.5.
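Reducing the number of bits per channel amounts to clearing the low-order bits of every pixel. The sketch below shows this on a uint8 array; the helper name `posterize` is hypothetical, and it only accepts an integer number of bits because the handling of a float value is not specified here.

```python
import numpy as np


def posterize(img, bits):
    """Hypothetical helper: keep only the `bits` most significant bits."""
    if not 0 < bits <= 8:
        raise ValueError("bits must be in (0, 8]")
    shift = 8 - bits
    # Shifting right then left zeroes the `shift` lowest bits.
    return np.left_shift(np.right_shift(img, shift), shift)


pixels = np.array([0, 37, 128, 255], dtype=np.uint8)
coarse = posterize(pixels, bits=3)   # values become multiples of 32
same = posterize(pixels, bits=8)     # 8 bits leaves the image unchanged
```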
- class mmcls.datasets.pipelines.RandAugment(policies, num_policies, magnitude_level, magnitude_std=0.0, total_level=30, hparams={'pad_val': 128})
Random augmentation.
This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.
- Parameters
policies (list[dict]) – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). Each policy must have at least the key type, indicating the type of augmentation. For augmentations that have a magnitude (the magnitude argument is named differently in different augmentations), magnitude_key and magnitude_range give the name of the magnitude argument (str) and the range of the magnitude (a tuple in the format (val1, val2)), respectively. Note that val1 is not necessarily less than val2.
num_policies (int) – Number of policies to select from policies each time.
magnitude_level (int | float) – Magnitude level for all the augmentation selected.
total_level (int | float) – Total level for the magnitude. Defaults to 30.
magnitude_std (Number | str) –
Deviation of the magnitude noise applied.
If a positive number, the magnitude is sampled from a normal distribution (mean=magnitude, std=magnitude_std).
If 0 or a negative number, the magnitude remains unchanged.
If the str “inf”, the magnitude is sampled from a uniform distribution (range=[min, magnitude]).
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to _HPARAMS_DEFAULT.
Note
magnitude_std introduces some randomness into the policy; this behavior is adapted from https://github.com/rwightman/pytorch-image-models.
When magnitude_std=0, we calculate the magnitude as follows:
\[\text{magnitude} = \frac{\text{magnitude\_level}} {\text{total\_level}} \times (\text{val2} - \text{val1}) + \text{val1}\]
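The formula above is a linear interpolation between val1 and val2. A minimal sketch, assuming magnitude_std=0 and using a hypothetical helper name `policy_magnitude`:

```python
def policy_magnitude(magnitude_level, total_level, magnitude_range):
    """Hypothetical helper: map a level in [0, total_level] onto (val1, val2)."""
    val1, val2 = magnitude_range
    return magnitude_level / total_level * (val2 - val1) + val1


# Level 15 of 30 lands halfway through the range.
rotate_angle = policy_magnitude(15, 30, (0, 30))
# val1 may be larger than val2, so a higher level can mean a smaller value.
posterize_bits = policy_magnitude(15, 30, (8, 4))
```

Because val1 may exceed val2, a policy such as Posterize can express "stronger" augmentation as a smaller argument value.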
- class mmcls.datasets.pipelines.RandomCrop(size, padding=None, pad_if_needed=False, pad_val=0, padding_mode='constant')
Crop the given image at a random location.
- Parameters
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
padding (int or sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively. Default: None, which means no padding.
pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset. Default: False.
pad_val (Number | Sequence[Number]) – Pixel value used for constant fill. If a tuple of length 3, it is used to fill the R, G, B channels respectively. Default: 0.
padding_mode (str) –
Type of padding. Defaults to “constant”. Should be one of the following:
constant: Pads with a constant value, which is specified with pad_val.
edge: Pads with the last value at the edge of the image.
reflect: Pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
symmetric: Pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
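NumPy's `np.pad` uses the same four mode names, so the reflect and symmetric examples above can be reproduced on a one-dimensional array. This only illustrates the padding semantics; it makes no claim about which backend the transform itself calls.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Pad 2 elements on both sides with each of the documented modes.
padded = {
    mode: np.pad(row, 2, mode=mode).tolist()
    for mode in ("edge", "reflect", "symmetric")
}
# For constant mode, the fill value plays the role of pad_val.
padded["constant"] = np.pad(row, 2, mode="constant", constant_values=0).tolist()
```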
- class mmcls.datasets.pipelines.RandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)
Randomly select a rectangular region in an image and erase its pixels.
- Parameters
erase_prob (float) – Probability that the image will be randomly erased. Default: 0.5.
min_area_ratio (float) – Minimum ratio of the erased area to the input image area. Default: 0.02.
max_area_ratio (float) – Maximum ratio of the erased area to the input image area. Default: 0.4.
aspect_range (sequence | float) – Aspect ratio range of the erased area. If a float, it will be converted to (aspect_ratio, 1/aspect_ratio). Default: (3/10, 10/3).
mode (str) –
Fill method in erased area, can be:
const (default): All pixels are assigned the same value.
rand: Each pixel is assigned a random value in [0, 255].
fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).
fill_std (sequence | Number, optional) – If set and mode is ‘rand’, fill the erased area with random colors from a normal distribution (mean=fill_color, std=fill_std); if not set, fill the erased area with random colors from a uniform distribution (0~255). Defaults to None.
Note
See Random Erasing Data Augmentation.
The paper provides four modes: RE-R, RE-M, RE-0 and RE-255, and uses RE-M by default. The configs of these four modes are:
RE-R: RandomErasing(mode='rand')
RE-M: RandomErasing(mode='const', fill_color=(123.67, 116.3, 103.5))
RE-0: RandomErasing(mode='const', fill_color=0)
RE-255: RandomErasing(mode='const', fill_color=255)
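To make the area and aspect-ratio parameters concrete, here is a minimal NumPy sketch of the ‘const’ mode. It is not the library's code: the helper name `erase_region` is hypothetical, and sampling the aspect ratio log-uniformly and clamping the rectangle to the image bounds are assumptions of this example.

```python
import math
import random

import numpy as np


def erase_region(img, min_area_ratio=0.02, max_area_ratio=0.4,
                 aspect_range=(0.3, 10 / 3), fill_color=(128, 128, 128),
                 rng=None):
    """Hypothetical helper: fill one random rectangle with a constant color."""
    rng = rng or random.Random()
    img_h, img_w = img.shape[:2]
    # Assumption: aspect ratio (h / w) is drawn log-uniformly from the range.
    log_lo, log_hi = math.log(aspect_range[0]), math.log(aspect_range[1])
    aspect = math.exp(rng.uniform(log_lo, log_hi))
    area = img_h * img_w * rng.uniform(min_area_ratio, max_area_ratio)
    # Assumption: clamp the rectangle so that it always fits in the image.
    h = min(int(round(math.sqrt(area * aspect))), img_h)
    w = min(int(round(math.sqrt(area / aspect))), img_w)
    top = rng.randint(0, img_h - h)
    left = rng.randint(0, img_w - w)
    out = img.copy()
    out[top:top + h, left:left + w] = fill_color
    return out, (top, left, h, w)


black = np.zeros((32, 32, 3), dtype=np.uint8)
erased, (top, left, h, w) = erase_region(black, rng=random.Random(0))
```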
- class mmcls.datasets.pipelines.RandomFlip(flip_prob=0.5, direction='horizontal')
Flip the image randomly.
Flip the image randomly based on flip probability and flip direction.
- Parameters
flip_prob (float) – Probability of the image being flipped. Default: 0.5.
direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.
- class mmcls.datasets.pipelines.RandomGrayscale(gray_prob=0.1)
Randomly convert image to grayscale with a probability of gray_prob.
- Parameters
gray_prob (float) – Probability that image should be converted to grayscale. Default: 0.1.
- Returns
Image after the random grayscale transform.
- Return type
ndarray
Hint
If the input image has 1 channel, the grayscale version has 1 channel.
If the input image has 3 channels, the grayscale version has 3 channels with r == g == b.
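The channel behavior in the hint can be sketched in NumPy as follows. The helper names are hypothetical, and the RGB channel order and luma weights (0.299, 0.587, 0.114) are assumptions of the example, not values taken from this page.

```python
import numpy as np


def to_grayscale(img, weights=(0.299, 0.587, 0.114)):
    """Hypothetical helper: grayscale with the same channel count as the input."""
    if img.ndim == 2 or img.shape[2] == 1:
        return img.copy()
    gray = img.astype(np.float32) @ np.asarray(weights, dtype=np.float32)
    gray = np.clip(np.round(gray), 0, 255).astype(img.dtype)
    # Replicate the single gray plane so that r == g == b.
    return np.repeat(gray[..., None], 3, axis=2)


def random_grayscale(img, gray_prob=0.1, rng=None):
    """Apply to_grayscale with probability gray_prob, else return img as-is."""
    rng = rng or np.random.default_rng()
    return to_grayscale(img) if rng.random() < gray_prob else img


color = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
gray3 = to_grayscale(color)
gray1 = to_grayscale(np.zeros((4, 4, 1), dtype=np.uint8))
```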
- class mmcls.datasets.pipelines.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), max_attempts=10, efficientnet_style=False, min_covered=0.1, crop_padding=32, interpolation='bilinear', backend='cv2')
Crop the given image to random size and aspect ratio.
A crop with a random size (default: 0.08 to 1.0 of the original size) and a random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. The crop is then resized to the given size.
- Parameters
size (sequence | int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
scale (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).
max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.
efficientnet_style (bool) – Whether to use the efficientnet-style RandomResizedCrop. Defaults to False.
min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet_style is true. Defaults to 32.
interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Defaults to cv2.
- static get_params(img, scale, ratio, max_attempts=10)
Get parameters for crop for a random sized crop.
- Parameters
img (ndarray) – Image to be cropped.
scale (tuple) – Range of the random size of the cropped image compared to the original image size.
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image area.
max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.
- Returns
Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.
- Return type
tuple
- static get_params_efficientnet_style(img, size, scale, ratio, max_attempts=10, min_covered=0.1, crop_padding=32)
Get parameters for crop for a random sized crop in efficientnet style.
- Parameters
img (ndarray) – Image to be cropped.
size (sequence) – Desired output size of the crop.
scale (tuple) – Range of the random size of the cropped image compared to the original image size.
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image area.
max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.
min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Defaults to 32.
- Returns
Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.
- Return type
tuple
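The sampling that get_params describes (random area and aspect ratio, a bounded number of attempts, then a central crop) can be sketched in pure Python. This follows the widely used torchvision-style procedure and is not taken from the mmcls source; the helper name `sample_crop_params`, the log-uniform aspect-ratio sampling, and the inclusive (ymax, xmax) convention are all assumptions of the example.

```python
import math
import random


def sample_crop_params(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                       max_attempts=10, rng=None):
    """Hypothetical helper: return (ymin, xmin, ymax, xmax), bounds inclusive."""
    rng = rng or random.Random()
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # width / height
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            ymin = rng.randint(0, height - crop_h)
            xmin = rng.randint(0, width - crop_w)
            return ymin, xmin, ymin + crop_h - 1, xmin + crop_w - 1
    # Fallback: central crop whose aspect ratio is clamped into `ratio`.
    in_ratio = width / height
    if in_ratio < min(ratio):
        crop_w = width
        crop_h = int(round(crop_w / min(ratio)))
    elif in_ratio > max(ratio):
        crop_h = height
        crop_w = int(round(crop_h * max(ratio)))
    else:
        crop_w, crop_h = width, height
    ymin = (height - crop_h) // 2
    xmin = (width - crop_w) // 2
    return ymin, xmin, ymin + crop_h - 1, xmin + crop_w - 1


# With max_attempts=0 the central-crop fallback is used directly.
fallback = sample_crop_params(100, 400, max_attempts=0)
```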
- class mmcls.datasets.pipelines.Resize(size, interpolation='bilinear', adaptive_side='short', backend='cv2')
Resize images.
- Parameters
size (int | tuple) – Image scale for resizing (h, w). When size is an int, the default behavior is to resize an image to (size, size). When size is a tuple and the second value is -1, the image will be resized according to adaptive_side. For example, when size is 224, the image is resized to 224x224. When size is (224, -1) and adaptive_side is “short”, the short side is resized to 224 and the other side is computed based on the short side, maintaining the aspect ratio.
interpolation (str) – Interpolation method. For “cv2” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”. For “pillow” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming”. More details can be found in mmcv.image.geometric.
adaptive_side (str) – Adaptive resize policy, accepted values are “short”, “long”, “height”, “width”. Defaults to “short”.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Default: cv2.
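The interaction between size and adaptive_side is easiest to see by computing the target shape alone. The helper below is a hypothetical sketch, not part of the library; rounding the computed side to the nearest integer is an assumption.

```python
def resolve_resize_shape(img_h, img_w, size, adaptive_side="short"):
    """Hypothetical helper: return the (h, w) an image would be resized to."""
    if isinstance(size, int):
        return size, size
    target, other = size
    if other != -1:
        return target, other
    # Decide which side receives `target`; the other keeps the aspect ratio.
    resize_width = {
        "short": img_w <= img_h,
        "long": img_w >= img_h,
        "width": True,
        "height": False,
    }[adaptive_side]
    if resize_width:
        return int(round(img_h * target / img_w)), target
    return target, int(round(img_w * target / img_h))


# A 480x640 (h, w) image: the short side is the height.
by_short = resolve_resize_shape(480, 640, (224, -1), "short")
by_long = resolve_resize_shape(480, 640, (224, -1), "long")
fixed = resolve_resize_shape(480, 640, 224)
```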
- class mmcls.datasets.pipelines.Rotate(angle, center=None, scale=1.0, pad_val=128, prob=0.5, random_negative_prob=0.5, interpolation='nearest')
Rotate images.
- Parameters
angle (float) – The rotation angle. Positive values stand for clockwise rotation.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If None, the center of the image will be used. Defaults to None.
scale (float) – Isotropic scale factor. Defaults to 1.0.
pad_val (int, Sequence[int]) – Pixel value used for constant fill. If a sequence of length 3, it is used to fill the R, G, B channels respectively. Defaults to 128.
prob (float) – The probability of performing the rotation, which should be in the range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability of turning the angle negative, which should be in the range [0, 1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
- class mmcls.datasets.pipelines.Sharpness(magnitude, prob=0.5, random_negative_prob=0.5)
Adjust image sharpness.
- Parameters
magnitude (int | float) – The magnitude used for adjusting sharpness. A positive magnitude enhances the sharpness and a negative magnitude blurs the image. A magnitude of 0 gives the original image.
prob (float) – The probability of performing the sharpness adjustment, which should be in the range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability of turning the magnitude negative, which should be in the range [0, 1]. Defaults to 0.5.
- class mmcls.datasets.pipelines.Shear(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='bicubic')
Shear images.
- Parameters
magnitude (int | float) – The magnitude used for shear.
pad_val (int, Sequence[int]) – Pixel value used for constant fill. If a sequence of length 3, it is used to fill the R, G, B channels respectively. Defaults to 128.
prob (float) – The probability of performing the shear, which should be in the range [0, 1]. Defaults to 0.5.
direction (str) – The shearing direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
random_negative_prob (float) – The probability of turning the magnitude negative, which should be in the range [0, 1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bicubic’.
- class mmcls.datasets.pipelines.Solarize(thr, prob=0.5)
Solarize images (invert all pixel values above a threshold).
- Parameters
thr (int | float) – The threshold above which pixel values will be inverted.
prob (float) – The probability of solarizing the image, which should be in the range [0, 1]. Defaults to 0.5.
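For a uint8 image, "inverting" a pixel value v means replacing it with 255 - v. A minimal NumPy sketch follows; the helper name `solarize` is hypothetical, and treating pixels exactly equal to the threshold as inverted is an assumption of this example.

```python
import numpy as np


def solarize(img, thr):
    """Hypothetical helper: invert uint8 pixels that are not below `thr`."""
    # Assumption: pixels equal to the threshold are inverted as well.
    return np.where(img < thr, img, 255 - img)


pixels = np.array([[0, 100, 128, 200, 255]], dtype=np.uint8)
solarized = solarize(pixels, thr=128)
```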
- class mmcls.datasets.pipelines.SolarizeAdd(magnitude, thr=128, prob=0.5)
SolarizeAdd images (add a certain value to pixels below a threshold).
- Parameters
magnitude (int | float) – The value to be added to pixels below the thr.
thr (int | float) – The threshold below which pixel values will be adjusted.
prob (float) – The probability of applying SolarizeAdd, which should be in the range [0, 1]. Defaults to 0.5.
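SolarizeAdd differs from Solarize in that it brightens the pixels below the threshold instead of inverting those above it. A minimal NumPy sketch, with a hypothetical helper name `solarize_add` and the assumption that results are clipped to the uint8 range:

```python
import numpy as np


def solarize_add(img, magnitude, thr=128):
    """Hypothetical helper: add `magnitude` to uint8 pixels below `thr`."""
    # Work in a wider integer type so the sum cannot wrap around.
    shifted = np.clip(img.astype(np.int32) + int(magnitude), 0, 255)
    return np.where(img < thr, shifted, img).astype(np.uint8)


pixels = np.array([0, 100, 127, 128, 250], dtype=np.uint8)
brightened = solarize_add(pixels, magnitude=110)
saturated = solarize_add(pixels, magnitude=200)
```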
- class mmcls.datasets.pipelines.Translate(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='nearest')
Translate images.
- Parameters
magnitude (int | float) – The magnitude used for translation. Note that the offset is calculated as magnitude * size in the corresponding direction. With a magnitude of 1, the whole image will be moved out of the range.
pad_val (int, Sequence[int]) – Pixel value used for constant fill. If a sequence of length 3, it is used to fill the R, G, B channels respectively. Defaults to 128.
prob (float) – The probability of performing the translation, which should be in the range [0, 1]. Defaults to 0.5.
direction (str) – The translating direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
random_negative_prob (float) – The probability of turning the magnitude negative, which should be in the range [0, 1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
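The relation offset = magnitude * size can be sketched with whole-pixel shifts in NumPy. The helper name `translate_image` is hypothetical; rounding the offset to an integer and moving content right or down for positive magnitudes are assumptions of this example, and no interpolation is performed.

```python
import numpy as np


def translate_image(img, magnitude, direction="horizontal", pad_val=128):
    """Hypothetical helper: shift an image by magnitude * size pixels."""
    axis = 1 if direction == "horizontal" else 0
    size = img.shape[axis]
    offset = int(round(magnitude * size))
    out = np.full_like(img, pad_val)
    if abs(offset) >= size:
        # The whole image has moved out of range; only padding remains.
        return out
    src = [slice(None)] * img.ndim
    dst = [slice(None)] * img.ndim
    if offset >= 0:
        src[axis], dst[axis] = slice(0, size - offset), slice(offset, size)
    else:
        src[axis], dst[axis] = slice(-offset, size), slice(0, size + offset)
    out[tuple(dst)] = img[tuple(src)]
    return out


grid = np.arange(16, dtype=np.uint8).reshape(4, 4)
right = translate_image(grid, 0.5)               # shift by 2 columns
gone = translate_image(grid, 1.0)                # everything is padding
up = translate_image(grid, -0.25, "vertical")    # shift up by 1 row
```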
mmcls.utils
- mmcls.utils.load_json_logs(json_logs)
Load and convert json_logs to log_dicts.
- Parameters
json_logs (str) – Paths of json_logs.
- Returns
The key is the epoch and the value is a sub dict. The keys of the sub dict are different metrics, e.g. memory, bbox_mAP, and each value of the sub dict is a list of the corresponding values of all iterations.
- Return type
list[dict(int: dict())]
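The returned structure can be illustrated with a small sketch that groups JSON-lines log records by epoch. This is not the library's implementation: the helper name `parse_json_log_lines` and the sample records are hypothetical, and the sketch assumes one JSON object per line, with records lacking an "epoch" key skipped.

```python
import json
from collections import defaultdict


def parse_json_log_lines(lines):
    """Hypothetical helper: build {epoch: {metric: [values...]}} for one log."""
    log_dict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "epoch" not in record:
            # Assumption: header lines without an epoch are ignored.
            continue
        epoch = record.pop("epoch")
        metrics = log_dict.setdefault(epoch, defaultdict(list))
        for key, value in record.items():
            metrics[key].append(value)
    return log_dict


# Hypothetical log content, for illustration only.
sample_lines = [
    '{"env_info": "placeholder"}',
    '{"mode": "train", "epoch": 1, "iter": 100, "loss": 2.3, "memory": 2048}',
    '{"mode": "train", "epoch": 1, "iter": 200, "loss": 2.1, "memory": 2048}',
    '{"mode": "train", "epoch": 2, "iter": 100, "loss": 1.8, "memory": 2050}',
]
log_dict = parse_json_log_lines(sample_lines)
```

One such dict would be produced per log file, which matches the list-of-dicts return type described above.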