Introduction to Object Recognition

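Data Preprocessing

First, we need a rough understanding of how the dataset is organized: how many classes it contains and how many images each class holds. This information is used later when building and training the model.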

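library(ggplot2)   # ggplot(), aes(), geom_bar()
library(magrittr)  # provides the %>% pipe

# Collect the image files of each class (one sub-directory per class)
train_images <- list()
for (class in list.files('flower_photos', recursive = FALSE)) {
    train_images[[class]] <- sort(list.files(file.path('flower_photos', class), recursive = TRUE))
}

# Plot the number of images per class
data.frame(class = names(train_images),
           n_images = sapply(train_images, length)) %>%
    ggplot(aes(x = class, y = n_images)) +
    geom_bar(stat = 'identity')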

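The output shows that the dataset has five classes, each containing more than 600 images, although the gap between the dandelion and daisy classes is fairly large. Ideally, every class should contain a similar number of samples when training a model. When the class sizes differ substantially, the number of samples per class should therefore be adjusted. Two simple options are:

Randomly remove samples from the classes that have more of them.

Use data augmentation to enlarge the classes that have fewer samples.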
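The first option can be sketched in a few lines of base R. This is a minimal illustration, not the procedure used in this tutorial: the toy file lists and the names toy_images, n_min, and balanced are made up for the example.

```r
# Toy stand-in for train_images: three classes of unequal size (hypothetical file names)
toy_images <- list(
    daisy     = sprintf('daisy_%03d.jpg', 1:6),
    dandelion = sprintf('dandelion_%03d.jpg', 1:9),
    roses     = sprintf('roses_%03d.jpg', 1:7)
)

set.seed(42)  # make the random selection reproducible

# Size of the smallest class; every class is cut down to this size
n_min <- min(lengths(toy_images))

# Randomly keep n_min files per class and drop the rest
balanced <- lapply(toy_images, function(files) sample(files, n_min))

lengths(balanced)
```

Undersampling discards data, so it is mainly attractive when even the smallest class is large enough for the task.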

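Here, to save time, we pick 20 images from each class as the training set used to train the model, and another 10 images from each class as the validation set used to evaluate its performance.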

n_train_images <- 20
n_valid_images <- 10
class_labels <- c('dandelion', 'sunflowers', 'roses', 'tulips', 'daisy')

dir.create('flower_photos_train', showWarnings = FALSE)
dir.create('flower_photos_valid', showWarnings = FALSE)

for (class in names(train_images)) {
    if (class %in% class_labels) {
        dir.create(file.path('flower_photos_train', class), showWarnings = FALSE)
        dir.create(file.path('flower_photos_valid', class), showWarnings = FALSE)
        for (i in seq_along(train_images[[class]])) {
            if (i <= n_train_images) {
                # The first n_train_images files go to the training set
                file.copy(file.path('flower_photos', class, train_images[[class]][i]),
                          file.path('flower_photos_train', class, train_images[[class]][i]))
            } else if (i <= n_train_images + n_valid_images) {
                # The next n_valid_images files go to the validation set
                file.copy(file.path('flower_photos', class, train_images[[class]][i]),
                          file.path('flower_photos_valid', class, train_images[[class]][i]))
            }
        }
    }
}
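The loop above takes the first 30 files of each class in sorted order. If file order happens to correlate with image content, a random selection may be preferable. The following base R sketch shows one way to do that for a single class; it is an alternative under stated assumptions, not what this tutorial runs, and files, picked, train_files, and valid_files are hypothetical names.

```r
n_train_images <- 20
n_valid_images <- 10

# Hypothetical file list for one class
files <- sprintf('img_%03d.jpg', 1:50)

set.seed(1)  # reproducible selection

# Draw 30 distinct files at random, then split them 20 / 10
picked <- sample(files, n_train_images + n_valid_images)
train_files <- picked[seq_len(n_train_images)]
valid_files <- setdiff(picked, train_files)

length(train_files)
length(valid_files)
length(intersect(train_files, valid_files))
```

Because sample() draws without replacement by default, the two subsets cannot share a file.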

Once the training and validation sets are in place, we define a preprocessing pipeline for each of them.

library(torch)
library(torchvision)

# Training pipeline: resize, then apply random augmentations and normalize
train_transforms <- function(img) {
    img <- transform_to_tensor(img)
    img <- transform_resize(img, size = c(512, 512))
    img <- transform_random_resized_crop(img, size = c(224, 224))
    img <- transform_color_jitter(img)
    img <- transform_random_horizontal_flip(img)
    img <- transform_normalize(img, mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
    img
}

# Validation pipeline: deterministic resize and center crop, no augmentation
valid_transforms <- function(img) {
    img <- transform_to_tensor(img)
    img <- transform_resize(img, size = c(256, 256))
    img <- transform_center_crop(img, 224)
    img <- transform_normalize(img, mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
    img
}
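To make the last step of both pipelines concrete: normalization subtracts a per-channel mean and divides by a per-channel standard deviation, and the values used here are the widely used ImageNet channel statistics. The sketch below reproduces that arithmetic on a toy array in base R. It only illustrates the formula and is not how torchvision implements transform_normalize; img, mean_rgb, and std_rgb are names chosen for the example.

```r
# Toy "image": 3 channels x 2 x 2 pixels, every pixel value 0.5 (channels first)
img <- array(0.5, dim = c(3, 2, 2))

mean_rgb <- c(0.485, 0.456, 0.406)
std_rgb  <- c(0.229, 0.224, 0.225)

# Subtract the channel mean, then divide by the channel standard deviation
centered   <- sweep(img, 1, mean_rgb, '-')
normalized <- sweep(centered, 1, std_rgb, '/')

normalized[, 1, 1]  # one value per channel for the top-left pixel
```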

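Next, we pass the path of the training set to the image_folder_dataset function, which automatically picks up the class labels and the image files and sets up the preprocessing for each image. We then pass the resulting object to dataloader, which groups the images into batches to be fed to the model during training.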

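# Build the dataset from the directory tree; class labels come from the sub-directory names
dataset_train <- image_folder_dataset('flower_photos_train', transform = train_transforms)
dataset_train$classes
#> [1] "daisy" "dandelion" "roses" "sunflowers" "tulips"

# Serve the images in shuffled batches of two
dataloader_train <- dataloader(dataset_train, batch_size = 2, shuffle = TRUE)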
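Conceptually, a data loader with shuffle = TRUE visits the sample indices in random order and hands them out in groups of batch_size. The base R sketch below mimics that for the 5 x 20 = 100 training images prepared earlier. It is an illustration only and does not reflect the internals of torch's dataloader; n_samples, shuffled, and batches are hypothetical names.

```r
n_samples  <- 5 * 20  # 5 classes, 20 training images each
batch_size <- 2

set.seed(1)
shuffled <- sample(n_samples)  # random permutation of 1..100

# Consecutive runs of batch_size indices form one batch
batches <- split(shuffled, ceiling(seq_along(shuffled) / batch_size))

length(batches)  # 50 batches per epoch
batches[[1]]     # indices of the two images in the first batch
```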
