Biodiversity Science ›› 2024, Vol. 32 ›› Issue (4): 23435.  DOI: 10.17520/biods.2023435

• Techniques and Methods •

Identification of common native grassland plants in northern China using deep learning

Yongcai Wang¹, Huawei Wan², Jixi Gao²,*, Zhuowei Hu¹,*, Chenxi Sun², Na Lü², Zhiru Zhang²

  1. College of Resource Environment and Tourism, Capital Normal University, Beijing 100048
  2. Satellite Application Center for Ecology and Environment, Ministry of Ecology and Environment, Beijing 100094
  • Received: 2023-11-15 Accepted: 2024-03-30 Online: 2024-04-20 Published: 2024-05-17
  • Contact: * E-mail: gjx@nies.org; huzhuowei@cnu.edu.cn
  • Supported by: National Key Research and Development Program of China (2021YFB3901102)

Abstract

Aims: The classification and identification of grassland plants underpins grassland resource surveys and biodiversity monitoring. Rapid advances in computer vision and deep learning have made automated plant identification technically feasible; however, datasets and models dedicated to identifying grassland plants are currently lacking.

Methods: This study established an image dataset of 831 common native grassland plant species in northern China. Using two state-of-the-art image classification architectures, convolutional neural networks (CNNs) and vision transformers (ViTs), we trained grassland plant recognition models and evaluated four of them (Eva-02, ResNet-RS, MobileNetV3, and MobileViTv2) in terms of recognition accuracy, recognition speed, and model size.
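The abstract does not state how recognition speed was measured. One common approach, sketched here under that assumption, is to average the wall-clock time per image over several timed passes after a few untimed warm-up calls. The function `mean_latency_ms` and its `warmup`/`repeats` parameters are hypothetical, and `predict` stands in for any model's inference call; this is not the authors' benchmarking code.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mean_latency_ms(
    predict: Callable[[T], object],
    inputs: Sequence[T],
    warmup: int = 3,
    repeats: int = 5,
) -> float:
    """Hypothetical helper: average wall-clock time per input, in milliseconds."""
    if not inputs:
        raise ValueError("need at least one input")
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    # Untimed warm-up calls absorb one-off costs (lazy initialisation, caches).
    for x in inputs[:warmup]:
        predict(x)
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            predict(x)
    elapsed = time.perf_counter() - start
    # Divide total time by the number of timed predictions.
    return elapsed * 1000.0 / (repeats * len(inputs))


# Usage with a stand-in "model"; a real benchmark would pass a network's forward call.
if __name__ == "__main__":
    print(mean_latency_ms(lambda x: sum(x), [[1, 2, 3]] * 10))
```

Timing many predictions in one block and dividing keeps timer overhead small relative to each call; on a GPU, a real benchmark would also need to synchronise the device before reading the clock.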

Results: In terms of recognition accuracy, the Top-1 accuracies of Eva-02, MobileViTv2, ResNet-RS, and MobileNetV3 on the test set were 96.78%, 94.29%, 95.57%, and 91.53%, respectively, and their Top-5 accuracies were 99.17%, 98.93%, 98.79%, and 97.56%, respectively. In terms of model size and recognition speed, MobileNetV3 had the fewest parameters and the fastest recognition speed, followed by MobileViTv2, making these models suitable for deployment on mobile devices. Conversely, Eva-02 had the most parameters and the slowest recognition speed. In a comparison with Pl@ntNet, HuaBanLv, and Baidu-Shitu, all four models developed in this study outperformed these three recognition systems.
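The Top-1 and Top-5 figures above are top-k accuracies: a test image counts as correctly identified if its true species is among the model's k highest-scoring predictions. A minimal pure-Python sketch of that metric follows, assuming each model outputs one score per species; `top_k_accuracy` and the toy scores are illustrative only, not the authors' evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true class index is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    if k < 1:
        raise ValueError("k must be >= 1")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; the sort is stable,
        # so ties keep the lower class index first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 images, 4 hypothetical species.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true species 1 ranked first
    [0.5, 0.1, 0.3, 0.1],  # true species 2 ranked second
    [0.3, 0.2, 0.0, 0.5],  # true species 0 ranked second
]
labels = [1, 2, 0]
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
print(top_k_accuracy(scores, labels, k=2))  # 3 of 3 correct
```

By construction, top-k accuracy can never decrease as k grows, which is consistent with every model above scoring higher on Top-5 than on Top-1.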

Conclusion: Compared with the three popular recognition systems tested, the plant recognition models trained in this study can recognize the largest number of native grassland plant species with the highest accuracy. Together, the four models span the trade-off between recognition accuracy and computational cost (model size and recognition speed), making them suitable for deployment on both desktop and mobile platforms, and they meet the requirements of both indoor and outdoor application scenarios.

Key words: grassland, species recognition, deep learning, convolutional neural network, vision transformer