Integrated Intelligent Energy (综合智慧能源), 2024, Vol. 46, Issue (11): 19-28. doi: 10.3969/j.issn.2097-0706.2024.11.003

• Power Big Data Analysis and Mining •

An information extraction method for electric power accidents based on the BERT-BiLSTM-CRF model

ZHAO Guizhong, HUANG Miaohua

  1. Huizhou Power Supply Bureau, Guangdong Power Grid Corporation, Huizhou 516001, Guangdong, China
  • Received: 2024-09-09 Revised: 2024-10-21 Published: 2024-11-25
  • About the author: ZHAO Guizhong (born 1981), male, engineer, master's degree, works on electric power safety management and related topics, 13829928090@163.com
  • Supported by:
    Science and Technology Project of China Southern Power Grid Company Limited (031300KK52222091)

Abstract:

Investigating the patterns behind electric power accidents and building a personal-safety early-warning model require accurate, automated extraction of information from large-scale accident samples for multidimensional analysis. Traditional methods for extracting entity features from Chinese text have low accuracy. Therefore, building on recent named entity recognition techniques for Chinese text and combining several machine learning and deep learning models, a BERT-BiLSTM-CRF model tailored to the power grid accident domain was proposed. The pre-trained bidirectional encoder representations from transformers (BERT) model output high-quality word vectors, and a semantic-enhancement masking strategy improved the model's understanding of the overall text structure. A bidirectional long short-term memory (BiLSTM) network then captured context in both directions to complete feature extraction, and a conditional random field (CRF) model output the optimal prediction sequence. Experimental results showed that the accuracy, recall, and F1 score of this dedicated model all exceeded those of three existing entity recognition models, including a general-purpose large model pre-trained with generative pre-trained transformer (GPT) technology. The experiments verified that the proposed method extracts information from Chinese electric power accident texts with high accuracy.
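
The final step described in the abstract, in which the CRF layer outputs the optimal prediction sequence, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The tag set (`O`, `B-EQU`, `I-EQU`), the score values, and the function name `viterbi_decode` are all hypothetical; in a real model the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],    # emissions[t][k]: score of tag k at token t
    transitions: Sequence[Sequence[float]],  # transitions[i][j]: score of moving from tag i to tag j
    start: Sequence[float],                  # start[k]: score of beginning a sequence with tag k
    end: Sequence[float],                    # end[k]: score of finishing a sequence with tag k
) -> List[int]:
    """Return the highest-scoring tag index sequence under a linear-chain CRF."""
    n_tags = len(start)
    # score[k] is the best score of any partial path that ends in tag k.
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j at step t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Pick the best final tag, then follow the back-pointers to the first token.
    last = max(range(n_tags), key=lambda k: score[k] + end[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen only for illustration.
TAGS = ["O", "B-EQU", "I-EQU"]
example_emissions = [[1.0, 0.8, 0.0], [0.2, 0.1, 1.5], [1.5, 0.0, 0.2]]
example_transitions = [
    [0.0, 0.0, -10.0],  # O -> I-EQU is heavily penalised (an I- tag must follow a B- or I- tag)
    [0.0, 0.0, 1.0],    # B-EQU -> I-EQU is encouraged
    [0.0, 0.0, 1.0],    # I-EQU -> I-EQU is encouraged
]
example_start = [0.0, 0.0, -10.0]  # a sequence should not start with I-EQU
example_end = [0.0, 0.0, 0.0]

example_path = viterbi_decode(example_emissions, example_transitions, example_start, example_end)
example_labels = [TAGS[k] for k in example_path]
```

With these illustrative scores, choosing the highest emission at each token independently would give `O, I-EQU, O`, which is not a valid BIO sequence. Decoding over the whole sequence with transition scores instead selects `B-EQU, I-EQU, O`, which is the kind of label consistency a CRF layer is meant to add on top of per-token features.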

Key words: electric power accidents, information extraction, bidirectional encoder representations from transformers pre-training, bidirectional long short-term memory network, conditional random field
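
The abstract compares models by accuracy, recall, and F1 score. As a rough illustration of how such scores are often computed for named entity recognition, the sketch below counts a predicted entity as correct only when its type and exact span match the gold annotation. This exact-match convention, the BIO tag names, and the function names are assumptions made for this example; the abstract does not state which scoring convention the authors used, and the first value returned here is precision, which may or may not be what the abstract calls accuracy.

```python
from typing import Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_entities(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span-and-type matching."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for a five-token sentence.
gold_tags = ["B-EQU", "I-EQU", "O", "B-LOC", "O"]
pred_tags = ["B-EQU", "I-EQU", "O", "O", "B-LOC"]
scores = precision_recall_f1(gold_tags, pred_tags)
```

In this example the equipment entity is matched exactly while the location entity is predicted one token late, so one of two predicted entities and one of two gold entities are correct.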

CLC Number: