Integrated Intelligent Energy ›› 2024, Vol. 46 ›› Issue (11): 19-28. doi: 10.3969/j.issn.2097-0706.2024.11.003

• Optimized Operation and Control of Integrating Energy Systems •

An information extraction method for electric power accidents based on BERT-BiLSTM-CRF model

ZHAO Guizhong, HUANG Miaohua

  1. Huizhou Power Supply Bureau, Guangdong Power Grid Corporation, Huizhou 516001, China
  • Received: 2024-09-09 Revised: 2024-10-21 Published: 2024-11-25
  • Supported by:
    Science and Technology Project of China Southern Power Grid Company Limited (031300KK52222091)

Abstract:

Investigating patterns in electric power accidents and establishing a safety warning model require accurate, automated information extraction from large-scale accident samples to support multidimensional analysis. However, traditional methods for extracting entity features from Chinese text have shown low accuracy. To address this, a BERT-BiLSTM-CRF model tailored to the power grid accident domain was proposed, building on a named entity recognition technique for Chinese text and drawing on multiple machine learning and deep learning models. High-quality word vectors were generated by a pre-trained bidirectional encoder representations from transformers (BERT) model built on the Transformer architecture, and a semantic enhancement masking strategy was employed to improve the model's understanding of the overall text structure. A bidirectional long short-term memory (BiLSTM) model was then applied to capture contextual information, completing feature extraction, and a conditional random field (CRF) model produced the optimal prediction sequence. In the experiments, the customized model exceeded three existing entity recognition models, including a general-purpose large model pre-trained with generative pre-trained transformer (GPT) technology, in accuracy, recall, and F1 score. These results indicate that the proposed method achieves high accuracy and offers significant advantages for extracting electric power accident information from Chinese text.
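
The final step of the pipeline, in which the CRF selects the optimal prediction sequence, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The tag set (a hypothetical "EQP" equipment entity in BIO format), the emission scores standing in for BiLSTM outputs, and the transition scores are all illustrative assumptions rather than values from the paper.

```python
from typing import Dict, List

# Hypothetical BIO tag set; "EQP" (equipment) is an illustrative label only.
TAGS = ["O", "B-EQP", "I-EQP"]

# Assumed transition scores: everything is neutral except O -> I-EQP,
# which is heavily penalised because an entity cannot start with an I- tag.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    prev: {cur: 0.0 for cur in TAGS} for prev in TAGS
}
TRANSITIONS["O"]["I-EQP"] = -100.0

# Assumed per-token emission scores, standing in for BiLSTM outputs.
EMISSIONS: List[Dict[str, float]] = [
    {"O": 1.0, "B-EQP": 0.8, "I-EQP": 0.0},
    {"O": 0.0, "B-EQP": 0.2, "I-EQP": 2.0},
    {"O": 1.5, "B-EQP": 0.0, "I-EQP": 0.3},
]


def greedy_decode(emissions: List[Dict[str, float]], tags: List[str]) -> List[str]:
    """Pick the best tag per token independently, ignoring transitions."""
    return [max(tags, key=lambda tag: step[tag]) for step in emissions]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    if not emissions:
        return []
    # score[tag] = best score of any path that ends in `tag` at the current token
    score = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []
    for step in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            best_prev = max(tags, key=lambda prev: score[prev] + transitions[prev][cur])
            new_score[cur] = score[best_prev] + transitions[best_prev][cur] + step[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda tag: score[tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(greedy_decode(EMISSIONS, TAGS))                 # ['O', 'I-EQP', 'O']
    print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))   # ['B-EQP', 'I-EQP', 'O']
```

In this toy input, per-token greedy decoding yields an invalid sequence in which I-EQP follows O, whereas the transition scores steer Viterbi decoding to a well-formed B-EQP, I-EQP, O sequence. This illustrates why a CRF layer is typically placed on top of a BiLSTM for sequence labeling.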

Key words: electric power accidents, information extraction, bidirectional encoder representations from transformers pre-training, bidirectional long short-term memory network, conditional random field
