An information extraction method for electric power accidents based on BERT-BiLSTM-CRF model

doi:10.3969/j.issn.2097-0706.2024.11.003

Abstract:

Investigating the patterns of electric power accidents and building a personal safety warning model require accurate, automated information extraction from large-scale accident samples, followed by multidimensional analysis. Traditional methods for extracting Chinese entity features have low accuracy. Therefore, building on recent named entity recognition techniques for Chinese text and combining several machine learning and deep learning models, a BERT-BiLSTM-CRF model tailored to the power grid accident domain is proposed. A pre-trained bidirectional encoder representations from transformers (BERT) model outputs high-quality word vectors, and a semantic enhancement masking strategy improves the model's understanding of the overall text structure. A bidirectional long short-term memory (BiLSTM) network then captures context in both directions to complete feature extraction, and a conditional random field (CRF) model outputs the optimal prediction sequence. Experimental results show that the specialized model achieves higher accuracy, recall, and F₁ score than three existing entity recognition models, including a general-purpose large model pre-trained with generative pre-trained transformer (GPT) technology. The results indicate that the proposed method extracts information from Chinese electric power accident texts with high accuracy and has significant advantages for this task.

Key words: electric power accidents, information extraction, bidirectional encoder representations from transformers pre-training, bidirectional long short-term memory network, conditional random field

CLC number: TP391

ZHAO Guizhong, HUANG Miaohua. An information extraction method for electric power accidents based on BERT-BiLSTM-CRF model[J]. Integrated Intelligent Energy, 2024, 46(11): 19-28.
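
The abstract states that the CRF layer outputs the optimal prediction sequence. One common way to obtain such a sequence is Viterbi search over per-position tag scores and tag-to-tag transition scores. The sketch below illustrates that idea only; the paper's own decoder is not shown in this excerpt, and the function name, the three-tag example (O, B, I), and every score are illustrative assumptions.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag path.

    emissions[t][j]: score of tag j at position t (e.g. produced by a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j (CRF parameters).
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this position.
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Hypothetical tags: 0 = O, 1 = B, 2 = I. The O -> I move is heavily penalized.
transitions = [[0.0, 0.0, -100.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.0, 0.0],
             [0.0, 0.0, 3.0]]
best_path = viterbi_decode(emissions, transitions)
```

In this toy case a per-position argmax would pick O then I, which the transition scores forbid; the joint search returns B then I instead, which is the kind of label consistency a CRF layer is meant to add on top of per-token scores.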

Figures/Tables: 10 (Figures 1-5, Tables 1-5)

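Item	Content
Original text	1名工作人员在6 kV母线清擦作业结束后发生事故误碰带电母线触电死亡 (After finishing cleaning work on a 6 kV busbar, one worker accidentally touched the live busbar and died of electric shock)
Segmented text	1名工作人员在 6 kV 母线清擦作业结束后发生事故误碰带电母线触电死亡
Original mask input	1名工作人员在 6 kV 母线清[MASK] 作业结束后发生 [MASK]故误碰 [MASK]电母线触电死[MASK]
SAMS	1名 [MASK] 在 6 kV 母线清理 [MASK] [MASK] 后出现 [MASK] [MASK] 有电母线 [MASK] 死去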
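
Judging from the two masked rows above, the original input hides single characters, whereas the SAMS row hides whole words and swaps some remaining words for near-synonyms (for example 带电 becomes 有电). The following is a minimal sketch of that contrast, not the authors' implementation: the function names, the word segmentation of the fragment, and the choice of masked positions are assumptions made for illustration.

```python
from typing import Dict, List, Set

MASK = "[MASK]"


def char_level_mask(words: List[str], char_positions: Set[int]) -> str:
    """Mask single characters by index, ignoring word boundaries."""
    chars = "".join(words)
    return "".join(MASK if i in char_positions else c
                   for i, c in enumerate(chars))


def word_level_mask(words: List[str], word_positions: Set[int],
                    synonyms: Dict[str, str]) -> str:
    """Mask whole words; replace unmasked words by a listed synonym, if any."""
    out = []
    for i, word in enumerate(words):
        if i in word_positions:
            out.append(MASK)
        else:
            out.append(synonyms.get(word, word))
    return " ".join(out)


# Assumed segmentation of the last part of the table's sentence.
words = ["误碰", "带电", "母线", "触电", "死亡"]
# Synonym pairs read off the SAMS row of the table.
synonyms = {"带电": "有电", "死亡": "死去"}

char_masked = char_level_mask(words, {2, 9})            # hides 带 and 亡
word_masked = word_level_mask(words, {0, 3}, synonyms)  # hides 误碰 and 触电
```

With character-level masking the model can often guess a hidden character from its neighbor inside the same word; masking the whole word forces it to rely on the wider sentence context.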

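Parameter	Value
Number of LSTM layers	2
LSTM hidden size	128
CRF learning rate	3×10⁻³
BERT-WWM learning rate	3×10⁻⁵
Training epochs	20
adam_epsilon	1×10⁻⁸
Maximum sequence length	256
Training batch size	32
Warmup proportion	0.01
Evaluation batch size	32
Save steps	20
Weight decay	0.01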
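
As a hedged illustration of how the settings above might be carried in code, the sketch below stores them in a configuration object and derives the number of warmup steps from the warmup proportion. The class and function names are hypothetical, and the training-set size is a placeholder because it is not given in this excerpt.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the parameter table; field names are assumptions."""
    lstm_layers: int = 2
    lstm_hidden_size: int = 128
    crf_lr: float = 3e-3
    bert_lr: float = 3e-5
    epochs: int = 20
    adam_epsilon: float = 1e-8
    max_seq_length: int = 256
    train_batch_size: int = 32
    warmup_proportion: float = 0.01
    eval_batch_size: int = 32
    save_steps: int = 20
    weight_decay: float = 0.01


def schedule_sizes(cfg: TrainConfig, n_train_samples: int) -> Tuple[int, int]:
    """Return (total optimizer steps, warmup steps) for a training-set size."""
    steps_per_epoch = math.ceil(n_train_samples / cfg.train_batch_size)
    total_steps = steps_per_epoch * cfg.epochs
    warmup_steps = int(total_steps * cfg.warmup_proportion)
    return total_steps, warmup_steps


cfg = TrainConfig()
# 3200 is a placeholder sample count, not a figure from the paper.
total_steps, warmup_steps = schedule_sizes(cfg, n_train_samples=3200)
```

The two learning rates differ by a factor of 100, which is a common arrangement when a pre-trained encoder is fine-tuned gently while a freshly initialized CRF layer is trained faster.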

Dataset	Accuracy (%)
DuEE1.0 dataset only	31.2
With the self-crawled corpus added	89.4

Model	Accuracy (%)	Recall (%)	F₁ (%)
ChatGLM-6B	79.8	83.4	81.6
GlobalPointer	82.3	73.7	77.8
BiLSTM-CRF	76.6	80.9	78.7
BERT-LSTM-CRF	84.6	86.3	85.4
BERT-BiLSTM-CRF	86.8	92.1	89.4
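
The F₁ column above can be checked against the other two columns. The sketch below assumes that the first metric column plays the role of precision in the usual formula F₁ = 2PR/(P + R); this reading is an assumption, but under it every reported F₁ is reproduced to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (first metric column, recall, reported F1), in percent, as printed above.
reported = {
    "ChatGLM-6B": (79.8, 83.4, 81.6),
    "GlobalPointer": (82.3, 73.7, 77.8),
    "BiLSTM-CRF": (76.6, 80.9, 78.7),
    "BERT-LSTM-CRF": (84.6, 86.3, 85.4),
    "BERT-BiLSTM-CRF": (86.8, 92.1, 89.4),
}

# Largest gap between the recomputed and the reported F1 across all models.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in reported.values())
```

A gap below 0.05 for every row is what one would expect if the table values were rounded to one decimal place.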

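Text	Model	Time	Location	Accident type	Deaths
10月31日，重庆乌江电力工程有限公司在重庆乌江电力有限公司35 kV乜恒鑫线更换雷击绝缘子作业过程中，2名作业人员完工下塔时，1人误碰带电线路发生触电事故，另1人受惊吓从塔上跌落，造成2人死亡 (On October 31, while Chongqing Wujiang Electric Power Engineering Co., Ltd. was replacing lightning-damaged insulators on the 35 kV 乜恒鑫 line of Chongqing Wujiang Electric Power Co., Ltd., two workers were climbing down the tower after finishing the job when one accidentally touched a live line and received an electric shock; the other was startled and fell from the tower. Both died.)	BERT-BiLSTM-CRF	10月31日	5 kV乜恒鑫线	触电/坠落 (electric shock/fall)	2
	ChatGLM-6B	10月31日	重庆乌江电力有限公司 (Chongqing Wujiang Electric Power Co., Ltd.)	触电 (electric shock)	1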
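
To show how a per-character tag sequence could be turned into the fields of the table above, here is a minimal sketch that groups tagged characters into entity spans. The BIO tagging scheme and the tag names TIME and TYPE are assumptions; the labeling scheme the authors used is not stated in this excerpt.

```python
from typing import List, Optional, Tuple


def bio_to_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (entity_type, text) spans from BIO tags."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_chars: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current_type:
                entities.append((current_type, "".join(current_chars)))
            current_type, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_chars.append(ch)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_type:
                entities.append((current_type, "".join(current_chars)))
            current_type, current_chars = None, []
    if current_type:
        entities.append((current_type, "".join(current_chars)))
    return entities


# Hypothetical tags for a short fragment; not model output from the paper.
chars = list("10月31日，触电")
tags = ["B-TIME", "I-TIME", "I-TIME", "I-TIME", "I-TIME", "I-TIME",
        "O", "B-TYPE", "I-TYPE"]
entities = bio_to_entities(chars, tags)
```

Each resulting pair maps directly onto one column of the comparison, so a wrong span boundary in the tags shows up as a truncated or misplaced field.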