Integrated Intelligent Energy ›› 2026, Vol. 48 ›› Issue (3): 15-26.doi: 10.3969/j.issn.2097-0706.2026.03.002

• Power System Modeling and Control •

Optimized operation of a multi-energy complementary system based on multi-agent reinforcement learning

CHEN Feng1,2, LU Xiaomin3, LI Mengyang4, ZHANG Tao1,2, YANG Fan1,2

  1. School of Applied Engineering, Henan University of Science and Technology, Sanmenxia 472100, China
    2. Henan Nonferrous Metals New Materials Intelligent Manufacturing Application Engineering Research Center, Sanmenxia Polytechnic, Sanmenxia 472099, China
    3. Inspur Data Technology Company Limited, Zhengzhou 450003, China
    4. School of Electrical Engineering and Automation, Luoyang Normal University, Luoyang 471942, China
  • Received: 2025-07-18; Revised: 2025-12-08; Published: 2026-03-25
  • Supported by:
    National Natural Science Foundation of China(62401240); Key Scientific Research Project Plan of Colleges and Universities in Henan Province in 2026(26B480004)

Abstract:

To address the dynamic coordinated optimization challenges of multi-energy complementary systems under high renewable energy penetration, and the limitations of traditional centralized methods in coordinating multi-agent interests and responding in real time, a dynamic optimization modeling study was conducted. A three-layer multi-agent reinforcement learning (MARL) framework, consisting of a physical layer, a decision layer, and a coordination layer, was developed, with energy producers, consumers, and system schedulers modeled as independent agents. Based on an improved proximal policy optimization (PPO) algorithm, a dynamic reward function integrating economic efficiency, environmental friendliness, and stability was designed, and distributed decision-making with global coordination was achieved through a centralized-training, decentralized-execution mechanism. A typical park-level multi-energy complementary system was used as a case study. The results showed that the proposed MARL model increased the renewable energy consumption rate to 92.3% and reduced the unit electricity cost by 28.9% compared with the traditional mixed-integer programming (MIP) method. Under a 50% abrupt load-change scenario, the system recovery time was shortened to 90 s, 900% faster than the MIP method. Even with ±20% wind and solar forecasting errors, the load satisfaction rate remained at 98.7%. The proposed dynamic optimization model effectively addresses the multi-agent coordination and uncertainty adaptation challenges of multi-energy complementary systems, providing technical support for the real-time optimized scheduling of high-penetration renewable energy systems.
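The abstract describes a dynamic reward function weighting economic efficiency, environmental friendliness, and stability. As a minimal sketch of how such a multi-objective reward might be scalarized for a PPO agent, assuming a simple weighted-sum form (the weights, inputs, and function name below are illustrative assumptions, not the paper's actual formulation):

```python
def dynamic_reward(cost, emissions, freq_dev,
                   w_econ=0.5, w_env=0.3, w_stab=0.2):
    """Illustrative scalar reward for one scheduling step.

    cost      -- operating cost of the dispatch decision (economic term)
    emissions -- CO2-equivalent emissions (environmental term)
    freq_dev  -- frequency/voltage deviation magnitude (stability term)

    Lower cost, emissions, and deviation all yield a higher reward,
    so a PPO agent maximizing this reward trades off the three goals
    according to the weights.
    """
    return -(w_econ * cost + w_env * emissions + w_stab * abs(freq_dev))


# A dispatch with lower cost at equal emissions and stability
# should receive a strictly higher reward.
r_cheap = dynamic_reward(cost=5.0, emissions=5.0, freq_dev=0.1)
r_costly = dynamic_reward(cost=10.0, emissions=5.0, freq_dev=0.1)
```

In a "dynamic" variant, the weights themselves could be functions of system state (e.g. raising `w_stab` during a load transient), which is one plausible reading of the abstract's description.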

Key words: multi-energy complementary system, multi-agent reinforcement learning, dynamic optimization modeling, source-grid-load-storage, renewable energy consumption, coordinated scheduling
