Research on improved fuzzy mean curve clustering method based on two-scale measurement

doi:10.3969/j.issn.2097-0706.2022.04.001

Abstract:

Intelligent power networks produce large amounts of functional data whose variation over time shows clear curve features, and curve clustering can effectively mine the information in such data. To address the difficulty of selecting initial cluster centres in the fuzzy mean (fuzzy C-means, FCM) clustering algorithm and the inaccurate similarity measurement in curve clustering methods, an improved fuzzy mean curve clustering method based on a two-scale metric is proposed. The Pearson distance measures the vertical shape similarity of curves and the dynamic time warping (DTW) distance measures their horizontal shape similarity; on this basis, a density peak algorithm using the two-scale metric is proposed to determine the initial cluster centres. An improved entropy weight method fuses the Pearson distance and the DTW distance into the similarity measure used by the clustering algorithm. Clustering validity indexes are used to evaluate the clustering results and the algorithm performance in terms of clustering quality and algorithm stability. Finally, one year of measured wind power output data from a region is clustered as a case study, and the results verify the correctness and effectiveness of the proposed model and algorithm.

Key words: smart grid, data mining, curve clustering, improved fuzzy mean curve clustering, Pearson distance, dynamic time warping distance, improved entropy weight method, similarity, wind power
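
As a rough illustration of the two-scale idea described in the abstract, the following Python sketch computes a Pearson distance and a DTW distance between two curves and blends them with fixed weights. The definition of the Pearson distance as 1 minus the correlation coefficient, the absolute-difference DTW cost, the function names, and the equal default weights are all assumptions made for illustration; the article derives its weights with an improved entropy weight method, which is not reproduced here.

```python
import numpy as np


def pearson_distance(x, y):
    """Assumed definition: 1 - Pearson correlation (0 for perfectly correlated curves)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return 1.0 - r


def dtw_distance(x, y):
    """Classic dynamic time warping with an absolute-difference local cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # acc[i, j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def combined_distance(x, y, w_pearson=0.5, w_dtw=0.5):
    """Weighted blend of the two distances; the weights here are placeholders."""
    return w_pearson * pearson_distance(x, y) + w_dtw * dtw_distance(x, y)
```

The two distances live on different scales (this Pearson distance is bounded by 2, whereas DTW is unbounded), so some normalisation before weighting would likely be needed in practice.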

CLC number: TK01⁺1; TP301.6; TM614

CHEN Tiantian, GAO Yajing, LU Zhanhui. Research on improved fuzzy mean curve clustering method based on two-scale measurement[J]. Integrated Intelligent Energy, 2022, 44(4): 1-11.

Algorithm	Class 1	Class 2	Class 3	Class 4	Class 5
Standard FCM	7.2712	5.4806	6.8290	5.1799	2.9765
Improved FCM	2.6191	3.8287	3.5468	3.7170	3.7185
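
For readers unfamiliar with the baseline being compared, a minimal sketch of a standard fuzzy C-means iteration is given below. It assumes Euclidean distance, a fuzzifier of m = 2, and caller-supplied initial centres; the function name `fcm` and its parameters are hypothetical, and the article's improved variant (two-scale distance, density peak initialisation) is not implemented here.

```python
import numpy as np


def fcm(X, init_centers, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy C-means sketch: returns (centres, memberships)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Euclidean distance from every centre to every sample, shape (c, n)
        d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a sample sits on a centre
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)), columns sum to 1
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # Centre update: membership-weighted mean of the samples
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return V, U
```

The result depends on `init_centers`, which is why the choice of initial centres matters for this family of algorithms.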

Algorithm	Number of clusters	XB index	SI index
Standard FCM	5	5.2345	5.5474
Improved FCM	5	1.2508	3.4860
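
The XB column refers to the Xie-Beni validity index, for which lower values indicate more compact and better separated fuzzy clusters. The sketch below shows one common way to compute it, assuming squared Euclidean distances and a fuzzifier of m = 2; the function name and argument layout are hypothetical, and the article may evaluate the index with its own combined distance instead.

```python
import numpy as np


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni index sketch.

    X: (n, d) samples; centers: (c, d) cluster centres;
    U: (c, n) fuzzy memberships whose columns sum to 1.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    U = np.asarray(U, dtype=float)
    # Squared distance from every centre to every sample, shape (c, n)
    d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    # Smallest squared distance between two distinct centres
    c2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(c2, np.inf)
    separation = c2.min()
    return compactness / (X.shape[0] * separation)
```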

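Initial cluster centre selection method	Number of clusters	XB	SI	Number of optimal clustering results	Mean program run time/s
Random selection	5	1.2508	3.4680	2	0.9288
Improved K-means	5	2.1151	3.5987	30	0.6709
Euclidean density peak	5	2.1150	3.5987	30	0.6733
Proposed method	5	1.2508	3.4680	30	0.8452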
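
To make the density peak initialisation compared above more concrete, here is a minimal sketch of density peak centre selection operating on a precomputed distance matrix. The cutoff-count density, the ranking by the product of density and separation, and the names `density_peak_centers` and `d_c` are assumptions for illustration; any distance (Euclidean, or a combined two-scale distance) could be used to build the matrix, and the article's exact procedure may differ.

```python
import numpy as np


def density_peak_centers(D, k, d_c):
    """Pick k initial centres from a symmetric distance matrix D (zero diagonal).

    rho_i   = number of other points closer than the cutoff d_c (d_c > 0)
    delta_i = distance to the nearest point of strictly higher density
              (largest distance in row i if no such point exists)
    Points are ranked by gamma_i = rho_i * delta_i.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rho = (D < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    gamma = rho * delta
    # Indices of the k largest gamma values
    return np.argsort(-gamma, kind="stable")[:k]
```

Because the selection is deterministic for a given distance matrix and cutoff, repeated runs start from the same centres, in contrast to random initialisation.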

Similarity measure	Optimal number of clusters	XB
Euclidean distance	3	3.7552
Pearson distance	2	1.5560
DTW distance	2	3.7999
Combined distance	5	1.2508

Similarity measure	Number of clusters	XB	SI	Mean program run time/s
Euclidean distance	5	5.2345	5.5475	0.6366
Pearson distance	5	1.5860	3.6440	0.6894
DTW distance	5	5.7772	4.8811	1.1712
Combined distance	5	1.2508	3.4680	0.8452