Integrated Intelligent Energy

Recognition and repair of abnormal data of public building load based on MDLOF-iForest and M-KNN-Slope

LIU Yining, CHEN Baian, DU Pengcheng, LIN Xiaogang, JIANG Meihui

  1. School of Electrical Engineering, Guangxi University, Nanning 530004, China
  2. Quanzhou Equipment Manufacturing Research Center, Haixi Institute, Chinese Academy of Sciences, Fujian 362000, China
  3. School of Renewable Energy, Inner Mongolia University of Technology, Ordos 017010, China
  • Received: 2024-12-31 Revised: 2025-02-11
  • Supported by:
    National Natural Science Foundation of China, General Program (52307072)

Abstract: Load data from public buildings are prone to anomalies, so identifying and repairing abnormal load values is an indispensable data-processing step in research on public building energy consumption. To address the limitations of existing identification and repair methods, this paper proposes a method for identifying and repairing abnormal public building load data based on the MDLOF-iForest and M-KNN-Slope algorithms. MDLOF-iForest introduces the Mahalanobis distance into the traditional LOF algorithm, which improves the model's sensitivity to correlations between data features, and it combines the strengths of MDLOF and iForest to identify abnormal data quickly and accurately. M-KNN-Slope then repairs the abnormal data: it finds neighbors among the normal data whose load trend lines resemble that of the abnormal data and uses the weighted average slope of those similar trend lines, which reduces the dependence on sample data. The method was validated on load data from August to November 2024 for one office building and one commercial building in Guangxi, China. After repair, about 90% of the data differ from the correct values by less than 10%, and M-KNN-Slope yields more data with errors below 5% than general-purpose algorithms. In forecasts with XGBoost, LSTM, BP and SVM models, using the repaired data instead of the unrepaired data reduced the root mean square error (RMSE) by 5.02% to 14.40% and the mean absolute error (MAE) by 2.44% to 13.34%.

Key words: Public buildings, Load, Abnormal data, MDLOF-iForest, M-KNN-Slope
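
The idea of replacing the Euclidean distance in LOF with the Mahalanobis distance, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' implementation and it omits the iForest stage and the way the paper fuses the two detectors. The function names, the two-feature layout (previous load, current load), the neighbor count k = 3 and the sample values are all assumptions made for illustration, and the points are assumed to be distinct.

```python
import math

def inverse_covariance_2d(points):
    """Inverse of the 2x2 sample covariance, as (a, b, c) for [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    return syy / det, -sxy / det, sxx / det

def mahalanobis(p, q, inv):
    """Mahalanobis distance between two 2-D points."""
    a, b, c = inv
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.sqrt(a * dx * dx + 2 * b * dx * dy + c * dy * dy)

def md_lof_scores(points, k=3):
    """Standard LOF scores, computed with Mahalanobis instead of Euclidean distance."""
    inv = inverse_covariance_2d(points)
    n = len(points)
    dist = [[mahalanobis(points[i], points[j], inv) for j in range(n)]
            for i in range(n)]
    # k nearest neighbors of each point (ties broken by index).
    neighbors = [sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])[:k] for i in range(n)]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical (previous load, current load) pairs; the last pair breaks
# the correlation that the other pairs follow.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (5, 9.9),
          (6, 12.1), (7, 14.0), (8, 15.8), (4.5, 14.0)]
scores = md_lof_scores(points, k=3)
suspect = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous index:", suspect)
```

Scores near 1 indicate a point whose local density matches that of its neighbors, and larger scores indicate likely anomalies. Because the Mahalanobis distance scales each direction by the data's covariance, a point that departs from the correlation between features stands out even when its Euclidean distance to the other points is unremarkable.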