华电技术 (Huadian Technology), 2018, Vol. 40, Issue 7: 1-4.

• Research and Development •

Optimization of an energy consumption prediction model based on a big data platform and a parallel random forest algorithm

  

  1. Hunan Datang Xianyi Technology Company Limited, Changsha 410007, China
  • Publication date: 2018-07-25  Release date: 2018-08-24

Optimization of an energy consumption forecasting model based on a big data platform and parallel random forest

  1. Hunan Datang Xianyi Technology Company Limited, Changsha 410007, China
  • Online: 2018-07-25  Published: 2018-08-24

Abstract:

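A distributed big data analysis platform is built with Hadoop, Spark, HBase and related components. On this platform, a healthy (clean) data set is obtained through data collection and preprocessing, and an energy consumption regression prediction model based on a parallel random forest algorithm is established. The relationships among the inputs, the model parameters and the outputs of the random forest prediction model are analyzed and compared comprehensively. The comparison focuses on how the number of decision trees, the decision tree depth and the maximum number of splits affect the accuracy, running time and complexity of the trained model. The optimal parameters of the prediction model are thereby obtained, enabling accurate prediction and soft-sensor calculation of the coal consumption for power supply.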
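The parallel random forest regression described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the paper's Spark implementation: the class `ParallelRandomForest`, the functions `build_tree`, `predict_tree` and `count_splits`, and the parameters `num_trees`, `max_depth` and `max_features` are hypothetical names. Each tree is a CART-style regression tree grown on a bootstrap sample, choosing from a random feature subset at each split, and the forest prediction is the mean of the tree outputs. Because the trees are independent, they are trained concurrently; a thread pool is used only to show that structure, since CPython threads give little CPU speedup (Spark executors or a process pool would).

```python
# Illustrative sketch only: a pure-Python random forest regressor whose trees are
# trained in parallel. All names and parameters are hypothetical, not the paper's code.
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean (the regression split criterion)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, max_depth, max_features, rng, depth=0):
    """Grow a CART-style regression tree; leaves hold the mean target value."""
    if depth >= max_depth or len(y) < 2 or len(set(y)) == 1:
        return mean(y)
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(len(X[0])), max_features):  # random feature subset
        for t in sorted({row[f] for row in X})[:-1]:  # keeps both sides non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled feature is constant in this node
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_depth, max_features, rng, depth + 1)

    return (f, t, grow(li), grow(ri))  # internal node: (feature, threshold, left, right)


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def count_splits(node):
    """Number of internal (split) nodes; at most 2**max_depth - 1."""
    if not isinstance(node, tuple):
        return 0
    return 1 + count_splits(node[2]) + count_splits(node[3])


class ParallelRandomForest:
    def __init__(self, num_trees=10, max_depth=5, max_features=None, seed=0, workers=4):
        self.num_trees = num_trees
        self.max_depth = max_depth
        self.max_features = max_features  # default: one third of the features
        self.seed = seed
        self.workers = workers
        self.trees = []

    def _fit_one(self, tree_seed, X, y):
        rng = random.Random(tree_seed)  # independent RNG per tree, safe across threads
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        p = len(X[0])
        mf = min(p, self.max_features or max(1, p // 3))
        return build_tree([X[i] for i in idx], [y[i] for i in idx], self.max_depth, mf, rng)

    def fit(self, X, y):
        # Trees are independent, so they can be grown concurrently.
        seeds = [self.seed + k for k in range(self.num_trees)]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            self.trees = list(pool.map(lambda s: self._fit_one(s, X, y), seeds))
        return self

    def predict(self, x):
        # Forest output is the average of the individual tree predictions.
        return mean(predict_tree(tree, x) for tree in self.trees)


# Usage on a toy, noise-free data set (y = 2 * x), purely for illustration.
X_demo = [[float(i)] for i in range(40)]
y_demo = [2.0 * i for i in range(40)]
forest = ParallelRandomForest(num_trees=20, max_depth=5, seed=1).fit(X_demo, y_demo)
print(round(forest.predict([10.0]), 2))  # close to 20; exact value depends on the seeds
```

In this sketch, training time grows roughly in proportion to `num_trees`, and each tree's size is capped by `max_depth` (at most `2**max_depth - 1` splits), which mirrors the accuracy, run-time and complexity trade-off the paper tunes. In Spark MLlib, `pyspark.ml.regression.RandomForestRegressor` exposes `numTrees` and `maxDepth`; the abstract's "maximum number of splits" may relate to `maxBins`, but the source does not say which setting was used.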

Keywords:

Abstract:

A distributed big data analysis platform is built with Hadoop, Spark and HBase, and a healthy data set is acquired on it through data collection and preprocessing. A regression model for forecasting energy consumption is then built on the parallel random forest algorithm, and the relationships among the model's inputs, its parameters and its output are comprehensively analyzed and compared. The emphasis is on a comparative analysis of how the number of decision trees, the depth of the decision trees and the maximum number of splits affect the accuracy, running time and complexity of the trained model. The optimized prediction model achieves accurate prediction and soft-sensor calculation of the coal consumption for power supply.
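Why the number of trees, the tree depth and the split limit trade accuracy against run time and complexity can be seen from standard random forest analysis. The relations below are textbook results offered as a sketch, not findings of the paper: the symbols $T$ (number of trees), $h_t$ (the $t$-th tree), $\sigma^2$ (variance of a single tree's prediction), $\rho$ (pairwise correlation between tree predictions) and $d$ (maximum depth) are introduced here, and the trees are assumed to be identically distributed binary trees.

```latex
\[
\hat{y}(\mathbf{x}) = \frac{1}{T}\sum_{t=1}^{T} h_t(\mathbf{x})
\]
\[
\operatorname{Var}\big[\hat{y}(\mathbf{x})\big]
  = \frac{T\sigma^{2} + T(T-1)\rho\sigma^{2}}{T^{2}}
  = \rho\sigma^{2} + \frac{1-\rho}{T}\,\sigma^{2}
\]
\[
\#\mathrm{leaves} \le 2^{d}, \qquad \#\mathrm{splits} \le 2^{d} - 1
\]
```

Only the second variance term shrinks as $T$ grows, so adding trees gives diminishing accuracy gains while training cost keeps rising roughly linearly in $T$. Depth and the split cap instead bound the size of each tree, and predicting one sample visits at most $d$ nodes per tree, i.e. $O(Td)$ work.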

Key words: