Invented by Li-Yen Kuo, Chih-Lun Liao, Chun-Han TAI, Hao-Yu Kao, National Chung Shan Institute of Science and Technology NCSIST
The National Chung Shan Institute of Science and Technology NCSIST invention works as follows
The method of predicting air quality using machine learning models includes the following steps: (A) providing air pollution data to an eXtreme Gradient Boosting (XGBoost) regression algorithm to obtain an XGBoost prediction value; (B) providing the air pollution data to a Long Short-Term Memory (LSTM) algorithm to obtain an LSTM prediction value; (C) combining the air pollution data with the XGBoost and LSTM prediction values to generate air pollution combination data; and (D) performing an XGBoost classification algorithm to obtain a suggestion on whether to issue an air pollution alert. The two layers of machine-learning models can improve the situation in which prediction results are conservative because a single model lacks enough data.
Background for Method to predict air quality using machine learning models
The World Health Organization (WHO) lists air pollution among the major environmental carcinogens, and it is even more dangerous than secondhand smoke in terms of lung cancer risk. Particulate matter 2.5 (PM2.5) is the most dangerous air pollutant. It can accumulate deep inside human lungs. The American Heart Association confirmed that PM2.5 can penetrate the human respiratory system and carry bacteria, heavy metals and dioxins directly into the thoracic cavity. Long-term exposure may increase the risk of myocardial ischemia, stroke, arrhythmia and other cardiovascular diseases.
The long-term effects of air pollution are still unknown. Air pollution often exceeds the standard value because of increased industrial pollution. This is especially true in winter, when air circulation is weaker. It is evident how serious the problem of air pollution has become.
Because air pollution has so many sources, it is difficult to monitor accurately. The current monitoring method relies on an atmospheric model to predict air pollution levels over the next few days. This approach has two problems. (1) The temporal resolution of the prediction can be low: a conventional method estimates an average value per day rather than predicting in real time, whereas real-time predictions may be more accurate and better matched to the daily routines of residents. (2) The spatial resolution of the prediction can also be low: the atmospheric model usually spans multiple degrees of latitude and longitude, so air pollution is predicted as an average over a large area. Air quality is closely related to human habits and activities as well as topographic factors, so a conventional method that considers only atmospheric flows cannot predict air quality in a small area. Prediction results without localization serve only as a rough guide for local residents and offer no real-time benefit for early warning. In addition to pollutant gases within the monitored area, pollutants carried in from outside, for example by monsoons, should also be monitored, since the influence of these sources varies with wind speed, wind direction and date. Even when the weather in adjacent regions is similar, topographic factors can produce different impacts. The number of variables in air pollution is huge, which makes it difficult to predict air pollution accurately.
One method of predicting air pollution is to train a regressor using the eXtreme Gradient Boosting (XGBoost) algorithm, with parameters adjusted according to the air pollution data, and then to predict future air pollutant concentrations with the trained XGBoost model. However, air pollution is a time-varying phenomenon: a change in the atmosphere over a period of time can influence future concentrations of air pollution. To learn these time-varying characteristics, it is insufficient to use only the XGBoost algorithm for training.
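Because future concentrations depend on how conditions changed over a preceding period, time-series readings are commonly framed as fixed-length windows of past values paired with the value that follows. The sketch below illustrates that framing only; the function name, the window length, and the sample readings are assumptions for illustration and do not come from the patent.

```python
from typing import List, Tuple


def make_windows(series: List[float], window: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` past readings with the reading that follows it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


# Hypothetical hourly PM2.5 readings (illustrative values, not real data).
hourly_pm25 = [12.0, 15.0, 21.0, 30.0, 28.0, 35.0]

# Each sample is (past three hours, next hour).
samples = make_windows(hourly_pm25, window=3)
```

A longer window gives a model more history per sample but yields fewer samples from the same series.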
The industry therefore needs a way to predict air quality more accurately, using a deep learning artificial intelligence model such as Long Short-Term Memory (LSTM), in cases where the prediction results are conservative or when a model lacks enough data. In this way, cost and time are taken into consideration, and air pollution conditions can be predicted accurately and effectively in order to protect human health.
The main objective of this invention is to provide a method for predicting air pollution with machine learning models by combining air pollution data, an XGBoost regression algorithm, an LSTM algorithm and an XGBoost classification algorithm in order to achieve reliable and accurate prediction results.
The present invention is a method of predicting air pollution with machine learning models. The method includes: (A) providing air pollution data for performing an eXtreme Gradient Boosting (XGBoost) regression algorithm in order to obtain an XGBoost prediction value; (B) providing the air pollution data for performing a Long Short-Term Memory (LSTM) algorithm in order to obtain an LSTM prediction value; (C) combining the air pollution data with the XGBoost and LSTM prediction values to create air pollution combination data; and (D) performing an XGBoost classification algorithm to obtain a suggestion on whether to issue an air pollution alert.
The air pollution combination data of step (C) can be created by combining the air pollution data, in vector form, with the XGBoost and LSTM prediction values. Linking the vector of air pollution data with the XGBoost prediction value and the LSTM prediction value produces characteristic vectors that are suitable for a second stage of machine learning.
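The linking described for step (C) amounts to appending the two prediction values to the original feature vector. The following is a minimal sketch of that idea; the function name, the feature order, and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def build_combination_vector(
    pollution: Sequence[float], xgb_value: float, lstm_value: float
) -> List[float]:
    """Append the two first-stage prediction values to the raw feature vector."""
    return [*pollution, xgb_value, lstm_value]


# Hypothetical feature order: PM2.5, PM10, wind speed (illustrative values).
observation = [35.0, 50.0, 2.5]

# Stand-in prediction values; in practice these would come from the trained models.
row = build_combination_vector(observation, xgb_value=38.2, lstm_value=41.7)
```

The resulting vector has two more entries than the input, so a second-stage model sees both the raw measurements and the two first-stage predictions.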
The suggestion on whether to issue an air pollution alert in step (D) may include an air quality alert value, which is designed to monitor the current air pollution conditions. If the air quality alert value is above a set threshold, the current air pollution conditions may be severe. It can then be decided whether to issue an alert encouraging the public to stay inside rather than engage in outdoor activities.
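The decision rule described for the alert value can be sketched as a simple threshold comparison. The function name, the default threshold of 0.5, and the reading of the alert value as a score between 0 and 1 are all assumptions; the source does not specify them.

```python
def should_issue_alert(alert_value: float, threshold: float = 0.5) -> bool:
    """Suggest an alert only when the alert value is strictly above the threshold."""
    return alert_value > threshold


# Hypothetical alert values (illustrative only).
high = should_issue_alert(0.8)      # above the assumed threshold
borderline = should_issue_alert(0.5)  # equal to the threshold, so not above it
```

The comparison is strict because the text speaks of a value "above" a certain value; whether equality should also trigger an alert is a policy choice left open here.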
The above summary, the detailed description that follows and the accompanying drawings further illustrate the features, objectives and benefits of the present invention.
The following detailed description of the preferred embodiments, illustrated in the various figures and drawings, will make the objectives of this invention clear to anyone with ordinary knowledge of the art.
The embodiments below describe the method of the present invention. The following detailed description will help those skilled in the art to understand the benefits and effects of the invention.
Refer to FIG. 1, which illustrates the grid for air pollution partitioning according to this invention. As shown in FIG. 1, pollution sources from neighboring regions can also affect Taiwan via wind, with greater influence from nearby regions and decreasing influence as distance increases. In FIG. 1, the sizes of the grid cells are based on the influence levels of the different regions. The influence level within Taiwan is the highest, so the partition grid of Taiwan is the most detailed. The influence levels of the coastal regions of China and Korea are the second highest, while those of Mongolia and the inland regions of China are lower.
Refer to FIG. 2, a flowchart illustrating a method of predicting air pollution with machine learning models according to the invention. As shown in FIG. 2, the method includes the following steps:
Step S201: To obtain an XGBoost prediction value, perform the XGBoost regression algorithm using air pollution data.
Step S202: To obtain an LSTM value, perform the LSTM algorithm using air pollution data.
Step S203: Combine the air pollution data with the XGBoost and LSTM prediction values to produce air pollution combination data.