Opportunities and Challenges in using machine learning to model pollen data

Publication date: 09-02-2024, Read time: 3 min

Many types of pollen models simulate pollen emission, atmospheric transport of pollen, or both, to predict the amount of pollen at a given time and location. Because pollen can be transported over large areas, we need large-scale models that capture both local and transported pollen, and these require large datasets. To understand how climate change will affect pollen and pollen allergies, we also need to predict future scenarios, which can be done by coupling pollen models with climate prediction models. Machine learning (ML) models are especially interesting for this purpose.

Opportunities

1. Investigation of multiple variables. ML allows us to investigate the impact of many environmental variables on pollen emissions. One may think that temperature is only a single variable and that the correlation between pollen emission and temperature is easy to determine. However, many other variables can be derived from a single variable like temperature, such as the number of warm days and the daily minimum, maximum, and mean temperature. The same applies to rainfall, humidity and wind. ML models can assess how relevant each of those variables is for modelling pollen occurrence.

2. Handling non-linear data. ML also allows us to capture non-linear and complex relationships. Many statistical models assume linear relationships between variables. Machine learning algorithms can approximate a much wider range of functions, which makes it possible to infer complex, non-linear relationships from the data.

3. Real-time monitoring and forecasting. In real-time forecasting, we often see changing patterns. Data streams are often dynamic and subject to fluctuations. Machine learning models can adjust their predictions when new data becomes available.

4. Access to large data sets. Long time series of daily pollen counts per species are available at multiple locations. In addition, we now have access to a plethora of spatial and spatio-temporal environmental datasets. These relatively large and complex datasets can be analysed together via machine learning to model the impact of environmental conditions on pollen release.

5. Adjusting to changing conditions. By training machine learning algorithms on both long-term and short-term historical data, and checking which approach leads to more accurate predictions, we can create models that adjust to changing conditions.
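
The idea in point 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's method: the pollen series is synthetic, a simple "mean of the last N days" forecast stands in for a trained ML model, and the names `window_forecast_error` and `candidates` are made up for this example. The sketch compares a long and a short history window by their mean absolute error on the most recent days and keeps the better one.

```python
from statistics import mean


def window_forecast_error(series, window, n_eval):
    """Mean absolute error of a 'mean of the last `window` days' forecast,
    evaluated on the final `n_eval` days of `series`."""
    errors = []
    for t in range(len(series) - n_eval, len(series)):
        history = series[max(0, t - window):t]  # only data available before day t
        forecast = mean(history)
        errors.append(abs(series[t] - forecast))
    return mean(errors)


# Synthetic daily pollen counts (made-up numbers): a stable period
# followed by a shift to a higher level, mimicking changing conditions.
series = [20.0] * 60 + [50.0] * 30

# Hypothetical candidates: a long and a short training history, in days.
candidates = {"long": 60, "short": 7}
errors = {
    name: window_forecast_error(series, window, n_eval=14)
    for name, window in candidates.items()
}

# Keep whichever history length predicted the recent days best.
best = min(errors, key=errors.get)
print(errors, "-> best window:", best)
```

In this synthetic case the short window wins, because the long window still averages in the older, lower counts. With real data the outcome can go either way, which is why the comparison has to be checked rather than assumed.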

Challenges

1. Lack of data points. In space, we have only a few data points, because pollen is counted by hand. Counting is difficult because both species and volume have to be determined. Due to the limited availability of pollen counting stations, our data might not capture the full spatial variability in pollen distribution.

2. Lack of spatial awareness. Most machine learning models are not inherently spatially aware. Space is not a factor in these algorithms unless location is supplied explicitly as an input. The same is true of time: most ML models are not natively equipped to model time series data.
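
A common workaround for this second challenge is to hand the model space and time as ordinary input columns. The sketch below is a minimal illustration, assuming one station with a daily series: it builds rows containing the station coordinates, the day of year, the pollen counts of the previous days, and the number of warm days among those previous days. The function name `build_feature_rows`, the 15 °C warm-day threshold, the coordinates and all data values are assumptions made up for this example.

```python
from datetime import date, timedelta


def build_feature_rows(start, counts, temps, lat, lon, n_lags=3, warm_threshold=15.0):
    """Turn one station's daily series into (features, target) rows.

    Each row describes day t with the station coordinates, the day of year,
    the previous `n_lags` pollen counts, and the number of warm days among
    those same previous days. The target is the pollen count on day t.
    """
    rows = []
    for t in range(n_lags, len(counts)):
        day = start + timedelta(days=t)
        features = {
            "lat": lat,  # spatial context
            "lon": lon,
            "day_of_year": day.timetuple().tm_yday,  # seasonal context
            "warm_days": sum(1 for x in temps[t - n_lags:t] if x >= warm_threshold),
        }
        for k in range(1, n_lags + 1):
            features[f"count_lag_{k}"] = counts[t - k]  # temporal context
        rows.append((features, counts[t]))
    return rows


# Made-up example data for a single hypothetical station.
counts = [5, 8, 13, 21, 18]            # daily pollen counts
temps = [12.0, 16.5, 17.0, 14.0, 19.0]  # daily mean temperature in °C
rows = build_feature_rows(date(2023, 4, 1), counts, temps, lat=52.0, lon=5.0)

first_features, first_target = rows[0]
print(first_features, first_target)
```

Rows like these can be passed to any tabular ML model. The model itself remains unaware of space and time; it only sees the extra columns, so the choice of lags and spatial inputs still has to be made by the modeller.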

To learn more about the project, visit our project page, and check out these links:

Tree compass

Tree compass helps create a healthy living environment for hayfever patients

Be cautious planting trees that may cause hayfever