Carbon “Food-Print”: Predicting the Climate-Score of fresh produce

TechLabs Düsseldorf
Apr 13, 2022

This project was carried out as part of the TechLabs “Digital Shaper Program” in Düsseldorf (Winter Term 2021/22).

Abstract:

In today’s globalized world, the fruits and vegetables in a supermarket come from many continents. Although sustainability is getting more and more attention, there is no easy way to compare groceries by their effect on the climate, even though the global food supply chain offers considerable leverage for reducing CO2 emissions. The aim of this project is to create and predict a Climate-Score ranking for various types of produce, allowing consumers to make purchasing decisions that are better for the climate.

Our project:

The Climate-Score is heavily inspired by the Nutri-Score, a nutritional labelling system often printed on the front of food packaging. The Nutri-Score is a five-step scale from A to E that indicates the overall nutritional value of a product. As with the Nutri-Score, our goal is to give consumers additional information on a product in a very easy and intuitive way. We therefore adopted the same scale, with A meaning a very environmentally friendly product and E a very environmentally harmful one.

In the first step of the project, it was essential to determine which factors in the value chain have the biggest effect on total CO2 emissions. We simplified the supply chain into three phases: production (agriculture), processing (farm management) and distribution (transport). This simplification of the global food supply chain was needed because free and detailed data sets on the climate effects of different types of food are hard to come by. Building on this simplified chain, our team thought through possible scenarios for how CO2 emissions might differ across types of produce. More specifically, we considered how the origin country, the agriculture style (greenhouse vs. organic) and the seasonality of different fruits and vegetables might influence the calculated CO2 values.

We managed to find data on the emissions created while growing the produce on farms. For produce grown in greenhouses we added 2.5 kg of CO2 emissions to the total, while for organic fruits and vegetables we subtracted 15% from the total. It is important to mention that we assumed greenhouse cultivation whenever a fruit or vegetable is not currently in season. To calculate the emissions of the transportation phase, we used GPS data of harbors and airports around the world. We converted the GPS coordinates into distances in km with the Python package ‘geopy.distance’. Multiplying these distances by the estimated emissions of sea and air freight gives the emissions for a given mode of transport. These calculations gave us all the data needed to build our own data set.
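The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the freight emission factors are hypothetical placeholders, the great-circle (haversine) formula is a standard-library stand-in for ‘geopy.distance’, and the order in which the greenhouse surcharge and the organic discount are applied is an assumption.

```python
from math import asin, cos, radians, sin, sqrt

GREENHOUSE_SURCHARGE_KG = 2.5  # from the article: added for greenhouse produce
ORGANIC_DISCOUNT = 0.15        # from the article: organic produce subtracts 15%

# Hypothetical emission factors (kg CO2 per km); the article does not state its values.
FREIGHT_KG_PER_KM = {"sea": 0.00001, "air": 0.0005}

EARTH_RADIUS_KM = 6371.0


def great_circle_km(origin, destination):
    """Haversine distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, origin)
    lat2, lon2 = map(radians, destination)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def total_emissions_kg(farm_kg, origin, destination, mode, in_season, organic):
    """Combine farm, greenhouse and transport emissions for one scenario."""
    total = farm_kg
    if not in_season:
        # The article assumes greenhouse cultivation for out-of-season produce.
        total += GREENHOUSE_SURCHARGE_KG
    total += great_circle_km(origin, destination) * FREIGHT_KG_PER_KM[mode]
    if organic:
        # Assumption: the 15% discount applies to the running total.
        total *= 1 - ORGANIC_DISCOUNT
    return total


# Example: out-of-season produce shipped by sea from Cape Town to Hamburg
# (approximate coordinates, hypothetical farm emissions of 0.4 kg).
cape_town, hamburg = (-33.92, 18.42), (53.55, 9.99)
print(total_emissions_kg(0.4, cape_town, hamburg, "sea", in_season=False, organic=False))
```

Keeping the surcharge, discount and freight factors as named constants makes it easy to swap in better estimates later without touching the calculation itself.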

For the data set we had to generate each possible scenario for each type of fruit and vegetable. With up to 237 origin countries, the data size grew quickly. Our final data set had a total of 30,933 observation rows. Because we calculated the total CO2 emissions ourselves, we had free rein in deciding how the Climate-Score classification is distributed. We sorted the calculated CO2 scores into 5 approximately equal bins, labelled A to E from the lowest to the highest values.
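One way to produce five approximately equal bins is to cut the values at their quintiles. The sketch below uses only the Python standard library and is an assumption about how such a binning could be done, not the project's own implementation; the function name is hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles

LABELS = "ABCDE"  # A = lowest emissions, E = highest


def climate_scores(emissions):
    """Assign each emission value a label A-E based on quintile cut points."""
    # Four cut points that split the values into five roughly equal groups.
    cuts = quantiles(emissions, n=5, method="inclusive")
    # bisect_left counts the cut points strictly below a value,
    # so a value equal to a cut point falls into the lower bin.
    return [LABELS[bisect_left(cuts, value)] for value in emissions]


print(climate_scores([0.3, 1.2, 4.8, 0.9, 2.5, 7.1, 3.3, 0.5, 5.6, 1.8]))
```

Quantile-based bins keep the five classes balanced, which is convenient for training a classifier, but the class boundaries then depend on the data set rather than on fixed emission thresholds.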

With the finished data set we were able to start our data modeling. It is important to mention that we did not include every column of the data set in our prediction models: the Climate-Score is calculated directly from some of them, and the algorithms would pick up on that connection very quickly. We thus defined the following columns as the input for different supervised learning estimators, using Python as our main programming language:

As is evident in the table, we mostly use categorical data and only include the calculated transport data as numeric values. To accurately predict the Climate-Score of fruits and vegetables, we focused on supervised learning models: Decision Tree Regressor, Random Forest Classifier, Support Vector Classifier, Stochastic Gradient Descent Classifier and Gaussian Naive Bayes. After fitting the models and generating predictions, we obtained the following accuracy scores:
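Most estimators expect numeric inputs, so categorical columns are typically encoded before fitting. The sketch below shows plain one-hot encoding with the standard library; the column names are hypothetical, and the article does not say which encoding the project actually used.

```python
def one_hot_encode(rows, categorical, numeric):
    """Turn a list of dict rows into a header and a numeric feature matrix."""
    # Sort the categories per column so the output column order is stable.
    categories = {col: sorted({row[col] for row in rows}) for col in categorical}
    header = [f"{col}={value}" for col in categorical for value in categories[col]]
    header += list(numeric)

    matrix = []
    for row in rows:
        encoded = [1.0 if row[col] == value else 0.0
                   for col in categorical for value in categories[col]]
        encoded += [float(row[col]) for col in numeric]  # numeric columns pass through
        matrix.append(encoded)
    return header, matrix


# Hypothetical rows; the real column names come from the project's data set.
rows = [
    {"origin": "Spain", "season": "in", "transport_km": 1500},
    {"origin": "Chile", "season": "out", "transport_km": 12000},
]
header, matrix = one_hot_encode(rows, ["origin", "season"], ["transport_km"])
print(header)
print(matrix)
```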

Using Randomized Search and Grid Search for hyperparameter tuning, we were able to raise the accuracy of the Random Forest Classifier to 92%. The following confusion matrix portrays the result of the final model. It shows that the Climate-Score can be predicted with high accuracy from mostly categorical inputs. This finding highlights that you don’t need exact numeric values for every supply chain step to predict climate effects precisely, especially when the goal is to inform consumers quickly and effectively.
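For readers unfamiliar with the metric, the sketch below shows how a confusion matrix and the accuracy derived from it can be computed for the five Climate-Score classes. The labels in the example are made up for illustration and are not the project's results.

```python
LABELS = "ABCDE"


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are the true classes, columns the predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(actual, predicted):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal, i.e. predicted class equals true class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(map(sum, matrix))
    return correct / total


# Made-up example labels, not results from the project.
actual = list("AABCDE")
predicted = list("ABBCDE")
matrix = confusion_matrix(actual, predicted)
for label, row in zip(LABELS, matrix):
    print(label, row)
print(accuracy(matrix))
```

Because the classes are ordered, the off-diagonal cells are also informative: a misclassification into a neighbouring class (A predicted as B) is less misleading for a consumer than one several steps away.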

You can find our project on GitHub.

GitHub repository: https://github.com/dalys100/Carbon-Foodprint

The Team:

Daria Lysenko: Data Science (Python)

Vivienne Simunec: Data Science (Python)

Sebastian Leszinski: Data Science (Python)
