Insights from food delivery data: 100% inflation rate in 18 months
As with my other major data analysis/science projects, this one has grown over the last couple of years. As my approach changed, the product took on different and more detailed forms (for the better, I would say). But the main reason I am interested in food data is still the same: there are fundamental differences in how people from different backgrounds make eating decisions on a daily basis. After all, we are what we eat.
Or, is it the other way around?
I am not only talking about religion-driven or simple income-induced differences. Even eating can get political under particular conditions, and certain edible items can and will be perceived as signals of association with a social group, as these signals usually become even more salient in polarized social settings (DellaPosta, Shi, and Macy 2015). As I continue exploring food industries in various locations (next up: Berlin), I took a detour to revisit Turkey’s leading but increasingly infamous food delivery leviathan: yemeksepeti.com.
A year and a half ago, for a tweet of mine that became somewhat of a hit (linked just below), I had tapped into the aforementioned delivery service for the first time. Scraping the data off it was rather straightforward: crawling JavaScript-based content was not strictly necessary unless you wanted access to restaurants that were not open for delivery at that time of the day. This required (and still does) activating a simple check-box that triggers a JavaScript event listing the restaurants regardless of their state. I did not particularly like the possibility that some restaurants would not get picked up by the rvest script, but I still went along with scraping the data during the peak hours because I did not have much free time then. Fast forward a couple of days, I had curated a data set that featured the menus of a sizable portion of the restaurants in Istanbul that partnered up with yemeksepeti, and the list of neighborhoods these restaurants served.
Speaking of web-scraping, here is the GitHub repository for this most recent version of the project. I will try to make all my projects available unless I suspect that publishing the code and/or data might violate terms of service. Anyway, back to the task at hand: this time around, I used Selenium in Python to collect the links to each restaurant and rvest in R to collect the menus.
Alright, back to September 2020. If you are interested (which you probably are, if you are here reading this post), just take a look at the tweet. The main idea is that there are considerable differences in average Lahmacun prices (a popular dough, spice and meat-based food, sometimes woefully referred to as the “Turkish Pizza”) across different districts in Istanbul. Naturally, the richer and more attractive the district, the higher the prices… ostensibly. But is it that simple? There are numerous deviations from what you would expect based on that intuition and we might want to investigate this further. So, one of my motivations was to improve the script and delve further into the dynamics that surround the pricing of Lahmacun and other practical foods.
Efe Başlar (@baslare), September 6, 2020: “Average lahmacun prices in Istanbul, district by district. While Esenler vies for the top spot in cheapness, Beşiktaş is once again a strong contender for the price championship.” pic.twitter.com/z8Na0l8LWB
In addition to my usual rationale for getting my hands dirty around the inspect functionality of my Firefox browser, I had to satiate my curiosity in one additional aspect. If you are keeping track of the news, you might have heard of inflation soaring in Turkey, and if you are a little perceptive, you might also have noticed that the official figures for annual inflation are met with stark suspicion, well, for a plethora of reasons. So, for taking a peek at what “true” inflation looks like, there is hardly a better way than utilizing a service that Istanbulites have come to rely upon even more heavily in the Covid-19 world. Alright, without further ado, let us move on to the punchline.
OK. Take a look at the map above. Move your mouse cursor over the districts for more information, if you’d like. Take a look at the previous iteration of the same map. Yup. It is correct. The prices have increased around 100% and that happened within a span of 18 months. Let that sink in. And no, I have not changed my methodology greatly, barring some improvements for more accurate estimation. I’ll be detailing these improvements below but it is pretty similar to the one I had then. Overall, inflation seems to be running rampant in the food industry and the prices seem to have inflated equally across districts: around a whopping 100%! The figures are based on 2198 yemeksepeti partner restaurants with Lahmacun on their menu (those that satisfy some qualifications I lay out below).
The districts with a sizable portion of secular, upper-middle-class residents (read: economically better off) seem to be the ones in which you can have the priciest lahmacuns. But the number of restaurants serving each district is not necessarily proportional to the population residing within that district. It shouldn’t come as a surprise when I tell you that it is mostly the younger inhabitants, especially university students, who use food delivery services.
Now, let us dive into some details about how this plot was created. Creating the plots was definitely not as straightforward as the maps themselves seem. I mentioned just above that I had made some improvements in my approach. Improvements seldom come without any further complexity and this is one of those cases (not always, of course).
yemeksepeti lists 961 precinct-equivalent divisions for its partner network in Istanbul. This is deceptively similar to the true number of precincts in Istanbul, which stands at 964. However, some precincts are arbitrarily divided by yemeksepeti for operational reasons, as precincts are hardly homogeneous across different districts in Istanbul and not necessarily equally reachable: they range from a few hundred inhabitants to a hundred thousand.
Precincts constitute the smallest possible Turkish administrative unit, as per the latest regulations. Districts, depicted on the maps in this post, house a number of precincts. Basic math tells us that for each of the 39 districts in Istanbul we can expect to observe around 25 precincts (964 / 39 ≈ 24.7). The way yemeksepeti stores its data doesn’t tell much about where a restaurant is located. It would have been amazing to have access to geospatial data, but a web-scraper must live with what he can get his hands on.
Even knowing the precise location of each restaurant would not change a simple fact: restaurant deliveries are trans-precinct and most of the time trans-district. Each precinct is (literally) fed by a number of restaurants located in and around itself, and some precincts are served by more restaurants than others. Even though I was aware of this simple fact when I first delved into yemeksepeti data, I had only taken a simple average for each district, without making any effort to create a more representative statistic under this particular structure of the data.
In addition, programmatically searching each restaurant’s menu for the product of interest (“Lahmacun”, in this particular case) has its own problems. If your goal is to filter everything with “Lahmacun” in its name, then you are in for a treat, because you are going to get every form of seemingly relevant item in the results. The solution is to also specify keywords or other elements that you want to exclude. Since I wanted to focus as much as possible on the singular lahmacun, I employed that solution. You can refer to my GitHub repository if you are interested in how I filtered out the “lahmacuns of interest”.
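A minimal Python sketch of this kind of include/exclude filtering follows. The exclusion keywords and the menu items are invented for illustration; they are not the lists used in the repository, and the function name is hypothetical.

```python
# Hypothetical exclusion list: words suggesting an item is not a plain, single lahmacun.
EXCLUDE_KEYWORDS = ("menü", "dürüm", "ayran", "adet", "mini")


def is_plain_lahmacun(item_name, include="lahmacun", exclude=EXCLUDE_KEYWORDS):
    """Return True if the item mentions the product and none of the excluded keywords."""
    name = item_name.lower()
    return include in name and not any(word in name for word in exclude)


# Made-up menu items, not scraped data.
sample_menu = [
    "Lahmacun",
    "Acılı Lahmacun",
    "Lahmacun Menü (2 Adet + Ayran)",
    "Lahmacun Dürüm",
    "Adana Kebap",
]

plain_items = [item for item in sample_menu if is_plain_lahmacun(item)]
print(plain_items)
```

Substring matching like this is crude: it depends heavily on how consistently restaurants name their items, so the exclusion list usually needs a few rounds of manual inspection.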
Furthermore, most of the partner restaurants have set a minimum threshold on the total amount of the order in TL, conditional on the precinct they serve. The customer therefore must surpass that threshold in order to guarantee delivery for the order. Although I sincerely doubt that these values are issued with any meticulous calculation, the underlying rationale probably draws from a simple and rational mechanism: cost-benefit trade-off.
If a restaurant sets too high a threshold, it could mean that they regard that particular customer as too far away and not worth sending out a rider for, or that the restaurant is too fancy for a small delivery. There are some ridiculous combinations, e.g. a Lahmacun restaurant requiring a minimum 1000 TL delivery amount for some precincts (since you have studied the map above, you know you would need to order around 50 Lahmacuns on average to qualify for such a delivery), or a fancy restaurant offering its gourmet Lahmacun for around 90 TL.
It is obvious that a person looking for a regular Lahmacun would hardly consider those kinds of restaurants to be viable alternatives for his lovely lahmacun-to-be. But, in theory, that particular restaurant is still among the restaurants that serve that specific precinct and should somehow be incorporated into the statistic. My solution is to apply a weighted average based on this delivery threshold (you can see a sample of restaurants with different thresholds in the screenshot above).
Let \(p_i\) denote the weighted average price across the restaurants serving precinct \(i\), let \(t_{ij}\) be the threshold that restaurant \(j\) sets for serving precinct \(i\), and let \(p_{j}\) be the price of the product at restaurant \(j\), with \(J\) the set of restaurants serving precinct \(i\). To my knowledge the restaurants do not engage in variable pricing, so the prices are invariant to \(i\). In other words, the price at each restaurant is weighted by the inverse of the threshold \(t_{ij}\). Some restaurants have a threshold of 0, so I just replaced those values with 10. In addition, because the number of restaurants that serve each precinct can differ, it is worthwhile to keep track of that number. I used the restaurant counts as weights for each precinct when calculating the weighted average for each district.
\[p_i = \frac{\sum_{j \in J} p_{j}/t_{ij}}{\sum_{j \in J} 1/t_{ij}}\]
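The two-step weighting described above can be sketched in a few lines of Python. The prices and thresholds are made up, and the function names are hypothetical; only the replacement of zero thresholds with 10 and the use of restaurant counts as district-level weights come from the text.

```python
ZERO_THRESHOLD_REPLACEMENT = 10  # thresholds of 0 TL are replaced with 10, as described in the text


def precinct_price(offers):
    """Inverse-threshold weighted average price for one precinct.

    offers: list of (price, threshold) pairs, one per restaurant serving the precinct.
    """
    weights = [1 / (t if t > 0 else ZERO_THRESHOLD_REPLACEMENT) for _, t in offers]
    weighted_sum = sum(price * w for (price, _), w in zip(offers, weights))
    return weighted_sum / sum(weights)


def district_price(precincts):
    """Average of precinct prices, weighted by the number of restaurants serving each precinct."""
    total_restaurants = sum(len(offers) for offers in precincts)
    weighted = sum(precinct_price(offers) * len(offers) for offers in precincts)
    return weighted / total_restaurants


# Made-up (price in TL, minimum order threshold in TL) pairs.
precinct_a = [(15.0, 20), (18.0, 40), (90.0, 1000)]  # includes a "fancy" outlier
precinct_b = [(14.0, 0), (16.0, 10)]                 # one restaurant with no threshold

print(round(precinct_price(precinct_a), 2))
print(round(precinct_price(precinct_b), 2))
print(round(district_price([precinct_a, precinct_b]), 2))
```

In this toy example the 90 TL lahmacun with a 1000 TL threshold barely moves the precinct average, whereas a simple mean over `precinct_a` would be 41 TL.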
Assuming that everything went according to plan, we get a list of all the precinct-equivalent units at yemeksepeti, with all the desired statistics. Thankfully, the URL of each precinct-equivalent unit contains the name of the district it belongs to. With a bit of regular-expression work, this lets us extract the name of each Istanbul district. After doing all the processing, we get a list of the 39 districts and the corresponding weighted-average lahmacun prices. Yep. It was that simple.
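As an illustration of the regular-expression step, here is a small Python sketch. The URL shape is an assumption made for the example (the district slug followed by a hyphen and the precinct slug, on a placeholder domain); the real URLs may be structured differently.

```python
import re

# Hypothetical URL shape: .../istanbul/<district>-<precinct-slug>
AREA_URL_PATTERN = re.compile(r"/istanbul/(?P<district>[a-z]+)-(?P<precinct>[a-z0-9-]+)$")


def district_from_url(url):
    """Return the district slug from an area URL, or None if the URL does not match."""
    match = AREA_URL_PATTERN.search(url)
    return match.group("district") if match else None


# Placeholder URLs on example.com, not real links.
sample_urls = [
    "https://www.example.com/istanbul/besiktas-levent-mah",
    "https://www.example.com/istanbul/kadikoy-moda-caferaga-mah",
    "https://www.example.com/istanbul",
]

districts = [district_from_url(url) for url in sample_urls]
print(districts)
```

A pattern like this assumes district slugs never contain a hyphen; if they do, matching against a known list of the 39 district names would be more robust.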
Next up, even though I cannot provide any benchmark for the severity of the consumer inflation of other food items, you can find below maps and charts for Hamburgers and Beef/Chicken Döners. In the section that follows, I’ll talk a bit about a couple of regression models to dig deeper into precinct-based, rather than district-based, analyses.
While you are still here take a look at the histogram below of Lahmacun restaurant prices in the entire data set. The average price is around 16.30 TL, which is slightly over 1 euro.
I’ll refrain from commenting in much detail on the plots below, as they follow the same principles I laid out above and I believe they can speak for themselves. These may not make much sense without a benchmark, but the price patterns observed for Lahmacun seem to persist across other deliverable food. The table below summarizes the number of restaurants that I used for the calculation of the weighted average prices.
Product | n |
---|---|
Lahmacun | 2198 |
Burger | 2799 |
Döner | 1805 |
Tavuk Döner | 1569 |
Istanbul has a curious penchant for making tasty Burgers, and this is evidenced by Burger having the highest number of restaurants among the four food articles I selected. I know for sure that university students are particularly fond of eating burgers, and burger restaurants are ubiquitous in the more well-off Beşiktaş and Kadıköy districts. Hence the higher prices. Working-class majority districts including Esenler and Sultanbeyli, and the Istanbul “countryside” Çatalca and Şile, almost always feature the least expensive alternatives, regardless of the food in question.
The Turkish döner usually comes in two variants: beef and chicken, with the former being the more expensive alternative. The latter has its own popularity mainly thanks to its accessibility. I enjoy both equally but at the same time believe that there are certain circumstances in which one of them should trump the other. Both seem to follow similar patterns, as most of the time they are offered by the same restaurant (although some restaurants are chicken-exclusive).
I would now like to turn to the question I posed in the beginning: what are the factors that contribute to the pricing of various food items? I brought in two different sources of data to create covariates to explain the variance in price. First, I utilized the API of the Istanbul Metropolitan Municipality for its (rather outdated) open data source. The data I collected gives us access to precinct-level demographic records. Those that are relevant for us are the precinct population and its breakdown into various age-sex intervals, and the share of university graduates among the residents of that precinct. Second, I brought in the results of the most recent election in Istanbul: the 2019 rerun of the mayoral election, at the precinct level as well.
We are now set to see if voting behavior and certain demographics are in any way related to the variation in prices of each of the four food articles we talked about. Since I would love to evade multicollinearity as much as possible, I tried to “merge” the information from correlated and relevant variables into one variable. The model specification is as follows:
\[\small price_i = \beta_0 + \beta_1 rating_i + \beta_2 \log(pop_i) + \beta_3 count_i + \beta_4 AKP\%\_CHP\%\_diff_i + \beta_5 youth\%\_diff_i \] where the dependent variable \(price_i\) is the average product price in restaurants serving the precinct \(i\), explained by the (weighted) average rating of the restaurants serving the precinct, the logarithm of the precinct population, the number of restaurants serving that product, the difference in vote shares of AKP (Erdoğan’s party) and CHP (the main opposition: social democrats), and the difference between the ratios of males and females aged 15 to 39. As far as I could tell, there are no obvious endogeneity problems and it is relatively safe to use OLS regression here.
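For readers who want to see the mechanics, the specification above can be estimated with plain OLS. The Python sketch below solves the normal equations by hand so that it needs no external libraries; it is not the code behind the results in this post, and every number in `toy_precincts` is made up.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients (intercept first) from the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up precincts:
# (rating, population, restaurant count, AKP-CHP vote share diff, youth diff, average price)
toy_precincts = [
    (8.1, 12000, 40, -0.30, 0.010, 17.5),
    (7.5, 45000, 25, 0.20, -0.020, 14.0),
    (8.8, 3000, 60, -0.10, 0.030, 19.0),
    (7.9, 80000, 15, 0.35, 0.005, 13.5),
    (8.4, 20000, 55, 0.10, -0.010, 16.5),
    (7.2, 60000, 30, -0.20, 0.020, 14.5),
    (9.0, 9000, 20, -0.45, -0.015, 18.5),
    (8.0, 30000, 45, 0.05, 0.000, 15.5),
]

rows = [(r, math.log(pop), n, vote, youth) for r, pop, n, vote, youth, _ in toy_precincts]
prices = [p[-1] for p in toy_precincts]
coefficients = ols(rows, prices)

names = ["Constant", "weighted_rating", "log(pop)", "rest_count", "akp_chp_diff", "youth_diff"]
for name, beta in zip(names, coefficients):
    print(f"{name}: {beta:.3f}")
```

In practice a statistics package would be the better choice, since it also reports standard errors and handles ill-conditioned designs more gracefully than the normal equations do.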
Dependent variable: Average Price

| | Lahmacun (1) | Burger (2) | Döner (3) | Tavuk Döner (4) |
|---|---|---|---|---|
| youth_diff | 0.915 (0.840) | 3.440 (2.429) | 0.607 (1.971) | -0.963 (1.650) |
| log(pop) | -0.578*** (0.055) | -1.317*** (0.147) | -1.180*** (0.131) | -0.583*** (0.112) |
| weighted_rating | 2.424*** (0.283) | 6.140*** (0.660) | 10.462*** (0.564) | 4.926*** (0.455) |
| akp_chp_diff | -1.445*** (0.199) | -1.680*** (0.602) | -3.137*** (0.464) | -3.604*** (0.349) |
| rest_count | 0.030*** (0.003) | 0.066*** (0.005) | 0.020** (0.008) | 0.022*** (0.008) |
| Constant | -0.447 (2.447) | 2.421 (5.929) | -41.053*** (4.912) | -11.549*** (3.839) |
| Observations | 780 | 779 | 779 | 779 |
| R² | 0.464 | 0.489 | 0.521 | 0.353 |
| Adjusted R² | 0.461 | 0.486 | 0.518 | 0.348 |
| Residual Std. Error | 1.399 (df = 774) | 4.051 (df = 773) | 3.286 (df = 773) | 2.753 (df = 773) |
| F Statistic | 134.027*** (df = 5; 774) | 148.001*** (df = 5; 773) | 167.914*** (df = 5; 773) | 84.221*** (df = 5; 773) |

Note: *p<0.1; **p<0.05; ***p<0.01. Standard errors in parentheses.
Around 200 precincts got filtered out while merging the data sources, largely because some precincts are not served by yemeksepeti (mostly the villages in the countryside), have fewer than 500 residents, or have ambiguous naming (by yemeksepeti) that left me unable to tell which precinct was meant. The explanatory power of the models is moderate at best; regardless, there are some interesting insights to be gained from the results.
The more populated a precinct is, the less expensive the product.
Even though it does not seem to be a major effect and it does not seem to be a proxy for the demand for any of the food articles selected, (over)population seems to be negatively related to price. This is expected because, as in any metropolis, the average income in Istanbul is usually considerably lower in neighborhoods with a larger population (this has some exceptions: some precincts in the (geographic) periphery of the city have become cluttered with high-rise housing projects in the last decade). The effects seem rather small at first: the coefficients imply a price decrease of roughly 0.05 to 0.13 TL for a 10% increase in population, depending on the product. But if you recall that the precinct population in Istanbul has a huge range, this surely gains more weight. In addition, the average precinct price seems to rise slightly with each additional restaurant serving that particular precinct.
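Because population enters the model in logs while price is in levels, the implied price change for a given percentage change in population is the coefficient times the log of the growth factor. The short Python sketch below works this out from the log(pop) coefficients in the regression table; the function name is hypothetical.

```python
import math

# Coefficients on log(pop), copied from the regression table in this post.
LOG_POP_COEFFICIENTS = {
    "Lahmacun": -0.578,
    "Burger": -1.317,
    "Döner": -1.180,
    "Tavuk Döner": -0.583,
}


def price_change(beta, pct_increase):
    """Price change (TL) implied by a level-log model when population grows by pct_increase percent."""
    return beta * math.log(1 + pct_increase / 100)


for product, beta in LOG_POP_COEFFICIENTS.items():
    print(f"{product}: {price_change(beta, 10):+.3f} TL for a 10% larger population")
```

The same function shows why the range matters: going from a precinct of 500 residents to one of 100,000 is a 200-fold increase, so `price_change(-0.578, 19900)` comes to about -3 TL for Lahmacun, all else equal.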
People are fine with paying more for better döner and burgers, but especially for döner.
Assuming that the average restaurant rating is a somewhat acceptable proxy for the quality of the output, we can see that Istanbulites are fairly OK with paying more if it gets them good döner. This could in fact be related to some döner restaurants having a well-established reputation. It gets even more interesting when juxtaposing the döner and burger results: on average burgers seem to be 20-30% more expensive than beef döner, but restaurant ratings seem to be 60-70% more salient in their relation to prices for döner. For beef döner, a 10 TL increase in the price seems to be justified as long as it accompanies an increase of 1 in the restaurant rating.
The more a precinct favors AKP over CHP, the lower the prices overall.
This is also an expected result, as it is well established that AKP thrives in lower-income environments whereas CHP voting is associated with high-income, high-education masses. There are also numerous options for having premium döner in CHP-majority quarters, and both beef and chicken döner seem to be more sensitive to differences in AKP/CHP vote shares compared to Burger and Lahmacun. But, as discussed, the AKP-CHP divide is closely tied to the economic divide, so this does not necessarily translate to politically driven food preferences.
This was a fun and at times tedious exercise (especially when dealing with inconsistent restaurant menus and precinct names) using a personally curated data set tapping into three different sources: a delivery service, precinct-level demographics data, and election results. You must surely have noticed that it is still not enough to draw robust inferences and that we require more information on consumer behavior to come up with more interesting results. Most prominently, precinct-level average income data could help in disentangling political preferences from income-induced preferences. I would also have loved to carry out some geospatial analysis if I had access to such data.