Bargaining with the algorithm: Shipt shopper pay

Summary

An exception to the gig economy’s algorithmic rule, Shipt launched its platform using a transparent (though perhaps underpaying) system for paying its workers: $5 per shop, plus a 7.5% commission on the order amount. But, like other companies, Shipt is engaging in a strategic bait-and-switch. Starting in early 2020, Shipt began paying some workers using a new black-box algorithm that workers call “V2”. On September 16th, and again on September 30th, Shipt finalized the rollout of this new algorithm, falling in line with other gig-economy corporations that have used black-box pay systems for years.

To understand how this new payout model will impact workers, we’ve partnered with Coworker.org to build a system to aggregate data on worker payouts from the Shipt platform. Over the summer and into the fall, 94 Shipt workers have donated their data to the project, totaling 3475 submissions since August 20. Each of these “shops” contains information on the total cost of an order, its date, the worker’s total pay, the tip, any promo pay involved, and whether the order was delivered late. The focus of this writeup is to explore this data with a statistical lens, and to estimate how and where Shipt’s new payment algorithm impacts workers.

Our main findings are:

  • 41% of the shoppers we’ve collected data from are earning less under the new V2 payment scheme.
  • Those shoppers that are making less under the new V2 scheme are making 11% less on average per-shop.
  • The number of shoppers earning less under V2 is growing, with 60% of shoppers having earned less from V2 in the past week as of this report (Oct. 7th).
  • However, across all shops and workers, the V2 pay scheme does pay more: about $0.99 on average, per-shop.

Overall, these findings lead us to believe that any earnings increases under the V2 algorithm are not being distributed evenly. Some shoppers are consistently making less than others under this new regime. Without more transparency from Shipt, or broader worker-driven research like this project, why this discrepancy exists will remain a mystery.

Many shoppers are earning less under V2

Under V2, shoppers are paid under a black-box rule with no transparency. Fortunately, we have a baseline to compare their pay against. Because we can compute exactly what a worker would have been paid under the V1 formula, we don’t need to rely on complex statistical tests – we can say, with certainty, what a shopper would have earned if Shipt had never changed its algorithm (we do, however, need to make sure the data we use is sound, which we did – see the section on data & methods if you’re curious).
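Because the V1 formula is public ($5 per shop plus 7.5% of the order amount, per the summary above), the counterfactual comparison can be sketched in a few lines. The function and field names here are illustrative, not the project’s actual code:

```python
# Counterfactual V1 pay: Shipt's original formula was a $5 base
# plus a 7.5% commission on the order total.

def v1_pay(order_total: float) -> float:
    """What a shop would have paid under the transparent V1 formula."""
    return 5.00 + 0.075 * order_total

def v2_delta(order_total: float, v2_order_pay: float) -> float:
    """Positive when V2 paid more than V1 would have for the same shop."""
    return v2_order_pay - v1_pay(order_total)

# Example: a $120 order that V2 paid $12.50 for.
# V1 would have paid 5 + 0.075 * 120 = $14.00, so V2 underpays by $1.50.
print(round(v2_delta(120.00, 12.50), 2))  # -1.5
```

Averaging `v2_delta` over each shopper’s V2 shops gives that shopper’s per-shop gain or loss relative to the old scheme.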

The left plot below shows a box for each shopper in our final dataset, colored by how much money they’re making – or losing – on average under the new pay scheme. About 40% of workers are making less per shop than they would under V1 pay. The amounts may not seem like much: a majority of the workers making less are losing between $0.50 and $3 per shop. However, these differences amount to an average 11% pay cut for the workers who are making less.

And there’s another mystery here. Shipt claims that its algorithm estimates the “effort” of each shop. If the algorithm were simply bad at doing this, with some random error, we’d expect the lower payouts to be evenly distributed across workers. But that’s not what we see. Instead, we see a group of workers who are consistently making less under the new algorithm. This suggests that Shipt’s algorithm is systematically underpaying certain workers, which raises the question: which workers?

…and the number of shoppers earning less is growing

In further analysis, we’ve also found that the payouts from the V2 algorithm have decreased over time, with payouts that happened after the Sept. 15th rollout being lower and more clustered together. This has also seemingly resulted in more shoppers being paid less by V2 as time goes on:

But overall, V2 does (technically) pay more

When examined in aggregate, the V2 algorithm seems to pay more for the average shop than V1 would have, though not all shoppers are reaping this benefit. V2 appears to enforce something close to a minimum payout of $10 per shop, and it also has a much tighter distribution – the peak of the blue V2 curve in the plot below is much higher and narrower. This means that pay varies less across shops overall, though payouts for orders with large order totals are generally smaller than they were under V1. It’s puzzling to see a solid 40% of shoppers earning less even while the V2 algorithm pays out more on average across shops. This only makes sense if there is a group of shoppers consistently getting paid less under V2. Rather than a raise for all shoppers, V2 might be better described as a reallocation of payouts: a pay increase for some, but a large cut for others.

Where did the data come from?

All of this data is self-reported shopper data, collected using automatic text recognition on shopper “shop receipts”. Each receipt has a unique order number, a delivery window, the date and time of delivery, the order pay (determined by the algorithm), the tip (determined by the customer), the order total (paid by the customer), and the total pay (order pay + tip).

An example receipt

This data was collected as part of a research collaboration between MIT and Coworker.org.

Appendix: Data, methods, and additional analysis

Collecting data from informally organized workers is a difficult challenge. The dataset we use here isn’t perfect, but it’s the most representative sample we have of earnings from Shipt shoppers to date. The tech we use to parse shop amounts from screenshots sometimes parses out an incorrect number, and we had to decide what date range to use in our final analysis. Some of these decisions and their reasoning are presented below.

Basic filtering: screenshots are hard to parse

We do apply some basic filtering to get our final dataset. Our screenshot reader isn’t perfect, and gets plenty of submissions wrong. For the most part, this means it sometimes drops significant digits and turns, e.g., a $16 shop into a $6 one. To account for this, we remove all shops that were parsed as having an order payout of under $8, which removes only 7.6% of shops. We came to this cutoff by manually inspecting submitted receipts (order pay is very rarely less than $8), and by examining the data, which shows a suspiciously high number of orders with just under $8 in order pay.

We also exclude shops with a payout over $55 before tips, for a similar reason, and any shops with an order total (the amount a customer purchased through Shipt) over $500.
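Taken together, the filters above reduce to three bounds checks. This is a minimal sketch with hypothetical field names, not the project’s actual pipeline:

```python
# Drop shops with an OCR-mangled order pay (< $8), an implausibly
# high order pay (> $55 before tip), or an order total over $500.

def keep_shop(order_pay: float, order_total: float) -> bool:
    return 8.00 <= order_pay <= 55.00 and order_total <= 500.00

shops = [
    {"order_pay": 6.00, "order_total": 160.00},   # likely "$16" misread as "$6"
    {"order_pay": 14.50, "order_total": 120.00},  # plausible, kept
    {"order_pay": 60.00, "order_total": 700.00},  # implausible, dropped
]
clean = [s for s in shops if keep_shop(s["order_pay"], s["order_total"])]
print(len(clean))  # 1
```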

Submission time range and minimum shops per shopper

We’ve been collecting screenshot submissions throughout the summer, but have allowed shoppers to submit shops from any time. This means we have records from even before January of 2020! In our analysis, we include shops from shoppers who have submitted at least 10 shops in total (V1 or V2). The plot below shows the order pay of each individual shop we’ve received from shoppers. The vertical gray line marks the date Shipt rolled out its new V2 algorithm, Sept. 15th. Some shoppers in some areas were paid using the new V2 algorithm far before the 15th – the earliest V2 shop we recorded is from February.
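The minimum-shops rule can be sketched as a simple group-and-count pass. The tuple layout here is illustrative, assuming each submission carries a shopper identifier:

```python
from collections import Counter

# Keep only shoppers who submitted at least 10 shops (V1 or V2).
# `shops` is a list of (shopper_id, order_pay) tuples.

def filter_min_shops(shops, min_shops=10):
    counts = Counter(shopper for shopper, _ in shops)
    return [(s, p) for s, p in shops if counts[s] >= min_shops]

shops = [("a", 12.0)] * 12 + [("b", 11.0)] * 3
print(len(filter_min_shops(shops)))  # 12 — shopper "b" is dropped
```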

Time Period: ensuring a consistent and representative dataset

To calculate how the V2 algorithm has affected worker pay, we also need to define a period of data to use. Ideally, we’d find a period of time with no major changes in payments, and enough submissions to make sure that our data is relatively representative. Aggregating by week, we can see a sharp increase in the number of submissions starting around week 28 – mid-July – when we first launched the tool. We also see the transition to the new payment model on Sept. 15th (the vertical gray line).

To make sure that we can use V2 data from throughout the summer, and not just from after the 15th, we need the summer V2 algorithm to be indistinguishable from the V2 algorithm that ran after the 15th. A quick look at mean order pay over time shows that while V1 order pay has remained relatively stable, V2 pay has varied quite a bit. It’s difficult to know whether this is due to Shipt changing its algorithm over time, or just the result of lower submission rates; but looking at Figure c below, it’s clear that V2 pay before week 30 is higher, and more variable, than after week 30. We therefore use only shops that happened after week 30 in our analysis.
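The week-30 cutoff can be applied with the standard library’s ISO week numbering, which matches the weekly aggregation described above (the function name is illustrative):

```python
from datetime import date

# Restrict the analysis window to shops after ISO week 30 of 2020.

def after_week_30(shop_date: date) -> bool:
    return shop_date.isocalendar()[1] > 30

print(after_week_30(date(2020, 7, 15)))  # False — ISO week 29 (mid-July)
print(after_week_30(date(2020, 9, 16)))  # True — ISO week 38 (post-rollout)
```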

Orders are slightly larger after the V2 rollout

We calculate what Shipt shoppers would have been paid if Shipt had continued using the V1 payment scheme after the 15th, so we effectively control for any differences in pay over time. One objection someone could make to our analysis, though, is that the V2 rollout may also have changed the kind of shops shoppers received. If that were the case, there could be a systematic change in pay from before the 15th to after the 15th that lowered (or raised) shopper pay under V1, but not V2. Testing this, we find that there likely is a difference between pre-rollout and post-rollout orders (\(p = 0.015\)): orders after the rollout have slightly higher order totals in general. The effect isn’t large, and it favors post-rollout shops. This shouldn’t affect our analysis, since the V1 and V2 numbers we calculate use the same shops.

Interestingly, V2 shop pay takes a dive after the rollouts that isn’t explained by the order totals: V2 payouts drop from an average of $14.50 to $13.50, a 7% decrease. Using only shops from October – of which we have fewer, so the estimate is less reliable – V2 pay has dropped to $12.25, a 15% decrease.
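The write-up doesn’t specify which test produced \(p = 0.015\). One stdlib-only way to test for a pre/post-rollout difference in order totals is a two-sided permutation test on the difference in means; this is an illustrative stand-in, and the arrays below are toy data showing only the mechanics:

```python
import random
from statistics import mean

def permutation_p(pre, post, n_perm=10_000, seed=0):
    """Two-sided permutation test: p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(post) - mean(pre))
    pooled = list(pre) + list(post)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel shops at random
        diff = abs(mean(pooled[len(pre):]) - mean(pooled[:len(pre)]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Identical groups: every permuted difference ties the observed zero.
print(permutation_p([1.0] * 10, [1.0] * 10, n_perm=100))  # 1.0
```

A small p-value means order totals this different would rarely arise by chance if the rollout hadn’t changed the mix of shops.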