January 4, 2023: Dex Aggregator Battle Test 🦙💱
Benchmarking new DefiLlama aggregator against 1Inch, Matcha, Curve
2023 continues its parade of wonderful news, with the launch of DefiLlama’s meta DEX aggregator.
DeFi’s top data source becoming the aggregator of aggregators is great in and of itself. Yet there’s an additional fringe benefit, which is that DefiLlama gains its first revenue source through the process. The top notch data collection efforts of DefiLlama give degens a market service roughly comparable with that of a Bloomberg Terminal ($24K/year), all for free and with no complaints from the team.
Since many of the exchange partners they route through offer a referral fee, this finally nets the team their first source of revenue. So their radically unsustainable business model becomes marginally more sustainable!
The service has already shot up the leaderboards, quickly hitting number two on the list.
But how good is LlamaSwap aktually? We tested out quotes from hundreds of trades so you don’t have to…
Full article will be automatically paywalled for 48 hours
Data Sources
We started by selecting a handful of exchanges to test:
LlamaSwap, because that’s the title of the article
Matcha, because it was the only exchange with more volume than LlamaSwap at test time
Curve, because we’re $CRV maxis, but more generally to see how a single AMM stacks up against AMM aggregators. It serves as a rough single-source baseline, sort of like a control group.
1Inch, because it utilizes extremely complex routing for large trades
We also needed a list of assets to test. When we ran a previous trial to test Curve efficiency, we deliberately had a selection bias that favored the long-tail of assets we could test on Curve’s router. In this case, we used a narrower slice of assets, chosen to span the spectrum from good to bad for Curve and to test out the edge cases. We focused on assets with lots of liquidity, even though DeFi doesn’t have many such assets left.
We settled on:
Dollarcoins:
USDT
USDC
Dai
Frax
3CRV
Ethereum:
ETH
WETH
stETH
Other Volatile Assets:
WBTC
STG
FXS
As you can see, some assets (3CRV/stETH) would give Curve a natural advantage as the largest liquidity source. WBTC and ETH/WETH sit nicely in TriCrypto, providing decent exposure to these assets. For other v2 pools, we shunned those where Curve had too much liquidity relative to other sources (i.e. CRV/ETH, CVX/ETH) so that aggregators would have to do better than just routing via Curve.
Instead we picked pools where moderate liquidity existed paired with other assets on the list. We picked FXS because Frax has such a big footprint over the Curve ecosystem: the token has a few different Curve routes, yet overall liquidity is a bit thin. We also picked STG because the USDC pairing was a bit of a test for v2 pools, but it remains resilient.
Note that STG and FXS are relatively lower market cap, so they also provided a good test for the effects of slippage among whales.
Methodology
Since we’re just one degen, the methodology had to be simple. If you hadn’t heard though, cryptocurrencies are extremely volatile (shocked Pikachu meme). We needed a system that could compare these four exchanges on a reasonably level playing field.
To do so, we loaded the four exchanges into four browser tabs, one per exchange. We kept refreshing the trades until they all stayed within a single significant digit. As soon as we got a clean sample, we recorded the outcome and proceeded. In cases of turbulence we had to set our computer aside until the apes calmed down. Eventually we were able to procure enough samples.
We ran two sets of trials, focused on “modest” size transactions and “whale” sized transactions. The former represents the majority of trades by count, the latter the majority of trades by volume. Given that Curve optimizes for low slippage on high dollar value transactions, this also removed some Curve “home field advantage.”
Input values
Shrimp (targeting the $10K value range):
Dollarcoins: $10,000
Ethereum: 10
WBTC: 1
FXS: 2,500
STG: 25,000
Whale (targeting the ~$1MM value range):
Dollarcoins: $1,000,000
Ethereum: 1000
WBTC: 100
FXS: 250,000
STG: 2,500,000
A few other idle notes:
1inch has a new “fusion mode” which looks cool, but since it didn’t publish the route we used their legacy router (which didn’t appear to impact results too much)
LlamaSwap provides estimates inclusive of gas costs, while other routers do not appear to do so. Once we realized this we started recording the gas costs, and later backfilled likely values as gas didn’t change too much.
LlamaSwap rounded (or truncated) everything to two to three decimal places — by selecting the values above we had enough precision, and we rounded everything to keep a level playing field.
We also created a trade weighting to determine how often the value of a trade moves through a particular exchange. For instance… if a trade goes through Curve, we note Curve gets 100% trade weight, since the entire trade went through Curve. If a trade involves two steps, both of which go through a Curve pool, this counts as 200%. Easy enough, and it comes in handy as a way to track utilization for the often complex trades that 1Inch likes.
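For the curious, here is a rough sketch of how that trade weighting could be tallied in Python. The function names, the route format, and the sample routes are all hypothetical (we did this by hand in a spreadsheet-like fashion), and two details are assumptions rather than anything stated above: a hop split across exchanges credits each one its fraction, and the headline weight is the average across trades.

```python
from collections import defaultdict


def route_weight(route):
    """Sum each exchange's share over every hop of one quoted route.

    A route is a list of hops. Each hop maps an exchange name to the
    fraction of the trade's value sent through it on that hop.
    """
    totals = defaultdict(float)
    for hop in route:
        for exchange, fraction in hop.items():
            totals[exchange] += fraction
    return dict(totals)


def average_weights(routes):
    """Average per-trade weight for each exchange, as a percentage.

    Averaging across trades is an assumption made for this sketch.
    """
    sums = defaultdict(float)
    for route in routes:
        for exchange, weight in route_weight(route).items():
            sums[exchange] += weight
    return {ex: 100.0 * total / len(routes) for ex, total in sums.items()}


# Hypothetical routes, not taken from the recorded samples.
routes = [
    [{"Curve": 1.0}],                  # one hop, all Curve: 100% weight
    [{"Curve": 1.0}, {"Curve": 1.0}],  # two Curve hops: 200% weight
    [{"Uniswap": 0.5, "Curve": 0.5}],  # split hop (assumed handling)
    [{"Uniswap": 1.0}],                # no Curve at all
]
weights = average_weights(routes)
print(weights)
```

Because a multi-hop trade can count the same exchange more than once, the weights across exchanges do not need to add up to 100%.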
Objectives
Knowing this is not a strictly apples-to-apples comparison (i.e. Curve is single-source, LlamaSwap factors in gas fees), our objective was to simply dissect a snapshot of the aggregator market as of early 2023. Some questions we had in mind:
Does LlamaSwap provide reasonable (or superior) results?
Are natural Curve routings included/neglected by aggregators (a prior problem)?
What rough % of volume flows through Curve?
Do we see higher utilization of Curve in whale quotes, per the popular wisdom?
How do the aggregators perform in cases of whale trades with high slippage?
Results
Sifting through the results, we see all the exchange aggregators provided fairly good results… with a few caveats. Let’s start with the high level results, and then smooth them out a bit. The average difference from the best price:
DefiLlama: -0.04%
1Inch: -0.05%
Curve: -2.01%
Matcha: -17.72%
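A minimal sketch of how a "difference from the best price" figure like these can be computed, assuming each trial records the output amount every exchange quoted for the same input. The function names and the sample quotes are hypothetical and are not our recorded data.

```python
def pct_from_best(quotes):
    """Percent shortfall of each exchange versus the best quote in one trial."""
    best = max(quotes.values())
    return {ex: 100.0 * (out - best) / best for ex, out in quotes.items()}


def average_gap(trials):
    """Average each exchange's shortfall across all trials it appears in."""
    sums, counts = {}, {}
    for quotes in trials:
        for ex, gap in pct_from_best(quotes).items():
            sums[ex] = sums.get(ex, 0.0) + gap
            counts[ex] = counts.get(ex, 0) + 1
    return {ex: sums[ex] / counts[ex] for ex in sums}


# Hypothetical output amounts for two trades (illustrative only).
trials = [
    {"LlamaSwap": 1000.0, "1Inch": 999.0, "Curve": 990.0},
    {"LlamaSwap": 499.0, "1Inch": 500.0, "Curve": 480.0},
]
gaps = average_gap(trials)
for exchange, gap in gaps.items():
    print(f"{exchange}: {gap:.2f}%")
```

The best quote in each trial scores 0%, so every average is zero or negative. One terrible quote can drag an average down a long way, which matters for the Matcha number.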
Holy smokes, Matcha is almost twenty percent off?
Well, it’s a little bit influenced by one asset that just kept whiffing. Apparently Matcha doesn’t recognize the existence of 3CRV. When trying to route it through Uniswap, it erases all your money, as happens to so many Uniswap users. Whoopsie.
So we learned Matcha is not great at routing Curve pools. Let’s refine it to see how it compares against pools it recognizes. Knock out everything with 3CRV, and Matcha’s average quote improves to 2.05% below the best quote.
Really, 2.05%? That’s quite near Curve, which we threw in and deliberately nerfed. We should expect an aggregator of several AMMs to significantly outperform a single AMM. However, it is the case that Matcha struggled with some of the more complex trades.
The big reason appears to be a poor job of integrating Curve routing. For example, in these trials 1Inch weighted trades about 51% through Uniswap, 45% through Curve. 1Inch also performed very well — usually within .05% of the best price.
In contrast, Matcha also routed about 50% through Uniswap, but just 19% through Curve. The result is underperformance in the range of 2% to 18% for its users. Yikes.
So let’s be exceedingly generous and imagine users are smart enough to heed any slippage warnings. We’ll only consider trades where Matcha did not have such high miss rates. In these cases, we see that Matcha improves to 0.14% below the optimum.
Not too bad, but if we’re going to be so generous to Matcha, we should consider being equally generous to Curve. After all, we specifically selected a few Curve pools with lower liquidity to see the effects, and even then Curve averaged just 2% below optimum. Wipe out the pools with >1% price movement, though, and Curve improves to a resounding 0.12% below the optimum price. Not bad, considering Curve doesn’t have the crutch of routing through other exchanges.
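The successive "be generous" passes above amount to re-averaging after dropping samples. Here is a hedged sketch of that idea. The `mean_gap` helper, its parameters, and the sample rows are hypothetical, and the sketch uses the gap from the best quote as a stand-in for the 1% price movement cutoff, which is an assumption and not exactly the filter we applied.

```python
def mean_gap(samples, exclude_assets=(), max_abs_gap=None):
    """Average gap_pct after dropping excluded assets and large misses."""
    kept = []
    for s in samples:
        # Drop any trade that touches an excluded asset (e.g. 3CRV).
        if s["sell"] in exclude_assets or s["buy"] in exclude_assets:
            continue
        # Drop trades whose miss exceeds the cutoff (assumed proxy).
        if max_abs_gap is not None and abs(s["gap_pct"]) > max_abs_gap:
            continue
        kept.append(s["gap_pct"])
    return sum(kept) / len(kept) if kept else None


# Hypothetical samples for one exchange (illustrative only).
samples = [
    {"sell": "3CRV", "buy": "USDC", "gap_pct": -100.0},
    {"sell": "FXS", "buy": "ETH", "gap_pct": -6.0},
    {"sell": "USDC", "buy": "DAI", "gap_pct": -0.5},
    {"sell": "ETH", "buy": "WBTC", "gap_pct": -0.25},
]
raw = mean_gap(samples)
no_3crv = mean_gap(samples, exclude_assets={"3CRV"})
trimmed = mean_gap(samples, exclude_assets={"3CRV"}, max_abs_gap=1.0)
print(raw, no_3crv, trimmed)
```

Each filter flatters the exchange a little more, which is why it only seems fair to apply the same treatment to every exchange being compared.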
With this mostly uniform comparison, we can fit them on a reasonable chart.
It’s fair to say, without qualification, that the numbers are phenomenal for DefiLlama. The service launched just a few days ago, and already is outperforming all others. Impressive work!
Several other questions from above dealt with Curve utilization by aggregators. We got a glimpse of the top line numbers above — where 1Inch outperformed Matcha in part by integrating more Curve routing. For these assets where Curve has liquidity, we saw optimum routing include sending about half the transaction weight through Curve.
Curve’s focus on low slippage for high dollar trades does show up a bit in the evidence. Looking into 1Inch routing, among shrimp-sized trades it only threw about 38% of trades through a Curve pool. When handling whale trades, this number jumped to 54%. In contrast, whale trades saw Uniswap weight drop from 54% to 48%. A similar pattern also showed up on Matcha, which is not so well integrated with Curve.
Our recommendation? Use DefiLlama, maybe shop around at
Considerations
The biggest thing to keep in mind with all this is the biased sample of assets. Sampling actual trades and tracking the results would probably give different numbers.
Of course, maybe a dozen other aggregators exist. We can’t rule out that other aggregators may do even better.
Other chains may be different, especially given reduced gas costs.
We didn’t account too much for the fact that DefiLlama considers gas costs in its calculations. We didn’t have to, as it outperformed anyway, but its hit rate may be even better if gas were backed out.
Several holes exist, but as a quick snapshot it was a useful way to view some of the trends. Hopefully this can inspire more diligent researchers inclined to give the subject its due!