#23: Lost in translation
Companies need to work on better translation tools in the era of large language models
Signals & Experiments
I am starting this newsletter with this section because I have a lot more to say about a particular topic than the news in the past week: translation. I spent the last week in China going through multiple cities, including Shenzhen, Hangzhou, Suzhou, and Shanghai.
In the past few years, all of us have read about the transformative quality of large language models that can understand multiple languages. All AI labs flaunt their benchmarks while releasing a new model. But in practice, I didn’t see the effects of these new AI models when it came to the translation of text or speech.
When I was in China, I had to use apps like Alipay and WeChat a lot. Tea and coffee shops there often have a mini app inside these apps for ordering. Alipay has a floating translation option, but it frequently failed to work, and when it did, the results were not great.
I was in a lot of briefings and Q&A sessions where an executive was speaking Chinese. When I used iFlytek’s S6 recorder, the audio capture was good, but the on-device translation was very average, and the interface was janky, which made it hard to follow where the speaker currently was.
AirPods’ translation feature only worked well in one-on-one conversations, and even then it missed the mark on what I had asked and what the other person was saying.
In China, Google’s services are restricted, so I relied on Apple’s translation app to take photos and convert text on menus and signs to understand the basics. The translation through the camera is not great, but it is good enough to make do and place an order at a restaurant. With WWDC scheduled today, I wish that, beyond Siri, Apple would add more intelligence to translation. The only bit that worked well was holding the phone with the translation app to the other person and conversing with them.
I was trying out Even Realities G2 Glasses, and the translation was very good. It worked well when an executive spoke in Chinese and I read the text on my glasses, but a lot of the time, I felt like I was trying to keep up with text whizzing past on a tiny display.
But there are a few practical caveats to this. You need to have the glasses on you and connected to your phone. And you need to activate the translate mode when conversing with someone. To show them your side of the conversation, you need to open the app. Plus, with this particular pair of glasses, you can’t translate real-life text as there is no camera.
There are a few challenger brands like Inmo that offer both a camera and a screen with a translation feature, but I only experienced them in a demo at the Global Connect Show (GCS) and don’t have a verdict to pass yet.
With AI advancing so much, I think there should be better standalone apps and in-app translation features for travelers. We can’t stick with old, clunky interfaces that barely work and deliver translation that is only good enough for users to understand a word or two and make a purchase decision at a shop or restaurant. With AI, I feel that the act of translation should be more fluid, reliable, and easy to use. Maybe work on better translation before chasing AGI?
Top News
Suno scores $400 million in new funding at a $5.4 billion valuation
Music creation app Suno said that it has raised $400 million in a Series D fundraising led by Bond Capital with investment from IVP, Forerunner, Union Square Ventures, Alkeon, and Quiet.
The company’s latest funding comes after its $250 million round in November 2025. Since then, it has settled its lawsuit with Warner Music Group (WMG) and seen a bump in valuation from $2.45 billion to $5.4 billion.
The legal trouble still exists in the form of lawsuits with Sony Music and Universal Music. Recently, both labels added more than 61,000 songs to the lawsuit, claiming that they were used in model training by Suno. This increases potential damages for Suno if the court favors the labels. However, Suno is urging the court to let it keep the number of songs used in model training under wraps.
Meanwhile, more people are using the app. Suno said in February that it has more than 2 million paid subscribers and $300 million in annual recurring revenue. The app also reached #1 in the music category in the App Store. The company said in its announcement:
In recent months, we’ve seen Suno become part of culture in ways that continue to surprise us. Family members are turning text threads, group chats, and inside jokes into songs. People are writing songs for birthdays, graduations, and even work events. Viral trends helped propel Suno to #1 in the App Store’s Music category in dozens of countries.
It is not yet clear if people are using this for fun or if there is a trend of professional users using the app for production or ideation. Investors are betting on the usage going up and also the company settling its lawsuits.
In the past few months, ElevenLabs, Google, and Scale AI have released license-safe music models. Suno is seemingly going in the same direction, as it said it plans to release its first music model “developed in partnership with the music industry.”
Quick Bytes
Sesame, the voice AI startup from Oculus founders, launched its app in public preview. The app now has four agents, Maya, Miles, Simone, and Charlie, with their own memories and personality types.
AethexAI, a startup concentrating on Africa and the Middle East, raised $3 million in funding and opened up its platform. The company also released its Kora series of models that work with localized dialects of English, French, and Arabic.
Google is going to roll out on-device scam detection for instances where the miscreant is using number spoofing and calling you from what seems like a number on your contact list. As CNET noted, the caveat is that this only works when both parties have the Google dialer, contacts, and Messages app installed.
Customer support startup Fin (formerly Intercom) released a new model called Apex Flash. The company said that the new model is almost 20% faster than the previous version for time to first audio. It also improved the resolution rate by 24%.
Google released Gemma 4, a 12B model that doesn’t have a separate encoder for audio and video inputs. With this, the company released its Edge Gallery app and Edge Eloquent app for dictation on Mac, which use local models for their features.
Two years after pulling back its voice-led ordering system, McDonald’s is piloting a new one in partnership with Google. An account called McFranchisee on X noted that in its test phase in five stores, the system has processed over 1 million transactions.
Off Topic
One thing I always miss after my trips abroad is beverages and snacks that I get in convenience stores. During my trip to China, I became a fan of both iced black tea and Chrysanthemum tea, which you get in 7-Eleven or Family Mart stores. I also liked the dark chocolate KitKat that is not available in India.
I did see cucumber-flavored Lay’s chips that I couldn’t try. A chewy version of Alpenliebe was also a new discovery for me. I also brought a ton of new flavors of Pocky back home.
Partner Spotlight with Atomik Growth
In a chat with a16z, ElevenLabs CEO Mati Staniszewski talks about how he and his co-founder, Piotr Dabkowski, got inspired to solve for badly dubbed movies and created a company that thinks that voice is the new interface for human-computer interaction.
Sponsored content
Thank you for tuning in. Keep listening.
This newsletter is by Ivan Mehta, a freelance reporter at TechCrunch. It covers AI and technology in voice, audio, and music.
Email: voiceaiweek@gmail.com or im@ivanmehta.com




