A few weeks ago, I saw that Kobi had sent me a few messages on Telegram. Kobi is a close collaborator of mine; we work on many projects together. He sometimes joins me on the ZK Podcast as a co-host, is a co-founder of ZKValidator and co-creator of ZK Hack, and is one of the best engineers I know. His first message simply read “I’m sorry…”
He went on to send me the link to something he had just built called zkpod.ai. “I created a program that processes all the ZK podcast transcripts. It allows you to ask a question and then generates the answer. Try it out,” he wrote. I opened it up and put in the question “What is the cryptography underlying Bulletproofs?” Clicking on the audio answer option, I was shocked. I was hearing my own voice giving a detailed technical answer!
Every week on the ZK Podcast, I interview really brilliant people: mathematicians, engineers and thinkers. They share their wealth of knowledge with our listeners. But I myself am the interviewer, the perpetual student, and I am far from the expert on the topics we cover.
Now though, I very much sounded like one of the experts. This felt surreal! Had I (or the AI version of me) just answered succinctly and clearly with incredible detail a question I probably wouldn’t have been able to spontaneously answer myself? Crazy!
This smarter, sharper voice twin had just blasted past me intellectually.
rip
“OMG” I wrote back to Kobi. I was amazed and excited by this new tool. I could use this on the show. Maybe people would use it as a resource!
But very quickly I began to feel a growing sense of dread. What exactly does it mean to have a “version” of me out there, but one with super intelligence? What if it starts to say strange things, things I would never say, or don’t believe? What if people become confused by it, thinking this is actually me?
Years ago when I saw the deepfake phenomenon on the rise, I realised that my voice could potentially be stolen. That someone could recreate it and use it for their own purposes. But this seemed like a remote threat. It would be quite labor intensive, I imagined, and plus the show was (and still is tbh) quite niche, so who would even bother. There must be bigger fish to fry.
Now, fast forward to today, and there was a self-generating deepfake AI doppelgänger answering questions better than I ever could.
And a thought popped into my head: does this mean I become obsolete?
I am sure I still have some human advantages … for now. I have my old stories, lived and learned history, topics I know about from outside of the show, social skills, interview skills, empathy and knowledge of norms of our industry, etc. But I do think it could be that in a few years, there is a version of this that is simply better than me.
The story that keeps popping into my mind is that of The Little Mermaid, the hit 80s Disney animated film. I will summarize it here for folks who never saw it (and will do so from memory, without using ChatGPT, to stay on theme).
The Little Mermaid is the story of Ariel, the red-haired mermaid who loves to sing but longs to walk on land. But Ariel’s stuck as a mermaid with a full-on tail and can’t leave the sea. In desperation, she visits Ursula, an evil witch octopus lady, who offers to magically give her legs and the chance to walk with the humans above the sea. In exchange, she will take Ariel’s voice and Ariel will be left mute.
Ariel takes the deal. She gets legs and goes on land, meets a prince, something something. But while she is up there, Ariel is unable to speak. She can’t sing. Maybe there is some race to get a true love’s kiss in time or some other Disney bullshit… (thank you faulty human memory). Ursula then transforms herself into the image of Ariel and goes on land. This clone of Ariel looks just like her, but can also speak and sing. She is the “better version” of Ariel and manages to snag the prince. He is so charmed by her beautiful singing voice (guy is deep!) that he offers to marry her. Ariel is stuck watching on as this fake Ariel is able to walk on land, sing and achieve more than she could. It being Disney of course, we do get a happy ending: Ursula/fake Ariel is defeated, Ariel gets her voice back and wins the prince and the phat golden ring.
Back to zkpod.ai - using it for the first time, it’s like I am face to face with Ursula, the witch who has all the magic, the power, and who now also has my voice.
And I am Ariel stuck in the sea, looking on as she walks further inland than I ever will.
---------
I am inspired and terrified by this thing that Kobi built. When I first saw it, I wanted to hold back the launch, to try to halt the inevitable. But as we both acknowledged, what has been done cannot be undone. AI has my voice, and if Kobi hadn’t built this, someone else could have. I am happy that it was built by a friend and that I had a bit of time to digest it before he released it into the wild. I am also happy to be working on what we work on, since it does feel like the techniques we are developing in the zk space may be used to keep these things slightly in check.
We will definitely find cool ways to use zkpod.ai in coming episodes. And damn, the speed of this thing is already crazy. It learns and improves so quickly.
I do fear the implications of losing control over something that had up until now been entirely mine. And it’s not that I lose my voice to another person or entity, but to a new unknown thing that will develop it in unknown ways.
It makes me want to push our community to build more tools to better connect humans to a particular piece of content and to distinguish a real authentic human recording from an AI fake.
Amazingly, we have just recently been exploring this on the show. In our interviews with [Florian Tramer](https://zeroknowledge.fm/246-2/) and with [Yi Sun and Daniel Kang](https://zeroknowledge.fm/265-2/), we covered ways in which ZK and ML are intersecting. Specifically in the episode with Yi and Daniel, we talked about this idea of attested sensors. A means to have a digital sensor (like a camera or mic) capture some analog input, but have the device itself sign or attest to the realness of the input. In their case, they were discussing photography and cameras but maybe there could be other attested sensor devices.
And so I have a wishlist of tools, some of which may already exist:
* I would like there to be attested sensors in microphones, if there isn’t yet such a thing.
* I would also want a way to easily sign audio content and confirm its realness. I want a way that others can verify that content has been signed off on by me. To show when it is me speaking, and when it is the AI Anna.
* I would love to be able to trace the point of voice capture as it moves through each step of editing; maybe we could even use a recursive zkp.
* I want a way to protect my voice; a legal framework where I can have some control over how it is used or some technical way to track when and where it is used.
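
To make the signing and tracing wishes a bit more concrete, here is a minimal Python sketch of what a provenance chain for a recording might look like. It is only an illustration under loud assumptions: the function names (`sign_step`, `verify_chain`), the step labels and the key are all hypothetical, and an HMAC with a shared secret stands in for what would really need to be a public-key signature from the microphone or the speaker. It is not an attested sensor and not a recursive ZK proof; it only shows the idea of each editing step committing to the step before it.

```python
import hashlib
import hmac


def digest(audio: bytes) -> str:
    """SHA-256 fingerprint of an audio buffer."""
    return hashlib.sha256(audio).hexdigest()


def sign_step(key: bytes, prev_tag: str, step: str, audio: bytes) -> dict:
    """Create a record that binds this step to the previous record's tag.

    ASSUMPTION: HMAC with a shared key is a stand-in for a real
    public-key signature; a verifier here needs the same secret.
    """
    audio_hash = digest(audio)
    message = f"{prev_tag}|{step}|{audio_hash}".encode()
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"step": step, "audio_hash": audio_hash,
            "prev_tag": prev_tag, "tag": tag}


def verify_chain(key: bytes, chain: list[dict], final_audio: bytes) -> bool:
    """Check every link, then check the last record matches the audio."""
    prev_tag = ""
    for record in chain:
        message = f"{prev_tag}|{record['step']}|{record['audio_hash']}".encode()
        expected = hmac.new(key, message, hashlib.sha256).hexdigest()
        if record["prev_tag"] != prev_tag:
            return False
        if not hmac.compare_digest(expected, record["tag"]):
            return False
        prev_tag = record["tag"]
    return bool(chain) and chain[-1]["audio_hash"] == digest(final_audio)


# Hypothetical example: a capture step followed by one editing step.
key = b"hypothetical-secret-key"
raw = b"raw microphone samples"
edited = b"edited samples"

chain = [sign_step(key, "", "capture", raw)]
chain.append(sign_step(key, chain[-1]["tag"], "edit", edited))

print(verify_chain(key, chain, edited))             # genuine edited audio
print(verify_chain(key, chain, b"AI Anna samples"))  # audio not in the chain
```

In a real system the capture record would presumably come from the device itself, and a recursive proof could replace the explicit list of records so that a listener only checks one short proof instead of the whole editing history.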
These are my wishes, but it does feel a bit futile. When this thing begins to ask better questions than I can, then I guess I too will be the latest dumb human out of a job.
But until then, I think it is going to be crazy exciting over the next few months and years as we see these different techniques and tools grow and develop and hopefully (at least for a while) manage to balance each other out.