Tag Archives: Conversation AI

Will AI Replace Mid-Level Engineers by 2025? Not So Fast, Mark Zuckerberg

It’s hard to ignore the growing buzz around artificial intelligence (AI) and its potential impact on various industries. Recently, Mark Zuckerberg predicted on Joe Rogan’s podcast that AI could replace mid-level engineers by 2025. While it’s a compelling narrative, it misses the mark for several reasons. Let’s unpack why this prediction is more hype than reality.

Companies Aren’t Fully Using Their Own AI Tools

Take Salesforce as a prime example. The company has heavily promoted its AI-powered sales agents, touting them as the future of sales. Yet, if you look at Salesforce’s own career page, approximately 75% of their job postings (775 out of 1035 as of Jan 16, 2025) are for sales roles. If their AI tools were truly ready to replace human salespeople, why wouldn’t Salesforce “dogfood” their own product, slash sales jobs, and reap massive savings?

Dogfooding Gone Wrong

This disconnect isn’t unique to Salesforce. Many companies pushing AI solutions still rely heavily on human expertise to deliver the results they promise. It’s one thing to sell the dream of AI-driven automation, but it’s another to trust your core operations to it. If organizations like Salesforce, which stand to gain the most from successful AI adoption, aren’t betting the farm on their own tools, why should we believe AI will displace engineers en masse at other companies?

AI-Generated Code Still Needs Maintenance

Even if AI can write functional code, that doesn’t eliminate the need for mid-level engineers. All code, no matter how well-written, eventually requires updates. Security vulnerabilities need patching, APIs evolve, dependencies get deprecated, and business requirements change. Who’s going to handle these inevitable maintenance tasks? AI might be able to assist, but it can’t completely replace the nuanced understanding of a system that a human engineer provides.

Consider the metaphor of AI as a power tool for software development. It can make some tasks faster and easier, but someone still needs to wield the tool, know how to use it safely, and fix the mess when something goes wrong. Far from making engineers obsolete, AI tools are likely to amplify their productivity—and perhaps even increase demand for engineers who can effectively integrate these tools into their workflows.

If companies like Meta actually moved forward with replacing most of their mid-level engineers, they’d quickly find themselves in a classic foot-gun scenario. Without a robust team of engineers to maintain and adapt AI-generated code, systems would break down, product development would stall, and customer trust would erode. It’s a short-sighted strategy that prioritizes immediate cost savings over long-term resilience.

Selling the Promise of AI Is in Their Interest

It’s no secret that tech giants have a vested interest in promoting AI as the next big thing. AI and machine learning are lucrative business lines, and hyping up their potential is a great way to attract investment, sell products, and capture headlines. By framing AI as a technology capable of replacing entire swaths of the workforce, these companies generate excitement and urgency around adopting their solutions.

Heck, I am an AI/ML engineer… I am in the space promoting the same thing, but my view on AI/ML is that these are HIGHLY strategic tools to be used by people. Replacing mid-level engineers isn’t just a technical challenge; it’s a strategic one. Engineering teams don’t just write code; they collaborate, solve complex problems, and adapt systems to changing business needs. These human-centric tasks are not easily outsourced to AI, no matter how advanced it becomes.

At the end of the day, humans consume the products that these companies produce. Until that changes, people will decide what to buy, and companies need to persuade those people to choose their products. AI/ML systems don’t understand why things go viral, why we collectively like what we do, or why things like Hawk Tuah or Luigi Mangione captured our collective attention. Would AI have predicted that a good number of people would rally around someone killing another person? I think not.

The Full Stop Thought

AI is undoubtedly transforming how we work, and some jobs will inevitably be impacted. However, the idea that AI will replace most mid-level engineers at companies like Meta by 2025 is far-fetched. The reality is that AI tools are most effective as complements to human expertise, not replacements for it. Companies still need skilled engineers to maintain systems, adapt to changes, and ensure the quality of their products—and that’s not going to change anytime soon.

Here is the final thought… All AI systems today start with a user prompt. The keyword here is user. Humans drive the direction of the work an AI system does because these systems aren’t aware of their environment. They don’t know what’s happening outside the digital world and the little box they live in. Until an AI system’s interface becomes a simple power switch that requires no user prompt, these systems will need humans to direct what they produce. Period.

Voice Cloning: The Text-to-Speech Feature You Never Knew You Needed And Why It Matters

Over the holiday break, I started experimenting with cloning my voice for reasons I will get to later in this blog post. As I walked down the list of Voice Cloning providers out there and began to evaluate them using my cost-to-benefit ratio scale, a set of requirements and must-have capabilities emerged.

In this blog post, we will cover what those required features are, why they are essential for my scenario, why I believe those reasons will carry over to the general use case, and, ultimately, what it means for text-to-speech providers moving forward.

First Some Background

I have been in the Natural Language Processing (NLP) space for over 3 years. In that time, as most people do, I started by looking to obtain accurate transcription from speech and then moved into trying to digest conversations to create “computer-generated” interactions. Large Language Models (LLMs) dramatically accelerated the accessibility and, quite frankly, the ability to do so in a meaningful way without a lot of effort.

After comprehension, most individuals move into increasing the level of interaction by interfacing with these systems using humans’ other amazing tool: hearing. As humans, we don’t want to talk into a device and then have to read its output. I mean, heck, most people find subtitled movies beyond annoying if those subtitles drag on for anything more than a few minutes. Here, we start to see the need for text-to-speech, but what kind of voice should we use?

How I Tried Automating Myself

That voice depends on the use case. More to the point, that voice depends on how familiar you are with the “thing” you are interacting with. I use “thing” as this catch-all, but in reality, it’s some device you are conversing with. Moreover, depending on what that device is and what our connection with said device is, the voice used makes all the difference in the world in the experience of that interaction.

Let’s consider these scenarios:

Siri, Alexa, or Google

These devices are simple. You say a command, and Siri, Alexa, or Google (hopefully) give you a meaningful answer. You don’t place much weight on what kind of voice it replies with. Sure, it’s cute if it replies in an accent or if it can reply in Snoop Dogg’s voice, but in the end, it doesn’t really matter all that much for that interaction.

Call Center, Tech Support, etc.

The next wave of voice interactions is replacing humans with voice automation systems. This is where most companies are today in this evolution. There are a ton of companies trying to do this for a variety of reasons, usually led by decreasing labor costs.

The most common use case is replacing customer support staff with these automated systems. Today, this usually entails using Speech-to-Text to transcribe what someone on the phone is saying, passing that text off to a Large Language Model (LLM) or, more correctly, a Retrieval-Augmented Generation (RAG) system for better context, and then taking the output and running it through Text-to-Speech to generate a human-like voice to play back to the listener on the other end of the phone.
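
To make that flow concrete, here is a minimal Python sketch of the pipeline described above. The helper functions are hypothetical stand-ins for whatever Speech-to-Text, retrieval, LLM, and Text-to-Speech providers you happen to use; the point is the shape of the loop, not any specific vendor’s API.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    caller_audio: bytes       # raw audio captured from the phone line
    reply_audio: bytes = b""  # synthesized audio to play back to the caller


def transcribe(audio: bytes) -> str:
    """Stand-in for a Speech-to-Text call (e.g., a streaming STT API)."""
    return "what is the status of my order 12345"


def retrieve_context(query: str) -> list[str]:
    """Stand-in for the retrieval half of a RAG system (vector or keyword search)."""
    return ["Order 12345 shipped on Jan 12 and arrives in 2-3 business days."]


def generate_answer(query: str, context: list[str]) -> str:
    """Stand-in for an LLM completion grounded on the retrieved context."""
    return f"Your order has shipped. {context[0]}"


def synthesize(text: str) -> bytes:
    """Stand-in for a Text-to-Speech call that returns playable audio."""
    return text.encode("utf-8")  # pretend these bytes are audio


def handle_turn(turn: Turn) -> Turn:
    query = transcribe(turn.caller_audio)     # 1. Speech-to-Text
    context = retrieve_context(query)         # 2. retrieval for grounding
    answer = generate_answer(query, context)  # 3. LLM generates the reply
    turn.reply_audio = synthesize(answer)     # 4. Text-to-Speech back to the caller
    return turn


if __name__ == "__main__":
    print(handle_turn(Turn(caller_audio=b"...")).reply_audio)
```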

That human-like voice is essential for many reasons. It turns out that when people on the phone hear a computer voice that sounds like it came out of a Felix the Cat cartoon from the 60s, they are more likely to hang up because no one wants to deal with a computer unless it is important enough to stay on the line. That last statement is very true: if I really, really need something, then I am going to endure this computer-based interaction by not hanging up.

It all comes down to companies (and the people in the next section) wanting to keep engagement (i.e., not hanging up the phone) as high as possible because they get something out of that interaction.

Content Creator to Mimic Myself

For this last use case, not only do we want the voice to be indistinguishable from a human, but we also want that voice to sound EXACTLY like me. This is the use case I was exploring. I want that voice to sound personalized because it will be associated with my brand and, more importantly, with a level of personalization and relatability in my content. That is achieved by creating content with a voice that is unmistakably mine.

Why was I interested in this use case? In this age of social media, there has been a huge emphasis on creating more meaningful content. For those who do this for a living, creating content in the form of audio (e.g., podcasts) and especially recorded video (e.g., vlogs, TikToks) is extremely time-consuming. So, wouldn’t it be great if there were a way to offload some lower-value voice work to voice cloning? That’s the problem I was trying to solve.

If you are looking to tackle this use case, then based on the call center use case above, having your real voice intermixed with an AI clone of your voice that is just slightly off will likely be off-putting. In the worst case, your listeners might just “hang up the phone” on your content. This is why quality, intonation, pauses, and the like will make or break the platforms that offer voice cloning. If it doesn’t sound like you, you risk alienating your audience.

Why Voice Cloning Is Important

For Text-to-Speech platforms out there, voice cloning will be a huge deal, but the mainstream is not there yet… This is not because the technology doesn’t exist (it does) but because corporations are still the primary users of Text-to-Speech by volume (for now). They are busy trying to automate jobs away and replace them with AI systems.

In my opinion, there is already a bunch of social media content being generated with human-like voices; case in point, the annoying voice in the video below. Just spend 5 minutes on TikTok. I think once people start to realize the value of automating their own personal brand and content on social media, and once it’s accessible enough for creators, you are going to see an explosion of growth on the platforms that provide voice cloning.

Platforms that don’t offer voice cloning will need to add it at some point or die. Why? Why pay for two subscriptions, one for a platform that provides human-like voices for the call center use case, and another for a platform that provides pre-canned human-like voices but also lets you clone your own voice for social media (a clone that could also be used to create your own set of pre-canned voices)? The answer is: you don’t.

Where To Go From Here

In this quest to clone my voice, I tried a bunch of platforms out there and found one that works best for me, taking things like price and intonation into account. I may write a follow-up blog post about the journey and the process I used to select and compare all the services. If there is interest, a behind-the-scenes look at what I will use voice cloning for might also interest people reading this post.

Until then, I hope you found this analysis interesting and the breakdown for the various use cases enlightening. Until the next time… happy hacking! If you like what you read, check out my other stuff at: https://linktr.ee/davidvonthenen.

Top Reasons to Mark Your Calendar for SCaLE Next Year

In March, I had the fortune of attending and speaking at one of my favorite conferences, Southern California Linux Expo (SCaLE) 21x. As the name suggests, this is the 21st iteration of this tech-heavy yet family-oriented event, which usually takes place in Pasadena but, in some years, in the greater Los Angeles area. This is my sixth time attending (and third time presenting), and I am glad to say that this year’s conference knocked it out of the park again.

What is SCaLE?

SCaLE is North America’s largest community-run open source and free software conference. The entire event, from setting up the networking to managing the session introductions, is all volunteer-based. This allows SCaLE to skip the pay-for-play sessions you typically see at larger corporate events and focus on quality sessions that attendees are interested in. More importantly, it allows the event to keep the cost of attendance under $100 for the entire 4-day event and to maximize inclusion for those who want to attend.

Southern California Linux Expo 21x

The content ranges from topics like Kubernetes to Open Source AI to the low-level Linux kernel. My favorite session topics always revolve around IoT/Edge, Security, and anything unique and obscure, which you will definitely find a lot of here. I wanted to highlight a few of the more interesting (and hilarious) things I was able to participate in at SCaLE this year. I hope you will enjoy this too…

Kwaai Summit: Personal AI

You want to discuss a very meta but also a very real topic that will arrive at our doorsteps soon: Personal AI. What is Personal AI? It’s the idea that we will have AI systems making decisions on behalf of individuals, or more specifically, you. Whether you know this or not, this is already happening on a small scale (excuse the pun). These are things like your iPhone or Android making reservations at a restaurant or, a more concrete example, making recommendations on things you might be interested in purchasing based on your Instagram or TikTok feed.

Now, imagine we have all this data, information, choices, relationships, and associations across all these different, disparate data points. How will these choices and products find their way to grab your attention? For the past decade, it has been done through associations (when you heart something on Instagram or like something on Facebook) and then extrapolating what else you might enjoy based on probabilities. For example, if you like baseball, you might want to purchase a Dodgers jersey.
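
As a toy illustration of that association-style extrapolation, here is a small, self-contained sketch (made-up data, not any platform’s actual recommender): items that co-occur with your likes in other people’s like-lists bubble up as recommendations.

```python
from collections import Counter

# Made-up like-lists; real platforms use far richer signals and models.
likes_by_user = {
    "alice": {"baseball", "dodgers jersey", "stadium tour"},
    "bob":   {"baseball", "dodgers jersey"},
    "carol": {"baseball", "hiking boots"},
}


def recommend(my_likes: set[str], top_n: int = 2) -> list[str]:
    scores = Counter()
    for likes in likes_by_user.values():
        if my_likes & likes:                 # this user shares an interest with me
            scores.update(likes - my_likes)  # count everything else they liked
    return [item for item, _ in scores.most_common(top_n)]


print(recommend({"baseball"}))  # e.g. ['dodgers jersey', 'stadium tour']
```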

Kwaai Summit

The next wave will resemble a personal assistant in the form of an AI agent talking to external AI agents and then making decisions based on those interactions. Your AI agent knows everything about you. What you like, who your friends are, your background, and all other aspects of your life. Based on your unique profile, this AI agent will genuinely know how to interact with your digital environment based on who you are and what apps and access you have.

The Kwaai Summit discussed the new interactions and connections we will have with these AI systems. This was a fascinating series of talks. I recommend checking out The AI Accelerated Personalized Computing Revolution by Pankaj Kedia below.

If we start interacting with the world by proxy through our AI agents, there will be a lot of interesting fallout from these interactions. First, what controls your AI agent’s access, and how does it establish trust with these external AI agents? This is important because if these agents act on our behalf, what determines whether these interactions are good and allowed? Second, where did your AI agent come from? As a precarious scenario, if your agent was created by Amazon, it might steer you to Whole Foods for all your grocery needs. Definite conflicts of interest there.

As a follow-up to this topic, I would check out AI and Trust by Bruce Schneier below. What an interesting future indeed.

Shameless Plug: My Session About Voice AI Assistants

My session at SCaLE was entitled Voice-Activated AI Collaborators: A Hands-On Guide Using LLMs in IoT & Edge Devices. The discussion was framed around landing LLMs and other machine learning models on IoT and Edge devices and the complications of working in resource-constrained environments, that is, environments with smaller amounts of memory, CPU, etc. When building your IoT or Edge device, you have to decide how much “work” you want to do on the device versus remotely in the cloud. More work means more resources. More resources mean a higher-priced device.

Since Voice AI Agents, like Alexa, Siri, or Google Home, don’t have traditional graphical user interfaces and solely rely on using spoken word for interaction, the focus of this talk centered around how the transcription accuracy of the commands you give can dramatically impact the quality of the prompt to your LLM or the input to your machine learning models.
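
If you want to put a number on that transcription accuracy before worrying about the downstream prompt, Word Error Rate (WER) is the usual metric. Below is a small, self-contained sketch of the standard edit-distance formulation (illustrative only, not code from the talk); note how a single misheard word changes the meaning of a short voice command.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a standard Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# "kitchen" misheard as "chicken": one wrong word out of five -> WER of 0.2,
# but the command no longer means what the speaker intended.
print(word_error_rate("turn off the kitchen lights",
                      "turn off the chicken lights"))
```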

If you are interested in learning more about how to optimize running machine learning models at the Edge, check out my recording below:

Turn on the Funnies

I promised something funny, and one of the staples at SCaLE is the annual talk by Corey Quinn. He pokes fun at topics from all across the tech industry, and he literally does this every single year; it’s tradition at this point. This year’s topic was something I spent a good seven years of my life dealing with… Kubernetes. A good portion of it is spot on. His talk, Terrible Ideas in Kubernetes, was another huge success.

SCaLE Recap

Wrapping up an event like SCaLE is no small feat. I would highly recommend attending this conference next year for those who’ve never had the pleasure. What sets SCaLE apart isn’t just its impressive array of sessions, ranging from Kubernetes intricacies to the latest in open source AI; SCaLE also stands as a beacon of community, innovation, and inclusivity, drawing tech enthusiasts from every corner. For me, the biggest draw is hearing diverse perspectives from all throughout the tech industry and meeting new people in a techy social setting.

For those contemplating bringing their families along, you’ll find SCaLE to be an unexpectedly family-friendly event. Imagine sharing your passion for tech while your loved ones enjoy many activities, like Saturday’s Game Night, which offers everything from board games and video games to virtual reality headsets. If you’re based in or near Los Angeles or are looking to attend a conference on the west coast, SCaLE is the place to be with its information-packed sessions, grassroots vibe, and watercooler-style discussions with subject matter experts throughout the industry.