Will AI Replace Mid-Level Engineers by 2025? Not So Fast, Mark Zuckerberg

It’s hard to ignore the growing buzz around artificial intelligence (AI) and its potential impact on various industries. Recently, Mark Zuckerberg predicted on Joe Rogan’s podcast that AI could replace mid-level engineers by 2025. While it’s a compelling narrative, it misses the mark for several reasons. Let’s unpack why this prediction is more hype than reality.

Companies Aren’t Fully Using Their Own AI Tools

Take Salesforce as a prime example. The company has heavily promoted its AI-powered sales agents, touting them as the future of sales. Yet, if you look at Salesforce’s own career page, approximately 75% of their job postings (775 out of 1035 as of Jan 16, 2025) are for sales roles. If their AI tools were truly ready to replace human salespeople, why wouldn’t Salesforce “dogfood” their own product, slash sales jobs, and reap massive savings?

Dogfooding Gone Wrong

This disconnect isn’t unique to Salesforce. Many companies pushing AI solutions still rely heavily on human expertise to deliver the results they promise. It’s one thing to sell the dream of AI-driven automation, but it’s another to trust your core operations to it. If organizations like Salesforce, which stand to gain the most from successful AI adoption, aren’t betting the farm on their own tools, why should we believe AI will displace engineers en masse at other companies?

AI-Generated Code Still Needs Maintenance

Even if AI can write functional code, that doesn’t eliminate the need for mid-level engineers. All code, no matter how well-written, eventually requires updates. Security vulnerabilities need patching, APIs evolve, dependencies get deprecated, and business requirements change. Who’s going to handle these inevitable maintenance tasks? AI might be able to assist, but it can’t completely replace the nuanced understanding of a system that a human engineer provides.

Consider the metaphor of AI as a power tool for software development. It can make some tasks faster and easier, but someone still needs to wield the tool, know how to use it safely, and fix the mess when something goes wrong. Far from making engineers obsolete, AI tools are likely to amplify their productivity—and perhaps even increase demand for engineers who can effectively integrate these tools into their workflows.

If companies like Meta actually moved forward with replacing most of their mid-level engineers, they’d quickly find themselves in a classic “foot-gun” scenario. Without a robust team of engineers to maintain and adapt AI-generated code, systems would break down, product development would stall, and customer trust would erode. It’s a short-sighted strategy that prioritizes immediate cost savings over long-term resilience.

Selling the Promise of AI Is in Their Interest

It’s no secret that tech giants have a vested interest in promoting AI as the next big thing. AI and machine learning are lucrative business lines, and hyping up their potential is a great way to attract investment, sell products, and capture headlines. By framing AI as a technology capable of replacing entire swaths of the workforce, these companies generate excitement and urgency around adopting their solutions.

Heck, I am an AI/ML Engineer… I am in the space promoting the same thing, but my view is that AI/ML systems are HIGHLY strategic tools to be used by people. Replacing mid-level engineers isn’t just a technical challenge; it’s a strategic one. Engineering teams don’t just write code—they collaborate, solve complex problems, and adapt systems to changing business needs. These human-centric tasks are not easily outsourced to AI, no matter how advanced it becomes.

At the end of the day, humans consume the products that these companies produce. Until that changes, people will decide what to buy, and companies will need to persuade those people to choose their products. AI/ML systems don’t understand why things go viral, why we collectively like what we like, or why things like Hawk Tuah or Luigi Mangione captured our collective attention. Would AI have predicted that a good number of people would rally around someone killing another person? I think not.

The Full Stop Thought

AI is undoubtedly transforming how we work, and some jobs will inevitably be impacted. However, the idea that AI will replace most mid-level engineers at companies like Meta by 2025 is far-fetched. The reality is that AI tools are most effective as complements to human expertise, not replacements for it. Companies still need skilled engineers to maintain systems, adapt to changes, and ensure the quality of their products—and that’s not going to change anytime soon.

Here is the final thought… All AI systems today start with a user prompt. The keyword here is user. Humans drive the direction of the work an AI system does because these systems aren’t aware of their environment. They don’t know what’s happening outside the digital world and the little box they live in. Until an AI system’s interface becomes a simple power switch that requires no user prompt, these systems will need humans to direct what they produce. Period.

Voice Cloning: The Text-to-Speech Feature You Never Knew You Needed And Why It Matters

Over the holiday break, I started experimenting with cloning my voice for reasons I will get to later in this blog post. As I walked down the list of Voice Cloning providers out there and began to evaluate them using my cost-to-benefit ratio scale, a set of requirements and must-have capabilities emerged.

In this blog post, we will cover what those required features are, why they are essential for my scenario, why I believe those reasons carry over to the general use case, and, ultimately, what it means for text-to-speech providers moving forward.

First Some Background

I have been in the Natural Language Processing (NLP) space for over 3 years. In that time, as most people do, I started looking to obtain accurate transcription from speech and then moved into trying to digest conversation to create “computer-generated” interactions. Large Language Models (LLMs) dramatically accelerated the accessibility and, quite frankly, the ability to do so in a meaningful way without a lot of effort.

After comprehension, most individuals move on to increasing the level of interaction by interfacing with these systems using humans’ other amazing tool: hearing. As humans, we don’t want to talk into a device and then have to read its output. I mean, heck, most people find subtitled movies beyond annoying if those subtitles drag on for anything more than a few minutes. Here, we start to see the need for text-to-speech, but what kind of voice should we use?

How I Tried Automating Myself

That voice depends on the use case. More to the point, that voice depends on how familiar you are with the “thing” you are interacting with. I use “thing” as this catch-all, but in reality, it’s some device you are conversing with. Moreover, depending on what that device is and what our connection with said device is, the voice used makes all the difference in the world in the experience of that interaction.

Let’s consider these scenarios:

Siri, Alexa, or Google

These devices are simple. You say a command, and Siri, Alexa, or Google (hopefully) give you a meaningful answer. You don’t place much weight on what kind of voice it replies with. Sure, it’s cute if it replies in an accent or if it can reply in Snoop Dogg’s voice, but in the end, it doesn’t really matter all that much for that interaction.

Call Center, Tech Support, etc.

The next wave of voice interactions is replacing humans with voice automation systems. This is where most companies are today in this evolution. There are a ton of companies trying to do this for a variety of reasons, usually led by decreasing labor costs.

The most common use case is replacing customer support staff with these automated systems. Today, this usually entails using Speech-to-Text to transcribe what the caller is saying, passing that text off to a Large Language Model (LLM) or, more correctly, a Retrieval-Augmented Generation (RAG) system for better context, and then running the output through Text-to-Speech to generate a human-like voice to feed back to the listener on the other end of the phone.
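
To make that flow concrete, here is a minimal sketch of the loop in Python. Every helper below (transcribe_audio, retrieve_context, generate_reply, synthesize_speech) is a hypothetical placeholder for whatever Speech-to-Text, RAG, LLM, and Text-to-Speech providers you actually use; the point is the shape of the pipeline, not any specific vendor’s API.

```python
# Hypothetical voice-automation turn: Speech-to-Text -> RAG/LLM -> Text-to-Speech.
# Each helper is a placeholder; swap in your provider's SDK calls.

def transcribe_audio(audio_bytes: bytes) -> str:
    """Placeholder Speech-to-Text call: caller audio in, transcript out."""
    raise NotImplementedError("Call your Speech-to-Text provider here")

def retrieve_context(transcript: str) -> list[str]:
    """Placeholder retrieval step: pull relevant documents for RAG."""
    raise NotImplementedError("Query your knowledge base / vector store here")

def generate_reply(transcript: str, context: list[str]) -> str:
    """Placeholder LLM call: transcript plus retrieved context in, answer out."""
    raise NotImplementedError("Call your LLM with the transcript and context")

def synthesize_speech(text: str) -> bytes:
    """Placeholder Text-to-Speech call: text in, human-like audio out."""
    raise NotImplementedError("Call your Text-to-Speech provider here")

def handle_caller_turn(audio_bytes: bytes) -> bytes:
    """One turn of the conversation: what the caller said -> what they hear back."""
    transcript = transcribe_audio(audio_bytes)
    context = retrieve_context(transcript)       # the "better context" RAG step
    reply_text = generate_reply(transcript, context)
    return synthesize_speech(reply_text)         # human-like voice back to the caller
```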

That human-like voice is essential for many reasons. It turns out that when people on the phone hear a computer voice made by Felix the Cat from the 60s, they are more likely to hang up the phone because no one wants to deal with a computer unless it is important enough to stay on the line. That last statement is very true. If I really, really need something, then I am going to endure this computer-based interaction by not hanging up.

It all comes down to companies (and the people in the next section) wanting to keep engagement (i.e., not hanging up the phone) as high as possible because they get something out of that interaction.

Content Creator to Mimic Myself

For this last use case, not only do we want the voice to be indistinguishable from a human, but we also want that voice to sound EXACTLY like me. This is the use case I was exploring. I want that voice to sound personalized because that voice will be associated with my brand and, more importantly, with a level of personalization and relatability in my content. That comes from creating content with a voice that is unmistakably mine.

Why was I interested in this use case? In this age of social media, there has been a huge emphasis on creating more meaningful content. For those who do this for a living, creating content in the form of audio (i.e., Podcasts, etc.) and especially recorded video (i.e., Vlogs, TikToks, etc.) is extremely time-consuming. So, wouldn’t it be great if there was a way to offload some lower-value voice work to voice cloning? That’s the problem I was trying to solve.

If you are looking to tackle this use case, then, based on the Call Center use case, having your real voice intermixed with an AI clone of your voice that is just slightly off will likely be off-putting. In the worst case, your listeners might just “hang up the phone” on your content. This is why the quality, intonation, pauses, and so on in voice cloning will make or break the platforms that offer it. If it doesn’t sound like you, you risk alienating your audience.

Why Voice Cloning Is Important

For Text-to-Speech platforms out there, voice cloning will be a huge deal, but the mainstream is not there yet… This is not because the technology doesn’t exist (it does) but because corporations are still the primary users of Text-to-Speech by volume (for now). They are busy trying to automate jobs away and replace them with AI systems.

In my opinion, there is already a bunch of social media content being generated with human-like voices; case in point, the annoying voice in the video below. Just spend 5 minutes on TikTok. I think once people start to realize the value of automating their own personal brand/content on social media and it’s accessible enough for creators, you are going to see an explosion of growth on the platforms that provide voice cloning.

Platforms that don’t offer voice cloning will need to add it at some point or die. Why? Why pay for two subscriptions: one for a platform that provides human-like voices for the Call Center use case, and another for a platform that provides pre-canned human-like voices but also lets you clone your own voice for social media (a clone that could also be used to create your own set of pre-canned voices)? The answer is: you don’t.

Where To Go From Here

In this quest to clone my voice, I tried a bunch of platforms out there, and I found the one that works best for me, taking things like price and intonation into account. I may write a follow-up blog post about the journey and process I used to select and compare all the services. If there is interest, a behind-the-scenes look at what I will use voice cloning for might also interest people reading this post.

Until then, I hope you found this analysis interesting and the breakdown for the various use cases enlightening. Until the next time… happy hacking! If you like what you read, check out my other stuff at: https://linktr.ee/davidvonthenen.

2024 RTC Conference Recap: Shining a Spotlight on AI in Healthcare and Voice AI Assistants

The 2024 Real Time Communication Conference at Illinois Tech was an electrifying event, showcasing emerging technologies across Voice, WebRTC, IoT/Edge, and groundbreaking research. But if you ask me, the real magic happens in the conversations between sessions. These impromptu chats with attendees always spark new ideas, collaborations, and insights that you won’t find on any slide deck. It’s a space where cutting-edge tech meets human curiosity and creativity, making for an unforgettable experience.

I had the pleasure of presenting two sessions this year, both deeply focused on AI’s transformative potential. From training machine learning models for medical analysis to mining digital conversations for actionable insights, here’s a recap of the key takeaways from both sessions—and resources to keep the learning going.

Session 1: Machine Learning for Good – Training Models for Medical Analysis

In this keynote, co-presented with Nikki-Rae Alkema, we explored how machine learning is reshaping healthcare, especially in diagnostics. We focused on multi-modal, multi-model approaches: the fusion of audio, video, and sensor inputs to catch conditions like Parkinson’s Disease early. By analyzing subtle cues across different data types, we’re not just looking at isolated symptoms but building a more comprehensive picture of patient health.
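
As a rough illustration of what “fusion” means in practice (and not the actual models from the talk), here is a toy sketch that concatenates per-patient audio, video, and sensor feature vectors and trains a simple classifier on the combined representation. It uses scikit-learn and purely synthetic data; the feature names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for per-patient feature vectors extracted from each modality.
# In a real system these might come from audio (e.g., vocal tremor features),
# video (e.g., gait or facial micro-movement features), and wearable sensors.
rng = np.random.default_rng(42)
n_patients = 200
audio_features = rng.normal(size=(n_patients, 16))
video_features = rng.normal(size=(n_patients, 32))
sensor_features = rng.normal(size=(n_patients, 8))
labels = rng.integers(0, 2, size=n_patients)  # 1 = condition present, 0 = not

# Early fusion: concatenate all modalities into one feature vector per patient.
fused = np.concatenate([audio_features, video_features, sensor_features], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.25, random_state=42
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(f"Held-out accuracy on synthetic data: {clf.score(X_test, y_test):.2f}")
```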

This session emphasized the human aspect of AI. It’s not about replacing healthcare professionals but augmenting their abilities. Every algorithm, every data point analyzed, translates to real human stories and health outcomes. The goal? To move healthcare from a reactive to a proactive stance, where early detection becomes the norm rather than the exception.

This work underscores the potential for machine learning to empower medical professionals with insights that weren’t possible before, bringing us closer to a future where AI truly enhances human care.

Session 2: Mining Conversations – Building NLP Models to Decode the Digital Chatter

In our increasingly digital world, conversation data is a treasure trove of insights. This session dove into the intricacies of Natural Language Processing (NLP), specifically how to build multiple NLP models to work in concert. Whether it’s Slack messages, Zoom calls, or social media chatter, there’s a wealth of unstructured data waiting to be harnessed.

We walked through collecting raw data from WebRTC applications, then cleaning, tokenizing, and preparing it for machine learning pipelines. This process enables us to extract meaningful insights, classify content, and recognize entities—turning raw digital chatter into a strategic asset.
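
For a sense of what that cleaning-and-tokenizing step might look like, here is a minimal sketch using only the Python standard library. The regexes and token rules are illustrative assumptions, not the exact pipeline from the session.

```python
import re

def clean_message(raw: str) -> str:
    """Strip the noise typical of chat transcripts before tokenization."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"<@[\w.-]+>", " ", text)     # drop Slack-style user mentions
    text = re.sub(r"[^a-z0-9'\s]", " ", text)   # drop punctuation and emoji
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

def tokenize(text: str) -> list[str]:
    """Simple whitespace tokenizer; real pipelines may use subword tokenizers."""
    return text.split()

# Example: a raw Slack-style message turned into model-ready tokens.
raw = "Hey <@U123>, the build is failing again :( see https://ci.example.com/run/42"
tokens = tokenize(clean_message(raw))
print(tokens)  # ['hey', 'the', 'build', 'is', 'failing', 'again', 'see']
```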

Whether you’re analyzing customer service interactions or mining social media for trends, these NLP techniques open doors to more profound, data-driven insights, directly applicable to real-world use cases.

The Magic of In-Between Sessions: Final Thoughts

What makes the RTC Conference truly special is the community. Between presentations, I had fascinating discussions with industry leaders, researchers, and fellow AI enthusiasts. These conversations often linger on the edges of what we’re presenting, pushing ideas further and sparking fresh perspectives. From discussing the ethics of AI in diagnostics to exploring how NLP can evolve to understand more nuanced human emotions, these interactions made for a vibrant and thought-provoking experience.

If you missed the event, the session recordings are available through the official conference site now! Take a look at the slides, code and more! Here’s to embracing AI’s potential together—until next time!