An AI Cloned Voice Led to a $35 Million Bank Robbery – Here’s How to Not Get Fooled by Deepfakes

In a scenario straight out of a bank heist movie, a deepfake voice convinced a bank employee to transfer a large sum of money to fraudulent accounts.

Deepfakes are one of the newest technologies to enter the cyber-attack scene. Although deepfakes were first created for entertainment, it didn’t take long for criminals to seize on the technology to achieve their own goals.

Let’s uncover all the details of this audio scam and how you can spot a deepfake.

What Happened in the $35 Million Voice Clone Scam

‘Seeing is believing’ could be the ideal motto for a deepfake video. Replace ‘seeing’ with ‘hearing’, and it applies just as well to an audio deepfake, especially a well-engineered one. That is exactly how this innovative bank robbery played out.

Criminals used a deepfake audio scam and managed to steal $35 million from a bank in the United Arab Emirates in January 2020.

The incident went unreported and only came to light recently, when Forbes magazine found court documents in which the UAE asked American investigators for help tracing $400,000 of the stolen funds. Further investigation showed that the attackers had planned their scheme in detail, sending several convincing emails and finally using a deepfake voice to persuade a bank employee to make the transfer.

The scam targeted a bank manager in the United Arab Emirates, who received a phone call from someone he knew: a manager at a company he had spoken with before. The caller said the company needed the bank’s help with transfers totaling $35 million and that his lawyer would handle all the procedures. Because the bank manager then received emails from the lawyer, he didn’t suspect anything fishy and made the transfers.

Only when it was too late did he discover that he had fallen victim to a ‘deep voice’ swindle.

This is the second known voice-based deepfake incident involving financial fraud. The first occurred in March 2019, when deepfake audio imitating the voice of the chief executive of a German parent company convinced the head of its UK energy subsidiary to transfer $243,000 into what he believed was a supplier’s account.

Experts believe this could be the beginning of a new and growing type of fraud. Their reasoning is that faking a voice or an audio recording is easier than making a deepfake video. After all, anyone can collect voice samples, either by recording someone in person or with a device such as a phone.

What’s a Deepfake

The term ‘deepfake’ combines ‘deep learning’, the form of artificial intelligence used to create fake images, sounds, and videos, with ‘fake’. Also called “synthetic media”, a deepfake is meant to pass for real footage recorded with traditional tools such as a video camera.

In reality, those who create a deepfake use sophisticated software. Most often, deepfakes swap faces in videos, replacing one person’s face with another’s so convincingly that the human eye can’t tell the difference.

Using deep learning algorithms, deepfake software takes an original video as the baseline and then feeds in a collection of other videos of the ‘fake’ character, the person who impersonates the original character.

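The process often runs thousands of face shots of the two people through an algorithm called an encoder, which looks for similarities between the two faces, reduces them to their shared features, and compresses the images. A second algorithm, a decoder, then learns to restore the faces from the compressed images. Because each person gets their own decoder, feeding one person’s compressed face into the other person’s decoder produces the swap: the second person’s face wearing the first person’s expression and pose.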

In short, a deepfake replaces the video camera with computer code.
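For readers curious about the mechanics, the compress-and-restore idea described above can be sketched in a few lines of Python. Treat this as a toy, not as how real deepfake software is built. Real tools train deep neural networks on thousands of face images, while this sketch assumes a tiny linear model, made-up six-value ‘faces’, and hypothetical names such as ToyFaceSwap. What it does show is the common face-swap wiring: one shared encoder that compresses every face, one decoder per person that learns to rebuild that person’s face, and the swap itself, which runs person A’s compressed face through person B’s decoder.

```python
import random

# Toy sketch. Assumptions: linear layers stand in for deep networks,
# and the "faces" are made-up lists of brightness values.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class ToyFaceSwap:
    """Hypothetical stand-in for a face-swap autoencoder:
    one shared encoder, one decoder per person."""

    def __init__(self, dim, code_dim, people, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(code_dim, dim, rng)  # shared by everyone
        self.decoders = {p: random_matrix(dim, code_dim, rng) for p in people}

    def reconstruct(self, face, person):
        """Compress a face, then rebuild it with `person`'s decoder."""
        return matvec(self.decoders[person], matvec(self.encoder, face))

    def loss(self, face, person):
        """Half the squared error between the face and its reconstruction."""
        out = self.reconstruct(face, person)
        return sum((o - x) ** 2 for o, x in zip(out, face)) / 2

    def train_step(self, face, person, lr=0.02):
        """One gradient-descent step on the reconstruction error."""
        dec = self.decoders[person]
        code = matvec(self.encoder, face)
        out = matvec(dec, code)
        err = [o - x for o, x in zip(out, face)]
        # Gradient with respect to the code, taken before the decoder changes.
        grad_code = [sum(dec[i][j] * err[i] for i in range(len(err)))
                     for j in range(len(code))]
        for i, e in enumerate(err):          # update this person's decoder
            for j, c in enumerate(code):
                dec[i][j] -= lr * e * c
        for j, g in enumerate(grad_code):    # update the shared encoder
            for k, x in enumerate(face):
                self.encoder[j][k] -= lr * g * x

    def swap(self, face, target_person):
        """The face swap: encode one person's face, decode it as another's."""
        return self.reconstruct(face, target_person)

# Hypothetical training data: noisy copies of two base "faces".
rng = random.Random(1)
base = {"A": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3], "B": [0.2, 0.8, 0.3, 0.9, 0.1, 0.6]}
faces = [(p, [v + rng.uniform(-0.05, 0.05) for v in base[p]])
         for p in base for _ in range(10)]

model = ToyFaceSwap(dim=6, code_dim=2, people=list(base))
before = sum(model.loss(f, p) for p, f in faces)
for _ in range(300):
    for p, f in faces:
        model.train_step(f, p)
after = sum(model.loss(f, p) for p, f in faces)

fake_b = model.swap(faces[0][1], "B")  # A's face run through B's decoder
print(round(before, 3), round(after, 3))
```

In real face-swap tools the encoder and decoders are much larger neural networks working on aligned face images, but the division of labor is roughly the same: the shared encoder captures pose and expression, and each person’s decoder supplies that person’s appearance.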


Tom Cruise is just one of several famous people who have been targeted by viral deepfakes (source: Twitter)

The Problem with Deepfakes

Since deepfakes can make almost anything look incredibly real, cyber-attackers, and criminals in general, could build them into all kinds of schemes. For instance, we might soon face phishing attacks disguised as video calls (which are in fact deepfake videos), or kidnappers who use deepfake videos of family members when demanding a ransom.

People who create deepfakes can even go as far as making it seem that someone behaved indecently, or accusing them of something they never did. Imagine someone making a false accusation about you and showing a deepfake video to support the claim. It can be quite tricky to prove the video is fake, and even if you can, your image or reputation might already have suffered.

Let’s not forget deepfakes started by putting celebrity faces into pornographic videos.

You might also remember the rather funny ‘I’m not a cat’ moment, when a Texas lawyer couldn’t figure out how to turn off a kitten filter during a court hearing on Zoom.

How to Recognize a Deepfake Video

As with anything fake, the devil’s in the details. And because deepfakes are becoming more common, spotting them is a skill worth adding to your ‘How to be cybersmart’ list.

Here’s what you need to examine closely:

Unnatural animated faces

Most deepfakes can’t yet reproduce a human face, its movements, and its gestures in a way that looks 100% real. For instance, check whether and how often the person blinks, and whether it looks natural. Also look at details such as skin tone and how lighting and shadows fall.

Pixel patterns

Designers may spot this more easily, but anyone who looks carefully can identify elements that don’t quite match. For example, the face may look blurrier than the background, the edges of the face may not be sharp, or parts of the skin or hair may not be exactly the same color.

Lack of facial expressions or emotions

Apart from physical attributes, pay attention to facial expressions and check whether they fit what the person is saying. Someone saying something funny would normally at least smile. A complete lack of facial expression should raise a big question mark.

If the character is a famous person, look for other videos, so you can compare the facial expressions and get an idea if they match.

Audio doesn’t match the video

Sometimes, deepfake creators don’t go all the way and focus on either the video or the audio. The video may be very well crafted while the audio was poorly manipulated, so the voice doesn’t match the person on screen.

If you’re dealing with audio only, whether a recording or a live conversation, double-check that the person you spoke with was really who they claimed to be. Call them back on a number you already know, or send them a message to confirm.

Additional tip: Try deepfake detection software. These tools analyze photos and videos and display a confidence score indicating how likely the content is to be fake. Microsoft, as well as companies like Adobe and Deeptrace, has already built detection tools of this kind.

Deepfakes in the Future

Although anyone can currently access software for creating deepfakes, making a convincing one isn’t exactly easy. That may well change in the future, though.

Researchers predict that within 5 to 10 years, making deepfakes will become a DIY activity: just as anyone can use Snapchat today, anyone will be able to use a deepfake smartphone app. It sounds fun, but it’s also concerning. Having to tell the real from the fake every second could turn out to be extremely exhausting, not to mention the ethical issues it could raise.

The only silver lining we can hope for is that a law specifically regulating deepfakes, and making sure they can’t easily end up in the wrong hands, will soon be enacted.


Have you ever been fooled by a deepfake video? Which one was it?

Let me know in the comments below.
