Deconstructing A Phishing Attack

Everybody who has ever had an email address (so every single one of you reading this article) will have received a phishing email at one time or another, whether you realised it or not. These emails try to capture your sensitive data using any one of a number of techniques, with a view to replaying that data or selling it online.

Recently a client of ours received a phishing email purporting to be from their organisation’s “IT Help Desk”, with an ‘.htm’ file attached, requiring the user to change their Microsoft password to keep their account secure. Immediately recognising the email as a phishing attempt, the client contacted us to investigate further.

During the analysis, I realised this was a fairly advanced phishing attempt, using multiple obfuscation techniques that we don’t see very often to avoid detection by anti-virus and spam filters. To that end, I decided to share with you the process I followed to find out exactly what the document contained, what its objective was, what data it was actually capturing and where it was sending that data.

I will go through the whole process I followed, discussing the logic behind my decisions and why I reached certain conclusions. At the end of the article I will also include a modified copy of the original file so you can follow the process for yourself. (PLEASE NOTE: THIS FILE WILL BE A LIVE PHISHING FILE. WHILST EVERY EFFORT HAS BEEN MADE TO RENDER THIS SAFE, DO NOT OPEN IT IF YOU DO NOT CONFIDENTLY KNOW HOW TO SAFELY HANDLE SUSPICIOUS FILES ON YOUR SYSTEM. ANY DAMAGE TO, OR LOSS OF DATA ON, YOUR SYSTEM IS YOUR RESPONSIBILITY.)


You need to understand that web page files such as .htm, .html, .xml and .php files can be extremely dangerous. As you will see later in this article, they can contain scripts that execute on your machine as soon as the file is opened, and this can lead to a significant compromise of your machine and data. If you EVER receive a web page file attached to an email and you feel the irresistible urge to open it, never open it on your main machine. ALWAYS open it in a secure, isolated environment first so you can be satisfied it is safe.

With that said, let’s get into the process of deconstructing our example.

Step 1 – Opening the File

Obviously, with the file being suspicious, we were not about to open it on our daily machines, so we spun up an Ubuntu desktop on a Kasm workspace to work with the file in a sandboxed environment. Next we used a VPN to mask our real public IP address so that any data sent to the remote attacker could not easily be traced back to us (this turned out to be unnecessary as the remote server had been suspended; read on to find out more).

Opening the email and the attached .htm file, we were presented with an exceedingly convincing Microsoft login page with the victim’s email address already entered (the details you see in our images have been changed to protect our client, obviously!).

Setting aside the fact that this is a locally stored web document, which Microsoft would never send out and which is therefore guaranteed to be fake, this must be about the most convincing spoofed site I have ever seen. On looks alone, I have not been able to find a difference between this page and the normal Microsoft login page. Functionality-wise, though, some key parts do not work (such as the ‘Forgot Password’ button), which should ring alarm bells if you ever open something like this.

When we entered a random password into the form, the form reloaded and told us the password was wrong. When we entered the same password again, the form redirected us to the client’s website. (An interesting point of note: when I changed the email address being used in the attack to the one you see in the picture above and tried the form again, the form redirected me to ‘’, demonstrating that the form is using the email domain as the redirect location.)
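The redirect behaviour observed here can be illustrated with a short sketch. This is not the attacker's code: the function name `redirectTargetFor`, the `https://` prefix and the address `user@example.com` are all assumptions made purely for illustration.

```javascript
// Hypothetical sketch: derive a redirect target from the domain part of an
// email address, as the form appeared to do. Not taken from the phishing file.
function redirectTargetFor(email) {
  const at = email.lastIndexOf("@");
  if (at === -1 || at === email.length - 1) {
    return null; // no usable domain part
  }
  const domain = email.slice(at + 1).toLowerCase();
  return "https://" + domain; // the scheme is an assumption
}

console.log(redirectTargetFor("user@example.com")); // https://example.com
```

Sending the victim to their own organisation's website after the second attempt makes the whole exchange feel like a normal, slightly glitchy login.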

Now this is great: it is clear the form is attempting to harvest our Microsoft credentials in order to use them for nefarious purposes in the future. But is the file doing anything else while it is at it?

To answer that, we need to take a look at the code.

Step 2 – Decoding the Code

Next I opened the .htm file in VS Code to take a look at the inner workings, and this is where it starts getting interesting. The file contains nothing more than an HTML script.

This threw me at first. With hindsight it probably shouldn’t have, but it did. The script starts with a variable containing the victim’s email address, then a variable with a random string of data, another variable that is manipulated based on some calculations, another variable containing a HUGE, seemingly encrypted payload, and then a final function at the end to rewrite the document based on the output of the preceding script.

Now, on its own, this doesn’t tell us anything. That encoded ‘Q’ constant on its own certainly wouldn’t render a webpage, so there must be some additional work hidden (or obfuscated) somewhere. This must mean these script elements are all linked together.

To understand what this file is actually capable of doing, I needed to see if I could decode that ‘Q’ constant. Being the biggest element of the page, this must hold the core data that is controlling everything.

I already knew that character ‘escaping’ is a common way of hiding otherwise perfectly readable information in plain sight by converting each character of the string to its corresponding hex value.

The last line in the document confirms that this is the case here, so I took the ‘Q’ string and ran it through an ‘unescaper’, but the output was just as unreadable as the input.

Back to the drawing board. On examining the ‘nwe’ section of the script a little closer, a few things became more obvious.

Whilst looking at this code, it became clear that the ‘returrn’ constant was pulling data from the ‘x’ constant. So the zero in ‘[x[0]’ is equal to the corresponding position (in this case ‘e’) in the ‘x’ constant. Applying this logic to the whole string, the ‘returrn’ constant becomes ‘replaceAll(%3%,3)’.

replaceAll() is a JavaScript string method that replaces all instances of one thing with another. In this case, the replaceAll() call looks for every instance of ‘%3%’ in the encoded payload and replaces it with ‘3’. This is a very clever way of hiding the obfuscation from virus detection whilst making it very simple for a browser to decode.
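The index-lookup trick can be shown with a minimal sketch. The alphabet string and index list below are made up for illustration (the real ‘x’ value in the file was different); the point is only that the method name never appears as a readable literal.

```javascript
// Hypothetical alphabet string; the real 'x' value in the file was different.
const x = "erplacA";

// Each index picks one character out of x, so the word "replaceAll"
// is never written out in the file.
const indexes = [1, 0, 2, 3, 4, 5, 0, 6, 3, 3];
const methodName = indexes.map((i) => x[i]).join("");

console.log(methodName); // replaceAll

// The rebuilt name can then be used to call the string method dynamically.
const cleaned = "%%3%C"[methodName]("%3%", "3");
console.log(cleaned); // %3C
```

A scanner searching the raw file for the text `replaceAll` would find nothing, yet a browser rebuilds and runs the call without any difficulty.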

So I replaced all the instances of ‘%3%’ with ‘3’ and then put the resulting ‘Q’ string back through the ‘unescaper’ and, lo and behold, we now have the full HTML code in plain text. I could now examine in clear detail exactly what this file could and could not do.
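The two-stage decode can be reproduced on a toy payload. This is a sketch under assumptions: the helper names are hypothetical, the sample string is made up, and it assumes the payload uses the percent-escape format that JavaScript's legacy `unescape()` reads. The encoder helpers exist only to build a test string and handle plain ASCII only.

```javascript
// Analyst-side sketch; all function names here are hypothetical.

// Percent-escape every character (toy stand-in for the first layer, ASCII only).
function hexEscape(text) {
  return Array.from(text, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// Second layer: hide every '3' behind the '%3%' token.
function addTokenLayer(escaped) {
  return escaped.replaceAll("3", "%3%");
}

// Reverse both layers in the order the browser would:
// first swap the token back, then unescape the hex values.
function decodePayload(payload) {
  const escaped = payload.replaceAll("%3%", "3");
  return unescape(escaped);
}

const sample = addTokenLayer(hexEscape("<h1>Sign in</h1>"));

// Unescaping alone leaves the text unreadable, as described above.
console.log(unescape(sample) === "<h1>Sign in</h1>"); // false

// Undoing the token layer first recovers the original markup.
console.log(decodePayload(sample)); // <h1>Sign in</h1>
```

Both steps are pure string operations, so the payload can be decoded without ever rendering it in a browser.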

Step 3 – Deep Diving the Code

The code shows us many things. It contains all the CSS styling required to allow the document to be entirely self-sufficient without an internet connection. It also contains the body of the web page with a web form (that is how they obtain the data) and a number of JavaScript scripts that control the page.

Of note to me was one particular script:

This has a number of different elements that tell us how the attacker obtains the data they are looking for. Examining this script against the body of the HTML code, I was able to identify that the attacker is collecting the victim’s email address and password from the web form, converting them into a JSON object, then using an HTTP POST request to send this to a URL.

To make this a little clearer, here is the same script, but annotated to say what each key element does:

The script also provides information that unlocks other elements of the document that are useful to us.

On line 10460, within the .ajax section of the script, there is an ‘atob(file)’ function. ‘atob()’ is used in JavaScript to decode strings that have been encoded using Base64. This tells us that the ‘file’ variable must be a Base64 encoded string.

Following this thread back up the document, we see this line:

This means that the ‘file’ variable used in the URL section is equal to the ‘Qiles’ variable used at the start of the document. So this must mean the following line is a Base64 encoded string:

Suddenly we now know how to decode this string and, not surprisingly, putting the string through a Base64 decoder returns the following: “”

This is the URL that the captured data is being sent to. Huzzah! We can defend against this!
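The Base64 step is easy to reproduce safely, because decoding is a pure string operation and nothing is contacted. In this sketch the URL is a placeholder on the reserved `.invalid` domain, not the address found in the file, and the variable names are assumptions.

```javascript
// Placeholder only: NOT the URL found in the phishing file.
const placeholderUrl = "https://collector.example.invalid/post.php";

// Stand-in for the Base64 string assigned near the top of the document.
const encoded = btoa(placeholderUrl);

// atob() reverses the Base64 encoding, as the script does before sending.
const decoded = atob(encoded);

// Pull out the hostname, which is the part that matters for blocking.
const host = new URL(decoded).hostname;

console.log(decoded); // https://collector.example.invalid/post.php
console.log(host);    // collector.example.invalid
```

Extracting the hostname separately is a convenience for the defensive steps that follow, since DNS and firewall rules work on hosts rather than full URLs.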

Examining this URL, we see that it leads back to a hosting account held on a Brazilian organisation’s servers and that the hosting provider has indeed suspended the account, presumably because of the volume and type of traffic connecting to the suspicious domain.

What we can see is that the script takes the harvested data, converts it to JSON objects, adds these objects to the .php URL as parameters and submits them to a .php page, which likely uses the POST request to commit the data to a database.
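To make that data flow concrete, here is a defanged sketch of the shape of the request. It is not the script from the file: the function name, field names and endpoint are hypothetical, and the code only builds a description of the request without sending anything.

```javascript
// Defanged sketch: builds a description of the request and sends nothing.
// Field names and the endpoint are hypothetical.
function describeExfilRequest(formFields, endpoint) {
  return {
    method: "POST",
    url: endpoint,
    body: JSON.stringify({
      email: formFields.email,
      password: formFields.password,
    }),
  };
}

const request = describeExfilRequest(
  { email: "user@example.com", password: "not-a-real-password" },
  "https://collector.example.invalid/post.php"
);

console.log(request.method); // POST
console.log(request.body);
// {"email":"user@example.com","password":"not-a-real-password"}
```

Knowing the rough shape of the request helps when writing proxy or web filter rules, for example flagging POST requests to unfamiliar .php endpoints whose body contains a password field.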

Wireshark captures on the workspace also show that, when we opened the document and provided the relevant information, the workspace did indeed make a TLS connection to the IP address matching the domain in the file.


So now we know what the file is capable of: in this case it is purely using a web form to capture credentials, and there is no other nefarious scripting present in the document. We know what data is being collected (the victim’s Microsoft credentials). We also know where the data is being sent (the decoded URL) and how it is being transmitted.

We can now use all of this information to harden the client’s defences, blocking any communication to that particular domain, blacklisting the IP addresses and creating more detailed email filtering rules to identify this kind of threat.
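As one illustration of what a more detailed filtering rule might look for, the sketch below flags HTML attachments that show the traits seen in this file. The function name, the patterns and the 20% threshold are assumptions, not a tested detection rule, and `document.write()` is included as a common way of rewriting a page rather than something confirmed from the file.

```javascript
// Hypothetical heuristics for flagging an HTML attachment. The patterns and
// the threshold are assumptions and would need tuning against real mail.
function suspiciousHtmlIndicators(html) {
  const indicators = [];

  if (/\bunescape\s*\(/.test(html)) indicators.push("unescape() call");
  if (/\batob\s*\(/.test(html)) indicators.push("atob() call");
  if (/document\s*\.\s*write\s*\(/.test(html)) indicators.push("document.write() call");

  // Hex-escaped payloads are roughly one-third '%' characters, so a high
  // overall density of '%' suggests an encoded body.
  const percentCount = (html.match(/%/g) || []).length;
  if (html.length > 0 && percentCount / html.length > 0.2) {
    indicators.push("high density of % characters");
  }

  return indicators;
}

const benign = "<p>Hello</p>";
const obfuscated =
  '<script>var Q="' + "%3C%68".repeat(100) + '";document.write(unescape(Q));</script>';

console.log(suspiciousHtmlIndicators(benign).length);     // 0
console.log(suspiciousHtmlIndicators(obfuscated).length); // 3
```

No single indicator is proof of a phishing file, so a rule like this is better suited to quarantining attachments for review than to blocking them outright.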

The layout of the document also tells us something about the sophistication of the attack. The core code of the document (the double-obfuscated ‘Q’ section) is universal and the whole thing revolves around the two variables set out right at the start of the document. This not only allows the document to circumvent most file scanners, but also makes it very simple to automate the production of these documents to target victims en masse based on a list of harvested email addresses.

To automate the document, the attacker would simply need to amend the template to include the Base64 encoded URL they are using to receive the data, and then execute a script to create a unique file for each email address in their list. They may even use a phishing package to achieve this automatically and to control the receipt of harvested data.

To walk through this process yourself, please feel free to download the files and take a look. They are available at the link below:

If you would like to know more about how Pride Security can help your business defend against attacks like this, please visit our website or give us a call on +44(0)1332 949706.
