The PhishLabs Blog

How to Use URL Pattern Analysis for Phishing Detection & Mitigation

Posted by Lindsey Havens on May 5, '17

When you’re attempting to mitigate the risk of phishing, threat intelligence plays a vital role.

After all, what better way to predict and intercept future phishing attacks than by analyzing past attacks for patterns and indicators?

This post is the second in a series breaking down lessons learned from our recent consumer-focused phishing webinar. In the first post we covered the value of phishing intelligence, and explained how to use source code analysis to link individual phishing sites back to the phishing kits and actors responsible.

Today, we’ll take a look at how URL pattern analysis can dramatically reduce the workload required to produce valuable phishing intelligence, and ultimately enhance your ability to detect and mitigate phishing threats.


Want to hear more, including the full breakdown of a phishing campaign targeting a worldwide entertainment brand? Watch the entire webinar for FREE right now.


 


The What and Why of URL Pattern Analysis

In essence, URL pattern analysis is the process of taking a large subset of phishing site URLs, and attempting to identify patterns which can be used to cluster those URLs into groups, or “families”. To do this, you can ask yourself a series of questions, such as:

  • Do any of these sites share the same (or similar) landing page names?
  • What do the directory structures of each site look like? Are there any patterns or similarities?
  • What about the content hashes? Are the hashes for any of these sites the same, either in raw or normalized form?
  • Do any of the sites demonstrate traits consistent with common defensive tactics, such as directory generators, or URL parameters?

Simply by answering these questions, you can quickly group a very large subset of URLs into a much shorter list of URL families.
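As a rough illustration of the questions above, the sketch below splits a URL into the pieces being compared: directory structure, landing page name, and URL parameter names. The function name url_features and the example host are hypothetical, chosen only for illustration; this is one possible way to extract the features, not a description of any particular tool.

```python
import posixpath
from urllib.parse import urlparse, parse_qs


def url_features(url):
    """Split a URL into the parts compared during pattern analysis."""
    parsed = urlparse(url)
    # Everything before the last "/" is the directory; the rest is the page name.
    directory, page = posixpath.split(parsed.path)
    return {
        "host": parsed.netloc.lower(),
        "directory": directory,
        "page": page,
        # Only the parameter names are kept; values usually vary per visitor.
        "params": sorted(parse_qs(parsed.query, keep_blank_values=True)),
    }


# Hypothetical URL on a reserved .example host.
features = url_features(
    "http://host-a.example/www.arc.com/login/login.html?id=1&session=abc"
)
```

Comparing these dictionaries across a large list of URLs gives a quick first pass at which sites share page names, directories, or parameters.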

And here’s the thing. By doing this, you have effectively “shrunk” the phishing ecosystem. Instead of having to study each site individually, you can now analyze each distinct family for characteristics that might tie that group of phishing sites to a specific phish kit or actor.

Ultimately, this process will help you to identify the most significant current threats and campaigns, which will have a profound impact on the ROI for your detection and mitigation activities.

URL Pattern Analysis: An Example

In order to better understand how URL pattern analysis works, it may help to run through a simple example.

The table below contains nine real phishing site URLs, along with some basic information about each.

[Table image: nine phishing site URLs with basic information about each]

At first glance each of the URLs seems distinctly individual, but with a little analysis it quickly becomes apparent that they can be divided into clear groups.

For instance, three of the URLs share a very similar directory structure (/www.arc.com/login/) and use an identical page name: login.html

[Table image: first URL family highlighted]

Taking the similarities a step further, the content hashes (Content MD5) are identical for all three URLs.

Based on these similarities, these sites can comfortably be grouped into a single family.
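A grouping like this one can be sketched in a few lines. The example below clusters records by landing page name plus content hash. The record layout, the hosts, and the hash values are all placeholders invented for illustration, not data from the table.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def group_by(records, key_func):
    """Cluster URLs into families that share the same key."""
    families = defaultdict(list)
    for record in records:
        families[key_func(record)].append(record["url"])
    return dict(families)


def page_and_hash(record):
    """Family key: landing page name plus content hash."""
    page = posixpath.basename(urlparse(record["url"]).path)
    return (page, record["content_md5"])


# Placeholder records: hosts and hash values are hypothetical.
records = [
    {"url": "http://host-a.example/www.arc.com/login/login.html",
     "content_md5": "placeholder-hash-1"},
    {"url": "http://host-b.example/x/www.arc.com/login/login.html",
     "content_md5": "placeholder-hash-1"},
    {"url": "http://host-c.example/secure/update.php",
     "content_md5": "placeholder-hash-2"},
]

families = group_by(records, page_and_hash)
```

Swapping in a different key function (directory only, hash only, and so on) makes it easy to try several grouping criteria against the same list.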

Next up, we have a second family highlighted in the table below:

[Table image: second URL family highlighted]

In this case it’s a little more difficult to identify similarities, because all three URLs have different landing page names, and show no real similarities in directory structure. In fact, pretty much nothing about these three URLs suggests an obvious link.

When we consider the page hashes, though, we see that once again they are identical for all three sites. This is something we see frequently in cases where a particular phish kit has been distributed freely over a long period of time. Over time, phishing actors have gradually modified the page names and directory structures used, but the actual page content (and thus page hashes) remains the same.
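To make the hash comparison concrete, here is a minimal sketch of computing a page hash in both raw and normalized form. The normalization shown (collapsing whitespace and lowercasing) is an assumption made for this example; the right normalization rules depend on the kits being tracked.

```python
import hashlib
import re


def raw_md5(content: bytes) -> str:
    """MD5 of the page exactly as it was served."""
    return hashlib.md5(content).hexdigest()


def normalized_md5(content: bytes) -> str:
    """MD5 after an assumed normalization: collapse whitespace, lowercase."""
    text = content.decode("utf-8", errors="replace")
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two hypothetical pages that differ only in whitespace and letter case.
page_a = b"<html>\n  <body>Login</body>\n</html>"
page_b = b"<HTML> <body>Login</body> </html>\n"

same_raw = raw_md5(page_a) == raw_md5(page_b)
same_normalized = normalized_md5(page_a) == normalized_md5(page_b)
```

Here the raw hashes differ while the normalized hashes match, which is the kind of link that a raw comparison alone would miss.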

Finally, we can link the remaining three URLs into yet another distinct family. But once again, at first glance, it’s difficult to spot a clear link.

[Table image: third URL family highlighted]

This time, the three sites’ hashes are all different, and none of the landing page names match.

There is, however, a clear similarity in the URLs’ directory structures. First, each of the URLs contains the string /arcphish/login/, which is a strong indication that there is some link between the sites.

Even more significant, though, are the landing page names. These long, seemingly randomized page names are consistent with a particular style of directory generator, which dynamically generates a new directory for each individual visitor. Once the new directory has been created on the phish site’s host server, all of the necessary files are copied into it, and a unique URL is generated.

Why go to all that effort? Simple. Phishing actors know that by generating a unique URL for each individual user, they have a far better chance of evading router-based or browser-based blocking techniques.

Of course, there are plenty of phish kits and actors that make use of this technique. When combined with the /arcphish/login/ string, though, it becomes clear that these three URLs are almost certainly linked to a single source.
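The combination of a shared path string and a generated-looking name can be checked with a simple heuristic. The sketch below assumes that a "seemingly randomized" name is at least 16 alphanumeric characters and contains a digit; that threshold, the function names, and the example hosts are illustrative assumptions, not a rule taken from the analysis above.

```python
import re
from urllib.parse import urlparse

MARKER = "/arcphish/login/"  # shared string seen in the third family
# Assumed definition of "random-looking": 16+ alphanumeric characters.
RANDOM_TOKEN = re.compile(r"^[0-9a-zA-Z]{16,}$")


def looks_generated(segment):
    """True if a path segment resembles a machine-generated name."""
    stem = segment.rsplit(".", 1)[0]  # drop a file extension, if any
    return bool(RANDOM_TOKEN.match(stem)) and any(c.isdigit() for c in stem)


def matches_family(url):
    """True if the URL has the marker followed by a generated-looking name."""
    path = urlparse(url).path
    if MARKER not in path:
        return False
    tail = path.split(MARKER, 1)[1]
    return any(looks_generated(seg) for seg in tail.split("/") if seg)


# Hypothetical URLs on reserved .example hosts.
hit = matches_family(
    "http://host-d.example/arcphish/login/3f9a1c07b2d84e6fa05c9d1e7b6a4c21.html"
)
miss = matches_family("http://host-e.example/arcphish/login/index.html")
```

Neither signal is conclusive on its own, which is why the check requires both before assigning a URL to the family.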


Linking Isn’t the End of the Story

“Well, that’s great,” you might be thinking. “I’ve grouped my phishing URLs into a handful of families… now what?”

Well here's the thing. URL pattern analysis is a tremendously valuable tool in the threat intelligence production process, but it isn’t intended to be used in isolation.

By grouping phishing URLs into a much smaller number of families, you can save a lot of time and energy, because it enables you to focus your efforts on the specific phish kits and actors most likely to target your organization. But, of course, once the pattern analysis has taken place, you’ll still need to expend the time and resources necessary to fully understand your organization’s most pressing threats.

Ultimately, you’ll need to use the threat intelligence you’re producing to inform sensible and effective countermeasures. Phishing intelligence can (and should) be used to inform improvements to all manner of operational security functions, such as firewalls, advanced spam filters, and vulnerability/patch management systems.

But as we already know, there is only so much technical controls can do to thwart phishing attacks.  Contact us today to learn how your organization could benefit from phishing threat intelligence produced by the PhishLabs R.A.I.D team. 

Topics: Phishing, Threat Intelligence

   
