Domains Are Scary

Much of the Internet depends on domain names. The domain is the one thing we rely on to verify the authenticity of a website: HTTPS/TLS only helps if users check the domain of the link they click, and I really doubt most users do.

This is why Chromium has written guidelines for presenting URLs correctly, to make it harder for attackers to show users a trustworthy-looking URL. There is also an extended document in the Chromium source code. However, Chrome hasn't implemented them yet. Disappointing...

Source: Google Chromium Project. The guidelines recommend that browsers emphasise the registrable domain, which they call "eTLD+1" (the effective top-level domain plus one label).
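To make the idea concrete, here is a minimal sketch of how the registrable domain (the effective top-level domain plus one label) can be picked out of a long hostname. It is not Chromium's implementation. The function name `registrable_domain` and the tiny `PUBLIC_SUFFIXES` set are assumptions for illustration; a real tool would load the full Public Suffix List instead.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List (assumption for this sketch).
PUBLIC_SUFFIXES = {"com", "net", "org", "uk", "co.uk"}


def registrable_domain(url):
    """Return the public suffix plus one label, or None if it cannot be determined."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so "co.uk" is matched before "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None


# The part a user should actually check is the end of the hostname.
print(registrable_domain("https://example.com.secure-login.example.net/signin"))
print(registrable_domain("https://www.example.co.uk/"))
```

In the first URL the hostname starts with "example.com", but the domain that actually controls the page is "example.net".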

Long URLs that confuse and mislead users are one thing, but someone paying even slight attention to the URL may notice them. An attacker can go further by registering domains that drop a dot or replace a dot with a dash. Most organisations will not spend the effort to buy up all these variations of their domain, so it's possible to register one of them and impersonate the organisation. Here are some examples:
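The dot-removal and dot-to-dash variations are easy to enumerate. The sketch below uses a hypothetical helper, `lookalike_variants`, and a placeholder hostname; it assumes a single-label TLD and leaves the final dot alone so the TLD stays intact.

```python
def lookalike_variants(hostname):
    """Return hostnames made by removing one dot or replacing it with a dash."""
    last_dot = hostname.rfind(".")
    variants = set()
    for i, ch in enumerate(hostname):
        # Only touch dots inside the name, never the one before the TLD.
        if ch != "." or i == last_dot:
            continue
        variants.add(hostname[:i] + hostname[i + 1:])        # dot removed
        variants.add(hostname[:i] + "-" + hostname[i + 1:])  # dot replaced by dash
    return sorted(variants)


print(lookalike_variants("login.example.com"))
# ['login-example.com', 'loginexample.com']
```

An organisation could run something like this over its own subdomains to see which variations are worth registering defensively.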

Another attack, the IDN (Internationalised Domain Name) homograph attack, takes this a step further.

IDN is a relatively recent addition to the domain name system that allows Unicode in domain names, making domains like 蔡慈明.com (represented in ASCII as xn--5hu39h2v7a.com for compatibility) and emoji domains like 💩.la possible.

Because some Unicode characters look almost identical to others, you can register a domain that looks nearly the same as someone else's. Examples given on Wikipedia are wikipediа.org (xn--wikipedi-86g.org) and pаypal.com (xn--pypal-4ve.com), each of which replaces a Latin "a" with the Cyrillic "а".
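The ASCII forms quoted above can be reproduced with Python's built-in punycode codec. This is a simplified sketch: the helpers `to_ascii` and `to_unicode` are hypothetical names, and they skip the character mapping and validation steps that full IDNA processing performs.

```python
def to_ascii(domain):
    """Encode each non-ASCII label as xn-- plus its Punycode form."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(domain):
    """Decode xn-- labels back to Unicode and leave other labels untouched."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


# \u0430 is CYRILLIC SMALL LETTER A, which looks like the Latin "a".
print(to_ascii("wikipedi\u0430.org"))  # xn--wikipedi-86g.org
print(to_ascii("p\u0430ypal.com"))     # xn--pypal-4ve.com
```

Seeing the xn-- form is often the quickest way to tell that a domain is not what it appears to be.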

Most browsers have mitigated this with a complex algorithm (see Chrome's and Firefox's documents on IDN attacks) that decides whether to render a domain in Unicode or fall back to its ASCII form, so that visually identical characters are not shown. Some TLD registries also refuse domain registrations that don't follow certain guidelines.
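One ingredient of such checks is spotting labels that mix scripts. The sketch below is a rough heuristic, not the algorithm Chrome or Firefox actually use: it approximates a character's script by the first word of its Unicode name, which is an assumption that breaks down for legitimate combinations (Japanese, for instance, mixes several scripts). The function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER A" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(domain):
    """True if any single label combines letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


print(is_mixed_script("wikipedi\u0430.org"))  # True: Latin plus Cyrillic
print(is_mixed_script("wikipedia.org"))       # False
```

A browser applying a rule like this would show the first domain in its xn-- form rather than in Unicode.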

The one thing neither implementation prevents is accented characters. There is no way to block accented characters without being biased towards Latin scripts. An attacker can therefore craft a domain that looks similar to a legitimate one and is still accepted by browsers. Here are some scary examples:

These domains differ from the real ones by a single pixel, and that one pixel can easily go unnoticed. The same goes for other diacritics and accents on other characters. It becomes extremely hard for users to detect, at a glance, the difference in the domain they are visiting.
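What the eye misses is easy for code to catch. As a sketch, a domain can be reduced to a "skeleton" by stripping diacritics and then compared against a trusted domain. The helper names and the placeholder domain are assumptions for illustration, and the check only covers accents that Unicode can decompose into a base letter plus a combining mark.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def imitates(candidate, trusted):
    """True if candidate differs from trusted only by diacritics."""
    return candidate != trusted and strip_diacritics(candidate) == trusted


# \u1ea1 is "a" with a dot below, one small mark away from a plain "a".
print(imitates("ex\u1ea1mple.com", "example.com"))  # True
print(imitates("example.com", "example.com"))       # False
```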

A possible solution I have come up with is to hash the website's domain and display the result in the URL bar as a colour or emoji, similar to how lesspass.com displays symbols beside your master password so you can verify that you entered it correctly. Maybe someone should write a Chrome extension to do this...
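A minimal sketch of the idea, assuming SHA-256 as the hash and an arbitrary emoji palette (both are my choices for illustration, as is the function name `domain_fingerprint`). It is written in Python to show the logic; an actual extension would do the same thing in JavaScript.

```python
import hashlib

# Arbitrary 16-symbol palette (assumption); 256 is divisible by 16, so no symbol is favoured.
EMOJI = ["🐶", "🐱", "🦊", "🐻", "🐼", "🐸", "🐵", "🦁",
         "🐯", "🐮", "🐷", "🐙", "🦉", "🐝", "🐢", "🦀"]


def domain_fingerprint(domain):
    """Map a domain to a (colour, emoji string) pair derived from its hash."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).digest()
    colour = "#" + digest[:3].hex()  # first three bytes as an RGB colour
    symbols = "".join(EMOJI[b % len(EMOJI)] for b in digest[3:6])
    return colour, symbols


print(domain_fingerprint("example.com"))
print(domain_fingerprint("ex\u1ea1mple.com"))  # one diacritic, unrelated fingerprint
```

Because a one-character change produces an unrelated hash, a lookalike domain would show up with a visibly different colour and symbols, provided the user remembers what the real site's fingerprint looks like.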