A Website in the Forest

“If a tree falls in a forest and no one is around to hear it, does it make a sound?”

Who connects to a website that you’ve told no one about?

Because we visit websites frequently ourselves, we wanted to see who else visits newly registered websites, how quickly they arrive, who exactly they are and why they do it. We also wanted to see who knew about domains that are much harder to trace.

How different TLDs differ in public availability

All new gTLDs (.zone, .pro, etc.) and some of the major TLDs (.com, .net, .org, .biz) publish lists of registered domains that are updated daily and accessible to almost anybody. If you register a domain within one of these zones, it will inevitably be public knowledge within a few hours of registration, regardless of how obscure the domain is.

ccTLDs (.uk, .pw, .de) operate differently: they don’t generally make the list of registered domains publicly available.
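
To illustrate why a domain in a published zone becomes visible so quickly, here is a minimal sketch of how anyone holding two consecutive daily lists could spot new registrations. It assumes each list has already been reduced to one domain name per line, which is a simplification of a real zone file; the function names and the sample domains are hypothetical.

```python
def load_domains(lines):
    """Normalise an iterable of domain names into a set.

    Assumption: one name per line; case and a trailing dot are ignored.
    """
    return {line.strip().lower().rstrip(".") for line in lines if line.strip()}


def newly_registered(yesterday_lines, today_lines):
    """Return names present in today's list but not in yesterday's, sorted."""
    return sorted(load_domains(today_lines) - load_domains(yesterday_lines))


# Hypothetical sample data, not taken from any real zone file.
yesterday = ["example.com", "another-example.com"]
today = ["example.com", "another-example.com", "X7K2Q9ZP4M1A.COM."]

print(newly_registered(yesterday, today))
```

A set difference is enough here because the only question is which names are new; anyone running this daily would learn of an obscure domain without ever having to guess it.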

The Domain Name Setup

We registered two domains, one a .com and the other a .pw. .pw is the country code top-level domain for Palau, a small group of islands in the Pacific Ocean. The ccTLD is, however, sold as an open ccTLD by “The Professional Web”, a subsidiary of Directi, an internet registrar with 1000+ employees. We chose this ccTLD because it was the cheapest to register.

Both domain names consist of 12 random letters and numbers, making it almost impossible for anyone to guess them.
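
As a rough illustration of how hard such a name is to guess, here is a minimal sketch that generates a 12-character label. It assumes the alphabet is lowercase letters plus digits (36 symbols); the function name is hypothetical and this is not necessarily how our names were generated.

```python
import secrets
import string

# Assumption: the label uses lowercase letters and digits only.
ALPHABET = string.ascii_lowercase + string.digits


def random_label(length=12):
    """Return a random domain label drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_label() + ".com")

# Number of possible 12-character labels under this assumption.
print(len(ALPHABET) ** 12)
```

Under that assumption there are 36^12 possible labels, roughly 4.7 × 10^18, so a visitor is far more likely to have learned of the domain from a leak or a published list than to have guessed it.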

The Website Setup

We built a simple, valid HTML5 website which contains –

  1. The first four lines of Prometheus by Lord Byron in a paragraph tag
  2. A signature for an expensive marketing automation tool – for our own testing
  3. A single PNG image
  4. A linked CSS stylesheet
  5. A linked woff font
  6. A linked JS file
  7. An invocation of a web socket
  8. An HTML5 video tag with mp4, ogg, webm and 3gp source files
  9. Inline JavaScript which makes an AJAX call

Sites that are “scraping” will generally just load the HTML content of a page, skipping the images, JavaScript, CSS and videos, and will definitely not perform any AJAX requests or open websockets. A modern web browser would request most of the files listed above. We wanted to make sure we could tell scrapers from actual web browsers in the request logs.
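
The distinction can be sketched in a few lines. This is a minimal illustration, assuming the access log has already been parsed into (client IP, requested path) pairs; the suffix list mirrors the asset types listed above, while the function name, the sample paths and the IP addresses (documentation ranges) are hypothetical.

```python
# File types a real browser would fetch after loading the HTML page.
ASSET_SUFFIXES = (".png", ".css", ".woff", ".js", ".mp4", ".ogg", ".webm", ".3gp")


def classify_clients(requests):
    """Label each client by whether it ever requested a linked asset.

    requests: iterable of (client_ip, path) tuples from a parsed log.
    """
    assets_by_client = {}
    for client_ip, path in requests:
        assets = assets_by_client.setdefault(client_ip, set())
        clean_path = path.split("?", 1)[0].lower()  # ignore query strings
        if clean_path.endswith(ASSET_SUFFIXES):
            assets.add(clean_path)
    return {
        ip: "browser-like" if assets else "scraper-like"
        for ip, assets in assets_by_client.items()
    }


# Hypothetical log entries for illustration only.
sample = [
    ("192.0.2.1", "/"),
    ("198.51.100.7", "/"),
    ("198.51.100.7", "/style.css"),
    ("198.51.100.7", "/font.woff"),
    ("198.51.100.7", "/app.js?v=1"),
]
print(classify_clients(sample))
```

The user agent string is deliberately ignored: a bot can claim to be any browser, but it is much less likely to fetch every linked file the way a browser does.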

The Leaks

Registering a domain in absolute secrecy is almost impossible. Here are the places the domain leaked before we had really even got started –

  1. Namecheap – by registering the domain here, this company is aware of our domain.
  2. Google – the confirmation of registration was delivered to us by Gmail. Google scans the content of emails to determine the advertisements, and potentially also visits website links in the email to determine whether the website has malicious content.
  3. All DNS server owners between Telstra and Namecheap – to ensure the web server was set up correctly, we did a test request using wget on Ubuntu Linux from our Telstra broadband plan in Australia. This performs a DNS lookup on our ISP’s servers, which look for a recently cached version of the IP address; if no other lookups have been done, the record would only reside on Namecheap’s DNS system.
  4. Directi and Verisign – the registry owners for the PW and COM TLDs respectively. Each has a record of the registration of the domain in its TLD.

Who visited us – Tin Foil Hat Required

We expected to get traffic to our .com registration within a few hours because of its visibility in publicly available zone files; it was just a matter of who and when. All queries within the first 28 hours were bots pretending to be PC browsers, with the exception of one. Here’s what happened –

.COM Domain Registered and IP points to web server @ 4:20 UTC on 4th November 2014.

+4 minutes – Telstra (our ISP)
This is us testing to ensure that the server is set up. This also leaks the existence of the domain into the DNS resolution system.

+3 hours 57 minutes – Peer 1 – Unknown
A connection from a Peer 1 machine hosted in Vancouver. Blacklist logs for this bot show it changes its user agent frequently.

+6 hours 45 minutes – Prolocation – Unknown
A connection from a machine in the Netherlands that resolves to noc.prolocation.net, so it appears to be part of this hosting company’s business.

+7 hours 17 minutes – Cyveillance – QinetiQ Company
This US-based company provides threat intelligence and internet monitoring. Its parent company, QinetiQ, is a UK company formed from the privatization of parts of the Ministry of Defence’s Defence Evaluation and Research Agency.

+11 hours 32 minutes – Hurricane Electric – Unknown
This is a strange one. The IP of this visit resolves to a single .com domain name. The WHOIS record for it shows it’s owned by the managing director of a US aviation and insurance company, but it uses a Chinese DNS provider. Searching for registrations from the same person turned up a few other domain names, all of which were registered around April 2014 and all of which contained Chinese content. Based on how easily the person’s contact information (home address, phone number, wife’s name) can be found via simple Google searches, we don’t believe that the listed owner of these DNS records is the person controlling whatever is connecting to the website.

+12 hours 2 minutes – Cyren (Commtouch)
Cyren is a cloud-based security solution provider. It lists Google as one of its customers, saying that it provides them with “embedded antivirus solution to protect customers from malware”. As noted in the “Leaks” section above, we received the confirmation of the domain registration via Gmail, which may have caused this connection to the website.

+14 hours 5 minutes – DomainTools
DomainTools provides DNS research, WHOIS lookup and cybercrime investigation services.

+18 hours 37 minutes – BuiltWith
That’s us! Our window for finding new sites should be no more than 24 hours.

+26 hours 47 minutes – Prescient Software Inc / IRS
An IP with a DNS record of phishmongers.com, which has TXT records pointing to the IRS and seems to be linked to Prescient Software Inc. You can Google that one for conspiracy theories.
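
For anyone repeating the experiment, the offsets in this timeline are simply each visit’s timestamp minus the registration time. Here is a minimal sketch of that arithmetic; the function names are hypothetical, and the sample visit time is just the registration time plus the first offset above, used to check the calculation rather than copied from our logs.

```python
from datetime import datetime, timezone

# Registration time from the timeline above: 4:20 UTC on 4th November 2014.
REGISTERED = datetime(2014, 11, 4, 4, 20, tzinfo=timezone.utc)


def _unit(count, word):
    """Format a count with a singular or plural unit name."""
    return f"{count} {word}" if count == 1 else f"{count} {word}s"


def offset_label(visit, registered=REGISTERED):
    """Format the time elapsed since registration as '+H hours M minutes'."""
    total_minutes = int((visit - registered).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    parts = []
    if hours:
        parts.append(_unit(hours, "hour"))
    parts.append(_unit(minutes, "minute"))
    return "+" + " ".join(parts)


# Derived check value: 04:20 plus 3 hours 57 minutes is 08:17 UTC.
visit = datetime(2014, 11, 4, 8, 17, tzinfo=timezone.utc)
print(offset_label(visit))
```

Keeping both timestamps in UTC avoids off-by-hours mistakes when the web server and the registrar report in different time zones.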

What about the .PW Domain? Tin Foil Hat not Required

24 hours after registering the .PW domain in exactly the same manner as the .COM domain, not a single bot has visited the site (except for our own test to ensure the domain is set up correctly).

Conclusion

In just under 27 hours from registering a .com domain, 8 different entities visited the website. Half of them (Cyren, Cyveillance, DomainTools and BuiltWith) are known companies that advertise the fact they do this; the other half are either unknown or don’t advertise why they are indexing websites so soon after registration. All of the bots except the Prescient Software one pretended to be web browsers, but none of them actually were (no CSS, JS or media files were requested).

24 hours after registering a .PW domain in the same manner as the .COM, there have been no visits from any bots at all. That isn’t what we thought would happen, because the leaks of data that occur when you register a domain provide some companies with information about that domain.

Missing from our logs are search engine bots. This suggests that search engines are not using domain registrations as a source of new sites to crawl, at least not quickly.

We will continue to monitor the visits to the websites and see what happens in the days and weeks to come.

Gary

Gary is co-founder of BuiltWith and uses a profile picture from 2006.