Introducing BuiltWith Full Index
What does this actually mean?
Previously, to detect web technologies, BuiltWith downloaded only the HTML that renders the homepage of a site. In 2013 this is no longer sufficient to detect all technologies. "View Source" on any site built with a modern framework and all you will see is a skeleton of what the website is actually built with, thanks to client-side rendering and the increasing use of JavaScript.
Full index means we render the full website: we follow redirects, execute JavaScript, load iframes and download all the resources that make up a page.
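To make the difference concrete, here is a minimal sketch in Python. It is not BuiltWith's detection code: the signature table, the technology names and every URL are made up for illustration, and the list of rendered resources stands in for what a real browser-based crawl would collect.

```python
# Hypothetical signature table: technology name -> substring to look for.
# Real detection rules are not described in this post; these are placeholders.
SIGNATURES = {
    "ExampleAnalytics": "analytics.example.com/track.js",
    "ExampleAds": "ads.example.net/serve",
}


def detect(texts):
    """Return the set of technologies whose signature appears in any text."""
    found = set()
    for text in texts:
        for name, needle in SIGNATURES.items():
            if needle in text:
                found.add(name)
    return found


# What "View Source" shows: a skeleton that only loads an application bundle.
static_html = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

# What a full render would also fetch (made-up URLs standing in for real traffic).
rendered_resources = [
    "https://site.example/bundle.js",
    "https://analytics.example.com/track.js",
    "https://ads.example.net/serve?slot=1",
]

homepage_only = detect([static_html])
full_index = detect([static_html] + rendered_resources)

print("homepage HTML only:", sorted(homepage_only))
print("full index:", sorted(full_index))
```

In this toy case the homepage HTML matches nothing, while the rendered resource list reveals both placeholder technologies.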
How full index is different
What improvements does this bring?
This allows us to uncover more technologies and provide a more accurate BuiltWith experience. The following improvements will be found within Trends, TrendsPro reports and standard technology lookups:
Advertising
Many advertising technologies are loaded behind iframes and/or instantiated from JavaScript. Full site rendering means we can now see and identify these technologies.
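As a rough illustration of why iframes matter, the sketch below uses Python's standard html.parser module to list the iframe and script URLs a page declares in its markup. The class name, function name and sample page are hypothetical. This only finds frames written directly in the HTML; frames created by JavaScript at run time would only show up after rendering, which is the gap full index is meant to close.

```python
from html.parser import HTMLParser


class FrameCollector(HTMLParser):
    """Collect the src of every <iframe> and <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag in ("iframe", "script"):
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def embedded_urls(html):
    """Return the URLs that a crawler would need to fetch next."""
    collector = FrameCollector()
    collector.feed(html)
    return collector.urls


# Made-up page: one ad iframe and one script declared in the markup.
page = (
    '<html><body>'
    '<iframe src="https://ads.example.net/frame.html"></iframe>'
    '<script src="/app.js"></script>'
    '</body></html>'
)

print(embedded_urls(page))
```

Each URL returned here is one more document to download and inspect, which is where the extra requests discussed later come from.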
Tag Managers
Websites using tag management systems load additional third-party analytics and tracking technologies differently from their default installation. Full index lets us track them regardless of how they are implemented in the original HTML.
Custom Installs
Website owners tweak their code to run as fast as possible, especially on mission-critical websites where one extra second of load time reduces engagement. These websites sometimes have totally customized installations, and we would previously have missed third-party scripts loading on them because they did not match most of the detection techniques we use.
Technical Challenges of Full Index
Our old method meant that a technology lookup for a website required one "GET" request for content, so 2 million sites required 2 million lookups. Full index proved a challenge because it requires as many requests as the webpage needs to render, additional compute time to render JavaScript, CSS and images, and further compute time to find the technologies in all of the additional resources.
Web page download request increase.
At our last run, 2.4 million top sites created 92.4 million records. Some of these requests are for cached content or result from rendering errors with the site (a single image being downloaded many times, for example), but they show the scaling required to provide a level of detail unprecedented in web technology research.
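The scale of the increase can be worked out from the two figures above. The short Python calculation below assumes that one record corresponds roughly to one request, which the post implies but does not state exactly, and it makes no attempt to subtract the cached or repeated downloads mentioned above.

```python
sites = 2_400_000      # top sites in the last run (figure from the post)
records = 92_400_000   # records created by that run (figure from the post)

# Old method: one GET request per site.
old_requests = sites * 1

# Assumption: one record is roughly one request under full index.
records_per_site = records / sites
growth = records / old_requests

print(f"average records per site: {records_per_site:.1f}")
print(f"growth over one GET per site: {growth:.1f}x")
```

On these numbers a full index run averages 38.5 records per site, so it handles about 38.5 times as many downloads as the old one-request-per-site approach.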
tl;dr
We've greatly improved the accuracy and coverage of technology lookups, Trends and TrendsPro by doing full website page loads.