When conducting a web application penetration test understanding and extending the attack surface is an exercise that is critical for success. Having a large wordlist of realistic directories, files and domains is assists immensely with this process.
Commonspeak is a wordlist generation tool that leverages public datasets from Google's BigQuery platform. By performing queries on large datasets that are updated frequently, commonspeak is able to generate wordlists that are ""evolutionary"", in the sense that they reflect the newest trends on the internet.
This presentation will discuss the concept of evolutionary wordlists and how Commonspeak parses URLs from various BigQuery datasets including HTTPArchive, Stack Overflow and HackerNews to build current, consistently evolving and realistic wordlists of directories, files, parameter names for specific technologies, and subdomains.
We will also introduce Commonspeak 2 and discuss the additions to the tool including scheduled wordlist creation, comprehensive GitHub queries a permutation engine for subdomain discovery and asynchronous wordlist generation.