Information Gathering
Overview
Web reconnaissance is the first step in security assessments or penetration testing, akin to a detective’s investigation to gather clues. The goals include:
Identifying assets like domains, subdomains, and IPs.
Uncovering hidden files, directories, and technologies.
Analyzing attack surfaces by finding open ports and software versions.
Gathering intelligence on employees, emails, and technologies for social engineering.
Reconnaissance can be Active (direct interaction, higher detection risk) or Passive (no direct interaction, lower risk).
Type | Description | Detection risk | Examples
Active Reconnaissance | Directly interacts with the target | Higher | Port scanning, vulnerability scanning, network mapping
Passive Reconnaissance | Uses public sources without direct interaction | Lower | Search engines, WHOIS, DNS enumeration, social media
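As a rough illustration of what "direct interaction" means on the active side, the sketch below does a plain TCP connect check with Python's standard socket module. The helper name is_port_open and the default timeout are assumptions made for this example, not part of any tool mentioned here.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection completes a full TCP handshake, so the target
        # (and anything logging its traffic) can see the attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: all treated as "not open".
        return False
```

Calling is_port_open("192.0.2.1", 80) would return True or False for that single port. Because the handshake reaches the target, this kind of check carries the higher detection risk noted above.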
WHOIS Lookups
WHOIS provides domain ownership details like registrar, registration dates, nameservers, and contacts.
Example command:
whois example.com
Note: WHOIS data can be inaccurate or masked by privacy services.
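For scripted lookups, the same query can be sketched in Python. WHOIS is a simple protocol (RFC 3912): open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. The default server whois.iana.org and both function names are assumptions for illustration. Response formats vary by registry, so the parser is only a best-effort split of "Key: value" lines.

```python
import socket
from typing import Dict, List


def whois_query(domain: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send a raw WHOIS query: TCP port 43, query terminated by CRLF."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> Dict[str, List[str]]:
    """Collect 'Key: value' lines; keys may repeat (e.g. several name servers)."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the comment/notice lines many servers emit.
        if not line or line.startswith(("%", "#", ">>>")):
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

A typical use would be parse_whois(whois_query("example.com")), then reading keys such as "registrar" or "name server" if the responding server provides them.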
DNS
DNS translates domain names to IP addresses.
Example with dig for IPv4 (A record):
dig example.com A
Common DNS record types:
Record type | Description
A | Maps hostname to IPv4 address
AAAA | Maps hostname to IPv6 address
CNAME | Alias for hostname
MX | Mail servers for domain
NS | Authoritative name server delegation
TXT | Arbitrary text information
SOA | DNS zone administrative information
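To show how a record type ends up in an actual query, here is a minimal sketch that builds a DNS question by hand. The numeric codes are the standard type values for the records listed above. The function name build_query and the fixed query ID are assumptions for illustration; a real client would randomize the ID and parse the reply.

```python
import struct

# Standard numeric codes for the record types listed above.
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28}


def build_query(name: str, rtype: str = "A", query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message with one question (class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS (1 = IN).
    question = qname + struct.pack("!HH", QTYPES[rtype], 1)
    return header + question
```

The resulting bytes could be sent over UDP to port 53 of a resolver. Switching rtype from "A" to "MX" or "TXT" changes only the two QTYPE bytes, which is essentially what dig does when given a different record type.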
Subdomains and Enumeration
Subdomains organize services within a domain (e.g., mail.example.com).
Type | Description | Examples
Active Enumeration | Directly probes DNS servers | Brute-forcing, DNS zone transfers
Passive Enumeration | Uses public data sources | Certificate Transparency logs, search engine queries
Tools such as dnsenum automate subdomain enumeration. Combining active and passive methods improves discovery.
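The brute-force approach can be sketched in a few lines: prepend each word from a list to the domain and keep the names that resolve. The function name, the injectable resolve parameter, and the idea of passing a wordlist as an iterable are assumptions for illustration. Dedicated tools add concurrency and wildcard detection on top of this.

```python
import socket
from typing import Callable, Dict, Iterable


def brute_force_subdomains(
    domain: str,
    words: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Try word.domain for each word and keep the names that resolve."""
    found: Dict[str, str] = {}
    for word in words:
        candidate = f"{word.strip()}.{domain}"
        try:
            found[candidate] = resolve(candidate)
        except OSError:
            # socket.gaierror is a subclass of OSError: the name did not resolve.
            continue
    return found
```

For example, brute_force_subdomains("example.com", ["mail", "dev", "www"]) returns a mapping of the resolvable names to their IPv4 addresses. If the domain uses wildcard DNS, every candidate resolves, so results should be compared against a random, clearly nonexistent label.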
Zone Transfers
A full DNS zone transfer (AXFR) copies every record in a zone, so a server that permits it reveals the zone's complete DNS information.
Example attempt:
dig @ns1.example.com example.com axfr
Zone transfers are usually restricted to authorized secondary servers, but a misconfigured name server can expose the zone to anyone who asks.
Virtual Hosts
Multiple websites can share the same IP address; the server tells them apart by the hostname the client sends in the HTTP Host header.
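The mechanism can be sketched with Python's standard http.client: send the same request to one IP with different Host headers and flag any hostname whose response differs from a baseline. Comparing only status code and body length is a simplifying assumption, and the names fetch_with_host, find_vhosts, and the baseline hostname are hypothetical.

```python
import http.client
from typing import Callable, Iterable, List, Tuple

Response = Tuple[int, int]  # (status code, body length)


def fetch_with_host(ip: str, hostname: str, port: int = 80, timeout: float = 5.0) -> Response:
    """Request '/' from ip:port while presenting `hostname` in the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Supplying our own Host header stops http.client from adding one.
        conn.request("GET", "/", headers={"Host": hostname})
        resp = conn.getresponse()
        return resp.status, len(resp.read())
    finally:
        conn.close()


def find_vhosts(
    ip: str,
    candidates: Iterable[str],
    fetch: Callable[[str, str], Response] = fetch_with_host,
    baseline_host: str = "nonexistent.invalid",
) -> List[str]:
    """Return candidates whose response differs from the default-site baseline."""
    baseline = fetch(ip, baseline_host)
    return [name for name in candidates if fetch(ip, name) != baseline]
```

A hostname that produces a different status or body size than the baseline is likely served by its own virtual host. Pages with dynamic content can cause false positives under this simple comparison.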
Tool example: gobuster with vhost mode to brute-force virtual hosts.
Example command:
gobuster vhost -u http://192.0.2.1 -w hostnames.txt
Certificate Transparency Logs
CT logs record issued SSL/TLS certificates, revealing subdomains.
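A minimal Python sketch of this idea follows, assuming crt.sh's JSON output in which each entry carries a name_value field that may hold several newline-separated names. The function names are hypothetical, and the service's response format could change.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List


def fetch_crtsh(domain: str, timeout: float = 30.0) -> list:
    """Download crt.sh entries for %.domain as parsed JSON."""
    query = urllib.parse.quote(f"%.{domain}")  # '%' becomes '%25'
    url = f"https://crt.sh/?q={query}&output=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_subdomains(entries: Iterable[dict], domain: str) -> List[str]:
    """Collect unique hostnames under `domain` from crt.sh-style entries."""
    base = domain.lower()
    suffix = "." + base
    names = set()
    for entry in entries:
        # One certificate can list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == base or name.endswith(suffix):
                names.add(name)
    return sorted(names)
```

Calling extract_subdomains(fetch_crtsh("example.com"), "example.com") yields a sorted, de-duplicated list of names seen in issued certificates.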
Example command using curl and jq to fetch subdomains via crt.sh:
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u
Web Crawling
Automated navigation to map site structure and gather info.
Important file: robots.txt shows disallowed crawling paths.
Framework: Scrapy in Python for web scraping.
Example Scrapy spider snippet:
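import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ['http://example.com/']
    # Assumed list of file extensions worth reporting; adjust as needed.
    interesting_extensions = ('.pdf', '.doc', '.xls', '.zip', '.bak')

    def parse(self, response):
        # Walk every link found on the page.
        for link in response.css('a::attr(href)').getall():
            if any(link.endswith(ext) for ext in self.interesting_extensions):
                # Report links that point at potentially interesting files.
                yield {"file": link}
            elif not link.startswith("#") and not link.startswith("mailto:"):
                # Follow ordinary links and parse the target page the same way.
                yield response.follow(link, callback=self.parse)
Search Engine Discovery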
Use search engines and advanced operators ("Google Dorks") for passive reconnaissance.
Common operators:
Operator | Description | Example
site: | Restrict to site | site:example.com "password reset"
inurl: | Search URL | inurl:admin login
filetype: | File type | filetype:pdf "confidential report"
intitle: | Search page title | intitle:"index of" /backup
cache: | Cached page | cache:example.com
"search term" | Exact phrase | "internal error" site:example.com
OR | Combine terms | inurl:admin OR inurl:login
- | Exclude terms | inurl:admin -intext:wordpress
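When many queries are needed, the operators above can be composed programmatically. The sketch below assembles a query string and URL-encodes it. The helper names build_dork and search_url are hypothetical, and the Google search URL shape is an assumption that may not hold for other engines.

```python
from typing import Tuple
from urllib.parse import quote_plus


def build_dork(*terms: str, site: str = "", exclude: Tuple[str, ...] = ()) -> str:
    """Join operators and phrases into one query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    parts.extend(terms)
    # A leading '-' excludes a term or operator from the results.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query for use in a browser address bar."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

For instance, build_dork("inurl:admin", exclude=("intext:wordpress",)) reproduces the last example in the table, and search_url turns it into a link that can be opened manually.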
Web Archives
The Wayback Machine archives historical snapshots of websites, which are useful for finding:
Past site content no longer available.
Hidden or removed directories/files.
Website content changes over time.
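Archived URLs can also be listed programmatically. The sketch below targets the Wayback Machine's CDX server, building a query URL and turning its JSON reply (a header row followed by data rows) into dictionaries. The chosen fields, the limit, and the function names are assumptions for illustration.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url(domain: str, limit: int = 100) -> str:
    """Build a CDX query listing captured URLs under `domain`."""
    params = {
        "url": f"{domain}/*",         # every path under the domain
        "output": "json",
        "fl": "timestamp,original",   # fields to return
        "collapse": "urlkey",         # one row per distinct URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx(body: str) -> List[Dict[str, str]]:
    """Convert the JSON reply (first row is the header) into dicts."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

Fetching cdx_url("example.com") with any HTTP client and passing the response text to parse_cdx gives a list of archived URLs with capture timestamps, which can then be checked for removed directories or files.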