Information Gathering

Overview

Web reconnaissance is the first step in security assessments or penetration testing, akin to a detective’s investigation to gather clues. The goals include:

  • Identifying assets like domains, subdomains, and IPs.

  • Uncovering hidden files, directories, and technologies.

  • Analyzing attack surfaces by finding open ports and software versions.

  • Gathering intelligence on employees, emails, and technologies for social engineering.

Reconnaissance can be Active (direct interaction, higher detection risk) or Passive (no direct interaction, lower risk).

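Type                   | Description                                    | Detection Risk | Examples
-----------------------+------------------------------------------------+----------------+-------------------------------------------------------
Active Reconnaissance  | Directly interacts with the target             | Higher         | Port scanning, vulnerability scanning, network mapping
Passive Reconnaissance | Uses public sources without direct interaction | Lower          | Search engines, WHOIS, DNS enumeration, social media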


WHOIS Lookups

WHOIS provides domain ownership details like registrar, registration dates, nameservers, and contacts.

Example command:

whois example.com

Note: WHOIS data can be inaccurate or masked by privacy services.
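
The WHOIS protocol itself is simple: the client opens a TCP connection to port 43, sends the query followed by CRLF, and reads until the server closes the connection. The following is a minimal Python sketch of that exchange. The default server `whois.iana.org` is an assumption chosen as a starting point (it typically refers to the registry's own server), and `parse_whois` is a hypothetical helper that only handles plain `key: value` lines; real registrar output varies in format.

```python
import socket


def query_whois(domain, server="whois.iana.org", port=43, timeout=10.0):
    """Send a raw WHOIS query and return the server's text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines, and lines without a key/value separator
        if not line or line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields
```

Calling `parse_whois(query_whois("example.com"))` would return whatever fields the queried server reports. Because values are kept as lists, repeated keys such as name servers are preserved.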


DNS

DNS translates domain names to IP addresses.

Example with dig for IPv4 (A record):

dig example.com A

Common DNS record types:

Record Type | Description
------------+-------------------------------------
A           | Maps hostname to IPv4 address
AAAA        | Maps hostname to IPv6 address
CNAME       | Alias for hostname
MX          | Mail servers for domain
NS          | Authoritative name server delegation
TXT         | Arbitrary text information
SOA         | DNS zone administrative information
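
On the wire, each record type listed above is requested by a numeric code in the question section of a DNS query. The sketch below builds such a query packet by hand to show where the record type goes. The function names and the fixed query ID are illustrative assumptions; sending the packet and decoding the reply are not shown.

```python
import struct

# Numeric QTYPE codes for the record types listed above (RFC 1035, RFC 3596)
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28}


def encode_name(name):
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, rtype="A", query_id=0x1234):
    """Build a single-question DNS query with the recursion-desired flag set."""
    # Header: ID, flags (RD=1), 1 question, 0 answer/authority/additional records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: encoded name, record type code, class IN (1)
    question = encode_name(name) + struct.pack("!HH", QTYPES[rtype], 1)
    return header + question
```

A packet from `build_query("example.com", "MX")` could be sent over UDP to port 53 of a resolver, which is roughly what `dig example.com MX` does before it decodes and prints the answer.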


Subdomains and Enumeration

Subdomains organize services within a domain (e.g., mail.example.com).

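Enumeration Type    | Description                 | Examples
--------------------+-----------------------------+----------------------------------------------------
Active Enumeration  | Directly probes DNS servers | Brute-forcing, DNS zone transfers
Passive Enumeration | Uses public data sources    | Certificate Transparency logs, search engine queries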

Tools such as dnsenum automate subdomain enumeration. Combining active and passive methods improves coverage.
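
Brute-force enumeration boils down to trying candidate labels under the domain and keeping the names that resolve. The sketch below illustrates the idea with the system resolver. The function names and the pluggable `resolver` parameter are assumptions made for illustration, not part of any tool mentioned here.

```python
import socket


def default_resolver(hostname):
    """Return the sorted addresses for hostname, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def brute_force_subdomains(domain, words, resolver=default_resolver):
    """Try each candidate label under domain and keep the names that resolve."""
    found = {}
    for word in words:
        label = word.strip().lower()
        if not label:
            continue
        candidate = f"{label}.{domain}"
        addresses = resolver(candidate)
        if addresses:
            found[candidate] = addresses
    return found
```

If the domain has a wildcard DNS record, every candidate resolves, so results should be compared against a lookup of a random label before they are trusted.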


Zone Transfers

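A full DNS zone transfer (AXFR) copies every record in a zone from a name server, so a permitted transfer reveals the zone's complete set of hostnames and addresses.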

Example attempt:

dig @ns1.example.com example.com axfr

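Zone transfers are usually restricted to trusted secondary servers, but a misconfigured name server may allow them from any client and expose the whole zone.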
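
When a transfer does succeed, the output can be long. The sketch below shows one way to summarize it in Python, assuming dig's default record layout of `name TTL class type data` per line and comment lines starting with `;`. Both function names are hypothetical.

```python
def parse_axfr_output(text):
    """Parse record lines from dig AXFR output into (name, ttl, type, data) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # dig prefixes comments with ';'
            continue
        parts = line.split(None, 4)
        # A record line has five fields and a numeric TTL in the second column
        if len(parts) < 5 or not parts[1].isdigit():
            continue
        name, ttl, _rclass, rtype, data = parts
        records.append((name.rstrip(".").lower(), int(ttl), rtype.upper(), data))
    return records


def hostnames_from_zone(records):
    """Return the unique owner names that have A or AAAA records."""
    return sorted({name for name, _ttl, rtype, _data in records if rtype in ("A", "AAAA")})
```

A refused transfer produces only comment lines, so `parse_axfr_output` returns an empty list in that case.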


Virtual Hosts

Multiple websites can share the same IP address. The web server decides which site to serve based on the hostname sent in the HTTP Host header.

Tool example: gobuster with vhost mode to brute-force virtual hosts.

Example command:

gobuster vhost -u http://192.0.2.1 -w hostnames.txt
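
The idea behind virtual host brute-forcing can be sketched in a few lines of Python: request the same IP repeatedly with different Host headers and flag responses that differ from the response for a hostname that should not exist. This is a rough illustration under stated assumptions, not how gobuster is implemented. The function names, the baseline hostname, and the comparison by status code and body length are all choices made for this example.

```python
import http.client


def fetch_with_host(ip, hostname, port=80, timeout=5.0):
    """Request '/' from ip while sending hostname in the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": hostname})
        resp = conn.getresponse()
        return resp.status, len(resp.read())
    finally:
        conn.close()


def find_vhosts(ip, hostnames, fetch=fetch_with_host, baseline_host="nonexistent.invalid"):
    """Return hostnames whose response differs from the baseline for an unknown host."""
    baseline = fetch(ip, baseline_host)
    hits = []
    for name in hostnames:
        result = fetch(ip, name)
        if result != baseline:
            hits.append((name, result))
    return hits
```

Comparing status and length is a crude heuristic: pages with dynamic content can differ between requests and cause false positives.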

Certificate Transparency Logs

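Certificate Transparency (CT) logs are public records of issued SSL/TLS certificates. Because each certificate lists the hostnames it covers, the logs reveal subdomains.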

Example command using curl and jq to fetch subdomains via crt.sh:

curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u
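
The same post-processing can be done in Python once the JSON has been downloaded. The sketch below mirrors the jq, sed, and sort steps of the pipeline above. The function name is hypothetical, and it assumes each entry has a `name_value` field that may contain several newline-separated names.

```python
import json


def subdomains_from_crtsh(json_text, domain):
    """Extract unique hostnames under domain from a crt.sh JSON response."""
    domain = domain.lower()
    suffix = "." + domain
    names = set()
    for entry in json.loads(json_text):
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):  # drop the wildcard prefix
                name = name[2:]
            if name == domain or name.endswith(suffix):
                names.add(name)
    return sorted(names)
```

Unlike the shell pipeline, this version also discards names that fall outside the target domain.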

Web Crawling

Web crawling is the automated navigation of a site's links to map its structure and gather information.

  • Important file: robots.txt lists the paths that the site asks crawlers not to visit.

  • Framework: Scrapy in Python for web scraping.

Example Scrapy spider snippet:

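import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ['example.com']  # keep the crawl on the target site
    start_urls = ['http://example.com/']
    # File extensions worth reporting (illustrative list, adjust as needed)
    interesting_extensions = ('.pdf', '.bak', '.sql', '.zip')

    def parse(self, response):
        for link in response.css('a::attr(href)').getall():
            if link.lower().endswith(self.interesting_extensions):
                # Report links that point to a file with a matching extension
                yield {"file": link}
            elif not link.startswith(("#", "mailto:")):
                # Follow every other link and parse it with this same method
                yield response.follow(link, callback=self.parse)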
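
The robots.txt file mentioned above is plain text, so its disallowed paths are easy to pull out for review. A minimal sketch, assuming the file content has already been downloaded and using a hypothetical function name:

```python
def disallowed_paths(robots_txt):
    """Return the unique Disallow paths from robots.txt content, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            path = value.strip()
            # An empty Disallow means "nothing is disallowed", so skip it
            if path and path not in paths:
                paths.append(path)
    return paths
```

The sketch ignores which user agent each rule applies to, since for reconnaissance every listed path is of interest.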

Search Engine Discovery

Use search engines and advanced operators ("Google Dorks") for passive reconnaissance.

Common operators:

Operator      | Description       | Example
--------------+-------------------+------------------------------------
site:         | Restrict to site  | site:example.com "password reset"
inurl:        | Search URL        | inurl:admin login
filetype:     | File type         | filetype:pdf "confidential report"
intitle:      | Search page title | intitle:"index of" /backup
cache:        | Cached page       | cache:example.com
"search term" | Exact phrase      | "internal error" site:example.com
OR            | Combine terms     | inurl:admin OR inurl:login
-             | Exclude terms     | inurl:admin -intext:wordpress
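
When many queries are needed, the operators above can be assembled programmatically. The helper below is a small sketch with hypothetical names: it joins `operator:value` pairs, quoted phrases, and exclusions into one query string and URL-encodes it for a search URL.

```python
from urllib.parse import quote_plus


def build_dork(terms=(), exclude=(), **operators):
    """Combine operator:value pairs, free terms, and exclusions into one query."""
    parts = [f"{op}:{value}" for op, value in operators.items()]
    # Multi-word terms are quoted so they match as an exact phrase
    parts += [f'"{t}"' if " " in t else t for t in terms]
    parts += [f"-{t}" for t in exclude]
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search?q="):
    """Return a URL-encoded search URL for the query."""
    return base + quote_plus(query)
```

For example, `build_dork(terms=["confidential report"], site="example.com", filetype="pdf")` produces `site:example.com filetype:pdf "confidential report"`.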


Web Archives

The Wayback Machine archives historical snapshots of websites, which are useful for finding:

  • Past site content no longer available.

  • Hidden or removed directories/files.

  • Website content changes over time.
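
Archived URLs can also be listed in bulk through the Wayback Machine's CDX API. The sketch below builds a query URL and converts a JSON response into replay links. The function names and the chosen parameters (`fl`, `collapse`, `limit`) are one possible selection made for this example, and the download step itself is left out.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=50):
    """Build a CDX query listing archived URLs under domain, one row per unique URL."""
    params = {
        "url": f"{domain}/*",        # everything under the domain
        "output": "json",
        "fl": "timestamp,original",  # fields to return
        "collapse": "urlkey",        # one capture per distinct URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_json):
    """Turn CDX JSON rows into Wayback replay URLs (the first row is the header)."""
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in data]
```

The returned replay URLs can be opened directly to inspect removed pages or old directory listings.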
