
Passive Recon Pipeline: How the Assessment Actually Works

3/6/2026 12 min

If you read the Wolf version of this post, you got the what: passive recon, evidence of active targeting, shadow IT, a dangling subdomain, data leakage, EOL software. All real, all from a recent engagement. This post is for practitioners — the people who want to know how.

The pipeline I’m going to describe is composable, repeatable, and built entirely from open-source tooling. It’s designed to simulate what a motivated attacker does before they touch anything.

The Philosophy First

As I covered in the full post: passive recon means zero interaction with the target’s infrastructure. No packets sent, no requests made, no rules tripped. You’re reading public records that have been accumulating since the first certificate was issued and the first URL was crawled. The question you’re answering is what an attacker learns before they do anything active. The answer is almost always more than the organization realizes.

The Pipeline

The assessment runs in two parallel branches that converge at a final correlation step.

                  ┌─ dns-enum ─── shodan-enrich ──┐
ct-recon ─────────┤                               │
                  └─ (active recon, phase 2)      │
                                                  ├─ asm-report
root domain ─┬─ gau ─── url-analyze ──────────────┤
             └─ theHarvester ─────────────────────┘

Infrastructure branch: CT logs discover subdomains, DNS resolves them, Shodan enriches the IPs with what it’s already observed from its continuous internet-wide scanning.

Application branch: GAU (GetAllURLs) pulls historical URLs from Wayback Machine, Common Crawl, OTX, and URLScan. A custom analyzer classifies and extracts intelligence from those URLs.

Both branches feed into a final correlation step that produces a unified inventory.

Certificate Transparency Logs

CT logs are the starting point. Every TLS certificate issued by a trusted CA is logged to public CT logs. It’s a requirement of the CA/Browser Forum. This means every subdomain that’s ever had a certificate issued for it is permanently in a public record, searchable by domain.

Tools like crt.sh expose a queryable interface to these logs. A simple query against a root domain returns every certificate ever issued, including SANs, which often contain subdomains that don’t appear in DNS at all: preprod environments, internal tooling that someone issued a cert for, staging systems.
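The extraction step can be sketched in a few lines of Python. The query itself is a single GET against crt.sh (e.g. https://crt.sh/?q=%.example.com&output=json); here the rows are inlined sample data so the parsing logic stands alone, and example.com is a placeholder domain:

```python
def extract_subdomains(crtsh_rows, root_domain):
    """Pull unique subdomains out of crt.sh-style JSON rows.

    Each row's name_value field may hold several newline-separated
    names, including wildcard entries like *.staging.example.com.
    """
    found = set()
    for row in crtsh_rows:
        for name in row.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # drop wildcard prefix
            if name.endswith(root_domain):
                found.add(name)
    return sorted(found)

# Sample rows shaped like crt.sh's JSON output (hypothetical data);
# a live run would fetch https://crt.sh/?q=%.example.com&output=json
sample = [
    {"name_value": "www.example.com\npreprod.example.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "vpn.example.com"},
]

print(extract_subdomains(sample, "example.com"))
# ['preprod.example.com', 'staging.example.com', 'vpn.example.com', 'www.example.com']
```

Deduplication matters here: the same subdomain tends to appear across dozens of certificate renewals, so the raw row count is much larger than the name count.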

What you’re building here is a historical record of everything the organization has ever put a certificate on. Some of it will be long dead. All of it is signal.

DNS Enumeration

With a list of candidate subdomains from CT logs, the next step is DNS resolution. This tells you what’s actually alive right now: which subdomains resolve, to what IPs, through what CNAME chains.

CNAME chains are particularly interesting because they reveal vendor relationships. A subdomain that CNAMEs to a third-party service is telling you: this organization trusts this vendor enough to delegate DNS to them. Chain enough of those and you have a vendor inventory the organization probably doesn’t have themselves.
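A minimal sketch of the chain-walking logic, with a dict standing in for a live resolver (in practice you’d back the lookup with a library like dnspython); all of the names below are hypothetical:

```python
def follow_cname_chain(name, lookup, max_depth=10):
    """Follow a CNAME chain until it terminates in an A record or dies.

    lookup(name) -> ("CNAME", target) | ("A", ip) | None
    Returns (chain_of_names, terminal_ip_or_None).
    """
    chain = [name]
    for _ in range(max_depth):          # bounded: guards against CNAME loops
        record = lookup(chain[-1])
        if record is None:
            return chain, None          # dangling: worth a closer look
        rtype, value = record
        if rtype == "A":
            return chain, value
        chain.append(value)             # CNAME: keep walking
    return chain, None

# Mock zone data standing in for live DNS (hypothetical names).
zone = {
    "shop.example.com": ("CNAME", "example.myshopify.com"),
    "docs.example.com": ("CNAME", "example.github.io"),
    "example.myshopify.com": ("A", "23.227.38.65"),
    # example.github.io intentionally absent: a dead vendor CNAME
}

chain, ip = follow_cname_chain("shop.example.com", zone.get)
print(chain, ip)  # ['shop.example.com', 'example.myshopify.com'] 23.227.38.65
chain, ip = follow_cname_chain("docs.example.com", zone.get)
print(ip)         # None -> dangling record
```

Every intermediate name in the chain is a data point: each external hop is a vendor the organization depends on.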

Cloudflare-proxied IPs are worth flagging separately. If a subdomain resolves to a CF edge IP, you’re not seeing the origin: the real IP is hidden. That’s a WAF coverage indicator, but it also means there’s something behind it you can’t see directly from DNS.

Subdomain takeover candidates surface here too. The detection is mechanical: if a CNAME points to an external provider and that provider returns a “not found” or “available” response for the target resource, the subdomain is claimable. The hard part isn’t finding them, it’s that DNS records for dead vendor relationships often outlive anyone’s awareness that the relationship ended.
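The mechanical check looks something like this. The fingerprints below are illustrative samples, not a complete list (projects like can-i-take-over-xyz maintain real, provider-specific ones), and in a strictly passive phase the response body would come from archived scan data rather than a live request:

```python
# Illustrative provider fingerprints (real lists are longer and
# maintained per-provider; these are samples for the sketch).
FINGERPRINTS = {
    "github.io": "There isn't a GitHub Pages site here",
    "s3.amazonaws.com": "NoSuchBucket",
    "herokuapp.com": "No such app",
}

def takeover_candidate(cname_target, response_body):
    """Flag a CNAME whose provider says the target resource doesn't exist."""
    for provider_suffix, marker in FINGERPRINTS.items():
        if cname_target.endswith(provider_suffix) and marker in response_body:
            return True
    return False

print(takeover_candidate("old-docs.github.io",
                         "404: There isn't a GitHub Pages site here."))  # True
```

The suffix match matters: the fingerprint string alone isn’t enough, because the same error text can appear in unrelated page bodies.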

Shodan Enrichment

Once you have a list of unique IPs from DNS, Shodan is where things get interesting. Shodan continuously scans the internet and catalogues what it finds: open ports, service banners, TLS certificate details, HTTP response headers. By the time you query it, it’s likely already scanned every IP in your list multiple times.

What you’re looking for:

  • Open ports that shouldn’t be open. SSH, RDP, database ports, admin interfaces reachable from the public internet.
  • Default or self-signed certificates. Indicates infrastructure that was stood up without proper TLS configuration, often shadow IT or forgotten test environments.
  • EOL software. Shodan captures version strings. An nginx/1.14.0 banner on a production-adjacent server is both an EOL finding and a CVE surface.
  • Known CVEs. Shodan’s vulnerability data cross-references service versions against CVE databases. It’ll tell you if a service version has documented exploits.
  • Default pages. An IP serving a default web server welcome page is infrastructure that was stood up and left running. It’s likely unmonitored and unpatched.
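A triage pass over Shodan’s host output reduces to a few lookups. The field names below (ip_str, ports, vulns, data) mirror the shape of Shodan’s host API response; the risky-port map and EOL version prefixes are illustrative, not exhaustive:

```python
RISKY_PORTS = {22: "SSH", 3389: "RDP", 3306: "MySQL", 5432: "PostgreSQL"}
EOL_PREFIXES = {"nginx": ("1.14",), "Apache httpd": ("2.2",)}  # illustrative

def triage_host(host):
    """Turn one Shodan-style host record into a flat list of findings."""
    findings = []
    ip = host["ip_str"]
    for port in host.get("ports", []):
        if port in RISKY_PORTS:
            findings.append(f"open:{RISKY_PORTS[port]}:{ip}:{port}")
    for cve in sorted(host.get("vulns", [])):
        findings.append(f"cve:{cve}:{ip}")
    for svc in host.get("data", []):  # one entry per observed service banner
        prod, ver = svc.get("product", ""), svc.get("version", "")
        if any(ver.startswith(p) for p in EOL_PREFIXES.get(prod, ())):
            findings.append(f"eol:{prod}/{ver}:{ip}")
    return findings

host = {  # sample record shaped like Shodan's host output
    "ip_str": "203.0.113.10",
    "ports": [443, 3389],
    "vulns": ["CVE-2019-0708"],
    "data": [{"product": "nginx", "version": "1.14.0"}],
}
for finding in triage_host(host):
    print(finding)
```

Run against the full IP list, this produces the raw findings feed that the correlation step later folds into the inventory.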

The key insight here is that Shodan already has this data. You’re not scanning anything. You’re reading a database of scans that happened before you started.

URL Intelligence

GAU aggregates historical URLs from multiple sources: the Wayback Machine, Common Crawl, OTX, and URLScan.io. For a mature organization, this produces tens or hundreds of thousands of URLs spanning years of crawl history.

Raw URL lists are noise. The value is in classification:

  • Auth and session endpoints: login flows, token endpoints, OAuth callbacks. These tell you how authentication works and where to look for weaknesses.
  • API endpoints: patterns like /api/v1/, /graphql, /rest/ reveal the API surface. Version numbers in paths tell you about API lifecycle management (or lack thereof).
  • Sensitive file types: .env, .sql, .bak, .config in URL paths. Most won’t return anything useful, but they tell you what the organization’s deployment hygiene looked like historically.
  • Parameters with sensitive patterns: email addresses, tokens, IDs in query strings. This is where PII leakage lives. If an application ever put a user’s email address in a URL parameter, it’s in the crawl databases forever.
  • Non-production hosts: URLs from staging, dev, preprod, sandbox environments that got indexed. Shadow IT discovery from the application layer instead of the infrastructure layer.
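A classifier for this doesn’t need to be clever; a handful of regex buckets covers most of it. A sketch with illustrative patterns, not a production ruleset:

```python
import re
from urllib.parse import urlparse, parse_qs

# Pattern buckets mirroring the classes above (regexes are illustrative).
PATH_RULES = [
    ("auth",      re.compile(r"/(login|logout|oauth|token|sso)\b", re.I)),
    ("api",       re.compile(r"/(api/v\d+|graphql|rest)\b", re.I)),
    ("sensitive", re.compile(r"\.(env|sql|bak|config)$", re.I)),
]
NON_PROD = re.compile(r"^(staging|dev|preprod|sandbox)\.", re.I)
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.I)

def classify(url):
    """Return the set of classes a historical URL falls into."""
    parsed = urlparse(url)
    tags = {tag for tag, rx in PATH_RULES if rx.search(parsed.path)}
    if NON_PROD.search(parsed.hostname or ""):
        tags.add("non-prod")
    # An email address in a query parameter is potential PII leakage.
    for values in parse_qs(parsed.query).values():
        if any(EMAIL.fullmatch(v) for v in values):
            tags.add("pii-param")
    return tags

print(classify("https://staging.example.com/api/v1/users?email=jane@example.com"))
```

That single hypothetical URL lands in three buckets at once (api, non-prod, pii-param), which is exactly the kind of overlap that makes a raw list worth classifying.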

The URL archive is effectively a historical record of every mistake the application has ever made in how it constructs URLs. Some of those mistakes are fixed. The record of them isn’t.

OSINT Aggregation

theHarvester pulls email addresses, subdomains, IPs, and employee names from a broad range of OSINT sources: search engines, LinkedIn, GitHub, certificate data. This fills in the picture that the technical pipeline misses.

Employee email patterns matter because they’re the credential format attackers use in phishing and credential stuffing. A discovered pattern of firstname.lastname@company.com is intelligence. GitHub user associations can reveal contractors, third-party developers, and, occasionally, credentials or internal tooling that got committed to a public repo.
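Inferring the pattern from a harvested sample is a simple voting exercise. A sketch with hypothetical names and a deliberately short template list:

```python
from collections import Counter

# Common local-part conventions (illustrative, not exhaustive).
TEMPLATES = {
    "first.last": "{first}.{last}",
    "flast":      "{first_initial}{last}",
    "firstlast":  "{first}{last}",
}

def infer_email_pattern(samples):
    """samples: iterable of (first, last, email) tuples.

    Returns the most commonly matching template name, or None.
    """
    votes = Counter()
    for first, last, email in samples:
        local = email.split("@")[0].lower()
        fields = {"first": first.lower(), "last": last.lower(),
                  "first_initial": first[0].lower()}
        for name, template in TEMPLATES.items():
            if template.format(**fields) == local:
                votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

samples = [("Jane", "Doe", "jane.doe@example.com"),
           ("Raj", "Patel", "raj.patel@example.com"),
           ("Ana", "Silva", "asilva@example.com")]
print(infer_email_pattern(samples))  # first.last
```

Once the convention is known, any public employee roster converts directly into a credential-stuffing target list, which is why the pattern itself counts as a finding.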

Correlating the Data

All five sources converge into a single inventory: every subdomain with its discovery source(s), every IP with its associated hosts and Shodan findings, every vendor relationship inferred from DNS, every URL classification, every email address.

The correlation step is where findings that look insignificant in isolation become significant together. A subdomain that appears in CT logs but not in DNS is a ghost: it existed once. A subdomain that resolves to an IP that Shodan says is running EOL software with known CVEs is a finding. A CNAME pointing to a provider that shows “page not found” or “this domain is available” is a critical finding.
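The merge itself is mostly set operations. A compressed sketch of that scoring logic, with simplified inputs (one IP per name, CVEs as the only Shodan signal):

```python
def correlate(ct_subdomains, dns_records, shodan_vulns):
    """Merge per-source data into one inventory and score each host.

    ct_subdomains: set of names seen in CT logs
    dns_records:   {name: ip} for names that currently resolve
    shodan_vulns:  {ip: [CVE, ...]}
    """
    inventory = {}
    for name in ct_subdomains | set(dns_records):
        ip = dns_records.get(name)
        entry = {"sources": [], "ip": ip, "status": "info"}
        if name in ct_subdomains:
            entry["sources"].append("ct")
        if ip:
            entry["sources"].append("dns")
            if shodan_vulns.get(ip):
                entry["status"] = "finding"  # live host with known CVEs
        else:
            entry["status"] = "ghost"        # in CT logs, gone from DNS
        inventory[name] = entry
    return inventory

inv = correlate(
    {"www.example.com", "old.example.com"},
    {"www.example.com": "203.0.113.10"},
    {"203.0.113.10": ["CVE-2019-0708"]},
)
print(inv["old.example.com"]["status"])  # ghost
print(inv["www.example.com"]["status"])  # finding
```

The real correlation step carries more dimensions (URL classes, vendor CNAMEs, email data), but the shape is the same: union the keys, attach per-source evidence, escalate status when signals stack.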

Finding Evidence of Active Reconnaissance

This deserves its own section because it’s the most interesting part of this particular engagement.

URLScan.io is a public service where users submit URLs for analysis. The submissions are logged, timestamped, and publicly searchable. If someone is systematically submitting subdomains belonging to your client for analysis, that’s visible.

The pattern we found was consistent with automated target development: multiple subdomains submitted over a compressed timeframe, in a sequence that suggests enumeration rather than organic discovery. Not a human clicking links, but a tool working through a list. Cross-referencing the submission timestamps against CT log issuance timestamps and other public data points let us establish a timeline and confirm the pattern.
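Detecting that pattern reduces to burst analysis over the submission timestamps. A sketch with illustrative thresholds (the window and count here are placeholders, not the values from the engagement):

```python
from datetime import datetime, timedelta

def looks_automated(timestamps, hosts, window=timedelta(minutes=10),
                    threshold=5):
    """Flag a burst: many distinct hosts submitted inside one window.

    A human browsing organically rarely hits `threshold` different
    subdomains of one organization inside `window`; a tool working
    through a list does.
    """
    events = sorted(zip(timestamps, hosts))
    lo = 0
    for hi in range(len(events)):
        while events[hi][0] - events[lo][0] > window:
            lo += 1                       # slide window start forward
        hosts_in_window = {h for _, h in events[lo:hi + 1]}
        if len(hosts_in_window) >= threshold:
            return True
    return False

# Six distinct hypothetical subdomains submitted 30 seconds apart.
base = datetime(2026, 1, 15, 3, 0)
times = [base + timedelta(seconds=30 * i) for i in range(6)]
hosts = [f"app{i}.example.com" for i in range(6)]
print(looks_automated(times, hosts))  # True
```

The same six submissions spread over a day would not trip the check, which is the distinction between organic lookups and enumeration.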

There’s no log source for “someone looked you up on URLScan.” It doesn’t show up in a SIEM. The only way to find it is to go look, which is the whole point.

What Phase 1 Doesn’t Cover

The passivity that makes this safe to run also defines its ceiling. Specifically:

  • You don’t know what’s running behind Cloudflare. You see the edge, not the origin.
  • You don’t know what the application does, only what URLs it has used historically.
  • You can’t validate whether a vulnerability is actually exploitable, only that the conditions for it exist.
  • You have no view of internal infrastructure, authenticated surfaces, or anything that requires credentials.

Phase 2 (active recon: actually probing the infrastructure) and Phase 3 (vulnerability scanning and validation) go significantly deeper. Phase 1 gives you the map. Phases 2 and 3 tell you what’s on it.

Making It Repeatable

The value of a structured pipeline over ad-hoc tooling is repeatability. The same pipeline run six months later produces a delta: new subdomains, new IPs, new URL patterns, changed Shodan findings. That delta is your monitoring signal.
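The delta computation is the simplest code in the pipeline, which is rather the point. Sketched for the subdomain inventory with hypothetical names; the same shape works for IPs, URLs, and findings:

```python
def surface_delta(previous, current):
    """Diff two inventory snapshots (sets of subdomains) into the
    monitoring signal: what appeared and what disappeared."""
    return {
        "new":  sorted(current - previous),
        "gone": sorted(previous - current),
    }

last_run = {"www.example.com", "api.example.com", "old.example.com"}
this_run = {"www.example.com", "api.example.com", "beta.example.com"}
print(surface_delta(last_run, this_run))
# {'new': ['beta.example.com'], 'gone': ['old.example.com']}
```

Both halves of the delta matter: a new subdomain is new attack surface, and a vanished one may be a decommissioning that left dangling DNS behind.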

This is the direction the tooling is designed for: not just a one-time engagement artifact, but a continuously runnable assessment that answers “what changed in our external attack surface since last time?”

Most organizations don’t know the answer to that question. They should.

Want the Wolf version? Head back to the original post.
