
Overview of URL Filtering – Without Decryption

Palo Alto Networks firewalls use their PAN-DB cloud-based URL filtering database (or an offline PAN-DB private cloud in some cases) to categorize websites and enforce policies based on those categories. Even when SSL/TLS traffic is not decrypted, the firewall can identify the destination URL or domain by analyzing unencrypted metadata in the HTTPS handshake. This allows the firewall to apply URL filtering policies without needing to decrypt the traffic, which is particularly useful in environments where decryption is not enabled due to privacy, performance, or configuration constraints.

When an internal proxy is involved, it acts as an intermediary between the client and the destination server. The proxy may handle the HTTPS connection in different ways (e.g., transparent or explicit proxy), but the firewall can still inspect certain elements of the traffic to determine the URL or domain being accessed. Below, we’ll break down the mechanism, the role of the proxy, and the specific packet header details that enable this functionality.

Mechanism of URL Determination Without Decryption

The key to the Palo Alto firewall’s ability to perform URL filtering on encrypted HTTPS traffic without decryption lies in the SSL/TLS handshake process, specifically the Client Hello packet, which contains the Server Name Indication (SNI) field. Here’s a step-by-step explanation of how this works:

  1. Client Initiates HTTPS Connection:
    • When a user attempts to access a website (e.g., https://www.example.com), their browser sends a Client Hello packet as part of the SSL/TLS handshake to the destination server (or the internal proxy, if configured).
    • The Client Hello packet includes the SNI field, which specifies the hostname of the server the client is trying to reach (e.g., www.example.com). The SNI is sent in plaintext, as it is part of the initial handshake before encryption is established.
  2. Role of the Internal Proxy:
    • Explicit Proxy:
      • In an explicit proxy setup, the client is configured to send all web requests to the proxy server. For HTTPS, the proxy opens a connection to the destination server on the client’s behalf and, when it is not decrypting, relays the client’s Client Hello through that tunnel (a proxy that terminates TLS sends its own Client Hello instead). Either way, the Client Hello leaving the proxy carries the SNI field with the target hostname (e.g., www.example.com).
    • Transparent Proxy:
      • In a transparent proxy setup, the client is unaware of the proxy, and the proxy intercepts the traffic without explicit configuration on the client side. The original Client Hello packet from the client, including the SNI, is forwarded to the destination server, often without modification.
    • In both cases:
      • The firewall, positioned in the network path (e.g., between the proxy and the internet or between the client and the proxy), can inspect the Client Hello packet to extract the SNI.
  3. Firewall Inspects the Client Hello Packet:
    • The Palo Alto firewall captures the Client Hello packet and extracts the SNI field, which contains the hostname (e.g., www.example.com).
    • If the SNI is not present (rare in modern HTTPS traffic, as most browsers and servers support SNI), the firewall may fall back to inspecting the Common Name (CN) in the server’s SSL certificate, which the server sends in the Certificate message that follows the Server Hello. In TLS 1.2 and earlier the certificate is sent in plaintext; in TLS 1.3 it is encrypted, so this fallback is not available for TLS 1.3 sessions that are not decrypted.
  4. URL Categorization Using PAN-DB:
    • The firewall uses the extracted hostname (from SNI or CN) to perform a lookup in its local cache of URL categorizations. If the hostname is not cached, the firewall queries the PAN-DB cloud database to determine the URL category (e.g., “social-media,” “news,” “malware,” etc.).
    • PAN-DB maps hostnames to predefined URL categories, and custom URL categories defined on the firewall can match as well. The resulting category determines the action set in the URL filtering profile configured on the firewall (e.g., allow, block, alert, continue, or override).
  5. Policy Enforcement:
    • Based on the URL category, the firewall applies the configured action (e.g., allow access, block with a response page, or log the request). This is done without decrypting the actual content of the HTTPS session, as the hostname alone is sufficient for categorization in most cases.
    • The firewall logs the URL, category, and action taken in its URL filtering logs, which can be reviewed for monitoring and troubleshooting.
  6. Caching for Performance:
    • To reduce latency, the firewall caches recently accessed URL categorizations in its local data plane and management plane caches. This minimizes the need for repeated cloud lookups for frequently accessed sites.
    • If the firewall cannot connect to PAN-DB (e.g., due to lack of internet connectivity or an expired license), it relies on the local cache. Hostnames it cannot resolve are placed in the “not-resolved” category, and the action configured for that category applies.
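
The lookup, caching, and enforcement steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not how PAN-OS is implemented: the UrlFilter class, the FAKE_CLOUD table, and fake_cloud_lookup are hypothetical names, the category data is made up, and the "allow" default for categories missing from the profile is an assumption made for the demo.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical data standing in for the PAN-DB cloud (not real PAN-DB output).
FAKE_CLOUD: Dict[str, str] = {
    "www.facebook.com": "social-media",
    "www.bbc.com": "news",
}


def fake_cloud_lookup(hostname: str) -> str:
    """Pretend cloud query; hostnames we do not know map to 'unknown'."""
    return FAKE_CLOUD.get(hostname, "unknown")


@dataclass
class UrlFilter:
    actions: Dict[str, str]                 # category -> action (the "profile")
    cloud_lookup: Callable[[str], str]
    cloud_reachable: bool = True
    cache: Dict[str, str] = field(default_factory=dict)
    cloud_queries: int = 0

    def categorize(self, hostname: str) -> str:
        hostname = hostname.lower().rstrip(".")
        if hostname in self.cache:          # step 6: serve from the local cache
            return self.cache[hostname]
        if not self.cloud_reachable:        # no cache entry and no cloud access
            return "not-resolved"
        self.cloud_queries += 1             # step 4: ask the cloud database
        category = self.cloud_lookup(hostname)
        self.cache[hostname] = category
        return category

    def decide(self, hostname: str) -> Tuple[str, str]:
        """Step 5: map the category to the action configured in the profile."""
        category = self.categorize(hostname)
        # Assumption for this sketch: unlisted categories are allowed.
        return category, self.actions.get(category, "allow")


if __name__ == "__main__":
    fw = UrlFilter(
        actions={"social-media": "block", "not-resolved": "alert"},
        cloud_lookup=fake_cloud_lookup,
    )
    print(fw.decide("www.facebook.com"))    # first call queries the fake cloud
    print(fw.decide("www.facebook.com"))    # second call is served from cache
```

Keeping the cache in front of the cloud lookup means a connectivity loss only affects hostnames that were never seen before, which mirrors the behaviour described in step 6.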

Packet Header Details

The critical information used for URL filtering without decryption is found in the SSL/TLS handshake packets, specifically:

  • Client Hello Packet:
    • Server Name Indication (SNI): This is an extension in the TLS protocol that indicates the hostname the client is attempting to connect to. For example, for https://www.example.com, the SNI field would contain www.example.com. The SNI is sent in plaintext and is visible to the firewall.
    • Structure: The Client Hello packet is part of the TLS handshake and occurs immediately after the TCP three-way handshake. It includes fields like the TLS version, cipher suites, and extensions, with the SNI being one of the extensions.

Example:

TLSv1.2 Record Layer: Handshake Protocol: Client Hello
    Handshake Type: Client Hello (1)
    Version: TLS 1.2
    Random: ...
    Session ID: ...
    Cipher Suites: ...
    Extensions:
        server_name: www.example.com

  • Server Hello Packet (Fallback):
    • If the SNI is not present (e.g., in rare cases where the client does not support SNI or custom SSL applications are used), the firewall inspects the Common Name (CN) in the server’s SSL certificate, which the server sends in the Certificate message right after the Server Hello.
    • Common Name (CN): The CN field in the certificate typically contains the hostname or a wildcard (e.g., *.example.com). It is sent in plaintext in TLS 1.2 and earlier; TLS 1.3 encrypts the certificate, so the CN is not visible there without decryption.
    • Limitation: The CN is less granular than the SNI, as it may only provide the root domain or a wildcard, limiting the firewall’s ability to filter specific subdomains or paths.
  • HTTP Host Header (If Decryption Is Enabled):
    • For completeness, if decryption were enabled, the firewall could inspect the HTTP Host header in the HTTP request (e.g., Host: www.example.com). Since this article covers the no-decryption case, this is not applicable here.
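
To make the Client Hello layout concrete, here is a minimal Python sketch that walks the record fields described above and pulls out the server_name extension. It assumes the whole Client Hello arrives in a single TLS record and does no reassembly; extract_sni and build_client_hello are hypothetical helper names, and the builder produces a synthetic, stripped-down message purely so the parser has something to read.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from a raw TLS Client Hello record, or None."""
    try:
        if record[0] != 0x16:             # record type 0x16 = handshake
            return None
        pos = 5                           # skip type (1), version (2), length (2)
        if record[pos] != 0x01:           # handshake type 0x01 = Client Hello
            return None
        pos += 4                          # handshake type (1) + length (3)
        pos += 2 + 32                     # client version (2) + random (32)
        pos += 1 + record[pos]            # session ID
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len             # cipher suites
        pos += 1 + record[pos]            # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:        # server_name extension
                # Layout: list length (2), name type (1), name length (2), name
                if record[pos + 2] != 0:  # name type 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                       # truncated or malformed input
    return None


def build_client_hello(hostname: Optional[str]) -> bytes:
    """Build a synthetic, minimal Client Hello for demonstration only."""
    extensions = b""
    if hostname is not None:
        name = hostname.encode("ascii")
        entry = b"\x00" + struct.pack("!H", len(name)) + name
        sni = struct.pack("!H", len(entry)) + entry
        extensions += struct.pack("!HH", 0x0000, len(sni)) + sni
    body = (
        b"\x03\x03"                               # client version: TLS 1.2
        + bytes(32)                               # random (zeros for the demo)
        + b"\x00"                                 # empty session ID
        + struct.pack("!H", 2) + b"\x00\x2f"      # a single cipher suite
        + b"\x01\x00"                             # null compression only
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    print(extract_sni(build_client_hello("www.example.com")))
    print(extract_sni(build_client_hello(None)))
```

The parser returns None both when the extension is absent and when the input is malformed, which is the point at which a real device would have to fall back to other metadata.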

Impact of Internal Proxy

The internal proxy’s role in the network affects how the firewall sees the traffic:

  • Explicit Proxy (discussed further later):
    • The client sends an HTTP CONNECT request to the proxy (e.g., CONNECT www.example.com:443 HTTP/1.1). The proxy then opens a connection to the destination server, and the Client Hello packet carrying the SNI travels over that connection.
    • The firewall, positioned between the proxy and the internet, sees the Client Hello packet from the proxy and extracts the SNI to determine the URL.
    • If the firewall is between the client and the proxy, it may see the HTTP CONNECT request, which includes the hostname in plaintext, allowing URL identification even before the TLS handshake.
  • Transparent Proxy (the focus of this article):
    • The proxy intercepts the client’s HTTPS traffic without explicit configuration. The original Client Hello packet from the client, including the SNI, is forwarded to the destination server.
    • The firewall sees the same Client Hello packet as it would without a proxy, allowing it to extract the SNI and perform URL filtering.
  • Proxy Challenges:
    • If the proxy rewrites or obscures the SNI (rare in standard configurations), the firewall may rely on the CN or other metadata, which could reduce accuracy.
    • If the proxy uses a single IP address for multiple destinations (e.g., a reverse proxy or CDN), the firewall may only see the proxy’s IP, but the SNI still provides the actual hostname.
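
For the explicit-proxy case where the firewall sits between the client and the proxy, the hostname can be read straight from the CONNECT request line. The following Python sketch shows one way to do that; parse_connect_target is a hypothetical helper written for this article, and it only handles the request line, not the remaining headers.

```python
from typing import Optional, Tuple


def parse_connect_target(request: bytes) -> Optional[Tuple[str, int]]:
    """Return (host, port) from an HTTP CONNECT request line, else None."""
    line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = line.split(" ")
    if len(parts) != 3 or parts[0] != "CONNECT" or not parts[2].startswith("HTTP/"):
        return None
    # The CONNECT target has the form host:port; split on the last colon
    # so bracketed IPv6 literals such as [::1]:443 are handled as well.
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    return host.strip("[]").lower(), int(port)


if __name__ == "__main__":
    raw = b"CONNECT www.example.com:443 HTTP/1.1\r\nHost: www.example.com:443\r\n\r\n"
    print(parse_connect_target(raw))
```

Because the CONNECT line is sent before any TLS bytes, this gives the hostname even when the Client Hello that follows carries no usable SNI.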

Limitations and Considerations

While SNI-based URL filtering is effective, it has some limitations that networking professionals should be aware of:

  1. SNI Absence:
    • Some clients (e.g., older browsers or custom applications) may not include SNI in the Client Hello packet. In such cases, the firewall falls back to the CN, which may be less specific (e.g., a wildcard like *.example.com).
    • Certain non-browser applications may send data before the Client Hello, causing the firewall to classify the traffic as unknown-tcp, bypassing URL filtering.
  2. Granularity:
    • Without decryption, the firewall can only filter based on the hostname (e.g., www.example.com) and not specific paths (e.g., www.example.com/politics). Path-based filtering requires decryption to inspect the HTTP request.
  3. Encrypted SNI (ESNI/ECH):
    • Modern protocols like Encrypted Client Hello (ECH) (successor to Encrypted SNI) encrypt the SNI field, making it inaccessible to the firewall. While ECH adoption is still limited as of 2025, it could reduce the effectiveness of non-decrypted URL filtering in the future.
  4. CDN and Shared IPs:
    • Websites hosted on Content Delivery Networks (CDNs) may share IP addresses, but the SNI field allows the firewall to distinguish between different hostnames served by the same IP.
  5. Proxy Configuration:
    • If the proxy is configured to bypass the firewall or use non-standard ports, URL filtering may not work as expected. Ensure the firewall is in the path of all internet-bound traffic.
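
Two of these limitations, hostname-only granularity and the coarseness of a wildcard CN, are easy to see in a short Python sketch. The function names are hypothetical and the wildcard rule is simplified to "one extra left-most label", which is an assumption for the demo rather than a full certificate-matching implementation.

```python
from urllib.parse import urlsplit


def visible_without_decryption(url: str) -> str:
    """Only the hostname is exposed by the handshake; the path is not."""
    return urlsplit(url).hostname or ""


def cn_matches(cn: str, hostname: str) -> bool:
    """Simplified check of a certificate CN against a hostname."""
    cn, hostname = cn.lower(), hostname.lower()
    if cn.startswith("*."):
        suffix = cn[1:]                     # "*.example.com" -> ".example.com"
        head = hostname[: -len(suffix)]     # the label(s) the wildcard stands for
        return hostname.endswith(suffix) and head != "" and "." not in head
    return cn == hostname


if __name__ == "__main__":
    # Two different pages collapse to the same filterable value.
    print(visible_without_decryption("https://www.example.com/politics"))
    print(visible_without_decryption("https://www.example.com/sports"))
    # One wildcard CN covers unrelated subdomains, so it cannot tell them apart.
    print(cn_matches("*.example.com", "mail.example.com"))
    print(cn_matches("*.example.com", "shop.example.com"))
```

Both URLs reduce to www.example.com, and both subdomains satisfy the same wildcard, which is why path-level or subdomain-level policy needs either the SNI or decryption.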

Practical Example

  1. The client sends an HTTP CONNECT request to the proxy: CONNECT www.facebook.com:443 HTTP/1.1.
  2. The proxy opens a connection to www.facebook.com, and the Client Hello packet with server_name: www.facebook.com is sent toward the server.
  3. The Palo Alto firewall, positioned between the proxy and the internet, captures the Client Hello packet and extracts the SNI (www.facebook.com).
  4. The firewall queries PAN-DB (or checks its local cache) and determines that www.facebook.com belongs to the “social-media” category.
  5. Based on the URL filtering profile (e.g., block “social-media”), the firewall blocks the connection and logs the attempt.

Best Practices

  1. Verify Proxy Configuration:
    • Ensure the proxy is configured to allow the firewall to inspect Client Hello packets (e.g., no SNI rewriting or tunneling that obscures the hostname).
    • For explicit proxies, confirm that HTTP CONNECT requests are visible to the firewall if positioned between the client and proxy.
  2. Use Test A Site:
    • Use Palo Alto’s Test A Site tool (urlfiltering.paloaltonetworks.com) to check how PAN-DB categorizes specific URLs and request recategorization if needed.
  3. Enable Logging:
    • Configure the URL filtering profile to log all categories (e.g., set to “alert” for allowed categories) to gain visibility into user web activity. Use the “Log container page only” option to reduce log volume.
  4. Combine with App-ID:
    • Use Palo Alto’s App-ID in conjunction with URL filtering to improve accuracy, especially for non-web traffic or when decryption is not enabled.
  5. Monitor for ESNI/ECH:
    • Stay informed about the adoption of Encrypted Client Hello (ECH). If ECH becomes widespread, consider selective SSL decryption for critical URL categories to maintain filtering effectiveness.
  6. Cache Management:
    • Ensure the firewall has internet connectivity for PAN-DB lookups, as cached entries may expire or become outdated.

Conclusion

Palo Alto Networks firewalls can determine the hostnames accessed through an internal proxy without decryption by inspecting the Server Name Indication (SNI) field in the Client Hello packet or, as a fallback, the Common Name (CN) in the server’s SSL certificate. The SNI is sent in plaintext during the SSL/TLS handshake, and the certificate is sent in plaintext in TLS 1.2 and earlier, so the firewall can identify the hostname and query the PAN-DB database for its URL category. The internal proxy (explicit or transparent) does not significantly alter this process, as the firewall can still see the SNI in the Client Hello packet leaving the proxy or the hostname in the HTTP CONNECT request. By leveraging these mechanisms, the firewall enforces URL filtering policies efficiently at the hostname level, even for encrypted HTTPS traffic.
