Regex: The Unsung Hero Behind Modern VA Tools


When we talk about Vulnerability Assessment and Penetration Testing (VAPT), the first things that come to mind are tools like Burp Suite, ZAP, Nmap, Nuclei, and SQLMap, used for vulnerability assessment, false-positive removal, and manual penetration testing. However, one unsung hero makes many of these tools powerful and helps penetration testers during manual testing: regex (regular expressions). Often seen as complex and confusing, regex is in fact an indispensable tool for cybersecurity professionals, from VAPT analysts to red teamers. It is a concise, powerful language for finding and manipulating text patterns, turning a tedious manual search into a few lines of code.

Regex provides a search pattern that can be used for string matching, extraction, and validation. In the context of VAPT, regex becomes a weapon for quickly identifying vulnerabilities, sensitive data leaks, secrets, and anomalies in large volumes of responses, JS files, HTML files, and similar content.
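
The three roles mentioned above (matching, extraction, and validation) can be sketched in a few lines of Python. The response body and the values in it are made up purely for illustration; the patterns are the AWS key, email, and PAN patterns from this article.

```python
import re

# Hypothetical response body, used only for illustration.
body = 'var cfg = {"user": "ops@example.com", "key": "AKIAABCDEFGHIJKLMNOP"};'

# Matching: does anything in the body look like an AWS access key ID?
has_key = re.search(r"AKIA[0-9A-Z]{16}", body) is not None

# Extraction: pull out every email-like string.
emails = re.findall(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Za-z]{2,}", body)

# Validation: does a whole string follow the PAN layout
# (5 letters, 4 digits, 1 letter)? fullmatch anchors both ends.
is_pan = re.fullmatch(r"[A-Z]{5}[0-9]{4}[A-Z]", "ABCDE1234F") is not None

print(has_key, emails, is_pan)
```

Note that `search` and `findall` look anywhere in the text, while `fullmatch` requires the entire string to fit the pattern, which is what validation usually needs.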

Regex has many uses in VAPT and red teaming. Some of the reasons it is so useful are listed below:

  • Automation-ready: Regex is the backbone of Burp matchers, Nuclei templates, and GitHub secret scanning tools. 
  • Time and noise reduction: Instead of manually digging through logs, regex surfaces only what matters. 
  • Customizable: You can write custom patterns to match unique tokens or proprietary formats. 
  • Cross-tool usage: Works in Burp, ZAP, Nuclei, PowerShell, grep, Python scripts, Go tools, and even in browser devtools.
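
As a rough illustration of the noise-reduction point, the sketch below keeps only the log lines that assign a value to a sensitive-looking key. The keyword list, the function name `filter_lines`, and the sample lines are assumptions made for this example, not a recommended ruleset.

```python
import re

# Assumed keyword list for illustration; tune it to the application under test.
INTERESTING = re.compile(r"(?i)\b(password|secret|token|api[_-]?key)\b\s*[:=]")

def filter_lines(lines):
    """Return only the lines that assign a value to a sensitive-looking key."""
    return [line for line in lines if INTERESTING.search(line)]

sample = [
    "GET /index.html 200",
    "DEBUG api_key=abc123",
    "user login ok",
    "Password: hunter2",
]
print(filter_lines(sample))
```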

Regex can be used with many tools. Its use with some of them is described below:

1. Utilization of Regex in Burp Suite: 

Burp Suite is widely used to test web applications, mobile applications, APIs, and thick clients. Burp Suite allows regex in:

  • Proxy Match & Replace 
  • Search Feature 
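
Conceptually, a regex Match & Replace rule is a substitution applied to the raw message. The Python sketch below is only an analogy for that idea, not how Burp implements it; the helper `apply_rule`, the rule itself, and the sample request are hypothetical.

```python
import re

def apply_rule(raw_message, match, replace):
    """Apply one regex substitution to a raw HTTP message (illustrative only)."""
    # MULTILINE lets ^ anchor at the start of each header line.
    return re.sub(match, replace, raw_message, flags=re.MULTILINE)

request = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Mozilla/5.0\r\n\r\n"

# [^\r\n]* stops before the line ending, so the CRLF is left untouched.
modified = apply_rule(request, r"^User-Agent:[^\r\n]*", "User-Agent: vapt-test")
print(modified)
```

Using `[^\r\n]*` instead of `.*` matters here because `.` would also consume the carriage return at the end of the header line.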

Below are some regex patterns that can be used for different purposes:

a. API Keys Identification/Enumeration:
AIza[0-9A-Za-z\-_]{35} # Google API Key
AKIA[0-9A-Z]{16} # AWS Access Key
xox[baprs]-([0-9a-zA-Z]{10,48})? # Slack Token
"api[_-]?key"\s*[:=]\s*["']?[0-9a-zA-Z\-_]{16,45} # Generic API Key
ya29\.[0-9A-Za-z\-_]+ # Google OAuth Token
sk_live_[0-9a-zA-Z]{24} # Stripe Secret Key
sk_test_[0-9a-zA-Z]{24} # Stripe Test Key
(?i)firebase[_-]?api[_-]?key\s*[:=]\s*["']?[A-Za-z0-9\-_]{30,}["']? # Firebase API Key
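
Outside Burp, the same patterns can be run over saved responses or JS files with a short script. The following is a minimal sketch, assuming a hand-picked subset of the patterns above; the dictionary `SECRET_PATTERNS`, the function `scan`, and the sample values are hypothetical names and obviously fake data.

```python
import re

# A subset of the patterns above, keyed by a label chosen for this example.
SECRET_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z\-_]{35}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "stripe_live_key": re.compile(r"sk_live_[0-9a-zA-Z]{24}"),
}

def scan(text):
    """Return (label, offset, matched_text) tuples ordered by position."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((name, m.start(), m.group(0)))
    return sorted(findings, key=lambda f: f[1])

# Fake values that merely have the right shape.
sample = 'aws="AKIAABCDEFGHIJKLMNOP"; g="AIza' + "A" * 35 + '"'
for name, offset, value in scan(sample):
    print(name, offset, value)
```

Keeping a label next to each pattern makes it easier to triage hits, since a match on its own does not prove the value is a live secret.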

b. Private Keys Identification/Enumeration:
-----BEGIN (RSA|EC|DSA|OPENSSH) PRIVATE KEY-----

c. Cookies/Tokens Identification/Enumeration:
eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9._-]+\.[A-Za-z0-9._-]+ # JWT
(?i)Authorization:\s*Bearer\s+[A-Za-z0-9\-_.]+ # Bearer header
(?i)Set-Cookie:\s*[^;=]+=[^;\r\n]+ # Any cookie set
(?i)refresh[_-]?token["']?\s*[:=]\s*["'][^"']{8,}["'] # Refresh token fields
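
A JWT-shaped match is only a candidate. One way to cut false positives, sketched below under the assumption that a real JWT header decodes to a JSON object with an `alg` field, is to base64url-decode the first segment of each hit. The names `JWT_RE`, `b64url_decode`, and `find_jwts` are hypothetical, and the token is built locally for the demo; no signature is verified.

```python
import base64
import json
import re

# Slightly stricter variant of the JWT pattern: three dot-separated segments.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")

def b64url_decode(segment):
    """Decode base64url text, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

def find_jwts(text):
    """Return (token, header_dict) for candidates whose header is valid JSON."""
    results = []
    for candidate in JWT_RE.findall(text):
        header_b64 = candidate.split(".")[0]
        try:
            header = json.loads(b64url_decode(header_b64))
        except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
            continue
        if isinstance(header, dict) and "alg" in header:
            results.append((candidate, header))
    return results

def _encode(obj):
    """Helper for the demo: JSON-encode and base64url-encode without padding."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()

demo_token = _encode({"alg": "HS256", "typ": "JWT"}) + "." + _encode({"sub": "1"}) + ".c2ln"
print(find_jwts("Authorization: Bearer " + demo_token + "\r\n"))
```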

d. SQL Errors Identification/Enumeration:
SQL syntax.*MySQL|Warning.*mssql_|ORA-\d+|PostgreSQL.*ERROR

e. Server Errors Identification/Enumeration:
(Exception in thread|Traceback \(most recent call last\)|NullReferenceException|System\.IO\.File)

f. Stack Traces Identification/Enumeration:
(at .* in .*\.cs:line \d+)|(Caused by: .*\.\w+Exception)
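
The SQL error, server error, and stack trace signatures can be combined into a small classifier for response bodies. This is a sketch only: the labels, the function `classify`, and the sample bodies are assumptions, and the patterns are written with wildcards and literal dots spelled out explicitly.

```python
import re

# Label -> signature; labels are arbitrary names chosen for this example.
ERROR_SIGNATURES = {
    "sql_error": re.compile(
        r"SQL syntax.*MySQL|Warning.*mssql_|ORA-\d+|PostgreSQL.*ERROR"
    ),
    "server_error": re.compile(
        r"Exception in thread|Traceback \(most recent call last\)"
        r"|NullReferenceException|System\.IO\.File"
    ),
    "stack_trace": re.compile(
        r"at .* in .*\.cs:line \d+|Caused by: .*\.\w+Exception"
    ),
}

def classify(body):
    """Return the labels of every signature found in a response body."""
    return [name for name, pattern in ERROR_SIGNATURES.items() if pattern.search(body)]

print(classify("You have an error in your SQL syntax; check the manual "
               "that corresponds to your MySQL server version"))
print(classify("Caused by: java.lang.NullPointerException"))
```

Tagging responses this way during fuzzing makes it quicker to spot which payloads triggered verbose errors worth a manual look.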

g. Email Address Identification/Enumeration:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Za-z]{2,}

h. Card Identification/Enumeration:
(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})
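
Long digit runs produce many false positives with the card pattern above. A common way to filter them, sketched here as one possible approach, is to keep only candidates that pass the Luhn checksum. The names `CARD_RE`, `luhn_ok`, and `find_cards` are hypothetical, word boundaries are added as an assumption, and the sample numbers are widely published test card numbers.

```python
import re

# The Visa/Mastercard pattern from above, wrapped in word boundaries.
CARD_RE = re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b")

def luhn_ok(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def find_cards(text):
    """Return card-shaped numbers that also pass the Luhn check."""
    return [m for m in CARD_RE.findall(text) if luhn_ok(m)]

sample = "order 4111111111111111 ref 4111111111111112 mc 5555555555554444"
print(find_cards(sample))
```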

i. Phone Number Identification/Enumeration:
\+?[0-9]{10,13}

j. API Path Identification/Enumeration:
\/api\/[a-zA-Z0-9_\/-]+

k. URLs Identification/Enumeration:
https?:\/\/[a-zA-Z0-9.-]+(?:\/[^\s"']*)?

l. Exposed Configuration Files Identification:
(\.env|\.git|\.bak|\.sql|\.config|\.yml|\.ini|\.xml)

m. Subdomain Identification/Enumeration:
(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,} # Generic domain
([a-zA-Z0-9_-]+\.target\.com) # Target-specific subdomains
https?:\/\/([a-zA-Z0-9_-]+\.[a-z]+\.[a-z]{2,}) # Subdomain inside URL

n. Aadhaar and PAN Card Identification/Enumeration:
[A-Z]{5}[0-9]{4}[A-Z]{1} # Indian PAN Card
[0-9]{12} # Indian Aadhaar

o. Credentials/Connection String:
(?i)(db|database|user(name)?|pwd|pass(word)?)\s*[:=]\s*["'][^"']{3,}["'] # Generic creds
(?i)jdbc:[a-z:]+://[^\s"'<>]+ # JDBC URL
(?i)mongodb(?:\+srv)?:\/\/(?:[^\s:@]+)(?::[^\s@]+)?@[^\s/]+/[^\s"']* # Mongo URI
(?i)postgres(?:ql)?:\/\/[\w.-]+(?::\d+)?\/[\w.-]+ # Postgres URI

(?i)mysql:\/\/[\w.-]+(?::\d+)?\/[\w.-]+ # MySQL URI
(?i)redis:\/\/[\w.-]+(?::\d+)? # Redis URI
(?i)amqp:\/\/[\w.:@\/-]+ # AMQP
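
Once a connection string is found, it usually needs to be triaged and reported without copying the password around. The sketch below uses one combined pattern (an assumption, looser than the individual patterns above) and the standard library's `urlsplit` to summarize each hit. `URI_RE`, `summarize`, and the sample values are hypothetical.

```python
import re
from urllib.parse import urlsplit

# One loose pattern covering several schemes; an assumption for this example.
URI_RE = re.compile(
    r"(?i)\b(?:mongodb(?:\+srv)?|postgres(?:ql)?|mysql|redis|amqp)://[^\s\"'<>]+"
)

def summarize(text):
    """Return scheme, host, user, and a password flag for each URI found."""
    rows = []
    for uri in URI_RE.findall(text):
        parts = urlsplit(uri)
        rows.append({
            "scheme": parts.scheme,
            "host": parts.hostname,
            "user": parts.username,
            # Record only whether a password is embedded, never its value.
            "has_password": parts.password is not None,
        })
    return rows

sample = 'DB_URL="mongodb+srv://app:s3cr3t@cluster0.example.net/prod" CACHE=redis://10.0.0.9:6379'
for row in summarize(sample):
    print(row)
```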

p. Network/Infrastructure:
\b(?:\d{1,3}\.){3}\d{1,3}\b # IPv4
\b(?:(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4})\b # IPv6 (simple)
\b(?:[0-9a-f]{7,40})\b # Git commit-ish
(?i)server:\s*(nginx|apache|iis|gunicorn|caddy)[^\r\n]* # Server header
(?i)x-powered-by:\s*[^\r\n]* # X-Powered-By header
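
The simple IPv4 pattern also matches strings such as version numbers whose parts exceed 255. A minimal sketch of one way to tighten this is to hand each candidate to Python's standard `ipaddress` module; `IPV4_RE`, `find_ipv4`, and the sample text are hypothetical.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_ipv4(text):
    """Return IPv4Address objects for candidates that are valid addresses."""
    found = []
    for candidate in IPV4_RE.findall(text):
        try:
            found.append(ipaddress.IPv4Address(candidate))
        except ipaddress.AddressValueError:
            continue  # e.g. an octet above 255
    return found

sample = "internal host 10.0.0.5, version 999.300.1.2, public 8.8.8.8"
for ip in find_ipv4(sample):
    # is_private helps flag internal addresses leaked in responses.
    print(ip, ip.is_private)
```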

2. Utilization of Regex in Nuclei (Custom Templates):
Nuclei uses regex extensively in its matchers. Example for detecting AWS key leakage:

id: aws-access-key-leak
info:
  name: AWS Access Key ID Disclosure
  author: ashutosh
  severity: high
  description: Detects exposed AWS Access Key IDs in HTTP responses
  tags: aws, credentials, exposure, secrets, cloud
requests:
  - method: GET
    path:
      - "{{BaseURL}}"
    matchers:
      - type: regex
        part: body
        regex:
          - '(?i)(AKIA|ASIA|A3T[A-Z0-9])[A-Z0-9]{16}'
    extractors:
      - type: regex
        part: body
        regex:
          - '(?i)(AKIA|ASIA|A3T[A-Z0-9])[A-Z0-9]{16}'
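
For readers less familiar with Nuclei, the roles of the two blocks in this template can be approximated in Python: the matcher decides whether the template reports a finding, and the extractor returns the matched values. This is only an analogy with a hypothetical `evaluate` function, not how Nuclei is implemented.

```python
import re

# Same pattern as in the template above.
AWS_KEY_RE = re.compile(r"(?i)(AKIA|ASIA|A3T[A-Z0-9])[A-Z0-9]{16}")

def evaluate(body):
    """Return (matched, extracted) for one response body."""
    matched = AWS_KEY_RE.search(body) is not None                 # matcher role
    extracted = [m.group(0) for m in AWS_KEY_RE.finditer(body)]   # extractor role
    return matched, extracted

print(evaluate('key = "AKIAABCDEFGHIJKLMNOP"'))
print(evaluate("nothing interesting"))
```

Because the pattern carries `(?i)`, lowercase strings of the same shape will also match, which is worth keeping in mind when reviewing hits.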

3. Utilization of Regex in GitHub Dorking:
GitHub is a goldmine for leaked secrets and domains. Although GitHub dorking is generally used during red teaming, it is good practice to include it in VAPT methodologies as well. Regex supercharges dorking by extracting structured data.
Below are some regex patterns that can be used on GitHub for different purposes:

a. Secrets in GitHub:
"api[_-]?key"\s*[:=]\s*["']?[0-9a-zA-Z\-_]{16,45} # API Keys
"password"\s*[:=]\s*["'][^"']+ # Hardcoded password
"secret"\s*[:=]\s*["'][^"']+ # Hardcoded secret
"token"\s*[:=]\s*["'][A-Za-z0-9\-_]{20,} # Token
AKIA[0-9A-Z]{16} # AWS key leaks

b. Subdomain Identification/Enumeration in Repositories:
([a-zA-Z0-9_-]+\.(target\.com))
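
After cloning or downloading matching files, the target-specific pattern can be turned into a deduplicated subdomain list. The sketch below is one way to do that; `subdomain_pattern`, `extract_subdomains`, and the sample text are hypothetical, and the pattern is extended (as an assumption) to allow multi-level subdomains and mixed case.

```python
import re

def subdomain_pattern(domain):
    """Build a case-insensitive pattern for subdomains of the given domain."""
    # re.escape keeps the dots in the domain literal.
    return re.compile(r"(?i)\b((?:[a-z0-9_-]+\.)+" + re.escape(domain) + r")\b")

def extract_subdomains(text, domain):
    """Return a sorted, lowercased, deduplicated list of subdomains."""
    return sorted({m.lower() for m in subdomain_pattern(domain).findall(text)})

sample = (
    'API = "https://api.target.com/v1"; cdn: static.cdn.target.com; '
    "mail: API.target.com; other: nottarget.com"
)
print(extract_subdomains(sample, "target.com"))
```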

Regex is a pentester's Swiss Army knife. It is more than just a text-matching trick; it is a core skill for penetration testers. For anyone involved in cybersecurity, from a novice pentester to a red teamer, a solid understanding of regular expressions is non-negotiable. It is the skill that lets you automate, parse the unparsable, and quickly find the needle in the digital haystack.

By maintaining a personalized regex cheat sheet, pentesters can significantly improve efficiency, cut down false negatives, and spot issues that automated tools may miss.