Regex Library

25 patterns

Data Extraction Regex Patterns

Data extraction patterns find and capture structured information inside unstructured text. Unlike validation patterns, which test a whole string, extraction patterns are usually run with the global flag to pull every occurrence out of a larger string.

Common Use Cases

Log file parsing, web scraping, text processing, ETL pipelines

All Data Extraction Patterns

Hashtag Extraction

Extracts all hashtags (#tag).

#[A-Za-z0-9_]+\b

Mention Extraction

Extracts user mentions (@user).

@[A-Za-z0-9_]+\b
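
As a rough illustration of how the hashtag and mention patterns behave with the global flag, the sketch below runs both over a made-up post. The sample text and variable names are assumptions for the example only.

```javascript
// Hypothetical sample text; the two patterns are the ones listed above.
const post = "Thanks @alice_01 and @bob for the #regex tips! #100DaysOfCode";

// match() with the g flag returns null when nothing matches, so fall back to [].
const hashtags = post.match(/#[A-Za-z0-9_]+\b/g) ?? [];
const mentions = post.match(/@[A-Za-z0-9_]+\b/g) ?? [];

console.log(hashtags); // ["#regex", "#100DaysOfCode"]
console.log(mentions); // ["@alice_01", "@bob"]
```

Note that the mention pattern has no check on what precedes the @, so it would also pick up the domain part of an email address such as @example in user@example.com.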

Number Extraction

Extracts integers and decimals (positive/negative).

-?\d+(?:\.\d+)?
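
A minimal sketch of using this pattern to collect numeric values, assuming the matches should be converted to JavaScript numbers. The sample sentence is invented for illustration.

```javascript
// Hypothetical input; the pattern is the Number Extraction regex above.
const line = "Temp dropped from 21.5 to -3 degrees, a change of 24.5.";

// Extract every match as a string, then convert each to a number.
const numbers = (line.match(/-?\d+(?:\.\d+)?/g) ?? []).map(Number);

console.log(numbers); // [21.5, -3, 24.5]
```

A hyphen directly before a digit is read as a minus sign, so a range written as 3-4 yields 3 and -4.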

URLs in Text

Extracts URLs that start with http://, https://, or a bare www. prefix.

\bhttps?:\/\/[^\s<>"]+|www\.[^\s<>"]+
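
The sketch below shows one way this pattern might be applied to running prose. Because the character class only stops at whitespace, angle brackets, and double quotes, a match can end with sentence punctuation; the trimming step is an assumption added for this example, not part of the pattern.

```javascript
// Pattern copied from the entry above; the sample text is made up.
const urlPattern = /\bhttps?:\/\/[^\s<>"]+|www\.[^\s<>"]+/g;
const text = "Docs: https://example.com/docs?page=2, mirror at www.example.org.";

// Assumption: trailing sentence punctuation is not part of the URL, so strip it.
const urls = (text.match(urlPattern) ?? []).map((u) => u.replace(/[.,;:!?]+$/, ""));

console.log(urls); // ["https://example.com/docs?page=2", "www.example.org"]
```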

HTML Entity

Matches HTML entities (&nbsp;, &#160;, &#xA0;).

&[a-zA-Z]+;|&#\d+;|&#x[0-9a-fA-F]+;

HTML Comment

Matches HTML comments.

<!--[\s\S]*?-->
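
As an illustration, the same pattern can be used both to list comments and to strip them. The [\s\S] class lets a match span line breaks, and the lazy quantifier stops at the first closing marker. The HTML string below is a made-up sample.

```javascript
// Hypothetical HTML fragment with a multi-line comment and a short one.
const html = "<p>Hello</p><!-- TODO: remove\n this block --><p>World</p><!-- end -->";
const commentPattern = /<!--[\s\S]*?-->/g;

const comments = html.match(commentPattern) ?? [];
const stripped = html.replace(commentPattern, "");

console.log(comments.length); // 2
console.log(stripped); // "<p>Hello</p><p>World</p>"
```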

Markdown Link Extraction

Extracts the text and URL from a Markdown link [text](url).

\[([^\]]+)\]\(([^\)]+)\)
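
A short sketch of reading the two capture groups with matchAll, assuming group 1 is the link text and group 2 the URL as described above. The Markdown sample and the object shape are invented for the example.

```javascript
// Hypothetical Markdown snippet.
const md = "Read the [guide](https://example.com/guide) and the [FAQ](/faq).";
const linkPattern = /\[([^\]]+)\]\(([^\)]+)\)/g;

// matchAll yields one match array per link: [full, group1, group2].
const links = [...md.matchAll(linkPattern)].map(([, text, url]) => ({ text, url }));

console.log(links);
// [{ text: "guide", url: "https://example.com/guide" }, { text: "FAQ", url: "/faq" }]
```

The pattern does not look at the character before the opening bracket, so it also matches the [alt](src) part of Markdown image syntax.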

Emoji Extraction

Matches common Unicode emojis in a text.

\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDE4F]|\uD83D[\uDE80-\uDEFF]|[\u2600-\u27BF]

Markdown Link

Extracts or validates Markdown hyperlinks [text](url) whose URL starts with http:// or https://. Captures the link text and the URL.

\[([^\[\]]+)\]\((https?:\/\/[^\s)]+)\)

HTML Tag

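Matches paired HTML tags with their content on a single line. The backreference \1 requires the closing tag name to equal the opening one; nested tags of the same name are not handled.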

<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>.*?<\/\1>
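
The following sketch shows how the backreference pairs each opening tag with a closing tag of the same name. The HTML fragment is a made-up sample, and the result shape is an assumption for illustration.

```javascript
// Hypothetical single-line HTML fragment.
const html = '<h1 class="t">Title</h1><p>Some <b>bold</b> text</p>';
const tagPattern = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>.*?<\/\1>/g;

// Group 1 holds the tag name; index 0 holds the whole element.
const elements = [...html.matchAll(tagPattern)].map((m) => ({ tag: m[1], outer: m[0] }));

console.log(elements.map((e) => e.tag)); // ["h1", "p"]
```

The inner b element is consumed as part of the p match, so it is not reported separately. For anything beyond simple fragments, a real HTML parser is usually the safer choice.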

Markdown Header

Matches Markdown heading lines (H1 to H6). Use the multiline flag (m) so that ^ and $ apply to each line of a document.

^#{1,6}\s+.+$
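
A minimal sketch of pulling headings out of a whole document. It assumes the g and m flags are both set so the anchors apply per line; the sample document and the level/text split are additions for this example.

```javascript
// Hypothetical Markdown document; two lines are deliberately not valid headings.
const doc = "# Title\n\nIntro text\n\n## Section 1\nBody\n####### not a heading\n#NoSpace";

const headings = doc.match(/^#{1,6}\s+.+$/gm) ?? [];

// Derive the heading level from the number of leading # characters.
const parsed = headings.map((h) => {
  const level = h.match(/^#+/)[0].length;
  return { level, text: h.slice(level).trim() };
});

console.log(parsed);
// [{ level: 1, text: "Title" }, { level: 2, text: "Section 1" }]
```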

Log Level Prefix

Matches standard log level prefixes at the start of a log line. Use the multiline flag (m) when scanning text that contains several lines.

^\[(DEBUG|INFO|WARN|WARNING|ERROR|FATAL|CRITICAL)\]
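
As a sketch of one possible use, the code below counts log lines per level. It assumes a multi-line log string and the g and m flags; the log content is invented.

```javascript
// Hypothetical log excerpt; the fourth line has a level that is not at line start.
const log = [
  "[INFO] Server started",
  "[WARN] Disk at 91%",
  "[ERROR] Connection lost",
  "stack trace [ERROR] not at line start",
  "[INFO] Retrying",
].join("\n");

const levelPattern = /^\[(DEBUG|INFO|WARN|WARNING|ERROR|FATAL|CRITICAL)\]/gm;

// Tally occurrences of each captured level name.
const counts = {};
for (const [, level] of log.matchAll(levelPattern)) {
  counts[level] = (counts[level] ?? 0) + 1;
}

console.log(counts); // { INFO: 2, WARN: 1, ERROR: 1 }
```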

Extract IPv4 Addresses from Text

Extracts all valid IPv4 addresses from a block of text (use with global flag).

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
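
The sketch below illustrates the range check built into each octet: an address with an octet above 255 is skipped rather than partially matched. The log line is a made-up sample.

```javascript
// Hypothetical log line containing two valid addresses and one invalid one.
const text = "Failed login from 192.168.1.10, retry from 10.0.0.256 and 8.8.8.8";

const ipv4Pattern =
  /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/g;

const addresses = text.match(ipv4Pattern) ?? [];

console.log(addresses); // ["192.168.1.10", "8.8.8.8"]
```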

Markdown Image

Matches Markdown image syntax ![alt](src). Captures alt text and source URL.

!\[([^\]]*)\]\(([^)]+)\)

Markdown Bold Text

Matches **bold** Markdown text. Captures the inner text in group 1.

\*\*([^*\n]+)\*\*

Markdown Italic Text

Matches *italic* Markdown text without matching bold (**) text.

(?<!\*)\*([^*\n]+)\*(?!\*)

Markdown Fenced Code Block

Matches ``` fenced code blocks with optional language hint. Captures language and code.

```([a-zA-Z0-9+#-]*)\n([\s\S]*?)```

HTML Self-closing Tag

Matches XHTML-style self-closing tags (e.g. <img />, <br />). Captures tag name and attributes.

<([a-zA-Z][a-zA-Z0-9-]*)([^>]*?)\/>

Double-quoted String

Matches a double-quoted string with support for escaped quotes inside.

"(?:[^"\\]|\\.)*"

Single-quoted String

Matches a single-quoted string supporting escaped single quotes.

'(?:[^'\\]|\\.)*'
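
A brief sketch of both quoted-string patterns applied to one line of source-like text. The alternation \\. consumes a backslash together with the character after it, which is what keeps an escaped quote from ending the match. The input is a made-up sample written with String.raw so the backslashes appear literally.

```javascript
// Hypothetical source line: print('it\'s', "a \"b\"")
const src = String.raw`print('it\'s', "a \"b\"")`;

const doubleQuoted = src.match(/"(?:[^"\\]|\\.)*"/g) ?? [];
const singleQuoted = src.match(/'(?:[^'\\]|\\.)*'/g) ?? [];

console.log(doubleQuoted); // one match: "a \"b\""
console.log(singleQuoted); // one match: 'it\'s'
```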

SQL Comment

Matches both single-line (--) and multi-line (/* */) SQL comments.

(--[^\r\n]*|\/\*[\s\S]*?\*\/)
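
As an illustration, this pattern can list or strip comments from a query. The SQL text below is invented for the example.

```javascript
// Hypothetical query with one line comment and one multi-line block comment.
const sql = "SELECT id -- primary key\nFROM users /* active\n   only */ WHERE active = 1;";
const commentPattern = /(--[^\r\n]*|\/\*[\s\S]*?\*\/)/g;

const comments = sql.match(commentPattern) ?? [];
const stripped = sql.replace(commentPattern, "");

console.log(comments); // ["-- primary key", "/* active\n   only */"]
```

The pattern does not track string literals, so a -- or /* sequence inside a quoted SQL string would also be treated as a comment.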

Log Timestamp

Extracts ISO-like log timestamps from text (e.g. 2026-01-15 14:30:45.123).

\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\b
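
A minimal sketch of extracting timestamps and turning them into Date objects. The normalisation step (space to T, comma to dot) and the treatment of values as local time are assumptions for this example; the log line is made up.

```javascript
// Hypothetical log line with two timestamp styles.
const line = "2026-01-15 14:30:45.123 ERROR timeout; previous attempt at 2026-01-15T14:29:58";
const tsPattern = /\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\b/g;

const stamps = line.match(tsPattern) ?? [];

// Assumption: normalise to "YYYY-MM-DDTHH:mm:ss.sss" so Date parses it as local time.
const dates = stamps.map((s) => new Date(s.replace(" ", "T").replace(",", ".")));

console.log(stamps); // ["2026-01-15 14:30:45.123", "2026-01-15T14:29:58"]
```

The pattern checks shape only, so a string such as 2026-13-45 99:99:99 would also match; range validation has to happen after extraction.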

Markdown List Item

Matches a Markdown unordered (-, *, +) or ordered (1.) list item line. Use the multiline flag (m) to match each line of a document.

^\s*(?:[-*+]|\d+\.)\s+.+$

Markdown Blockquote

Matches a single Markdown blockquote line (> text). Use the multiline flag (m) to match each line of a document.

^>\s+.+$

Markdown Table Row

Matches a Markdown table row line (| col1 | col2 |). Use the multiline flag (m) to match each line of a document.

^\|.+\|\s*$

Frequently Asked Questions

What is the difference between validation and extraction regex?

Validation uses ^ and $ anchors to match the entire string. Extraction drops the anchors and uses the global flag (g) to find all matches within a larger text.
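
The difference can be sketched by building both forms from one core pattern. The example reuses the Number Extraction regex from this page; the variable names and sample strings are assumptions for illustration.

```javascript
// Shared core: the Number Extraction pattern from this page.
const core = String.raw`-?\d+(?:\.\d+)?`;

// Validation: anchored, must match the entire string.
const validate = new RegExp(`^${core}$`);
// Extraction: unanchored, global flag finds every occurrence.
const extract = new RegExp(core, "g");

console.log(validate.test("42.5")); // true
console.log(validate.test("price: 42.5")); // false
console.log("price: 42.5, tax: 3".match(extract)); // ["42.5", "3"]
```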

How do I extract all emails from a text?

Use the Email Extraction pattern with the global flag: text.match(/\b[\w.-]+@[\w.-]+\.\w{2,4}\b/g)
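
One detail worth handling is that match() with the global flag returns null, not an empty array, when nothing is found. The sketch below wraps the call in a small helper; the function name extractEmails and the sample strings are hypothetical.

```javascript
// Pattern taken from the answer above.
const emailPattern = /\b[\w.-]+@[\w.-]+\.\w{2,4}\b/g;

// Hypothetical helper: always returns an array, even when there are no matches.
function extractEmails(text) {
  return text.match(emailPattern) ?? [];
}

console.log(extractEmails("Contact ann@example.com or bob.smith@mail.example.org."));
// ["ann@example.com", "bob.smith@mail.example.org"]
console.log(extractEmails("no addresses here")); // []
```

Because the final part is limited to \w{2,4}, addresses whose top-level domain is longer than four characters (for example .museum) are not matched by this pattern.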

How do I extract all URLs from HTML?

Use the URL Extraction pattern: text.match(/https?:\/\/[^\s<>"]+/g)
