How to Use grep, sed, and awk for Text Processing

In the world of command-line text processing, three tools stand as pillars of efficiency and power: grep, sed, and awk. Whether you're managing server logs, parsing configuration files, or extracting data from massive datasets, these utilities transform hours of manual work into seconds of automated processing. They're not just tools—they're essential skills that separate proficient system administrators and developers from those still clicking through text editors.

These three commands form a trinity of text manipulation: grep searches and filters, sed edits and transforms, and awk analyzes and reports. Each has its own philosophy and strengths, yet they complement each other perfectly. Together, they represent decades of Unix philosophy distilled into practical, everyday utilities that remain as relevant today as when they were first created.

Throughout this guide, you'll discover practical techniques for each tool, understand when to use which command, and learn how to combine them for sophisticated text processing workflows. We'll explore real-world examples, examine common patterns, and provide you with a reference you can return to whenever you need to manipulate text efficiently from the command line.

Understanding grep: The Pattern Searching Powerhouse

The global regular expression print (grep) command searches through text line by line, displaying lines that match specified patterns. Its simplicity belies its power—grep can process gigabytes of data in seconds, making it indispensable for log analysis, code searching, and data extraction tasks. The basic syntax follows a straightforward pattern: the command, optional flags, the search pattern, and the file to search.

When working with grep, you're essentially asking the system to find needles in haystacks, but with surgical precision. The tool reads input text, applies pattern matching to each line, and outputs matching results. This line-oriented approach makes grep exceptionally fast and memory-efficient, even with enormous files that would crash text editors.

Essential grep Options and Flags

The true versatility of grep emerges through its command-line options. The -i flag enables case-insensitive searching, perfect when you don't know the exact capitalization of your target text. The -r or -R flag recursively searches through directories, transforming grep into a powerful codebase exploration tool. The -v flag inverts the match, showing lines that don't contain the pattern—incredibly useful for filtering out noise from log files.

grep -i "error" application.log
grep -r "function_name" /path/to/project/
grep -v "DEBUG" system.log
grep -n "TODO" *.js
grep -c "warning" server.log

The -n option displays line numbers alongside matches, essential for pinpointing exact locations in files. The -c flag counts matching lines rather than displaying them, providing quick statistics. For context around matches, -A (after), -B (before), and -C (context) options show surrounding lines, helping you understand what happened before and after an event.
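
For instance, to see a few lines of surrounding context for each match (app.log is a placeholder file name):

grep -A 3 "Exception" app.log
grep -B 2 "shutdown" app.log
grep -C 5 "timeout" app.log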

"The ability to search through thousands of files in milliseconds fundamentally changes how you approach problem-solving in software development and system administration."

Regular Expressions in grep

Regular expressions elevate grep from a simple text finder to a sophisticated pattern matcher. Basic regex metacharacters include the caret ^ for line start, dollar sign $ for line end, dot . for any character, and asterisk * for zero or more repetitions. Character classes like [0-9] match digits, while [a-zA-Z] matches letters.

grep "^Error" logs.txt
grep "failed$" status.log
grep "[0-9]\{3\}-[0-9]\{4\}" contacts.txt
grep -E "error|warning|critical" system.log
grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" access.log

Extended regular expressions, activated with -E or by using egrep, support additional operators like + (one or more), ? (zero or one), and | (alternation). Perl-compatible regular expressions, enabled with -P, provide even more power with lookaheads, lookbehinds, and non-greedy matching. Understanding which regex flavor you're using prevents frustration and unexpected results.
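
As a quick illustration, ? makes the trailing "s" optional and + requires at least one digit (file names are placeholders):

grep -E "warnings?" system.log
grep -E "[0-9]+ ms" perf.log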

Pattern | Meaning | Example | Matches
^pattern | Line starts with pattern | ^Error | Error: file not found
pattern$ | Line ends with pattern | failed$ | Connection failed
. | Any single character | a.c | abc, adc, a3c
* | Zero or more of the previous character | ab*c | ac, abc, abbc
[abc] | Any character in the brackets | [0-9] | Any digit 0-9
[^abc] | Any character not in the brackets | [^0-9] | Any non-digit
\{n,m\} | Between n and m repetitions | [0-9]\{2,4\} | 12, 123, 1234

Practical grep Applications

🔍 Log file analysis represents one of the most common grep use cases. System administrators regularly search through gigabytes of logs to identify errors, track user activity, or diagnose performance issues. Combining grep with pipes allows you to chain multiple filters, progressively narrowing down results until you find exactly what you need.

grep "ERROR" app.log | grep "database"
grep -i "failed login" auth.log | grep -v "test_user"
ps aux | grep "python"
cat access.log | grep "404" | grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"

📊 Code searching becomes dramatically faster with grep. Instead of opening every file in an editor, you can instantly locate function definitions, variable usage, or specific code patterns across an entire project. The recursive option combined with file pattern matching makes this particularly powerful for large codebases.
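
A sketch of that combination, using GNU grep's --include option to limit a recursive search to particular file types (paths and names are placeholders):

grep -rn --include="*.js" "functionName" /path/to/project/
grep -rn --include="*.py" "def main" src/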

🔧 System monitoring often relies on grep to filter process lists, network connections, or system resources. Piping commands like ps, netstat, or df through grep extracts relevant information without the noise of unrelated data. This real-time filtering capability makes grep essential for both interactive troubleshooting and automated monitoring scripts.
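
For example, narrowing netstat or df output to just the lines that matter (the port and device names are illustrative):

netstat -tuln | grep ":443"
df -h | grep "^/dev"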

"Pattern matching isn't just about finding text—it's about asking the right questions of your data and getting immediate, actionable answers."

Mastering sed: The Stream Editor

The stream editor (sed) processes text in a single pass, making it extraordinarily efficient for transformations and substitutions. Unlike interactive editors, sed operates on streams of text, reading input line by line, applying commands, and producing output. This streaming architecture allows sed to handle files of any size with minimal memory consumption, making it perfect for automated text transformations in scripts and pipelines.

At its core, sed executes commands against each line of input. The most common operation—substitution—uses the s command with a syntax that feels familiar if you've used any Unix text editor: s/pattern/replacement/flags. However, sed's capabilities extend far beyond simple find-and-replace operations, encompassing deletion, insertion, transformation, and complex multi-line manipulations.

Basic sed Substitution

The substitution command forms the foundation of most sed operations. The basic format s/old/new/ replaces the first occurrence of "old" with "new" on each line. Adding the g flag (global) replaces all occurrences on each line, not just the first. The i flag makes the search case-insensitive, while numeric flags like 2 replace only the second occurrence.

sed 's/old/new/' file.txt
sed 's/error/ERROR/g' log.txt
sed 's/foo/bar/2' data.txt
sed -i 's/localhost/127.0.0.1/g' config.txt
sed 's/[0-9]\{3\}-[0-9]\{4\}/XXX-XXXX/g' contacts.txt

The -i option edits files in place, modifying the original rather than printing to standard output. This powerful feature requires caution—always test your sed commands without -i first, or use -i.bak to create backup copies. Regular expressions work within sed substitutions just as they do in grep, allowing sophisticated pattern matching and replacement operations.
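
A cautious workflow, using a hypothetical config.txt, is to preview the change, apply it with a backup, and then inspect the difference:

sed 's/localhost/127.0.0.1/g' config.txt | less
sed -i.bak 's/localhost/127.0.0.1/g' config.txt
diff config.txt.bak config.txt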

Advanced sed Operations

💡 Deletion commands remove lines matching patterns or at specific positions. The d command deletes lines, while /pattern/d deletes only lines matching the pattern. Address ranges like 1,10d delete lines 1 through 10, and /start/,/end/d deletes from a line matching "start" to a line matching "end".

sed '1d' file.txt
sed '/^$/d' file.txt
sed '/DEBUG/d' log.txt
sed '1,5d' file.txt
sed '/START/,/END/d' data.txt

✏️ Insertion and appending add new content to files. The i command inserts text before a line, while a appends after. The c command replaces entire lines with new content. These operations work with addresses, allowing you to target specific lines or patterns for modification.

sed '1i\Header Line' file.txt
sed '$a\Footer Line' file.txt
sed '/pattern/i\New line before match' file.txt
sed '/pattern/c\Replacement line' file.txt

🔄 Transformation commands modify characters rather than entire patterns. The y command transliterates characters, similar to the tr command but integrated into sed's line-processing workflow. This proves useful for case conversion, character set transformations, or simple encryption operations.
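
A minimal sketch of y in action, mapping each character in the first set to the character at the same position in the second set:

sed 'y/abc/ABC/' file.txt
sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' names.txt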

"Stream editing isn't about what you can do in an editor—it's about what you can automate across thousands of files without ever opening one."

sed Address Ranges and Patterns

Addresses specify which lines sed commands affect. Numeric addresses like 5 target a specific line, while ranges like 10,20 target multiple lines. The special address $ represents the last line. Pattern addresses like /regex/ target lines matching a regular expression, and you can combine numeric and pattern addresses for precise control.

sed '5s/old/new/' file.txt
sed '10,20s/foo/bar/g' file.txt
sed '/ERROR/s/^/CRITICAL: /' log.txt
sed '/start/,/end/s/old/new/g' file.txt
sed -n '1,10p' file.txt

The -n option suppresses automatic printing, making sed only output what you explicitly tell it to with the p command. This combination creates powerful filtering capabilities, essentially turning sed into a more sophisticated grep. Multiple commands can be chained with -e options or separated by semicolons, allowing complex multi-step transformations in a single pass.
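
For example, printing only matching lines with -n and p, or applying several commands in a single pass (file names are placeholders):

sed -n '/ERROR/p' log.txt
sed -e 's/foo/bar/g' -e '/^#/d' file.txt
sed 's/foo/bar/g; /^#/d' file.txt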

Practical sed Use Cases

Configuration file management becomes trivial with sed. Instead of manually editing dozens of configuration files across multiple servers, sed scripts can update settings consistently and reliably. Whether you're changing port numbers, updating paths, or modifying parameter values, sed handles these tasks with precision and repeatability.

sed -i 's/port=8080/port=9090/g' */config.ini
sed 's/#.*$//' script.sh
sed '/^$/d; /^#/d' config.txt
sed 's/\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)/IP:\1.\2.\3.\4/g' log.txt

Data sanitization and formatting tasks benefit enormously from sed's transformation capabilities. Removing comments from code, standardizing date formats, masking sensitive information, or normalizing whitespace—these routine but tedious tasks become one-liners with sed. The ability to process files without loading them entirely into memory makes sed suitable even for multi-gigabyte datasets.
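
As a sketch of that kind of cleanup, these one-liners collapse runs of whitespace and strip trailing spaces using portable POSIX character classes:

sed 's/[[:space:]][[:space:]]*/ /g' data.txt
sed 's/[[:space:]]*$//' data.txt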

Command | Operation | Example | Effect
s/pattern/replacement/ | Substitute | s/foo/bar/ | Replace first "foo" with "bar"
d | Delete | 5d | Delete line 5
p | Print | /pattern/p | Print matching lines
i\text | Insert | 1i\Header | Insert before line 1
a\text | Append | $a\Footer | Append after the last line
c\text | Change | 5c\New | Replace line 5
y/set1/set2/ | Transliterate | y/abc/ABC/ | Convert a→A, b→B, c→C

"The difference between a novice and an expert isn't knowing more commands—it's knowing which tool solves which problem most elegantly."

Exploring awk: The Text Processing Language

Unlike grep and sed, which are specialized tools, awk is a complete programming language designed for text processing. Named after its creators (Aho, Weinberger, and Kernighan), awk treats input as records (typically lines) divided into fields (typically words or columns). This field-oriented approach makes awk exceptional for working with structured text like CSV files, log files with consistent formats, or any data organized into columns.

The basic awk syntax follows the pattern: awk 'pattern { action }' file. For each input line, awk checks if it matches the pattern, and if so, executes the action. If you omit the pattern, the action applies to all lines. If you omit the action, awk prints matching lines (like grep). This simple structure scales from one-liners to complex programs with variables, functions, and control structures.
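
Concretely, either half can stand alone (log.txt is a placeholder):

awk '/error/' log.txt
awk '{print $2}' log.txt
awk '/error/ {print $2}' log.txt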

Understanding awk Field Processing

Fields form the foundation of awk's power. By default, awk splits each line on whitespace, assigning the first field to $1, the second to $2, and so on. The special variable $0 contains the entire line. The NF variable holds the number of fields, and $NF references the last field regardless of how many exist.

awk '{print $1}' file.txt
awk '{print $1, $3}' data.txt
awk '{print $NF}' file.txt
awk '{print $0}' file.txt
awk -F: '{print $1}' /etc/passwd

The -F option changes the field separator, allowing awk to parse any delimited format. Common separators include colons for /etc/passwd files, commas for CSV data, or tabs for TSV files. Regular expressions work as separators too, enabling sophisticated field splitting based on complex patterns rather than single characters.
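
For instance, a comma for CSV-like data, a tab for TSV, or a regular expression that splits on either commas or semicolons (file names are illustrative):

awk -F',' '{print $2}' data.csv
awk -F'\t' '{print $1, $3}' data.tsv
awk -F'[,;]' '{print $1}' mixed.txt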

awk Patterns and Conditions

🎯 Pattern matching in awk uses regular expressions enclosed in slashes, similar to sed. Patterns can also be conditional expressions using comparison operators like ==, !=, <, >, <=, and >=. Logical operators && (and), || (or), and ! (not) combine conditions for complex filtering.

awk '/error/ {print}' log.txt
awk '$3 > 100 {print $1, $3}' data.txt
awk '$2 == "active" {print $0}' status.txt
awk '$1 ~ /^[A-Z]/ {print}' names.txt
awk 'NR > 1 && $5 < 50 {print $1, $5}' file.txt

📍 Special patterns include BEGIN and END, which execute before processing any input and after processing all input, respectively. These prove invaluable for initialization tasks, printing headers, calculating totals, or performing cleanup operations. The NR variable tracks the current record number, enabling line-number-based processing.

awk 'BEGIN {print "Report"} {print $1} END {print "Done"}' file.txt
awk 'BEGIN {sum=0} {sum+=$3} END {print sum}' data.txt
awk 'NR==5 {print}' file.txt
awk 'NR > 10 && NR < 20 {print}' file.txt

awk Variables and Calculations

Variables in awk require no declaration—simply assign and use them. Arithmetic operations include addition +, subtraction -, multiplication *, division /, modulo %, and exponentiation ^. String concatenation happens automatically when you place variables or strings adjacent to each other. This flexibility makes awk excellent for on-the-fly calculations and data aggregation.

awk '{sum += $3} END {print sum}' numbers.txt
awk '{sum += $2; count++} END {print sum/count}' data.txt
awk '{total += $3; print $1, $3, total}' sales.txt
awk '{max = ($3 > max) ? $3 : max} END {print max}' values.txt

Built-in functions extend awk's capabilities. String functions like length(), substr(), toupper(), and tolower() manipulate text. Mathematical functions include sqrt(), sin(), cos(), and rand(). The printf function provides formatted output similar to C, offering precise control over number formatting and column alignment.

awk '{print length($0), $0}' file.txt
awk '{print toupper($1)}' names.txt
awk '{print substr($1, 1, 3)}' file.txt
awk '{printf "%-10s %5.2f\n", $1, $2}' data.txt

"When grep and sed feel like using a screwdriver, awk is like having an entire workshop—sometimes you need the full toolset."

Advanced awk Techniques

🔢 Associative arrays (also called hash tables or dictionaries) make awk uniquely powerful for data aggregation and analysis. Arrays in awk use string indices, allowing you to count occurrences, group data, or build lookup tables. This capability transforms awk from a line processor into a data analysis tool.

awk '{count[$1]++} END {for (word in count) print word, count[word]}' file.txt
awk '{sum[$1] += $2} END {for (key in sum) print key, sum[key]}' data.txt
awk '{arr[$2]++} END {for (i in arr) if (arr[i] > 5) print i}' log.txt

⚙️ Multi-line processing handles records spanning multiple lines. Changing the record separator with RS allows awk to treat paragraphs, XML elements, or other multi-line structures as single records. The output record separator ORS and output field separator OFS control how awk formats its output.

awk 'BEGIN {RS=""; FS="\n"} {print $1}' file.txt
awk 'BEGIN {OFS=","} {print $1, $2, $3}' data.txt
awk 'BEGIN {ORS=" | "} {print $1}' file.txt

🛠️ Control structures including if-else statements, for loops, and while loops enable complex logic within awk programs. Functions can be defined for reusable code blocks. At this level, awk transcends simple text processing and becomes a full scripting language, though still optimized for line-oriented data.

awk '{if ($3 > 100) print $1, "high"; else print $1, "low"}' data.txt
awk '{for (i=1; i<=NF; i++) sum+=$i; print sum; sum=0}' numbers.txt
awk 'BEGIN {i=1; while (i<=5) {print i; i++}}'
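
User-defined functions, mentioned above, use a function name(args) { ... } block alongside the normal rules; a minimal sketch (temps.txt and its column layout are assumptions):

awk 'function celsius(f) { return (f - 32) * 5 / 9 }
     {print $1, celsius($2)}' temps.txt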

Practical awk Applications

Log analysis represents one of awk's killer applications. Parsing web server logs, application logs, or system logs to extract statistics, identify patterns, or generate reports becomes straightforward with awk's field-processing capabilities. Calculating response times, counting status codes, or tracking user activity—tasks that might require complex scripts in other languages—become elegant one-liners in awk.

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
awk '$9 == 404 {print $7}' access.log
awk '{sum+=$10} END {print sum/NR}' response_times.log
awk -F'[: ]' '{hr=$1; count[hr]++} END {for (h in count) print h, count[h]}' timestamps.log

Data transformation and reformatting tasks showcase awk's versatility. Converting between file formats, extracting specific columns, calculating derived values, or restructuring data—these operations combine awk's pattern matching, field processing, and computational capabilities. Whether you're preparing data for analysis, generating reports, or integrating systems with different formats, awk bridges the gaps efficiently.
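
One common reformatting sketch: assigning a field to itself ($1=$1) forces awk to rebuild the line with the new output separator (the file names are hypothetical):

awk 'BEGIN {OFS=","} {$1=$1; print}' data.txt
awk -F',' '{print $3, $1}' input.csv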

"The best tool is the one that makes your specific task easiest—grep for finding, sed for changing, awk for analyzing."

Combining grep, sed, and awk in Pipelines

The true power of these tools emerges when you combine them in Unix pipelines. Each tool handles its specialty, passing results to the next command in a chain of transformations. This compositional approach—doing one thing well and connecting tools together—represents the Unix philosophy at its finest. Complex data processing workflows that might require hundreds of lines of code in traditional programming languages become concise, readable pipelines.

Pipelines flow from left to right, with each command processing the output of the previous command. The pipe symbol | connects commands, creating a stream of data that flows through multiple transformations. Understanding which tool to use at each stage—grep for filtering, sed for transformation, awk for analysis—enables you to construct efficient, maintainable processing workflows.

Strategic Tool Selection

🎪 Use grep when you need to filter lines based on patterns. Grep excels at finding needles in haystacks, reducing large datasets to relevant subsets. Its speed and simplicity make it ideal as the first stage in a pipeline, quickly eliminating irrelevant data before more complex processing begins.

✂️ Use sed when you need to transform or modify text. Sed's substitution capabilities handle find-and-replace operations, text insertion, deletion, and line-by-line transformations. Place sed in the middle of pipelines to clean, normalize, or restructure data after filtering but before analysis.

📊 Use awk when you need to analyze, calculate, or reformat field-oriented data. Awk's programming capabilities make it perfect for aggregation, statistics, and complex transformations that require variables, arrays, or conditional logic. Position awk at the end of pipelines for final analysis and report generation.

grep "ERROR" app.log | sed 's/^.*ERROR: //' | awk '{count[$0]++} END {for (err in count) print count[err], err}' | sort -rn
cat access.log | grep "404" | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
ps aux | grep "python" | awk '{sum+=$3} END {print "Total CPU:", sum "%"}'

Real-World Pipeline Examples

Web server log analysis demonstrates the power of combined tools. Starting with grep to filter specific status codes or time ranges, piping through sed to extract or transform fields, then using awk to calculate statistics or generate reports—this pattern solves countless real-world problems. The pipeline approach allows you to build complex analyses incrementally, testing each stage before adding the next.

grep "$(date +%d/%b/%Y)" access.log | grep " 200 " | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
cat error.log | grep -i "exception" | sed 's/.*Exception: //' | awk '{count[$0]++} END {for (e in count) print count[e], e}' | sort -rn

System monitoring and reporting benefit from pipeline processing. Extracting process information, calculating resource usage, identifying trends, or generating alerts—these tasks combine filtering, transformation, and analysis in ways that showcase each tool's strengths. The ability to process live data streams makes these pipelines valuable for both interactive troubleshooting and automated monitoring.

ps aux | awk 'NR>1 {cpu[$1]+=$3; mem[$1]+=$4} END {for (user in cpu) printf "%-10s CPU: %5.1f%% MEM: %5.1f%%\n", user, cpu[user], mem[user]}'
df -h | grep "^/dev" | awk '$5+0 > 80 {print $6, $5}'

Performance Considerations

Pipeline efficiency matters when processing large datasets. Each pipe creates a new process, so extremely long pipelines with many stages can introduce overhead. However, the streaming nature of these tools means they process data incrementally, never loading entire files into memory. This makes pipelines surprisingly efficient even for gigabyte-scale datasets.

Order matters in pipelines. Filtering early with grep reduces the data volume for subsequent commands, improving overall performance. Expensive operations like sorting should happen as late as possible, after data has been filtered and reduced. Sometimes a single awk command can replace a complex pipeline, trading readability for efficiency—choose based on your specific needs and constraints.
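
As a sketch, these two pipelines produce the same counts, but the second folds the grep stage into awk and spawns one fewer process (big.log is hypothetical):

grep "ERROR" big.log | awk '{print $5}' | sort | uniq -c
awk '/ERROR/ {print $5}' big.log | sort | uniq -c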

"Mastery isn't memorizing every option—it's understanding the philosophy behind each tool and knowing how to combine them creatively."

Best Practices and Common Patterns

Effective text processing requires more than knowing commands—it demands understanding patterns, anticipating edge cases, and writing maintainable solutions. Whether you're crafting a quick one-liner or building a production script, following best practices ensures your commands work reliably and remain understandable when you revisit them months later.

Testing and Debugging Strategies

Always test commands on sample data before running them on production files. Create small test files that include edge cases: empty lines, special characters, maximum and minimum values, unusual formatting. This practice catches errors early and builds confidence in your commands. For sed's in-place editing, always test without -i first, or use -i.bak to preserve originals.

head -100 large_file.txt > test_sample.txt
sed 's/pattern/replacement/' test_sample.txt | less
awk '{print $1, $2}' test_sample.txt | head -20

Build pipelines incrementally. Start with a simple grep or awk command, verify the output, then add the next stage. This iterative approach makes debugging easier—when something breaks, you know exactly which addition caused the problem. Use tee to capture intermediate results while still passing data through the pipeline, enabling inspection without breaking the flow.
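
For instance, tee can save an intermediate result to a file while still passing it down the pipeline (file names are placeholders):

grep "ERROR" app.log | tee errors.txt | awk '{print $1}' | sort | uniq -c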

Handling Special Characters and Edge Cases

Special characters in regular expressions require escaping. Characters like . * [ ] ^ $ \ | have special meanings and must be escaped with backslashes when you want to match them literally. Different tools have different escaping requirements—what works in grep might need adjustment for sed or awk. When in doubt, test with simple examples first.

grep '\$[0-9]\+\.[0-9]\{2\}' prices.txt
sed 's/\./DOT/g' file.txt
awk -F'\|' '{print $1}' pipe_delimited.txt

Empty lines, leading/trailing whitespace, and inconsistent field separators cause common problems. Use grep -v '^$' to filter empty lines, or handle them explicitly in awk with NF > 0 conditions. Trim whitespace with sed or use awk's automatic field splitting, which handles multiple spaces by default. For inconsistent delimiters, regular expression field separators in awk provide flexibility.
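
A few sketches of those defensive patterns:

grep -v '^$' file.txt
awk 'NF > 0 {print}' file.txt
sed 's/^[[:space:]]*//; s/[[:space:]]*$//' file.txt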

Documentation and Maintainability

Complex commands deserve comments, especially in scripts. Explain the purpose, expected input format, and any assumptions. For long pipelines, break them across multiple lines (a line that ends with a pipe continues onto the next one) and add a comment for each stage. Future you (or your colleagues) will appreciate the clarity when debugging or modifying the code months later.

# Extract error counts by type from application log
grep "ERROR" app.log |                                            # Filter error lines
  sed 's/.*ERROR: //' |                                           # Extract error message
  awk '{count[$0]++} END {for (e in count) print count[e], e}' |  # Count occurrences
  sort -rn |                                                      # Sort by frequency
  head -20                                                        # Top 20 errors

Consider creating shell functions or scripts for frequently used patterns. Rather than remembering complex command sequences, encapsulate them in named functions with clear parameters. This approach promotes reusability, reduces errors, and makes your toolkit more accessible to others on your team.
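
A sketch of that idea as a shell function; the name, arguments, and log format here are made up:

top_errors() {
    grep "ERROR" "$1" | sed 's/.*ERROR: //' | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
top_errors app.log 20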

Security Considerations

Be cautious with user input in text processing commands. Unsanitized input in regular expressions can cause unexpected behavior or even security vulnerabilities. When processing untrusted data, validate input formats first, use fixed strings instead of patterns where possible, and avoid executing commands constructed from user input without thorough sanitization.
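
One defensive sketch, assuming the untrusted value arrives in a shell variable named user_input: -F treats it as a literal string, and -- stops option parsing so a value beginning with a dash cannot be misread as a flag:

grep -F -- "$user_input" data.txt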

File permissions matter when using sed's in-place editing or redirecting output. Ensure you have write permissions before attempting modifications. For sensitive data, be aware that intermediate pipeline stages might expose information in process listings. Consider using temporary files with restricted permissions for sensitive processing tasks.

Frequently Asked Questions

How do I search for a string that contains special characters like dots or slashes?

Escape special characters with a backslash. For example, to search for "192.168.1.1", use grep "192\.168\.1\.1". Alternatively, use grep's -F flag for fixed-string (literal) matching: grep -F "192.168.1.1". This treats the pattern as a literal string rather than a regular expression, automatically handling all special characters.

What's the difference between grep -E, grep -P, and plain grep?

Plain grep uses basic regular expressions (BRE), where some metacharacters like +, ?, and | must be escaped. The -E flag enables extended regular expressions (ERE), treating these characters as special without escaping. The -P flag enables Perl-compatible regular expressions (PCRE), adding advanced features like lookaheads, lookbehinds, and non-greedy quantifiers. For most tasks, -E provides the best balance of power and portability.

How can I modify files in place safely with sed?

Use sed -i.bak 's/old/new/g' file.txt to create a backup with a .bak extension before modifying the original. Always test your sed command without the -i flag first to verify it produces the expected output. For critical files, consider using version control or creating explicit backups before running any in-place modifications. Some systems require sed -i '' 's/old/new/g' (with an empty string after -i) instead of sed -i alone.

When should I use awk instead of sed or grep?

Use awk when you need to work with fields or columns, perform calculations, maintain state across lines (like running totals), or use variables and arrays. If you're just searching for patterns, grep is simpler and faster. If you're doing straightforward text substitution, sed is more appropriate. Awk shines when you need to analyze structured data, aggregate information, or perform complex transformations that require programming logic.

How do I process CSV files that contain commas within quoted fields?

Standard awk struggles with quoted fields containing delimiters. For robust CSV processing, consider specialized tools like csvkit or miller. However, for simple cases, GNU awk (gawk) supports the FPAT variable: awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"} {print $1, $3}'. This pattern treats each field as either a run of non-comma characters or a quoted string. For production CSV processing, dedicated tools handle edge cases more reliably than general-purpose text processors.

Can these tools handle binary files or only text files?

These tools are designed for text processing and may produce unexpected results with binary files. Grep will often skip binary files by default or output a warning. However, grep -a forces binary files to be treated as text, useful for searching in files with mixed content. For true binary file processing, use specialized tools like hexdump, xxd, or strings to extract text from binaries before processing with grep, sed, or awk.