Unix Command Line Text Parsing

I. Identifying the Objective

Before anything can help you parse text, you must understand what you are looking for. Sometimes that's overwhelming, but no amount of pipes will be a magic bullet here.

II. Think 2-dimensionally

Text output is two-dimensional. Most Unix data already has structure that's going to help you. See every command output for the spreadsheet that it is. You have rows, columns, delimiters, and structure. Sometimes you just need to cut things away until you have that.

III. Don't be efficient

This is one type of programming where CPU time is out the window. This is all about creativity. Don't worry about optimization; if you were worried about speed, you would move to a scripting language or C or something. You're going to use this thing once, get the data you need, and move on. Don't be afraid to write bad code. Embrace it.

IV. Reduce Output

Once you've identified what you're looking for, throw away everything that you know doesn't help you. The basic tools I use most for this are grep (for rows) and awk (for columns).

* I don't use cut. It's clumsy; for my money, just stick to awk.

V. You don't *HAVE* to be a regexp god.

Regular expressions are certainly a powerful and useful skill. But being a regexp god is tough work for some people. I use very few regexps beyond file splats, and I do okay.

VI. Think in Layers

Revision is key, so think in layers. Each layer is separated by a pipe. You are building a chain of STDOUTs and STDINs. Look at the output of each layer often to understand the next step...

    cat file
    cat file | awk '{print $1}'
    cat file | awk '{print $1}' | sort
    cat file | awk '{print $1}' | sort | uniq
    cat file | awk '{print $1}' | sort | uniq -c

VII. Knowing a few tools very well is better than knowing a lot of tools not very well.

I've written entire programs because I didn't read the manpage of a simple tool. Read manpages for the utilities in your toolbox.
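As a small illustration of this point, here is a sketch of two cases where a single flag replaces a whole extra layer: sort -u does what sort | uniq does, and grep -c does what grep | wc -l does. The host names are made-up sample data, and the variable names are just for this example.

```shell
# Hypothetical sample data: one host name per line, with repeats.
sample=$(printf 'web1\ndb1\nweb1\nweb2\ndb1\n')

# Two layers: sort, then drop adjacent duplicate lines.
long_form=$(printf '%s\n' "$sample" | sort | uniq)

# One layer: sort -u sorts and removes duplicates in one step.
short_form=$(printf '%s\n' "$sample" | sort -u)

# Two layers: filter lines, then count them.
# (tr -d ' ' strips the padding some wc implementations add.)
long_count=$(printf '%s\n' "$sample" | grep web | wc -l | tr -d ' ')

# One layer: grep -c counts the matching lines itself.
short_count=$(printf '%s\n' "$sample" | grep -c web)

printf '%s\n' "$short_form"
echo "$short_count"
```

Neither long form is wrong, and while you are still building up layers the longer chain can be easier to read. The point is only that a quick look at the manpage can save you a layer, or a whole program.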
Double check that your base commands don't already do what you're trying to do.

VIII. My basic toolbox

    Limited awk - conditionals (if); prints; tolower/toupper; substr; basic variable usage
    Limited sed - strictly query/replace
    sort - sort a file, by number, in reverse
    uniq - remove duplicate lines, count the lines (a la "group by")
    tr -d 'X' - remove character X
    cat - number lines; look for weird characters; squeeze blanks
    head, tail
    grep - print lines after/before a match, match more than one regexp, inverse, whole-word match, case-insensitive
    wc -l - count lines
    backticks `` - creating shell scripts from outputs
    tac (on Linux)

IX. Replace slow commands with file/cat reductions so you can think better in layers.

Sometimes you're doing something slow at some point, perhaps a sort. For example, say you want to produce a list of all users who have ssh'd into a system. /var/log/messages might be very big. Once you have a useful magnitude of reduction, like | grep sshd, and you need to keep building on it, take that reduction to disk:

    cat /var/log/messages | grep sshd > /var/tmp/messages.ssh

and replace your reduction with a reproduction:

    cat /var/tmp/messages.ssh | next_command

Just be sure you know how much data you're slogging around.

X. Go numeric for IPs/ports. It's more parsable.

Real World Examples: Performance Troubleshooting