AWK is a language for processing text files and a powerful text analysis tool.
The name AWK comes from the initials of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
awk [options] 'script' var=value file(s)
or
awk [options] -f scriptfile var=value file(s)
Option descriptions:
-F fs or --field-separator fs
Specify the input field separator. fs is a string or a regular expression, such as -F:.
-v var=value or --assign var=value
Assign a user-defined variable.
-f scriptfile or --file scriptfile
Read awk commands from a script file.
-mf nnn and -mr nnn
Set memory limits to the value nnn. The -mf option limits the maximum number of fields; the -mr option limits the maximum record size. Both are extensions from the Bell Labs version of awk and do not apply to standard awk.
-W compat or --compat, -W traditional or --traditional
Run awk in compatibility mode. In this mode gawk behaves exactly like standard awk, and all gawk extensions are ignored.
-W copyleft or --copyleft, -W copyright or --copyright
Print brief copyright information.
-W help or --help, -W usage or --usage
Print all awk options and a brief description of each option.
-W lint or --lint
Print warnings about constructs that are not portable to traditional Unix platforms.
-W lint-old or --lint-old
Print warnings about constructs that are not portable to the original version of Unix awk.
-W posix
Turn on compatibility mode, with the following additional restrictions: \x escape sequences are not recognized; the synonym func for the keyword function is not recognized; when FS is a single space, a newline does not act as a field separator; the operators ** and **= cannot be used in place of ^ and ^=; fflush() is not available.
-W re-interval or --re-interval
Allow interval expressions in regular expressions, such as a{2,3}. For related syntax, see the POSIX character classes in grep, such as the bracket expression [[:alpha:]].
-W source program-text or --source program-text
Use program-text as source code. This option can be mixed with the -f option.
-W version or --version
Print version information, which is useful when reporting bugs.
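The two most commonly used options, -F and -v, can be combined in one command. The following is a minimal sketch only: the sample data, the awk variable name min, and the shell variable names users and result are assumptions made for illustration and do not come from log.txt.

```shell
# Hypothetical colon-separated sample data (name:uid), for illustration only.
users='root:0
daemon:1
alice:1000
bob:1001'

# -F: splits each record on ":".
# -v min=1000 defines the awk variable min before the program starts,
# so it can be used in the pattern.
result=$(printf '%s\n' "$users" | awk -F: -v min=1000 '$2 >= min { print $1 }')
printf '%s\n' "$result"
```

With this sample data, only the names whose second field is at least 1000 (alice and bob) are printed.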
The content of log.txt is as follows:
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
Usage 1:
awk '{[pattern] action}' {filenames}   # Line-matching statement; the awk program can only be enclosed in single quotes
Example:
# Split each line on spaces or tabs, and output fields 1 and 4
$ awk '{print $1,$4}' log.txt
---------------------------------------------
2 a
3 like
This's
10 orange,apple,mongo
# Formatted output
$ awk '{printf "%-8s %-10s\n",$1,$4}' log.txt
---------------------------------------------
2        a
3        like
This's
10       orange,apple,mongo
Usage 2:
awk -F  # -F corresponds to the built-in variable FS and specifies the field separator
Example:
# Split on ","
$ awk -F, '{print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# Or use the built-in variable
$ awk 'BEGIN{FS=","} {print $1,$2}' log.txt
---------------------------------------------
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple
# Use multiple separators: a field ends at either a space or a ","
$ awk -F '[ ,]' '{print $1,$2,$5}' log.txt
---------------------------------------------
2 this test
3 Are awk
This's a
10 There apple
Usage 3:
awk -v # Set variables
Example:
$ awk -va=1 '{print $1,$1+a}' log.txt
---------------------------------------------
2 3
3 4
This's 1
10 11
$ awk -va=1 -vb=s '{print $1,$1+a,$1b}' log.txt
---------------------------------------------
2 3 2s
3 4 3s
This's 1 This'ss
10 11 10s
Usage 4:
awk -f {awk script} {filename}
Example:
$ awk -f cal.awk log.txt
Operator | Description |
---|---|
= += -= *= /= %= ^= **= | Assignment |
?: | C conditional expression |
\|\| | Logical OR |
&& | Logical AND |
~ and !~ | Match regular expression and not match regular expression |
< <= > >= != == | Relational operator |
Space | Concatenation |
+ - | Addition, subtraction |
* / % | Multiplication, division, and modulus |
+ - ! | Unary plus, minus, and logical NOT |
^ ** | Exponentiation |
++ -- | Increment or decrement, as a prefix or postfix |
$ | Field reference |
in | Array membership |
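Several of the operators in the table can be seen together in a short program. This is a sketch only; the variable names n, label, s, seen, and found are hypothetical and chosen for illustration.

```shell
result=$(awk 'BEGIN {
    n = 2 ^ 10                               # exponentiation
    label = (n > 1000) ? "big" : "small"     # C conditional expression
    s = "n=" n                               # concatenation by juxtaposition
    seen["x"] = 1
    found = ("x" in seen) ? "yes" : "no"     # array membership
    print s, label, 7 % 3, found             # % is the modulus operator
}')
printf '%s\n' "$result"
```

This prints `n=1024 big 1 yes`.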
Filter lines where the first column is greater than 2:
$ awk '$1>2' log.txt    # Command
# Output
3 Are you like awk
This's a test
10 There are orange,apple,mongo
Filter lines where the first column equals 2:
$ awk '$1==2 {print $1,$3}' log.txt    # Command
# Output
2 is
Filter lines where the first column is greater than 2 and the second column equals "Are":
$ awk '$1>2 && $2=="Are" {print $1,$2,$3}' log.txt    # Command
# Output
3 Are you
Variable | Description |
---|---|
$n | The nth field of the current record, separated by FS |
$0 | Complete input record |
ARGC | Number of command line arguments |
ARGIND | Position of the current file in the command line (starting from 0) |
ARGV | Array containing command line arguments |
CONVFMT | Number conversion format (default is %.6g) |
ENVIRON | Associative array of environment variables |
ERRNO | Description of the last system error |
FIELDWIDTHS | Field width list (separated by spaces) |
FILENAME | Current filename |
FNR | Line numbers counted separately for each file |
FS | Field separator (default is any whitespace) |
IGNORECASE | If true, perform case-insensitive matching |
NF | The number of fields in a record |
NR | The number of records read so far, that is, the line number, starting from 1 |
OFMT | Number output format (default is %.6g) |
OFS | Output field separator (default is a single space) |
ORS | Output record separator (default is a newline) |
RLENGTH | The length of the string matched by the match function |
RS | Record separator (default is a newline) |
RSTART | The first position of the string matched by the match function |
SUBSEP | Array index separator (default is \034) |
# ORS and RS are newlines, so the real output contains extra line breaks, which are omitted here
$ awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1         5    1
log.txt    2    2         5    2
log.txt    2    3         3    3
log.txt    2    4         4    4
$ awk -F\' 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' log.txt
FILENAME ARGC  FNR   FS   NF   NR  OFS  ORS   RS
---------------------------------------------
log.txt    2    1    '    1    1
log.txt    2    2    '    1    2
log.txt    2    3    '    2    3
log.txt    2    4    '    1    4
# Output the record number NR and the per-file line number FNR of each line
$ awk '{print NR,FNR,$1,$2,$3}' log.txt
---------------------------------------------
1 1 2 this is
2 2 3 Are you
3 3 This's a test
4 4 10 There are
# Specify the output field separator
$ awk '{print $1,$2,$5}' OFS=" $ " log.txt
---------------------------------------------
2 $ this $ test
3 $ Are $ awk
This's $ a $
10 $ There $
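The difference between NR and FNR only becomes visible when awk reads more than one file. The sketch below assumes two hypothetical files, one.txt and two.txt, created in a temporary directory; the file names and contents are illustrative only.

```shell
# Create two small, hypothetical input files in a temporary directory.
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/one.txt"
printf 'f g h i\n'    > "$tmp/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file,
# NF is the number of fields, and $NF is the last field of the record.
result=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
printf '%s\n' "$result"

rm -rf "$tmp"
```

The third output line is `two.txt 3 1 4 i`: it is the third record overall (NR) but the first record of two.txt (FNR).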
# Print the second and fourth columns of lines whose second column contains "th"
$ awk '$2 ~ /th/ {print $2,$4}' log.txt
---------------------------------------------
this a
~ starts a pattern match; the pattern itself is written between the slashes //.
# Output lines containing "re"
$ awk '/re/' log.txt
---------------------------------------------
3 Are you like awk
10 There are orange,apple,mongo
# Ignore case (IGNORECASE is a gawk extension)
$ awk 'BEGIN{IGNORECASE=1} /this/' log.txt
---------------------------------------------
2 this is a test
This's a test
# Negate the match
$ awk '$2 !~ /th/ {print $2,$4}' log.txt
---------------------------------------------
Are like
a
There orange,apple,mongo
$ awk '!/th/ {print $2,$4}' log.txt
---------------------------------------------
Are like
a
There orange,apple,mongo
For awk scripts, we need to pay attention to two keywords BEGIN and END.
BEGIN { statements to run before any input is processed }
END { statements to run after all lines have been processed }
{ statements to run for each input line }
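The three blocks can be sketched in a single one-off command. The input numbers and the variable name sum are assumptions chosen for illustration.

```shell
result=$(printf '3\n5\n8\n' | awk '
BEGIN { print "start" }                      # runs once, before any input is read
      { sum += $1 }                          # runs for every input line
END   { print "lines:", NR, "sum:", sum }    # runs once, after the last line
')
printf '%s\n' "$result"
```

This prints `start` followed by `lines: 3 sum: 16`.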
Suppose there is a file (student score sheet):
$ cat score.txt
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
Our awk script is as follows:
$ cat cal.awk
#!/bin/awk -f
# Before running
BEGIN {
    math = 0
    english = 0
    computer = 0
    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# Running
{
    math+=$3
    english+=$4
    computer+=$5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3, $4, $5, $3+$4+$5
}
# After running
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}
Let's take a look at the execution result:
$ awk -f cal.awk score.txt
NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
---------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
---------------------------------------------
  TOTAL:       319      393      350
AVERAGE:     63.80    78.60    70.00
The hello world program of AWK is:
BEGIN { print "Hello, world!" }
Calculate file size
$ ls -l *.txt | awk '{sum+=$5} END {print sum}'
--------------------------------------------------
666581
Find lines longer than 80 characters in a file:
awk 'length>80' log.txt
Print the 9x9 multiplication table
seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
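The pipeline above uses sed to emit growing blocks of numbers separated by blank lines, which awk then reads in paragraph mode (RS=''). As a rough equivalent, the same table can be sketched in awk alone with two nested loops. The variable names row and col are hypothetical.

```shell
result=$(awk 'BEGIN {
    for (row = 1; row <= 9; row++)
        for (col = 1; col <= row; col++)
            # A tab separates entries; a newline ends each row.
            printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}')
printf '%s\n' "$result"
```

This variant needs neither seq nor sed, which may make it easier to port.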