
Linux pipeline commands and text processing code interpretation

Author: Flash Gene

1 Introduction

1.1 The Importance of Text Processing in Linux

In Linux, everything exists in the form of files, so text processing is at the heart of system management and maintenance. We often need to read, edit, and create text files, as well as manage files and directories. For large-scale source code files, we need to perform tasks such as searching, replacing, formatting, and parsing. Luckily, Linux provides us with powerful text processing tools like grep, sed, and awk that can help us accomplish these tasks efficiently.

1.2 The Role of Pipeline Commands in Text Processing

Pipeline commands enable efficient data transfer and processing by taking the output of one command as the input of another. By chaining multiple commands together, pipelines reduce reliance on temporary files and thereby improve system performance. Concretely, they provide data transfer, data filtering, and data transformation. First, pipelines move data between commands directly, avoiding unnecessary temporary files and improving processing efficiency. Second, the filtering stage lets users keep only the data that meets specific conditions, and a transformation stage can then reshape that data, as the short example below illustrates.
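
A small illustration of those three roles (a sketch; the sshd process name is arbitrary): ps generates the data, grep filters it down to the sshd entries, and awk transforms each surviving line by keeping only the PID column.

ps aux | grep 'sshd' | awk '{print $2}'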

1.3 Advantages of Pipeline Command Processing Text

  • Simple and efficient: pipeline commands chain tasks together by passing the output of one command directly to the next, avoiding tedious intermediate steps and making the overall operation more concise and efficient
  • Flexible combination: because pipeline commands can be concatenated easily, we can freely combine different commands according to specific needs, forming a powerful data processing pipeline that handles a wide range of complex scenarios
  • Instant feedback: pipeline commands operate in real time, processing and analyzing data as it is generated, so we get immediate feedback and can identify and resolve issues more quickly

In summary, the advantages of pipeline commands in text processing lie mainly in their simplicity, immediacy, flexibility, and resource efficiency, making them the preferred tool for processing text data.

2 Introduction to Pipeline Commands

2.1 Overview of the pipe symbol (|)

When we execute a command in bash, we usually see its output. Sometimes, though, that output needs to go through a series of processing steps before it meets our needs. This is where the pipe comes in handy. A pipeline uses the | symbol to take the output of one command as the input of another, enabling continuous processing of data.

Let's start with an example of a pipeline. Suppose we want to see how many files are in the /etc directory. We can run ls -al /etc, but because there are so many files, the whole listing scrolls past in one burst and we can no longer see the earlier output.

root@iZbp1gdez9i69jcemsn3vhZ:~# ls -al /etc
total 940
drwxr-xr-x 110 root root       4096 Jan 12 06:20 .
drwxr-xr-x  19 root root       4096 May 25  2023 ..
-rw-r--r--   1 root root       3028 Apr 21  2022 adduser.conf
-rw-r--r--   1 root root         16 May 15  2023 adjtime
drwxr-xr-x   2 root root       4096 May 15  2023 alternatives
drwxr-xr-x   3 root root       4096 May 15  2023 apache2
drwxr-xr-x   3 root root       4096 May 15  2023 apparmor
drwxr-xr-x   8 root root       4096 Jun  2  2023 apparmor.d
drwxr-xr-x   3 root root       4096 May 15  2023 apport
drwxr-xr-x   8 root root       4096 May 15  2023 apt
……
-rw-r--r--   1 root root        460 Dec  8  2021 zsh_command_not_found           

In this case we can use the less command:

root@iZbp1gdez9i69jcemsn3vhZ:~# ls -al /etc | less 
total 940
drwxr-xr-x 110 root root       4096 Jan 12 06:20 .
drwxr-xr-x  19 root root       4096 May 25  2023 ..
-rw-r--r--   1 root root       3028 Apr 21  2022 adduser.conf
-rw-r--r--   1 root root         16 May 15  2023 adjtime
drwxr-xr-x   2 root root       4096 May 15  2023 alternatives
drwxr-xr-x   3 root root       4096 May 15  2023 apache2
drwxr-xr-x   3 root root       4096 May 15  2023 apparmor
drwxr-xr-x   8 root root       4096 Jun  2  2023 apparmor.d
drwxr-xr-x   3 root root       4096 May 15  2023 apport
drwxr-xr-x   8 root root       4096 May 15  2023 apt           

This way, the output of ls can be paged through less, and with less's features we can move back and forth through the information. The key here is the pipe (|). The pipe can only process standard output from the previous command; it has no direct ability to handle standard error. The overall pipeline looks like this:


Figure 1 Pipeline commands

The first item after each pipe must be a command, and that command must be able to accept standard input in order to act as a pipeline command. For example, less, grep, sed, and awk are all pipeline commands that accept standard input, while ls, cp, and mv are not, because they do not read data from stdin. To summarize, there are two main points to note about pipeline commands:

  • The pipeline command only processes standard output and ignores standard error (see below for how to redirect stderr into the pipe)
  • Pipeline commands must be able to accept data from the previous command as standard input for further processing
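
If standard error does need to flow through the pipe, redirect it into standard output first. A minimal sketch (the /nonexistent path is assumed not to exist, so ls emits an error message):

# 2>&1 merges stderr into stdout before the pipe sees it
ls /etc /nonexistent 2>&1 | grep 'No such'
# bash also offers the shorthand |&, equivalent to 2>&1 |
ls /etc /nonexistent |& grep 'No such'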

2.2 Communication between pipeline commands and processes

Pipes are the oldest form of inter-process communication in the UNIX environment. They are essentially a kind of file, in keeping with UNIX's everything-is-a-file principle. Although implemented in the form of a file, a pipe does not take up space on disk or other external storage; on Linux it occupies memory. A Linux pipe is thus really a memory buffer operated on like a file, and data is transferred through it with the read() and write() system calls.


Figure 2 Communication between pipeline commands and processes

2.2.1 Kernel implementation of pipelines

  • Ring buffer

In the kernel, pipes use ring buffers to store data. The principle of a ring buffer is that the buffer is treated as a ring joined end to end, with a read pointer and a write pointer recording the positions of the read and write operations.


Figure 3 Ring buffer

In the Linux kernel, 16 memory pages are used as ring buffers, so the size of this ring buffer is 64KB (16*4KB).

When data is written to a pipe, writing starts at the write pointer and moves it forward. When data is read from a pipe, reading starts at the read pointer and moves it forward. A read on a pipe with no data available blocks the current process, and a write to a pipe with no free space likewise blocks the current process.
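
This blocking behavior is easy to observe with a named pipe (FIFO), which is backed by the same kernel pipe buffer as the anonymous pipes created by |. A minimal sketch, with an arbitrary FIFO path; since no writer ever attaches, the read side blocks until timeout kills it:

mkfifo /tmp/demo_fifo
timeout 1 cat /tmp/demo_fifo
echo "exit status: $?"   # 124 from timeout means the read side blocked
rm /tmp/demo_fifo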

  • Pipeline objects

In the Linux kernel, pipelines are managed using pipe_inode_info objects. Let's start by looking at the definition of the pipe_inode_info object, as follows:

struct pipe_inode_info {
    wait_queue_head_t wait;
    unsigned int nrbufs;
    unsigned int curbuf;
    ...
    unsigned int readers;
    unsigned int writers;
    unsigned int waiting_writers;
    ...
    struct inode *inode;
    struct pipe_buffer bufs[16];
};           

What the various fields of the pipe_inode_info object do:

  • wait: a wait queue that stores processes waiting for the pipe to become readable or writable.
  • bufs: the ring buffer, consisting of 16 pipe_buffer objects, each of which owns one memory page (more on this later).
  • nrbufs: indicates how many memory pages of the ring buffer are occupied by unread data.
  • curbuf: indicates which memory page in the ring buffer is currently being read from.
  • readers: indicates the number of processes currently reading from the pipe.
  • writers: indicates the number of processes currently writing to the pipe.
  • waiting_writers: indicates the number of processes waiting for the pipe to become writable.
  • inode: the inode object associated with the pipe.

Since the ring buffer is made up of 16 pipe_buffer objects, let's take a look at the definition of the pipe_buffer object:

struct pipe_buffer {
    struct page *page;
    unsigned int offset;
    unsigned int len;
    ...
};           

What the various fields of the pipe_buffer object do:

  • page: points to the memory page occupied by the pipe_buffer object.
  • offset: the offset within the memory page at which unread data begins, i.e. where the next read will start.
  • len: the length of the unread data remaining in the current memory page.

Figure 4 Relationship between pipe_inode_info objects and pipe_buffer objects

  • Read operations

The pipe's ring buffer read pointer is a combination of the curbuf field of the pipe_inode_info object and the offset field of the pipe_buffer object:

  • The curbuf field of the pipe_inode_info object indicates which pipe_buffer in the bufs array the read operation will read data from.
  • The offset field of the pipe_buffer object indicates where within that memory page the read operation starts reading.

Read operations are done by the pipe_read function. It looks like this:

static ssize_t
pipe_read(struct kiocb *iocb, const struct iovec *_iov, unsigned long nr_segs,
          loff_t pos)
{
    ...
    struct pipe_inode_info *pipe;

    // 1. Get the pipe object
    pipe = inode->i_pipe;

    for (;;) {
        // 2. Find how many memory pages are occupied by unread data
        int bufs = pipe->nrbufs;

        if (bufs) {
            // 3. Find which memory page of the ring buffer the read should start from
            int curbuf = pipe->curbuf;
            struct pipe_buffer *buf = pipe->bufs + curbuf;
            ...

            /* 4. Obtain the true read pointer through the offset field of the
             *    pipe_buffer, and copy data from the pipe to the user buffer.
             */
            error = pipe_iov_copy_to_user(iov, addr + buf->offset, chars, atomic);
            ...

            ret += chars;
            buf->offset += chars; // advance the offset field of the pipe_buffer
            buf->len -= chars;    // shrink the len field of the pipe_buffer

            /* 5. If all data in the current memory page has been read */
            if (!buf->len) {
                ...
                curbuf = (curbuf + 1) & (PIPE_BUFFERS - 1);
                pipe->curbuf = curbuf; // move the curbuf pointer of the pipe_inode_info
                pipe->nrbufs = --bufs; // decrease the nrbufs field of the pipe_inode_info
                do_wakeup = 1;
            }

            total_len -= chars;

            // 6. If the amount of data the caller asked for has been read, exit the loop
            if (!total_len)
                break;
        }
        ...
    }

    ...
    return ret;
}

The above code performs the following steps:

  • Get the pipe's pipe_inode_info object through the file's inode object
  • Use the nrbufs field of the pipe_inode_info object to find how many memory pages are occupied by unread data
  • Use the curbuf field of the pipe_inode_info object to find which memory page of the ring buffer the read should start from
  • Obtain the true read pointer through the offset field of the pipe_buffer object, and copy the data from the pipe to the user buffer
  • If the data in the current memory page has all been read, advance the curbuf pointer of the pipe_inode_info object and decrease its nrbufs field
  • If the amount of data the caller expected has been read, exit the loop
  • Write operations

The write pointer is not stored directly; it is calculated from the read pointer. So how do we derive the write pointer from the read pointer?

Write pointer = read pointer + unread data length

  • First, use the curbuf and nrbufs fields of the pipe_inode_info object to locate which pipe_buffer the data should be written to.
  • Then use the offset and len fields of that pipe_buffer object to locate where within its memory page the write should start (a worked example follows below).
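
A worked example with hypothetical numbers: suppose curbuf = 2 and nrbufs = 3, and the last occupied pipe_buffer (index (2 + 3 - 1) & 15 = 4) has offset = 0 and len = 1000. The next byte is then written at offset 0 + 1000 = 1000 within page 4's memory page, which is exactly the formula above: write pointer = read pointer + unread data length.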

Writes are done by the pipe_write function. It looks like this:

static ssize_t
pipe_write(struct kiocb *iocb, const struct iovec *_iov, unsigned long nr_segs,
           loff_t ppos)
{
    ...
    struct pipe_inode_info *pipe;
    ...
    pipe = inode->i_pipe;
    ...
    chars = total_len & (PAGE_SIZE - 1); /* size of the last buffer */

    // 1. If the pipe_buffer written to last still has free space
    if (pipe->nrbufs && chars != 0) {
        // locate the position to write the data to
        int lastbuf = (pipe->curbuf + pipe->nrbufs - 1) & (PIPE_BUFFERS-1);
        struct pipe_buffer *buf = pipe->bufs + lastbuf;
        const struct pipe_buf_operations *ops = buf->ops;
        int offset = buf->offset + buf->len;

        if (ops->can_merge && offset + chars <= PAGE_SIZE) {
            ...
            error = pipe_iov_copy_from_user(offset + addr, iov, chars, atomic);
            ...
            buf->len += chars;
            total_len -= chars;
            ret = chars;

            // if all the data has been written successfully, we are done
            if (!total_len)
                goto out;
        }
    }

    // 2. If the pipe_buffer written to last does not have enough free space,
    //    allocate a new memory page to store the data
    for (;;) {
        int bufs;
        ...
        bufs = pipe->nrbufs;

        if (bufs < PIPE_BUFFERS) {
            int newbuf = (pipe->curbuf + bufs) & (PIPE_BUFFERS-1);
            struct pipe_buffer *buf = pipe->bufs + newbuf;
            ...

            // allocate a new memory page
            if (!page) {
                page = alloc_page(GFP_HIGHUSER);
                ...
            }
            ...
            error = pipe_iov_copy_from_user(src, iov, chars, atomic);
            ...
            ret += chars;

            buf->page = page;
            buf->ops = &anon_pipe_buf_ops;
            buf->offset = 0;
            buf->len = chars;

            pipe->nrbufs = ++bufs;
            pipe->tmp_page = NULL;

            // if all the data has been written successfully, exit the loop
            total_len -= chars;
            if (!total_len)
                break;
        }
        ...
    }

out:
    ...
    return ret;
}

The above code is a bit long, but the logic is very simple, mainly doing the following:

  • If the pipe_buffer written to last still has free space, the data is written into that pipe_buffer and the value of its len field is increased.
  • If the pipe_buffer written to last does not have enough free space, a new memory page is allocated, the data is saved to the new page, and the value of the nrbufs field of the pipe_inode_info object is increased.
  • If all the data has been written, the write operation is exited.

3 The Three Musketeers of Text Processing (grep, sed, awk)

3.1 grep command: a text search tool

The grep command analyzes information line by line: if a line contains the string we are looking for, grep extracts that whole line. The simple syntax is as follows:

grep [-acinv] [--color=auto] 'search string' filename

The options and parameters are as follows:

  • -a: treat a binary file as text and search it
  • -c: count the number of lines containing the 'search string'
  • -i: ignore case, treating upper- and lowercase as the same
  • -n: output the line number along with each matching line
  • -v: invert the selection, showing the lines that do not contain the 'search string'

Example 1: Query for process IDs that contain java

ps -ef | grep java           
root@iZbp1gdez9i69jcemsn3vhZ:~# ps -ef | grep java
root      333317  330639  0 16:20 pts/0    00:00:00 grep --color=auto java
root     1354943 1354940 11 Jan03 ?        1-08:39:19 ../jdk1.8//bin/java -server -Xmx4g -Xms4g -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m -XX:NewRatio=3 -XX:SurvivorRatio=8 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:MaxTenuringThreshold=15 -XX:+ExplicitGCInvokesConcurrent -XX:+DoEscapeAnalysis -XX:+CMSClassUnloadingEnabled -Djava.awt.headless=true -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8 -Duser.timezone=GMT+8 -Duser.language=zh -Duser.country=CN -Djava.net.preferIPv4Stack=false -Djava.util.logging.config.file=./lib/logging.properties -Djava.security.policy=./conf/java.policy -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=../logs -jar ./bootstrap.jar -r -lib ./patch;./lib;./jdbc StartUp
root     1355266       1  0 Jan03 ?        00:17:27 ../jdk1.8//bin/java -Djava.util.logging.config.file=../webserver/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath ../webserver/bin/bootstrap.jar:../webserver/bin/tomcat-juli.jar -Dcatalina.base=../webserver -Dcatalina.home=../webserver -Djava.io.tmpdir=../webserver/temp org.apache.catalina.startup.Bootstrap           
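
Note that the matches above include the grep process itself (the first line), because its own command line also contains the word java. A common trick, also used by the script in section 4, is to wrap one character of the pattern in a character class; [j]ava still matches java but no longer matches the literal string [j]ava on grep's own command line:

ps -ef | grep '[j]ava'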

Example 2: Query log files based on a specified string

cat aws.log | grep 'https'           
root@iZbp1gdez9i69jcemsn3vhZ:/home/AWS6/logs# cat aws.log | grep 'https'
2024-01-15 01:00:00--job worker-5bc3681f-98c8-473f-b1f7-62a519b734e2,DepartmentJob--定时任务DepartmentJob-执行方法url=https://www.baidu.com
2024-01-15 01:30:00--job worker-8d9b850e-729d-4c19-9f26-65fcf9f01695,UserJob--job.UserJob.execute- https://www.baidu.com           
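
The options listed above compose naturally with this kind of search. A small sketch against the same aws.log (output omitted): -i ignores case, -n prefixes each match with its line number, and -c prints only the count of matching lines:

grep -in 'https' aws.log
grep -c 'https' aws.log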

3.2 sed command: a text transformation and processing tool

As we mentioned earlier, the grep command analyzes text line by line and extracts the whole line if it contains the keyword. The sed command we discuss next is also a pipeline command (it can analyze standard input), and it can additionally add, delete, replace, and otherwise transform specific lines. The usage of sed is as follows:

sed [-nefr] [operation]

The options and parameters are as follows:

  • -n: use silent mode. In ordinary sed usage, all data from stdin is listed on the screen; with -n, only the lines specially processed by sed are listed.
  • -e: perform the sed edit directly in command-line mode; the result is printed to the screen and the original file is not changed (this is the default behavior, in contrast to -i, which modifies the file directly).
  • -f: read the sed operations from a file; -f filename executes the sed operations written in filename.
  • -r: sed operations use extended regular expression syntax (the default is basic regular expression syntax).
  • -i: modify the contents of the read file directly instead of printing to the screen.

The [operation] part has the following format:

[n1[,n2]]function           

n1, n2: these are optional and usually indicate the line numbers on which to operate. For example, an operation meant for lines 10 through 20 is written as 10,20[function]. The line functions include the following:

  • a: append; a is followed by text, which is added on a new line after n1 (or n2)
  • c: change; c is followed by text, which replaces the lines between n1 and n2
  • d: delete; since this deletes lines, d usually takes nothing after it
  • i: insert; i is followed by text, which is added on a new line before n1 (or n2)
  • p: print; prints the selected lines, usually run together with sed -n
  • s: substitute; performs a replacement directly, typically combined with a regular expression (see the sketch after Example 4 below)

Example 1: View the contents of the /home/linux/test file, prefix each line with its line number, and delete lines 2-5

cat -n /home/linux/test | sed '2,5d'           

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# cat -n /home/linux/test | sed '2,5d'
     1  root:x:0:0:root:/root:/bin/bash
     6  games:x:5:60:games:/usr/games:/usr/sbin/nologin
     7  man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
     8  lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
     9  mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
    10  news:x:9:9:news:/var/spool/news:/usr/sbin/nologin           

Example 2: Continuing from the above, add the words hello world after the second line

cat -n /home/linux/test | sed '2a hello world'           

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home# cat -n /home/linux/test | sed '2a hello world'
     1  root:x:0:0:root:/root:/bin/bash
     2  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
hello world
     3  bin:x:2:2:bin:/bin:/usr/sbin/nologin
     4  sys:x:3:3:sys:/dev:/usr/sbin/nologin
     5  sync:x:4:65534:sync:/bin:/bin/sync
     6  games:x:5:60:games:/usr/games:/usr/sbin/nologin
     7  man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
     8  lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
     9  mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
    10  news:x:9:9:news:/var/spool/news:/usr/sbin/nologin           

Example 3: Continuing from the above, replace lines 2-5 with the text No 2-5 number

cat -n /home/linux/test | sed '2,5c No 2-5 number'           

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# cat -n /home/linux/test | sed '2,5c No 2-5 number'
     1  root:x:0:0:root:/root:/bin/bash
No 2-5 number
     6  games:x:5:60:games:/usr/games:/usr/sbin/nologin
     7  man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
     8  lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
     9  mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
    10  news:x:9:9:news:/var/spool/news:/usr/sbin/nologin           

Example 4: Continuing from the above, list only lines 5-7 of the /home/linux/test file

cat -n /home/linux/test | sed -n '5,7p'           

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# cat -n /home/linux/test | sed -n '5,7p'
     5  sync:x:4:65534:sync:/bin:/bin/sync
     6  games:x:5:60:games:/usr/games:/usr/sbin/nologin
     7  man:x:6:12:man:/var/cache/man:/usr/sbin/nologin           
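
The examples above cover d, a, c, and p; the s (substitute) function deserves a quick sketch as well. The replacement here is arbitrary: change every /usr/sbin/nologin shell in /home/linux/test to /bin/false, using # as the delimiter so the slashes need no escaping:

sed 's#/usr/sbin/nologin#/bin/false#g' /home/linux/test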

3.3 awk command: a powerful text analysis tool

In contrast to sed, which usually operates on an entire line, awk tends to divide a line into multiple fields. As a result, awk is well suited to handling smaller, field-oriented text data, and its usual mode of operation looks like this:

awk 'condition1 {action1} condition2 {action2} ...' filename

awk is followed by single quotation marks enclosing curly braces {} that set the actions to perform on the data. awk can process the files named after it, or it can read standard output from the previous command. As mentioned earlier, awk mainly splits each line into multiple fields, and the default field separator is a space or a Tab. For example, let's use last to retrieve login data (only the first 3 rows):

last -n 3           

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# last -n 3
root     pts/0        115.204.34.254   Mon Jan 15 16:13   still logged in
root     pts/0        36.27.53.152     Mon Jan 15 13:14 - 15:57  (02:43)
root     pts/0        36.27.53.152     Mon Jan 15 11:16 - 13:13  (01:57)           

If I want to retrieve the IP address of each login and separate the account from the IP address with a Tab, it looks like this:

last -n 3 | awk '{print $1 "\t" $3}'           

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# last -n 3 | awk '{print $1 "\t" $3}'
root    115.204.34.254
root    36.27.53.152
root    36.27.53.152           

Let's continue with the last example above (taking 5 rows this time). Suppose I want to:

  • list the account for each row (i.e. $1)
  • list the number of the row currently being processed (i.e. awk's built-in NR variable)
  • and state how many fields the row has (i.e. awk's built-in NF variable)

last -n 5 | awk '{print $1 "\t lines: " NR "\t columns: " NF}'

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# last -n 5 | awk '{print $1 "\t lines: " NR "\t columns: " NF}'
root     lines: 1        columns: 10
root     lines: 2        columns: 10
root     lines: 3        columns: 10
root     lines: 4        columns: 10
root     lines: 5        columns: 10
         lines: 6        columns: 0
wtmp     lines: 7        columns: 7           
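
The default whitespace separator does not fit every file. For colon-separated files such as /etc/passwd, the separator can be changed with the -F option or by setting awk's built-in FS variable in a BEGIN block. A small sketch (field 7 of /etc/passwd is the login shell):

awk -F ':' '{print $1 "\t" $7}' /etc/passwd
# equivalent form: a BEGIN block runs before any input line is read,
# so the new FS already applies to the first line
awk 'BEGIN {FS=":"} {print $1 "\t" $7}' /etc/passwd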

Let's take a look at how to use awk for computation. Say we have a salary table in a file called pay.txt that reads as follows:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# cat pay.txt
Name    1st     2nd     3th
VBird   23000   24000   25000
DMTsai  21000   20000   23000
Bird2   43000   42000   41000           

How do we calculate each person's total across the 1st, 2nd, and 3rd columns, and format the output nicely? We can think of it like this:

  • The first row is only the header, so it is not summed; we just print the header plus a Total column (handled when NR==1)
  • From the second row on, compute the sum (handled when NR>=2)

cat pay.txt | \
awk 'NR == 1 {printf "%10s %10s %10s %10s %10s\n", $1, $2, $3, $4, "Total"} \
NR >= 2 {total = $2 + $3 + $4; \
printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home/linux# cat pay.txt | \
awk 'NR == 1 {printf "%10s %10s %10s %10s %10s\n", $1, $2, $3, $4, "Total"} \
NR >= 2 {total = $2 + $3 + $4; \
printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'
      Name        1st        2nd        3th      Total
     VBird      23000      24000      25000   72000.00
    DMTsai      21000      20000      23000   64000.00
     Bird2      43000      42000      41000  126000.00           
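
If a grand total across all people is also wanted, awk's END block runs once after the last input line has been processed. A sketch extending the same computation (the ALL label is arbitrary):

awk 'NR >= 2 {total = $2 + $3 + $4; sum += total; print $1, total} \
END {print "ALL", sum}' pay.txt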

4 Applications in Shell Scripts

Example 1: Query error logs

#!/bin/bash

# the string to search for
SEARCH_STRING="error"

# path of the system log file to query (syslog is assumed here; the actual path may differ)
LOG_FILE="/home/AWS6/logs/aws.log"

# output file for the matches
OUTPUT_FILE="matched_lines.txt"

# use grep to find the lines containing the string and redirect the result to a new file
grep "$SEARCH_STRING" "$LOG_FILE" > "$OUTPUT_FILE"

echo "Matching lines saved to $OUTPUT_FILE"

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home# ./error_log.sh 
Matching lines saved to matched_lines.txt           
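
For live monitoring rather than a one-off report, the same filter can be attached to the growing log file; grep's --line-buffered option makes each match appear as soon as it is logged:

tail -f /home/AWS6/logs/aws.log | grep --line-buffered 'error'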

Example 2: Check the health of a Java process

#!/bin/bash

# find Java processes and print their health

# get the PIDs of all Java processes
java_pids=$(ps aux | grep '[j]ava' | awk '{print $2}')

if [ -z "$java_pids" ]; then
  echo "No Java processes found."
else
  echo "Java processes running:"
  for pid in $java_pids; do
    # print basic information about this Java process
    echo "PID: $pid"
    ps -p $pid -o pid,ppid,cmd,%mem,%cpu,vsz,rss,etime
    echo "--------------------------"
  done
fi

Results:

root@iZbp1gdez9i69jcemsn3vhZ:/home# ./jvm.sh
Java processes running:
PID: 1354943
    PID    PPID CMD                         %MEM %CPU    VSZ   RSS     ELAPSED
1354943 1354940 ../jdk1.8//bin/java -server  8.2 11.2 14725120 2601820 12-02:19:29
--------------------------
PID: 1355266
    PID    PPID CMD                         %MEM %CPU    VSZ   RSS     ELAPSED
1355266       1 ../jdk1.8//bin/java -Djava.  6.2  0.1 15684712 1964080 12-02:19:07
--------------------------           
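
To run the health check periodically, an entry like the following could be added to the crontab (the script path and log destination are hypothetical): run the script every 5 minutes and append its report to a log file.

*/5 * * * * /home/jvm.sh >> /var/log/jvm_health.log 2>&1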

5 Summary

Pipeline commands and text processing techniques help us manage servers better. By using tools such as grep, sed, and awk, we can easily locate problems, analyze logs, and monitor performance, improving work efficiency. Understanding the functions and usage of the different commands lets us choose the right one for each text processing scenario. Whether searching for keywords, extracting data, or replacing text, we can adapt flexibly by combining different commands.

Author of this article

Beta, from the backend team of Mantu Internet Center.

Source: WeChat public account "Mantu coder"

Source: https://mp.weixin.qq.com/s/fF9Cj2MTwF0ED0UBKlur8A
