sed Tutorial

小大寒 2024-01-01 [Tech Encyclopedia]

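This article covers the basics of sed: replacing text with the s command, adding content at the start or end of a line, and doing more advanced text processing with regular expressions. Worked examples show what sed can do and how it helps you edit text efficiently.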

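awk was born in 1977, and sed, two or three years older, is more like its big brother. In terms of The Legend of the Condor Heroes, awk is Rong'er and sed is her Brother Jing. So after this site's awk article, "蓉儿妹妹跳了个Topless", her Brother Jing could not sit still either and had to come out and show what he can do.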

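sed stands for stream editor. It processes text programmatically, which gives it a very hacker-like flavor. Its core feature is pattern matching based on regular expressions, so people who are fluent in sed are usually fluent in regular expressions too.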

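This article does not cover every feature of sed; for that, see the official sed manual. What I want to share here are a few techniques you can pick up in the spare minutes that would otherwise slip away on your phone or in the bathroom. Going deeper after that is up to you.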

Basic sed usage

Substitution with the s command

We will use the following text for the demonstrations:

$ cat pets.txt
This is my parrot
  my parrot's name is Alice
This is my rabbit
  my rabbit's name is Bob
This is my turtle
  my turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy
    

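To replace "my" with "John's" in this text, the command below does the job (s is the substitute command, /my/ matches the string "my", /John's/ is the replacement text, and the trailing /g replaces every match on a line):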

$ sed "s/my/John's/g" pets.txt
This is John's parrot
  John's parrot's name is Alice
This is John's rabbit
  John's rabbit's name is Bob
This is John's turtle
  John's turtle's name is Charlie
This is John's hamster
  John's hamster's name is Daisy
    

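Note: inside a single-quoted shell string you cannot escape a single quote with \'. When the script itself needs a single quote, wrap it in double quotes instead; inside double quotes, \" escapes a double quote.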
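
The quoting rule above belongs to the shell rather than to sed. Here is a small sketch of two common ways to get a single quote into a sed script; the sample line comes from pets.txt and the variable names are only for illustration:

```shell
line='This is my parrot'

# 1. Double-quote the script; a single quote is then an ordinary character.
out1=$(printf '%s\n' "$line" | sed "s/my/John's/")

# 2. Keep single quotes: close the string, add an escaped quote, reopen it.
out2=$(printf '%s\n' "$line" | sed 's/my/John'\''s/')

echo "$out1"   # This is John's parrot
echo "$out2"   # This is John's parrot
```

Inside double quotes the shell still expands $ and backquotes, so a script that uses $ as an end-of-line anchor or as the last-line address is usually easier to keep in single quotes.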

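Also note: the sed command above does not modify the file; it only writes the processed result to standard output. To save the result, use redirection, for example: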

$ sed "s/my/John's/g" pets.txt > modified_pets.txt

Or use the -i option to modify the file in place:

$ sed -i "s/my/John's/g" pets.txt
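
Since -i overwrites the file, it can be worth keeping a backup. The sketch below assumes the suffix form -i.bak, which GNU sed accepts and which, as far as I know, BSD/macOS sed accepts as well (a bare -i without a suffix is the GNU spelling). The temporary directory is made up for the demo so that nothing real gets touched:

```shell
# Work on a throwaway copy of the sample data.
tmpdir=$(mktemp -d)
printf "This is my parrot\n  my parrot's name is Alice\n" > "$tmpdir/pets.txt"

# Edit in place; the untouched original is saved as pets.txt.bak
sed -i.bak "s/my/John's/g" "$tmpdir/pets.txt"

edited=$(cat "$tmpdir/pets.txt")
backup=$(cat "$tmpdir/pets.txt.bak")
echo "$edited"
echo "$backup"

rm -r "$tmpdir"
```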

Add content at the beginning of every line:

$ sed 's/^/#/g' pets.txt
#This is my parrot
#  my parrot's name is Alice
#This is my rabbit
#  my rabbit's name is Bob
#This is my turtle
#  my turtle's name is Charlie
#This is my hamster
#  my hamster's name is Daisy
    

Add content at the end of every line:

$ sed 's/$/ --- /g' pets.txt
This is my parrot ---
  my parrot's name is Alice ---
This is my rabbit ---
  my rabbit's name is Bob ---
This is my turtle ---
  my turtle's name is Charlie ---
This is my hamster ---
  my hamster's name is Daisy ---
    

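A quick aside on some regular expression basics:

  • ^ matches the start of a line, e.g. /^#/ matches lines that begin with #.
  • $ matches the end of a line, e.g. /}$/ matches lines that end with }.
  • \< matches the start of a word, e.g. \<abc matches words that begin with abc.
  • \> matches the end of a word, e.g. abc\> matches words that end with abc.
  • . matches any single character.
  • * means the preceding character occurs zero or more times.
  • [ ] is a character set, e.g. [abc] matches a, b, or c, and [a-zA-Z] matches any letter. If the set starts with ^ it is negated, e.g. [^a] matches any character other than a.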
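
A few of these building blocks in action, as a minimal sketch. The sample strings are invented, and the \< and \> word anchors assume GNU sed, where they are an extension:

```shell
text='my army, my mystery'

# \< and \> pin the match to whole words, so "army" and "mystery" are left alone.
words=$(printf '%s\n' "$text" | sed 's/\<my\>/your/g')

# [^,]* is "a run of characters that are not commas": delete from the first comma on.
first=$(printf '%s\n' "$text" | sed 's/,.*$//')

# . stands for any single character.
dots=$(printf 'cat cot cut\n' | sed 's/c.t/pet/g')

echo "$words"   # your army, your mystery
echo "$first"   # my army
echo "$dots"    # pet pet pet
```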

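Regular expressions let you do a lot of interesting things, such as stripping the tags from HTML:

$ cat html.txt
<b>This</b> is what <span style="text-decoration: underline;">I</span> meant. Understand?

Now for the sed commands:

# Used this way, you run into a problem: .* is greedy,
# so it matches everything from the first '<' to the last '>' on the line
$ sed 's/<.*>//g' html.txt
 meant. Understand?

# To fix that, do it this way instead.
# '[^>]' means any character other than '>', and the * lets it occur zero or more times.
$ sed 's/<[^>]*>//g' html.txt
This is what I meant. Understand?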

Next, let's restrict the substitution to particular lines. This command replaces only on line 3:

$ sed "3s/my/your/g" pets.txt
This is my parrot
  my parrot's name is Alice
This is your rabbit
  my rabbit's name is Bob
This is my turtle
  my turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy

The following command replaces only on lines 3 to 6.

$ sed "3,6s/my/your/g" pets.txt
This is my parrot
  my parrot's name is Alice
This is your rabbit
  your rabbit's name is Bob
This is your turtle
  your turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy

The examples that follow use this file:

$ cat my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam

Replace only the first 's' on each line:

$ sed 's/s/S/1' my.txt
ThiS is my cat, my cat's name is betty
ThiS is my dog, my dog's name is frank
ThiS is my fish, my fish's name is george
ThiS is my goat, my goat's name is adam

Replace only the second 's' on each line:

$ sed 's/s/S/2' my.txt
This iS my cat, my cat's name is betty
This iS my dog, my dog's name is frank
This iS my fish, my fish's name is george
This iS my goat, my goat's name is adam

Replace the third 's' and every one after it on each line:

$ sed 's/s/S/3g' my.txt
This is my cat, my cat'S name iS betty
This is my dog, my dog'S name iS frank
This is my fiSh, my fiSh'S name iS george
This is my goat, my goat'S name iS adam
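
The occurrence flags are easier to compare side by side on a short string. This is a sketch with a made-up input; note that combining a number with g, as in 3g, is GNU sed behavior and may not work the same way in other implementations:

```shell
s='a-b-c-d-e'

second=$(printf '%s\n' "$s" | sed 's/-/+/2')    # only the 2nd match
all=$(printf '%s\n' "$s" | sed 's/-/+/g')       # every match
from3=$(printf '%s\n' "$s" | sed 's/-/+/3g')    # the 3rd match and all later ones (GNU sed)

echo "$second"   # a-b+c-d-e
echo "$all"      # a+b+c+d+e
echo "$from3"    # a-b-c+d+e
```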

Multiple patterns

To apply several substitutions in one go, see the example below (the first command replaces "my" with "your" on lines 1 to 3; the second replaces "This" with "That" from line 3 to the end):

$ sed '1,3s/my/your/g; 3,$s/This/That/g' my.txt
This is your cat, your cat's name is betty
This is your dog, your dog's name is frank
That is your fish, your fish's name is george
That is my goat, my goat's name is adam

The command above is equivalent to the following (note the use of sed's -e command-line option):

$ sed -e '1,3s/my/your/g' -e '3,$s/This/That/g' my.txt

You can also use & in the replacement to stand for the matched text, and then put other content around it:

$ sed 's/my/[&]/g' my.txt
This is [my] cat, [my] cat's name is betty
This is [my] dog, [my] dog's name is frank
This is [my] fish, [my] fish's name is george
This is [my] goat, [my] goat's name is adam
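
Another small sketch of &, this time wrapping every number in a line. The input strings are invented for the demo. The second command shows that a literal ampersand in the replacement has to be written as \&:

```shell
# & is replaced by whatever the pattern matched.
out=$(printf 'ports 8080 and 443\n' | sed 's/[0-9][0-9]*/<&>/g')

# \& produces a literal ampersand instead.
lit=$(printf 'salt and pepper\n' | sed 's/ and / \& /')

echo "$out"   # ports <8080> and <443>
echo "$lit"   # salt & pepper
```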

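Parenthesized groups

An example of matching with parentheses (the string matched by each parenthesized sub-expression can be used as a variable, written \1, \2, and so on in sed):

$ sed 's/This is my \([^,&]*\),.*is \(.*\)/\1:\2/g' my.txt
cat:betty
dog:frank
fish:george
goat:adam

The regular expression above is a little dense. With the escape backslashes removed, it reads:

Regular expression: This is my ([^,&]*),.*is (.*)
Matched text:       This is my (cat),......is (betty)

Here \1 is cat and \2 is betty.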
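
Back-references also make it easy to reorder parts of a line. The sketch below swaps "Last, First" into "First Last" in two ways: with the escaped parentheses used above, and with -E (extended regular expressions), where the parentheses need no backslashes. The -E option is supported by current GNU and BSD sed as far as I know; the sample name is made up:

```shell
name='Doe, John'

# Basic regular expressions: groups are written \( ... \)
bre=$(printf '%s\n' "$name" | sed 's/\([^,]*\), \(.*\)/\2 \1/')

# Extended regular expressions: groups are written ( ... )
ere=$(printf '%s\n' "$name" | sed -E 's/([^,]*), (.*)/\2 \1/')

echo "$bre"   # John Doe
echo "$ere"   # John Doe
```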

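sed commands

Let's go back to the first example, pets.txt, and look at some commands.

The N command

First, the N command: it appends the next line of input to the current buffer (the pattern space), so that both lines are matched together.

The example below joins each odd-numbered line with the even-numbered line after it before matching. Since s without the g flag replaces only the first match, the result is:

$ sed 'N;s/my/your/' pets.txt
This is your parrot
  my parrot's name is Alice
This is your rabbit
  my rabbit's name is Bob
This is your turtle
  my turtle's name is Charlie
This is your hamster
  my hamster's name is Daisy

In other words, sed sees the file as if it were:

This is my parrot\n  my parrot's name is Alice
This is my rabbit\n  my rabbit's name is Bob
This is my turtle\n  my turtle's name is Charlie
This is my hamster\n  my hamster's name is Daisy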

With that in mind, the next example should make sense:

$ sed 'N;s/\n/,/' pets.txt
This is my parrot,  my parrot's name is Alice
This is my rabbit,  my rabbit's name is Bob
This is my turtle,  my turtle's name is Charlie
This is my hamster,  my hamster's name is Daisy
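
One thing to watch with N is a file with an odd number of lines: on the last line there is no next line to read, and implementations differ in what they do then (POSIX says sed exits without printing; GNU sed prints the pattern space first). A common way to avoid depending on either behavior is $!N, which runs N on every line except the last. A sketch with a made-up five-line input:

```shell
# Lines are joined in pairs; the fifth line has no partner and is printed as is.
joined=$(printf 'a\nb\nc\nd\ne\n' | sed '$!N;s/\n/,/')

echo "$joined"
# a,b
# c,d
# e
```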

The a and i commands

The a command appends and the i command inserts. Both add a new line to the output. For example:

# 1 i inserts a line before line 1
$ sed "1 i This is my tiger, my tiger's name is simba" my.txt
This is my tiger, my tiger's name is simba
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam

# $ a appends a line after the last line
$ sed "$ a This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
This is my tiger, my tiger's name is simba

You can also add text based on a match:

# Note the /fish/a: after a line that matches /fish/, append a line
$ sed "/fish/a This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my tiger, my tiger's name is simba
This is my goat, my goat's name is adam

The following example appends a line after every line that matches /my/, which here is every line:

$ sed "/my/a ----" my.txt
This is my cat, my cat's name is betty
----
This is my dog, my dog's name is frank
----
This is my fish, my fish's name is george
----
This is my goat, my goat's name is adam
----
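
The one-line form used above (the text on the same line as a or i) is a GNU sed extension. The traditional form puts a backslash after the command and the text on the following line, which should also work in other sed implementations. A sketch with a made-up three-line input:

```shell
# i\ inserts before the matching line, a\ appends after it.
out=$(printf 'one\ntwo\nthree\n' | sed '/two/i\
---- before two
/two/a\
---- after two')

echo "$out"
# one
# ---- before two
# two
# ---- after two
# three
```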

The c command

The c command replaces the matched lines.

$ sed "2 c This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my tiger, my tiger's name is simba
This is my fish, my fish's name is george
This is my goat, my goat's name is adam

$ sed "/fish/c This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my tiger, my tiger's name is simba
This is my goat, my goat's name is adam

The d command

The d command deletes the matched lines.

$ sed '/fish/d' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my goat, my goat's name is adam

$ sed '2d' my.txt
This is my cat, my cat's name is betty
This is my fish, my fish's name is george
This is my goat, my goat's name is adam

$ sed '2,$d' my.txt
This is my cat, my cat's name is betty
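
A typical everyday use of d is cleaning up a file by dropping comment lines and empty lines. The sketch below uses an invented config snippet; the two patterns are ordinary anchored regular expressions:

```shell
config='# sample settings
name = demo

# another comment
debug = on'

# /^#/d drops lines that start with #, /^$/d drops empty lines.
cleaned=$(printf '%s\n' "$config" | sed '/^#/d; /^$/d')

echo "$cleaned"
# name = demo
# debug = on
```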

The p command

The p command prints the matched lines.

You can think of it as something like grep.

# Match fish and print it. Note that the matching line appears twice,
# because sed also prints every line it processes by default
$ sed '/fish/p' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my fish, my fish's name is george
This is my goat, my goat's name is adam

# With the -n option there is no duplicate output
$ sed -n '/fish/p' my.txt
This is my fish, my fish's name is george

# From one pattern through to another
$ sed -n '/dog/,/fish/p' my.txt
This is my dog, my dog's name is frank
This is my fish, my fish's name is george

# From the first line through to the line that matches fish
$ sed -n '1,/fish/p' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
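
Combined with -n, p is also handy for picking lines by number, and as a flag on s it prints only the lines where a substitution actually happened. A sketch with a made-up five-line input:

```shell
lines='l1
l2
l3
l4
l5'

middle=$(printf '%s\n' "$lines" | sed -n '2,4p')         # lines 2 through 4
last=$(printf '%s\n' "$lines" | sed -n '$p')             # the last line only
changed=$(printf '%s\n' "$lines" | sed -n 's/l3/L3/p')   # only the line that was changed

echo "$middle"
echo "$last"      # l5
echo "$changed"   # L3
```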

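A few key points

Next, four basic points about sed:

Pattern Space

The first point concerns the -n option. If it is not clear to you yet, that's fine: the pseudocode below, which sketches how sed processes text, introduces the concept of the pattern space: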

foreach line in file {
    // put the line into Pattern_Space
    Pattern_Space <= line;

    // run the sed commands on Pattern_Space
    Pattern_Space <= EXEC(sed_cmd, Pattern_Space);

    // unless the -n option was given, print the processed Pattern_Space
    if (sed option hasn't "-n")  {
       print Pattern_Space
    }
}
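
The cycle in the pseudocode can be observed directly. In this sketch (the two-line input is made up) the same data is run through three tiny scripts to show the automatic print at the end of each cycle, its suppression by -n, and d ending a cycle before anything is printed:

```shell
doubled=$(printf 'a\nb\n' | sed 'p')     # explicit p, then the automatic print: each line twice
single=$(printf 'a\nb\n' | sed -n 'p')   # -n turns the automatic print off: each line once
gone=$(printf 'a\nb\n' | sed 'd')        # d clears the pattern space and starts the next cycle

echo "$doubled"
echo "$single"
echo "[$gone]"   # []
```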
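
Address

The second point is the address. Almost every command accepts one (note: a trailing ! inverts the match, so the command runs on the lines that the address does not select).

[address[,address]][!] {cmd}

An address can be a line number or a pattern. Two addresses separated by a comma denote a range, and cmd runs on the lines within it. In pseudocode: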

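bool bexec = false
foreach line in file {
    if ( bexec == false && match(address1) ) {
        // the range opens on this line
        EXEC(sed_cmd);
        // it stays open unless address2 is a line number that has already been reached
        bexec = !( is_number(address2) && address2 <= line_number );
    } else if ( bexec == true ) {
        EXEC(sed_cmd);
        // a pattern address2 is only tested on lines after the opening one
        if ( match(address2) ) {
            bexec = false;
        }
    }
}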
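
A short sketch of ranges and of the ! modifier, using made-up inputs. The last command illustrates one detail of pattern ranges: when the end address is a pattern, sed starts looking for it on the line after the one that opened the range, so /a/,/a/ does not close on its own first line:

```shell
nums='1
2
3
4
5'

inside=$(printf '%s\n' "$nums" | sed -n '2,4p')     # lines in the range
outside=$(printf '%s\n' "$nums" | sed -n '2,4!p')   # lines NOT in the range
kept=$(printf '%s\n' "$nums" | sed '/3/!d')         # delete everything that does not match /3/
range=$(printf 'a\nb\na\nb\n' | sed -n '/a/,/a/p')  # opens on line 1, closes on line 3

echo "$inside"
echo "$outside"
echo "$kept"     # 3
echo "$range"
```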

You can also use relative positions in an address, for example:

# +3 extends the range to the 3 lines after the matching line (4 lines in total)
$ sed '/rabbit/,+3s/^/# /g' pets.txt
This is my parrot
  my parrot's name is Alice
# This is my rabbit
#   my rabbit's name is Bob
# This is my turtle
#   my turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy
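
Grouping commands

The third point is that cmd can be more than one command: separate the commands with semicolons, or use braces to group and nest them. Some examples, run against this version of pets.txt: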

$ cat pets.txt
This is my tiger
  my tiger's name is alice
This is my rabbit
  my rabbit's name is tom
This is my bird
  my bird's name is sam
This is my horse
  my horse's name is john

# For lines 3 to 6, run the command /This/d
$ sed '3,6 {/This/d}' pets.txt
This is my tiger
  my tiger's name is alice
  my rabbit's name is tom
  my bird's name is sam
This is my horse
  my horse's name is john

# For lines 3 to 6: if the line matches /This/ and then also matches /bird/, run d
$ sed '3,6 {/This/{/bird/d}}' pets.txt
This is my tiger
  my tiger's name is alice
This is my rabbit
  my rabbit's name is tom
  my bird's name is sam
This is my horse
  my horse's name is john

# From the first line to the last: delete lines that match This; on the remaining lines, strip leading spaces
$ sed '1,${/This/d;s/^ *//g}' pets.txt
my tiger's name is alice
my rabbit's name is tom
my bird's name is sam
my horse's name is john 
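
Braces are also a convenient way to run a small pipeline of edits only on the lines you care about. The sketch below pulls "animal: name" pairs out of data shaped like pets.txt; the variable names are made up. Each closing brace is preceded by a semicolon, which some sed implementations require:

```shell
pets="This is my tiger
  my tiger's name is alice
This is my rabbit
  my rabbit's name is tom"

# Only on lines containing "name is": strip the leading "  my ",
# turn "'s name is " into ": ", then print the result.
names=$(printf '%s\n' "$pets" | sed -n "/name is/{s/^ *my //;s/'s name is /: /;p;}")

echo "$names"
# tiger: alice
# rabbit: tom
```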
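
Hold Space

The fourth point is the hold space.

To understand it, start with these five commonly used commands:

g: copy the hold space into the pattern space, overwriting what the pattern space held.
G: append a newline and then the hold space to the pattern space.
h: copy the pattern space into the hold space, overwriting what the hold space held.
H: append a newline and then the pattern space to the hold space.
x: exchange the contents of the pattern space and the hold space.

What are these good for in practice? Two examples make it clearer. Here is the sample file we will use: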

$ cat t.txt
one
two
three

First example:

$ sed 'H;g' t.txt

one

one
two

one
two
three

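Not very clear at first sight? Each line is appended to the hold space (H), and the hold space is then copied back over the pattern space (g), so the output grows by one line per cycle. The blank line at the top of each group is there because H always adds a newline before the text it appends, even when the hold space is still empty. Run it through your head a few times and it will click.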
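
If you would rather see the pattern space than imagine it, the l command prints it in an unambiguous form, with embedded newlines shown as \n and a $ marking the end. A sketch that traces H;g this way, feeding the three lines of t.txt through printf:

```shell
# -n suppresses the normal output; l shows the pattern space after H;g on each cycle.
trace=$(printf 'one\ntwo\nthree\n' | sed -n 'H;g;l')

printf '%s\n' "$trace"
# \none$
# \none\ntwo$
# \none\ntwo\nthree$
```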

Second example: reverse the order of the lines in a file:

$ sed '1!G;h;$!d' t.txt
three
two
one

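The command `1!G;h;$!d` breaks down into three parts:

  • 1!G: on every line except the first, run G, which appends the hold space to the pattern space.
  • h: on every line, run h, which copies the pattern space into the hold space.
  • $!d: on every line except the last, run d, which deletes the pattern space so that nothing is printed for that line.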

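This sequence can look complicated, but the idea is simple: the hold space accumulates the lines seen so far in reverse order, and only the last cycle, where d is skipped, prints it. Run it through your head a few times and it will click.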
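
Two more sketches along the same lines, both using the three lines of t.txt as input. The first is an equivalent way to write the reversal with -n and an explicit print on the last line; the second uses the hold space as a one-line memory to print the line that comes before a match:

```shell
# Reversal again:
#   line 1: pattern "one"               -> h -> hold "one"
#   line 2: G gives "two\none"          -> h -> hold "two\none"
#   line 3: G gives "three\ntwo\none"   -> h, then $p prints it
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

# Every line is saved with h. On a match, x swaps the saved previous line in,
# p prints it, and the second x swaps the current line back.
before=$(printf 'one\ntwo\nthree\n' | sed -n '/three/{x;p;x;};h')

echo "$reversed"
echo "$before"   # two
```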

That's it for this sed demonstration. I hope you found it helpful.
