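sed Tutorial
小大寒 | 2024-01-01 | [Tech Encyclopedia] | Broad Knowledge
This article introduces the basics of sed: replacing text with the s command, adding content at the start or end of each line, and using regular expressions for more advanced text processing. Worked examples show how sed helps you edit text efficiently.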
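awk was born in 1977; sed is two to three years older, so it is more like awk's big brother. If awk is Rong'er from The Legend of the Condor Heroes, then sed is her Brother Jing. So after this site published its awk article, "Sister Rong'er Goes Topless", Brother Jing could not sit still and has to take the stage as well.
sed stands for stream editor: it processes text programmatically, which gives it a very hacker feel. Its core capability is pattern matching based on regular expressions, so people who are fluent in sed are usually also strong with regular expressions.
This article does not cover all of sed; for that, see the official sed manual. Here I just want to share a few techniques, so you can reclaim some of the time that slips away on your phone or on the toilet and learn something fun. Going deeper afterwards is up to you.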
Basic sed usage
Substitution with the s command
We use the following text for the demonstrations:
$ cat pets.txt
This is my parrot
  my parrot's name is Alice
This is my rabbit
  my rabbit's name is Bob
This is my turtle
  my turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy
To replace every "my" with "John's", use the command below. s is the substitute command, /my/ matches the string "my", /John's/ is the replacement text, and the trailing g replaces every match on a line rather than only the first:
$ sed "s/my/John's/g" pets.txt
This is John's parrot
  John's parrot's name is Alice
This is John's rabbit
  John's rabbit's name is Bob
This is John's turtle
  John's turtle's name is Charlie
This is John's hamster
  John's hamster's name is Daisy
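Note: inside a single-quoted script you cannot escape a single quote with \'. Use double quotes around the script instead; inside double quotes, a literal double quote can be escaped as \".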
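The quoting rule above is easier to see side by side. This is a small sketch using a made-up sample line (not pets.txt): it runs the same substitution once with a double-quoted script and once with a single-quoted script that closes the quote, adds an escaped apostrophe, and reopens the quote.

```shell
# Hypothetical sample line, used only for this illustration.
line='This is my parrot'

# Double-quoted script: the apostrophe needs no escaping.
out_double=$(printf '%s\n' "$line" | sed "s/my/John's/")

# Single-quoted script: end the quote, add an escaped ', start a new quote.
out_single=$(printf '%s\n' "$line" | sed 's/my/John'\''s/')

printf '%s\n' "$out_double" "$out_single"
```

Both variants give the same result. Keep in mind that inside double quotes the shell expands things like $var before sed sees the script, so single quotes are usually the safer default when the script contains no apostrophe.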
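Note also: the sed command above does not modify the file; it only writes the result to standard output. To save the result, redirect it to a file, for example: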
$ sed "s/my/John's/g" pets.txt > modified_pets.txt
Or use the -i option to edit the file in place:
$ sed -i "s/my/John's/g" pets.txt
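Editing in place overwrites the original, so it can be worth keeping a backup. The sketch below assumes a throwaway copy in a temporary directory (the directory and file are created only for this example) and uses the -i.bak form, which asks sed to save the original under a .bak suffix.

```shell
# Work on a throwaway file in a temporary directory (created for this example).
tmpdir=$(mktemp -d)
printf '%s\n' 'This is my parrot' "  my parrot's name is Alice" > "$tmpdir/pets.txt"

# -i.bak edits pets.txt in place and keeps the original as pets.txt.bak.
sed -i.bak "s/my/John's/g" "$tmpdir/pets.txt"

edited=$(cat "$tmpdir/pets.txt")
backup=$(cat "$tmpdir/pets.txt.bak")
printf '%s\n' "$edited"

rm -r "$tmpdir"
```

A bare -i with no suffix is the GNU sed behavior; as far as I know, the sed shipped with BSD and macOS expects an explicit (possibly empty) suffix argument, so the attached-suffix form tends to be the more portable choice.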
Add content at the beginning of every line:
$ sed 's/^/#/g' pets.txt
#This is my parrot
#  my parrot's name is Alice
#This is my rabbit
#  my rabbit's name is Bob
#This is my turtle
#  my turtle's name is Charlie
#This is my hamster
#  my hamster's name is Daisy
Add content at the end of every line:
$ sed 's/$/ --- /g' pets.txt
This is my parrot --- 
  my parrot's name is Alice --- 
This is my rabbit --- 
  my rabbit's name is Bob --- 
This is my turtle --- 
  my turtle's name is Charlie --- 
This is my hamster --- 
  my hamster's name is Daisy --- 
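While we are at it, a few regular expression basics:
^ matches the start of a line; for example, /^#/ matches lines that begin with #.
$ matches the end of a line; for example, /}$/ matches lines that end with }.
\< matches the start of a word; for example, \<abc matches words that begin with abc.
\> matches the end of a word; for example, abc\> matches words that end with abc.
. matches any single character.
* matches zero or more occurrences of the preceding character.
[ ] is a character set; for example, [abc] matches a, b, or c, and [a-zA-Z] matches any letter. A leading ^ inside the brackets negates the set; for example, [^a] matches any character except a.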
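To make these building blocks concrete, here is a small sketch that applies each one to made-up input (the sample strings are mine, not from the article's files). Note that \< and \> are GNU extensions, so this assumes GNU sed.

```shell
# Hypothetical sample text for this illustration.
sample=$(printf '%s\n' '# a comment' 'abc abcd xabc' 'end }')

# ^ anchors at the line start: print only lines that begin with '#'.
starts_hash=$(printf '%s\n' "$sample" | sed -n '/^#/p')

# $ anchors at the line end: print only lines that end with '}'.
ends_brace=$(printf '%s\n' "$sample" | sed -n '/}$/p')

# \< and \> mark word boundaries: only the standalone word abc is replaced.
whole_word=$(printf '%s\n' 'abc abcd xabc' | sed 's/\<abc\>/X/g')

# [^a-z] matches any character that is not a lowercase letter.
letters_only=$(printf '%s\n' 'a1b2-c3' | sed 's/[^a-z]//g')

printf '%s\n' "$starts_hash" "$ends_brace" "$whole_word" "$letters_only"
```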
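Regular expressions can do a lot of fun things, such as stripping the tags from a piece of HTML:
<b>This</b> is what <span style="text-decoration: underline;">I</span> meant. Understand?
Here are the sed commands:
# This version does not do what you want: .* is greedy, so <.*> matches
# everything from the first < to the last > on the line.
$ sed 's/<.*>//g' html.txt
 meant. Understand?
# To fix that, match only characters that are not '>'.
# [^>]* means any character other than '>', zero or more times.
$ sed 's/<[^>]*>//g' html.txt
This is what I meant. Understand?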
Next, let's restrict which lines the substitution applies to. The following command only replaces text on line 3:
$ sed "3s/my/your/g" pets.txt
This is my parrot
  my parrot's name is Alice
This is your rabbit
  my rabbit's name is Bob
This is my turtle
  my turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy
The following command only replaces text on lines 3 through 6:
$ sed "3,6s/my/your/g" pets.txt
This is my parrot
  my parrot's name is Alice
This is your rabbit
  your rabbit's name is Bob
This is your turtle
  your turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy
$ cat my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
Replace only the first 's' on each line:
$ sed 's/s/S/1' my.txt
ThiS is my cat, my cat's name is betty
ThiS is my dog, my dog's name is frank
ThiS is my fish, my fish's name is george
ThiS is my goat, my goat's name is adam
Replace only the second 's' on each line:
$ sed 's/s/S/2' my.txt
This iS my cat, my cat's name is betty
This iS my dog, my dog's name is frank
This iS my fish, my fish's name is george
This iS my goat, my goat's name is adam
Replace the third 's' and every 's' after it on each line:
$ sed 's/s/S/3g' my.txt
This is my cat, my cat'S name iS betty
This is my dog, my dog'S name iS frank
This is my fiSh, my fiSh'S name iS george
This is my goat, my goat'S name iS adam
Multiple patterns
To apply several substitutions in one pass, see the following example. The first command replaces "my" with "your" on lines 1 to 3; the second replaces "This" with "That" from line 3 to the last line:
$ sed '1,3s/my/your/g; 3,$s/This/That/g' my.txt
This is your cat, your cat's name is betty
This is your dog, your dog's name is frank
That is your fish, your fish's name is george
That is my goat, my goat's name is adam
The command above is equivalent to the following (note the -e command-line option):
$ sed -e '1,3s/my/your/g' -e '3,$s/This/That/g' my.txt
We can also use & in the replacement to stand for the matched text, and put other characters around it:
$ sed 's/my/[&]/g' my.txt
This is [my] cat, [my] cat's name is betty
This is [my] dog, [my] dog's name is frank
This is [my] fish, [my] fish's name is george
This is [my] goat, [my] goat's name is adam
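Because & expands to whatever the pattern matched, it is handy when the match is not a fixed string. A minimal sketch with a made-up input line: wrap every run of digits in angle brackets, and use \& when a literal ampersand is wanted instead.

```shell
# Hypothetical input line for this illustration.
line='port 80 and port 8080'

# & stands for the matched text, whatever its length.
wrapped=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/<&>/g')

# \& inserts a literal ampersand instead of the match.
literal=$(printf '%s\n' "$line" | sed 's/and/\&/')

printf '%s\n' "$wrapped" "$literal"
```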
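Parenthesized groups
The text matched by a regular expression wrapped in parentheses can be reused like a variable; in sed these are written \1, \2, and so on:
$ sed 's/This is my \([^,]*\),.*is \(.*\)/\1:\2/g' my.txt
cat:betty
dog:frank
fish:george
goat:adam
The regular expression is a little dense. With the escaping backslashes removed, it reads:
Regular expression: This is my ([^,]*),.*is (.*)
What it matches: This is my (cat),......is (betty)
So \1 is cat and \2 is betty.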
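Groups are also useful for reordering text, not just extracting it. The sketch below uses a made-up "Last, First" string and swaps the two parts; the second command is the same substitution written with -E (extended regular expressions), where the parentheses need no backslashes.

```shell
# Hypothetical input for this illustration.
name='Smith, John'

# Basic regular expressions: groups are written \( ... \).
swapped=$(printf '%s\n' "$name" | sed 's/\([^,]*\), \(.*\)/\2 \1/')

# Extended regular expressions (-E): plain parentheses form the groups.
swapped_ere=$(printf '%s\n' "$name" | sed -E 's/([^,]*), (.*)/\2 \1/')

printf '%s\n' "$swapped" "$swapped_ere"
```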
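sed commands
Let's return to the first example, pets.txt, and look at a few commands.
N command
First, the N command: it appends the next input line to the current buffer (the pattern space), so both lines are matched together.
The example below joins each odd-numbered line with the even-numbered line after it before matching. Because s without the g flag replaces only the first match, only the first "my" of each pair changes:
$ sed 'N;s/my/your/' pets.txt
This is your parrot
  my parrot's name is Alice
This is your rabbit
  my rabbit's name is Bob
This is your turtle
  my turtle's name is Charlie
This is your hamster
  my hamster's name is Daisy
In other words, sed sees the file as if it were:
This is my parrot\n  my parrot's name is Alice
This is my rabbit\n  my rabbit's name is Bob
This is my turtle\n  my turtle's name is Charlie
This is my hamster\n  my hamster's name is Daisy
With that in mind, the next example should make sense:
$ sed 'N;s/\n/,/' pets.txt
This is my parrot,  my parrot's name is Alice
This is my rabbit,  my rabbit's name is Bob
This is my turtle,  my turtle's name is Charlie
This is my hamster,  my hamster's name is Daisy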
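The pairing works cleanly when the file has an even number of lines. For an odd count, N on the last line has no next line to read; GNU sed prints the pattern space and exits in that case, while other implementations may exit without printing it. A common way to sidestep the difference is to guard N with $! so it is skipped on the last line. A small sketch with made-up three-line input:

```shell
# Three lines, so the last one has no partner (hypothetical input).
joined=$(printf '%s\n' a b c | sed '$!N;s/\n/,/')

printf '%s\n' "$joined"
```

The first two lines are joined with a comma and the leftover third line is printed on its own.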
a and i commands
a means append and i means insert. They add a new line to the output. For example:
# 1 i inserts a line before line 1 (insert)
$ sed "1 i This is my tiger, my tiger's name is simba" my.txt
This is my tiger, my tiger's name is simba
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
# $ a appends a line after the last line (append)
$ sed "$ a This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
This is my tiger, my tiger's name is simba
We can also add text based on a match:
# /fish/a appends a line after every line that matches /fish/
$ sed "/fish/a This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my tiger, my tiger's name is simba
This is my goat, my goat's name is adam
The following example appends a line after every line, since every line matches /my/:
$ sed "/my/a ----" my.txt
This is my cat, my cat's name is betty
----
This is my dog, my dog's name is frank
----
This is my fish, my fish's name is george
----
This is my goat, my goat's name is adam
----
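Putting the text on the same line as a or i, as in the examples above, is a GNU sed convenience. The sketch below, on made-up two-line input, shows that one-line form used to add a header and a footer, followed by the more traditional form where the command ends with a backslash and the text sits on the next line.

```shell
# GNU one-line form: text follows the command on the same line.
out_gnu=$(printf '%s\n' x y | sed -e '1i HEADER' -e '$a FOOTER')

# Traditional form: a backslash after the command, text on the next line.
out_posix=$(printf '%s\n' x y | sed '1i\
HEADER')

printf '%s\n' "$out_gnu" "$out_posix"
```

If a script has to run on systems without GNU sed, the second form is the one I would assume is more likely to work.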
c command
The c command replaces the matching lines.
$ sed "2 c This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my tiger, my tiger's name is simba
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
$ sed "/fish/c This is my tiger, my tiger's name is simba" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my tiger, my tiger's name is simba
This is my goat, my goat's name is adam
d command
The d command deletes the matching lines.
$ sed '/fish/d' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my goat, my goat's name is adam
$ sed '2d' my.txt
This is my cat, my cat's name is betty
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
$ sed '2,$d' my.txt
This is my cat, my cat's name is betty
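A typical everyday use of d is cleaning up a file before reading it. As a sketch on made-up configuration text, the command below deletes comment lines and lines that are empty or contain only whitespace.

```shell
# Hypothetical configuration text for this illustration.
config=$(printf '%s\n' '# settings' '' 'name=demo' '   ' 'debug=1')

# Delete lines starting with '#', then lines made only of whitespace.
cleaned=$(printf '%s\n' "$config" | sed '/^#/d; /^[[:space:]]*$/d')

printf '%s\n' "$cleaned"
```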
p command
The p command prints the matching lines.
You can think of it as something like grep.
# Print the lines that match fish. Note that the matching line shows up twice,
# because sed also prints every line it processes by default.
$ sed '/fish/p' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
# With the -n option the duplicate output goes away
$ sed -n '/fish/p' my.txt
This is my fish, my fish's name is george
# From one pattern to another
$ sed -n '/dog/,/fish/p' my.txt
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
# From the first line to the line that matches fish
$ sed -n '1,/fish/p' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
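Combined with -n, p also works with plain line numbers, which gives a quick way to pull out a slice of a file. A minimal sketch on made-up four-line input:

```shell
# Hypothetical four-line input.
lines=$(printf '%s\n' one two three four)

# Print only lines 2 through 3.
middle=$(printf '%s\n' "$lines" | sed -n '2,3p')

# $ as an address means the last line.
last=$(printf '%s\n' "$lines" | sed -n '$p')

printf '%s\n' "$middle" "$last"
```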
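A few key concepts
Next, four basic concepts in sed:
Pattern Space
The first concept concerns the -n option. If it is still unclear, no problem: the pseudocode below shows how sed processes text and introduces the pattern space:
foreach line in file {
    // put the line into the pattern space
    Pattern_Space <= line;
    // run the sed commands on the pattern space
    Pattern_Space <= EXEC(sed_cmd, Pattern_Space);
    // unless the -n option was given, print the processed pattern space
    if (sed option hasn't "-n") {
        print Pattern_Space
    }
}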
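The print step at the end of that loop can be observed directly. In this sketch (two made-up input lines), p adds an explicit print on top of the automatic one, -n removes the automatic one, and the p flag of s prints only the lines where a substitution happened.

```shell
# Automatic print plus explicit p: every line appears twice.
twice=$(printf '%s\n' a b | sed 'p')

# -n turns off the automatic print, so only the explicit p remains.
once=$(printf '%s\n' a b | sed -n 'p')

# With -n and no print command, nothing is output at all.
silent=$(printf '%s\n' a b | sed -n 's/a/A/')

# The p flag on s prints only lines where the substitution succeeded.
changed=$(printf '%s\n' a b | sed -n 's/a/A/p')

printf '%s\n' "$twice" "$once" "$changed"
```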
Address
The second concept is the address. Almost every command accepts one (note: a ! after the address negates it, so the command runs on the lines that do not match).
[address[,address]][!]{cmd}
An address can be a line number or a pattern. Two addresses separated by a comma form a range, and cmd runs on every line in that range. In pseudocode:
bool bexec = false
foreach line in file {
    if ( match(address1) ) {
        bexec = true;
    }
    if ( bexec == true ) {
        EXEC(sed_cmd);
    }
    if ( match(address2) ) {
        bexec = false;
    }
}
Addresses can also be relative, for example:
# +3 means the matching line plus the 3 lines after it
$ sed '/rabbit/,+3s/^/# /g' pets.txt
This is my parrot
  my parrot's name is Alice
# This is my rabbit
#   my rabbit's name is Bob
# This is my turtle
#   my turtle's name is Charlie
This is my hamster
  my hamster's name is Daisy
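The ! modifier is easiest to understand by trying it. The sketch below uses made-up four-line input and shows ! after a pattern, after a line range, and after $ (the last line).

```shell
# Hypothetical four-line input.
lines=$(printf '%s\n' one two three four)

# Print the lines that do NOT contain the letter t.
not_matching=$(printf '%s\n' "$lines" | sed -n '/t/!p')

# Delete every line outside the range 2-3.
kept=$(printf '%s\n' "$lines" | sed '2,3!d')

# Append a comma to every line except the last one.
commas=$(printf '%s\n' "$lines" | sed '$!s/$/,/')

printf '%s\n' "$not_matching" "$kept" "$commas"
```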
Grouping commands
The third concept: cmd can be several commands, separated by semicolons, or wrapped in braces to form a nested block. Some examples:
$ cat pets.txt
This is my tiger
  my tiger's name is alice
This is my rabbit
  my rabbit's name is tom
This is my bird
  my bird's name is sam
This is my horse
  my horse's name is john
# On lines 3 through 6, run the command /This/d
$ sed '3,6 {/This/d}' pets.txt
This is my tiger
  my tiger's name is alice
  my rabbit's name is tom
  my bird's name is sam
This is my horse
  my horse's name is john
# On lines 3 through 6, if /This/ matches and then /bird/ also matches, run d
$ sed '3,6 {/This/{/bird/d}}' pets.txt
This is my tiger
  my tiger's name is alice
This is my rabbit
  my rabbit's name is tom
  my bird's name is sam
This is my horse
  my horse's name is john
# From the first line to the last: delete lines that match This; strip leading spaces from the rest
$ sed '1,${/This/d;s/^ *//g}' pets.txt
my tiger's name is alice
my rabbit's name is tom
my bird's name is sam
my horse's name is john
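Braces combine naturally with pattern ranges. As a sketch on made-up text with BEGIN and END marker lines (the markers are my own invention), the command below touches only the block between the markers: it deletes empty lines there and replaces foo with bar, leaving everything outside alone.

```shell
# Hypothetical document with a marked block.
doc=$(printf '%s\n' 'foo outside' 'BEGIN' 'foo inside' '' 'END' 'foo after')

# Inside BEGIN..END only: drop empty lines, then replace foo with bar.
out=$(printf '%s\n' "$doc" | sed '/BEGIN/,/END/{/^$/d;s/foo/bar/;}')

printf '%s\n' "$out"
```

The semicolon before the closing brace is not required by GNU sed, but I include it because some other implementations seem to want it.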
Hold Space
The fourth concept is the hold space.
Let's look at it in detail. First, five commonly used commands:
g: copy the hold space into the pattern space, overwriting what was in the pattern space.
G: append the hold space to the end of the pattern space.
h: copy the pattern space into the hold space, overwriting what was in the hold space.
H: append the pattern space to the end of the hold space.
x: swap the contents of the pattern space and the hold space.
What are these commands good for? Two examples will help. Here is the file we will use:
$ cat t.txt
one
two
three
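First example:
$ sed 'H;g' t.txt

one

one
two

one
two
three
Each group starts with an empty line, because H always puts a newline in front of the text it appends, even when the hold space is still empty.
It may not be obvious at first. Don't worry, just run it through in your head a few times.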
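Second example: reversing the order of the lines in a file:
$ sed '1!G;h;$!d' t.txt
three
two
one
The command `1!G;h;$!d` breaks down into three parts:
- 1!G: run G on every line except the first, appending the hold space to the pattern space.
- h: run on every line, copying the pattern space into the hold space.
- $!d: run d on every line except the last, deleting the pattern space so that nothing is printed for it.
This sequence can feel complicated; again, trace it in your head a few times and it will click.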
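Two more small experiments may help the hold space sink in; both are sketches on made-up input. The first is a variant of the line-reversing command that uses -n and prints once at the end instead of deleting every line but the last. The second relies on the hold space being empty at the start: G then appends just a newline, which double-spaces the text.

```shell
# Same reversal idea, written with -n and a final $p instead of $!d.
reversed=$(printf '%s\n' one two three | sed -n '1!G;h;$p')

# The hold space starts empty, so G only adds a newline after each line.
doubled=$(printf '%s\n' one two | sed G)

printf '%s\n' "$reversed" "$doubled"
```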
That's it for the sed demonstration; I hope it helps.