How can I replace a character within a specific context in each line of the whole file?

Question

I have a large file which contains hundreds of English phrases in the following form:

\phrase
{. . . * * }
{I shoul-d've stayed home.}
{aɪ ʃʊd‿əv ˈsteɪd ˈhoʊm.} <- only replace on this line

\phrase
{ . . * }
{Did you eat?}
{dɪdʒjʊʷˈit? ↗} <- only replace on this line

\phrase
{ * . * . * . . . * . }
{Yeah, I made some pas-ta if you're hun-gry.}
{ˈjɛə, aɪ ˈmeɪd səm ˈpɑ stəʷɪf jər ˈhʌŋ gri.} <- only replace on this line

It's a LaTeX .tex file. I would like to replace all r characters in each phonetic transcription (by phonetic transcription I mean the third line after each \phrase line) with the ɹ symbol (Unicode code point U+0279).

Doing it by hand in Emacs is cumbersome for me. I was wondering if there is a way to target those lines somehow and do the replacement automatically.

All r characters have to be replaced with ɹ, there is no exception, but only in the phonetic transcription, leave the r as-is in the English/non-phonetic text.

Is it possible to do that somehow by using a script or something? There are no line breaks in my document so the transcription is always the third line after \phrase. Thank you!

It doesn't make sense to say simultaneously both that (1) you only want to replace on certain lines, and (2) there are no line breaks. If there are no line breaks, there's only one line. Do you mean there are no LaTeX line breaks ` \\ `, or no line breaks inside each brace group? — frabjous, Commented May 3, 2022 at 14:07
Does this answer your question? Relative line number after the match in sed — Chris Davies, Commented May 3, 2022 at 14:09
Worked from that suggested duplicate, sed '/^\\phrase/,+3 { /^\\phrase/,+2 !{ s/r/ɹ/g } }' — Chris Davies, Commented May 3, 2022 at 14:15
I meant no line breaks inside the brace group. I don't use `\\` in Latex because there is no need for it. — Zoltan King, Commented May 3, 2022 at 14:29
Just to be nit-picky, "only replace on this line", which is English text, must not be replaced by the solutions, but they do anyway :D -> "All r characters have to be replaced with ɹ, there is no exception, but only in the phonetic transcription, leave the r as-is in the English/non-phonetic text." — Aravindh Krishnamoorthy, Commented May 4, 2022 at 12:16
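The nit-pick in the last comment could be addressed by limiting the substitution to the brace group on the target line. The following is only a sketch: the function name replace_in_braces is made up for illustration, and it assumes each transcription line holds exactly one {...} group with no nested braces.

```shell
# Hypothetical helper: replace r with ɹ only inside the {...} group on the
# 3rd line after each \phrase line; text outside the braces is left alone.
# Assumes one brace group per transcription line and no nested braces.
replace_in_braces() {
    awk '
        c && !--c {
            if (match($0, /[{][^}]*[}]/)) {
                head = substr($0, 1, RSTART - 1)       # text before the group
                body = substr($0, RSTART, RLENGTH)     # the {...} group itself
                tail = substr($0, RSTART + RLENGTH)    # text after the group
                gsub(/r/, "ɹ", body)
                $0 = head body tail
            }
        }
        /\\phrase/ { c = 3 }    # start counting down to the 3rd line after
        1                       # print every line
    ' "$@"
}

# Example on one shortened entry, read from standard input
printf '%s\n' '\phrase' '{ * . }' "{you're hun-gry.}" \
    '{jər ˈhʌŋ gri.} <- only replace on this line' | replace_in_braces
```

With a file name as argument, for example replace_in_braces old-file.tex > new-file.tex, it reads the file instead of standard input.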

Archemar · Accepted Answer · 2022-05-03 14:19:01Z

An awk version (you'll need an intermediate file; it fits on one line):

awk '/\\phrase/ { p=NR ; } NR == p+3 { gsub("r","ɹ") ; } {print;} ' old-file.tex > new-file.tex

where

/\\phrase/ { p=NR ; } sets p to the line number of each line where \phrase appears
NR == p+3 { gsub("r","ɹ") ; } performs the replacement on the 3rd line after it
{print;} prints every line.

This gives, on your sample (note the ɹeplace):

\phrase
{. . . * * }
{I shoul-d've stayed home.}
{aɪ ʃʊd‿əv ˈsteɪd ˈhoʊm.} <- only ɹeplace on this line

\phrase
{ . . * }
{Did you eat?}
{dɪdʒjʊʷˈit? ↗} <- only ɹeplace on this line

\phrase
{ * . * . * . . . * . }
{Yeah, I made some pas-ta if you're hun-gry.}
{ˈjɛə, aɪ ˈmeɪd səm ˈpɑ stəʷɪf jəɹ ˈhʌŋ gɹi.} <- only ɹeplace on this line

If you change /\\phrase/ { p=NR ; } NR == p+3 to /\\phrase/ { p=NR+3; } NR == p then a) it'll be a bit more efficient as then it only does the addition once every 5 lines instead of every line and, b) it'll be a bit more robust as then it can't undesirably change the 3rd line of the file if it doesn't start with \phrase (e.g. maybe there's a few header lines at the start of the file, idk) — Ed Morton, Commented May 5, 2022 at 21:48
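The change suggested in that comment can be sketched as follows. The function name replace_third_after and the demo input are illustrative only. With the original condition NR == p+3, an unset p counts as 0, so line 3 of the file would be modified even though no \phrase precedes it; here it is left alone.

```shell
# Variant from the comment: store the target line number (p = NR + 3) when
# \phrase is seen, instead of testing NR == p + 3 on every line.
replace_third_after() {
    awk '/\\phrase/ { p = NR + 3 } NR == p { gsub("r", "ɹ") } { print }' "$@"
}

# Demo: three lines precede the first \phrase; "third line" must stay as-is,
# and only "gri" (the 3rd line after \phrase) is changed.
printf '%s\n' 'header' 'more header' 'third line' '\phrase' 'x' 'y' 'gri' |
    replace_third_after
```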

thanasisp · Accepted Answer · 2022-05-03 19:55:12Z


awk 'c&&!--c {gsub(/r/,"ɹ")} /\\phrase/ {c=3} 1' file > newfile

c&&!--c is a common awk idiom, implementing the while getline logic, see reference.

The action following this condition is executed only when c is decremented from one to zero.

When a line matches a literal '\phrase', we set c=3, so the gsub() is executed only on the 3rd line after the match, and this repeats for every match.
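To make the countdown visible, here is a small trace that prints the value of c as each line is read and marks the line on which the condition becomes true. The function name trace_countdown, the variable names, and the marker text are invented for this illustration.

```shell
# Print the counter value seen by each input line and mark the line where
# c goes from 1 to 0, which is where the gsub() in the answer would run.
trace_countdown() {
    awk '
        { before = c + 0 }          # counter value on entry to this line
        c && !--c { hit = 1 }       # true only on the 1 -> 0 step
        /\\phrase/ { c = 3 }        # (re)start the countdown
        {
            printf "c=%d %s%s\n", before, $0, (hit ? "  <- gsub runs here" : "")
            hit = 0
        }
    ' "$@"
}

printf '%s\n' 'intro' '\phrase' 'one' 'two' 'three' 'four' | trace_countdown
```

On this input the marker lands on the line "three", the 3rd line after \phrase, and c stays at 0 afterwards until the next \phrase.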


I see the reference calls this a "common awk idiom" but I don't think it's that common. It's clever, though, and I'll be able to follow it next time I see it. But it took a little puzzling to get it. – mpez0, Commented May 4, 2022 at 12:32
@mpez0 We could use just !--c, but c&&!--c is safer and more economical. With !--c alone, c keeps being decremented and evaluated after the action has run, which is two operations. With c&&!--c, c stays at zero afterwards and the part after && is not executed, so only one condition is evaluated. Also, decrementing forever could hit an integer limit and become true again (I think it is very improbable but not impossible). – thanasisp, Commented May 4, 2022 at 13:06
All numbers in awk are floating point, so the "integer" limitations would be on the 53-bit mantissa. Gnu's awk has option -M for arbitrary precision integers, but that's still not fixed size. – mpez0, Commented May 4, 2022 at 13:14
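The floating-point point can be checked directly. This sketch assumes awk numbers are IEEE 754 doubles, which is the usual case when gawk's -M option is not in use; the function name is made up. At -2^53 a further decrement is rounded away, so under that assumption a counter decremented forever would stop changing rather than wrap around to a value that makes !--c true again.

```shell
# Assumes awk numbers are IEEE 754 doubles (the usual case without gawk -M).
# At -2^53, subtracting 1 gives a value that rounds back to -2^53.
stalls_at_limit() {
    awk 'BEGIN {
        c = -(2 ^ 53)
        d = c - 1
        print ((c == d) ? "stalls" : "keeps counting")
    }'
}

stalls_at_limit
```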


JoL · Accepted Answer · 2022-05-06 01:17:44Z

Since you're on Emacs...

The Evil/Vim Way

If you have evil-mode installed (or you switch to Vim), you can do this:

:g/^\\phrase/+3s/r/ɹ/g

That's the simplest.

The Keyboard Macro Way

Staying with stock Emacs, you can use a keyboard macro: C-x ( C-M-s ^\\phrase Enter C-n C-n C-n C-a C-space C-e C-M-% r Enter ɹ Enter ! C-x ) C-u 2 C-x e

C-x ( starts the macro, C-x ) ends the macro, C-x e runs the macro, C-u 2/C-2 modifies C-x e so it runs the macro 2 times. You can also use a big number like C-u 10000 if you don't want to count. C-M-s searches for a regex. After moving down 3 lines and selecting the line, C-M-% starts a replacement in selection. After the prompts for what replaces what, ! means to accept all replacements in selection.

The Elisp Way

You can also open up the *scratch* buffer and run this (with C-M-x while having the cursor on the code):

(with-current-buffer "foo"
  (goto-char (point-min))
  (while (re-search-forward "^\\\\phrase" nil t)
    (forward-line 3)
    (replace-string-in-region "r" "ɹ" (point) (line-end-position))))

where foo is the name of the buffer where you want to do this.

EDIT: replace-string-in-region was introduced in Emacs 28.1 (latest version as of writing). If your Emacs is older, you can use search-forward and replace-match like this instead:

(with-current-buffer "foo"
  (goto-char (point-min))
  (while (re-search-forward "^\\\\phrase" nil t)
    (forward-line 3)
    (while (search-forward "r" (line-end-position) t)
      (replace-match "ɹ"))))

The Shell Command Filter Way

You can also filter the Emacs buffer through an external command, like one of the other answers here: C-x h C-u M-| <command> Enter

C-x h selects the whole buffer. M-| will prompt for the command that will filter the selection. C-u modifies M-| so it replaces the selection with the output instead of putting it in a temporary buffer.

This is what I get when trying the lisp version: ibb.co/8gz5jd8 — Zoltan King, Commented May 5, 2022 at 18:57
@ZoltanKing replace-string-in-region was introduced in the latest Emacs version 28.1. I've added an alternative for previous versions. — JoL, Commented May 5, 2022 at 20:55

terdon · Accepted Answer · 2022-05-03 14:48:44Z

If you always have a blank line between each section, you can try perl's "paragraph" mode to read each section as a single "line":

$ perl -F'\n' -00ane '$F[3]=~s/r/ɹ/g; print join "\n",@F , "\n"' file
\phrase
{. . . * * }
{I shoul-d've stayed home.}
{aɪ ʃʊd‿əv ˈsteɪd ˈhoʊm.} <- only ɹeplace on this line

\phrase
{ . . * }
{Did you eat?}
{dɪdʒjʊʷˈit? ↗} <- only ɹeplace on this line

\phrase
{ * . * . * . . . * . }
{Yeah, I made some pas-ta if you're hun-gry.}
{ˈjɛə, aɪ ˈmeɪd səm ˈpɑ stəʷɪf jəɹ ˈhʌŋ gɹi.} <- only ɹeplace on this line

Explanation

-a: autosplit each input line into the array @F.
-F'\n': split on newline characters.
-00: "paragraph mode", lines are now defined by \n\n (an empty line), so each section becomes a "line".
-ne: read the input file line by line and apply the script given by -e to each line.
$F[3]=~s/r/ɹ/g;: replace all r with ɹ on the 4th element of the array @F (this is the 4th line of each section; arrays start at 0).
print join "\n",@F , "\n": join the modified @F array with \n, and then print it along with an extra \n.

If you cannot rely on that and need to always go for the 3rd line after a line matching \phrase, you can do:

$ perl -pe '$k=0 if /\\phrase\b/; $k++; s/r/ɹ/g if $k==4' file
\phrase
{. . . * * }
{I shoul-d've stayed home.}
{aɪ ʃʊd‿əv ˈsteɪd ˈhoʊm.} <- only ɹeplace on this line

\phrase
{ . . * }
{Did you eat?}
{dɪdʒjʊʷˈit? ↗} <- only ɹeplace on this line

\phrase
{ * . * . * . . . * . }
{Yeah, I made some pas-ta if you're hun-gry.}
{ˈjɛə, aɪ ˈmeɪd səm ˈpɑ stəʷɪf jəɹ ˈhʌŋ gɹi.} <- only ɹeplace on this line

This sets a counter to 0 each time we see \phrase, and increments it by one on each new line. Then, we only do the replacement when the counter's value is 4.

No, sometimes there is a \note section between the phrases. The answer from Archemar works because his method specifically targets the line \phrase then move from there to the desired position. — Zoltan King, Commented May 3, 2022 at 14:41
@ZoltanKing OK, see update. But please edit your question and make sure that the excerpt you show adequately represents your data. — terdon, Commented May 3, 2022 at 14:46
@ZoltanKing, if those notes are separated by blank lines too, then you could add a test for the \phrase keyword. Something like $F[0] =~ /\\phrase/ or next as the first thing in the one-liner might work (didn't test). But if you can have something like \note<newline>whatever<newline>\phrase... without a blank line in between, then it wouldn't work. — ilkkachu, Commented May 3, 2022 at 14:47

Stéphane Chazelas · Accepted Answer · 2022-05-04 09:58:03Z

With standard sed:

sed '/^\\phrase$/{n;n;n;s/r/ɹ/g;}'

y/r/ɹ/ in place of s/r/ɹ/g would also work in POSIX compliant sed implementations provided the ɹ character is regarded as a single character in the user's locale, but s/r/ɹ/g would be more portable as it would also work with sed implementations that don't support multi-byte characters (ɹ is multi-byte in UTF-8, and I can't find any character encoding where ɹ is encoded on a single byte).

For that ɹ to be properly encoded in the user's locale, with zsh, you could do:

sed $'/^\\\\phrase$/{n;n;n;s/r/\u0279/g;}'

Where that \u0279 would be expanded to the encoding of that ɹ character in the user's locale¹

¹ That $'\uXXXX' is now supported by a few other shells, but beware that in some, it is expanded in the locale as it was when the shell was started or when that line of code was read, not necessarily in the locale in which that sed command is executed. In ksh93, it's always expanded in UTF-8, regardless of the locale of the user. When the character is not available in the locale's charset, the behaviour also varies between shells. It causes an error in zsh.
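For shells without $'\uXXXX', one possible workaround is to build the character from its bytes with printf and pass it to the same sed command. This is a sketch rather than part of the answer: it hard-codes the UTF-8 encoding of U+0279 (bytes 0xC9 0xB9, octal 311 271), so it is only right when the file is UTF-8, and the names turned_r and replace_with_sed are invented here.

```shell
# Assumption: the file is UTF-8, where U+0279 is the two bytes 0xC9 0xB9
# (octal 311 271). Other encodings would need other bytes.
turned_r=$(printf '\311\271')

# Same sed script as above; the double quotes let the shell insert $turned_r,
# so the backslashes and the $ anchor need extra escaping.
replace_with_sed() {
    sed "/^\\\\phrase\$/{n;n;n;s/r/$turned_r/g;}" "$@"
}

printf '%s\n' '\phrase' 'a' 'b' 'gri' | replace_with_sed
```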

Using ed may result in simpler commands than sed, since we wouldn't need all those n commands. Good point about the care needed with y and multibyte characters! — Toby Speight, Commented May 5, 2022 at 6:58
@TobySpeight The Ed way is actually provided by JoL's answer: g/^\\phrase/+3s/r/ɹ/g. Although the Ed command itself is indeed simpler, putting it into action usually looks a bit cumbersome (i.e. printf '%s\n' 'g/^\\phrase/+3s/r/ɹ/g' w q | ed -s file). — Quasímodo, Commented May 5, 2022 at 11:17

hobbs · Accepted Answer · 2022-05-04 05:42:25Z

perl -Mutf8 -CSD -pe '$phrase = $. if /\\phrase/; s/r/ɹ/g if $. == $phrase + 3'

Fairly straightforward: set flags for Unicode handling, remember the line number ($.) if we see \phrase, and do a replacement if the line number is three greater than that.

Chris Davies · Accepted Answer · 2022-05-03 17:00:37Z

Since we're getting other answers here's a worked solution from an almost-duplicate question. This is for GNU sed, but on the linked answer there are also POSIX suggestions:

sed '/^\\phrase/,+3 { /^\\phrase/,+2 !{ s/r/ɹ/g } }'

What this does is take the \phrase line (anchored to start-of-line) and work with that and the next three lines (+3 counts the lines after the matching one). For the first three lines of this group (the \phrase line and the next two, +2) it does not apply the substitution from r to ɹ, so the substitution applies only to the last line of the group.

Output from example:

\phrase
{. . . * * }
{I shoul-d've stayed home.}
{aɪ ʃʊd‿əv ˈsteɪd ˈhoʊm.} <- only ɹeplace on this line

\phrase
{ . . * }
{Did you eat?}
{dɪdʒjʊʷˈit? ↗} <- only ɹeplace on this line

\phrase
{ * . * . * . . . * . }
{Yeah, I made some pas-ta if you're hun-gry.}
{ˈjɛə, aɪ ˈmeɪd səm ˈpɑ stəʷɪf jəɹ ˈhʌŋ gɹi.} <- only ɹeplace on this line

jubilatious1 · Accepted Answer · 2022-05-06 02:26:41Z

Using Raku (formerly known as Perl_6)

raku -pe 'state $ph; $ph = 0 if /^ \\phrase $/; s:g/r/ɹ/ if ++$ph == 4;'

You might want to try Raku, since it was built from the ground-up to handle Unicode. The code above (in fact) is very similar to the Perl5 answer posted by @hobbs, in that it uses Raku's -pe autoprinting linewise command-line flags, and counts lines down from the line where \phrase is seen.

For the code above, the variable $ph is declared once at the beginning of the program. As the file is read linewise, $ph gets set to 0 when a line containing \phrase and nothing else is encountered (so ++$ph == 1 is True on that line). On every line an auto-incrementing test if ++$ph == 4 is performed (counting 3 lines on from \phrase), which, if satisfied, directs the substitution operator s:g/r/ɹ/ to act :globally within the desired line.

[For Perl aficionados: Raku dispenses with a wide variety of compiler variables like $. in favor of the state variable declarator and associated anonymous state variables, such as $, @, and %. According to the docs, "state declares lexically scoped variables, just like my. However, initialization happens exactly once... .". The $ anonymous state variable in Raku can be used to add line numbers to a text file, i.e. raku -ne 'put ++$ ~ " $_";' ].

Note, because Raku handles Unicode gracefully, the s:g/r/ɹ/ substitution can just as easily be written:

s:g/r/\x0279/

OR

s:g/r/\c[Latin Small Letter Turned R]/

...the above descriptive conversion "Latin Small Letter Turned R" may help when you have font/Unicode-related difficulties (or... if you're just tired of trying to remember Unicode hex-codes).

Sample Output:

\phrase
{. . . * * }
{I shoul-d've stayed home.}
{aɪ ʃʊd‿əv ˈsteɪd ˈhoʊm.} <- only ɹeplace on this line

\phrase
{ . . * }
{Did you eat?}
{dɪdʒjʊʷˈit? ↗} <- only ɹeplace on this line

\phrase
{ * . * . * . . . * . }
{Yeah, I made some pas-ta if you're hun-gry.}
{ˈjɛə, aɪ ˈmeɪd səm ˈpɑ stəʷɪf jəɹ ˈhʌŋ gɹi.} <- only ɹeplace on this line

https://en.wikipedia.org/wiki/IPA_Extensions
https://docs.raku.org/syntax/state
https://raku.org
