default field separator for awk

Tag: linux,unix,awk,posix,separator

Sorry for this stupid question. I searched, but I'm not confident I found the right answer: is the default field separator for awk only a space?

Here's a pragmatic summary that applies to all major Awk implementations:

  • GNU Awk (gawk) - the default awk in some Linux distros
  • Mawk (mawk) - the default awk in some Linux distros (e.g., Ubuntu)
  • BSD Awk - a.k.a. BWK Awk - the default awk on BSD-like platforms, including OSX

On Linux, awk -W version will tell you which implementation the default awk is.
BSD Awk only understands awk --version (which GNU Awk understands in addition to awk -W version).

Recent versions of all these implementations follow the POSIX standard with respect to field separators[1] (but not record separators).

Glossary:

  • RS is the input-record separator, which describes how the input is broken into records:

    • The POSIX-mandated default value is a newline, also referred to as \n below; that is, input is broken into lines by default.
    • On awk's command line, RS can be specified as -v RS=<sep>.
    • POSIX restricts RS to a literal, single-character value, but GNU Awk and Mawk support multi-character values that may be extended regular expressions (BSD Awk does not support that).
  • FS is the input-field separator, which describes how each record is split into fields; it may be an extended regular expression.

    • On awk's command line, FS can be specified as -F <sep> (or -v FS=<sep>).
    • The POSIX-mandated default value is formally a single space (0x20), but that space is not interpreted literally as the (only) separator; instead, it has special meaning, described below.
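As a minimal sketch of setting both separators on the command line (assuming a POSIX sh and any of the awk implementations listed above; the sample input and the `;` and `,` separators are arbitrary choices for illustration), the following splits records on `;` and fields on `,`. The input deliberately has no trailing newline: with a non-newline RS, a trailing `\n` would otherwise become part of the last record.

```shell
#!/bin/sh
# Records are separated by ';' (RS), fields by ',' (FS).
# printf without '\n' so the last record is exactly "c,d".
out=$(printf 'a,b;c,d' | awk -v RS=';' -F',' '{ print NR ": " $2 }')
printf '%s\n' "$out"   # prints "1: b" and "2: d" on separate lines
```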

By default:

  • any run of spaces and/or tabs and/or newlines is treated as a field separator
  • with leading and trailing runs ignored.
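To illustrate the default behavior, here is a small sketch (POSIX sh; the sample line is made up) that feeds a line with leading, trailing, and mixed runs of spaces and tabs to awk with FS left at its default:

```shell
#!/bin/sh
# Default FS: any run of spaces/tabs separates fields,
# and leading/trailing runs do not create empty fields.
out=$(printf '  a \t  b  \n' | awk '{ printf "NF=%d <%s> <%s>\n", NF, $1, $2 }')
printf '%s\n' "$out"   # NF=2 <a> <b>
```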

The POSIX spec uses the abstraction <blank> for spaces and tabs; that is accurate for all locales, but <blank> could comprise additional characters in specific locales (I don't know whether any such locales exist).

Note that with the default input-record separator (RS), \n, newlines typically do not enter the picture as field separators, because no record itself contains \n in that case.

Newlines as field separators do come into play, however:

  • When RS is set to a value that results in records themselves containing \n instances (such as when RS is set to the empty string; see below).
  • Generally, when the split() function is used to split a string into array elements without an explicit field-separator argument:
    • Even though input records won't contain \n instances when the default RS is in effect, split() invoked without an explicit field-separator argument (and with FS at its default value) on a multi-line string from a different source (e.g., a variable passed via the -v option or as a pseudo-filename) always treats \n as a field separator.
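A sketch of the split() case (POSIX sh; the string `a\nb c` is an arbitrary sample). It relies on the fact that values passed with -v undergo the same escape processing as awk string literals, so `\n` arrives as a real newline:

```shell
#!/bin/sh
# split() without a third argument uses FS, whose default value makes
# newlines separate fields just like spaces and tabs.
out=$(awk -v s='a\nb c' 'BEGIN {
  n = split(s, parts)
  printf "%d <%s> <%s> <%s>\n", n, parts[1], parts[2], parts[3]
}')
printf '%s\n' "$out"   # 3 <a> <b> <c>
```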

Important NON-default considerations:

  • Assigning the empty string to RS has special meaning: it reads the input in paragraph mode, meaning that records are separated by runs of empty lines, so each run of non-empty lines forms one record; leading and trailing runs of empty lines are ignored.

  • When you assign anything other than a literal space to FS, the interpretation of FS changes fundamentally:

    • A single character or each character from a specified character set is recognized individually as a field separator - not runs of it, as with the default.
      • For instance, setting FS to [ ] - even though it effectively amounts to a single space - causes every individual space instance in each record to be treated as a field separator.
      • To recognize runs, the regex quantifier (duplication symbol) + must be used; e.g., [\t]+ would recognize runs of tabs as a single separator.
    • Leading and trailing separators are NOT ignored; instead, they delimit empty fields.
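    • Setting FS to the empty string means that each character of a record is its own field (strictly speaking, POSIX leaves this case unspecified, but GNU Awk, Mawk, and BSD Awk all support it).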
  • As mandated by POSIX, if RS is set to the empty string (paragraph mode), newlines (\n) are also considered field separators, irrespective of the value of FS.
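The non-default cases can be compared side by side with a sketch like the following (POSIX sh; the sample line and paragraph input are made up). The bracket expressions are just one way to spell a single space as a regular expression rather than as the special default value:

```shell
#!/bin/sh
line=' a  b '   # leading space, a double space, trailing space

# Default FS: runs collapse, leading/trailing ignored -> "a", "b".
d=$(printf '%s\n' "$line" | awk '{ print NF }')
# FS='[ ]': every single space separates -> "", "a", "", "b", "".
s=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')
# FS='[ ]+': runs collapse, but the leading and trailing
# separators still delimit empty fields -> "", "a", "b", "".
p=$(printf '%s\n' "$line" | awk -F'[ ]+' '{ print NF }')

# Paragraph mode (RS set to the empty string): leading/trailing empty
# lines are ignored, runs of empty lines separate records.
r=$(printf '\n\na b\nc\n\n\nd\n\n' | awk -v RS= '{ print NR ":" NF }')

printf 'default: %s, [ ]: %s, [ ]+: %s\n' "$d" "$s" "$p"   # default: 2, [ ]: 5, [ ]+: 4
printf '%s\n' "$r"   # "1:3" then "2:1", one per line
```

In the paragraph-mode run FS keeps its default value, so the newline inside the first record would separate fields even without the POSIX rule in the last bullet; that rule matters when FS is set to something else.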

[1] Unfortunately, GNU Awk up to at least version 4.1.3 complies with an obsolete POSIX standard with respect to field separators when you use the option that enforces POSIX compliance, -P (--posix): with that option in effect and RS set to a non-empty value, newlines (\n instances) are NOT recognized as field separators. The GNU Awk manual spells out the obsolete behavior (but neglects to mention that it doesn't apply when RS is set to the empty string). The POSIX standard changed in 2008 (see comments) to also treat newlines as field separators when FS has its default value, as GNU Awk has always done without -P (--posix).
Here are two commands that verify the behavior described above:
  • With -P in effect and RS set to the empty string, \n is still treated as a field separator:
    gawk -P -F' ' -v RS='' '{ printf "<%s>, <%s>\n", $1, $2 }' <<< $'a\nb'
  • With -P in effect and a non-empty RS, \n is NOT treated as a field separator; this is the obsolete behavior:
    gawk -P -F' ' -v RS='|' '{ printf "<%s>, <%s>\n", $1, $2 }' <<< $'a\nb'
A fix is coming, according to the GNU Awk maintainers; expect it in version 4.2 (no time frame given).
(Tip of the hat to @JohnKugelman and @EdMorton for their help.)
