Menu
  • HOME
  • TAGS

BASH - conditional sum of columns and rows in csv file

Tag: linux,bash,csv,awk

i have CSV file with some database benchmark results here is the example:

Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50

Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;700;365;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;530;255;99

i need to process all rows with the same id (value of the last column) and get this:

Date;dbms;type;description;W;D;S;time;results;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)

for now i can only do the sum of column 8 with the awk command:

awk -F";" '{print;sum+=$8 }END{print "sum " sum}' ./file.CSV

Edit:
need help with some modification of script iam already using. here are real input data:

Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;144170;31815;60
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;137570;33910;60
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;210420;95113;70
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;288360;119328;80
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;270360;124328;80

i need to sum values in time, TotalTransactions and NOTransactions columns and then add a column with value (sum NOTransactions/sum time)

iam using this script:

awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0} 
(NR>1 && NF) {sum7[$10]+=$7; sum8[$10]+=$8; sum9[$10]+=$9;  other[$10]=$0} 
 END {for (i in sum8) 
        {$0=other[i]; $7=sum7[i];$8=sum8[i]; $9=sum9[i]; $10=sprintf("%.0f", sum9[i]/sum7[i]); print}}' ./logsFinal.csv

gives me this output:

;;;;;;;;;results/time
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;20;281740;65725;3286
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;20;437080;192847;9642
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;20;558720;243656;12183
Date;dbms;type;description;W;D;0;0;0;-nan

values looks good (except header row). But i need to get these results without id column (i want delete id column) So i need to get same values but instead of identify processed rows with same values in id column it must be rows with same values in dbms AND W AND D columns

Best How To :

This awk program will print the modified header and modify the output to contain the sums and their division:

awk 'BEGIN {FS=OFS=";"} 
     (NR==1) {$10="results/time"; print $0} 
     (NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0} 
     END {for (i in sum8) 
            {$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'

which gives:

Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273

You don't seem to care for the ID in the result, but if you do, just replace $10= with $11=.

Also, if you need to sum things based on values of more than one column, you can create a temporary variable (a in the example below) which is a concatenation of two columns and use it as an index in the arrays, like this:

awk 'BEGIN {FS=OFS=";"}
     (NR==1) {$10="results/time"; print $0}
     (NR>1 && NF) {a=$5$6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
     END {for (i in sum8)
            {$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'

Matching string inside file and returning result

regex,string,bash,shell,grep

Using sqlite3 from bash on OS X seems fairly straightforward (I'm no expert at this, by the way). You will need to find out which table you need. You can do this with an interactive session. I'll show you with the database you suggested: /Users/fredbloggs> sqlite3 ~/Library/Application\ Support/Dock/desktoppicture.db SQLite version...

How to append entry the end of a multi-line entry using any of stream editors like sed or awk

linux,bash,awk,sed,sh

Here's a sed version: /^Host_Alias/{ # whenever we match Host_Alias at line start : /\\$/{N;b} # if backslash, append next line and repeat s/$/,host25/ # add the new host to end of line } If you need to add your new host to just one of the host aliases, adjust...

linux running command as root from c code that run as normal user

c++,linux

A workaround is to modify the sudoers file and remove the requirement of a password from your user ID for a particular script to have sudo privileges. Enter sudo visudo After this, add the details in the following manner. username ALL=(ALL) NOPASSWD: /path/to/script Another method would be to pipe the...

How does the kernel separate threads from processes

linux,multithreading,linux-kernel

Unlike Windows, Linux does not have an implementation of "threads" in the kernel. The kernel gives us what are sometimes called "lightweight processes", which are a generalization of the concepts of "processes" and "threads", and can be used to implement either. It may be confusing when you read kernel code...

AWK count number of times a term appear with respect to other columns

linux,shell,command-line,awk,sed

Almost same as the other answer, but printing 0 instead of blank. AMD$ awk -F, 'NR>1{a[$2]+=$3;b[$2]++} END{for(i in a)print i, a[i], b[i]}' File pear 1 1 apple 2 3 orange 0 1 peach 0 1 Taking , as field seperator. For all lines except the first, update array a. i.e...

Ignore first few lines and last few lines in a file Linux

linux,awk

awk cannot look ahead so you'll have to save the lines. awk 'NR>2{if(z!="")print z;z=y;y=x;x=$0}' file Practically zero memory overhead...

AWK|BASH, use double FS and ternary operator

bash,awk

You can use this awk command awk -F, '{if (split($1, a, " ") > 2) print $NF; else print}' file This will output rosario escuela estadist, medellin medellin If you want to get rid of the space before $NF, use this awk command awk -F, '{if (split($1, a, " ")...

What are correct permissions for Linux Apache2 PHP 5.3 log file?

php,linux,apache,logging,permissions

I'd simply set its owner to apache user. This will give you the name of apache user : ps aux | grep httpd In my case (CentOS), it's 'apache' but sometimes it's 'www-data'... chown apache:apache /var/log/httpd/php_errors.log chmod 600 /var/log/httpd/php_errors.log ...

How to extract single-/multiline regex-matching items from an unpredictably formatted file and put each one in a single line into output file?

linux,shell,unix,replace,grep

Assuming that your document is well-formed, i.e. <b> opening tags always match with a </b> closing tag, then this may be what you need: sed '[email protected]<[/]\?b>@\n&\[email protected]' path/to/input.txt | awk 'BEGIN {buf=""} /<b>/ {Y=1; buf=""} /<\/b>/ {Y=0; print buf"</b>"} Y {buf = buf$0} ' | tr -s ' ' Output: <b>data1</b>...

Delete some lines from text using Linux command

linux,shell,sed,grep,pattern-matching

The -v option to grep inverts the search, reporting only the lines that don't match the pattern. Since you know how to use grep to find the lines to be deleted, using grep -v and the same pattern will give you all the lines to be kept. You can write...

make error during building webkitgtk

linux,makefile,cmake,make

as you see, in Edit1, you (make) try to run JavaScriptCore-4.0.gir instead of compile it with g-ir-compiler; I tried on my pc and my command is: cd /home/davide/src/webkitgtk-2.8.3/build/Source/JavaScriptCore && \ /usr/bin/g-ir-compiler /home/davide/src/webkitgtk-2.8.3/build/JavaScriptCore-4.0.gir \ -o /home/davide/src/webkitgtk-2.8.3/build/JavaScriptCore-4.0.typelib as a workaround, you cand edit: build/Source/JavaScriptCore/CMakeFiles/JavascriptCore-4-gir.dir/build here's the lines on my file (the last...

What does it indicate if /proc/PID/maps shows zero for all addresses?

linux,linux-kernel

I found the discussion in Valgrind mail list when someone had the same problem. The issue was that the kernel have been patched with PaX patches, one of which doesn't allow to look at the /proc/pid/maps. The quote about the patch from wikipedia The second and third classes of attacks...

While loop in bash using variable from txt file

linux,bash,rhel

As indicated in the comments, you need to provide "something" to your while loop. The while construct is written in a way that will execute with a condition; if a file is given, it will proceed until the read exhausts. #!/bin/bash file=Sheetone.txt while IFS= read -r line do echo sh...

Macports switch PHP CLI version

php,bash,drupal,terminal,macports

Do not modify files in /usr/bin. That's Apple's turf, and there are always other possibilities to avoid changing things there, especially since Apple's next update will happily revert these changes again and scripts might rely on /usr/bin/php being exactly the version Apple shipped with the OS. Put the original binary...

linux - running a process and tailing a file simultaneously

bash,shell,tail

I would simply start the tail in background and the python process in foreground. When the python process finishes you can kill the tail, like this: #!/bin/bash touch /tmp/out # Make sure that the file exists tail -f /tmp/out & pid=$! python test.py kill "$pid" ...

How to change svn:externals from bash file non-interactive

bash,svn,svn-externals

Use svn ps svn:externals svn://hostname/branchname -F extenals.txt http://svnbook.red-bean.com/en/1.8/svn.ref.svn.c.propset.html...

NASM: copying a pointer from a register to a buffer in .data

linux,assembly,nasm,x86-64

The problem is, you don't have debug info for the ptr type, so gdb treats it as integer. You can examine its real contents using: (gdb) x/a &ptr 0x600124 <ptr>: 0x7fffffffe950 (gdb) p/a $rsp $3 = 0x7fffffffe950 Of course I have a different value for rsp than you, but you...

shell script cut from variables

bash,shell,shellcode

With GNU grep: grep -oP 'aaa&\K.*' file Output: 123 456 \K: ignore everything before pattern matching and ignore pattern itself From man grep: -o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line. -P, --perl-regexp Interpret PATTERN as a...

AWK write to new column base on if else of other column

linux,bash,shell,awk,sed

You can use: awk -F, 'NR>1 {$0 = $0 FS (($4 >= 0.7) ? 1 : 0)} 1' test_file.csv ...

Linux-wget command

linux,shell,wget

Try this to create a string variable n, with no leading whitespace (thanks @011c): n="10.0.0.135.527" wget http://infamvn:8081/nexus/content/groups/LDM_REPO_LIN64/com/infa/com.infa.products.ldm.ingestion.server.scala/"$n"-SNAPSHOT/com.infa.products.ldm.ingestion.server.scala-"$n"-20150622.210643-1-sources.jar ...

Python: can't access newly defined environment variables

python,bash,environment-variables

After updating your .bashrc, perform source ~/.bashrc to apply the changes. Also, merge the two BONSAI-related calls into one: export BONSAI=/home/me/Utils/bonsai_v3.2 UPDATE: It was actually an attempt to update the environment for some Eclipse-based IDE. This is a different usecase altogether. It should be described in the Eclipse help. Also,...

ret_from_syscall source code and when it is called

linux,linux-kernel,kernel,linux-device-driver,system-calls

The ret_from_syscall symbol will be in architecture-specific assembly code (it does not exist for all architectures). I would look in arch/XXX/kernel/entry.S. It's not actually a function. It is part of the assembly code that handles the transition from user-space into kernel-space for a system call. It's simply a label to...

Assign and use of a variable in the same subshell

bash,scope,subshell

Let's look to the POSIX specification to understand why this behaves as it does, not just in bash but in any compliant shell: 2.10.2, Shell Grammar Rules From rule 7(b), covering cases where an assignment precedes a simple command: If all the characters preceding '=' form a valid name (see...

Why does `sort file > file` result in an empty file? [duplicate]

bash

With sort afile > afile this happens: The shell opens and truncates afile because of the file direction operation > afile The shell executes the sort program with one argument, afile, and binds stdout of the new process to the file descriptor opened in step 1. The sort program opens...

How do I check whether a file or file directory exist in bash?

bash,if-statement

Checking file and/or directory existence To check whether a file exists in bash, you use the -f operator. For directories, use -d. Example usage: $ mkdir dir $ [ -d dir ] && echo exists! exists! $ rmdir dir $ [ -d dir ] && echo exists! $ touch file...

Syncing Vagrant VMs across different physical servers

linux,vagrant,backup,virtual-machine,sync

Vagrant doesn't inherently support this, since it's intended audience is really development environments. It seems like you're looking for something more like what VMWare vSphere does.

Extra backslash when storing grep in a value

linux,bash

The output from set -x uses single quotes. So the outer double quotes were replaced with single quotes but you can't escape single quotes inside a single quoted string so when it then replaced the inner double quotes it needed, instead, to replace them with '\'' which ends the single...

Django MySQLClient pip compile failure on Linux

python,linux,django,gcc,pip

It looks like you're missing zlib; you'll want to install it: apt-get install zlib1g-dev I also suggest reading over the README and confirming you have all other dependencies met: https://github.com/dccmx/mysqldb/blob/master/README Also, I suggest using mysqlclient over MySQLdb as its a fork of MySQLdb and what Django recommends....

Using an ad-hoc libc with a tool which is an argument of another tool

linux,shared-libraries

You can achieve that by using the env utility: timeout 10 /usr/bin/env LD_LIBRARY_PATH=/path/to/mod/libc/ cp a b Env will set the environment variable and exec the other utility with that environment....

bash interactive script pass input

bash,command-line-arguments

If you want to redirect the normal standard input of the program, you could use so called "here documents" (see e.g. the BASH manual page): java -jar script.jar <<EOF your input here EOF That means standard input (a.k.a. stdin) is redirected and will be the text in the "here document",...

Identifying when a file is changed- Bash

bash,shell,unix

I would store the output of find, and if non-empty, echo the line break: found=$(find . -name "${myarray[i]}") if [[ -n $found ]]; then { echo "$found"; echo "<br>"; } >> "$tmp" fi ...

Replace [a-z],[a-z] with [a-z], [a-z] and keep the letters

bash,awk,sed

What I have tried sed 's/[a-z],[a-z]/[a-z], [a-z]/g' <<< "suny stony brook, stony brook,usa." You need to use regex's capture groups here to refer to the original [a-z] values. For example: s/\([a-z]\),\([a-z]\)/\1, \2/g Notice how I've surrounded those [a-z] with \( and \)? These form capture groups that can be...

Java read bytes from Socket on Linux

linux,windows,sockets,network-programming,raspberry-pi

InputStream input = client.getInputStream(); BufferedReader in = new BufferedReader(new InputStreamReader(input)); Your problem is here. You can't use multiple inputs on a socket when one or more of them is buffered. The buffered input stream/reader will read-ahead and 'steal' data from the other stream. You need to change your protocol so...

shell script for counting replacements

bash,replace,count

Assuming you want to replace the word 'apple' with 'banana' (exact match) in the contents of the files and not on the names of the files (see my comment above) and that you are using the bash shell: #!/bin/bash COUNTER=0 for file in *.txt ; do COUNTER=$(grep -o "\<apple\>" $file...

Bash alias function with predefined argument

bash

Simply, add it in your .bashrc like: alias gl10="gitLog 10" To apply the changes: source ~/.bashrc...

Bash modify CSV to change a field

linux,bash,awk

Please save following awk script as awk.src: function date_str(val) { Y = substr(val,0,4); M = substr(val,5,2); D = substr(val,7,2); date = sprintf("%s-%s-%s",Y,M,D); return date; } function time_str(val) { h = substr(val,9,2); m = substr(val,11,2); s = substr(val,13,2); time = sprintf("%s:%s:%s",h,m,s); return time; } BEGIN { FS="|" } # ## MAIN...

Force linux to use php as php55

php,linux,fedora

You can create an alias: alias php="php55" Now if you type php it uses php55...

Bash script using sed acts differently when passing variable

bash,sed

Your variable is still within single quote hence not getting expanded. Use: sed -i 's|^\('"$CHECK"' = \)*.|\1'6'|' /user/file.txt ...

how to modify an array value with given index?

arrays,linux,bash

You don't need the quotes. Just use ${i}, or even $i: pomme[${i}]="" Or pomme[$i]="" ...

How do I silence the HEAD of a curl request while using the silent flag?

bash,shell,curl,command-line,pipe

Try this: curl --silent "www.site.com" > file.txt ...

How can I resolve the “Could not fix timestamps in …” “…Error: The requested feature is not implemented.”

linux,build,f#

This is usually a sign that you should update your mono. Older mono versions have issues with their unzip implementation

Capitalize all files in a directory using Bash

osx,bash,rename

In Bash 4 you can use parameter expansion directly to capitalize every letter in a word (^^) or just the first letter (^). for f in *; do mv -- "$f" "${f^}" done You can use patterns to form more sophisticated case modifications. But for your specific question, aren't you...

sed and PHP tags

regex,linux,sed

.* is greedy: it matches all possible characters. This way, even sed 's/<?php.*//' file will also delete all the content in your file. To prevent this greediness of .*, say "everything but a ?" -> [^?]*: sed 's/<?php[^?]*?><?php[^?]*?>//' file Test $ cat a <?php echo 'first' ?><?php echo 'second' ?><?php...

Why can I view some Unix executable files in Mac OS X and not others?

git,bash,shell,unix,binary

Executable files may be scripts (in which case you can read the text), or binaries (which are ELF formatted machine code). Your shell script is a script; git is an ELF binary. You can use the file command to see more detail. For example, on my nearest Linux system: $...