Perl - Reading Specific Lines from a CSV file

Tag: perl,csv,text-parsing

I'm looking to read a certain "category" from a .csv file that looks something like this:

Category 1, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...,
Category 2, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...,
Category 3, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...

Let's say I want to print only the data from one specific "category". How would I go about doing this?

For example, if I want to print the Category 2 data, the output should look like this:

Category 2, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...

Best How To :

Unless your data includes quoted fields, like a,b,c,"complicated field, quoted",e,f,g there is no advantage in using Text::CSV over a simple split /,/.
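To make the point about quoted fields concrete, here is a minimal sketch using the sample line from the sentence above. It assumes Text::CSV may or may not be installed from CPAN, so that part is guarded and simply skipped when the module is missing.

```perl
use strict;
use warnings;

# Sample line from the text above: seven logical fields, one of them quoted.
my $line = 'a,b,c,"complicated field, quoted",e,f,g';

# A plain split cuts the quoted field in two at its embedded comma.
my @naive = split /,/, $line;
print scalar(@naive), " fields from split\n";

# Text::CSV (a CPAN module, assumed here to be installed) respects the quotes.
if ( eval { require Text::CSV; 1 } ) {
    my $csv = Text::CSV->new( { binary => 1 } );
    if ( $csv->parse($line) ) {
        my @fields = $csv->fields;
        print scalar(@fields), " fields from Text::CSV\n";
    }
}
```

With data like the question's, where no field is quoted, both approaches give the same fields, which is why the simple split is enough here.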

The following example categorizes the data into a hash that you can access simply and directly. I have used Data::Dump only to show the contents of the resulting data structure.

use strict;
use warnings;
use autodie;

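open my $fh, '<', 'mydata.csv';

my $category;
my %data;
while (<$fh>) {
  chomp;
  my @data = split /,/;
  # The first field is the category name, or blank on continuation lines
  my $cat = shift @data;
  # Remember the most recent non-blank category
  $category = $cat if $cat =~ /\S/;
  push @{ $data{$category} }, \@data;
}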

use Data::Dump;
dd \%data;

output

{
  "Category 1" => [
                    [" header1", " header2", " header3", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                  ],
  "Category 2" => [
                    [" header1", " header2", " header3", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                  ],
  "Category 3" => [
                    [" header1", " header2", " header3", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                  ],
}
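As a rough illustration of accessing that hash directly, the sketch below prints one category back in the original layout. The small %data literal is a hypothetical stand-in for the structure built by the loop above, and the padding width is simply the length of the category name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the %data hash built by the loop above.
my %data = (
    'Category 2' => [
        [ ' header1', ' header2', ' header3' ],
        [ ' data',    ' data',    ' data' ],
    ],
);

my $wanted = 'Category 2';
my @lines;
for my $i ( 0 .. $#{ $data{$wanted} } ) {
    my $row = $data{$wanted}[$i];

    # The first row carries the category name; later rows get blank padding.
    my $first = $i == 0 ? $wanted : ' ' x length $wanted;
    push @lines, join ',', $first, @$row;
}
print "$_\n" for @lines;
```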

Update

If all you want is to separate a given section of the file then there is no need to put it into a hash. This program will do what you want.

#!/usr/bin/perl

use strict;
use warnings;
use autodie;

my ($file, $wanted) = @ARGV;

open my $fh, '<', $file;

my $category;

while (<$fh>) {
  # Capture everything before the first comma
  my ($cat) = /\A([^,]*)/;
  # A non-blank first field starts a new category
  $category = $cat if $cat =~ /\S/;
  print if $category eq $wanted;
}

Run it like this on the command line

get_category.pl mydata.csv 'Category 2' > cat2.csv

output

Category 2, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...
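One possible variation, sketched under the assumption that each category appears only once in the file, is to stop reading as soon as the wanted section has ended. The sample data is hypothetical and is read from an in-memory filehandle so that the sketch is self-contained.

```perl
use strict;
use warnings;

# Hypothetical sample input, held in memory instead of a file on disk.
my $csv = <<'END';
Category 1, h1, h2,
          , a, b,
Category 2, h1, h2,
          , c, d,
Category 3, h1, h2,
          , e, f,
END

my $wanted = 'Category 2';
open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";

my $category = '';
my @section;
while ( my $line = <$fh> ) {
    my ($cat) = $line =~ /\A([^,]*)/;
    if ( $cat =~ /\S/ ) {
        # A new category starts; stop if the wanted one has just ended.
        last if $category eq $wanted;
        $category = $cat;
    }
    push @section, $line if $category eq $wanted;
}
print @section;
```

For small files the saving is negligible, so the simpler loop above is usually preferable.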

How to match and remove the content preceding it from a file in unix [closed]

mysql,perl,sed,solaris

With GNU sed: sed -n '1,/-- Final view structure for view `view_oss_user`/p' prints the lines from line 1 through the first line matching the pattern; nothing else is printed. If you want to exclude the pattern line itself, use sed -n '1,/-- Final view structure for view `view_oss_user`/p' | sed '$d' ...

Resampling and merging data frame with python

python,csv,pandas,resampling,merging-data

You can concat the two DataFrames, interpolate, then reindex on the DataFrame you want. I assume we have a certain number of DataFrames, where the Date is a DatetimeIndex in all of them. I will use two in this example, since you used two in the question, but the code...

Read CSV and plot colored line graph

python,csv,matplotlib,graph,plot

You need to turn x and y into type np.array before you calculate above_threshold and below_threshold, and then it works. In your version, you don't get an array of bools, but just False and True. I added comma delimiters to your input csv file to make it work (I assume...

type conversion performance optimizable?

c#,xml,csv,optimization,type-conversion

IEnumerable<string> values = new List<string>(); values = … Probably not going to be a big deal, but why create a new List<string>() just to throw it away? Replace this with either: IEnumerable<string> values; values = … if you need values defined in a previous scope, or else just: IEnumerable<string> values...

Python CSV reader/writer handling quotes: How can I wrap row fields in quotes? (Getting triple quotes as output)

python,csv

quotechar only indicates what character the writer should use for quoting. It's quoting=csv.QUOTE_ALL that you need. Create your writer like this: a = csv.writer(fp, quoting=csv.QUOTE_ALL) quoting defaults to csv.QUOTE_MINIMAL, meaning that it will only quote fields if they contain the delimiter, which is why it's only quoting "JOHNSON, JOHN J."....

Panda's Write CSV - Append vs. Write

python,csv,pandas

Not sure there is a way in pandas but checking if the file exists would be a simple approach: import os # if file does not exist write header if not os.path.isfile('filename.csv'): df.to_csv('filename.csv',header ='column_names') else: # else it exists so append without writing the header df.to_csv('filename.csv',mode = 'a',header=False) ...

Adding time/duration from CSV file

python,python-2.7,csv,datetime

Use datetime.timedelta() objects to model the durations, and pass in the 3 components as seconds, minutes and hours. Parse your file with the csv module; no point in re-inventing the character-separated-values-parsing wheel here. Use a dictionary to track In and Out values per user; using a collections.defaultdict() object will make...

Parse text from a .txt file using csv module

python,python-2.7,parsing,csv

How about using Regular Expression def get_info(string_to_search): res_dict = {} import re find_type = re.compile("Type:[\s]*[\w]*") res = find_type.search(string_to_search) res_dict["Type"] = res.group(0).split(":")[1].strip() find_Status = re.compile("Status:[\s]*[\w]*") res = find_Status.search(string_to_search) res_dict["Status"] = res.group(0).split(":")[1].strip() find_date = re.compile("Date:[\s]*[/0-9]*") res = find_date.search(string_to_search) res_dict["Date"] = res.group(0).split(":")[1].strip() res_dict["description"] =...

Regex in Perl Uninitialized $1

regex,perl

$1 is the value captured by the first capture (()), but you have no captures in your pattern. Fix: /(?<=File `..\/)(.*)(?=')/ Simplified: m{File `../(.*)'} More robust: m{File `../([^']*)'} ...
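A minimal sketch of the capture-group point, using a hypothetical message in the shape those patterns expect. The dots are escaped here so that they match literal dots only, which is a small departure from the patterns quoted above.

```perl
use strict;
use warnings;

# Hypothetical message in the shape the patterns above expect.
my $message = "File `../lib/Example.pm' not found";

# The parentheses form capture group 1, so $1 is set on a successful match.
my $path;
if ( $message =~ m{File `\.\./([^']*)'} ) {
    $path = $1;
    print "$path\n";
}
```

Reading $1 only inside the if block avoids using a stale or undefined value when the match fails.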

How to rearrange CSV / JSON keys columns? (Javascript)

javascript,json,csv,papaparse

Papa Parse lets you specify the order of fields in the unparse() function: var csv = Papa.unparse({ fields: ["ID", "OrderNumber", "OrderStatus", "StartTime", "FinishTime", "canOp", "OpDesc", "UOM"], data: [{ OrderStatus: "Good", canOp: "True", OpDesc: "Good to go", ID: "100", OrderNumber: "1000101", FinishTime: "20:50", UOM: "K", StartTime: "18:10" }, // ... ] });...

Perl Debugging Using Flags

perl,debugging,script-debugging

In perl, compile time is also run time. So there's really not a great deal of advantage in using #define type statements. My usual trick is: my $debug = 0; $debug += scalar grep { $_ eq '-d' } @ARGV; (Getopt::Long is probably honestly a better plan though) And then use: print...
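For the option-parsing route, a minimal sketch with the core Getopt::Long module might look like this. The argument list and the option name d|debug are assumptions chosen for illustration; a real script would normally call GetOptions on @ARGV instead.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument list standing in for @ARGV.
my @args = ( '-d', 'input.txt' );

my $debug = 0;
GetOptionsFromArray( \@args, 'd|debug' => \$debug )
    or die "Usage: $0 [-d] file\n";

# Recognised options are removed, leaving only the remaining arguments.
print STDERR "debug: remaining arguments are @args\n" if $debug;
```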

Create unicode character with pack

perl,unicode

Why do I get ff and not c3bf when using pack ? This is because pack creates a character string, not a byte string. > perl -MDevel::Peek -e 'Dump(pack("U", 0xff));' SV = PV(0x13a6d18) at 0x13d2ce8 REFCNT = 1 FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8) PV = 0xa6d298 "\303\277"\0 [UTF8 "\x{ff}"] CUR =...
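The difference between the character string and its UTF-8 bytes can be sketched with the core Encode module. This is only an illustration of the distinction described above, not the code from the original answer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# pack "U" yields a one-character string holding code point U+00FF.
my $char = pack( 'U', 0xff );

# Encoding that character as UTF-8 yields the two bytes 0xC3 0xBF.
my $bytes = encode( 'UTF-8', $char );

my $char_hex = sprintf '%vx', $char;    # code point of the single character
my $byte_hex = unpack 'H*', $bytes;     # hex dump of the encoded bytes

printf "character: %s (length %d)\n", $char_hex, length $char;
printf "bytes:     %s (length %d)\n", $byte_hex, length $bytes;
```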

-M Script start time minus file modification time, in days

perl,perldoc

I think this explains what you are seeing perl -E 'say "START TIME",$^T; qx(touch $_), sleep(5), say -M for "/tmp/file"; say "STAT ON FILE", (stat(_))[9]' output when I ran it START TIME1434460114 0 STAT ON FILE1434460114 1) script starts $^T is set to 1434460114 2) almost immediately the file "/tmp/file"...

Convert strings of data to “Data” objects in R [duplicate]

r,date,csv

If you read on the R help page for as.Date by typing ?as.Date you will see there is a default format assumed if you do not specify. So to specify for your data you would do nmmaps$date <- as.Date(nmmaps$date, format="%m/%d/%Y") ...

Create an external Hive table from an existing external table

csv,hadoop,hive

I am presuming you want to select distinct data from "uncleaned" table and insert into "cleaned" table. CREATE EXTERNAL TABLE `uncleaned`( `a` int, `b` string, `c` string, `d` string, `e` bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/external/uncleaned' create another...

How can I use a variable to get an Input$ in Shiny?

r,variables,csv,shiny

input is just a reactivevalues object so you can use [[: print(input[[a]]) ...

Opening multiple files in perl array

arrays,perl

You just need to enclose the code that handles a single file in a loop that iterates over all of the log files. You should also reconsider the amount of comments that you use. It is far better to write your code and choose identifiers so that the behaviour is...

How Do I Transform This CSV / Tabular Data Into A Different Shape?

excel,csv

Assuming your events are in different columns, something like this would work - Sub test() Dim wsOrig As Worksheet Set wsOrig = ActiveSheet Dim wsDest As Worksheet Set wsDest = Sheets("Sheet2") Dim lcol As Integer lcol = Cells(1, Columns.Count).End(xlToLeft).Column Dim lrow As Integer lrow = Cells(Rows.Count, "A").End(xlUp).row Dim x As...

How to pass a hash as optional argument to -M in command line

perl,hash,package,command-line-interface

Looking at perlrun use: perl -Mfeature=say "-Mconstant {c1 => 'foo', c2 => 'bar'}" -e"say c1,c2" ...

Replace improper commas in CSV file

regex,r,csv

If you need the comments, you still can replace the 6th comma with a semicolon and use your previous solution: gsub("((?:[^,]*,){5}[^,]*),", "\\1;", vec1, perl=TRUE) Regex explanation: ((?:[^,]*,){5}[^,]*) - a capturing group that we will refer to as Group 1 with \\1 in the replacement pattern, matching (?:[^,]*,){5} - 5 sequences...

Version-dependent fallback code

perl

Many recent features are not forward-compatible. As you've seen, you'll get compile-time errors using a feature that is too new for the version of perl you are running. Using block-eval won't help, because the contents of the block need to be valid for the current perl interpreter. You are on...

Specific rows from CSV as dictionary and logic when keys are the same - Python

python,csv,dictionary

You can build the whole dictionary first, with the lists containing all the values for each key. Then once the dictionary is made, you can go through every key and take the largest and smallest values. yourdict = dict() with open(file) as f: filedata = f.read().splitlines() for line in filedata:...

How to stop foreach loop from printing duplicate data?

php,csv,foreach

As your foreach is printing duplicates, it means that your array $csv_tbl contains duplicate values. You can remove duplicate values from an array using array_unique. But looking at the screenshots, I can see that the callid is different for every record. Does your csv contain duplicates? $k =1;...

BASH - conditional sum of columns and rows in csv file

linux,bash,csv,awk

This awk program will print the modified header and modify the output to contain the sums and their division: awk 'BEGIN {FS=OFS=";"} (NR==1) {$10="results/time"; print $0} (NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0} END {for (i in sum8) {$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}' which gives: Date;dbms;type;description;W;D;S;results;time;results/time Mon Jun 15 14:22:20 CEST...

Exporting Data from Cassandra to CSV file

apache,csv,cassandra,export,export-to-csv

The syntax of your original COPY command is also fine. The problem is with your column named timestamp, which is a data type and is a reserved word in this context. For this reason you need to escape your column name as follows: COPY product (uid, productcount, term, "timestamp") TO...

Is it possible to output to a csv file with multiple sheets?

java,excel,csv

A CSV file is interpreted as a sequence of characters that conforms to a standard format, therefore it cannot contain more than one sheet. You can output your data to an Excel file that contains more than one sheet using the Apache POI API.

Convert delimited string to array and group using LINQ in C#

c#,arrays,linq,csv

public static void Main() { var input = @"**Albert School**: George Branson, Eric Towson, Nancy Vanderburg; **Hallowed Halls**: Ann Crabtree, Jane Goodall, Rick Grey, Tammy Hudson; **XXX University**: Rick Anderson, Martha Zander;"; var universities = input .Split(';') .Select(ParseUniversity) .ToArray(); } public static University ParseUniversity(string line) { var split = line...

Perl: Using Text::CSV to print AoH

arrays,perl,csv

Pretty fundamentally - CSV is an array based data structure - it's a vaguely enhanced version of join. But the thing you need for this job is print_hr from Text::CSV. First you need to set your header order: $csv->column_names (@names); # Set column names for getline_hr () Then you can...

Saying there are 0 arguments when I have 2? Trying to open a CSV file to write into

ruby,file,csv,dir

I believe the problem is with Dir.foreach, not CSV.open. You need to supply a directory to foreach as an argument. That's why you are getting the missing argument error. Try: Dir.foreach('/path/to_my/directory') do |current_file| I think the open that is referenced in the error message is when Dir is trying to...

Why Filter::Indent::HereDoc complain when blank line in middle of HereDoc

perl,heredoc

The documentation says: If there is no terminator string (so the here document stops at the first blank line), then enough whitespace will be stripped out so that the leftmost character of the document will be flush with the left margin, e.g. print <<; Hello, World! # This will print:...

CSV File header part is coming parsing by LINQ

c#,linq,csv

You seem to be doing the Skip(1) in the wrong place: var csvLinesData = csvlines.Skip(1).Select(l => l.Split(',').ToArray()); // IEnumerable<string[]> As it stands you're skipping the first column for each row, not the first row....

Python: isolating re.search results

python,regex,csv

The problem here is that re.search returns a match object, not the matched string, and you need to use the group method to access your desired result. If you want all the captured groups you can use the groups method, and for a specific group you can pass the number of expected...

Why this exclusion not working for long sentences?

text-processing,perl

You're using the wrong syntax: [] is used to match a character class, while here you're trying to match a number of occurrences of ., which can be done using {}: perl -ne 'print unless /.{240,}/' input.txt > output.txt Also, as suggested by salva in the comments, the pattern can...
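The same filter can be sketched as a small script. The three sample lines are hypothetical and chosen to sit on either side of the 240-character limit.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one short, one of exactly 240 characters, one of 239.
my @lines = ( 'short line', 'x' x 240, 'y' x 239 );

# .{240,} matches any run of 240 or more characters, so those lines are dropped.
my @kept = grep { !/.{240,}/ } @lines;

printf "%d of %d lines kept\n", scalar @kept, scalar @lines;
```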

What does this horribly ugly (yet somehow secretly beautiful) Perl code do?

perl,formatting,deobfuscation

Passing the code to Google returns this page that explains qrpff is a Perl script created by Keith Winstein and Marc Horowitz of the MIT SIPB. It performs DeCSS in six or seven lines. The name itself is an encoding of "decss" in rot-13. See also qrpff explained....

Compare 2 csv files and output different rows to a 3rd CSV file using Python 2.7

python,python-2.7,csv,compare

You're currently checking for rows that exist in the old file but aren't in the new file. That's not what you want to do. Instead, you should check for rows that exist in the new file, but aren't in the old one: output = [row for row in newList2...

Entity framework to CSV float values

c#,entity-framework,csv

Probably in your case the simplest way to do it is to change the current thread culture then restore it. Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US"); You can also specify a format provider in (prop.GetValue(entityObject, null) ?? "?").ToString() but in this case you probably need to check if prop.GetValue(entityObject, null) is IFormattable...

Compare 2 seperate csv files and write difference to a new csv file - Python 2.7

python,python-2.7,csv,compare

What do you mean by difference? The answer to that gives you two distinct possibilities. If a row is considered same when all columns are same, then you can get your answer via the following code: import csv f1 = open ("olddata/file1.csv") oldFile1 = csv.reader(f1) oldList1 = [] for row...

unable to understand qr interpolation

regex,perl

The chapter of the Perl documentation that deals with this is called perlre. In the extended pattern matching section it explains this. Starting in Perl 5.14, a "^" (caret or circumflex accent) immediately after the "?" is a shorthand equivalent to d-imsx . Flags (except "d" ) may follow the...
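A short sketch of where the caret shows up in practice, assuming Perl 5.14 or later: a compiled qr// object stringifies with the (?^...) prefix, and it keeps its own flags when interpolated into a larger pattern. The variable names are illustrative only.

```perl
use strict;
use warnings;

# A compiled pattern with the /i flag.
my $inner = qr/abc/i;

# Stringifying it shows the (?^i:...) form on Perl 5.14 and later.
my $as_text = "$inner";
print "$as_text\n";

# When interpolated, the inner pattern stays case-insensitive
# while the surrounding x and y remain case-sensitive.
my $outer = qr/x${inner}y/;
print "xABCy matches\n"        if 'xABCy' =~ $outer;
print "XabcY does not match\n" if 'XabcY' !~ $outer;
```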

Group instances based on NA values in r

r,file,csv,instance,na

df[!is.na(df$Value), ] Size Value Location Num1 Num2 Rent 1 800 900 <NA> 2 2 y 3 1100 1300 uptown 3 3 n 4 1200 1100 <NA> 2 1 y And df[is.na(df$Value), ] Size Value Location Num1 Num2 Rent 2 850 NA midcity NA 3 y 5 1000 NA Lakeview NA...

Check for decimal point and add it at the end if its not there using awk/perl

regex,perl,shell,awk

In awk Just sub for those fields awk -F, -vOFS="," '{sub(/^[^\.]+$/,"&.",$6);sub(/^[^\.]+$/,"&.",$11)}1' file or sed sed 's/^\(\([^,]*,\)\{5\}[^.,]\+\),/\1./;s/^\(\([^,]*,\)\{10\}[^.,]\+\),/\1./' file ...
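Since the question also mentions Perl, here is a rough Perl counterpart to the awk command, assuming the same two columns (the 6th and 11th) and plain comma-separated input without quoted fields. The sub name and the sample row are hypothetical.

```perl
use strict;
use warnings;

# Append a trailing "." to the 6th and 11th comma-separated fields
# when they contain no decimal point yet.
sub add_decimal_point {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    for my $i ( 5, 10 ) {
        next unless defined $fields[$i];
        $fields[$i] .= '.' if $fields[$i] =~ /\A[^.]+\z/;
    }
    return join ',', @fields;
}

# Hypothetical sample row with eleven fields.
print add_decimal_point('a,b,c,d,e,12,g,h,i,j,3.5'), "\n";
```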

Capture tee's argument inside piped Perl execution

perl,unix

The short answer is - you can't. tee is a separate process with its own arguments. There is no way to access these arguments from that process. (well, I suppose you could run ps or something). The point of tee is to take STDOUT write some of it to a...

Find numbers in a file and change their value with perl

regex,perl

You can just match the numbers that are less than one and replace them with 1: perl -pe 's/\b0\.\d+/1/g' file See DEMO...
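To see what the substitution does and does not touch, here is a small sketch on a hypothetical input string. Note that 10.7 is left alone because there is no word boundary between the 1 and the 0.

```perl
use strict;
use warnings;

# Hypothetical input with values below and above one.
my $text = 'a=0.5 b=2.3 c=10.7 d=0.25';

# Copy the string, then replace every number of the form 0.xxx with 1.
( my $fixed = $text ) =~ s/\b0\.\d+/1/g;

print "$fixed\n";
```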

How to pivot array into another array in Ruby

arrays,ruby,csv

Here is a way using an intermediate hash-of-hash The h ends up looking like this {"Alaska"=>{"Rain"=>"3", "Snow"=>"4"}, "Alabama"=>{"Snow"=>"2", "Hail"=>"1"}} myArray = [["Alaska","Rain","3"],["Alaska","Snow","4"],["Alabama","Snow","2"],["Alabama","Hail","1"]] myFields = ["Snow","Rain","Hail"] h = Hash.new{|h, k| h[k] = {}} myArray.each{|i, j, k| h[i][j] = k } p [["State"] + myFields] + h.map{|k, v| [k] + v.values_at(*myFields)} output...

Perl : Display perl variable awk sed echo

perl

You can do that using a Perl regex pattern my $calculate; ($calculate = $1) =~ tr~,~/~ if $value =~ /SP=[^:]*:([^;]*)/; ...
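A self-contained sketch of that statement at work. The layout of $value is a hypothetical guess at the input, since the original data is not shown here.

```perl
use strict;
use warnings;

# Hypothetical input: an SP= entry whose value after the colon is comma-separated.
my $value = 'ID=7;SP=abc:1,2,3;END';

my $calculate;

# If the pattern matches, copy the capture and translate commas to slashes.
( $calculate = $1 ) =~ tr~,~/~ if $value =~ /SP=[^:]*:([^;]*)/;

print "$calculate\n" if defined $calculate;
```

The tilde is used as the tr delimiter so that the slash in the replacement list needs no escaping.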

Looping variables

perl,scripting

As per my comment, most of the functionality of your program is provided by the Math::Vector::Real module that you're already using. It looks like you want the angle in degrees between successive pairs of 3D vectors in your file. This code creates vectors from each line in the file until...

Command line arguments in Perl

perl

$! and $_ are global variables. For more information you can read here $_ The default input and pattern-searching space $! If used in a numeric context, yields the current value of the errno variable, identifying the last system call error. If used in a string context, yields the corresponding...
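A brief sketch of both variables in use. The path is hypothetical and assumed not to exist, so the open is expected to fail and set $!.

```perl
use strict;
use warnings;

# $_ is the implicit topic: map sets it, and length with no argument reads it.
my @lengths = map { length } ( 'perl', 'csv' );
print "@lengths\n";

# $! reflects the last failed system call.
my $path = '/no/such/dir/example.txt';    # hypothetical, assumed missing
my ( $errno, $errstr ) = ( 0, '' );
unless ( open my $fh, '<', $path ) {
    $errno  = $! + 0;    # numeric context: the errno value
    $errstr = "$!";      # string context: the corresponding message
    print "open failed ($errno): $errstr\n";
}
```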

export mysql query array using fputcsv

php,mysql,arrays,csv,fputcsv

According to the official manual for mysqli_fetch_array: mysqli_fetch_array - Fetch a result row as an associative, a numeric array, or both. You coded the MYSQLI_ASSOC flag, so you get an associative array for one row of data: By using the MYSQLI_ASSOC constant this function will behave identically to the mysqli_fetch_assoc() See examples of...

calling cgi script from other cgi script

perl,cgi

A CGI program normally expects to fetch the parameter values from the environment variable QUERY_STRING. Passing parameter values on the command line is a debugging facility, and works only when the program is run from the command prompt. You could try something like this my $result = do { local...

Run 3 variables at once in a python for loop.

python,loops,variables,csv,for-loop

zip the lists and use a for loop: def downloadData(n,i,d): for name, id, data in zip(n,i,d): URL = "http://www.website.com/data/{}".format(name) #downloads the file from the website. The last part of the URL is the name r = requests.get(URL) with open("data/{}_{}_{}.csv".format(name, id, data), "wb") as code: #create the file in the format...