Parsing a big textfile and saving information in a table

regex,vba,ms-access,text-parsing

As I mentioned in my comments to the question, I would recommend parsing the original text file and writing out a temporary CSV file like this: 94172,,,253.25,"BMW_2",230 94173,20.000000,85.000000,1.000000,"CrCtl_StM",230 94174,20.000000,85.000000,1.000000,"CrCtl_StM",230 ... and then importing the CSV file using the VBA DoCmd.TransferText method. Using a Recordset to perform the inserts (as suggested...

What is the cleanest way to strip leading 0s from the day-of-month portion of dates?

c#,datetime,text-parsing,date-math,date-manipulation

Use a custom format. (MMMM d, yyyy) String lds = computedDate.ToString("MMMM d, yyyy", CultureInfo.InvariantCulture); single d would give you a single or double digit day part. If the day part is below 10, then you will get only a single digit instead of leading 0, for others you will get...

Cutting down on Stanford parser's time-to-parse by pruning the sentence

parsing,nlp,text-processing,text-parsing,text-analysis

You asked for 'creative' approaches - the Cell Closure pruning method might be worth a look. See the series of publications by Brian Roark, Kristy Hollingshead, and Nathan Bodenstab. Papers: 1 2 3. The basic intuition is: Each cell in the CYK parse chart 'covers' a certain span (e.g. the...

Trouble formulating a regular expression for use with sed to extract column values

linux,shell,sed,text-parsing

The general caveat applies: awk is the better tool for the job. Here's a simpler sed solution: ls -la | sed -E 's/^(([^[:space:]]+)[[:space:]]+){5}.*/\2/' works with both spaces and tabs between columns takes advantage of repeating capture groups only reporting the last captured instance - in this case, the 5th column...
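
For comparison, the same column extraction can be sketched in Python. This is only an illustration, not part of the original answer: the function name fifth_column and the sample line are hypothetical, and the sketch assumes columns are separated by runs of spaces or tabs.

```python
def fifth_column(line):
    """Return the 5th whitespace-separated field, or None if the line has fewer."""
    fields = line.split()  # with no argument, split() treats any run of spaces/tabs as one separator
    return fields[4] if len(fields) >= 5 else None


# Hypothetical line in the style of `ls -la` output, mixing spaces and a tab.
sample = "-rw-r--r--  1 alice\tstaff  4096 Jan  1 12:00 notes.txt"
print(fifth_column(sample))
```

Unlike the sed command, which prints non-matching lines unchanged, this sketch returns None for lines with fewer than five fields.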

Text parsing and formatting with JavaScript

javascript,jquery,text,text-parsing,text-formatting

var s = "A #i#sample#/i# %%text #b#with%% text format#/b#", s1 = "%%#b#A#/b#%%|%%#b#B#/b#%% will be good."; s.replace(/#(.*?)#/g, "<$1>"); //"A <i>sample</i> %%text <b>with%% text format</b>" s1.replace(/#(.*?)#/g, "<$1>"); //"%%<b>A</b>%%|%%<b>B</b>%% will be good." You can use capture groups to replace the # with appropriate brackets < and >....

How to add missing characters to a line of text in a file using some scripting on UNIX?

bash,shell,unix,awk,text-parsing

You could use the following awk code: awk -F"," 'NF < 50 {printf $0; for(i = NF; i < 50; ++i) printf ","; printf "\n" }' file produces 9,15040113501460,0,b1 0035569144,91 302317960883,0,15040113501460,132,15040614170560,N,0,0,0,0,0,0,0,0,0,0,8,0,0000000000000000,0,0,2,,27,b1 003st69144,1,,,,,,,,,,,,,,,,,,,, 9,15350114601560,0,b1 0033765345,91 304294596921,0,15040113501560,132,15040610170260,N,0,0,0,0,0,0,0,0,0,0,8,0,0000000000000000,0,0,2,,27,b1 0031r69144,1,,,,,,,,,,,,,,,,,,,, here, NF is the number of fields in each line, F is the...
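
The padding idea can also be sketched in Python. The helper name pad_fields and its parameters are assumptions made for illustration; the target of 50 fields is taken from the awk command above.

```python
def pad_fields(line, width=50, sep=","):
    """Append separators until the line has `width` fields; longer lines are returned unchanged."""
    missing = width - len(line.split(sep))
    return line + sep * missing if missing > 0 else line


# Small demonstration with a width of 5 instead of 50.
print(pad_fields("a,b", width=5))
```

Note one difference: the awk command only prints lines with fewer than 50 fields, whereas this sketch also passes complete lines through unchanged.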

I need to read data from a file, but unable to read the whole values from it

c,file-io,text-parsing

Since you're not consuming the whole line with your fscanf, the remaining values trip up your scan. Either use fgets to read the whole line and then scan it with sscanf, or add %*[^\n] to the end of your fscanf (which reads characters until it encounters a newline character, and...

How can I read all of the paragraphs of a text into a list?

java,text-processing,text-parsing

The issue is in this block: Paragraph paragraph = new Paragraph(); List<String> strings = sentences; // <-- !!!!! paragraph.setSentences(strings); paragraphs.add(paragraph); sentences.clear(); You use the same object that sentences points to for all your paragraphs, so in the end all your Paragraph objects will point to the same List<String>. Thus, any...
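
The same aliasing effect can be reproduced in a few lines of Python, which may make it easier to see. This is an analogy only: the variable names are hypothetical, and copying the list is one common remedy rather than necessarily the fix the original answer goes on to propose.

```python
# Buggy pattern: every "paragraph" stores a reference to the same list object.
sentences = ["First sentence.", "Second sentence."]
shared_paragraphs = []
shared_paragraphs.append(sentences)        # stores a reference, not a copy
sentences.clear()                          # also empties what the paragraph holds

# One possible remedy: give each paragraph its own copy before clearing.
sentences = ["First sentence.", "Second sentence."]
copied_paragraphs = []
copied_paragraphs.append(list(sentences))  # independent copy
sentences.clear()

print(shared_paragraphs)
print(copied_paragraphs)
```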

Lexical analyzer with decimal numbers

java,regex,text-parsing,lexical-analysis

I don't think you are exiting the first loop until you have exhausted all input. c is never updated (except in the else if, which is never called). At the end of your program c is still equal to '4'. As a general path to a solution, you should...

parsing tab separated header of a file in unix

unix,awk,text-parsing

If I understand correctly, you are asking how to loop over the fields from 1 to NF. Here is an example of such a loop: $ head -1 file | awk -F"\t" '{for (i=1;i<=NF;i++)printf "%s ",$i; print"";}' abc ttr nnc r32 inc ...

pattern.compile help java program [closed]

java,text-parsing,matcher

Finally, I got this working (thanks for the amazing input). Below are the changes I made: public void findLines(String aFileName) { List<Integer> a = new ArrayList<Integer>(); List<Integer> b = new ArrayList<Integer>(); Pattern regexp = Pattern.compile("(if|else|while).*"); Matcher exp1 = regexp.matcher("if|else|while"); Path path = Paths.get(aFileName); try ( BufferedReader reader = Files.newBufferedReader(path,...

Reading text file for specific keyword included inside brackets '{'

python,regex,text-parsing,brackets,openfoam

The answer depends on whether you want to support the full, general OpenFOAM dictionary format or not. If you only need to support a format similar to what you have shown in the question, then a simple regex like \b(\w+)\s+{\s+type\s+(\w+); would do: https://regex101.com/r/yV8tK2/1 . This can be your option if you...
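
A minimal sketch of applying that regex in Python follows. The sample text is hypothetical (the question's actual input is not shown here), the name PATCH_TYPE is made up, and the brace is escaped for clarity.

```python
import re

# The pattern from the answer, with the literal brace escaped.
PATCH_TYPE = re.compile(r"\b(\w+)\s+\{\s+type\s+(\w+);")

# Hypothetical input in the style of an OpenFOAM boundary dictionary.
sample = """
inlet
{
    type            fixedValue;
    value           uniform (1 0 0);
}
outlet
{
    type            zeroGradient;
}
"""

# Each match is a (block name, type) pair.
print(PATCH_TYPE.findall(sample))
```

As the answer notes, this only covers simple input; it requires the type entry to be the first one inside the braces.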

String parsing in C. Copying a part of a string to another string

c,string,text-parsing,arrays

I agree with Anonymouse. It is both clumsy and inefficient to first replace all \n, then all \t. Instead, make a single pass through the string, replacing all escape characters as you go. I left the space allocation out in the code sample below; IMHO this is a separate responsibility,...

Extract substrings from a file and store them in shell variables

linux,shell,awk,sed,text-parsing

Try this if you're using bash: $ declare $(awk '{print $2"="$4}' file) $ echo "$parent" 192.168.1.2 If the file contained white space in the values you want to init the variables with then you'd just have to set IFS to a newline before invoking declare, e.g. (simplified the input file...

javascript extract parts of a sentence

javascript,jquery,coffeescript,text-parsing,string-parsing

I think this is the code you are looking for. See the comment in the code for some more information about how it works. var sentence = "Show me a gift of a car"; // specify commands var commands = { gif: { pattern: /gift of a (.*).?/, action: call...

How do I access individual nodes in the dependency tree and constituency tree returned by the Stanford Parser?

c#,stanford-nlp,text-parsing

Yes, there exist plenty of data structures to work with constituency trees and dependency trees. For constituency trees, you want to work with the Tree data structure which has many useful built-in functions to traverse trees, get all the terminal nodes, etc. For dependency trees you can either work with...

Reading data from a Text File into Matlab array

matlab,parsing,matlab-figure,text-parsing,string-parsing

Here is what I would do. First thing, delete the "(0)" types of lines from your text file (could even use a simple shell script for that). This I put into the file called post2.txt. # First, load the text file into Matlab: A = load('post2.txt'); # Create the imaginary...

Parse a text file with columns aligned with white spaces

python,text-parsing,string-parsing

You can use re.split(), and use \s{2,} as a delimiter pattern: >>> l = ["Blah blah, blah bao 123456 ", ... "hello, hello, hello miao 299292929 "] >>> for item in l: ... re.split('\s{2,}', item.strip()) ... ['Blah blah, blah', 'bao', '123456'] ['hello, hello, hello', 'miao', '299292929'] \s{2,} matches 2 or...
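
The same approach as a self-contained Python 3 sketch, using a raw string for the pattern. The function name split_columns is hypothetical, and the sample rows assume that columns are separated by at least two spaces, since the original spacing is not preserved in this excerpt.

```python
import re


def split_columns(line):
    """Split a line into columns wherever two or more whitespace characters occur."""
    return re.split(r"\s{2,}", line.strip())


rows = [
    "Blah blah, blah    bao    123456   ",
    "hello, hello, hello    miao    299292929   ",
]
for row in rows:
    print(split_columns(row))
```

Single spaces inside a column (as in "Blah blah, blah") are preserved because the delimiter requires at least two whitespace characters.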

What is CoNLL data format?

nlp,text-parsing,text-mining,information-extraction

There are many different CoNLL formats since CoNLL is a different shared task each year. The format for CoNLL 2009 is described here. Each line represents a single word with a series of tab-separated fields. Underscores (_) indicate empty values. Mate-Parser's manual says that it uses the first 12 columns of...
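
Reading one such line can be sketched in Python as follows. The function name and the sample line are hypothetical, and the sketch deliberately does not name the columns, since their meaning depends on which CoNLL format is in use.

```python
def parse_conll_line(line):
    """Split a tab-separated CoNLL-style line, mapping the empty marker "_" to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if field == "_" else field for field in fields]


# Hypothetical five-column line; real CoNLL files have more columns.
sample = "1\tThe\tthe\t_\tDT\n"
print(parse_conll_line(sample))
```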

Regex to validate macros in text

regex,parsing,macros,escaping,text-parsing

If your language supports the PCRE verbs (*SKIP)(*F), this is what I used: \(\*(?:(?!\(\*|\*\)).)*\*\)(*SKIP)(*F)|(?:\(\*|\*\)) DEMO That is, use this \(\*(?:(?!\(\*|\*\)).)*\*\)(*SKIP)(*F)|\(\* regex and replace the matched characters with <, then use this regex \(\*(?:(?!\(\*|\*\)).)*\*\)(*SKIP)(*F)|\*\) against the modified string and replace the matched characters with >. I love the variable length lookbehind feature in...

Removing brackets and quotes from print in Python 2.7

python-2.7,readfile,text-parsing

re.findall returns a list. When you do str(someList), it will print the brackets and commas: >>> l = ["a", "b", "c"] >>> print str(l) ['a', 'b', 'c'] If you want to print without [ and ,, use join: >>> print ' '.join(l) a b c If you want to print...

how to make a text parsing function efficient in R

r,text-processing,text-parsing

I recently re-read your question, the comments, and @wibeasley's answer, and realized that I hadn't understood everything correctly. Now it has become clearer, and I'll try to suggest something useful. First of all, we need a small example to work with. I've made one from the dictionary in your link. dictdf...

Why does my regex not work inside string.match()?

javascript,regex,text-parsing

The leading (?s) is an invalid group in a JavaScript regex. The browser console surely produced an error when you tried that. You don't need to escape < or >, so this works: var re = /<strong>.+<\/strong>/; Note that if you have several <strong> elements in a run of text,...

batch parsing multiple registry dumps to csv

batch-file,python-3.x,text-parsing

@ECHO Off SETLOCAL ENABLEDELAYEDEXPANSION :: delete output file DEL "newfile.txt" >NUL 2>nul :: remove variables starting $ or # :: remove variables starting $ FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a=" CALL :zap# :: Set column headers SET /a columncount=0 FOR /f "tokens=1*delims==" %%a IN (q27905448.ct8)...

Why do these two methods of counting words differ significantly?

c,regex,text,fscanf,text-parsing

You are incorrect when you say that the command ... $ grep -o -w 'Mars' TripToMars.txt | wc -w ... "finds only instances of 'Mars' as a standalone word", or at least that statement is misleading in context. The command finds instances of "Mars" that are not part of a...
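
The word-boundary behaviour can be approximated in Python, where \b plays a role similar to grep's -w. The sample text and function name below are hypothetical; the point is that punctuation such as a hyphen or apostrophe still counts as a boundary.

```python
import re


def count_standalone(word, text):
    r"""Count occurrences of `word` delimited by word boundaries (\b)."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


# "Mars-bound" and "Mars's" still match; "Marsian" does not.
text = "Mars, Mars-bound, Marsian, Mars's"
print(count_standalone("Mars", text))
```

A whitespace-based word count would treat "Mars-bound" or "Mars's" as a different word, which is one way two counting methods can disagree.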

Parsing structured text data

text-parsing

This looks like the output from the PHP serialize function (you need to unserialize it): http://php.net/manual/en/function.serialize.php If you are working in python, there is a port of the serialize and unserialize functions here: https://pypi.python.org/pypi/phpserialize Anatomy of a serialize()'ed value: String s:size:value; Integer i:value; Boolean b:value; (does not store "true" or...
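
To make the anatomy concrete, here is a minimal hand-rolled sketch that decodes only the three scalar forms listed above. It is not the phpserialize library's API; the function name decode_scalar is made up for illustration, and arrays, objects, and other types are deliberately not handled.

```python
import re


def decode_scalar(token):
    """Decode a single serialized PHP scalar: i:value; b:value; or s:size:"value";"""
    m = re.fullmatch(r"i:(-?\d+);", token)
    if m:
        return int(m.group(1))
    m = re.fullmatch(r"b:([01]);", token)
    if m:
        return m.group(1) == "1"  # booleans are stored as 1 or 0
    m = re.fullmatch(r's:(\d+):"(.*)";', token, re.DOTALL)
    if m:
        size, value = int(m.group(1)), m.group(2)
        # The size is a byte count; this sketch assumes UTF-8 encoded text.
        if len(value.encode("utf-8")) != size:
            raise ValueError("declared size does not match the string length")
        return value
    raise ValueError("unsupported or malformed token: %r" % token)


print(decode_scalar('s:5:"hello";'), decode_scalar("i:42;"), decode_scalar("b:1;"))
```

For real data, a maintained implementation such as the phpserialize package mentioned above is the safer choice.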

Batch: How do I parse a string containing a filesystem path?

batch-file,path,text-parsing

Try to apply path modifiers as follows: set "inputPath=C:\Users\SomeUser\Desktop\SomeFolder\File.jar" for %%i in ("%inputPath%") do set fname=%%~nxi echo %fname% %%~nx<loop-var> extracts the filename root (n) and filename extension (x) from the loop variable; i.e., it extracts the last/filename component from the loop variable's value. (%%i was chosen as the loop variable...
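
For readers more familiar with Python, the same decomposition can be sketched with pathlib. This is a cross-language illustration rather than part of the batch solution; PureWindowsPath parses Windows-style paths on any operating system.

```python
from pathlib import PureWindowsPath

input_path = PureWindowsPath(r"C:\Users\SomeUser\Desktop\SomeFolder\File.jar")

print(input_path.name)    # filename with extension, like %%~nxi
print(input_path.stem)    # filename root only, like %%~ni
print(input_path.suffix)  # extension only, like %%~xi
```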

Extract the 2nd columns from multiple tables and print them tab-delimited

sed,awk,multiple-columns,text-parsing

Use the following awk script: #!/usr/bin/awk BEGIN { nc=1; nr_ = 1; maxr = 1;} /^--$/ { if (maxr < nr_ ) maxr = nr_; nc++; nr_=1; next; } { col[nc, nr_++] = $2; } END { for(r = 1; r < maxr ; r++) { for(c = 1; c...

Perl - Reading Specific Lines from a CSV file

perl,csv,text-parsing

Unless your data includes quoted fields, like a,b,c,"complicated field, quoted",e,f,g there is no advantage in using Text::CSV over a simple split /,/. This example categorizes the data into a hash that you can access simply and directly. I have used Data::Dump only to show the contents of the resulting data...
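
The difference between naive splitting and a real CSV parser is easy to demonstrate. The sketch below uses Python's csv module in place of Perl's Text::CSV, purely as an illustration of the quoted-field case mentioned above.

```python
import csv

line = 'a,b,c,"complicated field, quoted",e,f,g'

naive = line.split(",")             # breaks the quoted field at its embedded comma
parsed = next(csv.reader([line]))   # honours the quotes and keeps the field whole

print(len(naive), naive)
print(len(parsed), parsed)
```

With unquoted data both approaches give the same result, which is the case the answer has in mind.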

ColdFusion extract values from text file

regex,coldfusion,extract,text-parsing,text-extraction

You don't want REMatch, you want REFind (docs): REFind(reg_expression, string [, start, returnsubexpressions ] ) returnsubexpressions is what you need, so... <cfset str = "request.config.MY_PARAM_NAME = 'The parameter VALUE!!';"> <cfset match = REFind("^request\.config\.(\S+) = (.*);", str, 1, "Yes")> <cfdump var="#match#"> match will be a Struct with two keys (POS and...

Match exact word in bash script, extract number from string

bash,shell,text-parsing

See whether this construct is helpful for your purpose: #!/bin/bash name="longname55445" echo "${name##*[A-Za-z]}" This assumes a letter adjacent to the number. The following is NOT another way to write the same, because it is wrong. Please see comments below by mklement0, who noticed this. Mea culpa. echo "${name##*[:letter:]}" ...
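
The effect of the first expansion, removing the longest prefix that ends in a letter, can be mirrored in Python with a greedy regex. The function name is hypothetical and the sketch is offered only as a cross-check of the behaviour.

```python
import re


def strip_through_last_letter(name):
    """Remove everything up to and including the last ASCII letter, like ${name##*[A-Za-z]}."""
    return re.sub(r"^.*[A-Za-z]", "", name, count=1)


print(strip_through_last_letter("longname55445"))
```

As with the bash expansion, a string containing no letters is returned unchanged.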

Removing duplicates from a file

java,text-parsing

Since what you really have is sets of two lines, not one, the matter is a little more complicated than simply reading the lines one by one and trimming duplicates. Here is a solution using Java 7: public static void eliminateDups(final String srcfile, final String dstfile) throws IOException {...

Basic Attoparsec Parsing returns only “Right []”

haskell,text-parsing

Your parser "successfully parses" a list of MapLines of length zero before failing at the first line. Remove that line (and make sure your file doesn't include any non-parsable bytes at the start like a BOM) and it should work. Or write a parser for lines starting with a #...

Parse text file in MATLAB

matlab,text-parsing

You can store it in a cell array like this: t{1} = [ 5 4 3 2 1]; t{2} = [ 10 9 8 7 6 5]; t{3} = [ 11 12 13 14]; and use them like this: >> t(1) ans = [1x5 double] >> t{2} ans = 10...

Counting the number of specific values in a column with awk

awk,text-parsing

The problem is that your target field has embedded double quotes, so you need to match them too, by including them - \-escaped - in the string to match against: awk ' BEGIN{FS=","; s_count=0; c_count=0} ($3=="\"s\"") {s_count++} ($3=="\"c\"") {c_count++} END{ print s_count; print c_count } ' data.csv As an aside,...
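
An alternative that sidesteps the quoting issue is to let a CSV parser strip the quotes before comparing. The sketch below uses Python's csv module; the function name and the sample rows are hypothetical, and it assumes the value of interest is in the third column as in the awk command.

```python
import csv
from collections import Counter


def count_third_column(lines):
    """Count the values found in the 3rd CSV column, with surrounding quotes removed."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) >= 3:
            counts[row[2]] += 1
    return counts


# Hypothetical rows; for a real file, pass an open file object instead.
sample = ['1,"x","s"', '2,"y","c"', '3,"z","s"']
counts = count_third_column(sample)
print(counts["s"], counts["c"])
```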