regex,vba,ms-access,text-parsing
As I mentioned in my comments to the question, I would recommend parsing the original text file and writing out a temporary CSV file like this: 94172,,,253.25,"BMW_2",230 94173,20.000000,85.000000,1.000000,"CrCtl_StM",230 94174,20.000000,85.000000,1.000000,"CrCtl_StM",230 ... and then importing the CSV file using the VBA DoCmd.TransferText method. Using a Recordset to perform the inserts (as suggested...
c#,datetime,text-parsing,date-math,date-manipulation
Use a custom format string (MMMM d, yyyy): String lds = computedDate.ToString("MMMM d, yyyy", CultureInfo.InvariantCulture); A single d gives you a one- or two-digit day part: if the day is below 10, you get a single digit with no leading 0; for others you will get...
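For comparison, here is a rough Python analogue of the same "MMMM d, yyyy" pattern. It is a translation of the idea, not the C# API. strftime has no portable unpadded-day directive, so this sketch takes the day from d.day directly. It assumes the default C locale, so that %B yields English month names.

```python
from datetime import date

def long_date(d: date) -> str:
    # %B is the full month name (English under the default "C" locale).
    # d.day is an int, so days below 10 get no leading zero, like .NET's "d".
    return f"{d:%B} {d.day}, {d.year}"

print(long_date(date(2015, 1, 5)))
```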
parsing,nlp,text-processing,text-parsing,text-analysis
You asked for 'creative' approaches - the Cell Closure pruning method might be worth a look. See the series of publications by Brian Roark, Kristy Hollingshead, and Nathan Bodenstab. Papers: 1 2 3. The basic intuition is: Each cell in the CYK parse chart 'covers' a certain span (e.g. the...
The general caveat applies: awk is the better tool for the job. Here's a simpler sed solution: ls -la | sed -E 's/^(([^[:space:]]+)[[:space:]]+){5}.*/\2/' It works with both spaces and tabs between columns, and it takes advantage of a repeated capture group reporting only its last captured instance; in this case, the 5th column...
javascript,jquery,text,text-parsing,text-formatting
var s = "A #i#sample#/i# %%text #b#with%% text format#/b#", s1 = "%%#b#A#/b#%%|%%#b#B#/b#%% will be good."; s.replace(/#(.*?)#/g, "<$1>"); //"A <i>sample</i> %%text <b>with%% text format</b>" s1.replace(/#(.*?)#/g, "<$1>"); //"%%<b>A</b>%%|%%<b>B</b>%% will be good." You can use capture groups to replace the # with appropriate brackets < and >....
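The same capture-group substitution can be sketched in Python with re.sub. This is a translation of the JavaScript above, not the answer's own code. The non-greedy .*? stops at the nearest closing #, so each #b# or #/b# marker becomes its own tag.

```python
import re

def hash_tags_to_html(text: str) -> str:
    # Replace every #...# marker with <...>. The group is non-greedy,
    # so "#b#with#/b#" yields two separate tags rather than one long one.
    return re.sub(r"#(.*?)#", r"<\1>", text)

s = "A #i#sample#/i# %%text #b#with%% text format#/b#"
print(hash_tags_to_html(s))
```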
bash,shell,unix,awk,text-parsing
You could use the following awk code (printf "%s", $0 is used rather than printf $0, so a % in the data is not treated as a format directive): awk -F"," 'NF < 50 {printf "%s", $0; for(i = NF; i < 50; ++i) printf ","; printf "\n" }' file produces 9,15040113501460,0,b1 0035569144,91 302317960883,0,15040113501460,132,15040614170560,N,0,0,0,0,0,0,0,0,0,0,8,0,0000000000000000,0,0,2,,27,b1 003st69144,1,,,,,,,,,,,,,,,,,,,, 9,15350114601560,0,b1 0033765345,91 304294596921,0,15040113501560,132,15040610170260,N,0,0,0,0,0,0,0,0,0,0,8,0,0000000000000000,0,0,2,,27,b1 0031r69144,1,,,,,,,,,,,,,,,,,,,, Here, NF is the number of fields in each line, F is the...
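Here is a minimal Python sketch of the same padding logic. It assumes comma-separated lines and a target of 50 fields. A line with NF fields has NF - 1 commas, so appending 50 - NF commas brings it to exactly 50 fields.

```python
def pad_fields(line: str, width: int = 50, sep: str = ",") -> str:
    # Count fields the way awk -F"," does for a simple (unquoted) line.
    nf = len(line.split(sep))
    if nf < width:
        # NF fields already have NF - 1 separators; add the missing ones.
        line += sep * (width - nf)
    return line

print(pad_fields("a,b,c", width=5))
```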
Since you're not consuming the whole line with your fscanf, the remaining values trip up your scan. Either use fgets to read the whole line and then scan it with sscanf, or add %*[^\n] to the end of your fscanf (which reads characters until it encounters a newline character, and...
java,text-processing,text-parsing
The issue is in this block: Paragraph paragraph = new Paragraph(); List<String> strings = sentences; // <-- !!!!! paragraph.setSentences(strings); paragraphs.add(paragraph); sentences.clear(); You use the same object that sentences points to for all your paragraphs, so in the end all your Paragraph objects will point to the same List<String>. Thus, any...
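The same aliasing effect can be reproduced in a few lines of Python. Python is used here only as an illustration, since the answer's code is Java. Appending a list stores a reference, so a later clear() empties every "paragraph" that shares it. Storing a copy avoids this, which is what new ArrayList<>(sentences) would do in Java.

```python
# Buggy pattern: the paragraph keeps a reference to the shared list.
sentences = ["First.", "Second."]
paragraphs_buggy = []
paragraphs_buggy.append(sentences)        # reference, not a copy
sentences.clear()                         # also empties the stored list

# Fixed pattern: each paragraph gets its own copy of the sentences.
sentences = ["First.", "Second."]
paragraphs_fixed = []
paragraphs_fixed.append(list(sentences))  # independent copy
sentences.clear()                         # the stored copy is unaffected

print(paragraphs_buggy, paragraphs_fixed)
```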
java,regex,text-parsing,lexical-analysis
I don't think you are exiting the first loop until you have exhausted all input. c is never updated (except in the else if, which is never reached). At the end of your program c is still equal to '4'. As a general path to a solution, you should...
If I understand correctly, you are asking how to loop over the fields from 1 to NF. Here is an example of such a loop: $ head -1 file | awk -F"\t" '{for (i=1;i<=NF;i++)printf "%s ",$i; print"";}' abc ttr nnc r32 inc ...
Finally, I got this working (thanks for the amazing input). Below are the changes I made: public void findLines(String aFileName) { List<Integer> a = new ArrayList<Integer>(); List<Integer> b = new ArrayList<Integer>(); Pattern regexp = Pattern.compile("(if|else|while).*"); Matcher exp1 = regexp.matcher("if|else|while"); Path path = Paths.get(aFileName); try ( BufferedReader reader = Files.newBufferedReader(path,...
python,regex,text-parsing,brackets,openfoam
The answer depends on whether you want to support the full, general OpenFOAM dictionary format or not. If you only need to support a format similar to what you have shown in the question, then a simple regex like \b(\w+)\s+{\s+type\s+(\w+); would do: https://regex101.com/r/yV8tK2/1 . This can be your option if you...
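One quick way to try the regex on a dictionary fragment is Python's re.findall, which returns one (patch name, type) tuple per match. The sample text below is made up for illustration. The { is escaped only for clarity.

```python
import re

# The answer's pattern, with the literal "{" escaped for readability.
PATCH_TYPE = re.compile(r"\b(\w+)\s+\{\s+type\s+(\w+);")

# Hypothetical boundary-condition fragment in OpenFOAM dictionary style.
sample = """
inlet
{
    type            fixedValue;
    value           uniform (1 0 0);
}
outlet
{
    type            zeroGradient;
}
"""

print(PATCH_TYPE.findall(sample))
```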
I agree with Anonymouse. It is both clumsy and inefficient to replace first all \n, then all \t. Instead, make a single pass through the string, replacing all escape characters as you go. I left the space allocation out in the code sample below; IMHO this is a separate responsibility,...
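The single-pass idea can be sketched in Python. This sketch assumes the task is to turn backslash escape sequences (\n, \t, \\) into the characters they stand for. The ESCAPES table and the unescape name are hypothetical, and buffer allocation is left to Python. Each character is inspected once. An escaped backslash is therefore consumed before its neighbour can be misread as the start of another escape, which is a risk when chaining separate replace passes.

```python
# Hypothetical mapping from escape letter to the character it represents.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}

def unescape(src: str) -> str:
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src) and src[i + 1] in ESCAPES:
            out.append(ESCAPES[src[i + 1]])
            i += 2          # consume the backslash and the escape letter
        else:
            out.append(ch)  # ordinary character or unknown escape, copied as-is
            i += 1
    return "".join(out)

print(repr(unescape(r"a\tb\nc")))
```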
linux,shell,awk,sed,text-parsing
Try this if you're using bash: $ declare $(awk '{print $2"="$4}' file) $ echo "$parent" 192.168.1.2 If the file contained white space in the values you want to init the variables with then you'd just have to set IFS to a newline before invoking declare, e.g. (simplified the input file...
javascript,jquery,coffeescript,text-parsing,string-parsing
I think this is the code you are looking for. See the comment in the code for some more information about how it works. var sentence = "Show me a gift of a car"; // specify commands var commands = { gif: { pattern: /gift of a (.*).?/, action: call...
Yes, there exist plenty of data structures to work with constituency trees and dependency trees. For constituency trees, you want to work with the Tree data structure which has many useful built-in functions to traverse trees, get all the terminal nodes, etc. For dependency trees you can either work with...
matlab,parsing,matlab-figure,text-parsing,string-parsing
Here is what I would do. First, delete the "(0)" lines from your text file (you could even use a simple shell script for that); I saved the result as post2.txt. % First, load the text file into Matlab: A = load('post2.txt'); % Create the imaginary...
python,text-parsing,string-parsing
You can use re.split() with \s{2,} as the delimiter pattern (written as a raw string so the backslash reaches the regex engine unchanged): >>> l = ["Blah blah, blah  bao  123456  ", ... "hello, hello, hello  miao  299292929  "] >>> for item in l: ... re.split(r'\s{2,}', item.strip()) ... ['Blah blah, blah', 'bao', '123456'] ['hello, hello, hello', 'miao', '299292929'] \s{2,} matches 2 or...
nlp,text-parsing,text-mining,information-extraction
There are many different CoNLL formats, since CoNLL is a different shared task each year. The format for CoNLL 2009 is described here. Each line represents a single word with a series of tab-separated fields; underscores (_) indicate empty values. Mate-Parser's manual says that it uses the first 12 columns of...
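A minimal Python reader for this kind of file might look like the sketch below. The column names assume the CoNLL 2009 layout for the first 12 columns: ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL. Check them against the format description before relying on them. The sketch also assumes blank lines separate sentences, and it maps "_" to None.

```python
# Assumed names for the first 12 CoNLL 2009 columns.
COLUMNS = ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS",
           "FEAT", "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL"]

def read_conll(lines):
    """Yield sentences as lists of token dicts; blank lines end a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        # zip stops after the 12 named columns; "_" marks an empty value.
        token = {name: (None if value == "_" else value)
                 for name, value in zip(COLUMNS, fields)}
        sentence.append(token)
    if sentence:
        yield sentence
```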
regex,parsing,macros,escaping,text-parsing
If your language supports the PCRE verbs (*SKIP)(*F), you could use: \(\*(?:(?!\(\*|\*\)).)*\*\)(*SKIP)(*F)|(?:\(\*|\*\)) That is, use the regex \(\*(?:(?!\(\*|\*\)).)*\*\)(*SKIP)(*F)|\(\* and replace the matched characters with <, then use the regex \(\*(?:(?!\(\*|\*\)).)*\*\)(*SKIP)(*F)|\*\) against the modified string and replace the matched characters with >. I love the variable length lookbehind feature in...
python-2.7,readfile,text-parsing
re.findall returns a list. When you do str(someList), it will print the brackets and commas: >>> l = ["a", "b", "c"] >>> print str(l) ['a', 'b', 'c'] If you want to print without [ and ,, use join: >>> print ' '.join(l) a b c If you want to print...
r,text-processing,text-parsing
I recently re-read your question, the comments, and @wibeasley's answer, and realized I hadn't understood everything correctly. Now it has become clearer, and I'll try to suggest something useful. First of all, we need a small example to work with; I've made one from the dictionary in your link. dictdf...
The leading (?s) is an invalid group in a JavaScript regex; the browser console should have reported an error when you tried that. You don't need to escape < or >, so this works: var re = /<strong>.+<\/strong>/; Note that if you have several <strong> elements in a run of text,...
batch-file,python-3.x,text-parsing
@ECHO Off SETLOCAL ENABLEDELAYEDEXPANSION :: delete output file DEL "newfile.txt" >NUL 2>nul :: remove variables starting $ or # :: remove variables starting $ FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a=" CALL :zap# :: Set column headers SET /a columncount=0 FOR /f "tokens=1*delims==" %%a IN (q27905448.ct8)...
c,regex,text,fscanf,text-parsing
You are incorrect when you say that the command ... $ grep -o -w 'Mars' TripToMars.txt | wc -w ... "finds only instances of 'Mars' as a standalone word", or at least that statement is misleading in context. The command finds instances of "Mars" that are not part of a...
This looks like the output from the PHP serialize function (you need to unserialize it): http://php.net/manual/en/function.serialize.php If you are working in Python, there is a port of the serialize and unserialize functions here: https://pypi.python.org/pypi/phpserialize Anatomy of a serialize()'ed value: String s:size:"value"; Integer i:value; Boolean b:value; (does not store "true" or...
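If pulling in phpserialize is not an option, the format is simple enough to parse by hand for common cases. The sketch below is a hypothetical, minimal Python parser (php_unserialize is an illustrative name). It covers only N, b, i, d, s and a values, with no objects or references. It works on bytes because the s size counts bytes, not characters.

```python
def php_unserialize(data: bytes):
    """Minimal sketch: supports N, b, i, d, s and a values only."""
    value, _ = _parse(data, 0)
    return value

def _parse(data: bytes, pos: int):
    kind = data[pos:pos + 1]
    if kind == b"N":                              # N;
        return None, pos + 2
    if kind in (b"b", b"i", b"d"):                # b:1;  i:42;  d:1.5;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode()
        if kind == b"b":
            return raw == "1", end + 1
        if kind == b"i":
            return int(raw), end + 1
        return float(raw), end + 1
    if kind == b"s":                              # s:5:"hello";  (size in bytes)
        colon = data.index(b":", pos + 2)
        size = int(data[pos + 2:colon])
        start = colon + 2                         # skip ':"'
        text = data[start:start + size].decode()
        return text, start + size + 2             # skip '";'
    if kind == b"a":                              # a:N:{key;value;...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                           # skip ':{'
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            val, pos = _parse(data, pos)
            result[key] = val
        return result, pos + 1                    # skip '}'
    raise ValueError(f"unsupported type {kind!r} at offset {pos}")

print(php_unserialize(b'a:2:{i:0;s:1:"x";s:1:"k";b:1;}'))
```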
Try to apply path modifiers as follows: set "inputPath=C:\Users\SomeUser\Desktop\SomeFolder\File.jar" for %%i in ("%inputPath%") do set fname=%%~nxi echo %fname% %%~nx<loop-var> extracts the filename root (n) and filename extension (x) from the loop variable; i.e., it extracts the last/filename component from the loop variable's value. (%%i was chosen as the loop variable...
sed,awk,multiple-columns,text-parsing
Use the following awk script: #!/usr/bin/awk -f BEGIN { nc=1; nr_ = 1; maxr = 1;} /^--$/ { if (maxr < nr_ ) maxr = nr_; nc++; nr_=1; next; } { col[nc, nr_++] = $2; } END { for(r = 1; r < maxr ; r++) { for(c = 1; c...
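The same reshaping can be sketched in Python. The input shape is inferred from the awk script: whitespace-separated lines, with groups delimited by lines that are exactly "--". The sketch keeps the second field of each line (like $2) and prints group i as column i. Padding shorter groups with empty cells and using a tab separator are assumptions of this sketch.

```python
def groups_to_columns(lines, sep="\t"):
    # Collect the 2nd field of each line, starting a new group at "--".
    groups = [[]]
    for line in lines:
        line = line.rstrip("\n")
        if line == "--":
            groups.append([])
            continue
        fields = line.split()
        groups[-1].append(fields[1] if len(fields) > 1 else "")
    depth = max(len(g) for g in groups)
    rows = []
    for r in range(depth):
        # Row r takes the r-th entry of every group; short groups give "".
        rows.append(sep.join(g[r] if r < len(g) else "" for g in groups))
    return rows

for row in groups_to_columns(["a 1", "b 2", "--", "c 3"]):
    print(row)
```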
Unless your data includes quoted fields, like a,b,c,"complicated field, quoted",e,f,g there is no advantage in using Text::CSV over a simple split /,/. This example categorizes the data into a hash that you can access simply and directly. I have used Data::Dump only to show the contents of the resulting data...
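The same trade-off shows up in Python, where the csv module plays the role of Text::CSV. A plain split(",") breaks the quoted field apart, while csv.reader keeps it whole.

```python
import csv

line = 'a,b,c,"complicated field, quoted",e,f,g'

naive = line.split(",")             # also splits inside the quoted field
parsed = next(csv.reader([line]))   # honours the double quotes

print(naive)
print(parsed)
```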
regex,coldfusion,extract,text-parsing,text-extraction
You don't want REMatch, you want REFind (docs): REFind(reg_expression, string [, start, returnsubexpressions ] ) returnsubexpressions is what you need, so... <cfset str = "request.config.MY_PARAM_NAME = 'The parameter VALUE!!';"> <cfset match = REFind("^request\.config\.(\S+) = (.*);", str, 1, "Yes")> <cfdump var="#match#"> match will be a Struct with two keys (POS and...
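For readers coming from Python, re.search with capture groups gives the same information as REFind with returnsubexpressions. The pos and length lists below are a hypothetical, rough mirror of the 1-based position and length arrays in the ColdFusion result (index 0 is the whole match). They are not part of any Python API.

```python
import re

line = "request.config.MY_PARAM_NAME = 'The parameter VALUE!!';"
m = re.search(r"^request\.config\.(\S+) = (.*);", line)

name, value = m.group(1), m.group(2)
# 1-based start position and length of the whole match and each group.
pos = [m.start(i) + 1 for i in range(3)]
length = [m.end(i) - m.start(i) for i in range(3)]

print(name, value, pos, length)
```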
See whether this construct is helpful for your purpose: #!/bin/bash name="longname55445" echo "${name##*[A-Za-z]}" This assumes that a letter immediately precedes the number. The following is NOT another way to write the same thing, because it is wrong. Please see the comments below by mklement0, who noticed this. Mea culpa. echo "${name##*[:letter:]}" ...
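The bash expansion ${name##*[A-Za-z]} removes the longest prefix that ends in an ASCII letter. A rough Python equivalent uses a greedy regex anchored at the start. The function name is hypothetical. re.DOTALL lets .* cross newlines, as the shell's * would.

```python
import re

def trailing_after_last_letter(name: str) -> str:
    # Greedy ^.* followed by a letter matches up to the LAST letter,
    # mirroring the longest-prefix removal of ${name##*[A-Za-z]}.
    # If there is no letter at all, the string is returned unchanged.
    return re.sub(r"^.*[A-Za-z]", "", name, count=1, flags=re.DOTALL)

print(trailing_after_last_letter("longname55445"))
```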
Since what you really have is sets of two lines, not one, the matter is a little more complicated than simply reading the lines one by one and trimming duplicates. Here is a solution using Java 7: public static void eliminateDups(final String srcfile, final String dstfile) throws IOException {...
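The pairing idea can be sketched in Python on an in-memory list of lines. File handling is omitted, unlike in the Java 7 version. Consecutive lines are grouped into two-line records, and only the first occurrence of each record is kept, in order. Keeping a trailing unpaired line as-is is an assumption of this sketch.

```python
def eliminate_dups(lines):
    seen = set()
    result = []
    # Walk the input two lines at a time, treating each pair as one record.
    for i in range(0, len(lines) - 1, 2):
        record = (lines[i], lines[i + 1])
        if record not in seen:
            seen.add(record)
            result.extend(record)
    if len(lines) % 2:
        result.append(lines[-1])  # trailing unpaired line, kept unchanged
    return result

print(eliminate_dups(["a", "1", "b", "2", "a", "1"]))
```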
Your parser "successfully parses" a list of MapLines of length zero before failing at the first line. Remove that line (and make sure your file doesn't include any non-parsable bytes at the start like a BOM) and it should work. Or write a parser for lines starting with a #...
You can store it in a cell array like this: t{1} = [ 5 4 3 2 1]; t{2} = [ 10 9 8 7 6 5]; t{3} = [ 11 12 13 14]; and use them like this: >> t(1) ans = [1x5 double] >> t{2} ans = 10...
The problem is that your target field has embedded double quotes, so you need to match them too, by including them - \-escaped - in the string to match against: awk ' BEGIN{FS=","; s_count=0; c_count=0} ($3=="\"s\"") {s_count++} ($3=="\"c\"") {c_count++} END{ print s_count; print c_count } ' data.csv As an aside,...
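For comparison, a real CSV parser removes the surrounding quotes for you, so no escaping is needed. This Python sketch assumes the same layout as the awk command, with the third field holding "s" or "c", and uses the csv module. The function name is hypothetical.

```python
import csv
import io

def count_types(text: str):
    # csv.reader strips the double quotes around "s" / "c",
    # so the third field compares equal to the bare letter.
    counts = {"s": 0, "c": 0}
    for row in csv.reader(io.StringIO(text)):
        if len(row) > 2 and row[2] in counts:
            counts[row[2]] += 1
    return counts

print(count_types('1,x,"s"\n2,y,"c"\n3,z,"s"\n'))
```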