You can use grep to do this kind of regexp matching among the column names:

    x = c(1, 2, 3)
    df = data.frame(var1=x, var2=x, var3=x, other=x)
    df[, grep("var*", colnames(df))]

Output:

      var1 var2 var3
    1    1    1    1
    2    2    2    2
    3    3    3    3

So, basically just making use of...

In your updated code you are missing {} for the forvalues loop. Also, you don't make use of the local employcode_tmp, and that seems to be what you aim for. Fixing the syntax errors I mention and deleting your second quietly should give you some output. However, your loop gives...

You can manipulate matrices in Stata, but Mata is richer for this purpose. It can be used calculator-style. . sysuse auto, clear (1978 Automobile Data) . tab rep78 foreign, nofreq row matcell(freqs) Repair | Record | Car type 1978 | Domestic Foreign | Total -----------+----------------------+---------- 1 | 100.00 0.00 |...

The simplest way is to use recursion: real matrix cart_prod(pointer vector c_list ,| real scalar curr_i){ if(curr_i==.) curr_i=1 myret = (*c_list[curr_i]) if (curr_i<length(c_list)){ ret = cart_prod(c_list, curr_i+1) myret = mm_expand(myret,rows(ret),1,1), mm_expand(ret, rows(myret),1,0) } return(myret) } cart_prod(c_list) This will work even if the vectors pointed to from c_list are of different...

r,sorting,data-mining,stata,subsetting

I tried to answer your questions at the end. First, an example data frame to play around with: set.seed(123) df <- data.frame(id=c(paste0(letters[1:10], 1:10)), matrix(sample(1:20, 500, replace=T), nrow=100, ncol=5)) colnames(df)[2:6] <- paste0("var", 1:5) 1. Count values of a variable For the first question, I'm not sure why you wouldn't do this...

After

    xtile PH_scale = PH, nq(4)

this is an easy replace:

    replace PH_scale = cond(inlist(PH_scale, 1, 4), 1, 2)

Alternatively, create percentiles directly:

    _pctile PH, nq(4)
    gen PH_scale = cond(PH < r(r1) | PH > r(r3), 1, 2) if PH < .

Note that indicator variables are widely defined as...

In general, week() is part of the solution if and only if you define your weeks according to Stata's rules for weeks. They are Week 1 of the year starts on January 1, regardless. Week 2 of the year starts on January 8, regardless. And so on, except that week...
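The rules above can be checked directly with week(). This is only an illustrative sketch for a do-file; the year 2015 is an arbitrary choice, not something taken from the question.

```stata
* Stata's week() depends only on the day of the year
display week(mdy(1, 1, 2015))    // 1 January is always in week 1
display week(mdy(1, 7, 2015))    // still week 1
display week(mdy(1, 8, 2015))    // 8 January always starts week 2
display week(mdy(12, 31, 2015))  // Stata never returns a week above 52
```

If your weeks are defined differently (say, starting on Mondays), these results will not match your definition, which is the point of the caveat.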

clear set more off *----- example data ----- input /// id str10 date 1 "1/1/2010" 2 "1/1/2010" 3 "1/4/2010" 4 "1/5/2010" 5 "1/8/2010" 6 "1/10/2010" 7 "1/11/2010" end gen date2 = date(date, "MDY") format %td date2 drop date list *----- what you want ----- isid id levelsof id, local(levid) forvalues...

See the help files for extended functions of local/global macros: help extended_fcn (where extended_function is variable label varname [what you asked for] or value label varname [not what you asked for, but may be of use]). E.g.: sysuse auto, clear local x : variable label foreign local y : value...

One way is using Mata: clear set more off // example matrix matrix E = 1,2,3,4,5,6,7,8,9 matrix list E // change shape mata: Em = st_matrix("E") // take matrix to Mata Em = colshape(Em,3) // change shape st_matrix("Es", Em) // take result to Stata end // list in Stata matrix...

I'm still inclined to consider this question off-topic. It looks like a simple code request. I'll answer with the hope that future questions more clearly state what the programming problem is (including code). One way is: clear set more off // change to 500 set obs 15 gen counter =...

This is covered in this FAQ. Here's the relevant code for testing that a coefficient is 0.5: sysuse auto, clear gen ln_mpg = ln(mpg) regress ln_mpg i.foreign##c.weight test _b[1.foreign#c.weight]=.5 local sign_wgt = sign(_b[1.foreign#c.weight] - .5) display "Ho: coef <= 0.5 p-value = " ttail(r(df_r),`sign_wgt'*sqrt(r(F))) display "Ho: coef >= 0.5 p-value = "...

The following works for your example data, but notice I had to insert the "non-conventional" characters inside the regex definition because I don't see a way of expressing "all but numbers" using Stata's implementation of regex: clear set more off *----- example data ----- input /// str30 orig "AufderScholle12" "K^nigsbr¸ckerPlatz3"...

The double loop is causing the problem: local files : dir "D:/Datos/rferrer/Desktop/statatemps" files "test*.xls" cd "D:/Datos/rferrer/Desktop/statatemps" local counter = 1 foreach file in `files' { import excel "`file'", sheet("Hoja1") firstrow clear if `counter' == 1 { di in red `counter' save "D:/Datos/rferrer/Desktop/statatemps/master.dta", replace } else { append using "D:/Datos/rferrer/Desktop/statatemps/master.dta" save...

As @Roberto Ferrer points out, your question isn't very clear, but here is an example using moss from SSC: . clear . input str16 var1 var1 1. "h 01 .00 .0 abc" 2. "d 1.0 .0 14.0abc" 3. "1,0.0 0.0 .0abc" 4. end . moss var1, regex match("([0-9]+\.*[0-9]*|\.[0-9]+)") . l...

r,data.table,stata,code-translation

Your intuition is correct. collapse is the Stata equivalent of R's aggregate function, which produces a new dataset from an input dataset by applying an aggregating function (or multiple aggregating functions, one per variable) to every variable in a dataset.
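A minimal sketch of collapse in action, assuming the auto dataset shipped with Stata as stand-in data; the new variable names (mean_price, med_mpg, n) are made up for illustration.

```stata
sysuse auto, clear

* one observation per level of foreign; the original variables are
* replaced by the requested summaries
collapse (mean) mean_price = price (median) med_mpg = mpg (count) n = rep78, by(foreign)

list
```

Note that collapse replaces the dataset in memory, so wrap it in preserve and restore if you need the original data afterwards.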

It is no surprise that duplicates does not do what you are wanting, as it does not fit your problem. For example, the observation with id == 2, disease == 0 is not a duplicate of any other observation. More generally, duplicates does not purport to be a general-purpose command...

Try this (assuming df is your data) transform(df, aggregated_count = ave(comments_count, member_id, FUN = cumsum)) # member_id entry_id comments_count timestamp aggregated_count # 1 1 a 4 2008-06-09 12:41:00 4 # 2 1 b 1 2008-07-14 18:41:00 5 # 3 1 c 3 2008-07-17 15:40:00 8 # 4 2 d 12...

r,ggplot2,stata,graphing,subsetting

Very nicely done question. If you are still interested in a base solution: transactionID <- c(1, 2, 3, 4) date <- as.Date(c("2006-08-06", "2008-07-30", "2009-04-16", "2013-02-05")) cost <- as.integer(c(1208, 23820, 402, 89943)) company <- c("ACo", "BInc", "CInd", "DOp") thedata <- data.frame(transactionID, date, cost, company) par(mar = c(5,7,3,2), tcl = .2, las...

Use if: clear set obs 29 gen t = "should not be here" tempfile file1 save "`file1'" clear set obs 31 gen t = "should be here" tempfile file2 save "`file2'" clear *----- foreach f in file1 file2 { use "``f''", clear if _N > 30 { export excel using...

Upon further thinking (and reading an old article by Nick Cox), it occurred to me that statsby can be used to avoid the loop and speed up the program. Here's a comparison of their speed. Let's first prepare example data. set more off timer clear webuse nlswork,clear keep idcode ln_wage...

The dataset provided answers the question. Consider the tabulation: . tab W_num_exp num_execs_i | ntl_exp, | Winsorized | fraction | .01 | Freq. Percent Cum. ------------+----------------------------------- 0 | 297 16.71 16.71 1 | 418 23.52 40.24 2 | 436 24.54 64.77 3 | 282 15.87 80.64 4 | 171 9.62...

If they are actual strings, this should work: sysuse auto, clear ds, has(type string) // get a list of string variables // loop over each string variable, count observations that contain Buick anywhere, and drop the variable if N>0 foreach var of varlist `r(varlist)' { count if regexm(`var',"Buick") if r(N)>0...

Edit: Just create scalars, which persist in memory clear all webuse ugdp cs case exposed [fw=pop], by(age) scalar rr_mh = r(rr) Then use glm: glm case exposed [fw = pop], family(binomial) link(log) scalar rr_crude = exp(_b[exposed]) or cs case exposed [fw = pop] scalar rr_crude = r(rr) In either case:...

The simplest way is just to use summarize results directly: sysuse auto, clear quietly foreach v of var price-foreign { su `v', detail gen `v'q = `v' / (r(p75) - r(p25)) } The egen route is overkill if it means creating new variables for each original variable, just to hold...

As @thelatemail suggests, the easiest thing to do here is simply run Stata in batch mode from a system call. Here's an example do file (called "example.do"): log using out.log, replace sysuse auto regress mpg weight foreign And here's the R code to run it and retrieve the output (assuming...

A first approximation is clear set more off input /// state year size income 1 1978 1 1000 1 1978 1.5 100 1 1978 2 5000 1 1979 1 3779.736 1 1979 1.5 3779.736 1 1979 2 4878.414 1 1980 1 4290 1 1980 1.5 4290 1 1980 2 5537...

You can select observations using value labels. . sysuse auto, clear (1978 Automobile Data) . count if foreign=="Foreign":origin 22 You need to know the name of the value labels, here origin. You can look that up in several ways. This is documented at [U] 13.11 in Stata 14 and (under...

osx,performance,terminal,stata

From the OS X terminal, cd to the directory containing the CSV files, and run the command ls -lut | head which should show your files, sorted by the most recent access time, limited to the 10 most recently accessed....

"leaving aside that for this example model there is absolutely no need to use a NLLS regression": I think that's what you can't do here.... The question is about why the syntax is as it is. That's a matter of logic and a matter of history. Why a particular syntax...

I've had this error many times before, and it's easy to reproduce: library(foreign) test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE) write.dta(test, 'example.dta') One solution is to use factor variables instead of character variables, e.g., for (colname in names(test)) { if (is.character(test[[colname]])) { test[[colname]] <- as.factor(test[[colname]]) }...

Your sandbox data has implicit missing values, so the first two lines get omitted the way I read this in. I take that as being incidental. As @Roberto Ferrer clearly explained, this is an (utterly standard) reshape long. clear input Name Company1 Company2 Company3 Company4 Company5 Company6 1985 6.0781 2.4766...

Let's suppose that you are looking for variables with a certain value label attached. You can retrieve those variables using ds and pass their names to recode. . clear . set obs 2 obs was 0, now 2 . forval j = 1/5 { 2. gen y`j' = _n 3....

One place for improvement stands out in your do file: You are doing 600 merges of sampled ids with "big" files. Here is code that requires only one merge for each big file, or six, in your case. The trick is to reshape the sample data from long format to...

I don't know enough about R / Pandas to provide an authoritative answer but it's likely related to this phrase in your question: "datasets roughly correspond to a list of vectors" (please correct me if the following is wrong). Program writers always have the choice about whether to implement a...

Based on your latest comment, which describes the (un)expected behavior: clear set more off *----- example data ----- input /// country year rank 1 1960 2 1 1961 1 1 1962 2 2 1960 1 2 1961 1 2 1962 1 3 1960 3 3 1961 3 3 1962 3...

@Roberto Ferrer has given a direct approach. It follows from the logic he uses that there is also a route using egen's group() function: egen group = group(id date2) bysort id (group): gen clust2 = sum(group != group[_n-1]) ...

@Nick has already given a solution to the problem. He claims only stylistic changes were made, but I suspect more. The double quotes originally used by the poster introduce an additional word in the definition of local cells. That is clear when we count the words contained in the local...

This does appear to be a casebook example of how to do things as slowly as possible, except that naturally you are not doing that on purpose. All it lacks is a loop over observations to calculate totals. So, the good news is that you should indeed be able to...

When you say cd you need to point to a hard drive directory:

    cd C:\

then mkdir

    mkdir "`directory'"

then save

    save "C:/`name'\current_data.dta"

Be careful: the last slash is / not \ ......

clear all program define press, rclass syntax varlist(fv) [if] [in] /// [fweight aweight pweight iweight] , /// [nodots] gettoken y x : varlist marksample touse preserve quietly keep if `touse' if "`weight'" != "" { local wgt "[`weight'`exp']" } tempvar pred temp prs quietly gen double `pred' = . if...

To answer your comment, this works for me: clear set more off input /// mnbr firm cont 1591 2 1 9246 6 . 812 6 1 674 6 1 end list // problem 1 inspect firm display r(N_unique) // problem 2 bysort firm: egen totc = total(cont) by firm: gen...

I don't have scope to test this, but here are suggested simplifications to the code segment. I don't address the main question, which I don't understand, partly because there is no precise description of data structure in the question. To summarize suggestions: Use summarize, meanonly when that is all you...

The first date is just the minimum date. Sort dates within each patient, and the first date and the smallest date are one and the same, as a date is numeric. bysort id (date) : gen firstdate = date[1] Note that I deliberately did not overwrite your original date variable....
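A small self-contained sketch of the same idea, assuming made-up patient data; the variable sdate and the example dates are hypothetical.

```stata
clear
input id str9 sdate
1 "05mar2010"
1 "12jan2010"
2 "20feb2011"
2 "03feb2011"
2 "15jun2011"
end

* convert the string dates to numeric daily dates
gen date = daily(sdate, "DMY")
format date %td

* within each id, sort on date; the first observation then holds the minimum
bysort id (date) : gen firstdate = date[1]
format firstdate %td

list, sepby(id)
```

Every observation for a patient ends up carrying that patient's earliest date in firstdate, while date itself is left untouched.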

This is a cute idea, but it is not workable. end has a meaning within loops which clashes with its meaning to input. If you input outside the loops, then it will not understand the loop syntax. Here is another way to do it: . clear all . set obs...

Following @Roberto Ferrer's helpful comment, here is his second method based on downloading mylabels using ssc inst mylabels: . webuse grunfeld, clear . su y Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- year | 200 1944.5 5.780751 1935 1954 . mylabels 1935(4)1955, suffix(#) myscale(@) local(lbl) 1935 "1935#" 1939...

This will fail unless the variable looked at, here var1, is a string variable and every distinct value of the variable used could be a new and legal variable name. I've not tried to make the code resistant to failures of the assumptions. levelsof var1, local(levels) foreach v of local...

You should look at help string functions to learn basic syntax here. drop if length(Cod) < 16 may be what you seek. ...
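As a quick check of what that command does, here is a sketch with invented values for Cod (the variable name comes from the answer; the data do not come from the question):

```stata
clear
input str20 Cod
"1234567890123456"
"12345"
"12345678901234567"
end

* drop observations whose string has fewer than 16 characters
drop if length(Cod) < 16

list
```

Only the second observation is dropped here; the 16- and 17-character values are kept.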

In recent versions, StataCorp's own signposting that redirects a call for help elsewhere is typically done by .maint files. Thus viewsource shelp_alias.maint shows a list of various pointers from names starting with s, including allowed abbreviations, and help sfrancia can be seen to fire up help swilk. For s substitute any...

The Pesaran statistic will (asymptotically) follow a standard normal distribution if the null-hypothesis is true, so: the p-value is 2*(normal(-abs(r(pesaran))))
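To see the formula at work without fitting a model, the sketch below plugs in a hypothetical test statistic of 1.96 in place of r(pesaran); the scalar name cdstat is an assumption made for illustration.

```stata
* hypothetical value standing in for r(pesaran)
scalar cdstat = 1.96

* two-sided p-value from the standard normal distribution
display "two-sided p-value = " 2 * normal(-abs(cdstat))
```

With a statistic of 1.96 the displayed p-value is close to 0.05, as expected for a two-sided test against the standard normal.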

I would recommend reading the variable names into a local, and use only global if strictly necessary. One way to do that is to use import excel along with levelsof: clear set more off // import from MS Excel and create local import excel using myvars.xlsx, cellrange(B2:B5) firstrow levelsof myvars,...

sql,stata,common-table-expression

You should declare multiple CTEs in the following way: WITH cte AS ( ) , cte2 AS ( ) and not by using the WITH clause multiple times....

Here's one way that uses lag operators: clear input firm_id t exp 0 1 0 0 2 1 1 1 0 1 2 1 1 3 1 2 1 0 2 2 1 2 3 0 3 1 1 3 2 1 4 1 0 4 2 0 end xtset...

First, make good use of Stata's help files: e.g., search percentiles returns a list of possible commands. Two commands that will likely be of use are summarize (with the detail option; note that you can use return list afterwards to view/store results [regardless of whether the detail option was specified])...

Stata has a rule that, in addition to the directories (etc.) explicitly named in the adopath, it will also look in subdirectories of those directories, each named by an individual letter, for programs whose names begin with that letter. Thus suppose you are invoking a command whatever and your named directories include...

You don't need to dereference scalars here. They don't have temporary names; you assigned them permanent names, so there are no aliases to peel off. I am guessing that the multiple versions of code for writing the scalar were guesses at the correct code and that you only need each...

You can use table's contents() option: clear set more off sysuse auto table foreign rep78, contents(mean price) Something similar can be achieved with tabout (from SSC): tabout foreign rep78 using testtab.csv, sum cells(mean price) ptotal(none) ...

For some reason calling R from Stata via winexec (or shell or !) opens a different command prompt than does opening the command prompt from the start window. At least in my install, this loads a different set of environment variables so that the library path is the admin library...

One of the many ways this can be done is with a regression followed by two margins* commands: webuse hanley table rating [iw=pop], contents(mean disease) reg disease i.rating [iw=pop] margins rating marginsplot, noci This has the advantage of not altering your data in any way....

The problem seems to be you are including independent variables, and therefore, estimating an ARMAX model. For the out-of-sample forecasts, you need also values for the independent variables AvgPov and AvgEnrol. The model doesn't estimate them; recall the dependent variable is D4.AvgU5MR.

Check help program. An example program (that takes no arguments):

    // define program
    capture program drop hello
    program hello
        display "hello world!"
    end

    // try it out
    hello

...

I wrote an answer to a similar question which might be useful here, but do not have enough reputation to answer this as a comment, so here goes: You can do stcox treat x1 x2 x3 and stcurve, survival at1(treat=0) at2(treat=1) outfile(stcurve.dta). In the file stcurve.dta you will have the...

This is not in the first instance a precision problem. It is an inevitable problem when (1) the number of values is even and (2) the median is the mean of two central values that are different. Then the median itself is not a value in the dataset and will...
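A tiny example makes the point concrete. The four values below are invented; the scalar name med is also just for illustration.

```stata
clear
input x
1
2
3
4
end

summarize x, detail
scalar med = r(p50)

display med          // 2.5, the mean of the two central values 2 and 3
count if x == med    // 0: no observation equals the median
```

No precision issue is involved: 2.5 is held exactly, yet it is simply not one of the values in the data.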

Either gen x2 = sum(x1) or gen x2 = sum(x1 == 1) is sufficient. There is a loop over observations tacit as usual there, but you don't need an explicit loop. In detail, sum() here is a cumulative or running sum. In your case, the first solution is simple and...
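Here is a sketch of the running sum on made-up data, assuming x1 is a 0/1 indicator (which is why the two forms agree); x3 is a hypothetical name for the second version.

```stata
clear
input x1
1
0
1
1
0
end

gen x2 = sum(x1)        // running sum of x1
gen x3 = sum(x1 == 1)   // running count of observations with x1 equal to 1

list
```

Both new variables run 1, 1, 2, 3, 3. The second form would differ only if x1 took values other than 0 and 1.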

This doesn't strike me as a programming question; nevertheless from Stata 13 there is a cls command for precisely this purpose. Users of all versions can see that documented here....

We cannot answer this question without knowing the format of your date. I had a similar problem. If I assume your date is in yyyy-mm-dd format, then you need this code (assuming that v1 is the variable that holds your dates in Stata) generate v2 = date(v1, "YMD") format...

You need to create an extra identifier to make replicates distinguishable. clear input id str1 tests testvalue 1 A 4 1 B 5 1 C 3 1 D 3 2 A 3 2 B 3 3 C 3 3 D 4 4 A 3 4 B 5 4 A 1...

You can use extended generate (egen) to do this:

    egen double sec_avg_income = mean(firm_income), by(SIC)
    gen double ratio = firm_income/sec_avg_income

The first line calculates the mean firm income in each sector. The second constructs the ratio of own income to average sector income. ...
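The same pattern can be tried on a dataset that ships with Stata. In this sketch, price stands in for firm_income and foreign stands in for SIC; both substitutions are assumptions made only so the example runs.

```stata
sysuse auto, clear

* group mean of price within each level of foreign
egen double grp_avg_price = mean(price), by(foreign)

* ratio of each car's price to its group mean
gen double ratio = price / grp_avg_price

* within each group the ratios average to 1
tabstat ratio, by(foreign) statistics(mean min max)
```

A mean ratio of 1 in every group is a quick sanity check that the group means were computed on the intended grouping variable.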

Use a "row" vector instead of a "column" vector. If you check, for example, the stored results of regress, you'll see that this is what is expected. capture program drop mytest program mytest, eclass version 13 syntax varlist [if] marksample touse // mata subroutine creates matrix `b', such as mata:...

Another work-around is forval y = 5/11 { local Y : di %02.0f `y' <code using local Y, which must be treated as a string> } The middle line could be based on `: di %02.0f `y'' so that using another macro can be avoided, but at the cost of...

I am not sure if I understood correctly what you want. It would have been useful if you had added the stset and stcox code necessary before running stcurve. If the Kaplan-Meier hazard graph is identical to your first stcurve, survival you can try a dirty fix by generating a...

One option is to use parallel lists. An example of the technique: local agrp "cat dog cow pig" local bgrp "meow woof moo oinkoink" local n : word count `agrp' forvalues i = 1/`n' { local a : word `i' of `agrp' local b : word `i' of `bgrp' display "`a' says `b'"...

I may have figured it out:

    bys group (value): gen d = value[_n+1] - value[_n]
    bys group: egen result = max(d)
    drop d

...
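To check what this code produces, here is a run on a few invented observations (the data are hypothetical; the variable names follow the answer):

```stata
clear
input group value
1 1
1 4
1 6
2 10
2 11
end

* gap between each value and the next larger one within the group
bys group (value): gen d = value[_n+1] - value[_n]

* largest gap per group; the missing d in the last row is ignored by max()
bys group: egen result = max(d)
drop d

list, sepby(group)
```

Group 1 has gaps of 3 and 2, so result is 3 there; group 2 has a single gap of 1.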

As @Roberto Ferrer has commented, if values for a string variable are like "US2006", you could proceed gen year = real(substr(whatever, -4, 4)) gen country = substr(whatever, 1, length(whatever) - 4) The first statement extracts the last 4 characters and converts them to a number. The second statement drops the...
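A short sketch with made-up values shows that the approach also copes with country prefixes of different lengths, as long as the year is always the last four characters. The variable name whatever follows the answer; the data are invented.

```stata
clear
input str12 whatever
"US2006"
"UK1999"
"France2010"
end

* last four characters, converted to a number
gen year = real(substr(whatever, -4, 4))

* everything except the last four characters
gen country = substr(whatever, 1, length(whatever) - 4)

list
```

If some values do not end in a four-digit year, real() returns missing for them, which makes such cases easy to find afterwards.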

As often seems to happen, deciding in advance that the solution must be based on regular expressions just complicates your code. From your description you need the three characters before the first "-". That would be gen p_id = substr(var1, strpos(var1, "-") - 3, 3) Test example: clear input str21...

At present, the officially written egen function std() does not support operations by. I can't identify a statistical or computational reason for that, but it is well documented. (Why you need luck to get past a documented limitation I don't understand.) In principle, any user could write their own egen...

Your question seems strange to me. You asked about a dummy-dummy interaction, but your example involves continuous-dummy interaction. Here's how to do either one: webuse union, clear /* dummy-dummy interaction */ probit union i.south##i.black grade, nolog margins r.south#r.black /* continuous-dummy interaction */ probit union i.south##c.grade margins r.south, dydx(grade) You should...

It's a standard application of forvalues. See the help. There are two nested loops, over the infix 1 2 3 and over the years 2011 2012 2013. For safety, blank out any previous contents first. Make sure you refer to this only within the space where it is created. local...

Here's one approach building on your code. . sysuse auto.dta, clear (1978 Automobile Data) . quiet describe, varlist . local vars `r(varlist)' . display "vars - `vars'" vars - make price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign . lookfor "Circle" storage display value variable name type...

I can't reproduce this in Stata 13.1. The format of the time variable is used by default on the time (horizontal) axis. Here's a test script. Hint: You should learn to produce such reproducible examples yourself. clear input week weeksum 20093 16 20100 61 20107 34 20114 42 20121 24...

In Stata 14, this can be accomplished with: `=ustrunescape("\u0052\u0303")' This combines the Unicode for capital R with the one for tilde. ...

Escaping double quotes seems to solve the issue: StataMP-64 /e do myDoFile1.do \"C:\MyDir\" So it seems that Windows (7) is stripping the quotes before passing the arguments to Stata....

Try something along the lines of clear set more off *----- example data ----- input /// id str20 date treat match num 1 01feb2000 1 2 2 1 01apr2000 0 . . 1 01jan2002 1 3 1 2 01mar2000 1 3 0 2 01may2000 0 . . 3 01dec2002 1...

Note that NA is the jargon of some other programs, but not native to Stata. Stata calls these "missing values". If you first (1) segregate the observations with missing values, then (2) identifying the last so many observations with non-missing values follows immediately from sorting within the other observations, those...

@William Lisowski debugged your code in a comment, but you can simplify the whole procedure. Create the tuples beforehand using the user-written command tuples (ssc install tuples). clear set more off *----- example data ----- sysuse auto keep mpg weight price gen time = _n tsset time *----- what you...

tabout from SSC may work for you: clear set more off *----- example data set ----- input /// id year occup 1 1999 1 1 2000 1 1 2001 1 2 1999 1 2 2000 2 2 2001 1 3 1999 1 3 2000 2 3 2001 2 4 1999...

I've never used -estout-, but perhaps this will give you a start. webuse nlswork poisson hours i.union##c.tenure, robust estimates store m0 margins union, dydx(tenure) post estimates store m1 estimates restore m0 margins rb1.union, dydx(tenure) post estimates store m2 Why this works: margins needs access to the results of the original...

This is my program (in Stata terms, downloadable via ssc install ciplot) so I can speak confidently. (On Statalist, it's expected that you explain the exact provenance of user-written programs; that would be good practice here too.) It's not a bug; it's a feature (supposedly). The offsets are entirely deliberate,...

You need to dereference the scalar. Here's an example: sysuse auto, clear sum price scalar obs = r(N) scalar mean = r(mean) scalar sd = r(sd) scalar value = 10000 ttesti `=obs' `=mean' `=sd' `=value' ttest price=10000 However, if you just want to test coefficients, why not do that directly?...

Accessing coefficients is as easy as calling _b[varname]; analogously the corresponding standard errors: _se[varname]. An example: webuse grunfeld, clear qui xtreg mvalue invest i.year,fe cluster(company) // coef for invest display _b[invest] // std error for invest display _se[invest] // displayed results in matrix matrix list r(table) For multiple-equation models use...

You don't show all your code but you are evidently fitting an ordered probit upstream. Consider the simplest kind of example: . sysuse auto (1978 Automobile Data) . oprobit rep78 weight [stuff omitted] ------------------------------------------------------------------------------ rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- weight | -.0005881 .0001729 -3.40...