machine-learning,apache-spark,bigdata,analysis,mllib

You can use SparkContext.wholeTextFiles(...). It reads a directory and returns an RDD of (file path, file content) pairs, one for each file in that directory.

java,arrays,algorithm,sorting,analysis

Measuring the performance of a Java program correctly is hard, especially when you want to compare different algorithms, because the JVM performs many clever optimizations during execution (e.g. see Wikipedia: Java performance - Adaptive optimization). One major rule is to perform a...

I would say two use cases, one for each. A bit more bloated, but as I've dealt with changing specs that had details that weren't included "because it's obvious", I prefer over-communication to the alternative. Defining it as two use cases cuts any ambiguity as to whether...

All the comments are right. There are simply no particular performance considerations to worry about here. So write code that's readable first. How about something like this instead?: function func3() { for (var i = 1; i <= 20; i++) { var biz = (i % 3 === 0); var...

java,time,big-o,analysis,questionmark

Actually that algorithm would be O(log(n)). You are dividing by 10 (knocking off a 0 each time through the loop). Generally an algorithm is O(n) if it scales linearly with the size of n, but for this, if you increase the size of n by a factor of 10, you...
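As a rough illustration, assuming the loop in question repeatedly integer-divides n by 10 until it reaches zero (the function name `count_iterations` is made up for this sketch), the iteration count tracks the number of decimal digits of n, not n itself:

```python
def count_iterations(n: int) -> int:
    """Count how many times a divide-by-10 loop runs for a positive n."""
    steps = 0
    while n > 0:
        n //= 10  # drop the last decimal digit
        steps += 1
    return steps


if __name__ == "__main__":
    # Multiplying n by 10 adds only one more iteration.
    for n in (9, 99, 1_000, 1_000_000):
        print(n, count_iterations(n))
```

Making n ten times larger adds a single step, which is the logarithmic behaviour described above.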

javascript,performance,gwt,analysis,monitor

You can view/debug client-side JS code via GWT SuperDevMode.

python,text,nltk,analysis,french

Usually it's a better idea to use your own list of stopwords. For this purpose, you can get a list of French stopwords from here. The article 'les' is also on the list. Create a text file of them and use the file to remove stopwords from...
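A minimal sketch of that idea, using only the standard library. The helper names `load_stopwords` and `remove_stopwords`, the file name in the comment, and the tiny in-memory word list are all assumptions for illustration; a real list would come from your own text file with one word per line.

```python
import re


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    # With a real file (hypothetical name) you would do something like:
    #   with open("french_stopwords.txt", encoding="utf-8") as f:
    #       stopwords = load_stopwords(f)
    stopwords = load_stopwords(["les", "le", "la", "de", "et"])
    print(remove_stopwords("Les chats et les chiens", stopwords))
```

The simple regex tokenizer stands in for whatever tokenizer you already use; the filtering step is the same either way.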

r,analysis,covariance,sem,r-lavaan

See http://lavaan.ugent.be/tutorial/syntax2.html I'm just copying from there, the following code is self-explanatory # three-factor model visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ NA*x7 + x8 + x9 # orthogonal factors visual ~~ 0*speed textual ~~ 0*speed # fix variance of speed...

algorithm,logging,big-o,analysis

For each execution of the inner loop: for j = i to 1, it will run i steps, with i from 1 to n. So the total time complexity is 1 + 2 + ... + n = n*(n+1)/2 ~ O(n^2) ...
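The sum can be checked by counting steps directly. This is only a sketch that assumes the outer loop runs i from 1 to n and the inner loop runs j from i down to 1, as described above; `inner_loop_steps` is a name invented here.

```python
def inner_loop_steps(n: int) -> int:
    """Count inner-loop iterations for i = 1..n, j = i down to 1."""
    steps = 0
    for i in range(1, n + 1):
        for _j in range(i, 0, -1):  # runs exactly i times
            steps += 1
    return steps


if __name__ == "__main__":
    for n in (1, 10, 100):
        # The counted steps match the closed form n*(n+1)/2.
        print(n, inner_loop_steps(n), n * (n + 1) // 2)
```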

algorithm,big-o,analysis,recurrence-relation,big-theta

A recurrence relation is a way of recursively defining a function. For example, the recurrence relation T(n) = 4T(n / 3) + O(1) says that the function is defined so that if you want to determine T(n), you can evaluate T(n / 3), multiply it by 4, and add in...
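To make the definition concrete, here is a small sketch that evaluates the recurrence literally. Two things are assumptions not given above: the base case T(1) = 1, and treating the O(1) term as exactly 1. Integer division stands in for n / 3.

```python
def T(n: int) -> int:
    """Evaluate T(n) = 4*T(n // 3) + 1 with the assumed base case T(1) = 1."""
    if n <= 1:
        return 1  # assumed base case
    return 4 * T(n // 3) + 1  # the O(1) term is taken to be exactly 1


if __name__ == "__main__":
    # For powers of three the values follow (4**(k + 1) - 1) // 3.
    for k in range(6):
        print(3**k, T(3**k))
```

Under these assumptions, each tripling of n roughly quadruples T(n).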

Try: prop.table(my.lda$svd^2) According to the documentation, lda(...)$svd is a vector of the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics. ...

This: SELECT [Measures].[Internet Sales Amount] ON 0 ,[Date].[Calendar].[Calendar Year].MEMBERS ON 1 FROM [Adventure Works]; Returns this: If I just now add in NON EMPTY I should get rid of 2009 and 2010: SELECT [Measures].[Internet Sales Amount] ON 0 ,NON EMPTY [Date].[Calendar].[Calendar Year].MEMBERS ON 1 FROM [Adventure Works]; Now returns: The...

Try this: Add this in front of your query: WITH MEMBER [MEASURES].[X] AS [Data Zameldowania].[Data zameldowania].[Miesiac slownie].membervalue Then amend your order snippet to this: NON EMPTY Order ( [Data Zameldowania].[Data zameldowania].[Miesiac slownie] * [Data Zameldowania].[Rok].[Rok], [MEASURES].[X] ,bdesc //<< changed to break natural hier order ) ON 0 Second attempt. Please...

java,algorithm,memory,data,analysis

They are all O(N) in space usage. (And TreeMap is O(NlogN)) Of course, that doesn't tell you some important facts about how much space these data structures will use in practice. For example: An ArrayList's space usage is not directly proportional to the list size. In the best case, an...

java,time,complexity-theory,space,analysis

Your complexity depends on what you choose for n. If n is the number of files, the complexity is O(n) because each file is visited once.

python,data,matplotlib,analysis

This seems like it would be a lot simpler using a Pandas dataframe. Then, part of your problem is analogous to this question: Read multiple *.txt files into Pandas Dataframe with filename as column header import pandas as pd import matplotlib.pyplot as plt filelist = ['data1.txt', 'data2.txt'] dataframe = pd.concat([pd.read_csv(file,...

If I understand your question, then this: ErinJan[ErinJan == 0] <- HarryJan[ErinJan == 0] would make the replacement across the whole matrix. I am not sure how your columns are arranged, but if you pull out all the column 3s, you should be able to do the same replacement for...

algorithm,queue,constants,analysis

Personally, I don't like interview questions with arbitrary restrictions like this, unless it actually reflects the conditions you have to work with at the company. I don't think it actually finds qualified candidates or rather, I don't think it accurately eliminates unqualified ones. When I did technical interviews for my...

In the first example, where you have 3 non-nested for loops (or any kind of loop really), it would simply be O(x + y + z), where x, y, and z are the number of repetitions of each for loop (assuming constant time inside). However, this plus is only...
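A short sketch of the difference, counting one unit of work per loop body. The function names are invented for this example, and the nested variant is included only for contrast: sequential loops add their counts, nested loops multiply them.

```python
def sequential_steps(x: int, y: int, z: int) -> int:
    """Three loops one after another: the step counts add up."""
    steps = 0
    for _ in range(x):
        steps += 1
    for _ in range(y):
        steps += 1
    for _ in range(z):
        steps += 1
    return steps


def nested_steps(x: int, y: int, z: int) -> int:
    """Three loops inside each other: the step counts multiply."""
    steps = 0
    for _ in range(x):
        for _ in range(y):
            for _ in range(z):
                steps += 1
    return steps


if __name__ == "__main__":
    print(sequential_steps(2, 3, 4))  # x + y + z
    print(nested_steps(2, 3, 4))      # x * y * z
```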

I'm terrible with this theoretical run-time analysis stuff, but this might help to look at things another way: for i_{1,n} { if (j>i) -> B[i,j] = sum of A[i],A[i+1],...,A[j] else -> B[i,j] = 0 } Which (I think) happens to be the same as: for i_{1,n} { for j_{1,n} {...

hook,analysis,packet,pcap,ndis

By the time the packets have hit the NDIS layer, the higher-layer metadata about who sent the packets is gone. (If you try to get the current process anyway, you'll find the current process ID is often wrong. NDIS sends traffic in arbitrary process context, not the sender's original context.)...

algorithm,time-complexity,analysis,pseudocode,asymptotic-complexity

To me this maps to the function g(n)=n^2, since the loop for i=1 to n runs n times if you count all values in the range [1,n]. Of course, if the range is [1,n), then you have g(n)=(n-1)^2, but that's a matter of convention. Two nested loops, each running n times, give you O(n^2) complexity...

import itertools column = [3, 1, 7, 2, 9, 4] You can make a set of pairs like this # You can use set() instead of list() if you want to remove duplicates list(itertools.combinations(column,2)) Output [(3, 1), (3, 7), (3, 2), (3, 9), (3, 4), (1, 7), (1, 2), (1,...

survfit on a coxph model without any other specifications gives the survival curve for a case whose covariate predictors are the average of the population that the model was created with. From the help for survfit.coxph Serious thought has been given to removing the default value for newdata, which is...

algorithm,loops,runtime,analysis,big-theta

n!/(3!(n-3)!) = n(n-1)(n-2)/3! = (n^2-n)(n-2)/6 = (n^3-2n^2-n^2+2n)/6 = (n^3 - 3n^2 + 2n)/6 You can easily show (1) that for large enough values of n: n^3/12 < (n^3 - 3n^2 + 2n)/6 < 2n^3 So when it comes to asymptotic notation, it is in Theta(n^3), and NOT in o(n^3). (1) One way...
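A quick numeric sanity check of the expansion, offered as a sketch only (`triples` and `expanded` are names made up here). It confirms that the polynomial equals "n choose 3" and that its ratio to n^3 settles near the nonzero constant 1/6, which is what membership in Theta(n^3) requires.

```python
from math import comb


def triples(n: int) -> int:
    """Number of ways to choose 3 items out of n."""
    return comb(n, 3)


def expanded(n: int) -> int:
    """The expanded polynomial (n^3 - 3n^2 + 2n) / 6, always an integer."""
    return (n**3 - 3 * n**2 + 2 * n) // 6


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        # The ratio to n^3 approaches 1/6 as n grows.
        print(n, triples(n) == expanded(n), triples(n) / n**3)
```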

algorithm,language-agnostic,time-complexity,complexity-theory,analysis

Answering for the first part :- for i = 1 to 526 for j = 1 to n^2(lgn)^3 for k = 1 to n x=x+1 This program will run 526 * n^2 * (lg n)^3 * n times = 526 * n^3 * (lg n)^3 times. So, x = x...
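The count can be confirmed by brute force for small inputs. This sketch assumes lg means log base 2 and uses n that is a power of two so the middle bound n^2 (lg n)^3 is a whole number; `count_increments` is a hypothetical name.

```python
from math import log2


def count_increments(n: int, outer: int = 526) -> int:
    """Run the three nested loops and return the final value of x."""
    # Middle loop bound n^2 * (lg n)^3; exact when n is a power of two.
    middle = round(n**2 * log2(n) ** 3)
    x = 0
    for _ in range(outer):
        for _ in range(middle):
            for _ in range(n):
                x += 1
    return x


if __name__ == "__main__":
    n = 4
    # Compare the brute-force count with 526 * n^3 * (lg n)^3.
    print(count_increments(n), 526 * n**3 * round(log2(n)) ** 3)
```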

analysis,requirements,system-requirements,system-analysis

If you get a rejection that means that either the username or the password is already used by someone else. So if I chose [email protected] as my password and someone else already has it as a password it will get rejected, so then I know that this is someone's password...

Try this: print(output$loadings, cutoff = 0.3) See ?print.loadings for the details....

excel,indexing,lookup,analysis,vlookup

You can use =SUMPRODUCT() and =INDEX() to perform the double-match. In G2 enter: =INDEX(C$1:C$100,SUMPRODUCT((A$1:A$100=F2)*((B$1:B$100=E2)*ROW($1:$100)))) Adjust the ranges to match the size of your table....

You could do this quite easily using data.table though it will get more complicated if the number of non-missing values isn't equal between the columns library(data.table) setDT(df)[, lapply(.SD, na.omit), by = Id] # Id date1 date2 # 1: 1 2008-10-02 2008-10-02 # 2: 2 2008-10-02 2008-10-02 ...

algorithm,analysis,insertion-sort

It's a boring night and I wanted to play with LaTeX, so to expand on the comment I left into an actual answer... To summarize your question. Let's start with a basic example. Which is equivalent to This is where the form comes from. Back to a parameterized sum, we...

json,search,elasticsearch,analysis

The explain output depends on the query that is being executed. If you look carefully at its structure, you will notice that it matches the elements of the query, and the inner elements of the explanation correspond to the inner structures of the query. So, the elements relevant to you can...

objective-c,osx,charts,analysis,stock

Seems like the "go to" solution (in other words, the one everyone runs to) is CorePlot, which you can find on GitHub at: https://github.com/core-plot And there's a RayWenderlich tutorial on stock charts with CorePlot, however his (or their) tutorials are more geared for iOS so you may have to do...

Can have both. If at least one method is abstract, a class is abstract.

sql-server,datetime,mdx,analysis

Please have a look at these mdx functions StrToMember https://msdn.microsoft.com/en-us/library/ms146022.aspx StrToSet https://msdn.microsoft.com/en-us/library/ms144782.aspx These and several other StrTo.. functions are used pretty extensively for passing in parameters. In your example you need to wrap the whole string in something like this: strToMember( "[TP DIM CALENDAR].[Date].&[" + Format(CDate(Parameters!FromParameter.Value),"yyyy-MM-dd") + "T00:00:00]" ) ...

Maintainability is a function of a lot of different parameters. A maintainability around 70 is usually perfectly acceptable. ~70+ is good, 30-70 is in a warning zone, and under 30 is usually a problem. If you want to improve your score, try to move some of those CSS properties into...

Try this: df['Q_Seen'] = df.stack().values >>> df Q1_Seen Q2_Seen Q3_Seen Q4_Seen Q_Seen Q1a nan nan nan Q1a nan Q2a nan nan Q2a nan nan Q3d nan Q3d nan Q2c nan nan Q2c ...

There are a lot of unknowns here - his walking speed, his painting speed, for how long does the paint in the brush last... But clearly there are two processes going on here. One is quadratic - it's the walking to and fro between the paint can and the painting...

performance,algorithm,big-o,analysis,growth-rate

My best guesses: O( N * log(sqrt(N)) ) Doesn't have closed form expression because of: [indefinite integral of x^x] cannot be expressed in terms of a finite number of elementary functions... [1] O( N^3 ) ...

Assuming "display(i,j)" is done in constant time (or a single operation) and that we don't count any cost for incrementing variables, then the total cost is: N*((N^3 - 4) + (N^2 + 1)) = N^4 + N^3 - 3N You are correct that it is O(N^4). This is because (for large...
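The algebra is easy to double-check numerically. This is just a sketch of the arithmetic above, with `cost` and `expanded_cost` as invented names; it also shows the ratio to N^4 approaching 1, which is why the lower-order terms do not matter for the O(N^4) bound.

```python
def cost(N: int) -> int:
    """Total cost as written: N * ((N^3 - 4) + (N^2 + 1))."""
    return N * ((N**3 - 4) + (N**2 + 1))


def expanded_cost(N: int) -> int:
    """The same expression multiplied out: N^4 + N^3 - 3N."""
    return N**4 + N**3 - 3 * N


if __name__ == "__main__":
    for N in (10, 100, 10_000):
        # Both forms agree, and cost / N^4 tends to 1 as N grows.
        print(N, cost(N) == expanded_cost(N), cost(N) / N**4)
```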

django,data,scikit-learn,analysis

Django is a Python framework, meaning that you need to have Python installed to use it. Once you have Python, you can use whatever Python package you want (as long as it is compatible with the version of Python you are using).

r,audio,runtime-error,analysis,tuner

Wave files are limited to 4 GB of audio data because all of the size fields in a wave header are 32 bits wide. See http://en.wikipedia.org/wiki/WAV#Limitations It's possible that WavePad uses the W64 format mentioned in the Wikipedia article but that readWave does not....
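To get a feel for what the 32-bit limit means in practice, here is a back-of-the-envelope sketch. It treats 2^32 - 1 bytes as an upper bound on the audio data (ignoring the few header bytes) and assumes uncompressed PCM; the function name and the CD-quality parameters are chosen only for illustration.

```python
MAX_DATA_BYTES = 2**32 - 1  # largest value a 32-bit size field can hold


def max_duration_seconds(sample_rate: int, bits_per_sample: int, channels: int) -> float:
    """Upper bound on playable seconds of uncompressed PCM in one WAV file."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return MAX_DATA_BYTES / bytes_per_second


if __name__ == "__main__":
    # CD-quality audio: 44.1 kHz, 16-bit, stereo.
    seconds = max_duration_seconds(44_100, 16, 2)
    print(f"{seconds / 3600:.2f} hours")
```

A recording longer than this bound at the same settings cannot be described by a standard WAV header, which fits the explanation above.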

java,analysis,jprofiler,heap-dump

In interactive mode, you can right-click an existing bookmark and change its name. If you use the API or a trigger action, you can specify a name when the bookmark is created.

c#,optimization,analysis,ca2000

This is nearly a duplicate of How to fix a CA2000 IDisposable C# compiler warning, when using a global cache. Maybe it should be considered a duplicate of that one. I'm not sure. Code Analysis is legitimately complaining that it is theoretically possible for the method to complete without the...

"Can static analysis guess that GetValue(x) will have a constant value?" That depends entirely on the capabilities and quality of your static code analysis tool. In theory, that's possible to detect, yes. If you meant what the compiler can deduce about constant expressions, it also depends on the...

c++,algorithm,complexity-theory,analysis

The runtime of main() is composed of the runtime of some constant-time statements and the runtime of the i-loop: T_main(n, l) ∈ O(1) + T_fori(n, l) The i-loop runs exactly (n - 1) times and is composed of some constant-time statements and the runtime of the j-loop: T_fori(n, l) ∈...