c++,c,precision,floating-point-precision,double-precision

From the standard: There are three ï¬‚oating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of...

fortran,precision,hdf5,double-precision

The type double precision and the corresponding kind kind(1.d0) are perfectly well defined by the standard. But they are also not exactly fixed. Indeed there were many computers in history and use different kind of native formats for their floating point numbers and the standard must allow this! So, a...

c,precision,double-precision,fmod

I'd solve this by simply not using floating point variables in that way: for (int i = 0; i <= 500; i += 5) { double d = i / 100.0; // in case you need to use it. if ((i % 25) == 0) print 'X'; } They're usually...

plot,fortran,gfortran,double-precision

I found that the solution was in the .MOD file with the machine code. Naturally, this object file has to correspond with the double precision libraries, and the default after installing DISLIN is for the .MOD file to refer to the simple precision libraries. There exists in a folder named...

fortran,size,binaryfiles,double-precision

An unformatted sequential file - which is what you are using, is a record oriented file format. For every record some book-keeping fields are typically written to the file to enable the processor to navigate from record to record. Details may vary from compiler to compiler, but there is a...

java,double,precision,double-precision

I'm using a decimal floating point arithmetic with a precision of three decimal digits and (roughly) with the same features as the typical binary floating point arithmetic. Say you have 123.0 and 4.56. These numbers are represented by a mantissa (0<=m<1) and an exponent: 0.123*10^3 and 0.456*10^1, which I'll write...

java,for-loop,double,treemap,double-precision

You just ran into one of the main problems in mathematical coding! Congratulations! This is a really annoying one in general, but in your case it's quite solvable. Check out the answer to this question and apply it to your code and it should be fixed. Round a double to...

c++,c++11,random,double,double-precision

The maximum value dist(engine) in your code can return is std::nextafter(1, 0). Assuming IEEE-754 binary64 format for double, this number is 0.99999999999999988897769753748434595763683319091796875 If your compiler rounds floating point literals to the nearest representable value, then this is also the value you actually get when you write 0.99999999999999994 in code (the...

c,floating-point,double,double-precision

Double are represented as m*2^e where m is the mantissa and e is the exponent. Doubles have 11 bits for the exponent. Since the exponent can be negative there is an offset of 1023. That means that the real calculation is m*2^(e-1023). The largest 11 bit number is 2047. The...

There is a syntax error in the inline assembly instruction for the load of the double argument to 32 bit registers. This: asm volatile("mov.b32 {%0,%1}, %2;":"=r"(lo),"=r"(hi):"d"(x)); should be: asm volatile("mov.b64 {%0,%1}, %2;":"=r"(lo),"=r"(hi):"d"(x)); Using a "d" (ie 64 bit floating point register) as the source in a 32 bit load is...

c#,excel,double-precision,integer-division

I think something's funny in your Excel spreadsheet. When I tried this, I get the same answer in both C# and Excel. I believe the correct answer is 9.577520323. Here's a screenshot of my Excel spreadsheet: ...

python,fortran,precision,double-precision,f2py

You should set output in the Fortran code using double precision, output = 1.3d0 In Fortran 1.3 is a single precision constant....

c++,double,floating-accuracy,double-precision,atof

The problem is that it is in general impossible to represent fractional decimal values exactly using binary floating point numbers. For example, 0.1 is represented as 1.000000000000000055511151231257827021181583404541015625E-1 when using double (you can use this online analyzer to determine the values). When computing with these rounded values the number of necessary...

c#,unit-testing,double,double-precision,epsilon

Per the documentation of Double.Epsilon: The value of the Epsilon property reflects the smallest positive Double value that is significant in numeric operations or comparisons when the value of the Double instance is zero. (Emphasis mine.) Adding it to 90.0 does not produce "the next smallest value after 90.0", this...

c,floating-point,double-precision

On the 32-bit platform, ABI constraints make it simpler to use historical floating-point registers; as a consequence, the compiler defines FLT_EVAL_METHOD as 2. This is how you get: Float 63 Double 63 In short, when FLT_EVAL_METHOD is defined to 2 by the compiler, as is the case on your 32-bit...

c#,.net,datetime,double-precision

http://msdn.microsoft.com/en-us/library/system.datetime.addseconds.aspx DateTime.AddSeconds() per the documentation rounds to the nearest millisecond (10,000 ticks). Using ticks: // We have a DateTime in memory DateTime original = new DateTime(635338107839470268); // We convert it to a Unix Timestamp double unixTimestamp = (original - new DateTime(1970, 1, 1)).TotalSeconds; // unixTimestamp is saved somewhere // User...

floating-point,sum,double-precision

Classical rounding error problem. Floating-point addition doesn't obey the usual mathematical rules. In particular, (a+b)+c = a+(b+c) does not hold. If 10^-21 is too much of an error for you, look up Kahan summation. In short, that keeps track of the accumulated rounding errors in summation....

c,floating-point-precision,double-precision

A float is never more accurate than a double since the former must be a subset of the latter, by the C standard: 6.2.5/6: "The set of values of the type float is a subset of the set of values of the type double; the set of values of the...

c++,unix,precision,double-precision,scientific-notation

The number you are getting is related to the machine epsilon for the double data type. A double is 64 bits long, with 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa fraction. A double's value is given by 1.mmmmm... * (2^exp) With...

The BigInteger structure has a Pow method. This structure resides in the System.Numerics namespace, and was introduced in .NET Framework 4.0. You need to add a reference to the System.Numerics assembly before using it. using System; using System.Numerics; public static class Program { public static void Main(string[] args) { Console.WriteLine(BigInteger.Pow(17,...

c++,solaris,precision,double-precision,scientific-notation

You could declare norm as a long double for some more precision. long double wiki Although there are some compiler specific issues to be aware of. Some compilers make long double synonymous with double. Another way to go about solving this precision problem is to work with numbers in the...

opengl,glsl,shader,trigonometry,double-precision

My current accurate shader implementation of 'acos()' is a mix out of the usual Taylor series and the answer from Bence. With 40 iterations I get an accuracy of 4.44089e-16 to the 'acos()' implementation from math.h. Maybe it is not the best, but it works for me: And here it...