Seeding java.util.Random with consecutive numbers

Tags: java random
By : Parker
Source: Stackoverflow.com
Question!

I've simplified a bug I'm experiencing down to the following lines of code:

    int[] vals = new int[8];
    for (int i = 0; i < 1500; i++)
        vals[new Random(i).nextInt(8)]++;
    System.out.println(Arrays.toString(vals));

The output is: [0, 0, 0, 0, 0, 1310, 190, 0]

Is this just an artifact of choosing consecutive numbers to seed Random and then using nextInt with a power of 2? If so, are there other pitfalls like this I should be aware of, and if not, what am I doing wrong? (I'm not looking for a solution to the above problem, just some understanding about what else could go wrong)


Dan, well-written analysis. As the javadoc is pretty explicit about how numbers are calculated, it's not a mystery as to why this happened as much as if there are other anomalies like this to watch out for-- I didn't see any documentation about consecutive seeds, and I'm hoping someone with some experience with java.util.Random can point out other common pitfalls.

As for the code, the need is for several parallel agents to have repeatably random behavior who happen to choose from an enum 8 elements long as their first step. Once I discovered this behavior, the seeds all come from a master Random object created from a known seed. In the former (sequentially-seeded) version of the program, all behavior quickly diverged after that first call to nextInt, so it took quite a while for me to narrow the program's behavior down to the RNG library, and I'd like to avoid that situation in the future.

By : Parker


Answers

As much as possible, the seed for an RNG should itself be random. The seeds that you are using are only going to differ in one or two bits.

There's very rarely a good reason to create two separate RNGs in the one program. Your code is not one of those situations where it makes sense.

Just create one RNG and reuse it, then you won't have this problem.

In response to comment from mmyers:

Do you happen to know java.util.Random well enough to explain why it picks 5 and 6 in this case?

The answer is in the source code for java.util.Random, which is a linear congruential RNG. When you specify a seed in the constructor, it is manipulated as follows.

seed = (seed ^ 0x5DEECE66DL) & mask;

Where the mask simply retains the lower 48 bits and discards the others.

When generating the actual random bits, this seed is manipulated as follows:

randomBits = (seed * 0x5DEECE66DL + 0xBL) & mask;

Now if you consider that the seeds used by Parker were sequential (0 -1499), and they were used once then discarded, the first four seeds generated the following four sets of random bits:

101110110010000010110100011000000000101001110100
101110110001101011010101011100110010010000000111
101110110010110001110010001110011101011101001110
101110110010011010010011010011001111000011100001

Note that the top 10 bits are indentical in each case. This is a problem because he only wants to generate values in the range 0-7 (which only requires a few bits) and the RNG implementation does this by shifting the higher bits to the right and discarding the low bits. It does this because in the general case the high bits are more random than the low bits. In this case they are not because the seed data was poor.

Finally, to see how these bits convert into the decimal values that we get, you need to know that java.util.Random makes a special case when n is a power of 2. It requests 31 random bits (the top 31 bits from the above 48), multiplies that value by n and then shifts it 31 bits to the right.

Multiplying by 8 (the value of n in this example) is the same as shifting left 3 places. So the net effect of this procedure is to shift the 31 bits 28 places to the right. In each of the 4 examples above, this leaves the bit pattern 101 (or 5 in decimal).

If we didn't discard the RNGs after just one value, we would see the sequences diverge. While the four sequences above all start with 5, the second values of each are 6, 0, 2 and 4 respectively. The small differences in the initial seeds start to have an influence.

In response to the updated question: java.util.Random is thread-safe, you can share one instance across multiple threads, so there is still no need to have multiple instances. If you really have to have multiple RNG instances, make sure that they are seeded completely independently of each other, otherwise you can't trust the outputs to be independent.

As to why you get these kind of effects, java.util.Random is not the best RNG. It's simple, pretty fast and, if you don't look too closely, reasonably random. However, if you run some serious tests on its output, you'll see that it's flawed. You can see that visually here.

If you need a more random RNG, you can use java.security.SecureRandom. It's a fair bit slower, but it works properly. One thing that might be a problem for you though is that it is not repeatable. Two SecureRandom instances with the same seed won't give the same output. This is by design.

So what other options are there? This is where I plug my own library. It includes 3 repeatable pseudo-RNGs that are faster than SecureRandom and more random than java.util.Random. I didn't invent them, I just ported them from the original C versions. They are all thread-safe.

I implemented these RNGs because I needed something better for my evolutionary computation code. In line with my original brief answer, this code is multi-threaded but it only uses a single RNG instance, shared between all threads.

By : Dan Dyer


I would like to add here, because the Exception handling in almost all java / C# code that I have seen is just incorrect. I.e. you end up with very difficult to debug error for ignored Exceptions, or, equally bad, you get an obscure exception which tells you nothing, because blindly following the "catching(Exception) is bad" and things are just worse.

First, understand that an exception is a way to facilitate the returning of error information across code layers. Now, mistake 1: a layer is not just a stack frame, a layer is code which has a well defined responsibility. If you just coded interfaces and impls just because, well you have better things to fix.

If the layers are well designed and have specific responsibilities, then the information of the error has different meaning as it bubbles up. <-this is the key on what to do, there is no universal rule.

So, this means that when an Exception occurs you have 2 options, but you need to understand where in the layer you are:

A) If you are in the middle of a layer, and you are just an internal, normally private, helper function and something goes bad: DONT WORRY, let the caller receive the exception. Its perfectly OK because you have no business context and 1) You are not ignoring the error and 2) The caller is part of your layer and should have known this can happen, but you might not now the context to handle it down below.

or ...

B) You are the top boundary of the layer, the facade to the internals. Then if you get an exception the default shall be to CATCH ALL and stop any specific exceptions from crossing to the upper layer which will make no sense to the caller, or even worse, you might change and the caller will have a dependency to an implementation detail and both will break.

The strength of an application is the decoupling level between the layers. Here you will stop everything as a general rule and rethrow the error with a generic exception translating the information to a more meaningful error for the upper layer.

RULE: All entry points to a layer shall be protected with CATCH ALL and all errors translated or handled. Now this 'handeled' happens only 1% of the time, mostly you just need (or can) return the error in a correct abstraction.

No I am sure this is very difficult to understand. real Example ->

I have a package that runs some simulations. These simulations are in text scripts. there is a package that compiles these scripts and there is a generic utils package that just reads text files and, of course, the base java RTL. The UML dependency is->

Simulator->Compiler->utilsTextLoader->Java File

1) If something breaks in the utils loader inside one private and I get a FileNotFound, Permissions or what ever, well just let it pass. There is nothing else you can do.

2) At the boundary, in the utilsTextLoader function initially called you will follow the above rule and CATCH_ALL. The compiler does not care on what happen, it just needs to now whether the file was loaded or not. So in the catch, re throw a new exception and translate the FileNotFound or whatever to "Could not read file XXXX".

3) The compiler will now know that the source was not loaded. Thats ALL it needs to know. So if I later I change utilsTestLoader to load from network the compiler will not change. If you let go FileNotFound and later change you will impact compiler for nothing.

4) The cycle repeats: The actual function that called the lower layer for the file will do nothing upon getting the exception. So it lets it go up.

5) As the exception gets to the layer between simulator and compiler the compiler again CATCHES_ALL, hiding any detail and just throws a more specific error: "Could not compile script XXX"

6) Finally repeat the cycle one more time, the simulator function that called the compiler just lets go.

7) The finally boundary is to the user. The user is a LAYER and all applies. The main has a try that catches_ALL and finally just makes a nice dialog box or page and "throws" a translated error to the user.

So the user sees.


Simulator: Fatal error could not start simulator

-Compiler: Could not compile script FOO1

--TextLoader: Could not read file foo1.scp

---trl: FileNotFound


Compare to:

a) Compiler: NullPointer Exception <-common case and a lost night debugging a file name typo

b) Loader: File not found <- Did I mention that loader loads hundreds of scripts ??

or

c) Nothing happens because all was ignored!!!

Of course this assumes that on every rethrow you didn't forget to set the cause exception.

Well my 2cts. This simple rules have saved my life many times...

-Ale

By : Alex Vaz


Sometimes that is to only way to handle "catch every exception" scenarios, without actually catching every exception.

I think sometimes, say, lowlevel framework / runtime code needs to make sure that no exception is ever escaping. Unfortunately, there is also no way the framework code can know which exceptions are raised by the code executed by the thread.

In that case a possible catch block could look like this:

try
{
   // User code called here
}
catch (Exception ex)
{
   if (ExceptionIsFatal(ex))
     throw;

   Log(ex);
}

There are three important points here, however:

  1. This isn't something for every situation. In code reviews we are very picky about places where this is actually neccessary and thus allowed.
  2. The ExceptionIsFatal() method assures that we don't eat exceptions which should never be swallowed (ExecutionEngineException, OutOfMemoryException, ThreadAbortException, etc.)
  3. What is swallowed is carefully logged (event-log, log4net, YMMV)

Typically, I'm all for the practice of letting uncaught exceptions simply "crash" the application by terminating the CLR. However, especially in server applications, this is sometimes not sensible. If one thread encounters a problem which is deemed non-fatal, there is no reason in ripping the whole process down, killing off all other running requests (WCF, for example, handles some cases this way).



This video can help you solving your question :)
By: admin