I fully agree with what Chris wrote. Adding to that:
Quote:
...assembly-level review of the entire application
You don't compare the binaries themselves, of course, but their output. The outputs must be identical unless real (physical) randomness is used. In that case you must be able to supply the entropy source as an input, which makes the output deterministic. If this is not possible, you should not fully trust the precompiled binaries, and should use your own compiled version instead. (For the truly paranoid, the trust issue now moves to the compiler: does it secretly add backdoor functions?) But it is an awful lot of work, and should be done by an independent security researcher. If you trust him, your confidence in the SW increases. It takes just one person finding an anomaly once for the trust to be lost, so supplying the wrong source code is not the best way to cheat.
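To illustrate what "supplying the entropy source as an input" could look like, here is a minimal Python sketch. Everything in it is hypothetical: `make_nonce` stands in for whatever part of the client consumes randomness, and `recorded_entropy` replays a fixed byte string in place of the physical source. In a real comparison the two runs would be the precompiled binary and your own build, both fed the same recording.

```python
import hashlib
import os
from typing import Callable, List

# An entropy source is anything that returns n bytes on request.
EntropySource = Callable[[int], bytes]

def make_nonce(entropy: EntropySource, counter: int) -> bytes:
    """Hypothetical stand-in for the code under test that uses randomness."""
    seed = entropy(16)
    return hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()[:16]

def recorded_entropy(data: bytes) -> EntropySource:
    """Replay a fixed recording instead of reading physical randomness."""
    buf = bytearray(data)

    def read(n: int) -> bytes:
        if len(buf) < n:
            raise ValueError("recorded entropy exhausted")
        chunk = bytes(buf[:n])
        del buf[:n]
        return chunk

    return read

def run(entropy: EntropySource, rounds: int = 4) -> List[bytes]:
    """Produce the observable output for a given entropy source."""
    return [make_nonce(entropy, i) for i in range(rounds)]

# Record the physical randomness once, then feed the same recording
# to both runs. With identical input, the outputs must be identical.
recording = os.urandom(64)
out_a = run(recorded_entropy(recording))
out_b = run(recorded_entropy(recording))
outputs_match = out_a == out_b
```

If the two outputs ever differ under the same recording, something other than the declared entropy source is influencing the result.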
As I already hinted, cheating is still possible with fully open (innocent-looking) source code, too. An example is the handling of duplicate message blocks. You should abort the protocol, but how many of us would not accept the "optimization" that, among duplicates with the same sequence number, the last block is the one used (opening a back door for a man-in-the-middle attack)? Another example is a homebrewed hash function. Say the hashed disk serial number and the tick count are used to seed a random number generator. You could argue that these are not known by the server, so the nonces are secure. But how would you test a sufficiently complex piece of hash function code? I can write a function whose output depends only on the last 16 bits of the tick count and the last 4 bits of the disk serial, but which internally manipulates all the other parts of the input as if they had an effect on the result. (A few hundred runs of black-box testing would likely produce different hash values each time, so the tester might very easily overlook the problem.) It is very hard to prove this kind of entropy loss (especially if the uniform distribution is only distorted), and it is even harder to distinguish bugs from malicious code. But, in any event, with this code there are only about a million (2^20) different pseudorandom sequences the client SW can generate, and the first few entries let the server identify the effective seed.
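A toy version of such a hash function, as a Python sketch under stated assumptions: `leaky_seed`, `nonce` and `recover_effective_seed` are made-up names, SHA-256 merely stands in for the homebrewed mixing, and the cancellation here is deliberately obvious, whereas a real attacker would bury it in far more convoluted code.

```python
import hashlib
from typing import Optional

SERIAL_MASK = 0xF      # only the last 4 bits of the disk serial matter
TICK_MASK = 0xFFFF     # only the last 16 bits of the tick count matter

def leaky_seed(disk_serial: int, tick_count: int) -> bytes:
    """Looks as if it mixes all input bits; only 20 of them reach the output."""
    # Decoy work: hash the full inputs, then cancel the result out.
    digest = hashlib.sha256(f"{disk_serial}:{tick_count}".encode()).digest()
    decoy = int.from_bytes(digest, "big")
    cancelled = decoy ^ decoy  # always 0
    effective = ((disk_serial & SERIAL_MASK) << 16) | (tick_count & TICK_MASK)
    return hashlib.sha256((effective ^ cancelled).to_bytes(4, "big")).digest()

def nonce(seed: bytes, index: int) -> bytes:
    """Hypothetical nonce generator driven by the seed."""
    return hashlib.sha256(seed + index.to_bytes(4, "big")).digest()[:16]

def recover_effective_seed(first_nonce: bytes) -> Optional[int]:
    """Server side: try all 2**20 effective seeds against one observed nonce."""
    for effective in range(1 << 20):
        seed = hashlib.sha256(effective.to_bytes(4, "big")).digest()
        if nonce(seed, 0) == first_nonce:
            return effective
    return None

# Two machines that differ in every "high" bit still get the same seed.
seed_a = leaky_seed(0x12345673, 0x9ABC1234)
seed_b = leaky_seed(0xFFFFFFF3, 0x00001234)
same_seed = seed_a == seed_b

# The server sees a single nonce and recovers the 20 effective bits.
recovered = recover_effective_seed(nonce(seed_a, 0))
```

Black-box testing with a few hundred random inputs would almost never hit two inputs sharing the same 20 bits, so the outputs would all look different, while the server's search over the whole seed space takes only seconds.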
Quote:
You can make a detailed review of their code and you can audit assembly of the pre-built binary. But in practice you will not do it, because it's extremely tedious and takes enormous amount of time
It could and should be done by just one or a few trustworthy testers. You should reject SW that you (or a trustworthy tester) don't fully understand. In security, the SW must be very simple, modular and well-commented. OpenSSL, for example, does not fit into this category. Just look at the primality testing code. It takes weeks to fully understand that small piece alone. Thankfully, there are people I trust who did go through it line by line, but otherwise I would not recommend using it at all.