Regarding the behaviour shield being dependent on the cloud, that isn’t entirely true:

The cloud is just another layer: the shield queries it for information on whether the file is already known to be clean, bad, etc.

The difference in results for these samples comes from the time that passed between the two tests: the samples got classified in the meantime.

Also, the behaviour shield is the last line of defence. Most of the time, other mechanisms should catch the malware first, which is why testing the behaviour shield on its own isn’t viable.