use ffz primitive which maps to ARCv2 instruction, vs. non atomic
__test_and_set_bit
It is unlikely if we will even have more than 32 counters, but still add
a BUILD_BUG to catch that
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>