lines  words   bytes  file          In  Out
  896  33152  365568  nic23a1.dat   29    8
  896  33152  365568  nic23a3.dat   29    8
  896  24192  267008  nic23b1.dat   19    8
  896  24192  267008  nic23b3.dat   19    8
  896  33152  365568  nic23c1.dat   29    8
  896  33152  365568  nic23c3.dat   29    8
  840  31080  342720  nic8a1.dat    29    8
  840  31080  342720  nic8a3.dat    29    8
  840  22680  250320  nic8b1.dat    19    8
  840  22680  250320  nic8b3.dat    19    8
  840  31080  342720  nic8c1.dat    29    8
  840  31080  342720  nic8c3.dat    29    8

The three counts are consistent with wc output (lines, words, bytes): in every file the word count equals lines x (In + Out), e.g. 896 x 37 = 33152.

nic23a1 and nic23a3 have the same features but different targets, because the targets come from different experts.

There are 3 data sets, labelled a, b, and c. Two experts provided the teaching outputs, labelled 1 and 3 respectively.

Data set a has 29 feature dimensions, but 3 or 4 of them are all 0's. Data set b has 19 features and data set c has 29 features.

It seems that normalizing each feature individually should give better results.
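
The per-feature normalization suggested above could be sketched as follows. This is only a minimal illustration, not the procedure actually used with these files. It assumes each line of a .dat file holds one pattern as whitespace-separated numbers, with the In input values first and the Out target values last; the file layout, the loader load_rows, and the function normalize_features are all hypothetical. All-zero (constant) features, as found in data set a, are mapped to 0.0 to avoid dividing by zero.

```python
import math


def load_rows(path):
    """Hypothetical loader: assumes one pattern per line, whitespace-separated numbers."""
    with open(path) as f:
        return [[float(tok) for tok in line.split()] for line in f if line.strip()]


def normalize_features(rows, n_in):
    """Z-score each of the first n_in columns; target columns are left unchanged."""
    n = len(rows)
    means, stds = [], []
    for j in range(n_in):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # population variance
        means.append(mean)
        stds.append(math.sqrt(var))

    out = []
    for r in rows:
        # A constant feature (std == 0, e.g. an all-zero column) becomes 0.0.
        feats = [
            (r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
            for j in range(n_in)
        ]
        out.append(feats + list(r[n_in:]))
    return out
```

Under these assumptions, a call such as normalize_features(load_rows("nic23a1.dat"), n_in=29) would rescale the 29 inputs and leave the 8 outputs untouched. If the data is split for training and testing, the means and standard deviations would normally be computed on the training part only.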