Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

(1) * Hock Hung Chieng (Universiti Tun Hussein Onn Malaysia, Malaysia)
(2) Noorhaniza Wahid (Universiti Tun Hussein Onn Malaysia, Malaysia)
(3) Ong Pauline (Universiti Tun Hussein Onn Malaysia, Malaysia)
(4) Sai Raj Kishore Perla (Institute of Engineering and Management, India)
*corresponding author

Abstract


Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. The Rectified Linear Unit (ReLU) has been widely used and has become the default activation function across the deep learning community since 2012. Despite its popularity, the hard-zero property of ReLU blocks negative values from propagating through the network, so deep neural networks cannot benefit from negative representations. In this work, an activation function called Flatten-T Swish (FTS), which leverages negative values, is proposed. To verify its performance, this study evaluates FTS against ReLU and several recent activation functions. Each activation function is trained on the MNIST dataset using five different deep fully connected neural networks (DFNNs) with depths varying from five to eight layers. For a fair evaluation, all DFNNs use the same configuration settings. Based on the experimental results, FTS with a threshold value T = -0.20 has the best overall performance. Compared with ReLU, FTS (T = -0.20) improves MNIST classification accuracy by 0.13%, 0.70%, 0.67%, 1.07% and 1.15% on the wider 5-layer, slimmer 5-layer, 6-layer, 7-layer and 8-layer DFNNs, respectively. The study also notes that FTS converges twice as fast as ReLU. Although other existing activation functions are also evaluated, this study uses ReLU as the baseline activation function.
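
A minimal NumPy sketch of the kind of thresholded, Swish-like activation described above, assuming the form FTS(x) = x * sigmoid(x) + T for x >= 0 and FTS(x) = T otherwise, with the threshold T = -0.20 reported in the abstract (the exact formulation and experimental setup are given in the full paper):

import numpy as np

def flatten_t_swish(x, T=-0.20):
    # Assumed form: x * sigmoid(x) + T for non-negative inputs, T otherwise,
    # so a small negative value (rather than a hard zero) can propagate.
    swish = x / (1.0 + np.exp(-x))        # x * sigmoid(x)
    return np.where(x >= 0, swish + T, T)

def relu(x):
    # Baseline ReLU: negative inputs are clipped to a hard zero.
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(flatten_t_swish(z))  # negative inputs map to T = -0.20 rather than 0
print(relu(z))             # negative inputs map to 0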

Keywords


Deep learning; Activation function; Flatten-T Swish; Fully connected neural networks

   

DOI

https://doi.org/10.26555/ijain.v4i2.249
      

