1 Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, https://www.deeplearningbook.org/
2 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor, An Introduction to Statistical Learning with Applications in Python, Springer, 2023, https://link.springer.com/book/10.1007/978-3-031-38747-0
3 Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, https://www.deeplearningbook.org/
4 Vincent Vandenbussche, The Regularization Cookbook, Packt Publishing, 2023.
5 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor, An Introduction to Statistical Learning with Applications in Python, Springer, 2023, https://link.springer.com/book/10.1007/978-3-031-38747-0
6 Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2016. Ludwig Fahrmeir, Thomas Kneib, Stefan Lang, and Brian D. Marx, Regression: Models, Methods and Applications, 2nd edition, Springer, 2021.
7 Trong-Hieu Nguyen-Mau, Tuan-Luc Huynh, Thanh-Danh Le, Hai-Dang Nguyen, and Minh-Triet Tran, "Advanced Augmentation and Ensemble Approaches for Classifying Long-Tailed Multi-Label Chest X-Rays," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2729-2738, https://openaccess.thecvf.com/content/ICCV2023W/CVAMD/html/Nguyen-Mau_Advanced_Augmentation_and_Ensemble_Approaches_for_Classifying_Long-Tailed_Multi-Label_Chest_ICCVW_2023_paper.html . Changhyun Kim, Giyeol Kim, Sooyoung Yang, Hyunsu Kim, Sangyool Lee, and Hansu Cho, "Chest X-Ray Feature Pyramid Sum Model with Diseased Area Data Augmentation Method," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2757-2766, https://openaccess.thecvf.com/content/ICCV2023W/CVAMD/html/Kim_Chest_X-Ray_Feature_Pyramid_Sum_Model_with_Diseased_Area_Data_ICCVW_2023_paper.html
8 Grégoire Montavon, Geneviève B. Orr, and Klaus-Robert Müller, Neural Networks: Tricks of the Trade, 2nd edition, Springer, 2012.
9 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, Vol. 15, No. 56, 2014, pp. 1929−1958, https://jmlr.org/papers/v15/srivastava14a.html
10 Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2016.
11 Rahul Parhi and Robert D. Nowak, "Deep Learning Meets Sparse Regularization: A Signal Processing Perspective," IEEE Signal Processing Magazine, Vol. 40, No. 6, 2023, pp. 63-74, https://arxiv.org/abs/2301.09554
12 Stephen Hanson and Lorien Pratt, "Comparing Biases for Minimal Network Construction with Back-Propagation," Advances in Neural Information Processing Systems 1, 1988, pp. 177-185, https://proceedings.neurips.cc/paper/1988/file/1c9ac0159c94d8d0cbedc973445af2da-Paper.pdf
13 David P. Helmbold, Philip M. Long, "Surprising properties of dropout in deep networks," Journal of Machine Learning Research, Vol. 18, No. 200, 2018, pp. 1−28, https://jmlr.org/papers/v18/16-549.html
14 Guodong Zhang, Chaoqi Wang, Bowen Xu, and Roger Grosse, "Three Mechanisms of Weight Decay Regularization," International Conference on Learning Representations (ILCR) 2019, https://arxiv.org/abs/1810.12281
15 David P. Helmbold and Philip M. Long, "Fundamental Differences between Dropout and Weight Decay in Deep Networks," 2017, https://arxiv.org/abs/1602.04484v3
16 Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, https://www.deeplearningbook.org/