Presentation of the Master's Thesis of Mr. Iordanis Xanthopoulos, postgraduate student of the Department of Computer Science, titled:
“A Qualitative, Quantitative and User-based Methodology of Automated Machine Learning Systems Evaluation”
03 September 2020, 18:00-20:00
Description: Automated Machine Learning (AutoML) is a rapidly rising sub-field of Machine Learning. AutoML aims to fully automate the machine learning process end-to-end, democratizing Machine Learning for non-experts and drastically increasing the productivity of expert analysts. So far, most comparisons of AutoML systems focus solely on quantitative criteria such as predictive performance and execution time. In this thesis, we present a multi-level methodology to adequately evaluate these complex systems. We start by examining AutoML services for predictive modeling tasks from a user's perspective, going beyond predictive performance. We present a wide palette of criteria and dimensions on which to evaluate and compare these services as a user. The comparison indicates the strengths and weaknesses of each service, the needs that it covers, the segment of users it is most appropriate for, and the possibilities for improvement. To further expand on the evaluation of the user experience, we design and conduct a custom user study focusing on the usability of AutoML systems. In this study, the users are asked to perform an ML analysis using three state-of-the-art systems and grade their ease of use. Their responses provide useful feedback to the AutoML systems' development teams regarding UX bottlenecks and flawed design decisions. Next, we present our own quantitative evaluation methodology, setting rules and preferences for selecting candidate AutoML systems, benchmark datasets and performance metrics. For our quantitative criteria, we emphasize the accuracy of each system's estimate of its predictive performance, as well as its hold-out performance. Additionally, we perform an analysis based on the metafeatures of our benchmark and evaluate how different data characteristics correlate with the outcome.
The results show that most systems overestimate the performance of their output, while there are no major differences between them when they are ranked by hold-out performance. In both cases, the results show a trend with respect to the selected metafeatures.
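
As a rough illustration of what "overestimating performance" means here, the following minimal sketch compares the score a system reports for its own model with the score measured on held-out data. The system names, dataset names, and scores are hypothetical and made up for the example; they are not results from the thesis, and the thesis may define and aggregate this gap differently.

```python
from statistics import mean

# Illustrative, made-up scores (e.g. AUC); NOT results from the thesis.
# Each entry: (system, dataset, score estimated by the system, hold-out score)
runs = [
    ("system_a", "dataset_1", 0.90, 0.85),
    ("system_a", "dataset_2", 0.80, 0.80),
    ("system_b", "dataset_1", 0.75, 0.80),
    ("system_b", "dataset_2", 0.85, 0.85),
]


def mean_estimation_bias(runs):
    """Return {system: mean(estimated - hold-out)}.

    A positive value means the system, on average, reported a better
    score than it achieved on held-out data (overestimation).
    """
    gaps = {}
    for system, _dataset, estimated, holdout in runs:
        gaps.setdefault(system, []).append(estimated - holdout)
    return {system: mean(values) for system, values in gaps.items()}


bias = mean_estimation_bias(runs)
for system, value in sorted(bias.items()):
    label = "overestimates" if value > 0 else "does not overestimate"
    print(f"{system}: mean bias {value:+.3f} ({label})")
```

The same per-dataset gaps could then be set against dataset metafeatures (for example, sample size) to look for the kind of trend mentioned above.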