Machine learning in secondary statistics education - challenges and possible ways to address them

Presented at: 7 March 2023; 20:00 UTC

Webinar duration: 90 minutes

Presenter(s): Rolf Biehler, Yannik Fleischer, Paderborn University, Germany

Link to video

Presentation slides

Statistics is rapidly evolving towards data science, and data-driven machine learning (ML) is emerging alongside it as a field with numerous applications that can be both beneficial and controversial. ML has produced useful applications for society, but it has also raised concerns about bias in algorithms and data, misuse for surveillance, and the delegation of responsibility to machines. The general public often lacks access to ML and AI methods, which can make it difficult to distinguish legitimate concerns from unfounded suspicions. This is why ML and AI literacy should be an essential component of data literacy.

The Project Data Science and Big Data at School (ProDaBi, www.prodabi.de/en) at Paderborn University is investigating which aspects of ML can and should be taught in secondary school statistics classes. We are developing and testing teaching materials for students in grades 5/6, 9/10, and 12/13. One of our primary focuses is data-based decision trees as a specific ML method. For students in grades 5/6, we use an unplugged approach with data cards. For older students, we use various digital tools, such as CODAP for data exploration and semi-automatic decision tree construction, and Jupyter Notebook with integrated libraries in prepared environments that allow students to apply ML algorithms without needing coding skills.

During the talk, we will present our approach to ML in statistics education, our materials, the digital tools, and the datasets we use. Students use nutrition data, data on adolescents' media use, and parking space occupancy data to develop predictive models. We will also discuss the evaluation of models and sources of bias, as well as potential problems in the societal use of ML methods. Overall, our goal is to promote ML and AI literacy and equip students with the knowledge and skills they need to navigate this rapidly developing field in a responsible and informed manner.
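To make the idea of a data-based decision tree concrete, here is a minimal sketch of a single decision rule (a one-level tree) learned from data. It is not taken from the ProDaBi materials: the dataset, the feature (sugar per 100 g), the labels, and the function names `misclassified` and `best_threshold` are all hypothetical and chosen only for illustration. The sketch tries every threshold between neighbouring values and keeps the one with the fewest misclassified cases.

```python
# Hypothetical toy data (NOT from the ProDaBi materials):
# each entry is (sugar in g per 100 g, label).
foods = [
    (2.0, "recommended"),
    (4.5, "recommended"),
    (6.0, "recommended"),
    (9.0, "not recommended"),
    (12.5, "not recommended"),
    (30.0, "not recommended"),
    (5.0, "not recommended"),
    (11.0, "recommended"),
]


def misclassified(data, threshold):
    """Count errors of the rule: sugar <= threshold -> 'recommended'."""
    errors = 0
    for sugar, label in data:
        predicted = "recommended" if sugar <= threshold else "not recommended"
        if predicted != label:
            errors += 1
    return errors


def best_threshold(data):
    """Try midpoints between neighbouring sugar values; keep the one with the fewest errors."""
    values = sorted({sugar for sugar, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: misclassified(data, t))


threshold = best_threshold(foods)
errors = misclassified(foods, threshold)
print(f"Rule: sugar <= {threshold} g -> recommended "
      f"({errors} of {len(foods)} misclassified)")
```

The error count here is measured on the same data used to find the rule. Evaluating a model fairly would usually mean checking it on cases that were held back, which is one reason model evaluation is discussed as a separate topic.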


Dr. Rolf Biehler is a professor of didactics of mathematics at Paderborn University. His research interests include probability, statistics and data science education, university mathematics education, and the professional development of mathematics teachers. He is active in the International Association for Statistical Education (IASE) and has served as an editor or editorial board member for several international journals and book series in mathematics education. He is currently co-directing the Project Data Science and Big Data at School (www.prodabi.de/en) and is the chairperson of the Advisory Board of the Statistics Education Research Journal (SERJ).

Yannik Fleischer is a PhD student in mathematics education research at Paderborn University, Germany. His main research interest is developing a concept for teaching machine learning methods in school, with a focus on decision trees, and evaluating it by developing teaching materials and examining them in practice. Since 2019, he has been teaching year-long project courses on data science in upper secondary school and developing, implementing, and evaluating teaching modules for different levels of secondary school, mainly on machine learning with decision trees.