Faculty of Informatics

Test Case Generation and Fault Localization for Data Science Programs

Staff - Faculty of Informatics

Date: 13 June 2024 / 09:00 - 12:00

USI East Campus, Room D1.15

You are cordially invited to attend the PhD Dissertation Defence of Mohammad Rezaalipour on Thursday 13 June 2024 at 09:00 in room D1.15.

Abstract:
Data science refers to interdisciplinary approaches designed to extract knowledge from vast amounts of data. It combines techniques from fields such as statistics and machine learning to develop novel applications for different science and engineering domains. Data science approaches are implemented as programs, usually written in languages such as R or Python, collectively referred to as data science programs. Because of their interdisciplinary use, these programs are often written by domain experts who may be unfamiliar with the best practices of software development, and thus they may exhibit low quality. In fact, there is evidence that these programs contain several bugs, often different in nature from those found in traditional programs. As a result, data science programs challenge conventional debugging techniques, such as test generation and fault localization. Additionally, being written in dynamically typed languages such as Python makes them harder to test and analyze.

These challenges call for research into new debugging techniques tailored specifically to these programs, which is the focus of this dissertation. Specifically, the thesis aims to understand the capabilities and limitations of standard test generation and fault localization techniques on data science programs implemented in dynamic languages such as Python. To achieve this goal, the dissertation presents contributions in three areas: i) a test generation technique for neural network (NN) programs, a widespread class of data science programs; ii) an empirical study of fault localization in Python programs; and iii) two debugging tools and a curated dataset of NN bugs.

In the first area, we investigated and identified the limitations of general-purpose test generation techniques on NN programs, which led to the development of aNNoTest, a novel test generation technique tailored for NN programs. We evaluated aNNoTest on 19 open-source programs, demonstrating its effectiveness at finding bugs in real-world NN programs. In the second area, we conducted the first large-scale multi-family empirical study of fault localization in Python programs. Targeting 135 bugs from 13 projects, we studied seven fault localization techniques from four families, along with combinations of them. We considered different fault localization granularity levels and measured both effectiveness and efficiency in our analyses. In the third area, we developed: i) the aNNoTest tool, an implementation of the aNNoTest approach mentioned above; ii) FauxPy, to our knowledge the first open-source multi-family fault localization tool for Python; and iii) a curated dataset of NN bugs, for which aNNoTest was used to generate tests.

Beyond supporting the field with the tools and techniques we developed, we hope our contributions will inform the development of more effective debugging techniques for Python data science programs.

Dissertation Committee:
- Prof. Carlo Alberto Furia, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Michele Lanza, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Paolo Tonella, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Domenico Bianculli, University of Luxembourg, Luxembourg (External Member)
- Prof. Gordon Fraser, University of Passau, Germany (External Member)

Contact

Staff - Faculty of Informatics

+41 58 666 46 90

[email protected]

Faculty of Informatics
Università della Svizzera italiana
Via Buffi 13
6900 Lugano, Svizzera
tel +41 58 666 46 90
e-mail [email protected]