School of Computing Graduate Theses

Recent Submissions

Now showing 1 - 5 of 519
  • Item
    A Smartwatch-Based Continuous Authentication System
    (2024-05-16) Gholami, Arash; Computing; Alaca, Furkan; Zulkernine, Mohammad
    Conventional authentication methods can protect unattended devices that have been logged out; however, an abandoned logged-in device remains vulnerable to unauthorized access. Inactivity timeouts can help mitigate this threat, but a long timeout increases susceptibility to attack, whereas a short timeout sacrifices usability. Continuous authentication aims to continuously and non-intrusively verify that the user currently using the system is the same user who initially logged in. If so, the user remains logged in; otherwise, the user is logged out. We design and evaluate a comprehensive data processing pipeline for smartwatch-based continuous authentication systems using inertial sensor data. Our pipeline uses a Siamese convolutional neural network to learn and extract features, and one-class classifiers to classify authentication attempts as either legitimate or malicious. To the best of our knowledge, our work is the first to use learned features and one-class classifiers for continuous authentication with smartwatch inertial sensor data. We compare our learned features with the hand-picked features proposed in prior work and show that our learned features achieve a lower equal-error rate (EER) for shorter-duration time windows and a similar EER for longer-duration time windows. These results indicate that learned features are a promising approach for detecting malicious authentication attempts more accurately within a shorter time window. Based on the insights gained from our work, we make recommendations for future work that would help improve the performance and real-world feasibility of smartwatch-based continuous authentication systems.
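The one-class decision stage described in this abstract can be illustrated with a minimal sketch. It assumes a simple distance-to-centroid rule over hypothetical embedding vectors; the Siamese-CNN feature extractor and the thesis's actual classifiers are not reproduced, and `enroll` merely stands in for learned 64-dimensional feature embeddings.

```python
import numpy as np

# Hypothetical embeddings of the legitimate user's enrollment windows.
rng = np.random.default_rng(0)
enroll = rng.normal(0.0, 1.0, size=(200, 64))

# "Train" on legitimate data only: a centroid plus a distance threshold
# that accepts at least 95% of the enrollment windows.
centroid = enroll.mean(axis=0)
dists = np.linalg.norm(enroll - centroid, axis=1)
threshold = np.percentile(dists, 95)

def authenticate(window_embedding):
    """Return True if the window embedding looks like the enrolled user."""
    return np.linalg.norm(window_embedding - centroid) <= threshold

impostor = rng.normal(5.0, 1.0, size=64)  # far from the enrollment data
print(bool(authenticate(impostor)))  # → False
```

A real pipeline would replace the synthetic embeddings with the feature vectors produced by the learned extractor and tune the threshold to the desired EER operating point.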
  • Item
    Limited Existential and Universal Width Alternating Finite Automata
    (2024-04-30) Zakzok, Mohammad; Computing; Salomaa, Kai
    There have been many studies on models of limited nondeterministic finite automata (NFAs), including studies on the tree width, branching, guessing, and ambiguity of NFAs. However, the study of limited alternating finite automata (AFAs), a generalization of NFAs that allows alternation between existential and universal states, is fairly new in the literature. Drawing inspiration from limited NFAs, recent studies have introduced the maximal and optimal existential and universal widths of AFAs, designed to quantify the amount of existential and universal power used by an AFA. In this thesis, I explore the relationships among limited optimal existential and universal width AFAs, limited maximal existential and universal width AFAs, NFAs, universal finite automata (UFAs), and deterministic finite automata (DFAs). More specifically, I focus on the state complexity of these models and the ways they relate to each other. I show improved upper bounds for converting limited existential and/or universal width AFAs into NFAs, UFAs, and DFAs, largely using improved variations of subset construction. I also establish lower bounds on the size of NFAs, UFAs, and DFAs simulating limited existential and/or universal width AFAs. Although the upper and lower bounds provided do not match, they are reasonably close.
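The subset construction that these conversions refine can be sketched in a few lines. This is the textbook NFA-to-DFA version, not the thesis's improved variants for limited-width AFAs; the example NFA is a standard one accepting strings whose second-to-last symbol is 'a'.

```python
from itertools import chain

def subset_construction(alphabet, delta, start, accepting):
    """Classic subset construction. delta maps (state, symbol) -> set of
    successor NFA states; only reachable DFA states are generated."""
    start_set = frozenset([start])
    dfa_delta, seen, work = {}, {start_set}, [start_set]
    while work:
        current = work.pop()
        for sym in alphabet:
            # The DFA successor is the union of NFA successors.
            nxt = frozenset(chain.from_iterable(
                delta.get((q, sym), ()) for q in current))
            dfa_delta[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    accepting_sets = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, accepting_sets

# NFA over {a, b} accepting strings whose second-to-last symbol is 'a'.
nfa_delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'a'): {2}, (1, 'b'): {2}}
dfa_states, dfa_trans, dfa_start, dfa_accepting = subset_construction(
    'ab', nfa_delta, 0, {2})
print(len(dfa_states))  # → 4 reachable subsets (the worst case is 2**3 = 8)
```

The improved constructions in the thesis exploit bounded existential/universal width to generate fewer subsets than this worst case.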
  • Item
    On the Release-Readiness of ML-Based Products
    (2024-04-23) Patel, Harshkumar; Computing; Adams, Bram; Hassan, Ahmed E.
    In the rapidly evolving fields of machine learning (ML) and artificial intelligence (AI), assessing the release-readiness of software products presents significant challenges. These include the unpredictability of AI results due to non-determinism, the complexity of comprehensive testing without clear model guidelines, and the need to comply with dynamic global regulations. To mitigate potential reputational risks, it is crucial for organizations to conduct detailed release-readiness evaluations. Traditional ML systems require checks such as verifying unbiased training data and model quality before deployment. Generative AI systems need additional checks to address challenges such as misinformation and adversarial attacks like prompt injection. This thesis explores release-readiness across traditional and generative AI systems. Traditional approaches often follow established checklists and assume newer models are superior, disregarding the utility of older models post-deployment. An empirical study across eight long-lived Apache projects with 84,343 commits illustrates the benefits of model recycling, which preserves older models' utility and enhances software release-readiness. We evaluated three Just-In-Time (JIT) defect prediction models: Random Forest, Logistic Regression, and Neural Network. Our findings show that model recycling significantly outperforms traditional retraining in metrics such as recall, g-mean, AUC, and F1 score by up to 30%, 20%, 11%, and 10%, respectively. Model stacking, the top recycling strategy, improves outcomes in over 50% of the projects but increases inference time. For generative AI systems, the thesis identifies the absence of a comprehensive checklist tailored to the unique complexities of release-readiness. We synthesized a detailed checklist from a systematic review of 65 grey literature sources and insights from 44 organizations, guiding practitioners in evaluating key release-readiness aspects such as performance, monitoring, and deployment strategies. This research bridges the gap between theory and practice in release-readiness, advancing the discourse on ML software systems and providing actionable insights for managing ML model lifecycles. It contributes to the development of more reliable and sustainable ML-based software systems, facilitating their successful integration into the real world.
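The stacking flavour of model recycling can be sketched as follows. Rather than discarding the previously deployed JIT defect-prediction model when a new one is trained, their scores are combined. Both "models" here are hypothetical stand-in scoring functions over two invented change metrics, not the thesis's Random Forest, Logistic Regression, or Neural Network classifiers.

```python
import numpy as np

def old_model(x):
    # Stand-in for a model fit on earlier commits (sensitive to metric 0).
    return 1.0 / (1.0 + np.exp(-(x[:, 0] - 0.5)))

def new_model(x):
    # Stand-in for a model fit on recent commits (sensitive to metric 1).
    return 1.0 / (1.0 + np.exp(-(x[:, 1] - 0.5)))

def stacked(x, w=0.5):
    # Recycling by stacking: a convex combination keeps the old model's
    # knowledge in play instead of discarding it at retraining time.
    return w * old_model(x) + (1.0 - w) * new_model(x)

# Five hypothetical code changes, each described by two normalized metrics.
X = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 2))
print(stacked(X).shape)  # → (5,)
```

In practice the combiner would itself be learned (a meta-model over the base models' outputs), which is where the extra inference time noted above comes from.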
  • Item
    Empirical Evaluation of Graph-Anonymized Metrics for JIT Defect Prediction
    (2024-04-23) Malik, Akshat; Computing; Adams, Bram; Hassan, Ahmed E.
    Software analytics data, housed in version control and issue report repositories, serves pivotal roles in organizational tasks like predicting code bugs and selecting code reviewers. Yet individual organizations often lack the breadth of data needed for these analytics, making data sharing essential. However, privacy concerns impede such sharing, as it could expose sensitive organizational information or compromise product quality. Additionally, the risk of reverse-engineering training data from shared models further complicates data privacy. To address these challenges, anonymization techniques like MORPH, LACE, and LACE2 provide privacy for defect prediction data. While effective, they often sacrifice metric performance because they disregard data relationships during anonymization. To preserve these relationships, we propose graph anonymization techniques, which maintain data connections while ensuring privacy. Our research evaluates four specific graph anonymization methods (Random Add/Delete, Random Switch, k-DA, and Generalization) across six large software projects. We assess privacy using the Increased Privacy Ratio (IPR) metric, finding that each method achieves privacy scores above 65%, with Random Add/Delete and Random Switch exceeding 80% for all projects. Within-project analyses reveal minimal impacts on model performance, with median reductions of 1.45% in AUC, 5.35% in Recall, 2.29% in G-Mean, and 4.42% in FPR when privacy exceeds 65%. However, when privacy scores surpass 80%, performance declines further. Notably, graph anonymization outperforms tabular methods like MORPH and LACE, which significantly diminish model performance. In cross-project analyses, graph anonymization maintains model performance with marginal reductions in metrics compared to non-graph methods. Combining graph with non-graph anonymization techniques mitigates the performance declines observed with non-graph methods alone, highlighting the efficacy of graph anonymization in preserving privacy while sustaining model performance. Our study underscores the effectiveness of graph anonymization in maintaining privacy without compromising model performance across within-project and cross-project settings. Even when employing diverse anonymization strategies, graph anonymization techniques consistently deliver robust performance, showcasing their viability in real-world applications.
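The Random Add/Delete scheme named above can be sketched on a toy undirected graph: delete k existing edges at random, then add k edges that were absent, so the edge count (and hence coarse graph statistics) is preserved while individual connections are perturbed. This is a generic illustration, not the thesis's exact implementation or data.

```python
import random

def random_add_delete(nodes, edges, k, seed=0):
    """Randomly delete k edges and add k previously absent edges.
    Edges are (u, v) tuples with u < v on an undirected graph."""
    rng = random.Random(seed)
    edges = set(edges)
    removed = set(rng.sample(sorted(edges), k))
    # Candidate edges not present in the original graph.
    absent = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
              if (u, v) not in edges and (v, u) not in edges]
    added = set(rng.sample(absent, k))
    return (edges - removed) | added

# A hypothetical 6-node cycle standing in for a software-analytics graph.
nodes = list(range(6))
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)}
anon = random_add_delete(nodes, edges, k=2)
print(len(anon))  # → 6: the edge count is preserved
```

The privacy/utility trade-off reported in the abstract corresponds to how large k is relative to the graph: more perturbation raises IPR but risks degrading downstream model performance.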
  • Item
    Directed Exploration In Deep Reinforcement Learning Through Optimism
    (2024-04-02) Hashemi, Seyed Masih; Computing; Rivest, François; Givigi, Sidney
    Reinforcement learning is the science of decision-making inspired by behavioral psychology and based on learning from positive or negative rewards. One of the main dilemmas in any reinforcement learning task is the exploration-exploitation tradeoff: the question of when the agent should exploit known rewards and when it should take actions it believes are suboptimal in order to learn more about its environment. This problem becomes magnified when dealing with deceptive rewards and reward-sparse environments, where the agent seldom receives feedback on the task it is trying to learn. In deep reinforcement learning, intrinsic motivation and directed exploration are the dominant strategies. Optimistic initialization of value estimates has proven useful when tasks are simple enough that table-based techniques are feasible. In this thesis, I first devise a custom sparse-reward and deceptive-reward environment with the property of a one-to-one mapping between game images and states. I also outline a methodology for initializing a Deep Q-Network optimistically with some prior knowledge of the environment, and I show how this simple strategy can be effective even compared to more complicated intrinsic reward methods. Furthermore, I propose a novel algorithm that generates optimistic-like exploration without prior knowledge of the environment, through the detection and exploration of pessimistic (under-valued) regions.
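Optimistic initialization is easiest to see in the tabular setting that the abstract cites as the precursor of its DQN approach. In the sketch below, every Q-value starts above the highest attainable return, so purely greedy action selection is drawn toward under-explored state-action pairs; the toy 5-state chain with a single reward at the right end is illustrative, not the thesis's image-based environment.

```python
# Tabular Q-learning with optimistic initialization on a 5-state chain.
# True discounted returns are below OPTIMISM, so unvisited actions always
# look attractive and the greedy policy explores without any epsilon.
N, ACTIONS, OPTIMISM = 5, (0, 1), 10.0
Q = {(s, a): OPTIMISM for s in range(N) for a in ACTIONS}

def step(s, a):
    # Action 0 moves left, action 1 moves right; reward only at the far end.
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

for _ in range(200):                      # 200 greedy episodes of 20 steps
    s = 0
    for _ in range(20):
        a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        Q[(s, a)] += 0.5 * (r + 0.9 * max(Q[(s2, b)] for b in ACTIONS)
                            - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda act: Q[(0, act)]))  # greedy action at start
```

Because visited pairs are pulled down toward their true values while unvisited pairs keep their optimistic estimates, the agent systematically sweeps the chain until it discovers the reward on the right.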