My Projects

Interpretable Machine Learning Research

For the past year and a half, as a member of the Duke Interpretable Machine Learning Lab, I have been developing interpretable models that use genetic and visual information to classify insects and to provide insight into the connection between the two types of data. I work with one other student under the guidance of Cynthia Rudin, Chaofan Chen, and graduate student advisors.

We have explored and ruled out several approaches, including phylogenetic trees where each node is an interpretable multimodal model, and parallel models with loss terms designed to pair genetic and image features. Though these approaches achieved high accuracy, they offered no advantage over purely genetic models, which already reach very high accuracy on their own. We therefore pivoted to developing models that achieve very high accuracy while requiring minimal genetic information.

Our current work, targeted for submission to CVPR this November, presents a suite of models designed to do just that.

Multimodal ProtoPNet
We extended the ProtoPNet architecture, which learns latent-space “prototypes” that correspond to input-space features. In the case of insect images, these may be features like head shape or wing type. The model classifies insects by explicitly identifying which of these features are present in a sample.

Activation of a prototype on an insect sample.

Our first contribution was designing a family of ProtoPNets to handle genetic information. They all take fixed-length, one-hot encoded vectors of genetic information as input and use a small CNN to embed them in a feature space. Normally, the ProtoPNet architecture classifies using these embeddings, completely disregarding spatial information in the latent space. When using genetic information, this harms performance and interpretability, since the absolute position of nucleotides matters. To address this, I implemented several remedies, including traditional positional encoding and a novel approach using fixed-location prototypes.
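As a rough sketch of the fixed-location idea (names, dimensions, and hyperparameters here are illustrative, not our exact implementation), a genetic ProtoPNet might look like this in PyTorch, with each prototype compared only against the latent vector at its assigned position:

```python
import torch
import torch.nn as nn

class GeneticProtoNet(nn.Module):
    """Illustrative sketch: a ProtoPNet-style model for one-hot genetic input
    with fixed-location prototypes. Shapes and hyperparameters are made up."""

    def __init__(self, seq_len=658, n_prototypes=40, n_classes=10, dim=64):
        super().__init__()
        # Small CNN that embeds a one-hot (A/C/G/T) sequence into a latent space.
        self.embed = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, dim, kernel_size=9, padding=4), nn.ReLU(),
        )
        # Each prototype is a latent vector tied to one position in the sequence.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim))
        self.register_buffer(
            "locations", torch.linspace(0, seq_len - 1, n_prototypes).long()
        )
        self.classify = nn.Linear(n_prototypes, n_classes, bias=False)

    def forward(self, x):                       # x: (batch, 4, seq_len)
        z = self.embed(x)                       # (batch, dim, seq_len)
        # Compare each prototype only against the latent vector at its location.
        patches = z[:, :, self.locations]       # (batch, dim, n_prototypes)
        dists = ((patches.permute(0, 2, 1) - self.prototypes) ** 2).sum(-1)
        similarities = torch.log((dists + 1) / (dists + 1e-4))  # ProtoPNet-style
        return self.classify(similarities)
```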

Our second contribution was adding a layer that combines genetic and image ProtoPNet models in a manner that minimizes the model's reliance on genetic information.
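One simple way to realize such a layer (a hypothetical sketch, not the exact layer we use) is to gate the genetic branch's logits with learned per-class weights and add a penalty that keeps those weights small unless the genetic branch is truly needed:

```python
import torch
import torch.nn as nn

class MultimodalCombiner(nn.Module):
    """Hypothetical sketch: combine image and genetic ProtoPNet logits while
    penalizing reliance on the genetic branch."""

    def __init__(self, n_classes, genetic_penalty=1e-2):
        super().__init__()
        self.genetic_weight = nn.Parameter(torch.zeros(n_classes))
        self.genetic_penalty = genetic_penalty

    def forward(self, image_logits, genetic_logits):
        # The genetic contribution is gated per class; the regularizer below
        # keeps these gates small so the image branch does most of the work.
        return image_logits + self.genetic_weight * genetic_logits

    def regularizer(self):
        return self.genetic_penalty * self.genetic_weight.abs().sum()
```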

Multimodal ProtoTree
We built on the ProtoTree architecture, which organizes prototypes into a decision tree. To classify a sample, the tree is traversed based on how strongly each node’s prototype activates on that sample.

We integrated the same genetic ProtoPNet described above into this framework and extended it to use both image and genetic information at once. We then introduced a new loss term and pruning step to ensure that most of the decision tree would use visual features, and that genetic features would only be used when necessary.
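A toy sketch of how such a tree can route samples and charge a cost for genetic nodes (the node structure, routing, and loss are simplified for illustration and are not our actual code):

```python
import torch
import torch.nn as nn

class ProtoNode(nn.Module):
    """Toy internal node: routes right with probability given by how strongly
    its prototype activates on the sample's features (image or genetic)."""

    def __init__(self, dim, is_genetic=False, left=None, right=None, leaf_logits=None):
        super().__init__()
        self.prototype = nn.Parameter(torch.randn(dim))
        self.is_genetic = is_genetic
        self.left, self.right = left, right
        self.leaf_logits = leaf_logits           # set only on leaves

    def forward(self, feats):                    # feats: (batch, dim)
        if self.leaf_logits is not None:         # leaf: just return class scores
            return self.leaf_logits, 0.0
        dist = ((feats - self.prototype) ** 2).sum(-1)
        p_right = torch.exp(-dist)               # prototype activation in [0, 1]
        left_out, left_cost = self.left(feats)
        right_out, right_cost = self.right(feats)
        out = (1 - p_right)[:, None] * left_out + p_right[:, None] * right_out
        # Modality cost: expected number of genetic nodes visited, which a loss
        # term can penalize so genetic features are used only when necessary.
        cost = float(self.is_genetic) \
            + (1 - p_right).mean() * left_cost + p_right.mean() * right_cost
        return out, cost
```

A training loop could then add the returned cost, suitably weighted, to the classification loss, mirroring the idea of penalizing genetic nodes so most of the tree relies on visual features.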

Example classification of an insect using our multimodal ProtoTree. Red and green nodes use genetic and visual information, respectively.

We are also exploring a more general, non-interpretable approach that combines generic models and provides a likelihood that additional genetic information could change a prediction.

Minerva CubeSat

As a member of Duke AERO, our high-powered rocketry team, I co-led the development of a 5U CubeSat, “Minerva,” that deployed from our student-built 30,000-ft rocket, “Perseus.” Minerva featured an active guided parachute, a two-axis gimbal camera, live video, and basic computer vision. All components, including the flight computer, gimbal, ground station, and parachute system, were entirely student-designed.

I led the development of all electrical components, co-led the design of the CubeSat as a whole, designed the flight computer, and wrote all flight and ground station software.

Minerva PCB

The guided parachute system used two 5-turn servos to adjust line lengths. Moving them in parallel induced forward or backward glide, while differential movement enabled rotation. Our goal was to glide toward a predetermined landing site and avoid dense vegetation during landing. While the system worked mechanically, Minerva’s low mass made it too susceptible to wind for effective control.
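The control mixing itself is simple; here is an illustrative sketch of the idea (the command ranges and servo interface are hypothetical, not our flight code):

```python
def mix_parachute_servos(glide, turn, max_turns=5.0):
    """Map a glide command (+1 pulls both lines together for forward glide,
    -1 for backward) and a turn command (-1 left, +1 right) onto the two
    5-turn servos. Ranges here are illustrative, not flight constants."""
    left = glide - turn
    right = glide + turn
    # Clamp to the servos' usable travel before converting to turns.
    left = max(-1.0, min(1.0, left))
    right = max(-1.0, min(1.0, right))
    return left * max_turns / 2, right * max_turns / 2


# Example: full forward glide with a slight right turn.
print(mix_parachute_servos(1.0, 0.3))
```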

Parachute Rigging Configuration

The gimbal, designed in-house, used two brushless DC motors, an IMU, and a camera connected through a slip ring to provide smooth, stable video during descent. The camera interfaced with a Raspberry Pi, which handled local recording, vegetation detection for landing, and analog conversion for 1.3 GHz video transmission. Unfortunately, both the primary and backup Raspberry Pis failed shortly before competition, leaving the video system nonfunctional.

The ESP32-based flight computer integrated a GPS, IMU, and barometric sensor, along with motor drivers, buck converters, pyro channels, and power management circuitry. It linked to a ground station with a Python backend and Vue frontend that combined telemetry with the video feed for real-time visualization.

Minerva PCBs

This project was the most impactful learning experience of my life. Due to delays in relaunching Pitchfork, our rocket from 2024, and an unexpected rule change, we fell behind schedule. I chose to pursue every planned subsystem rather than scaling back, which limited testing and caused issues on launch day.

Now, as Avionics Lead, I am determined not to make the same mistakes. I am focusing on doing fewer projects and doing them well, and I am emphasizing testing throughout our development process.

FPGA Checkerboard

Along with a classmate, I built a physical checkerboard that detects player moves, computes a response using a custom CPU designed in Verilog and implemented on an FPGA, and displays the result with LEDs.

For the project, I handled move detection, implemented all new FPGA features, and wrote the opponent move algorithm, while my partner handled LEDs and physical fabrication.

Checkerboard

Each playable square contained a Hall effect sensor and an LED. Magnetic pieces triggered the sensors when moved. To handle input from 32 sensors simultaneously, we chained together four 8-bit shift registers. A similar shift-register-based design controlled the LEDs.
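Reading the board amounts to latching all 32 parallel inputs at once and then clocking the bits out serially; a rough sketch of that loop (the `set_pin`/`read_pin` GPIO helpers are hypothetical stand-ins, not a real library API):

```python
def read_board(set_pin, read_pin, latch, clock, data, n_bits=32):
    """Read the 32 Hall-effect sensors through four chained 8-bit shift
    registers. `set_pin` and `read_pin` are hypothetical GPIO helpers.
    Returns a list of 32 booleans, one per playable square."""
    # Pulse the latch to capture all parallel sensor inputs simultaneously.
    set_pin(latch, 0)
    set_pin(latch, 1)
    bits = []
    for _ in range(n_bits):
        bits.append(bool(read_pin(data)))   # read the current serial output bit
        set_pin(clock, 1)                   # clock the next bit into position
        set_pin(clock, 0)
    return bits
```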

The system’s core was a Xilinx FPGA running our custom CPU, designed in structural Verilog around a von Neumann architecture. Beyond standard features such as pipelining and multiplication/division, we integrated memory-mapped I/O for sensor and LED control, and custom shift instructions to accelerate move calculation.

Although time constraints prevented us from implementing the minimax algorithm with alpha-beta pruning that we had hoped to use for move generation, we built a simpler algorithm that reliably generated legal moves.
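A simplified sketch of that kind of fallback generator (the board encoding is illustrative; kings and forced captures are omitted to keep it short):

```python
def simple_moves(board, player):
    """Generate legal non-king checkers moves for `player` on an 8x8 board.
    board[r][c] is 0 (empty), 1, or 2."""
    forward = 1 if player == 1 else -1
    opponent = 2 if player == 1 else 1
    moves = []
    for r in range(8):
        for c in range(8):
            if board[r][c] != player:
                continue
            for dc in (-1, 1):
                nr, nc = r + forward, c + dc
                if not (0 <= nr < 8 and 0 <= nc < 8):
                    continue
                if board[nr][nc] == 0:
                    moves.append(((r, c), (nr, nc)))            # simple step
                elif board[nr][nc] == opponent:
                    jr, jc = nr + forward, nc + dc
                    if 0 <= jr < 8 and 0 <= jc < 8 and board[jr][jc] == 0:
                        moves.append(((r, c), (jr, jc)))        # capture jump
    return moves
```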

HiDDeN Signature

As part of my Adversarial Machine Learning class, I reimplemented HiDDeN: Hiding Data With Deep Networks, an approach for hiding messages in images so that they survive JPEG compression, can later be extracted, and can't be recovered by adversaries. I adapted this idea to embed digital signatures within images themselves, providing non-repudiation. This would allow a person, or AI model, to create an image and let others verify that they were the original author. It would also prevent adversaries from improperly attributing objectionable images to an author.

Comparison between images before (top) and after (bottom) hiding watermarks in them.
Implementation

Naively, one could sign an image by hashing all but the least significant bits (LSBs) of the image, signing the hash, and storing the signature in the LSBs. This works without compression but fails once compression alters pixel values. To handle compression, I needed a representation of the image that remains stable under compression.
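For concreteness, the naive scheme looks roughly like this (a sketch only; `sign` stands in for a real signing primitive and key handling is omitted):

```python
import hashlib
import numpy as np

def naive_lsb_sign(image, sign):
    """Sketch of the naive approach: hash everything except the LSBs, sign the
    hash, and overwrite the LSBs with the signature bits. `sign` is a stand-in
    for a real signing function over the digest."""
    pixels = image.astype(np.uint8)
    digest = hashlib.sha256((pixels & 0xFE).tobytes()).digest()
    sig_bits = np.unpackbits(np.frombuffer(sign(digest), dtype=np.uint8))
    flat = pixels.reshape(-1) & 0xFE
    flat[: len(sig_bits)] |= sig_bits          # store the signature in the LSBs
    return flat.reshape(pixels.shape)

# Any lossy compression changes pixel values, so both the recomputed hash and
# the stored LSB signature are destroyed -- hence the need for a stable,
# compression-robust representation.
```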

To do this, I used encrypted ViT embeddings! The approach works like this:

  1. An image creator creates an image.
  2. They pass the image through a ViT to get a condensed representation of the visual content of the image.
  3. They encrypt the ViT embedding using their private key.
  4. They hide this encrypted embedding in the image using HiDDeN.

To verify that the image creator was the true author of the image, and that the image wasn't created by an adversary, one could follow these steps (sketched in code below):

  1. Use HiDDeN to extract the message from the image.
  2. Decrypt the message using the author's public key to obtain the original ViT embedding.
  3. Compute the ViT embedding of the image.
  4. Compare the new embedding with the original embedding. If they are close, then the image was generated by the author.
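Roughly, in code (the `extract`, `decrypt`, and `vit_embed` helpers and the threshold are stand-ins for the actual components):

```python
import numpy as np

def verify_authorship(image, extract, decrypt, vit_embed, threshold=0.9):
    """Sketch of the verification flow. `extract` (HiDDeN decoder), `decrypt`
    (with the author's public key), and `vit_embed` are hypothetical helpers."""
    message = extract(image)                    # 1. pull the hidden message
    original_embedding = decrypt(message)       # 2. recover the signed ViT embedding
    current_embedding = vit_embed(image)        # 3. embed the image as received
    cos = np.dot(original_embedding, current_embedding) / (
        np.linalg.norm(original_embedding) * np.linalg.norm(current_embedding)
    )
    return cos >= threshold                     # 4. close enough => authentic
```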

To improve data integrity, I added error-correcting codes and checksums to each message.

To ensure that image embeddings stayed consistent after the images were watermarked with messages and underwent compression, I trained a ViT encoder with a contrastive loss to align pre- and post-compression embeddings while keeping embeddings of different images distinct.
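The alignment objective is a standard contrastive setup; a minimal sketch (batch construction and temperature are illustrative):

```python
import torch
import torch.nn.functional as F

def compression_contrastive_loss(pre_emb, post_emb, temperature=0.07):
    """InfoNCE-style loss: each original image's embedding should be closest
    to the embedding of its own watermarked/compressed version, and far from
    every other image in the batch. Shapes: (batch, dim)."""
    pre = F.normalize(pre_emb, dim=-1)
    post = F.normalize(post_emb, dim=-1)
    logits = pre @ post.T / temperature          # pairwise cosine similarities
    targets = torch.arange(pre.size(0), device=pre.device)
    return F.cross_entropy(logits, targets)
```

During training, `pre_emb` would come from the original images and `post_emb` from their watermarked-and-compressed counterparts.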

The results were mixed. The technique worked almost perfectly without any compression. The watermarks were able to survive mild JPEG compression 72% of the time but failed under stronger, more realistic compression.

Ongoing Projects

Duke AERO — I am currently co-leading Duke AERO's Avionics team of approximately 15 members to design reliable electrical systems, flight computers, telemetry, live video, and active control systems for a 30,000-ft rocket. (C++, KiCad)

Robotic Whiteboard — I am working in a team of three to build a whiteboard drawing robot that can draw and erase user input images, text, and LaTeX. (ESP32, Embedded Systems, C++)

Esoteric Programming Language — I am working on an esoteric, interpreted programming language where all symbols can be reassigned. (C++)

Other Projects

Duke AERO Payload 23/24 — As a member of the 23/24 Duke AERO Payload Subteam, I designed electrical systems, wrote flight software, and developed all ground station software for a 6U deployable CubeSat.

NEAT Brachiation — I developed a library (Python, CUDA) implementing NEAT, a genetic algorithm for generating neural networks. I used it to teach a gibbon-like agent to brachiate (swing).

Travel Listing Description Generator — I developed an internal tool that used the ChatGPT functions API to automatically query a local company's database and generate travel listing descriptions and blog posts. It could integrate imagery and markdown, and include widgets like trail maps.

Proposal Builder — I developed a full-stack web app using TypeScript, Node.js, PostgreSQL, and Vue to allow a local company's marketing team to quickly build proposals for clients. It allowed users to flexibly design proposals, create and use templates, and export/import PDFs.

Capra - Travel Listing Video Generator — I developed a Python tool, using MoviePy and FFmpeg, to automatically generate travel listing teaser videos for a local travel software company. It took imagery, audio, video, and text content and automatically generated varied and interesting summary videos for travel listings. I also developed transformations to turn static images into realistic drone shots and handheld shots.