Ananya Athreyas · UC Berkeley · Data Science · AI / ML builder

project-first landing page

Building machine learning pipelines, model adaptation workflows, and practical systems around real data.

I'm a Data Science student at UC Berkeley interested in AI/ML systems, workflow automation, evaluation pipelines, and text-heavy modeling. This page is centered on the actual things I've built: projects, pipeline decisions, and concrete results.

builds

ML pipelines, text classification systems, and data workflows.

interests

Fine-tuning, evaluation datasets, automation, and applied ML systems.

looking for

Roles where building, iteration, and system depth all matter.

selected projects

Clear project snapshots with the build, the stack, and the result.

ML pipeline #01

UFO Sightings: Machine Learning Pipeline

Analyzed 80K+ UFO reports to uncover structure in noisy, witness-reported data using a full cleaning, clustering, and classification workflow.

Python · pandas · regex · scikit-learn · K-means · DBSCAN · MLPClassifier

what I built

  • Built a preprocessing pipeline for cleaning text-heavy reports and engineering usable features.
  • Applied K-means and DBSCAN to explore hidden structure before moving into supervised modeling.
  • Trained an MLPClassifier and iterated on the pipeline to improve identification performance.
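For illustration only (this is not the project code): a minimal sketch of the kind of regex-based cleaning step such a preprocessing pipeline might start with. The function names and the duration format are assumptions, not details of the actual dataset.

```python
import re

# Hypothetical unit table; real reports may use many more spellings.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
_DURATION_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(second|minute|hour)s?", re.IGNORECASE
)


def clean_text(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def parse_duration_seconds(raw: str):
    """Turn free text like 'about 5 minutes' into seconds, or None if no match."""
    match = _DURATION_RE.search(raw)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return value * _UNIT_SECONDS[unit]
```

Returning None for unparseable durations keeps the decision about missing values in the later feature-engineering step rather than hiding it in the cleaner.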

result / signal

  • Worked on 80K+ records
  • Improved identification performance from ~1.2% to 7.6%

Text classification #02

Spam vs. Ham Email Classifier

Built a binary text classifier to separate spam from legitimate email with a focus on feature engineering and practical model iteration.

Python · pandas · scikit-learn

what I built

  • Engineered 20+ text features to capture useful spam signals.
  • Iterated on preprocessing and model selection instead of relying on a single default approach.
  • Kept the workflow simple enough to inspect, compare, and improve quickly.
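As a rough illustration (not the actual feature set), hand-engineered text features for a spam classifier can be computed with plain Python. The feature names and keyword list below are hypothetical examples.

```python
import re


def extract_features(subject: str, body: str) -> dict:
    """Compute a few simple, inspectable text features for one email."""
    text = f"{subject} {body}"
    words = re.findall(r"[A-Za-z']+", text)
    n_chars = max(len(text), 1)  # avoid dividing by zero on empty emails
    return {
        "num_words": len(words),
        "num_exclamations": text.count("!"),
        "upper_ratio": sum(c.isupper() for c in text) / n_chars,
        "num_links": len(re.findall(r"https?://", text, flags=re.IGNORECASE)),
        # Hypothetical keyword list, chosen only for the example.
        "has_money_term": int(
            bool(re.search(r"\b(free|winner|cash|prize)\b", text, re.IGNORECASE))
        ),
        "is_reply": int(subject.lower().startswith("re:")),
    }
```

Each feature is a plain number, so a list of these dicts can be loaded into a pandas DataFrame and passed to a scikit-learn model for comparison across preprocessing choices.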

result / signal

  • 20+ engineered features
  • Reached 90% accuracy

Model adaptation workflows #03

Agent Training Data + Evaluation Pipelines

Built training data pipelines for adapting open models to agentic workflows, with work spanning trace conversion, dataset shaping, evaluation, and promotion logic.

Python · QLoRA · eval datasets · benchmark replay · quantization

what I built

  • Converted agent traces into run packets, normalized steps, and task-specific fine-tuning/evaluation datasets.
  • Improved small-model workflow performance through QLoRA, benchmark replay, quantization, and promotion gates.
  • Worked across data preparation, training, and evaluation in a fine-tuning pipeline.
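A minimal sketch of the trace-to-dataset idea, assuming a hypothetical trace format (a list of steps with "role" and "content" keys). The real run-packet schema is not shown on this page, so every name here is illustrative.

```python
import json


def normalize_step(step: dict) -> dict:
    """Coerce one raw trace step into a fixed {role, content} shape."""
    return {
        "role": str(step.get("role", "unknown")).lower(),
        "content": str(step.get("content", "")).strip(),
    }


def trace_to_examples(trace: list) -> list:
    """Each non-empty assistant step becomes one example.

    The example's context is every normalized step that came before it.
    """
    steps = [normalize_step(s) for s in trace]
    examples = []
    for i, step in enumerate(steps):
        if step["role"] == "assistant" and step["content"]:
            examples.append({"context": steps[:i], "target": step["content"]})
    return examples


def to_jsonl(examples: list) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in examples)
```

Keeping the intermediate examples as plain dicts means the same records can feed either a fine-tuning set or an evaluation set, and each one can be inspected by eye.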

result / signal

  • Focused on adaptation loops
  • Built around inspectable workflow data

research + direction

Interested in systems where modeling, evaluation, and iteration all show up in the work.

Research experience at the Berkeley Institute for Data Science, working with 750GB+ of scholarly citation data and network analysis.
Strong interest in fine-tuning pipelines, eval dataset construction, agent traces, and automation workflows for applied ML systems.

contact / availability

Looking for opportunities where I can contribute to applied machine learning, model evaluation, workflow automation, and real product-facing systems.
ananya.athreyas@berkeley.edu · linkedin.com/in/ananya-athreyas
Award: Gold Presidential Volunteer Service Award · East Bay SPCA