Samay U Shetty

Computer Science Graduate Student | AI/ML Researcher

Specializing in AI, Machine Learning, NLP & Software Systems
Published researcher with hands-on experience in scalable ML systems


<About Me />

Computer Science Graduate Student at Rochester Institute of Technology with a passion for AI, Machine Learning, and Software Systems. Published researcher with hands-on experience building scalable ML systems, conducting applied NLP research, and developing solutions at the intersection of AI and software engineering.

Programming Languages

Python, Java, C++, C, SQL, JavaScript

AI/ML Frameworks

TensorFlow, PyTorch, Hugging Face, LangChain, Scikit-learn, OpenCV

Software Systems

RESTful APIs, Microservices, Database Design, System Architecture, Version Control, Testing

Data & Tools

NumPy, Pandas, MongoDB, Git/GitHub, Docker, Kubernetes

Cloud & DevOps

AWS, GCP, CI/CD Pipelines, GitHub Actions, MLOps

Research Focus

Natural Language Processing, Large Language Models, Retrieval-Augmented Generation, Computer Vision, Neural Networks, Deep Learning

<Research & Experience />

Software Developer (ML Research)

Lab of Population Intelligence, RIT

Summer 2025

  • Researched annotator disagreement in supervised learning and updated DisCo (Distribution from Context), a neural model predicting full label distributions by modeling annotator-item pairs
  • Developed scalable ML pipelines for six benchmark NLP datasets, incorporating annotator metadata embeddings, ensemble evaluation, and GPU-accelerated training
  • Enhanced DisCo with refined loss functions and metadata integration, achieving a ~39% performance improvement and contributing to a top-9 global leaderboard ranking (LeWiDi 2025) and acceptance at the EMNLP 2025 conference

ML Developer (Research Assistant)

Vidyalankar Institute of Technology

Aug–Dec 2023

  • Developed a diabetes prediction model with Random Forest, improving accuracy by 8%
  • Optimized preprocessing pipelines, reducing model training time by 40%
  • Co-authored a peer-reviewed paper on ML in healthcare diagnostics, published in TANZ

<Education />

Rochester Institute of Technology

Master's in Computer Science

Focus: AI, Machine Learning, Software Systems

Aug 2024 – Present

Rochester, NY

Published Research, ML Research Assistant, Conference Publications

University of Mumbai

Bachelor's in Electronics & Telecommunication

Minor in Data Science

2020 – 2024

Mumbai, India

<Featured Projects />

Savora AI

AI-powered restaurant operations platform using neuro-symbolic AI and Graph RAG for financial reasoning and real-time decision support. Built data pipelines and a warehouse to integrate multi-source restaurant data (sales, inventory, scheduling).

Python, SQLite, RESTful API, Graph RAG, Neuro-symbolic AI, System Design

BizRizz

Agentic AI platform for business consultancies. Built a production-ready AI agent that analyzes competitor data and customer sentiment and offers strategic recommendations for businesses.

Python, Flask, Gemini AI, Google Places API, Paychex API, Full-stack

MatSAR

ML-driven PolSAR data classification tool. Led development of a custom ML tool that outperforms PolSARPro in SAR image classification accuracy and UI accessibility.

C++, MATLAB, Machine Learning, GUI Development, Computer Vision, Research

Amazon Packaging Size Prediction

Developed a Random Forest model that predicts optimal packaging sizes, trained on over 200k data points. Used NLP techniques to extract and process product specifications.

Python, Pandas, NLP, Random Forest, Data Analysis, Feature Engineering

<Publications & Research />

Improving Distributional Predictions via Metadata and Loss Reweighting with DisCo

ArXiv 2025

Research on enhancing neural models that predict full label distributions in supervised learning tasks, with a focus on annotator disagreement modeling and metadata integration for improved performance.

MatSAR: A Comprehensive Machine Learning Approach for PolSAR Data Processing

IJCA 2025

Comprehensive approach to machine learning applications in Polarimetric Synthetic Aperture Radar (PolSAR) data processing, with a focus on classification algorithms and GUI development for improved accessibility.

PolSARHub: A Large-Scale Repository for PolSAR Data

IEEE IGARSS 2024

Development of a comprehensive repository for Polarimetric SAR data to support research and development in remote sensing and computer vision applications.