Yash Raj — Data Scientist

Yash Raj

B.Tech · Mathematics & Scientific Computing

Mathematics & Scientific Computing student focused on data science and backend engineering. I build data pipelines, design REST APIs, and apply statistical methods to extract insight from real-world datasets.

About

My mathematical background shapes how I approach model design, data quality, and performance trade-offs.

I am currently deepening my work in predictive analytics and machine learning, moving from descriptive analysis toward building systems that forecast and classify.

Technical skills

Data Science & Analytics
Python · Pandas · NumPy · SciPy · Matplotlib · Seaborn · Jupyter
Backend & APIs
FastAPI · SQLite · PyArrow · Java · Uvicorn
Data Structures & Algorithms
C++ · LeetCode

Projects

Automated Wealth & Portfolio Optimization API

View repo →

An asynchronous backend microservice that ingests end-of-day (EOD) price data and US Treasury rates from the Financial Modeling Prep API and computes the optimal capital allocation across a multi-asset portfolio. A background worker syncs the data every 24 hours, which decouples ingestion from the API layer and keeps response times low regardless of external API latency.
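
A minimal sketch of the decoupling pattern described above, not the repository code. The names `PriceCache`, `sync_worker`, and `fake_fetch` are hypothetical, the fetch function is a stand-in for the real external API call, and the intervals are shortened so the demo finishes quickly.

```python
import asyncio


class PriceCache:
    """In-memory store that request handlers read from (hypothetical)."""

    def __init__(self):
        self.data = {}


async def sync_worker(cache, fetch, interval_seconds):
    """Refresh the cache in a loop, sleeping between runs."""
    while True:
        try:
            cache.data = await fetch()
        except Exception as exc:
            # On failure, keep serving the last successfully synced data.
            print(f"sync failed, keeping cached data: {exc}")
        await asyncio.sleep(interval_seconds)


async def fake_fetch():
    # Stand-in for a slow external API call; returns dummy values.
    await asyncio.sleep(0.01)
    return {"TICKER": 100.0}


async def demo():
    cache = PriceCache()
    # A real service would use interval_seconds=24 * 60 * 60.
    task = asyncio.create_task(sync_worker(cache, fake_fetch, interval_seconds=0.05))
    await asyncio.sleep(0.2)  # give the worker time to complete a sync
    snapshot = dict(cache.data)  # handlers read the cache, never the API
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return snapshot


snapshot = asyncio.run(demo())
```

Because handlers only read from the cache, their latency does not depend on how long the external call takes.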

Optimization approach: Maximizes the Sharpe ratio (excess return over the current risk-free rate, divided by portfolio standard deviation) subject to full capital allocation (weights sum to 1) and no short selling (weights are non-negative), using SciPy's SLSQP constrained optimizer. The resulting portfolio lies on the efficient frontier.
Python · FastAPI · SciPy · Pandas · NumPy · SQLite · httpx
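
The optimization approach can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's actual code: the function name `max_sharpe_weights` is hypothetical, and the expected returns, covariance matrix, and risk-free rate are made-up annualized numbers.

```python
import numpy as np
from scipy.optimize import minimize


def max_sharpe_weights(mean_returns, cov_matrix, risk_free_rate):
    """Long-only weights summing to 1 that maximize the Sharpe ratio."""
    mean_returns = np.asarray(mean_returns, dtype=float)
    cov_matrix = np.asarray(cov_matrix, dtype=float)
    n_assets = mean_returns.size

    def negative_sharpe(weights):
        # SciPy minimizes, so return the negated Sharpe ratio.
        excess_return = weights @ mean_returns - risk_free_rate
        volatility = np.sqrt(weights @ cov_matrix @ weights)
        return -excess_return / volatility

    # Full capital allocation: weights must sum to 1.
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    # No short selling: each weight stays in [0, 1].
    bounds = [(0.0, 1.0)] * n_assets
    initial_guess = np.full(n_assets, 1.0 / n_assets)

    result = minimize(negative_sharpe, initial_guess, method="SLSQP",
                      bounds=bounds, constraints=constraints)
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Made-up inputs for illustration only.
mean_returns = [0.10, 0.07, 0.04]
cov_matrix = [[0.040, 0.006, 0.001],
              [0.006, 0.020, 0.002],
              [0.001, 0.002, 0.005]]
weights = max_sharpe_weights(mean_returns, cov_matrix, risk_free_rate=0.03)
```

Starting from equal weights is a simple default. SLSQP is a local method, so a production version might try several starting points.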

Dynamic Pricing Optimization Engine

View repo →

A high-performance microservice that computes the profit-maximizing price for each product. It minimizes the negative of a profit function under constraints to locate the function's peak, balancing base costs, dynamic demand multipliers, and competitor pricing. On boot, PyArrow loads a compressed Parquet dataset directly into RAM, so Pandas Boolean indexing across 100,000+ synthetic products runs in milliseconds.

Python · FastAPI · SciPy · Pandas · PyArrow · NumPy
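
A minimal sketch of finding a profit peak by minimizing negative profit. The demand model here is an assumption made for illustration (linear in price, scaled by a multiplier), as are the function name `optimal_price` and the rule capping the price at 1.5 times the competitor's. The engine's real profit function is not shown on this page.

```python
from scipy.optimize import minimize_scalar


def optimal_price(unit_cost, base_demand, price_sensitivity,
                  competitor_price, demand_multiplier=1.0):
    """Return (price, profit) at the profit peak within assumed bounds."""

    def negative_profit(price):
        # Hypothetical linear demand that falls as price rises.
        demand = demand_multiplier * max(base_demand - price_sensitivity * price, 0.0)
        return -(price - unit_cost) * demand

    # Assumed constraints: never sell below cost, and never charge
    # more than 1.5x the competitor's price.
    price_cap = 1.5 * competitor_price
    result = minimize_scalar(negative_profit,
                             bounds=(unit_cost, price_cap),
                             method="bounded")
    return result.x, -result.fun


# Made-up inputs for illustration only.
price, profit = optimal_price(unit_cost=10.0, base_demand=1000.0,
                              price_sensitivity=20.0, competitor_price=30.0)
```

For this linear demand model the peak has a closed form, (base_demand / price_sensitivity + unit_cost) / 2 = 30, which gives a quick check on the numerical result.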

E-Commerce Customer RFM Segmentation API

View repo →

An end-to-end data science pipeline and REST API for customer segmentation. Processes 100,000+ real-world e-commerce transactions to compute Recency, Frequency, and Monetary scores and classify customers into actionable business segments in real time.

Key finding: Over 95% of customers had made only a single purchase. Standard quantile scoring cannot separate customers when nearly all of them share the same frequency value, so I designed custom scoring functions to isolate high-value segments from the broader one-time buyer pool.
Python · FastAPI · Pandas · Seaborn · RFM Analysis
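
A rough sketch of the RFM computation and of one possible non-quantile frequency score. The column names, the fixed thresholds, and the sample rows are all assumptions for illustration. The project's actual scoring functions are not reproduced here.

```python
import pandas as pd


def compute_rfm(transactions, snapshot_date):
    """Aggregate transactions into per-customer RFM values."""
    rfm = transactions.groupby("customer_id").agg(
        last_purchase=("order_date", "max"),
        frequency=("order_id", "nunique"),
        monetary=("amount", "sum"),
    )
    # Recency: days between the snapshot date and the last purchase.
    rfm["recency_days"] = (snapshot_date - rfm["last_purchase"]).dt.days
    return rfm.drop(columns="last_purchase")


def frequency_score(frequency):
    """Score frequency with fixed thresholds instead of quantiles.

    Bins (assumed): 1 order -> 1, 2 orders -> 2, 3-4 orders -> 3, 5+ -> 4.
    """
    bins = [0, 1, 2, 4, float("inf")]
    return pd.cut(frequency, bins=bins, labels=[1, 2, 3, 4]).astype(int)


# Made-up sample data for illustration only.
transactions = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "c", "c"],
    "order_id": [1, 2, 3, 4, 5, 6],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01",
        "2024-01-20", "2024-02-01", "2024-02-15",
    ]),
    "amount": [50.0, 20.0, 30.0, 15.0, 40.0, 10.0],
})
rfm = compute_rfm(transactions, snapshot_date=pd.Timestamp("2024-03-31"))
rfm["f_score"] = frequency_score(rfm["frequency"])
```

Fixed thresholds sidestep the problem that quantile edges coincide when most customers have a frequency of 1. `pd.qcut` raises an error on duplicate bin edges unless told to drop them.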

Currently learning

SQL

Query optimization and working with large-scale structured datasets.

Scikit-learn

Supervised and unsupervised learning pipelines.

Machine Learning Fundamentals

Regression, classification, and clustering with emphasis on the underlying mathematics.

Pipeline Optimization

Applying scientific computing principles to improve data processing efficiency.

Contact

Built with GitHub Pages  ·  2025  ·  Yash Raj