Overview

Python Portfolio

Data science, machine learning, and visualization projects showcasing Python's power for turning complex data into actionable insights.

Machine Learning

Predictive modeling, regression analysis, and classification algorithms to identify patterns and make data-driven predictions.

Explore ML models

Data Visualization

Creating insightful visualizations with matplotlib, seaborn, and other libraries to communicate complex data relationships effectively.

Explore visualizations

Financial Analysis

Automated financial modeling, break-even analysis, and report generation for business decision-making and planning.

View bakery case study

AG Grid Customization

Python-based data processing and transformation for AG Grid column customization, enabling dynamic UI configuration and data visualization.

View column customization project

My Python Journey

Over the past 5+ years, I've used Python for data analysis, machine learning, and visualization projects across finance, marketing, and product analytics. My approach combines statistical rigor with practical business application, focusing on creating actionable insights from complex datasets.

Python libraries I work with:

Pandas, NumPy, Scikit-learn, PyTorch, Matplotlib, Seaborn, SciPy, TensorFlow

Machine Learning

Leveraging Python's machine learning libraries to build predictive models and extract insights

Logistic Regression Classifier

A predictive model that analyzes portfolio characteristics to identify accounts that may require rebalancing or special attention. The model was implemented in Power Query so that portfolio management teams can address client needs proactively.

Key Techniques:

  • Feature selection from large financial datasets
  • Data preprocessing and normalization
  • Train/test split for model validation
  • Probability-based classification
  • Model performance evaluation

Code Sample

PowerQuery_Correlation_Regression.py
# Training logistic regression model
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Select features from dataset
features = ['AccountGroupValueDollars', 
    'CashBufferPercent', 'ClassDrift',
    'CategoryDrift', 'PositionDrift', 
    'TaxType', 'AccountType',
    'ModelPositionsNotHeld', 'Equivalencies']
X = dataset[features]
# Select the target as a 1-D Series; a one-column DataFrame triggers a shape warning in fit()
y = dataset['Columns']

# Split data into training and test sets (fixed seed for reproducible results)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train logistic regression model
log = LogisticRegression()
log.fit(X_train, y_train)

# Testing the algorithm
y_pred_test = log.predict(X_test)
y_prob_test = log.predict_proba(X_test)

# Predict for all inputs
y_pred = log.predict(X)
y_prob = log.predict_proba(X)

# Output results with probabilities
# Copy the selection so the new columns are not written to a view of dataset
dataset2 = dataset[features + ['Columns']].copy()
dataset2['predictions'] = y_pred
dataset2['probability'] = y_prob[:,1]
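
The sample above does not show the normalization and evaluation steps listed under Key Techniques. The following is a minimal sketch of how they could look, not the project's actual code. It assumes synthetic data, and the `NeedsAttention` target and the three-feature subset are hypothetical stand-ins for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the portfolio data (all values are made up).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "CashBufferPercent": rng.normal(5.0, 2.0, n),
    "ClassDrift": rng.normal(0.0, 3.0, n),
    "PositionDrift": rng.normal(0.0, 4.0, n),
})
# Hypothetical binary target: flag accounts whose combined drift is positive.
noise = rng.normal(0.0, 1.0, n)
demo["NeedsAttention"] = (
    demo["ClassDrift"] + demo["PositionDrift"] + noise > 0
).astype(int)

feature_cols = ["CashBufferPercent", "ClassDrift", "PositionDrift"]
X = demo[feature_cols]
y = demo["NeedsAttention"]

# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# The scaler is fitted on the training split only, inside the pipeline,
# so no information from the test split leaks into preprocessing.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Evaluate on held-out data: hard labels for accuracy, probabilities for AUC.
test_pred = model.predict(X_test)
test_prob = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, test_pred)
auc = roc_auc_score(y_test, test_prob)
print(f"accuracy={accuracy:.3f}  roc_auc={auc:.3f}")
```

Wrapping the scaler and classifier in one pipeline keeps the same preprocessing attached to the model when it is later applied to all accounts.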

Data Visualization

Creating visual representations to uncover patterns in complex datasets

Correlation Matrix

A heatmap visualization revealing relationships between proposal sections and client engagement metrics.

correlation_matrix.py
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Calculate the correlation matrix
# numeric_only=True skips text columns, which raise an error in pandas 2.0+
correlation_matrix = dataset.corr(method='pearson', numeric_only=True)

# Create a heatmap
plt.figure(figsize=(24, 12))
heatmap = sns.heatmap(correlation_matrix, 
                     annot=True,
                     cmap="BuPu")

plt.title('Correlation Matrix', fontsize=34)
plt.show()
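
A heatmap with many columns can be hard to scan, so it can help to list the strongest relationships as a ranked table as well. The sketch below is one possible way to do that. The helper `top_correlated_pairs` and the demo column names are hypothetical and are not part of the original project.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Return the top_n column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(method="pearson", numeric_only=True)
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Rank by absolute value but report the signed correlation.
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.loc[order].head(top_n)


# Made-up engagement data: columns a and b are built to be strongly related.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "section_a_views": base,
    "section_b_views": base * 0.9 + rng.normal(scale=0.2, size=200),
    "section_c_views": rng.normal(size=200),
})
result = top_correlated_pairs(demo, top_n=3)
print(result)
```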

Hierarchical Clustering

A dendrogram showing hierarchical relationships between investment proposal features and identifying natural content groupings.

dendrogram.py
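import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import ward, dendrogram
from scipy.spatial.distance import squareform

# Calculate correlation and distance matrices (numeric columns only)
correlation_matrix = dataset.corr(method='pearson', numeric_only=True)
distance_matrix = 1 - correlation_matrix

# ward() treats a square 2-D input as raw observations, so convert the
# distance matrix to condensed form first
condensed_distances = squareform(distance_matrix.values, checks=False)

# Create linkage matrix using Ward's method
linkage_matrix = ward(condensed_distances)

plt.figure(figsize=(24, 13))
dendrogram(linkage_matrix, labels=correlation_matrix.columns.tolist())
plt.show()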
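
To turn the dendrogram into explicit groupings, the linkage matrix can be cut into a fixed number of flat clusters with SciPy's `fcluster`. This is a minimal sketch on made-up data. The column names and the choice of two clusters are assumptions for illustration. Ward's method is formally defined for Euclidean distances, so applying it to `1 - correlation` is a common heuristic rather than an exact use.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, ward
from scipy.spatial.distance import squareform

# Made-up features: two "risk" columns share one signal, two "growth"
# columns share another, so two natural groups exist by construction.
rng = np.random.default_rng(2)
risk_signal = rng.normal(size=300)
growth_signal = rng.normal(size=300)
demo = pd.DataFrame({
    "risk_chart": risk_signal + rng.normal(scale=0.3, size=300),
    "risk_table": risk_signal + rng.normal(scale=0.3, size=300),
    "growth_chart": growth_signal + rng.normal(scale=0.3, size=300),
    "growth_table": growth_signal + rng.normal(scale=0.3, size=300),
})

corr = demo.corr(method="pearson")
distance = 1 - corr

# Convert the square distance matrix to the condensed form ward() expects.
condensed = squareform(distance.to_numpy(), checks=False)
linkage_matrix = ward(condensed)

# Cut the tree so that at most two flat clusters remain.
cluster_ids = fcluster(linkage_matrix, t=2, criterion="maxclust")
groups = pd.Series(cluster_ids, index=corr.columns, name="cluster")
print(groups)
```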

Key Insights

These visualizations revealed important correlations between proposal sections and client engagement, with "Risk vs Return" and "Hypothetical Growth" sections showing strong client interest.

Key Finding #1

Historical performance sections had the strongest correlation with proposal acceptance.

Key Finding #2

Risk visualizations clustered separately from growth projections.

Key Finding #3

Customized sections showed higher engagement than templates.

Financial Analysis

Automated financial modeling, break-even analysis, and report generation using Python.

Bakery Financial Analysis Case Study

This project demonstrates a Python-based system for end-to-end financial analysis of a bakery. It covers data preparation, visualizations of trends and monthly performance, calculations such as break-even points, and automated generation of a PDF executive report.

Key Features & Outcomes:

  • Analyzed 3-year projections: 20.2% revenue growth & 17.5% EBITA increase.
  • Automated monthly break-even calculations for financial planning.
  • Generated professional PDF reports with key metrics and visualizations.
  • Identified seasonal fluctuations impacting revenue and cash flow.
  • Risk assessment: e.g., a 10% sales decrease reduces EBITA by ~13.5%.
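
The monthly break-even calculation mentioned above can be sketched as follows. This is an illustration of the standard formula (fixed costs divided by contribution margin per unit), not the case study's code. The `MonthInputs` structure and every figure in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthInputs:
    """Hypothetical monthly inputs; all figures used below are made up."""
    month: str
    fixed_costs: float             # rent, salaries, insurance
    price_per_unit: float          # average selling price per item
    variable_cost_per_unit: float  # ingredients, packaging


def break_even_units(m: MonthInputs) -> float:
    """Units that must be sold so the contribution margin covers fixed costs."""
    contribution = m.price_per_unit - m.variable_cost_per_unit
    if contribution <= 0:
        raise ValueError(f"{m.month}: price must exceed variable cost per unit")
    return m.fixed_costs / contribution


def break_even_revenue(m: MonthInputs) -> float:
    """Sales revenue at the break-even point."""
    return break_even_units(m) * m.price_per_unit


months = [
    MonthInputs("Jan", fixed_costs=12_000, price_per_unit=4.00,
                variable_cost_per_unit=1.50),
    MonthInputs("Feb", fixed_costs=12_000, price_per_unit=4.00,
                variable_cost_per_unit=1.60),
]
for m in months:
    print(f"{m.month}: {break_even_units(m):,.0f} units, "
          f"{break_even_revenue(m):,.2f} revenue")
```

A higher variable cost shrinks the contribution margin, so the February row needs more units to break even than January with the same fixed costs.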

AG Grid Column Customization

Leveraging Python for dynamic data processing and UI configuration in advanced data grids.

Dynamic AG Grid Configuration

This project uses Python for backend data processing and transformation to configure AG Grid columns dynamically. The UI can adapt to user roles or data characteristics, which improves data visualization and interaction in complex enterprise applications.

Core Functionality:

  • Python scripts to define column visibility, order, and formatting.
  • Transformation of data structures to match AG Grid's expected schema.
  • Enabling features like dynamic grouping, sorting, and filtering based on processed data.
  • Improving user experience by presenting tailored views of large datasets.
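
One way the functionality above could be sketched is a Python function that derives AG Grid column definitions from a pandas DataFrame and hides columns per role. The function name `build_column_defs`, the role-based hidden set, and the demo data are assumptions for illustration. The keys used (`field`, `headerName`, `sortable`, `hide`, `filter`, `type`) are standard AG Grid column properties.

```python
import json

import pandas as pd


def build_column_defs(df: pd.DataFrame, hidden_for_role=frozenset()) -> list:
    """Build AG Grid columnDefs from DataFrame dtypes (hypothetical helper)."""
    column_defs = []
    for name, dtype in df.dtypes.items():
        col = {
            "field": str(name),
            "headerName": str(name).replace("_", " ").title(),
            "sortable": True,
            # Hide columns the current role is not meant to see.
            "hide": name in hidden_for_role,
        }
        # Pick a built-in AG Grid filter that matches the column's data type.
        if pd.api.types.is_numeric_dtype(dtype):
            col["filter"] = "agNumberColumnFilter"
            col["type"] = "numericColumn"
        elif pd.api.types.is_datetime64_any_dtype(dtype):
            col["filter"] = "agDateColumnFilter"
        else:
            col["filter"] = "agTextColumnFilter"
        column_defs.append(col)
    return column_defs


# Made-up data standing in for a processed dataset.
demo = pd.DataFrame({
    "account_name": ["A", "B"],
    "market_value": [1200.5, 830.0],
    "opened_on": pd.to_datetime(["2021-01-05", "2022-07-19"]),
})
defs = build_column_defs(demo, hidden_for_role={"market_value"})

# to_json handles dates; json.loads turns the result back into plain records.
payload = {
    "columnDefs": defs,
    "rowData": json.loads(demo.to_json(orient="records", date_format="iso")),
}
print(json.dumps(payload, indent=2))
```

The resulting payload is plain JSON, so a frontend could pass `columnDefs` and `rowData` to the grid options without further transformation.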