Projects
My Data Science Projects
Welcome to my projects page! Here, you can find a list of data science projects that I’ve worked on, showcasing my skills in data analysis, machine learning, and more. Each project is accompanied by a brief description, key technologies used, and a link to the GitHub repository for more details.
Sentiment Analysis: US Airline Tweets
Duration: October 2023 – December 2023
Project Description:
In this project, we conducted a sentiment analysis of tweets about U.S. domestic airlines using a range of machine learning and natural language processing models. The objective was to classify each tweet into a sentiment class to help airlines understand customer feedback. Our approach included traditional models, namely Support Vector Machines (SVM) and multinomial logistic regression, fitted both with and without regularization (LASSO and Elastic Net), as well as deep learning models: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Bidirectional Encoder Representations from Transformers (BERT). While each model offered useful insights, the fine-tuned BERT model stood out with an accuracy of 90%, handling the noisy language of social media better than the other models. The project shows how a range of modeling approaches can be applied in practice to help businesses draw actionable insights from customer opinions expressed online.
Features and Technologies Used:
- Python
- Pandas
- Scikit-Learn
- TensorFlow
- NLP
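As a rough illustration of the regularized multinomial regression mentioned above, the sketch below fits a bag-of-words softmax classifier with an Elastic Net penalty by proximal gradient descent. It is a minimal, standard-library-only sketch and not the project's code: the toy tweets, the function names, and the values of `alpha`, `l1_ratio`, `lr`, and `epochs` are all assumptions made for illustration. In practice a library such as Scikit-Learn would likely be used instead.

```python
import math
from collections import Counter

# Toy tweets invented for illustration; they are not the project's data.
TRAIN = [
    ("great flight and friendly crew", "positive"),
    ("thanks for the smooth flight", "positive"),
    ("flight delayed again terrible service", "negative"),
    ("lost my bag worst airline", "negative"),
    ("what time does boarding start", "neutral"),
    ("is there wifi on this flight", "neutral"),
]
CLASSES = ["negative", "neutral", "positive"]
VOCAB = sorted({word for text, _ in TRAIN for word in text.split()})


def featurize(text):
    """Bag-of-words count vector over the training vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score of feature vector x for each class."""
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(b))]


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit(X, y, n_classes, lr=0.2, epochs=500, alpha=0.01, l1_ratio=0.5):
    """Softmax regression with an Elastic Net penalty (hypothetical settings)."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            probs = softmax(class_scores(W, b, xi))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                gb[k] += err / n
                for j in range(d):
                    gW[k][j] += err * xi[j] / n
        for k in range(n_classes):
            b[k] -= lr * gb[k]  # the intercept is not penalized
            for j in range(d):
                # Gradient step on the loss plus the ridge part of the penalty,
                # then soft-thresholding for the lasso part.
                ridge_grad = alpha * (1.0 - l1_ratio) * W[k][j]
                stepped = W[k][j] - lr * (gW[k][j] + ridge_grad)
                W[k][j] = soft_threshold(stepped, lr * alpha * l1_ratio)
    return W, b


def predict(W, b, text):
    """Return the class label with the highest score."""
    scores = class_scores(W, b, featurize(text))
    return CLASSES[scores.index(max(scores))]


X = [featurize(text) for text, _ in TRAIN]
y = [CLASSES.index(label) for _, label in TRAIN]
W, b = fit(X, y, len(CLASSES))
print(predict(W, b, "friendly crew thanks"))
```

Setting `l1_ratio=1.0` gives a pure LASSO penalty and `alpha=0.0` removes regularization, which mirrors the "with and without regularization" comparison described above.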
Modeling Contributing Factors of Unreported Crimes for Victims in the United States
Duration: February 2023 – March 2023
Abstract:
The 2020 National Crime Victimization Survey (NCVS) provides information about crime victimization across the United States. After data cleaning and re-encoding, logistic regression is used to model the determinants of whether a crime is reported to the police. Attempts to expand the model with interaction terms and a generalized partially linear additive model lead back to a model with a few predictors as main effects.
Features and Technologies Used:
- R
- ggplot2/plotly
- tidyverse
- Logistic Regression
- Classification
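To make the idea of a main-effects logistic regression concrete, here is a minimal Python sketch (the project itself used R). It fits a model with an intercept and one binary predictor by gradient ascent on the mean log-likelihood. The predictor name `violent`, the counts, and the function names are hypothetical and are not taken from the NCVS data.

```python
import math

# Hypothetical toy counts, not NCVS values: (violent, reported) -> number of cases.
COUNTS = {(1, 1): 30, (1, 0): 20, (0, 1): 15, (0, 0): 35}


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(counts, lr=0.5, iters=5000):
    """Fit reported ~ intercept + violent by gradient ascent."""
    n = sum(counts.values())
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for (x, y), c in counts.items():
            resid = y - sigmoid(b0 + b1 * x)  # observed minus fitted probability
            g0 += c * resid / n
            g1 += c * resid * x / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


b0, b1 = fit_logistic(COUNTS)
print(f"intercept={b0:.3f}, slope={b1:.3f}, odds ratio={math.exp(b1):.2f}")
```

With a single binary predictor, the fitted slope equals the log odds ratio of the 2x2 table, which gives a simple check on the fit.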
Modeling the Count of Rental Bikes in the Greater Washington D.C. Metropolitan Area
Duration: January 2023 – February 2023
Abstract:
Modern bike-sharing services provide data on daily usage, including the number of bikes rented each day, which this analysis attempts to model accurately. Data from the UCI Machine Learning Repository is cleaned, visualized, and used as the foundation for multiple count regression models. Model reduction and variable encoding lead to a final set of predictors that best describes the relationship between weather conditions, time of year, season, and the number of bicycles rented on a given day.
Features and Technologies Used:
- R
- ggplot2/plotly
- Poisson Regression
- Negative Binomial Regression
- Likelihood Ratio Tests
- Count Data
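The likelihood ratio tests used for model reduction can be sketched as follows. This is a minimal Python illustration (the project itself used R) that compares an intercept-only Poisson model with one that adds a single binary predictor. The counts, the "working day" grouping, and the function names are hypothetical and are not drawn from the UCI data.

```python
import math

# Hypothetical daily rental counts, split by a made-up binary predictor
# such as "working day" (0 = no, 1 = yes).
COUNTS_BY_GROUP = {0: [12, 15, 11, 14, 13], 1: [20, 22, 19, 24, 21]}


def poisson_loglik(ys, mu):
    """Poisson log-likelihood of counts ys under a common mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)


def likelihood_ratio_test(groups):
    """Compare intercept-only and one-predictor Poisson models."""
    all_counts = [y for ys in groups.values() for y in ys]
    # Null model: intercept only, so the fitted mean is the overall mean.
    ll_null = poisson_loglik(all_counts, sum(all_counts) / len(all_counts))
    # Full model: with one binary predictor, the fitted means are the group means.
    ll_full = sum(poisson_loglik(ys, sum(ys) / len(ys)) for ys in groups.values())
    stat = 2.0 * (ll_full - ll_null)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


stat, p_value = likelihood_ratio_test(COUNTS_BY_GROUP)
print(f"LR statistic={stat:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the extra predictor improves the fit. The same statistic, twice the difference in log-likelihoods, applies when comparing larger nested models, with degrees of freedom equal to the number of added parameters.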
View Full Report | View Poster
Get In Touch
Interested in collaborating on a project or have questions? Feel free to contact me.