March 20, 2024
This project analyzes credit card fraud using a dataset of more than 1.8 million transactions.
The primary goal is to uncover patterns that can help identify and prevent fraudulent activity in the banking sector.
Combining exploratory data analysis (EDA), statistical analysis, and data visualization in SAS, the project surfaces valuable insights into the factors that influence fraud.
This project showcases the application of Natural Language Processing (NLP) and Machine Learning to classify text
messages as spam or ham (non-spam). Using the SMS Spam Collection Dataset from the UCI Machine Learning Repository, I
implemented a Random Forest Classifier to detect spam messages effectively. The process involved data preprocessing with
NLTK, feature engineering, and vectorization using TF-IDF to convert text into numerical data. Multiple models were
evaluated using hyperparameter tuning and cross-validation, with the final model achieving an impressive 99% accuracy.
Click below to explore how NLP techniques and machine learning models can be leveraged to improve spam detection.
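The pipeline described above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the project's actual code: the toy messages stand in for the SMS Spam Collection Dataset, and the hyperparameters shown are placeholder defaults rather than the tuned values.

```python
# Minimal sketch of a TF-IDF + Random Forest spam classifier.
# The toy messages below are hypothetical stand-ins for the
# SMS Spam Collection Dataset from the UCI repository.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

messages = [
    "WINNER!! Claim your free prize now",
    "Hey, are we still on for lunch today?",
    "URGENT: your account was selected for a cash reward",
    "Can you send me the meeting notes?",
]
labels = ["spam", "ham", "spam", "ham"]

# Chain vectorization and classification so the same TF-IDF
# transform is applied consistently at fit and predict time.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipeline.fit(messages, labels)

print(pipeline.predict(["Free cash prize, claim now!"])[0])
```

In the full project, the `Pipeline` would be wrapped in `GridSearchCV` for the hyperparameter tuning and cross-validation mentioned above.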
In this project, I leverage the power of PySpark to process and analyze large datasets efficiently. From Extract,
Transform, and Load (ETL) operations to SQL-based business analysis, this project showcases how big data tools can
power data-driven decision-making. Using the PySpark DataFrame API, I perform data cleaning, aggregation, and visualization,
uncovering key business metrics through histograms, boxplots, bar charts, and pair plots with Seaborn and Matplotlib.
Additionally, I implement machine learning models, including Linear Regression, Decision Trees, and Multilayer
Perceptron Classification, to predict business trends and optimize operations. Dive in to explore how big data analytics
can extract actionable insights and enhance predictive capabilities!
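The clean-then-aggregate workflow above follows the usual DataFrame pattern. As a self-contained sketch it is shown here in pandas (so it runs without a Spark cluster); the equivalent PySpark calls are noted in the comments, and the column names (`region`, `sales`) are hypothetical.

```python
# Sketch of a clean -> aggregate workflow on a small hypothetical
# dataset. The original project uses the PySpark DataFrame API;
# pandas is used here so the example is self-contained.
import pandas as pd

raw = pd.DataFrame({
    "region": ["East", "East", "West", "West", None],
    "sales":  [120.0, 80.0, 200.0, None, 50.0],
})

# Cleaning: drop rows missing a grouping key or a measure
# (df.dropna(subset=[...]) in PySpark as well).
clean = raw.dropna(subset=["region", "sales"])

# Aggregation: total and average sales per region
# (df.groupBy("region").agg(...) in PySpark).
summary = (clean.groupby("region")["sales"]
                .agg(total="sum", average="mean")
                .reset_index())
print(summary)
```

The resulting summary table is what feeds the bar charts and boxplots built with Seaborn and Matplotlib.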
This project showcases my proficiency in statistical analysis and data visualization using R, highlighting key
techniques such as descriptive statistics, hypothesis testing, and exploratory data analysis. Through a series of R
Markdown code chunks, I demonstrate how to analyze datasets, create histograms, boxplots, and empirical cumulative
distribution functions (ECDFs), and perform normality tests like the Shapiro-Wilk test. The project also includes an
ANOVA test to compare means across different groups, illustrating my ability to derive meaningful insights from data. By
exploring real-world examples—such as calorie consumption across months and tree measurements—I provide a clear,
step-by-step guide to handling and interpreting data programmatically. Click the link to dive into the full analysis and
see how R can transform raw data into actionable knowledge!
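The two named tests, Shapiro-Wilk for normality and one-way ANOVA for comparing group means, have direct SciPy equivalents. The analysis itself is written in R (`shapiro.test()`, `aov()`); this sketch uses hypothetical calorie data simply to show what those tests compute.

```python
# Sketch of the Shapiro-Wilk and one-way ANOVA tests on hypothetical
# monthly calorie data; the R project uses shapiro.test() and aov().
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
jan = rng.normal(2000, 150, 30)  # hypothetical daily calories, January
feb = rng.normal(2100, 150, 30)
mar = rng.normal(2300, 150, 30)

# Shapiro-Wilk: H0 = the sample is drawn from a normal distribution.
w, p_norm = stats.shapiro(jan)

# One-way ANOVA: H0 = all group means are equal.
f, p_anova = stats.f_oneway(jan, feb, mar)

print(f"Shapiro-Wilk p = {p_norm:.3f}, ANOVA p = {p_anova:.4f}")
```

With group means this far apart relative to their spread, the ANOVA p-value is small and the null hypothesis of equal means is rejected, mirroring the calorie-consumption comparison in the write-up.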