Spencer Prentiss

Data Science Professional

"We’ve all been there, in a group project where 2 people wind up doing most of the work. I’m one of the 2."

Download Resume

About Me

I'm currently a freelance data science professional. I obtained two B.S. degrees from Purdue University, in Applied Statistics and Data Science. Boiler Up! When not working, you can usually catch me at the gym, watching sports, or playing trivia games. I love working with great people, getting creative with data, using state-of-the-art technologies, and building awesome products.



Interested in working together or having a chat? Feel free to contact me.

Experience

Data Analyst Consultant



Leveraged expertise in data analysis, statistical theory, and machine learning to drive strategic decision-making and achieve impactful results across diverse organizations. Published work featured in Delen, D., Sharda, R., and Turban, E.'s "Business Intelligence, Analytics, Data Science, and AI: A Managerial Perspective 5th ed." (pp. 8-12).



Baseball Athlete Dashboard

Data-Driven Optimization of High School Football Strategies

For more specifics take a look at my resume.

Purdue University

Teaching Assistant

As a Teaching Assistant (TA), I had the privilege of teaching computer science freshmen as part of a summer program, providing them with a solid foundation in programming languages such as Java and Python, along with practical knowledge of GitHub and Unix environments. In addition to this, I designed the programming curriculum for a political science research class. During my employment, I worked with 20 other TAs to effectively address student inquiries and assist in the preparation of study materials for study sessions and exam reviews. This role allowed me to contribute to the education of over 200 students and provided me with valuable teaching and collaborative skills.



View Example Assignment

Education

Purdue University

Aug 2019 - Dec 2022

Bachelor of Science in Data Science, Bachelor of Science in Applied Statistics

GPA: 3.68

Relevant Coursework: Data Mining and Machine Learning, Information Systems, Theoretical Statistics, Probability, Intro to Time Series, Intro to AI, Large Scale Data Analysis, Applied Regression Analysis

Projects

Political Science Research Project

As the project lead, I headed a 6-person team investigating the complexity and polarity of judicial opinions. It was my first end-to-end data science project, involving scraping, cleaning, and analyzing 10GB of data from 12 appellate courts. Utilizing advanced NLP techniques, such as sentiment analysis and LDA topic modeling, we derived metrics for complexity, polarity, and subjectivity. Collaborating with a professor, we successfully integrated our findings and methodology into ongoing research. Despite challenges with messy real-world data, we designed and implemented the entire ETL pipeline. Through presentations and discussions, we effectively conveyed our work, covering all phases of the data science process.

View Project   View Code

Restaurant Management Web App

In close collaboration with a team of four members, I contributed to the development of a restaurant management web application. Leveraging the Django web app framework, we successfully integrated a custom-built SQLite database. I took on the responsibility of creating complex stored procedures and triggers to ensure efficient updating, deleting, and adding of application data. Despite it being my first time working on a web app and using Django, I successfully created the data, schema, ERDs, triggers, and stored procedures from scratch.

View Code

Chess AI Project

The objective of this project was to design and implement a competitive chess AI as part of YouTuber Sebastian Lague’s Chess Contest. The contest provided the user interface (UI) and an API for interacting with the chess engine. The goal was to create a challenging AI opponent while using a limited amount of code and memory, a constraint that ruled out standard chess AI techniques such as large transposition tables. For this project, I used C# to implement the Negamax algorithm with alpha-beta pruning. I also used optimization techniques like move ordering, iterative deepening, and quiescence search (qsearch) to improve speed. Overall, the chess AI achieved an Elo rating between 1800 and 2100, which at the upper end would rate it as a USCF Expert and fully exceeded my expectations. This project improved my AI development and problem-solving skills, and I look forward to improving it in the future.
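To illustrate the core search idea, here is a minimal sketch of Negamax with alpha-beta pruning. It is written in Python rather than the C# used for the contest entry, and it plays a toy take-away game (remove 1 to 3 stones, whoever takes the last stone wins) instead of chess. The function names and the game itself are assumptions chosen for illustration and are not taken from the project code.

```python
def negamax(stones, alpha=-1, beta=1):
    """Score the position for the side to move: +1 = win, -1 = loss."""
    if stones == 0:
        # The opponent just took the last stone, so the side to move has lost.
        return -1
    best = -1
    # Simple move ordering: try the largest removals first.
    for take in (3, 2, 1):
        if take > stones:
            continue
        # Negamax identity: my score is the negation of the opponent's score,
        # searched with the negated and swapped (alpha, beta) window.
        score = -negamax(stones - take, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # Alpha-beta cutoff: the remaining moves cannot change the result.
    return best


def best_move(stones):
    """Return the number of stones to take that maximizes the negamax score."""
    best_take, best_score = None, -2
    for take in (1, 2, 3):
        if take > stones:
            continue
        score = -negamax(stones - take)
        if score > best_score:
            best_take, best_score = take, score
    return best_take
```

In this toy game the side to move loses exactly when the pile size is a multiple of 4, which makes the search easy to check by hand. A chess engine would replace the win/loss score with an evaluation function and a depth limit.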

View Code

Animal Fight Simulator

The Animal Fight Simulator is a Python program that aims to answer the age-old question of which animals would emerge victorious in head-to-head matchups. Through turn-based simulation, the program brings together meticulously crafted stats and tendencies for 20 animals, derived from extensive research into attack patterns, strength, environments, and other relevant variables to predict a winner. The Animal Fight Simulator makes use of the tkinter library, enabling the creation of an interactive user interface. This project provides a fun and relatively scientific way to predict the outcome of animal duels.

View Code
View More Projects

Ranking the Top 75 NBA Players of All Time

In this data-driven project, I attempted to rank the top 75 NBA players of all time. Using Python, advanced statistical analysis, and web scraping techniques, I collected and analyzed data on player accolades, achievements, and gameplay statistics. By calculating individual scores for peak performance and career contributions, I identified the top players. This project was a great way to hone my skills in data visualization, data manipulation, and data cleaning. It provides a fun and statistical way to look at and contribute to the debate over who is basketball's GOAT.

View Project

Loan Default Model

A Random Forest classifier that predicts whether someone will default on a loan. The model utilizes Python's pandas, scikit-learn, and imbalanced-learn libraries for data manipulation, model training, and evaluation. After preprocessing the dataset and handling missing values, I performed feature engineering by encoding categorical variables. Additionally, I employed Random Under-Sampling to address class imbalance. The model achieved a fairly high accuracy, ROC-AUC score, and F1 score, demonstrating its effectiveness. This project was a great way to improve my machine learning and data modeling skills.
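The idea behind random under-sampling can be sketched with the standard library alone. This is not the imbalanced-learn API used in the project; `random_undersample` and its parameters are hypothetical names for illustration. The sketch randomly discards rows from the larger classes until every class has as many rows as the smallest one.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps only as many as the rarest class."""
    # Group row indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is reduced to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    rng = random.Random(seed)  # Seeded so the sample is reproducible.
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    kept.sort()  # Preserve the original row order.

    return [rows[i] for i in kept], [labels[i] for i in kept]
```

For example, with 90 non-default rows and 10 default rows, the result keeps all 10 defaults and a random 10 of the non-defaults. Under-sampling should be applied to the training split only, so that the test set keeps the real class balance.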

View Project

External Sorting

I implemented an efficient external merge sort algorithm in Python to handle large files that exceed available memory. To do so, I split the 8GB input file into smaller chunks, applied internal sorting (merge sort) on each chunk, and then merged the sorted chunks using a min-heap to obtain the final sorted output. The project allowed me to increase my skills in memory management, file I/O optimization, and algorithm design. I successfully tackled the real-world challenge of sorting large datasets and provided a scalable solution for handling data processing tasks.
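The split, sort, and merge steps described above can be sketched as follows. This is a simplified illustration rather than the project's code: it assumes a newline-delimited text file, uses Python's built-in sort for each chunk in place of a hand-written merge sort, and relies on `heapq.merge` for the min-heap based k-way merge. The name `external_sort` and the `max_lines_per_chunk` parameter are hypothetical.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(input_path, output_path, max_lines_per_chunk=100_000):
    """Sort the lines of a text file while holding only one chunk in memory."""
    chunk_paths = []
    try:
        # Phase 1: read a bounded chunk, sort it in memory, spill it to disk.
        with open(input_path, "r", encoding="utf-8") as src:
            while True:
                chunk = list(itertools.islice(src, max_lines_per_chunk))
                if not chunk:
                    break
                # Ensure the file's last line also ends with a newline.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files.
        # heapq.merge keeps one line per chunk in a min-heap and streams the rest.
        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(output_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files whether or not the sort succeeded.
        for p in chunk_paths:
            os.remove(p)
```

Peak memory is governed by `max_lines_per_chunk` during the first phase and by the number of chunks during the merge, since only one pending line per chunk sits in the heap at a time.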

View Project

Skills

Programs/Languages

  • Python
  • R
  • Excel
  • Java
  • Tableau
  • SQL/MySQL
  • SAS

Libraries

  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • BeautifulSoup

Machine Learning

  • NLP (Sentiment Analysis, Topic Modeling, Text Classification)
  • Supervised Learning (Logistic Regression, Random Forest, XGBoost, KNN, SVM)
  • Unsupervised Learning (K-means, SVD, PCA)

Get in Touch