Here I showcase all the projects I have worked on over the past two years while completing my master’s degree. My specialization is in Information Systems and Design, so many of the projects listed here deal with documenting, analyzing and recommending changes to organizational processes. I have also taken a number of Data Science courses, including Machine Learning and Experimental Design courses. Links to the code and the reports are given below.
Some of these reports examine the processes of real organizations, whose names have been anonymized to preserve their privacy. Many of the links lead to my GitHub page; I think following them is easier than clicking around my GitHub repositories at random.
This page is updated frequently!
The projects are listed in reverse chronological order (most recent first), each with a short description and a preview link.
- Click here to see details about my Email Mining Research Project (ongoing)
Application of Machine Learning Algorithms to NASDAQ and NYSE Stock and Pharmaceutical Prices Datasets from Kaggle
This was a final group project for a Machine Learning in R course. We tried to predict whether the price of a given stock would increase or decrease, using a labelled Kaggle dataset that indicated a year-end increase or decrease. The dataset had more than 200 financial indicators for more than 4,000 companies, spanning 4 years. We also used weekly pharmaceutical sales data (covering 4 years) and attempted to extract insights about the seasonality of certain drugs. Unfortunately, because time-series analysis was not covered in this course, our analysis of the pharmaceutical sales dataset is not entirely accurate. Still, the assignment allowed us to think about interesting questions and to apply the techniques we learned in class to a real-world problem.
- Click here for the code I wrote for my part of the experimentation.
- Click here for the code written by all members of my team, including my own.
- Click here for the PDF of the written report, which contains the graphs and our analysis of the training results. Links to the datasets are provided in the report’s reference section.
Application of Statistical Experimental Methods to the Analysis of Social Phenomena
The projects displayed here were done as part of an Experimental Design in Data Science course. All of them were done in R; some I completed individually and some as part of a group.
- Exploratory Analysis on the Number of Playgrounds in Toronto by Property Type: This project covers basic concepts of Exploratory Data Analysis (EDA) in R, such as loading data into R, giving a brief description of the dataset and creating graphs to explain specific variables. This was my teammates’ and my first time using R for data analysis.
- Analysis of Red Light Camera Distribution in Toronto: This project covers a more detailed EDA on a red-light camera dataset available through the Open Data Toronto portal. Some data cleaning was needed to extract the hood number from the neighbourhood names. The analysis was mostly based on the count of red-light cameras in each Toronto neighbourhood, and the ethics of the high number of red-light cameras in low-income neighbourhoods is also discussed. This project was completed individually.
- An Analysis on Income and Education Through the Lens of Visible Minority in Toronto: This project uses the Education Level and Toronto Demographics datasets from Open Data Toronto. My team members and I focused on creating charts and graphs to better depict our results, and using the ggplot2 package was the main focus of the project. We performed a correlation analysis of demographic variables such as income, visible minority status and education across Toronto neighbourhoods. We also discuss the ethics of this kind of experimentation and how easily the results can be manipulated to skew them in a certain direction.
- Does Newspaper Price Affect Readership? An Analysis of French Newspaper Readership After the Introduction of Television Advertisement: This project was a replication of Angelucci, C., and Cagé, J. (2019). Newspapers in Times of Low Advertising Revenues. The statistical design used in this paper is difference-in-differences. The authors released their dataset, and my team members and I attempted to replicate their results. Our findings differed from the authors’: we found that the introduction of television advertising may not have been the sole cause of rising newspaper prices and declining readership.
- Divides in Development: An Analysis on the Impact of Modern Western Colonialism on the Economies of Former Colonies: This individually completed project was another replication experiment. The authors used instrumental variables as the experimental design method to analyze the effects of colonization on countries today, with settler mortality rates as the instrument. The argument is that colonies with higher settler mortality did not develop strong institutions to govern people, and the political conflict that resulted continues to impact the development of those former colonies to this day.
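The hood-number cleaning step from the red-light camera project could look roughly like the following Python snippet. It is a minimal example, not the original R code. It assumes each neighbourhood label ends with the hood number in parentheses, as in the hypothetical label "Example Neighbourhood (42)". The helper names `split_hood` and `count_by_hood` are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed label format: "<neighbourhood name> (<hood number>)".
HOOD_PATTERN = re.compile(r"^(?P<name>.*?)\s*\((?P<hood>\d+)\)\s*$")


def split_hood(label: str) -> Tuple[str, Optional[int]]:
    """Split a neighbourhood label into (name, hood number).

    Returns (stripped label, None) when no trailing "(number)" is present.
    """
    match = HOOD_PATTERN.match(label)
    if match is None:
        return label.strip(), None
    return match.group("name").strip(), int(match.group("hood"))


def count_by_hood(labels: Iterable[str]) -> Counter:
    """Count records (e.g. one per camera) per hood number, skipping unparsed labels."""
    return Counter(hood for _, hood in map(split_hood, labels) if hood is not None)


if __name__ == "__main__":
    print(split_hood("Example Neighbourhood (42)"))
    print(count_by_hood(["A (1)", "B (2)", "A (1)", "C"]))
```

Keying the counts on the numeric hood ID rather than the name makes it easier to join with other Open Data Toronto tables that share the same neighbourhood IDs, assuming those tables use the same numbering.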
Proof of Concept of an On-Campus Food App
The following report contains the requirements, specifications and systems architecture needed for the app to be functional. It was written in collaboration with four other master’s students as part of the Systems Requirements and Architectural Design course at the University of Toronto’s Faculty of Information.
The purpose of designing this app was to give students a way to purchase meals without leaving their places of study. When the app was being designed, apps such as UberEats, DoorDash and SkipTheDishes did not include on-campus food vendors.
Please see the report here
Entity Relationship Diagrams of a Hockey Statistics Database and Sample Queries Written to Extract Information from It
Together with my two teammates, I scraped a hockey statistics website and used the information to create a mini database of tables. We then wrote and tested MySQL queries over these tables to extract insights about specific team and player traits.
- Click here to view the database design proposal and here to view the Entity Relationship Diagram (ERD) of the database
- Click here to view the MySQL queries
- Click here to view the results of the queries
- Click here to view the code (DDL) used to create and populate the database tables
Request for Proposal for an Information System to Better Serve Post-Secondary Students
Report on Required Systems Innovation and Transformation of a Foreign Government Agency
Report on the Business Process of the Help Desk at an Ontario University’s Help Center for International Students
In this report, I give a brief overview of the center and its services and highlight areas of the current business process that would benefit from automation. I also discuss which parts of the process could be redesigned to handle a larger influx of students, and what an overall transformation of the current process might look like.
Click here to view the report and the BPMN models