Here I showcase the projects I worked on while completing my master’s degree. My specialization is in Information Systems and Design, so many of the projects listed here deal with documenting, analyzing and recommending changes to organizational processes. I also took a number of Data Science courses, including courses in machine learning and experimental design. My research work with process mining and natural language processing sits at the intersection of Data Science and Information Systems Design. Links to the code and the reports are given below.
Some of these reports cover the organizational processes of real organizations, whose names have been anonymized to preserve privacy. Many of the links lead to my GitHub page, which I think is easier than clicking around my GitHub repos at random.
The projects are listed in reverse chronological order (most recent first), each with a short description and a preview link.
During my master’s, I was involved in a research project focused on deriving intent from unstructured data with Dr. Eric Yu, Dr. Arik Senderovich, Dr. Zia Babar and Dr. Alexei Lapouchnian, in collaboration with IBM CAS Canada.
- Information on the project can be found here
- A brief video explaining our project can be found here
- Click here to see where our research is being implemented.
I finished off the project by presenting a prototype of an intent classification pipeline at IBM CASCON-EVOKE 2021!
- I took two reading courses during my master’s to learn more about process mining and email mining. Click here to see details about my Email Mining Research Project (April 2020-2021).
Applied Regression Analysis
- Here I simulated data and used it to test how the estimates of the population mean and variance behave as sample size, correlation and the number of times a population is sampled increase. Click here to see the analysis. Click here to see the code.
- In this project I analyzed a subset of the 2011-2012 NHANES dataset to find the best predictors of blood pressure. Variable selection and shrinkage methods such as the Akaike Information Criterion and Lasso were used to narrow down the list of 16 predictors. After checking the data for multicollinearity and influential points, running the variable selection methods and cross-validating all the models, the model with age and gender as predictors returned the lowest prediction score. Based on this result, an interaction model was also designed to capture the effect of smoking on blood pressure. The results of this model indicated that, regardless of gender, smoking decreases blood pressure as age increases. Click here to see the R Markdown file containing all the code. Click here to see a short presentation about this project.
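To give a flavour of the simulation idea in the first project above, here is a minimal Python sketch. It is an illustration only and not the R code linked above: the normal population, the parameter values and the function name `simulate_sample_means` are all assumptions made for this example.

```python
import random
import statistics


def simulate_sample_means(sample_size, n_repeats, mu=0.0, sigma=1.0, seed=0):
    """Return the sample means of n_repeats samples drawn from N(mu, sigma^2).

    The population and its parameters are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


if __name__ == "__main__":
    # Compare how tightly the estimates cluster for growing sample sizes.
    for n in (25, 100, 400):
        means = simulate_sample_means(sample_size=n, n_repeats=500)
        print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

The spread of the sample means should shrink roughly in proportion to sigma / sqrt(n) as the sample size grows, which is the kind of behaviour the simulation was meant to check.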
Application of Machine Learning Algorithms on NASDAQ and NYSE Stocks and Pharmaceutical prices datasets from Kaggle
This was a final group project for a Machine Learning in R course. We tried to predict whether the price of a given stock would increase or decrease, based on a labelled dataset (which indicated year-end increase or decrease) available on Kaggle. This dataset had over 200 financial indicators for over 4000 companies spanning 4 years. We also used weekly pharmaceutical sales data (over a period of 4 years) and attempted to extract insights about the seasonality of certain drugs. Unfortunately, as time series analysis was not covered in this course, the analysis we obtained from the pharmaceutical sales dataset is not fully reliable, but this assignment allowed us to think about interesting questions and apply the techniques we learned in class to a real-world problem.
- Click here for the code I wrote for my part of the experimentation.
- Click here for the code written by all members of my team. It includes my own code.
- Click here for the PDF file of the written report, which contains the graphs and our analysis of the training results. Links to the datasets are provided in the reference section of the report.
Application of Statistical Experimental Methods for Analysis of Social Phenomena
The projects displayed here were done as part of an Experimental Design in Data Science course. All the projects were done in R; some I completed individually and others as part of a group.
- Exploratory Analysis on the Number of Playgrounds in Toronto by Property Type: This project covers basic concepts of Exploratory Data Analysis (EDA) in R. It includes things like loading data into R, giving a brief dataset description and creating graphs to explain specific variables. This was my teammates’ and my first time using R for data analysis.
- Analysis of Red Light Camera Distribution in Toronto: This project covers a more detailed EDA of a red light camera dataset available through the Open Data Toronto portal. A little data cleaning was necessary to extract the hood number from the neighbourhood names. The analysis was mostly based on the count of red-light cameras in neighbourhoods across Toronto. The ethics of the high number of red-light cameras in low-income neighbourhoods is discussed. This project was completed individually.
- An Analysis on Income and Education Through the Lens of Visible Minority in Toronto: This project uses the Education level and Toronto Demographics datasets from Open Data Toronto. My team members and I focused on creating charts and graphs to better depict our results, with the ggplot2 package as the main tool. We performed a correlation analysis of demographic variables such as income, visible minority status and education in Toronto neighbourhoods. We also discuss the ethics of this kind of analysis and how easily results can be skewed in a particular direction.
- Does Newspaper Price Affect Readership? An Analysis of French Newspaper Readership After the Introduction of Television Advertisement: This project was a replication experiment of Angelucci, C., and Cagé, J. (2019). Newspapers in Times of Low Advertising Revenues. The statistical design used in this paper is difference in differences. The authors released their dataset, and my team members and I attempted to replicate their results. Our findings differed from the authors’, as we found that the introduction of television ads may not have been the sole cause of increasing newspaper prices and decreasing readership.
- Divides in Development - An Analysis on the Impact of Modern Western Colonialism on the Economies of Former Colonies: This individually completed project was another replication experiment. The authors used instrumental variables as the experimental design method to analyze the effects of colonization on countries today, with settler mortality rates as the instrument. The idea was that colonies with higher settler mortality did not develop strong governmental institutions, and that the resulting political conflict continues to affect the development of those colonies to this day.
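Difference in differences, the design used in the newspaper replication above, compares the change over time in a treated group with the change in a control group. The Python sketch below shows the basic arithmetic only. It is not the code from the project (which was written in R), the function name `diff_in_diff` is hypothetical, and the numbers are made-up toy values rather than data from the paper.

```python
from statistics import fmean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences estimate from four groups of outcomes.

    Returns (change in treated group) minus (change in control group).
    """
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Toy outcome values, invented purely for illustration.
    estimate = diff_in_diff(
        treated_pre=[10, 12],
        treated_post=[15, 17],
        control_pre=[9, 11],
        control_post=[11, 13],
    )
    print(estimate)
```

The estimate is only meaningful under the usual parallel-trends assumption, namely that the two groups would have changed by the same amount without the treatment.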
Proof of Concept of an On Campus Food App
The following report contains the requirements, specifications and systems architecture required for the app to be functional. This report was written in collaboration with four other master’s students as part of the Systems Requirements and Architectural Design course at the University of Toronto’s Faculty of Information.
The purpose of designing this app was to give students a way to purchase meals without having to leave their places of study. Apps like UberEats, DoorDash and SkipTheDishes did not include on-campus food vendors when the app was being designed.
Please see the report here
Entity Relationship Diagrams of a Hockey Statistics Database and sample queries written to extract information from said database
My two team members and I scraped a hockey stats website and used the information to create a small database of tables. We then wrote and tested MySQL queries against these tables to extract insights about specific team and player traits.
- Click here to view the database design proposal and here to view the Entity Relationship Diagram (ERD) of the database
- Click here to view the MySQL queries
- Click here to view the results of the queries
- Click here to view the code (DDL) to create and populate the database tables used
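As a rough idea of what a query over such tables can look like, here is a small self-contained sketch. It uses Python's built-in `sqlite3` module instead of MySQL so it can run anywhere, and the `team` and `player` tables, their columns and the sample rows are all hypothetical. They are not the schema or data from the project linked above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE team (
            team_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE player (
            player_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            team_id   INTEGER NOT NULL REFERENCES team(team_id),
            goals     INTEGER NOT NULL
        );
        INSERT INTO team VALUES (1, 'Team A'), (2, 'Team B');
        INSERT INTO player VALUES
            (1, 'Player 1', 1, 10),
            (2, 'Player 2', 1, 5),
            (3, 'Player 3', 2, 7);
        """
    )
    return conn


def goals_per_team(conn):
    """Total goals per team, highest first, via a join and a GROUP BY."""
    query = """
        SELECT t.name, SUM(p.goals) AS total_goals
        FROM team AS t
        JOIN player AS p ON p.team_id = t.team_id
        GROUP BY t.team_id, t.name
        ORDER BY total_goals DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(goals_per_team(build_demo_db()))
```

The same join-and-aggregate pattern carries over to MySQL with little change, although data types and DDL details differ between the two systems.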
Request for Proposal for an information system required to better serve Post-Secondary Students
In this Request for Proposal I highlighted the functional and non-functional requirements of an information system that would support International students and International programming for domestic systems. The desired system would help university staff better support students’ academic and extracurricular needs, with an easier appointment booking system to direct students to the appropriate staff member and integrated access to student records. Click here to view the RFP.
Reports on required systems innovation and transformations of a foreign government agency
In these reports, my group and I analyzed the emergency response process of the Crisis Support Decision Branch of the Kuwaiti government. We mapped out the current process as well as suggested improvements to decrease response time. The organizational process was modelled using Business Process Modelling Notation (BPMN), Data Flow Diagrams (DFDs) and i-star goal models.
- The first report (here) contains the As-Is and To-Be processes detailed in BPMN and DFDs, with explanations that follow each diagram.
- The second report (here) analyzes whether the organization would meet its efficiency goals if the suggested changes were implemented. This is done through i-star goal modelling diagrams.
Report on the business process of the help-desk at an Ontario University’s help center for international students
In this report, I give a brief overview of the center and its services and highlight areas within the current business process that would benefit from automation. I also discuss which parts of the process could be innovated to manage a higher influx of students and what an overall transformation of the current process might look like.
Click here to view the report and the BPMN models.