*****All code related to this project can be found here! If you end up using any of my code or the datasets I have uploaded, please let me know!! I would love to know how it was useful for your project.
This is a research project I have been working on since May 2020 under the guidance of my supervisor Dr. Arik Senderovich. I first started working on the project in the form of a reading course and have continued to work on specific aspects of it as group projects with some of my classmates in courses taught by Dr. Senderovich. A very big thank you to Celio Oliveira and Chenyu Fang for their contributions and their role in helping me navigate some of the complex technical aspects related to this project.
I have been working on this project for almost a year now.
I have written a blog post on my interest in Process Mining and the work I accomplished on the project as part of the first reading course here. As part of my reading course, I also presented at the Ethical Innovations for Artificial Intelligence workshop, and the recording of the presentation can be found here.
The reading course was mostly theoretical, as I was new to the field of Process Mining and needed to understand the foundational concepts on which to build the project. I then took a machine learning course, which gave me a chance to gain hands-on experience manipulating a real email dataset. For the course, my teammates and I focused on extracting insights about fraud from the Enron Email Dataset. The report of our findings can be viewed here.
The Business Process Management and Mining course gave us a chance to build on our technical findings and apply the concepts of process mining more broadly to our email dataset. Process Mining requires every event in the dataset to carry a Case ID, an Activity and a Timestamp, but the unique instantiations (cases) of email-driven processes are not automatically labelled. As a first step towards conducting Process Mining on email data, we therefore researched the steps for deriving Case IDs for email processes. The report which presents the research can be viewed here.
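To make the Case ID problem concrete, here is a minimal sketch of turning raw emails into an event log with the three required columns. It is not the method from our report: it assumes a simple heuristic in which emails that share a subject line (after stripping reply and forward prefixes) belong to the same case. The toy emails, the field names, the activity labels and the helper functions `normalize_subject` and `build_event_log` are all hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

# Hypothetical toy emails; the fields are illustrative, not the Enron schema.
emails = [
    {"sender": "a@example.com", "subject": "Invoice 17", "sent": "2001-05-01 09:00"},
    {"sender": "b@example.com", "subject": "RE: Invoice 17", "sent": "2001-05-01 10:30"},
    {"sender": "a@example.com", "subject": "Meeting agenda", "sent": "2001-05-02 08:15"},
    {"sender": "a@example.com", "subject": "Fwd: RE: Invoice 17", "sent": "2001-05-03 14:00"},
]

# One or more leading "RE:", "FW:" or "FWD:" markers, in any letter case.
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes so messages in one thread share a key."""
    return PREFIX.sub("", subject).strip().lower()


def build_event_log(emails):
    """Return events with case_id, activity and timestamp.

    Assumption: one normalized subject line corresponds to one case.
    """
    case_ids = {}
    log = []
    # The timestamps are zero-padded, so sorting the strings sorts by time.
    for mail in sorted(emails, key=lambda m: m["sent"]):
        key = normalize_subject(mail["subject"])
        # Reuse the case ID of a known thread, otherwise assign the next one.
        case_id = case_ids.setdefault(key, len(case_ids) + 1)
        is_follow_up = PREFIX.match(mail["subject"]) is not None
        log.append({
            "case_id": case_id,
            "activity": "reply_or_forward" if is_follow_up else "new_email",
            "timestamp": datetime.strptime(mail["sent"], "%Y-%m-%d %H:%M"),
        })
    return log


event_log = build_event_log(emails)
for event in event_log:
    print(event["case_id"], event["activity"], event["timestamp"])
```

In this toy example the three "Invoice 17" messages end up in one case and "Meeting agenda" in another. Real mailboxes are messier: subjects get reused across unrelated conversations and a single process can span several threads, which is why deriving Case IDs needed its own research step.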
Most recently, I took a second reading course with Dr. Senderovich to delve deeper into advanced topics in Process Mining. I have delivered a talk on the progress of this project for the Toronto Data Workshop, the recording of which can be found here, and I will continue working on this over the Summer 2021 semester.