From Theory to Practice: UCI’s Machine Learning Hackathon Delivers
For decades, UCI has been known for its impressive Machine Learning Repository. With more than 580 publicly accessible datasets, the repository serves as a tremendous resource for empirical and methodological research in machine learning. It also serves as a great learning tool for UCI students, thanks to the second annual ML Hackathon, a student competition funded by the National Science Foundation. The UCI research community donated nine challenge datasets from the repository for the event and the Research Cyberinfrastructure Center (RCIC) provided the computational resources.
“We wanted to create a fun opportunity for students to engage with real projects within the UCI research community,” says Tamanna Hossain, a graduate student in the Donald Bren School of Information and Computer Sciences (ICS) who organized this year’s hackathon. “My favorite part of the event was definitely the enthusiasm and curiosity students brought to the table, especially under the difficult conditions of the ongoing COVID-19 pandemic.”
Ten teams participated in the virtual hackathon, held May 14 through May 23, 2021, and competed for the top prize of $500 for best overall, and for four $250 prizes awarded for technical skills, scientific insights, creativity and best presentation.
Award-Winning Hacks
The top prize went to Hatomugi, an ML model designed to detect fake reviews on Yelp. Developed by data science major Yu Miyauchi, the model leveraged the YelpNYC challenge dataset, which contains 359,052 Yelp reviews for restaurants in New York City. Miyauchi, who graduated in June and is now a data analysis consultant at Deloitte, says “this competition was a great way to simulate my future work.”
Another hack using the YelpNYC data was Aces2Alliance, which won the technical skills award. Data science major Yihan Wang and computer science major Qingchuan Yang applied deep learning algorithms to predict whether an unlabeled Yelp review was real or fake. “I was not expecting to win the award at the end,” says Wang, “but I am grateful to ICS for giving the current students a platform to showcase the skills we have learned.” Yang added that “it was exciting to apply ML theories learned from lectures to practice and see how powerful they are.”
The award for scientific insights went to I-am-a-segmentation-fault, developed by computer science major Andrew Chen using the dataset containing climate change and ecosystem carbon information in California. “I created a regressor that predicts carbon densities in California based on projected temperature and precipitation conditions,” he says. “This event was a great opportunity to use the theoretical machine learning skills that we heard about in class on real-world data sets.”
The most creative hack was two_truths_and_a_lie, developed by computer science majors Rionel Dmello and Richard Lopez. “Given a misconception and tweet pair, our goal was to detect if that tweet propagates the misconception, allowing content moderation on social media.” The students used the COVIDLies dataset (of approximately 6,000 tweet and misconception pairs) and fastText (a shallow neural network) to extract word embeddings. “I enjoyed working on a project that was so closely related to the pandemic, and I understand the challenges that big tech faces in regard to reducing the spread of misinformation,” says Dmello. “My partner and I also had to come up with innovative ways to communicate and share information over various platforms, which was a challenge in itself.” Lopez agrees. “Working on a COVID-related project in the middle of the pandemic was quite an adventure!”
Another hack using the COVIDLies dataset was WTAWTAW, which won the presentation award. For this hack, first-year computer science students Randy Huynh and Ruslan Manahoran, along with first-year data science student William Hou, focused on characterizing common COVID-19 misconceptions and developing a model to detect their presence. “COVID-19 has had such a huge impact on our lives and all around the world, and it was really interesting to explore how rampant COVID-19 misinformation is on the Internet,” say the teammates. “Through our research, we were able to determine commonalities between misinformative tweets, such as word and phrase choice, types of misinformation, certain themes, etc. We hope our project will contribute to the development of more robust misinformation detection models in the future.”
Advancing ML Research
In addition to the competition, the event included ML research talks by Computer Science Professor Sameer Singh and computer science Ph.D. students Sam Showalter, Anthony Chen, Dylan Slack and Gavin Kerrigan. Singh, Chancellor’s Professor Padhraic Smyth, and RCIC Director Philip Papadopoulos served as faculty mentors throughout the event. Because all of the completed projects were open source, the hackathon work contributed to ML research as well.
“This is an amazing opportunity for undergraduates not only to gain hands-on experience with applying machine learning to real problems — which can really help them get jobs in machine learning — but further, working on scientific problems offered by UCI faculty means that the undergraduates are helping advanced research. I wish I had such opportunities as an undergrad,” says Singh. “It is also a great opportunity for the faculty across campus to use our excellent ICS undergraduates to prototype what applying ML to their projects would look like and to get the students excited and involved in their research.”
— Shani Murray