This past weekend, I participated in SCSE's Techfest Hackathon at NTU. And like all hackathons, it was wild, draining, and extremely rewarding. Spoiler alert: we won 3rd place. As usual, this blog will be part tutorial, part journey diary, and part personal reflection on my experience.
P.S. If you’re just interested in what we made, here’s the live demo and our submission.
P.P.S. Shrivardhan turned the GPU off as it was quite expensive to leave on indefinitely. For a working live demo, please wire us money on an as-needed basis.
Let's delve in.
The Ideation Phase - Dante’s 9 circles of Hell
We began the hackathon facing an apparent disadvantage: out of over 100 teams, each comprising 3-4 members, ours was just the two of us, myself and my friend Shrivardhan. Nonetheless, we were unfazed. Looking back, our indifference might have stemmed from a mix of naivety and confidence.
And so it began, on Sunday the 4th of February, the prompt for the Techfest Hackathon was released and *drum roll*
It was Generative AI.
Note: If you haven't heard about Generative AI yet, that's very impressive. But for the sake of clarity, I will do my best to explain what it is and, by extension, what it's not. So here goes:
Artificial Intelligence - When a "computer" (more specifically, a program/application) performs a task that otherwise requires human intelligence, the computer is said to be artificially intelligent. Or colloquially, the computer would be referred to as an "AI".
Generative Artificial Intelligence - When said intelligent "computer" creates or generates content in any format (image, text, video, audio) that mimics human quality, that computer has generative artificial intelligence. Or colloquially, the "computer" would be referred to as "Generative AI".
Using this definition, you can see how ChatGPT, Stable Diffusion, and text-to-speech are all “Generative AI”.
(This explanation was perfected over a 30-minute conversation with my mother about this very topic.)
Like almost anything on the internet, not everyone may be on the same page about it, but they’ve definitely heard, spoken, or argued about it at some point in the last year. As for the rest: it's the technology used for deepfakes, the one that's going to put us out of jobs, and so on. Sound familiar? Great.
The second-order effect of everyone knowing about Generative AI is that any problem that could be solved by it is already being attempted by a company with a billion dollars in venture funding. The past year has been a drag race towards product-market fit (PMF), with seemingly "obvious" ideas being implemented in a million different ways. (Myself included, see GPTBookClub and ValentineGPT.) This made it all the more difficult to come up with an idea that excited us; more specifically, an idea that was new, fun, and/or useful.
At this point, it's useful to share the judging criteria for the hackathon:
While difficulty and design would be the result of our execution, the idea alone was key to 50% of the scoring criteria. More than that, we needed to be excited by it to work our asses off for the next 48 hours.
And so we spent the entirety of the first day ideating. We ideated for over 12 hours at the Machine Learning and Data Analytics lab @ EEE (entry graciously arranged by our honorary third member, Ayushman).
We thought about everything from a VR girlfriend (Manas’s favourite) to a new internet (Shrivardhan’s favourite) and an LLM for socio-political law evaluation (no one’s favourite). The problem lay in getting the difficulty just right: too hard and it's demotivating; too easy and it's not fun.
Defeated and frustrated, we phoned in our actual third hackathon member in by far the most ironic entry: ChatGPT.
And that's how that went.
Defeated, we walked back to our halls to change and shower in hopes of resetting our brains in some metaphysical way.
Surprisingly, it did. It struck us: music generation. The idea was catchy; it had been done before, but I was confident we could put our own twist on it. And so we did.
A platform for creators to generate content-specific royalty free music using AI
Execution Day
Amazingly, in hindsight, that's all we needed. While the next 24 hours were brutal, with no sleep and hours spent coding away at our computers, coming up with an idea was key to powering us through this treacherous journey.
And so we did. Our friend at SCSE's Innovation Lab (shoutout Heng Woon!) and the lab tech (Jian Xin) were kind enough to give us 24/7 access to the lab at such short notice. Like a well-oiled machine, we decided on the overall architecture and split our roles: I was to do design and frontend, while Shrivardhan was to do backend and DevOps.
That said, the beauty of a hackathon is the fire of the encroaching deadline. It forces you to move fast. Moving fast ensures you build something, but it also ensures you make mistakes by not allowing you the time to properly weigh a decision. You just can’t sit on it; you must commit and deal with the consequences later.
We made the decision to use a microservices architecture. With that came CORS issues; to fix them, Shrivardhan made a proxy server, only to realise our serverless functions would time out before our model even started generating music, which led us to a Django server on EC2, and so on. While we did have a working demo, it was really held together with string.
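For context, the CORS errors looked like the browser refusing to let the React frontend call a backend on a different origin. Wherever the backend ends up (proxy, Django, or otherwise), the fix boils down to sending the right response headers. A minimal sketch, with an illustrative origin rather than our actual deployed domain:

```python
# Minimal sketch of a CORS fix: a browser only lets a frontend on one
# origin call a backend on another origin if the backend replies with
# these headers. ALLOWED_ORIGIN is a placeholder, not our real domain.
ALLOWED_ORIGIN = "http://localhost:3000"

def with_cors_headers(headers: dict) -> dict:
    """Return a copy of the response headers with CORS headers attached."""
    out = dict(headers)
    out["Access-Control-Allow-Origin"] = ALLOWED_ORIGIN
    out["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
    out["Access-Control-Allow-Headers"] = "Content-Type, Authorization"
    return out
```

In practice you'd reach for a library (e.g. django-cors-headers on the Django side) rather than hand-rolling this, but the headers it sets are the same.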
A brief aside on how our project works.
Frontend: The user interface of Vibes is made with React. I used shadcn/ui for the first time, and it’s a great crutch for a design beginner like myself.
Backend:
Django Server: Powers our innovative image-to-text-to-speech, image-to-text-to-audio, and image-to-text-to-music functionalities, alongside our unique text-to-music models.
AWS Lambda Functions: Manage user authentication, posts management, and interaction with our Postgres database.
State-of-the-Art Models:
Meta's MusicGen: Our backbone for audio generation, enhanced with a fine-tuned GPT for superior results.
Custom Deployment: We ended up hosting our MusicGen (medium checkpoint) on a dedicated T4 GPU on AWS through HuggingFace.
Vision and Theme Generation: GPT-4 Vision for image inputs and a fine-tuned GPT for generating themes and genres.
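Stringing those pieces together: GPT-4 Vision describes the uploaded image, the fine-tuned GPT maps that description to a theme and genre, and MusicGen turns the combined prompt into audio. A hedged sketch of the glue between stages; the function name and prompt wording are illustrative, not our production code:

```python
def build_music_prompt(description: str, theme: str, genre: str) -> str:
    """Combine the vision model's image description with the theme and
    genre tags into a single text prompt for MusicGen.

    Illustrative only: the real prompt template in Vibes differed.
    """
    return f"{genre} track with a {theme} feel, evoking: {description}"
```

The point of the intermediate theme/genre step is that MusicGen responds better to musical vocabulary than to a raw scene description on its own.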
For the more visually inclined, here’s a picture of everything I described.
After submitting and going to sleep at 11AM, I woke up to the sunset at 6PM, only to have Shrivardhan inform me that we had been selected to present the next day as part of the Top 10 finalists. I was elated; I had never even considered the possibility of qualifying. But there was more work to be done.
The positive reinforcement was enough to keep the momentum up; we worked on hosting our services, and I whipped up a pitch deck. And there we were the next day, in our token college-student formal attire, ready to present at Research Techno Plaza, 50 Nanyang Drive.
A nerve-wracking two hours later, we were announced as the 3rd place winners. And yes, the handicap I mentioned at the start? It paid off, with the two of us splitting the winnings meant for a 4-person team. Not so bad after all.