How to Detect AI on the Newgrounds Audio Portal
The Rise of Generative AI
As generative AI (GenAI) tools have become more accessible, Newgrounds has received an alarming amount of AI-generated audio content. This “music” lacks the spirit and creativity of genuine, human effort and degrades the quality of the Audio Portal with cookie-cutter tracks that essentially sound the same. Furthermore, GenAI songs represent a certain amount of legal risk to our site, as AI models are trained on copyrighted material without consent.
Suno AI
Suno AI is by far the most common model used for AI-generated tracks uploaded to the Audio Portal. It generates songs with (mostly) distinct instruments and vocals, but struggles with audio quality and mixing. Suno tracks often sound similar to mainstream music in their respective genres; many Suno AI rock songs are reminiscent of Disturbed, for example.
Here are some things to look out for in Suno AI tracks:
- Title and Theme: Titles of these AI tracks will frequently be an adjective followed by a noun. “Digital Dreams” and “Purple Cyberspace” are a couple of good examples. Since these are generated by ChatGPT, they tend to look and feel very cliché. Look out for AI-generated album covers prompted by the title.
- Vocals: If a Suno AI track has vocals, they will sound uneven and scratchy. These vocals can be in any language. It will never sound like a single person is singing. Consonants are often muddled, with some vocals completely incoherent. There is a sensation that the singer is emotionally tone-deaf.
- Pronunciation: Words may sound strange to a native speaker, with stress on the wrong syllable or common words shoehorned into unusual speeds or rhythms to fit the melody.
- Lyrics: Suno AI has ChatGPT integration to write its lyrics. As a result, they may sound corny and/or contain factual errors a real songwriter wouldn’t make. Rhymes may be very basic, or the lines may not rhyme well at all.
Hyper-specific lyrics about niche topics such as online entertainment fandoms, political figures, or technological themes are common. Content may be stupid and offensive, but the AI will take the musical composition entirely seriously.
Example: “I Glued My Balls To My Butthole Again”
- Bit Rate: One of the biggest indicators of Suno AI is a low bit rate (192kbps) and the poor sound quality that comes with it. Low bit rate files are more compressed (less data per second) and sound terrible in comparison to higher bit rate files (320kbps). If the song sounds low quality and its metadata confirms it is 192kbps, there’s reason to suspect it was generated with Suno AI.
Note: The only other music creation software which defaults to a bit rate of 192kbps is BeepBox (and its forks).
- Length: At the moment, Suno AI can generate a maximum of four minutes of audio. Many Suno AI tracks are under this, but many also reach this limit. If you see a track that is exactly four minutes long (give or take a second) it may have been made with Suno AI.
- Mixing: The mixing on Suno AI tracks is usually poor. Instruments will have strange curves and/or clash. Some may be too quiet or have strange artifacts replicating reverb effects on them. Suno AI knows what music sounds like, but not how it works, and these mixing issues are the greatest indicator of that. Sometimes Suno AI will generate near silence or (often) clip the audio.
- “Crusty high tones”: A common artifact on Suno AI tracks is a main instrument that doesn’t really sound like anything you would hear in real life. Some people have likened it to a “broken Chinese flute”. It may sound like a really bad cross between an electric guitar and a trumpet. Almost always, it will be poorly defined at higher tones, i.e. sounding “crusty”. This type of artifacting is directly linked to how Suno creates music, but is also worsened by low bit rates.
- Attack: Many instruments have a sharp “attack”, a very defined point where the sound starts. If you can’t hear the attack on guitars, horns, cymbals, and other instruments that typically have sharp attacks, the track was likely AI-generated.
- Percussion: The missing “attacks” are especially pronounced for percussion instruments. Drums may sound flat with an attack similar to falling sand. Cymbals especially lack definition. They will bleed into themselves and other instruments, if they can be picked out at all.
- Bass: In genres with prominent sub-bass, such as hip-hop, you may not be able to tell what note the bassline is supposed to be. It may sound like two or three notes at once or be seemingly absent.
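The two measurable signs in the list above (bit rate and length) can be checked mechanically before listening. The following is only a minimal sketch: `TrackInfo` and `suno_warning_signs` are hypothetical names, the values are assumed to have already been read from the file's metadata by some other tool, and the thresholds simply restate the figures given above.

```python
from dataclasses import dataclass

SUNO_BITRATE_KBPS = 192         # bit rate this guide associates with Suno AI
SUNO_MAX_LENGTH_SECONDS = 240   # four-minute generation limit described above
LENGTH_TOLERANCE_SECONDS = 1    # "give or take a second"


@dataclass
class TrackInfo:
    """Hypothetical container for values read from a file's metadata."""
    bitrate_kbps: int
    duration_seconds: float


def suno_warning_signs(track: TrackInfo) -> list[str]:
    """Return the measurable Suno AI warning signs that apply to a track."""
    signs = []
    if track.bitrate_kbps == SUNO_BITRATE_KBPS:
        signs.append("bit rate is 192kbps")
    # Flag only tracks that sit right at the four-minute limit.
    if abs(track.duration_seconds - SUNO_MAX_LENGTH_SECONDS) <= LENGTH_TOLERANCE_SECONDS:
        signs.append("length is almost exactly four minutes")
    return signs
```

A non-empty result is a reason to listen more closely, not proof. BeepBox exports also default to 192kbps, and plenty of human-made songs happen to run four minutes, so the audible signs above still decide the case.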
Udio AI
Udio AI is another generative AI model that can produce full tracks with vocals and instrumentals. It is often superior to Suno AI, and while less seen in the Audio Portal, it is still posted occasionally. As a generative model, Udio bears many similarities to Suno, but with less artifacting.
Any category mentioned for Suno AI above that isn’t listed specifically for Udio may be applied to both models.
- Metadata: The biggest tell that a track was made by Udio AI is if the file metadata tells you directly. The contributing artist will always be listed as “Udio vX.X,” unless the user manually changes the metadata.
- Vocals: Although more coherent than Suno, vocals generated by Udio AI will also sound uneven and scratchy.
- Lyrics: The LLM that Udio uses to write lyrics has not been disclosed, but it suffers from exactly the same issues as Suno AI, and may be based on ChatGPT as well. Lyrics are often cheesy and simplistic, as are the rhymes.
- Bit Rate: Bit rate is usually much higher in Udio AI tracks compared to Suno AI. File metadata will always report 320kbps, and spectrogram analysis matches this. Regardless, Udio AI uploaders who do not cherry-pick their generations may upload tracks with low audio quality.
- Mixing: Although the mixing of Udio AI tracks is usually much more coherent than Suno AI, many instruments may still bleed into each other or have strange EQ applied.
- Length: Udio AI has a maximum length of 15 minutes, so length should not be a determining factor for discerning if a track is made with Udio AI.
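The metadata tell described above lends itself to a simple pattern match. This is a rough sketch under assumptions: `is_udio_artist_tag` is a hypothetical helper, the artist string is assumed to have been read from the file's tags already, and the exact spelling of the tag is inferred from the “Udio vX.X” form given above rather than from any official documentation.

```python
import re

# Assumed shape of the artist tag: "Udio v" followed by a version number.
# Case and surrounding whitespace are ignored to tolerate minor variations.
UDIO_ARTIST_PATTERN = re.compile(r"^\s*udio\s+v\d+(\.\d+)?\s*$", re.IGNORECASE)


def is_udio_artist_tag(artist: str) -> bool:
    """Return True if the artist field looks like an untouched Udio tag."""
    return bool(UDIO_ARTIST_PATTERN.match(artist))
```

A match is strong evidence, but a miss means nothing, since the uploader can overwrite the artist field before posting.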
Other Models
There are still more models that crop up occasionally.
Soundraw AI will have very specific, ten-second chunks within the song, and instruments will be much more distinct from one another as they are generated separately. It noticeably struggles with complex genres and anything vocal.
Soundful AI has similar artifacts to Udio, but isn’t great at generating more complicated material. It creates very formulaic mixes, perhaps because its model is based on AI-processed stock loops. Most Soundful tracks sound similar to each other, even across different genres.
RVC-based AI models are an older form of generative AI in which an existing song is replaced or “re-sung” with the likeness of another voice or sound. A good example would be the hundreds of AI covers using Plankton’s likeness that were popular throughout 2023. There are some legitimate uses of these models for original works, but most RVC users simply swap out the original acapella of a song to replace the vocalist, to varying degrees of success.
Lastly, audio from AI text-to-speech models (such as 15.ai) is still considered spam and is not allowed on the Audio Portal, as with all TTS.
Spectrogram Analysis
Sometimes users will attempt to change the metadata of their tracks to hide the damning 192kbps bit rate. However, to a trained ear, the quality never actually improves. Spectrogram analysis provides hard evidence of this and reveals the true bit rate.
Spectrograms can be created using multiple different applications, but for the purposes of uncovering a track made by Suno AI, I recommend Spek. Once installed, simply drag and drop any audio file to generate a spectrogram.
This is a spectrogram of the song “Swing-Bit Brawl” by Waterflame.
On the left axis, you will see the frequency in kHz, stretching from zero to a certain value throughout the song’s length. MP3s with a bit rate of 320kbps will have a max frequency of 20kHz. There are small portions of the song that drop below this, and the song will never extend beyond 20kHz, but the high frequencies are crisp and bright.
This is a spectrogram for a track generated by Suno AI:
Immediately, we observe that Suno AI tracks present a much lower, much fainter spectrogram, cutting off at 19kHz. Suno AI will often generate songs of lower quality than this, and the max frequency will never extend beyond 19kHz in a spectrogram.
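This is because bit rate directly correlates with max frequency: MP3 encoders typically discard more of the highest frequencies at lower bit rates, and re-encoding or relabeling the file afterwards cannot bring them back.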
MP3 files with a bit rate of 320kbps will have a max frequency of 20kHz. MP3 files with a bit rate of 192kbps (the max output of Suno AI) will have a max frequency of 19kHz. Therefore, if there is a visible cutoff at 19kHz, then the original bit rate was almost certainly 192kbps, regardless of what the metadata claims.
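The cutoff rule above can also be written down as a small lookup. This is a sketch under stated assumptions rather than a finished tool: the input is a hypothetical list of (frequency in Hz, level in dB) pairs exported from whatever analysis software is at hand, and the -90 dB noise floor and 0.3kHz tolerance are illustrative guesses, not values taken from Spek.

```python
def estimate_cutoff_khz(spectrum, floor_db=-90.0):
    """Return the highest frequency (in kHz) whose level is above floor_db.

    spectrum: iterable of (frequency_hz, level_db) pairs (hypothetical input).
    Returns 0.0 if nothing rises above the floor.
    """
    audible = [freq for freq, level in spectrum if level > floor_db]
    return max(audible) / 1000.0 if audible else 0.0


def implied_bitrate_kbps(cutoff_khz, tolerance_khz=0.3):
    """Map a visible cutoff to the bit rate this guide associates with it."""
    if abs(cutoff_khz - 19.0) <= tolerance_khz:
        return 192  # the 19kHz cutoff described for Suno AI output
    if abs(cutoff_khz - 20.0) <= tolerance_khz:
        return 320  # the 20kHz cutoff described for 320kbps MP3s
    return None     # cutoff does not match either case covered here
```

If `implied_bitrate_kbps` returns 192 for a file whose metadata claims 320kbps, the metadata has probably been edited. A result of `None` only means the cutoff falls outside the two cases covered here.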