the worst possible flowy prompt is “chill music”. it has no scene, no time, no place, no activity, no emotion. you get back the average of every chill track the model knows, which is to say: generic.
good prompts describe a moment. a real moment, with edges. one early user typed eleven different lo-fi prompts in their first two days, trying to figure out what worked. that's the friction that scenarios eliminate: one good moment beats eleven generic ones.
the rule: describe a moment
the stronger the moment, the more specific the music. you don't need every axis, but two or three is the sweet spot. give it some mix of:
- time of day or year: “late night”, “sunday morning”, “first snow”, “4am”
- place: “tokyo neon”, “berlin warehouse”, “cafe in a paris alley”
- activity: “driving”, “cooking jollof”, “getting ready”, “fixing a car”
- weather or light: “rainy”, “golden hour”, “blue-hour skyline”
- emotion or energy: “hyped”, “melancholy”, “slow burn”, “antistress”
- genre or instrumentation: “lo-fi”, “piano trio”, “synthwave”
you don't need all six, but one axis alone is rarely enough.
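if you build prompts in code instead of typing them, the rule above can be sketched as a tiny helper. this is a hypothetical sketch, not a flowy api: the `Moment` class and its field names are made up to mirror the six axes, and the only check it makes is the guide's rule of thumb that a single axis is rarely enough.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Moment:
    # hypothetical container: one optional field per axis in the list above
    time: Optional[str] = None      # time of day or year
    place: Optional[str] = None
    activity: Optional[str] = None
    light: Optional[str] = None     # weather or light
    mood: Optional[str] = None      # emotion or energy
    genre: Optional[str] = None     # genre or instrumentation

    def axes(self) -> list[str]:
        # the axes that were actually filled in, in declaration order
        return [getattr(self, f.name) for f in fields(self) if getattr(self, f.name)]

    def to_prompt(self) -> str:
        parts = self.axes()
        # rule of thumb from the guide: one axis alone is rarely enough
        if len(parts) < 2:
            raise ValueError("one axis alone is rarely enough; add a time, place, or activity")
        # no upper cap: two or three is the sweet spot, not a hard limit
        return ", ".join(parts)


print(Moment(time="late night", place="tokyo neon", activity="driving").to_prompt())
```

this prints `late night, tokyo neon, driving`, while `Moment(genre="rock").to_prompt()` raises, which is the "single genre on its own" trap in code form.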
30 examples
these all land on real cached tracks, so you can paste any of them and the stream starts immediately.
drives and travel
- summer road jam, top down, golden hour
- midnight drive through neon tokyo
- highway road trip, mountain pass
- graveyard-shift drive home
- cross-country train, wide plains
- sailing tradewinds, calm sea
each of these produces a different stream. “summer road jam, top down, golden hour” produces warm vocal pop. “midnight drive through neon tokyo” lands closer to japanese city pop or synthwave with japanese vocal samples. “graveyard-shift drive home” is darker, mostly instrumental, slower tempo.
study and focus
- deep focus binaural background
- rainy sunday lo-fi, slow coffee
- classical strings, late night essay
- thunderstorm cozy reading
- soft jazz piano trio, dim lighting
- blue-hour skyline, glass and steel
these bias toward instrumental. “binaural”, like “no vocals”, is an explicit instrumental lock. “piano trio” implies instrumental without saying it. weather cues (“rainy”, “thunderstorm”) bias toward ambient textures.
cooking, getting ready, kitchen mode
- glossy pop hits to feel hot getting ready
- afrobeats grooves for cooking jollof on a sunday
- kitchen dance party, sunday cooking
- lisbon rooftop, sangria, slow talk
- barcelona tapas crawl
- marrakech souk, brass and spices
scene plus place is the pattern that works here. “marrakech souk” produces north african melodic patterns, “barcelona tapas crawl” produces spanish guitar and flamenco influences, and “afrobeats grooves for cooking jollof” gives you afrobeats with full vocals.
regional and language-locked
- perreo intenso para reventar la pista
- corridos tumbados fuertes para manejar en la noche
- punjabi club bangers full of bass and brass
- purane bollywood sad gaane raat ko sunne ke liye
- japanese city pop, neon tokyo
- hyper produced k-pop with switch ups every 30 seconds
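write the moment in the target language and the music comes back with vocals in that language. the genre tags (“perreo”, “corridos”, “bollywood”) anchor the model to that musical tradition. for reference, the spanish prompts ask for intense perreo to tear up the dance floor and for hard-hitting corridos tumbados to drive to at night; the hindi one, written in latin script, asks for old bollywood sad songs to listen to at night.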
moods and weather
- post-breakup melancholy, foggy window
- first snow, espresso ritual
- humid summer night, cicadas
- spring rain on cherry blossoms
- dark techno warehouse 4am
- saturday night warehouse rave
these are the most evocative moments, and they tend to produce the most distinctive streams. weather plus emotion plus light is a strong combination because each axis pulls the model in a different direction and the intersection is narrow enough to be specific.
patterns that don't work
- single genres on their own. “rock”, “jazz”, “hip hop” give you the bland mean. add a scene, a time, an activity, and the music sharpens fast.
- contradictions. “hyper energetic deep focus music” confuses the moment. pick one direction.
- walls of adjectives. “dreamy ethereal gentle soft soothing peaceful relaxing music” is all synonyms, no information. use one strong word, not seven weak ones.
- artist names. the model isn't built to imitate specific artists, and even if it were, the copyright story is nightmare fuel. instead of “like radiohead”, try “melancholy art rock, foggy uk afternoon”.
writing in your own language
the model is comfortable with all the major european languages plus japanese, korean, mandarin, hindi, and arabic. a moment in your native language usually produces vocals in that language. a moment in english about a non-english scene (“lagos rooftop”, “tokyo neon”) still biases the music toward that region's sonic vocabulary.
if the stream keeps coming back in english when you wanted another language, switch the moment's language entirely. that's the strongest signal you can send.
the meta-pattern
good moments feel like text messages to a friend who's about to DJ your evening. you wouldn't text them “chill music please”. you'd text them “something cozy, it's raining, i'm about to cook”. that's the right register.
be specific. be evocative. trust that the model has more cultural reference than you might think. and if a moment doesn't hit, change one thing and try again. that's usually faster than rewriting from scratch.