I started from scratch twice, first using GPT-4o and then using GPT-5.2-Codex, and there was a very clear difference in quality with the exact same questions. I asked two very short and simple questions (admittedly a bit too short): "Using JSONPlaceholder, can you input two endpoints on my index page?" and "Can you style what you did with Tailwind?" The results were quite different. GPT-4o threw together very broken styling, and the data didn't show up at all until I pointed it to specific endpoints from JSONPlaceholder's documentation. GPT-5.2-Codex was clean right off the bat. Although it displayed raw JSON at first, once I asked it the same follow-up about using specific endpoints, it structured the data presentation its own way, as a user directory in a card format.
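For context, here's a minimal sketch of the kind of card-based user directory GPT-5.2-Codex produced. The markup and Tailwind classes are my own illustration, not the model's exact output, and the sample data just mirrors the shape of JSONPlaceholder's /users endpoint (inlined here instead of a live fetch so the snippet runs anywhere):

```javascript
// Sample records shaped like JSONPlaceholder's /users endpoint.
// In the real page this would come from fetch("https://jsonplaceholder.typicode.com/users").
const users = [
  { id: 1, name: "Leanne Graham", email: "Sincere@april.biz", company: { name: "Romaguera-Crona" } },
  { id: 2, name: "Ervin Howell", email: "Shanna@melissa.tv", company: { name: "Deckow-Crist" } },
];

// Build one Tailwind-styled card per user.
function userCard(user) {
  return `
    <div class="rounded-lg bg-white p-4 shadow">
      <h2 class="text-lg font-semibold">${user.name}</h2>
      <p class="text-sm text-gray-500">${user.email}</p>
      <p class="text-sm">${user.company.name}</p>
    </div>`;
}

// Lay the cards out in a responsive grid.
const directory = `<div class="grid gap-4 sm:grid-cols-2 lg:grid-cols-3">${users
  .map(userCard)
  .join("")}</div>`;

console.log(directory);
```

The interesting part is that none of this structure was asked for; the model chose the card grid on its own.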
All in all, it's very surprising what the newer AI models can do with so little effort, just prompting. My questions were pretty short and vague, so I imagine someone who asks very specific questions could really set up something high quality if they know what to ask for.