Tried it. It is terrible. I tried to make an all-black rally car without plates, with a number on the door. I instructed it to use a wide-angle lens and take the picture from a distance. I even asked for black paint where the plates would be. Also asked for the picture to be taken from an elevated position with the sun coming from behind the camera. Every time, there would be plates, sometimes with numbers, the number on the door would be wrong 90% of the time, and all pictures would be taken from ground level, and from very close. A dust cloud would often be in front of the car, and I never got the sun from behind the camera as instructed. After 100 or so generations I gave up. Every time I specified some areas with problems, it would mess up something else. It also seemed to get stuck in a certain look, just as we see in all the fake video thumbnails on YouTube; you learn to see what it's doing and recognize it. Maybe this is good for some type of design, but the results I got with this were no better than, and in fact VERY similar to, the thumbnail creator in Udio.
"for free"
Did you not even bother reading their GitHub and such?
The "pro" model, which is what their API uses, you cannot even download...
Their "dev" model is a whopping 23.8GB. I don't know if image gen models behave like LLMs, but if they do, only a small percentage of people are gonna be able to run it, as they'd need 24GB of VRAM.
It would be nice if the video showed the dev model being used locally, so we can see how good we'll realistically get.
I tested a few simple prompts and it follows them pretty closely. Also, the claim about text holds true: each generation was successful, even with tilted and defocused text.
Impressive. More complex prompts cause many more mistakes, though.
These are not free. Literally no mention of the countless stolen images used to train these. You didn't "make images of scholars"; you typed a prompt and the algorithm used actual artists' work. The tech is amazing, but the lack of intellectual honesty and integrity is telling.
Am I seeing this right?? When I run it on Replicate, it says it's running on CPU, rather than on much larger GPUs like the A40, the way something like SDXL does.
I'm running Flux dev on my own machine and its results are GREAT. Very coherent and aesthetic, and the amount of detail is outstanding -- the output is full of pixel-level features, so I wouldn't be surprised if it is using a 16-channel VAE like SD3. The drawbacks are 1. this model is HEAVY at 12B parameters plus T5 (you'll probably need at least 12GB VRAM to run it, and quantized at that); 2. if you thought SDXL was slow, this is about 4x slower step-for-step; 3. it currently doesn't support negative prompts as far as I know, so that may cause problems if your desired prompt causes the model to add undesired features to the image (e.g. "make me a photo of fried rice without peas"). Hopefully, Black Forest will make this into a model family including lower-weight options and put out their paper soon so others can learn from their advances.
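For anyone wondering what "running it on my own machine" can look like in practice, here is a minimal sketch using the Hugging Face diffusers wrapper (this assumes the FluxPipeline class and the black-forest-labs/FLUX.1-dev checkpoint; the CPU-offload call is one way to squeeze the 12B model into limited VRAM, at the cost of speed):

```python
import torch
from diffusers import FluxPipeline

# Load FLUX.1 [dev] in bf16; the transformer alone is ~12B parameters.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keeps only the active submodule on the GPU

# No negative prompt support, so the prompt itself has to carry everything.
image = pipe(
    "a photo of fried rice, no peas",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("fried_rice.png")
```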
Lately it's not about scientific achievements; rather, it's "this is free today, this is open source, that will be free forever," and so on. A communist paradise. Two Minute Papers, you are losing your roots.
Tried it, and it is really stupid. I asked it to draw a trolleybus and it came up with totally idiotic images. Absolutely no understanding of catenary and trolley poles. A streetcar was a little bit better, but still massively flawed.
Is it hysterically censored, or will it allow the generations we want? Like Hitler or Netanyahu.
Edit: yes it will. It started out a little puppet-cartoonish, but with several attempts it gets decent. It refused to add Satan behind Netanyahu, but a demon was fine. It doesn't get the Hitler mustache quite right, and it doesn't have quite the photorealism of Midjourney, but it's pretty good. Has to be the pro version, though.
I end up using Midjourney's "Niji 6" for its amazing creative quality when coming up with stylized characters. I don't know if Flux would come close, as this seems mostly similar to Midjourney's V6.1.
First off, I have to say that Flux is fantastic. Way better than DALL·E 3. In my opinion, DALL·E 3 is a completely useless algorithm. I’ve tried working with it multiple times, even though I pay for the premium version. Each time it just doesn’t understand or follow my instructions. The prompt is impossible to modify, making it unusable. It might work for something simple like “a hedgehog in a meadow,” but for anything more complex, it fails miserably.
Yeah, it's cool. But it will be forgotten within a month unless they manage to, firstly, optimize and prune it, probably by about a factor of 10 on performance, and secondly, figure out how to finetune it.
Is the training data ethically sourced? If not, this is yet again another program built off the backs of uncredited artists and photographers for the sake of cutting corners.
"Your request will cost $0.05 per megapixel. For $1 you can run this model approximately 20 times." I think we have different meanings for the word "free" xD
Note that DALL-E 3 can still seemingly do more styles, and Flux still struggles with longer texts.
It also struggles with novel concepts such as "Spoon-shaped elephant".
Been using this model for 3 days, and I can't believe that the Flux Schnell model is that fast and that the Flux Pro model's results are on par with Midjourney's and the Stable Ultra model's.
This is incredible. I tried it a few minutes ago, and it's going to take me some time to recover from my amazement. I asked for a picture in the style of a Victorian painting with ancient Indo-European chieftains and a mass of their followers, and got an extremely good result.
Dr. Károly, I know it's not quite your usual thing, but I would be very interested to see a video summarizing the major public advancements that have been made in cloth simulations over the last five or ten years. What kind of performance gains have been realized by the kind of research you've covered? What sort of real-time cloth simulation, such as in video games, has gone from unthinkable to trivial? I'd love to know! Maybe it's more suitable for a TwentyMinutePapers kind of video, but I really want to see it!
After several tries, I got the Schnell version on HuggingFace to draw something acceptable (free and without login), so it's about as good as SDXL or DALL-E. I usually throw the same prompt at those two (and Google ImageFX) to see which one actually generates something usable. Good to have a fourth option.
How much is Flux Pro? I'm not signed up, and if I go to the Flux Pro page it says it would be $.0017 to run my prompt. But a friend who has Flux Pro says it tells him $.05 to run his prompt. That's not an insignificant difference.
Fails my own test that I found out about when testing Midjourney a while ago. I basically ask for an image of an everyday item broken/disassembled into its parts, e.g. a hammer.
I rather doubt this in particular will be running on a phone any time soon. It's a 12B parameter model and the text encoder (T5-Efficient-XXL) is another 11B parameters. The hardware hunger is real.
Bringing the text capability the model has to a smaller model, though? Seems entirely possible.
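To put those parameter counts in perspective, a rough weights-only estimate (assuming 2-byte bf16 parameters; real usage varies with quantization, offloading, and activations):

```python
BYTES_PER_PARAM = 2  # bf16/fp16 storage
GIB = 2**30

flux_transformer = 12e9  # FLUX.1 parameters, per the comment above
t5_encoder = 11e9        # T5-Efficient-XXL text encoder

print(f"FLUX.1 weights: ~{flux_transformer * BYTES_PER_PARAM / GIB:.1f} GiB")  # ~22.4
print(f"T5 encoder:     ~{t5_encoder * BYTES_PER_PARAM / GIB:.1f} GiB")        # ~20.5
# Roughly 43 GiB combined before activations, hence the quantized setups.
```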
The title (currently "OpenAI’s DALL-E 3-Like AI For Free, Forever!") makes it sound like OpenAI actually released an open model. You should probably change that, since it's misleading. A very cool model, though!
Hey Károly! The title of your video made me think that Open AI actually released an open model for once... It may confuse other fellow scholars as well, so maybe consider changing it somewhat.
Also I would have liked it if you had mentioned that there are two variants of the model, but that's just my take.
Have we reached the point yet where any of these can synthesize a 3D scan of what's in their images? Like with photogrammetry: generating a series of images 360° around the object and creating the model from that?
Flux suffers from additional-appendage and mystery-appendage syndrome. It also doesn't have a sense of direction: I generated a man flying a plane backwards. It also doesn't generate images well that aren't tropes; for example, it took me four or five tries to create a B-horror masterpiece with the right zing. Text sometimes randomly appears in the image, even if unprompted.
"for free, forever" yeah no. more like, "for the petit bourgeois and above in mostly anglosphere countries, and only until a megacorporation inevitably establishes a backed monopoly on all AI-related industries in the coming years"
Elongated thumb at 0:58; at 1:03, the hand is mangled next to the T. Same problem as other AIs, probably: the more specific you are, the less it works.
I feel like this discredits Stable Diffusion a bit, which is not worse than Midjourney and definitely not worse than DALL-E; plus, it is open weights and more flexible than either of the ones mentioned.
Envision the notion of running a 12-billion-parameter image generation model on a smartphone...
You do know there are more efficient ways to fry an egg right?
Free image generators will not be free forever without limitations. Free versions will not be the same as paid versions, because paid versions have a lot more features. Free versions have only basic or fundamental features.
People are already running it on an iPhone using the DrawThings app. It takes an hour... but still, the future is now.
One of the major things I need AI for is creating art assets for a game I am working on; for that, persistence is important.
With Flux dev, quality is better than MJ or DALL-E 3.
"Your request will cost $0.05 per megapixel. For $1 you can run this model approximately 20 times."
"Billable Time
21s
Cost estimate: $0.00"