OpenAI’s DALL-E 3-Like AI For Free, Forever!

Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papersllm Flux is available here: ...

101 comments

  • @Eagleizer
    2024-08-18

    Tried it. It is terrible. I tried to make an all-black rally car without plates and with a number on the door. I instructed it to use a wide-angle lens and take the picture from a distance. I even asked for black paint where the plates would be. I also asked for the picture to be taken from an elevated position with the sun coming from behind the camera. Every time, there would be plates, sometimes with numbers; the number on the door would be wrong 90% of the time; and all pictures would be taken from ground level and from very close. The dust cloud would often be in front of the car, and I never got the sun from behind the camera as instructed. After 100 or so generations I gave up. Every time I specified some areas with problems it would mess up something else. It also seemed to get stuck in a certain look, just as we see in all the fake video thumbnails on YouTube; you learn to see what it's doing and recognize it. Maybe this is good for some type of design, but the results I got with this were no better than, and in fact VERY similar to, the thumb creator in Udio.

  • @GenericYousername
    2024-08-18

    "Run on your phone in your pocket 2 more papers down the line."

    People are already running it on an iPhone using the DrawThings app. It takes an hour... but still, the future is now.

  • @TechnoMasterBoy
    2024-08-18

    "for free"
    Did you not even bother reading their GitHub and such?
    The "pro" model, which is what their API uses, you cannot even download...
    Their "dev" model is a whopping 23.8GB. I don't know if image gen models behave like LLMs, but if they do, only a small percentage of people are gonna be able to run it, as they'd need 24GB VRAM.
    It would be nice if the video showed the dev model being used locally, so we can see how good we'll realistically get.

  • @bzikarius
    2024-08-11

    I tested a few simple prompts and it follows them pretty closely. It is also true about text: each generation was successful, even with tilted and defocused text.
    Impressive. More complex prompts cause many more mistakes.

  • @ooiiooiiooii
    2024-08-11

    These are not free. Literally no mention of the countless stolen images used to train these. You didn’t “make images of scholars”; you typed a prompt and the algorithm used actual artists’ work. Tech is amazing but the lack of intellectual honesty and integrity is telling.

  • @TARS..
    2024-08-11

    Am I seeing this right?? When I run it on Replicate, it says it's running on CPU, whereas something like SDXL runs on much larger GPUs like the A40.

  • @Adreitz7
    2024-08-11

    I'm running Flux dev on my own machine and its results are GREAT. Very coherent and aesthetic, and the amount of detail is outstanding -- the output is full of pixel-level features, so I wouldn't be surprised if it is using a 16-channel VAE like SD3. The drawbacks are 1. this model is HEAVY at 12B parameters plus T5 (you'll probably need at least 12GB VRAM to run it, and quantized at that); 2. if you thought SDXL was slow, this is about 4x slower step-for-step; 3. it currently doesn't support negative prompts as far as I know, so that may cause problems if your desired prompt causes the model to add undesired features to the image (e.g. "make me a photo of fried rice without peas"). Hopefully, Black Forest will make this into a model family including lower-weight options and put out their paper soon so others can learn from their advances.

  • @LV-426...
    2024-08-11

    Lately it's not about scientific achievements, rather this is free today, this is open source, that will be for free forever, and so on. A communist paradise. TWO, you are losing your beginnings.

  • @patrickhector
    2024-08-11

    Still cannot make a normal human woman that doesn't look like a hyperattractive doll, and really can't handle adding scars

  • @erikziak1249
    2024-08-11

    Tried it, it is really stupid. Asked to draw a trolleybus and it came up with totally idiotic images. Absolutely no understanding of catenary and trolley poles. Streetcar was a little bit better, but still massively flawed.

  • @DanFrederiksen
    2024-08-11

    Is it hysterically censored or will it allow the generation we want? Like Hitler, Netanyahu.
    Edit: yes it will. It started out a little puppet-cartoonish but with several attempts it gets decent. It refused to add Satan behind Netanyahu but a demon was fine. It doesn't get the Hitler mustache quite right, and it doesn't have quite the photorealism of Midjourney, but it's pretty good. Has to be the pro version though.

  • @RealityRogue
    2024-08-11

    I end up using "Niji 6" of Midjourney for its amazing creative quality when coming up with stylized characters. I don't know if Flux would come close, as this seems mostly similar to Midjourney's V6.1.

  • @azeriff
    2024-08-11

    First off, I have to say that Flux is fantastic. Way better than DALL·E3. In my opinion, DALL·E3 is a completely useless algorithm. I’ve tried working with it multiple times, even though I pay for the premium version. Each time it just doesn’t understand or follow my instructions. The prompt is impossible to modify, making it unusable. It might work for something simple like “a hedgehog in a meadow,” but for anything more complex, it fails miserably.

  • @df1ned
    2024-08-11

    Yeah, it's cool. But it will be forgotten within a month unless they manage to, firstly, optimize and prune it, probably by about a factor of 10 on performance, and secondly, figure out how to finetune it.

  • @Wrennbird
    2024-08-11

    Are the training models ethically sourced? If not, this is yet again another program built off the backs of uncredited artists and photographers for the sake of cutting corners.

  • @GenericCat
    2024-08-11

    "Your request will cost $0.05 per megapixel. For $1 you can run this model approximately 20 times." I think we have different meanings for the word "free" xD

  • @artman40
    2024-08-11

    Note that DALL-E 3 can still seemingly do more styles and Flux AI still struggled with longer texts.
    It also struggles with novel concepts such as "Spoon-shaped elephant".

  • @muhammadlufti2967
    2024-08-11

    Been using this model for 3 days and I can’t believe that the Flux Schnell model is that fast and that the Flux Pro model’s results are on par with Midjourney’s and the Stable Ultra model's.

  • @looksintolasers
    2024-08-11

    Please do a video on the "Repulsive Shells" paper from SIGGRAPH 2024 by Josua Sassen, Henrik Schumacher, Martin Rumpf, and Keenan Crane!

  • @CartoType
    2024-08-11

    This is incredible. I tried it a few minutes ago, and it's going to take me some time to recover from my amazement. I asked for a picture in the style of a Victorian painting with ancient Indo-European chieftains and a mass of their followers, and got an extremely good result.

  • @TheBennyFisch
    2024-08-11

    There seems to be a misunderstanding in terms of "available for free". This is not the first time that only reduced models are available.

  • @luuketaylor
    2024-08-11

    At this point, I'm digitizing all my papers so I don't have to worry about holding onto them or dropping them!

  • @Onihikage
    2024-08-11

    Dr. Károly, I know it's not quite your usual thing, but I would be very interested to see a video summarizing the major public advancements that have been made in cloth simulations over the last five or ten years. What kind of performance gains have been realized by the kind of research you've covered? What sort of real-time cloth simulation, such as in video games, has gone from unthinkable to trivial? I'd love to know! Maybe it's more suitable for a TwentyMinutePapers kind of video, but I really want to see it!

  • @revengefrommars
    2024-08-11

    After several tries, I got the Schnell version on HuggingFace to draw something acceptable (free and without login), so it's about as good as SDXL or DALL-E. I usually throw the same prompt at those two (and Google ImageFX) to see which one actually generates something usable. Good to have a fourth option.

  • @bricktube3871
    2024-08-11

    How much is Flux Pro? I’m not signed up, and if I go to the Flux Pro page it says it would be $.0017 to run my prompt. But for a friend who has Flux Pro, it tells him $.05 to run his prompt. That’s not an insignificant difference.

  • @pallenda
    2024-08-11

    Really good at text!

    Fails my own test, which I found out about when testing Midjourney a while ago. I basically ask for an image of an everyday item broken/disassembled into its parts. For example, a hammer.

    A hammer

  • @tuseroni6085
    2024-08-11

    My main question would be: how does it do with persistent characters? And the follow-up: does it support ControlNet?

    One of the major things I need AI for is creating art assets for a game I am working on, and for that, persistence is important.

  • @Steamrick
    2024-08-11

    I rather doubt this in particular will be running on a phone any time soon. It's a 12B parameter model and the text encoder (T5-Efficient-XXL) is another 11B parameters. The hardware hunger is real.
    Bringing the text capability the model has to a smaller model, though? Seems entirely possible.

  • @Shootjapan
    2024-08-11

    New here. Is it an AI voice commenting? It needs to improve by a lot; no real human stops like that every 2 words.

  • @AustinThomasPhD
    2024-08-11

    The title (currently "OpenAI’s DALL-E 3-Like AI For Free, Forever!") makes it sound like OpenAI actually released an open model. You should probably change that to be less misleading. A very cool model, though!

  • @smorty3573
    2024-08-11

    Hey Károly! The title of your video made me think that OpenAI actually released an open model for once... It may confuse other fellow scholars as well, so maybe consider changing it somewhat.
    Also I would have liked it if you had mentioned that there are two variants of the model, but that's just my take.

  • @Village_Iliad
    2024-08-11

    Have we reached the point yet where any of these can synthesize a 3D scan of what’s in their images? Like with photogrammetry, generating a series of images in 360 degrees around the object and creating the model from that?

  • @bars7897
    2024-08-11

    One of the links asks for 0.05 cents per megapixel. And I've used it before I read that. Am I forever in debt now? Should I panic? Was this a scam?

  • @mirek190
    2024-08-11

    Flux is insane... I run it locally with an RTX 3090 and ComfyUI, and the image is ready after 20 seconds.
    With Flux dev, quality is better than MJ or DALL-E 3.

  • @roguegryphonica3147
    2024-08-11

    Flux suffers from additional appendage, and mystery appendage syndrome. It also doesn't have a sense of direction. I generated a man flying a plane backwards. It also doesn't generate images well that aren't tropes. For example, it took me four or five tries to create a b horror masterpiece with the right zing. Text sometimes randomly appears in the image even if unprompted.

  • @兄さん
    2024-08-11

    "for free, forever" yeah no. more like, "for the petit bourgeois and above in mostly anglosphere countries, and only until a megacorporation inevitably establishes a backed monopoly on all AI-related industries in the coming years"

  • @nevascurded
    2024-08-11

    Elongated thumb at 58 sec in, 1:03 in, the hand is mangled next to the T. Same problem as other AIs probably, the more specific you are, the less it works.

  • @Syzygy2048
    2024-08-11

    I feel like this discredits Stable Diffusion a bit, which is not worse than Midjourney, definitely not worse than DALL-E, plus it is open weights and more flexible than either of the ones mentioned.

  • @vi6ddarkking
    2024-08-11

    Envisions the notion of running a 12 billion parameter image generation model on a smartphone...
    You do know there are more efficient ways to fry an egg, right?

  • @DuchAmagi
    2024-08-11

    It was supposed to be free, but I see some messages about prices and costs?

    "Your request will cost $0.05 per megapixel. For $1 you can run this model approximately 20 times."

    "Billable Time
    21s
    Cost estimate: $0.00"

  • @jerryworm
    2024-08-11

    Free image generators will not be free forever without limitations. Free versions will not be the same as paid versions because paid versions have a lot more features. Free versions have basic or fundamental features only.

  • @ilakya
    2024-08-11

    Still a little bit too large to chew at home. But I also remember back when 13B-parameter LLMs were impossible to run at home too.
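
Several comments above argue about whether a 12B-parameter model fits on consumer hardware. A rough back-of-the-envelope sketch of that arithmetic follows. It assumes the 12B parameter count quoted by commenters and counts only the image model's weights, ignoring the text encoder, the VAE, and activations, so real usage will be higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GB (10^9 bytes).

    params_billions * 1e9 parameters * bytes_per_param bytes / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Assumed parameter count, taken from the comments (not from official docs).
    flux_params_billions = 12.0

    # Common weight precisions: 16-bit, 8-bit, and 4-bit quantization.
    for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = weight_memory_gb(flux_params_billions, bytes_per_param)
        print(f"{label}: ~{gb:.1f} GB for weights only")
```

Under these assumptions, 16-bit weights land close to the 23.8GB download size one commenter mentions, and quantizing to 8 bits or fewer is what would bring the model toward the 12GB VRAM range another commenter suggests.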