woolion's comments | Hacker News

It seems the shader-transparency version is badly aliased? The effect is less noticeable on Chrome than on Firefox, but it is still quite visible. This defeats the purpose of vector graphics...

It's a nice trick to play around with, but that limits its usefulness.


It is indeed badly aliased. The demonstrated technique does not take antialiasing into account in the initial render, which is what causes the issue. There are ways to improve it, but I would advise against this approach in general, since it doesn't handle these edge cases well.
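For reference, one common mitigation is to smooth the mask's alpha threshold over roughly one pixel in the fragment shader, rather than hard-thresholding it. A minimal sketch, assuming a WebGL1 context with the OES_standard_derivatives extension enabled; the uniform and varying names are made up, not taken from the demo:

    // Fragment shader source as a TypeScript constant. fwidth()
    // measures how fast alpha changes per pixel, so the smoothstep
    // transition band stays about 1px wide at any zoom level.
    const fragmentSrc: string = `
      #extension GL_OES_standard_derivatives : enable
      precision mediump float;
      uniform sampler2D uMask;  // rasterized shape, alpha = coverage
      varying vec2 vUv;
      void main() {
        float a = texture2D(uMask, vUv).a;
        float w = fwidth(a);
        gl_FragColor = vec4(0.0, 0.0, 0.0, smoothstep(0.5 - w, 0.5 + w, a));
      }
    `;

This only softens the cutout edge, though; it doesn't recover true coverage-based antialiasing from the initial render.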


I have a lot of sympathy for the ideals of free software, but I don't think prominently displaying a "What's bad about:" list, including ChatGPT in it, and not making a modicum of effort to sketch out a basic argument is doing anyone a service. It's barely worth a tweet, which would excuse it as a random blurb of barely coherent thought spurred by the moment.

There are a number of serious problems with LLMs; the very poor analogies with neurobiology, and anthropomorphization in general, poison the public discourse to the point where most arguments don't even mean anything. The article itself presents LLMs as bullshitters, which is clearly another anthropomorphization, so I don't see how it really addresses these problems.

What's bad about: RMS. Not making a decent argument makes your position look unserious.

The objection generally made to RMS is that he is 'radically' pro-freedom rather than willing to compromise for 'better results'. That position makes sense, and he is a beacon for it. Arguments like this one seem to weaken even that perspective.


"In theory it's a great idea, in practice not so much."

I feel that's the lesson anyone who has toyed with libertarian ideals ultimately comes to. It just takes a bit longer for some than for others. It's also harder to realize if you're making mad bank on it, rather than being one of the idiots who blew their hard-earned money on some technical misunderstanding, scam, or retroactive regulation.


The comparisons are very useful, but also quite limited in terms of styles. Models have extremely diverse abilities when it comes to following a given style rather than steering toward their own.

It's pretty obvious that OpenAI is terrible at it -- it is known for its unmissable touch. For Flux, however, it really depends on the style. They posted at some point that they changed their training to avoid averaging different styles together, which produces the ultimate AI look. But this is at odds with the goal of directly generating images that are visually appealing, so style matching is going to be a problem for a while at least.


The site is broken up into "Editing Comparison" and "Generative Comparison" sections.

Generative: https://genai-showdown.specr.net

Editing: https://genai-showdown.specr.net/image-editing

Style is mostly irrelevant for editing, since the goal is to integrate seamlessly with the existing image. The focus is on performing relatively surgical edits or modifications to existing imagery while minimizing changes to the rest of the image. It is also primarily concerned with realism, though there are some illustrative examples (the JAWS poster, The Great Wave off Kanagawa).

This contrasts with the generative section, though even there the emphasis is on prompt adherence, and style/fidelity take a backseat (which is honestly what 99% of existing generative benchmarks already focus on).


Oh, thank you for your reply. We may have different definitions of style and of what editing means.

If you look, for example, at "Mermaid Disciplinary Committee", every single image is in a very different style, each of which you can consider the model's default assumption for that specific prompt. It's quite obvious that these styles were 'baked into' the models, and it's not clear how much you can steer them toward a specific style. If you look at "The Yarrctic Circle", a lot more models default to a kind of "generic concept art" style (the "by greg rutkowski" meme), but even then I would classify the results as at least 5 distinct styles. So for me this benchmark is not checking style at all, unless you consider style to be just around 4 categories (cartoon, anime, realistic, painterly).

Regarding image editing, I ran my own tests at the first release of the Flux tools, and found it almost impossible to get decent results in some specific styles, particularly cartoon and concept art. I think the tools focus on what imaginary marketing people would want (like "put this can of sugary beverage into an idyllic scene") rather than such use cases. So edits like "color this" would just come out terrible, and certainly unusable.


I didn't go very far with my own benchmarks because my results were just so bad. But as an example, here's a line art piece with the instruction to color it (I can't remember the prompt, I didn't take notes).

https://woolion.art/assets/img/ai/ai_editing.webp

It's the original, then ChatGPT, then Flux.

Still, you can see that ChatGPT just throws everything out and makes not even a minimal attempt at respecting the style. Flux is quite bad, but it follows the design much more closely (although it gets completely confused by it), so it seems that with a whole lot of work you could get something out of it.


Yeah, so NOVEL style transfer without the use of a trained LoRA is, to my knowledge, still a relatively unsolved problem. Even with SOTA models like Nano Banana Pro, if you attach several images with a distinct artistic style that is outside its training data and use a prompt such as:

"Using the attached images as stylistic references, create an image of X"

it falls down pretty hard.

https://imgur.com/a/o3htsKn


I'm pretty sure some model at least advertised that this would work. I also think your example was in the training data at some point at least, but I suspect these styles get kind of pruned when the models are steered toward the "aesthetically pleasing" outputs which are often used as benchmarks. Thanks for the replies, it's quite informative.


Sure! That image was pretty zoomed out, so I've gone ahead and attached some of the reference images in greater detail:

https://imgur.com/a/failed-style-transfer-nb-pro-o3htsKn

Now you should be able to see that the generated image is stylistically not even close to the references (which are early works by Yoichi Kotabe). Pay careful attention to the characters.

With locally hostable models, you can try things like Reference/Shuffle ControlNets, but that's not always successful either.


I'm migrating my last AWS services to dedicated servers with GitOps. In principle, AWS gives you a few benefits that are worth paying for. In practice, I have seen all of them turn into massive issues. Price and performance are obviously bad. More annoying than that, their systems have arbitrary limitations you may not be aware of because they're considered 'corner cases' -- e.g., my small use case bumped against a DNS limitation, and streaming of replies was not supported. On top of that, there's a fairly steep learning curve with their products and their configuration DSLs.

There are GitOps solutions that give you all the benefits it promises, without any of the downsides or compromises. You just have to bite the bullet and learn Kubernetes. It may be a bit more of a learning curve, but in my experience not by much. And you have much more flexibility in the precise tech stack you choose, so you can flatten that curve by using stuff you already know well.
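To give a flavor of it, the core of the GitOps workflow is just declarative manifests committed to the repo, which the cluster then reconciles to. A minimal sketch, with made-up names and image (Kubernetes accepts JSON directly, since YAML is a superset of it):

    // A plain Deployment manifest expressed as a TypeScript object;
    // the service name and image below are illustrative only.
    const deployment = {
      apiVersion: "apps/v1",
      kind: "Deployment",
      metadata: { name: "my-service" },
      spec: {
        replicas: 2,
        selector: { matchLabels: { app: "my-service" } },
        template: {
          metadata: { labels: { app: "my-service" } },
          spec: {
            containers: [{
              name: "my-service",
              image: "registry.example.com/my-service:1.0.0",
            }],
          },
        },
      },
    };

    // Print it so it can be committed and applied with `kubectl apply -f -`;
    // in a GitOps setup a reconciler (e.g. Argo CD or Flux CD) applies it for you.
    console.log(JSON.stringify(deployment, null, 2));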


So, something I find particularly annoying about HN is that you can segment it into very different subgroups that may or may not interact with a particular post. In one thread anti-hype sentiment may run very high and a more reasonable comment gets downvoted, while the next day the same anti-AI posts in another thread get strongly downvoted because that thread is dominated by the hype people. It's far from uniform, and since some people might feel they risk burning karma by entering the wrong thread, there's an amount of self-censorship that makes the effect stronger.

Do you have something like that to manage the group dynamics?

Also, in terms of personalities, I'm guessing the most appropriate way to get the list of prompts would be to run an analysis on the HN dataset to classify user behaviour patterns and create the prompts accordingly. Since you can match these to posts in threads, you can also get a rough approximation of the distribution of dynamics. Did you do such an analysis?
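For what it's worth, the raw material for such an analysis is easy to pull from the official HN Firebase API. A minimal sketch (the endpoints are real; the sampling is naive, and the actual classification step, which is the hard part, is left out):

    // Fetch one user's recent comments from the public HN API
    // (https://github.com/HackerNews/API) as raw material for
    // behaviour classification. Requires a runtime with global fetch.
    const API = "https://hacker-news.firebaseio.com/v0";

    async function fetchItem(id: number): Promise<any> {
      const res = await fetch(`${API}/item/${id}.json`);
      return res.json();
    }

    async function userComments(name: string, limit = 50): Promise<string[]> {
      const res = await fetch(`${API}/user/${name}.json`);
      const user = await res.json();
      // Grab a sample of the user's submissions, then keep only
      // comment bodies; stories, polls, etc. are dropped.
      const items = await Promise.all(
        (user.submitted ?? []).slice(0, limit).map(fetchItem),
      );
      return items
        .filter((it) => it?.type === "comment" && it.text)
        .map((it) => it.text as string);
    }

    // Example: userComments("woolion").then((cs) => console.log(cs.length));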


The mascot is a super cute lion too. How can a project do everything so right? I was browsing some popular Python libraries and they had just slapped on the first image they got out of ChatGPT. It's nice to see care in the craft.


Were you aware of the Dato Duo (https://dato.mu/)? It's very cool for kids, except for the fairly steep price point.

The advantage is that it's limited, so it greatly lowers the wall of difficulty in getting some 'nice-sounding' music out of it (mostly thanks to the restriction to the pentatonic scale). However, kids still manage to find the most horrible-sounding settings, and insist on keeping them as-is...


It seems to me that this is exactly why I don't like word games. They use words like "combine", but it's generally mixing abstractions or taxonomies.

To guess it, I looked at 'crab' because it's a quite uncommon word that has a deep relationship with only a few words. Then I checked the most obvious one (which was the solution) against the other words, and determined that it didn't bear any significant relationship to the third word. So I checked the other (less obvious) potential solutions, and after a frustrating lack of matches, I gave up. And then got annoyed that the first candidate was the right one. To be fair, I guess that's partly because I'm an ESL speaker, as I suppose solution/sauce really is used as a nominative locution often enough to form a "special relationship".

To be a designer, you have to play with people's (as in the general crowd, not individuals) understanding of the subject. In particular, that means avoiding the curse of knowledge, and yes, for normal people "PC" meant "not an Apple consumer product". So ultimately, the search algorithm includes:

- categorize all relationships between words, ranked by strength

- compare with what is expected to be known in popular culture (adjust ranks)

- match against the designer's expectations of similar problems (look for clues to pick a best match)

That's a lot of words to say it's the opposite of an aha moment: the outcome of a purely computational problem, which is often quite frustrating. Thank you for coming to my TED talk.


I totally get that. I am ESL too, and I have a similar approach to English-based word games.

And yeah, that often results in mild disappointment or frustration instead of an "Aha!" moment. Actual puzzle video games fare better for me in that respect, as they avoid the inevitable subjectivity of natural language.


I recently saw this too: technical subjects that actually have zero books written about them now have entire pages filled with books. The titles sound good, and the pages look decently good, but there's something slightly off, and when you look into it, the "author" has been writing a book a week...

It's disheartening, because now I will lean much more toward reputable publishers, and so filter out independent writers who have nothing to do with this.


I do think that publishers will become more of a point of trust in the post-LLM world as a result.

