stevenwalton's comments | Hacker News

  > To any Linux users,
I have a MacBook Air and I pretty much use it as an ssh machine. It is definitely overpriced for that, but it at least beats the annoyance of dealing with Windows and all the Word docs I get sent or Teams meetings... (Seriously, how does Microsoft still exist?)

Since I mostly live in the terminal (ghostty) or a web browser, I usually don't have to deal with stupid Apple decisions. Though I've found it quite painful to do even basic things when I want to use my MacBook like I'd use a Linux machine, especially since the functionality can change dramatically after an update... I just don't get why Apple (and other companies) try to hinder power users so much. I understand we're small in number, but usage usually doesn't follow a flat distribution.

  > I had to split all my dot files into common/Linux/Mac specific sections
There are often better ways around this. On my machine, the macOS config isn't really about macOS specifically but about which programs I might be running there[0]. Same goes for Linux[1], which you'll see is pretty much just about CUDA and aliasing apt to nala if I'm on a Debian/Ubuntu machine (sometimes I don't get a choice).

I think what ends up being more complicated is when a program has a different name under a particular distro or version[2], though that can be sorted out with a little scripting. This definitely isn't the most efficient way to do things, but I write it like this so that things are easier to organize, turn on and off, or experiment with.

What I find more of a pain in the ass is how commands like `find`[3] and `grep` differ across platforms (BSD vs GNU userland). But usually there are ways to get them to behave identically everywhere.
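
When I really can't be bothered to reconcile the two, a heavy-handed but portable fallback is to do the search in a scripting language instead. A toy Python sketch (not something from my dotfiles) that stands in for `grep -rn PATTERN DIR` and behaves the same on macOS and Linux:

    import re
    import sys
    from pathlib import Path

    # Rough stand-in for `grep -rn PATTERN DIR`
    pattern, root = re.compile(sys.argv[1]), Path(sys.argv[2])
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            for lineno, line in enumerate(path.read_text().splitlines(), start=1):
                if pattern.search(line):
                    print(f"{path}:{lineno}:{line}")
        except (UnicodeDecodeError, PermissionError):
            pass  # skip binary and unreadable files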

  > Don't expect to be able to clone and build any random C++ project unless someone in the project is specifically targeting Mac.
But yeah, I don't have a solution to this... :(

[0] https://github.com/stevenwalton/.dotfiles/blob/master/rc_fil...

[1] https://github.com/stevenwalton/.dotfiles/blob/master/rc_fil...

[2] https://github.com/stevenwalton/.dotfiles/blob/master/rc_fil...

[3] https://github.com/stevenwalton/.dotfiles/tree/master/rc_fil...


  > Once you have it on disk, how do you get it away from your phone?
Since we're talking about Android, a great method is to just use Termux and rsync. You can write a pretty quick and dirty shell script to accomplish this; here, I'll drop mine[0]. It's not the cleanest, but it gets the job done and has some documentation. It checks whether you're on WiFi and connected to a specific SSID. You can change this around pretty easily to do different things: point at two servers, use Tailscale, give a whitelist of allowed SSIDs, change the rsync call to delete from local storage, or whatever. If you don't know how, you can reply to this comment or open an issue and I'll respond[1].
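
If shell isn't your thing, the same idea fits in a few lines of Python inside Termux. A rough sketch (the SSID, paths, and host are placeholders, and the WiFi check assumes the Termux:API add-on is installed):

    import json
    import subprocess

    HOME_SSID = "my-home-wifi"          # placeholder: your trusted network
    SRC = "/sdcard/DCIM/Camera/"        # placeholder: what to sync off the phone
    DEST = "user@server:phone-backup/"  # placeholder: rsync destination

    # termux-wifi-connectioninfo (from the Termux:API add-on) prints JSON,
    # including the SSID of the current connection
    info = json.loads(subprocess.check_output(["termux-wifi-connectioninfo"]))
    if info.get("ssid", "").strip('"') == HOME_SSID:
        subprocess.run(["rsync", "-av", SRC, DEST], check=True)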

Unfortunately this doesn't work on iPhone. I have a Shortcut that does something similar, which I can share, but it is a lot hackier...

[0] https://github.com/stevenwalton/.dotfiles/blob/master/script...

[1] Probably better, since I'm normally logged into my alt account.


  >  so we could not hop on WiFi and message each other using Signal.
I actually have a feature request for this. I think if it got a harder push they would consider it. It's not full decentralization, but it still avoids the concerns that Moxie and Meredith have stated.

It is like you say: I too wish Signal would allow for communication over any available medium.

https://community.signalusers.org/t/signal-airdrop/37402


I've really been trying to get Signal to adopt some decentralization[0], but unfortunately I pissed off some mods. I do understand their reasoning for staying away from full decentralization; both Moxie and Meredith have made good arguments. But I think this is something with a really good middle ground, where both parties benefit highly.

Users get a lot of added utility, "fun", and not to mention a huge upgrade in privacy and security (under local settings), while Signal gets to reduce a lot of data transfer over its network. There are a lot of use cases for local message and file sharing (see the thread), and if the goal is to capture as little data as possible about users, then let's not generate any network traffic at all when users are in close proximity, right? Signals only available within local proximity have to be a lot harder to pick up than traffic traveling across the internet. The option of expanding to a mesh network can be implemented later[1], but I don't understand how an idea like this doesn't further the stated goals.

The big problem with things like Briar is that you can't install it after the internet has been turned off AND it is already unpopular. But if an existing app with an existing user base implements even some meshing, then all those users benefit when an event like that happens. Not to mention there's clear utility in day-to-day life.

[0] https://community.signalusers.org/t/signal-airdrop/37402

[1] I think a mesh network maintains the constraints both Moxie and Meredith have discussed, namely the concern about ensuring clients stay up to date. But then again, I'm not sure why that can't be resolved the same way it already is: if you let your Signal client fall too far behind in updates, it will no longer communicate with the servers.


> The big problem with things like Briar is that you can't install it after the internet has been turned off AND it is already unpopular

Sideloading an .apk is supported in all Android versions, right? Even without internet access? Is something more needed to install Briar?


Sure, but this doesn't really scale very well. Distributing those APKs without internet access is pretty hard.


Briar already has this built in via the "Share this app offline" feature. It starts a WiFi hotspot from which people can download the APK.


Also, F-Droid has support for local distribution and discovery in offline scenarios.


You can distribute it via the same mechanism that distributes your messages.


> Why on earth, when you go to 192.168.whatever:8096 does it ask you what server you want to connect to?

I don't know what the answer is, but I ran into this when I was trying to harden my systemd settings. I'll link my override below, and maybe someone can give something more conclusive (suggestions for my override or for other services are greatly welcome; I'm happy to add even ones I don't use). Where I hit this error was when messing with RestrictAddressFamilies, which restricts the network socket address families a service may use. For example, when I restrict AF_PPPOX or AF_UNIX I get that exact issue. IIRC I also hit it when I had moved a file, though I forget which one (I noticed it got autogenerated again), so I suspect it has to do with access to some location where it stashes a config file. FWIW, this all works fine with Tailscale.

https://github.com/stevenwalton/.dotfiles/blob/master/skelet...

(docs for RestrictAddressFamilies) https://www.freedesktop.org/software/systemd/man/latest/syst...
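
For reference, the directive in question looks something like this in a drop-in (a sketch; the service name is a placeholder and the exact families your service needs will differ):

    # /etc/systemd/system/<service>.service.d/override.conf
    [Service]
    # Removing AF_UNIX (or AF_PPPOX) from this list is how I reproduced the
    # "what server do you want to connect to?" behavior described above
    RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK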


> how much are GenAI companies willing to invest to get eyeball reflections right?

Willing to? Probably not much. Should? A WHOLE LOT. It is the whole enchilada.

While this might not seem like a big issue, and truthfully most people don't notice, getting this right (consistently) requires getting a lot more right. It doesn't require the model knowing physics (because every training-sample face will have realistic lighting), but what underlies this issue is the model understanding subtleties, and no model to date accomplishes this, from image generators to language generators (LLMs). There is a Pareto efficiency issue here too: remember that it is orders of magnitude easier to get a model to be "80% correct" than "90% correct".

But recall that the devil is in the details. We live in a complex world, and that means the subtleties matter. The world is (mathematically) chaotic, so small things have big effects. You should start solving problems without worrying about these, but eventually you need to move on to tackling them; if you don't, you'll just generate enshittification. In fact, I'd argue that the difference between an amateur and an expert is knowledge of subtleties and nuance. This is both why amateurs can trick themselves into thinking they're more expert than they are, and why experts can recognize when they're talking to other experts (I remember a thread a while ago where many people were shocked that most industries don't give tests or whiteboard problems when interviewing candidates, and that hiring managers can still identify good hires from bad ones).


> But the constraint that both eyes should have consistent reflection patterns is just another statistical regularity that appears in real photographs

Hi, author here of a model that does really well on this[0]. My model is SOTA and has undergone a third-party user study showing that it generates convincing images of faces[1]. AND my undergrad is in physics. I'm not saying this to brag; I'm giving my credentials: I have deep knowledge both in generating realistic human faces and in physics, and I've seen hundreds of thousands of generated faces from many different models and architectures.

I can assure you, these models don't know physics. What you're seeing is the result of attention. Go ahead and skip the front matter in my paper; look at the appendix, where I show attention maps and walk through artifacts.

Yes, the work uses GANs, but the same principles apply to diffusion models; diffusion models are just typically MUCH bigger and have way more training data. (Sure, I had access to an A100 node at the time, but even one node makes you GPU poor these days, so best to explore on GANs.)

I'll point out flaws in images from my paper, but remember that these images fool people; you're now primed to see errors, and as you continue reading you'll be even further informed. In Figures 8-10 you can see the "stars" that the article talks about. You'll see mine does a lot better, but the artifact exists in all the images. You can also see these errors in all of the images in the header, though they are much harder to see there. I embedded the images at the largest size I could, so you can zoom in quite a bit.

Now, there are ways to detect deepfakes pretty readily, but it does take an expert eye. These aren't the days of StyleGAN2, where monsters were common (well... at least on GANs; diffusion is getting there). Each model and architecture has its own unique signature, but there are key things you can look for if you want to get better at this. Here's what I look for; I've used these to identify real-world fake profiles, and you will see them across Twitter and elsewhere:

- Eyes: Human eyes are complex, with lots of texture. Look for "stars" (inconsistent lighting), pupil dilation, pupil shape, heterochromia (which can be subtle; see Figure 2, last row, column 2 for an example), and the texture of the iris. Also make sure to look at the edges of the eyes (Figs 8-10).

- Glasses: Look for aberrations and inconsistent lighting/reflections, and pay very close attention to the edges, where new textures can be created.

- Necks: These are just never right: the skin wrinkles, shape, angles, etc.

- Ears: These always lose detail (as seen in TFA and my paper) and symmetry in shape, and are often not lit correctly. If there are earrings, watch for the same things there too (see TFA).

- Hair: Dear fucking god, it is always the hair, though I think most people might not notice it at first. If you're having trouble, start by looking at the strands. Start with Figure 8: weird patches, color changes, texture, direction, and more. Then try Fig 9 and TFA.

- Backgrounds: I joke that the best indicator of a good quality image is how much it looks like a LinkedIn headshot. I have yet to see a generated photo whose background is free of errors, both long-range and local. Look at my header image with care: the bottom image in row 2 is pretty good but has errors, so does row 2, column 4, and even the shadow in row 1, column 4 doesn't make sense.

- Phase artifacts: These are discussed back in the StyleGAN2 paper (Fig 6) and are still common today.

- Skin texture: Without fail, unrealistic textures get created on faces. These are hard to use in the wild, though, because you're typically seeing a compressed image (which creates its own artifacts) and you frequently need to zoom in. They can be more apparent with post-processing, though.

There's more, but all of these are the result of models not knowing physics. If you are just scrolling through Twitter you won't notice many of these issues, but if you slow down and study an image, they become apparent. If you practice looking, you'll quickly learn to find the errors with little effort. I could be more specific about differences between models, but this comment is already too long. I could also go into detail about why our metrics can't catch these errors, but that's a whole other lengthy comment.

[0] https://arxiv.org/abs/2211.05770

[1] https://arxiv.org/abs/2306.04675


I'm just curious: why not build on top of another app like Signal?[0] My understanding is that there's nothing stopping anyone from using the same app while running your own server and nodes, and that you can even hook into multiple nodes with a custom fork of the app. Wouldn't this give you the big advantage of not requiring people to install a whole new app, and let you work synergistically with a company with similar/compatible goals?

The thing I see is that if you really want to build a huge P2P network, you need a reason for people to have the app installed other than P2P. The problem I always saw with FireChat was that I could never get anyone to talk to me on it, and then when there was an emergency no one could download it. So the features need to be built into something with more normal day-to-day utility.

[0] https://community.signalusers.org/t/signal-airdrop/37402


Not OP, but most of the time it is a lot easier to build something from scratch. Signal is notoriously hard to extend and build on; they have a lot of custom tech. I gave up looking through their documentation. The reason is that they actually do end-to-end encryption properly, which most apps don't implement in a secure and clean way, so they basically had to build everything from scratch themselves.

TL;DR: it is often harder to reuse.


Logged into my personal account for this one! I'm a lead author on a paper that explored exactly this. It does enable faster training and smaller model sizes; for reference, you can get 80% accuracy on CIFAR-10 in ~30 minutes on CPU (without crazy optimizations). There are open questions about scaling, but at the time we did not have access to big compute (really, we still don't), and our goal was to address the original ViT paper's claims about data constraints and the necessity of pretraining for smaller datasets (spoiler: augmentation + overlapping patches play a huge role). Basically, we wanted to make a network that lets people train transformers from scratch for their own data projects, because pretrained models aren't always the best or most practical solution.
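
To give a flavor of the augmentation side, here's a rough sketch with standard torchvision pieces (not our exact recipe from the paper):

    import torchvision.transforms as T

    # Heavy augmentation is a big part of what lets small ViT variants
    # train from scratch on CIFAR-sized datasets
    train_tfms = T.Compose([
        T.RandomCrop(32, padding=4),
        T.RandomHorizontalFlip(),
        T.RandAugment(),  # randomly chains basic image ops
        T.ToTensor(),
        T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])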

Paper: https://arxiv.org/abs/2104.05704

Blog: https://medium.com/pytorch/training-compact-transformers-fro...

CPU compute: https://twitter.com/WaltonStevenj/status/1382045610283397120

Crazy optimizations (no affiliation), 94% on CIFAR-10 in <6.3 seconds on a single A100: https://github.com/tysam-code/hlb-CIFAR10

I also want to point to some better information about ViTs in general. Lucas Beyer is a good source and has some lectures, as do Hila Chefer and Sayak Paul with their tutorial. Also, just follow Ross Wightman; the man is a beast.

Lucas Beyer: https://twitter.com/giffmana/status/1570152923233144832

Chefer & Paul's All Things ViT: https://all-things-vits.github.io/atv/

Ross Wightman: https://twitter.com/wightmanr

His very famous timm package: https://github.com/huggingface/pytorch-image-models


Thanks for all the good work and all the pointers! Awesome stuff. Let me know if you would want to join us live on a Friday and go over some of your newer work or any recent papers you find interesting. Feel free to reach out at hello@oxen.ai if so :)


I very cursorily skimmed your paper but I didn’t spot where it discusses overlapping the patches. Is it the section about using the hybrid model with a convolutional step which de facto accomplishes it (maybe?) instead of overlapping patches?


Yeah, I can see how that might be confusing. Sometimes code is clearer. In the vanilla transformer you do a patch-then-embed operation, right? A quick way to do that is with non-overlapping convolutions, where your stride is the same size as your kernel. To get overlapping patches, you basically just change the stride so it's smaller than the kernel; those create the patches, then you embed. Look closely at Figure 2 (you can also see a visual representation in Figure 1, though I'll admit there is some artistic liberty there because we wanted to stress the combined patch-and-embed operation; those are real outputs, though). So we don't really call it a hybrid, in the same way you might call a 1x1 conv a channel-wise linear (which is the same as permute, linear, then permute).
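
Since code really is clearer, here's a minimal PyTorch sketch of the difference (toy sizes, not the exact CCT config):

    import torch
    import torch.nn as nn

    # Vanilla ViT: patch + embed as one non-overlapping conv (stride == kernel size)
    vit_embed = nn.Conv2d(3, 128, kernel_size=16, stride=16)

    # Overlapping patches: just make the stride smaller than the kernel
    cct_embed = nn.Conv2d(3, 128, kernel_size=7, stride=4, padding=3)

    x = torch.randn(1, 3, 32, 32)                     # CIFAR-sized input
    tokens = cct_embed(x).flatten(2).transpose(1, 2)  # -> (batch, num_tokens, dim)
    print(tokens.shape)                               # torch.Size([1, 64, 128])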

ViT: https://github.com/SHI-Labs/Compact-Transformers/blob/main/s...

CCT: https://github.com/SHI-Labs/Compact-Transformers/blob/main/s...

Edit: Actually, here's a third-party version doing the permute-then-linear-then-reshape operation:

https://github.com/lucidrains/vit-pytorch/blob/main/vit_pyto...

But the original implementation uses Conv: https://github.com/google-research/vision_transformer/blob/m...


> Are transformers competitive with (for example) CNNs on vision-related tasks when there's less data available?

They can be. There's current research into the tradeoffs between local inductive biases (information from local receptive fields; CNNs have a strong local inductive bias) and global inductive biases (large receptive fields, i.e. attention). There are plenty of works that combine CNNs and attention/transformers. A handful of them focus on smaller datasets, but the majority are more interested in ImageNet. There's also work on changing the receptive fields within attention mechanisms as a means to balance the two.

> Are transformers ever used with that scale of data?

So there's a yes and a no to your question, but definitely yes, since people have done work on Flowers102 (6.5k training images) and CIFAR-10 (50k training images). Keep in mind that not all of these models are pure transformers; some have early or intermediate convolutions. Some of these works even have fewer parameters and better computational efficiency than CNNs.

But more importantly, I think the big question is what type of data you have. If large receptive fields are helpful for your problem, then transformers will work great. If you need local receptive fields, then CNNs will tend to do better (or combinations of transformers and CNNs, or transformers with reduced receptive fields). I doubt there will be a one-size-fits-all architecture.

One thing to keep in mind is that transformers typically like heavy amounts of augmentation, and not all data can be augmented significantly. There's also pretraining and knowledge transfer/distillation to consider.

