Good question. As is, this does not keep anything in sync.
To keep the graph in sync with changes in the account, set up a cron job to run `cartography` as often as you need a refresh. Each sync run should guarantee that you have the most up-to-date data.
Here's how a sync works: when the sync starts, set a variable called `update_tag` to the current time. Then, pull all the data from your AWS account(s) and create Neo4j nodes and their relationships, making sure to set their `lastupdated` fields to `update_tag`.
Finally, delete the leftover nodes and relationships (i.e. those that do not have up-to-date `lastupdated` fields). This way the data stays fresh, and you can see this in the [cleanup jobs](https://github.com/lyft/cartography/tree/master/cartography/...).
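If it helps, here's a rough sketch of that pattern using the Neo4j Python driver -- not Cartography's actual code; the `EC2Instance` label and the `ingest_aws_data` callback are just illustrative:

```python
import time

from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def sync(ingest_aws_data):
    # 1. Stamp this run with the current time.
    update_tag = int(time.time())
    with driver.session() as session:
        # 2. Ingest: create/update nodes and relationships from the AWS APIs,
        #    setting lastupdated = update_tag on everything touched.
        ingest_aws_data(session, update_tag)
        # 3. Cleanup: anything still carrying an old lastupdated value was not
        #    seen in this run, so it no longer exists upstream -- delete it.
        session.run(
            "MATCH (n:EC2Instance) WHERE n.lastupdated <> $update_tag DETACH DELETE n",
            update_tag=update_tag,
        )
```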
Our use case requires us to stay as close to real-time as possible, so we're actually using CloudWatch Events to keep in sync -- the deletes become a little harder after that.
I look forward to the progress of Cartography, though!
That is true and it is the first place I usually check when I compromise a new server.
This wasn't mentioned in the post, but imagine you compromised a server and found an unprotected SSH key. You don't know where it can be used, and the .bash_history has rolled over or has very few ssh commands in it. You do see a lot of hosts in the known_hosts file, but it is hashed. That is where this would be helpful, and it's why I went down this route.
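For anyone unfamiliar with the format: a hashed known_hosts host field is `|1|<base64 salt>|<base64 HMAC-SHA1(salt, hostname)>`, so recovering hosts comes down to guessing candidates. A minimal sketch of the check, using only the standard library:

```python
import base64
import hashlib
import hmac

def host_matches(hashed_field: str, candidate: str) -> bool:
    """hashed_field is the first field of a hashed known_hosts line,
    e.g. '|1|<base64 salt>|<base64 digest>'."""
    _, _, salt_b64, digest_b64 = hashed_field.split("|")
    salt = base64.b64decode(salt_b64)
    digest = base64.b64decode(digest_b64)
    guess = hmac.new(salt, candidate.encode(), hashlib.sha1).digest()
    return hmac.compare_digest(guess, digest)

# Brute force: run every candidate (internal hostnames, RFC1918 ranges, etc.)
# through host_matches() for each hashed entry in the stolen file.
```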
Let's consider an alternative scenario: the hosts and ports are encrypted.
Now what can the attackers do? Well, they still have hashes of the public keys. The attacker can scan the entire IPv4 Internet with ZMap and record all SSH public keys; with some hashing, the host can be identified. With online services like Censys (https://censys.io/), the attackers don't even have to scan and compute -- they can obtain the information directly from a public database...
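A rough sketch of that matching step -- assuming, purely for illustration, that the stolen file holds SHA-256 hashes of the key blobs and that you've already exported scan results into an IP-to-key mapping (e.g. from a ZMap run or a Censys download):

```python
import base64
import hashlib

def identify_hosts(key_hashes: set[str], scan_results: dict[str, str]) -> dict[str, str]:
    """Map each recovered key hash to the scanned IP that presented that key.

    key_hashes:   hex SHA-256 hashes of host public keys (assumed format).
    scan_results: {ip: base64 host-key blob}, built from ZMap output or a
                  Censys export (also an assumed format).
    """
    found = {}
    for ip, key_b64 in scan_results.items():
        digest = hashlib.sha256(base64.b64decode(key_b64)).hexdigest()
        if digest in key_hashes:
            found[digest] = ip
    return found
```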
Also, to make it clear, while I'm saying that the attack is too impractical to make sense, I have full respect to your research project, thanks for analyzing this security issue for the community.
Piping curl into bash can trivially steal all of your data at once.
Running a container from Docker Hub is much safer, provided you do not give it extra privileges via `--privileged` or by bind-mounting sensitive host files like the Docker control socket.
If your system is up to date and there are no active Docker 0-days, the worst `docker run --rm -it RANDOM-CONTAINER` can do is consume too many resources -- your local secrets would be safe.
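To make the "no extra privileges" point concrete, here's a sketch using the Docker SDK for Python; the image name and the resource caps are placeholders, not a hardening recipe:

```python
import docker  # pip install docker

client = docker.from_env()

# Run an untrusted image without extra privileges and with explicit resource caps.
logs = client.containers.run(
    "some/random-container",   # placeholder image name
    remove=True,               # clean up the container after it exits
    privileged=False,          # the default; never --privileged for untrusted code
    network_mode="none",       # no network access
    mem_limit="256m",          # cap memory
    nano_cpus=500_000_000,     # cap at half a CPU
    pids_limit=100,            # blunt fork bombs
)
print(logs.decode())
```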
It is kind of disturbing that apparently a huge number of people installed these Docker containers and never noticed that they were using 100% CPU on all available cores, 24x7.
Yep. Even full virtualization isn't truly sandboxed, but the sandbox is much tighter.
FreeBSD has jails and Solaris has zones, both of which were designed to be safe sandboxes for OS-level virtualization or "containerization" as it's called today. The consensus, as far as I can tell, is that these are pretty safe/strict, at least as far as "provide a safe environment to execute untrusted code" goes.
On Linux, resource control mechanisms like cgroups and namespaces have been co-opted to simulate secure sandboxes, but it's not the same as actually providing them.
Sure, and there is nothing wrong with either one in most cases. Salesmen, bloggers, security people, and others like to disagree, but they do so out of bias, not because they want you to get things done.
Edit: I'd like to be wrong about this. Maybe some brave downvoter could help out here?
Security people certainly "do it out of bias". Most are, rather understandably, biased against having systems they're tasked with managing get pwned from under them.
Piping curl to bash is equivalent to running a remote code execution exploit against yourself. Even if you implicitly trust the endpoint, do you trust that it will remain uncompromised forever? It's also especially silly because it's never the best or only way of accomplishing a given task, so it serves only to shoot yourself in the foot.
I agree -- it seems strange that these ports are open, which is why I wrote this up. I spoke with Microsoft on the phone and they confirmed that the ports must be open for their SLA checks and monitoring to work properly.
To reconfirm this, I ran a quick test and saw these ports open by default after I configured an application gateway on my account.
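If you want to reproduce that quick test, a bare-bones TCP connect check is enough; the IP and port numbers below are placeholders, so substitute your gateway's public IP and the ports in question:

```python
import socket

GATEWAY_IP = "203.0.113.10"   # placeholder: your application gateway's public IP
PORTS = [65503, 65504]        # placeholder: the ports you want to verify

for port in PORTS:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        status = "open" if s.connect_ex((GATEWAY_IP, port)) == 0 else "closed/filtered"
        print(f"{port}: {status}")
```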
Apparently they don't have a mechanism to allow only their own addresses because the probes come from dynamic addresses. It sounded like they were going to add this as a predefined label in the future.