
Our company has a no-AI-use policy. The assumption is zero trust. We simply can’t know whether a model or its framework could or would send proprietary code outside the network, so it’s best to assume every LLM/AI tool is sending, or will send, code or fragments of code. While I applaud the incredible work by their creators, I’m not sure how a responsible enterprise-class company could rely on “trust us bro” EULAs or repo readmes.


The same way responsible enterprise class companies rely on "trust us bro" EULAs for financial systems, customer databases, payroll, and all the other systems it would be very expensive and error prone to build custom for every business.


Pretty much this.

OpenAI poisoned the well badly with their "we train off your chats" nonsense.

If you are using any API service, or any enterprise ChatGPT plan, your tokens are not being logged and recycled into new training data.

As for why trust them? Like the parent said: EULAs. Large companies trust EULAs and terms of service for every single SaaS product they use, and they use tons and tons of them.

OpenAI, in a clumsy attempt to create a regulatory moat by doing sketchy shit and waving wild "AI will kill us all" nonsense, has created a situation where the usefulness of these transformative generative tools is automatically rejected by many.


The enterprise plan, and I believe the API too, are more expensive.


But isn't AI very competitive right now? And doesn't it also have more direct access to the "secret sauce" of the company?


What would competitiveness of AI have to do with whether a confidentiality clause in a commercial license is enough for an enterprise or not?

A source code hosting service has direct access to the "secret sauce" of a company built around proprietary software. A customer relationship management service has direct access to the "secret sauce" of a company built around sales and customer relations. A document management service has direct access to the "secret sauce" of a company built around confidential documents. A cloud hosting provider has direct access to the "secret sauce" of a company built around a database.

Everything's SaaS. Everyone's confidential and critical data is aggregated in some big provider's system. Except for a few paranoid (sensible? rational?) holdouts who aren't competitive as a result.


Your company could locally host LLMs; you won't get ChatGPT or Claude quality, but you can get something that would have been SOTA a year ago. You can vet the public inference codebases (they are only of moderate complexity), and you control your own firewalls.
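Most of the open inference servers (vLLM, llama.cpp's server, Ollama) expose an OpenAI-compatible HTTP API, so pointing existing tooling at a self-hosted model is mostly a config change. A minimal sketch, assuming a server is already running on localhost (the port, placeholder API key, and model name are illustrative):

    # Sketch: query a self-hosted model through an OpenAI-compatible endpoint
    # running inside your own network. Nothing leaves the box as long as the
    # host has no outbound route.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # local inference server, not api.openai.com
        api_key="not-needed-locally",         # placeholder; local servers usually ignore it
    )

    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # whichever model you chose to self-host
        messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
    )
    print(resp.choices[0].message.content)

The request terminates inside your firewall, which is the whole point for the zero-trust crowd.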


You can get standalone/isolated versions of ChatGPT, if your org is large enough, in partnership with OpenAI. And others. They run on the same infra but in accounts you set up, cost the same, but you have visibility on the compute and control of data exfil - i.e., there is none.


You can run Claude on both AWS and Google Cloud. I’m fairly certain they don’t share data, but would need to verify to be sure.


You can also run Llama 405B and the latest (huge) DeepSeek on your own hardware and get LLMs that trade blows with Claude and ChatGPT, while being fully isolated and offline if needed.


With Amazon Bedrock you can get an isolated serverless Claude or Llama with a few clicks.
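A minimal sketch of what that looks like with boto3, once the model is enabled in your AWS account (the region and model ID are illustrative and vary by account):

    # Sketch: call Claude through Amazon Bedrock from inside your own AWS account.
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    resp = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # must be enabled for your account
        messages=[{"role": "user", "content": [{"text": "Summarize this design doc: ..."}]}],
    )
    print(resp["output"]["message"]["content"][0]["text"])

Requests stay inside your account boundary (and can go over a VPC endpoint), which is the main selling point for the data-exfiltration-averse.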


True, but if your org is super paranoid about data exfiltration you're probably not sending it to AWS either.


The vast majority of Fortune 500s already have legal frameworks in place for dealing with internal AI use, because the reality is employees are going to use it regardless of internal policy. Assuming every employee will act in good faith just because a blanket AI ban is in place is extremely optimistic at best, and isn't a good substitute for actual understanding.


Internal policies at these companies rarely rely on the level of faith you're implying. Instead, external access to systems is logged, internal systems are often sandboxed or otherwise constrained in how you interact with them, and anything that looks like exfiltration sets off enough alarms to have your manager talking to you that same day, if not that same hour.


What's the realistic attack scenario? Will Sam Altman steal your company's code? Or will the next version of GPT train on your secret-sauce algorithms, and then your competitors get them when they generate code for their tasks, and your company loses its competitive advantage?

I'm actually sure that there are companies for which these scenarios are very real. But I don't think there are a lot of them. Most of the code our industry works on has very little value outside the context of its particular product and company.


So why bother securing anything at all if you're not willing to secure the raison d'être? Doesn’t that suggest that these companies are trivial entities?


Only if you see source code as the only valuable thing, which it isn't. The knowledge of the team, industry connections, experience etc etc are a big part of what make it so you can effectively use the source code.

We're making an industrial sorting machine. Our management is scared to death of losing the source code. But realistically, who's going to put in the time to fully understand a codebase we can barely grasp ourselves? Then get rid of all the custom sensor mappings, paths and other stuff specific to us. And then develop it further, assuming they even believe we have the "right" way of doing things?

Right, no one. 90% of companies could open source their stuff and, apart from legal nonsense, nothing practical would happen; no one would read the code.


You just supported my point that these companies at their core have little value. A team? Teams are fleeting and easily replaced given the hiring and firing (and poaching) practices of companies. Industry connections? Maybe to some degree, but those are fleeting as well and how do you value it? Most of these connections are held by relatively few people in the company.

Companies in other legal jurisdictions can and will steal IP with impunity and throw new AI tools at it to quickly gain an understanding of the codebase. Furthermore, knowledge of the source provides a roadmap of attack vectors for security violations. It seems foolish to dismiss the risks of losing control of source code.


There are plenty of very realistic attack scenarios, that's why we secure stuff.


So, you're asking how enterprise-class companies are using GitHub for repos and Gmail for all their enterprise mail? What's next, Zoom/Teams for meetings?


They might be using neither


Local LLMs for code aren't that out of the question to run.

Even if not for code generation, even smaller models can be useful for programming work, e.g. to weigh different design approaches, etc.


Does your company develop software overseas where legal action is difficult? Or where their ip could be nationalized or secretly stolen? Where network communications are monitored and saved?


Just curious, how does your company host its email? Documents? Files?


> I’m not sure how a responsible enterprise class company could rely on “trust us bro” EULAs or repo readmes.

Isn't that what we do with operating systems, internet providers, &c. ?


How is that related? We're talking about continuously sending proprietary code and related IP to a third party, which seems a pretty valid concern to me.

I, for one, work every day with plenty of proprietary vendor code under very restrictive NDAs. I don't think they would be very happy knowing I let AIs crawl our whole code base and send it to remote language models just to have fancy autocompletion.


"Continuously sending proprietary code and related IP to a third party"

Isn't this... github?

Companies and people are doing this all day every day. LLM APIs are really no different. They only seem different when you magic them up as "the AI is doing thinking", but in reality it's text -> tokens -> math -> tokens -> text. It's a transformation of numbers into other numbers.

The EULAs and ToS say they don't log or retain information from API requests. This is really no different than Google Drive, Atlassian Cloud, Github, and any number of online services that people store valuable IP and proprietary business and code in.


Ok, the LLM crawls your code. Then what? What is the exfiltration scenario?


Do you read every single line of code of every single dependency you have? I don't see how LLMs are more of a threat than a random compromised npm package or something from an OS package manager. Chances are you're already relying on tons and tons of "trust me bro" and "it's open source bro, don't worry, just read the code if you feel like it".


One thing is consciously sharing IP with third parties in violation of contracts; another is falling victim to malicious code in the toolchain.

The npm concern, though, suggests we likely work in very different industries, so that may explain the different perspectives.


> proprietary code outside the network

Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network? Oddly enough, 75% of the people writing code on HN probably have their company's code stored in GitHub. So there already is an inherent trust factor with GH/MSFT.

As another anecdote - Twitch's source code got leaked a few years back. Did Twitch lose business because of it?


The other consideration: your company's code probably just isn't that good.

I think many people over-value this giant pile of text. That's not to say IP theft doesn't exist, but I think the actual risk is often overblown. Most of an organization's value is in the team's collective knowledge and teamwork ability, not in the source code.


> Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network

Lawsuits? Lawful terminations? Financial damages?


Huh? No, I'm saying: what potential damage does the organization face? Not the individual who may leak data outside your network.


Those are risks both for the individual and for the company when there are contracts in place with third parties involving code sharing.

Other risks include leaking industrial secrets that may significantly damage company business or benefit competitors.


Please acknowledge that your situation is pretty unique. Just take a look at the comments: how many people say, or outright presume, that their company's code is already on GitHub? I'd wager that your org doesn't keep code at a 3rd party provider, right? Then, you're in a minority.

I don't mean to dismiss your concerns - in your situation, they are probably warranted - I just wanted to say that they are unique and not necessarily shared by people who don't share your circumstances.


This subthread started with someone from a company with a no-AI policy, and people are dismissing it with snarky comments along the lines of "your code is not as important as you believe". I'm just trying to show a different picture: we work in a pretty vast field, and people commenting here don't necessarily represent a valid sample.


> people are dismissing it with snarky comments along the lines of "your code is not as important as you believe".

That says more about those people than about your/OP's code :)

Personally, I've had a few collisions with regulation and compliance over the years, so I can appreciate the completely different mindset you need when working with them. On the other hand, at my current position, not only do we have everything on GitHub, but there were also instances where I was tasked with mirroring everything to Bitbucket! (For code escrow... i.e., if we go out of business, our customer will get access to the mirrored code.)

> people commenting here don't necessarily represent a valid sample.

Right. I should have said that you're in the minority here. I'm not sure what the ratio of dumb CRUD apps to "serious business" development is in the wild. I know there are whole programming subfields where your kinds of concerns are typical. They might just be underrepresented here.


Yes, I've had plenty of experience with orgs that self-host everything; I don't think it's a minority, it's just a different cluster than the one most represented here.

Still, I believe hosting is somewhat different, if anything because it's established: known players, trusted practices. AI is new, contracts are still getting refined, players are still making their names, companies are moving fast, and I doubt data protection is their priority.

I may be wrong, but I think it's reasonable for IT departments to be at least prudent towards these frameworks. Search is OK, chat is OK-ish; with crawling whole projects for autocompletion I'd be more careful.


> I doubt data protection is their priority.

So you're basing your whole argument on nothing other than "I just don't feel like they do that".

Does this look unserious to you? https://trust.openai.com/


> Yes, I've had plenty of experience with orgs that self-host everything; I don't think it's a minority, it's just a different cluster than the one most represented here.

I've done 800+ tech diligence projects and have first-hand knowledge of every single one's use of VCS. At least 95% of the codebases are stored on a cloud-hosted VCS. It's absolutely a minority to host your own VCS.


First, I didn't dismiss their "no AI policy", nor did I use snarky comments. I was asking a legitimate question, which is: most orgs have their code stored on another server out of their control, so what's the legitimate business issue if your code gets leaked? I still haven't gotten an answer.


You can run pretty decent models on your laptop these days. Works in airplane mode.

https://ollama.com/
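Ollama's local HTTP API also makes this scriptable. A sketch, assuming the default install listening on localhost and a model you've already pulled (the model name is illustrative):

    # Sketch: talk to a locally running Ollama instance over its default port.
    # Nothing here touches the network beyond localhost.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1",  # whatever you've pulled with `ollama pull`
            "prompt": "Explain the builder pattern in one paragraph.",
            "stream": False,
        },
    )
    print(resp.json()["response"])

Run the laptop in airplane mode and the exfiltration path is gone by construction.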


Palo Alto Networks provides a security product, "AI Access Security", which claims to solve the problem you mentioned: access control, data protection, etc. I don't personally use it, nor does my org; I'm mentioning it here just in case it's useful for someone.


You can get models that run offline. The other risk is copyright/licensing exposure; e.g. the AI regurgitates a recognisably large chunk of GPL code, and suddenly you have a legal landmine in your project waiting to be discovered. There's no sane way for a reviewer to spot this situation in general.

You can ask a human to not do that, and there are various risks to them personally if they do so regardless. I'd like to see the AI providers take on some similar risks instead of disclaiming them in their EULAs before I trust them the way I might a human.


Seems like only working on open source code has its benefits.


I mean, we host our code on Github. What are they going to do with Copilot code snippets?



