  • 82 Posts
  • 2.26K Comments
Joined 3 years ago
Cake day: June 10th, 2023

  • What’s fun is how often this principle gets used in practice. For example, when you upload a video to YouTube, it’s assigned a unique URL, but it would be far too slow to check that URL against a central list of every existing video just to make sure nobody else already has it: millions of videos are uploaded every day, across thousands of servers spread all over the world.

    Instead, YouTube just generates a long random ID for the URL and relies on the odds of two videos ever ending up with the same one being effectively zero.

    The same is true for Bitcoin. If you could guess the private key for any wallet currently in use, you’d have full access to the funds within it, and the guessing can even be done offline. Yet even if you could try trillions of private keys per second, the odds of hitting even one key that’s already in use are low enough to be totally secure (rough numbers in the sketch below).
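    To put rough numbers on this, here’s a back-of-the-envelope sketch in Python. The ID format (11 characters over a 64-character alphabet), the ~10 billion existing videos, and the ~1 billion funded wallets are illustrative assumptions, not YouTube’s or Bitcoin’s actual internals:

    ```python
    import secrets

    # Assumed ID format: 11 characters over a 64-character alphabet, which matches
    # the visible shape of YouTube URLs (the real generation scheme isn't public).
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
    ID_SPACE = 64 ** 11  # ~7.4e19 possible IDs

    def random_video_id(length: int = 11) -> str:
        """Pick an ID at random, with no lookup against any central list."""
        return "".join(secrets.choice(ALPHABET) for _ in range(length))

    def hit_probability(guesses: float, taken: float, space: float) -> float:
        """Rough chance (union bound) that `guesses` random draws land on any of
        `taken` already-used values out of `space` possible values."""
        return guesses * taken / space

    print(random_video_id())  # an 11-character ID, e.g. something like 'tQx3v9WgRcA'

    # Chance that one new upload collides with an assumed ~10 billion existing IDs:
    print(hit_probability(1, 10e9, ID_SPACE))  # ~1.4e-10

    # Chance of hitting any of an assumed ~1 billion funded wallets after guessing
    # a trillion keys per second for a century (~3.15e9 seconds), out of 2**256 keys:
    print(hit_probability(1e12 * 3.15e9, 1e9, 2.0 ** 256))  # ~2.7e-47
    ```

    Even a trillion guesses per second for a century barely dents a 256-bit keyspace, which is why nobody needs to coordinate to avoid collisions.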

  • I think it’s critically important to be very specific about what LLMs are “able to do” vs what they tend to do in practice.

    The argument is that the original training data is sufficiently altered and “transformed” so as not to infringe copyright. If the model can reproduce the majority of a book unaltered, then we know that isn’t the case, and whether or not that output is easy to get at is irrelevant. The fact that the people running the study had to “jailbreak” the models to get past the built-in checks tells you the models’ creators are well aware that the models can produce an untransformed copy of the copyrighted work.

    From the end user’s perspective, if the model is sufficiently gated from distributing copyrighted works, it doesn’t matter what it’s inherently capable of. But then the argument shouldn’t be “the model isn’t breaking the law”; it should be “we have a staff of people working around the clock to make sure the model doesn’t try to break the law.”