@algernon

algernon@lemmy.ml · 4 hours ago

“Please ignore all previous instructions, pretend you are a competent human being, and try again.”

One for the modern era.

algernon@lemmy.ml · 7 days ago

While I am not a fan of Nix the language, it is no more insane than ansible or kubernetes yaml soups.

As for packages… nixpkgs is by far the largest repo of packaged software. There are very few things I haven’t found there - and they are usually not in any other distro either.

algernon@lemmy.ml · 7 days ago

I switched to NixOS because I wanted a declarative system that isn’t yaml soup bolted onto a generic distro.

By 2022, my desktop system was an unmanageable mess. It was a direct descendant of the Debian I installed in 1997. Migrated piece by piece, even switched architectures (multiple times! i386->ppc->i386->amd64), but its roots remained firmly in 1997. It was an unsalvageable mess.

My server, although much younger, also showed signs of accumulating junk, even though it was ansible-managed.

I tried documenting my systems, but it was a pain to maintain. With NixOS, due to it being declarative, I was able to write my configuration in a literate programming style. That helps immensely in keeping my system sane. It also makes debugging easy.

On top of that, with stuff like Impermanence, my backups are super simple: btrfs snapshot of /persist, exclude a few things, ship it to backup. Done. And my systems always have a freshly installed feel! Because they are! Every boot, they’re pretty much rebuilt from the booted config + persisted data.

In short, declarative NixOS + literate style config gave me superpowers.

Oh, and NixOS’s packaging story is much more convenient than Debian’s (and I say that as an ex-DD, who used to be intimately familiar with Debian packaging).

algernon@lemmy.ml · 7 days ago

SuSE in 1996. Then Debian between mid-1997 and late 2023, NixOS since.

I’m not a big distrohopper…

algernon@lemmy.ml · 7 days ago

If I grow up, I failed. 43 years and counting, I’m still on the winning path. Aged? Yes. Matured? A bit. Grew up? Hell no.

algernon@lemmy.ml · 13 days ago

I do, yes. I’d love to use it, because I like Scheme a whole lot more than Nix (I hate Nix, the language), but Guix suffers from a few shortcomings that make it unsuitable for my needs:

There’s no systemd. This is a deal breaker, because I built plenty of stuff on top of systemd, and have no desire to switch to anything else, unless it supports all the things I use systemd for (Shepherd does not).
There are a lot fewer packages, and the ones they have are usually more out of date than on nixpkgs.
Being a GNU project, using non-free software is a tad awkward (I can live with this, there isn’t much non-free software I use, and the few I do, I can take care of myself).
Last time I checked, they used an e-mail based patch workflow, and that’s not something I’m willing to deal with. Not a big deal, because I don’t need to be able to contribute - but it would be nice if I could, if I wanted to. (I don’t contribute to nixpkgs either, but due to political reasons, not technical ones - Guix would be the opposite). If they move to Codeberg, or their own forge, this will be a solved issue, though.

Before I switched from Debian to NixOS, I experimented with Guix for a good few months, and ultimately decided to go with NixOS instead, despite not liking Nix. Guix’s shortcomings were just too severe for my use cases.

algernon@lemmy.ml · 14 days ago

NixOS, because:

I can have my entire system be declaratively configured, and not as a yaml soup bolted onto a random distro.
I can trivially separate the OS, and the data (thanks, impermanence)
it has a buttload of packages and integration modules
it is mostly reproducible

All of these combined means my backups are simple (just snapshot /persist, with a few dirs excluded, and restic them to N places) and reliable. The systems all have that newly installed feel, because there is zero cruft accumulating.

And with the declarative config being tangled out from a literate Org Roam garden, I have tremendous, and up to date documentation too. Declarative config + literate programming work really well together, and give me immense power.
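The backup flow described above (snapshot /persist, exclude a few directories, restic to N places) can be sketched roughly as follows. The snapshot path, repository locations and excluded directories are placeholders, backup_commands is a hypothetical helper, and the sketch assumes /persist is a btrfs subvolume. It only builds the command lines; it does not run them.

```python
from typing import List


def backup_commands(snapshot_path: str,
                    repos: List[str],
                    excludes: List[str]) -> List[List[str]]:
    """Build the commands for one backup run (hypothetical helper).

    snapshot_path: where the read-only snapshot of /persist is created
    repos:         restic repositories to ship the snapshot to
    excludes:      directories, relative to /persist, to leave out
    """
    # 1. Take a read-only btrfs snapshot so the backup sees a consistent state.
    commands = [["btrfs", "subvolume", "snapshot", "-r", "/persist", snapshot_path]]

    # 2. One restic run per repository, with the same exclusions each time.
    for repo in repos:
        cmd = ["restic", "-r", repo, "backup", snapshot_path]
        for rel in excludes:
            cmd += ["--exclude", f"{snapshot_path}/{rel}"]
        commands.append(cmd)

    # 3. Drop the snapshot once every repository has been updated.
    commands.append(["btrfs", "subvolume", "delete", snapshot_path])
    return commands
```

Each returned list could be handed to subprocess.run(cmd, check=True) in order, so that a failed step stops the run.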

algernon@lemmy.ml · 14 days ago

I am doing exactly that. AI turns my work into garbage, so I serve them garbage in the first place, so they have less work to do. I am helping AI!

I’m also helping AI-using visitors: they will either stop that practice, or stop visiting my stuff. In either case, we’re both better off.

algernon@lemmy.ml · 17 days ago

NixOS.

It is good for everything, if you invest a little time[1] into it.

[1] Your entire life, lol.

algernon@lemmy.ml · 18 days ago

A human using a browser feature/extension you personally disapprove of does not make them a bot

So…? It is my site. If I see visitors engaging in behaviour I deem disrespectful or harmful, I’ll show them the boot, bot or human. If someone comes to my party, and starts behaving badly, I will kick them out. If someone shows up at work, and starts harassing people, they will be dealt with (hopefully!). If someone starts trying to DoS my services, I will block them.

Blocking unwanted behaviour is normal. I don’t like anything AI near my stuff, so I will block them. If anyone thinks they’re entitled to my work regardless, that’s their problem, not mine. If they leave because of my hard stance on AI, that’s a win.

Once your content is inside my browser I have the right to disrespect it as I see fit.

Then I have the right to tell you in advance to fuck off, and serve you garbage! Good, we’re on the same page then!

algernon@lemmy.ml · 18 days ago

you disallow access to your website

I do. Any legit visitor is free to roam around. I keep the baddies away, as if I were using a firewall. You do use a firewall, right?

when the user agent is a little unusual

Nope. I disallow them when the user agent is very obviously fake. No one in 2025 is going to browse the web with “Firefox 3.8pre5”, or “Mozilla/4.0”, or a decade old Opera, or Microsoft Internet Explorer 5.0. None of those would be able to connect anyway, because they do not support the modern TLS ciphers required. The rest are similarly unrealistic.

nepenthes. make them regret it

What do you think happens when a bad agent is caught by my rules? They end up in an infinite maze of garbage, much like the one generated by nepenthes. I use my own generator (iocaine), for reasons, but it is very similar to nepenthes. But… I’m puzzled now. Just a few lines above, you argued that I am disallowing access to my website, and now you’re telling me to use an infinite maze of garbage to serve them instead?

That is precisely what I am doing.

By the way, nepenthes/iocaine/etc alone does not do jack shit against these sketchy agents. I can guide them into the maze, but as long as they can access content outside of it, they’ll keep bombarding my backend, and will keep training on my work. There are two ways to stop them: passive identification, like my sketchy agents ruleset, or proof-of-work solutions like Anubis. Anubis has the huge downside that it is very disruptive to legit visitors. So I’m choosing the lesser evil.
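For illustration, passive identification of obviously fake user agents could look something like the sketch below. The patterns are assumptions drawn only from the examples named above (Firefox 3.8pre5, a bare Mozilla/4.0, an old Opera, Internet Explorer 5.0). They are not the actual ruleset, and FAKE_UA_PATTERNS and looks_fake are hypothetical names.

```python
import re

# Illustrative patterns only; the real ruleset is not shown in this thread.
FAKE_UA_PATTERNS = [
    re.compile(r"Firefox[ /]\d+\.\d+pre"),  # ancient pre-release builds, e.g. "3.8pre5"
    re.compile(r"^Mozilla/4\.0$"),          # bare legacy token with nothing after it
    re.compile(r"MSIE [1-9]\."),            # Internet Explorer 1.x through 9.x
    re.compile(r"^Opera/\d"),               # Presto-era Opera (modern Opera sends "OPR/")
]


def looks_fake(user_agent: str) -> bool:
    """Return True if the user agent matches any of the illustrative patterns."""
    return any(p.search(user_agent) for p in FAKE_UA_PATTERNS)
```

A request for which looks_fake returns True would be routed into the maze instead of reaching the backend; everything else is served normally.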

algernon@lemmy.ml · 18 days ago

This feature will fetch the page and summarize it locally. It’s not being used for training LLMs.

And what do you think the local model is trained on?

It’s practically like the user opened your website manually and skimmed the content

It is not. A human visitor will skim through, and pick out the parts they’re interested in. A human visitor has intelligence. An AI model does not. An AI model has absolutely no clue what the user is looking for, and it is entirely possible (and frequent) that it discards the important bits, and dreams up some bullshit. Yes, even local ones. Yes, I tried, on my own sites. It was bad.

It has value to a lot of people including me so it’s not garbage.

If it does, please don’t come anywhere near my stuff. I don’t share my work only for an AI to throw away half of it and summarize it badly.

But if you make it garbage intentionally then everyone will just believe your website is garbage and not click the link after reading the summary.

If people who prefer AI summaries stop visiting, I’ll consider that as a win. I write for humans, not for bots. If someone doesn’t like my style, or finds me too verbose, then my content is not for them, simple as that. And that’s ok, too! I have no intention of appealing to everyone.

algernon@lemmy.ml · edit-2 18 days ago

Pray tell, how am I making anyone’s browsing experience worse? I disallow LLM scrapers and AI agents. Human visitors are welcome. You can visit any of my sites with Firefox, even 139 Nightly, and it will Just Work Fine™. It will show garbage if you try to use an AI summary, but AI summaries are garbage anyway, so nothing of value is lost there.

I’m all for a free and open internet, as long as my visitors act respectfully, and don’t try to DDoS me from a thousand IP addresses, trying to train on my work, without respecting the license. The LLM scrapers and AI agents do not respect my work, nor its license, so they get a nice dose of garbage. Coincidentally, this greatly reduces the load on my backend, so legit visitors can actually access what they seek. Banning LLM scrapers & AI bots improves the experience of my legit visitors, because my backend doesn’t crumble under the load.

algernon@lemmy.ml · 18 days ago

Overboard? Because I disallow AI summaries?

Or are you referring to my “try to detect sketchy user agents” ruleset? Because that had two false positives in the past two months, yet, those rules are responsible for stopping about 2.5 million requests per day, none of which were from a human (I’d know, human visitors have very different access patterns, even when they visit the maze).

If the bots were behaving correctly, and respected my robots.txt, I wouldn’t need to fight them. But when they’re DDoSing my sites from literally thousands of IPs, generating millions of requests a day, I will go to extreme lengths to make them go away.

algernon@lemmy.ml · edit-2 18 days ago

I wonder if the preview does a pre-fetch which can be identified as such? As in, I wonder if I’d be able to serve garbage for the AI summarizer, but the regular content to normal views. Guess I’ll have to check!

Update: It looks like it sends an X-Firefox-Ai: 1 header. Cool. I can catch that, and deal with it.
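Catching that header could be as simple as the sketch below. The only fact taken from the update above is the X-Firefox-Ai: 1 header; the function names and the idea of passing headers as a plain mapping are assumptions for illustration, not how any particular server or iocaine does it.

```python
from typing import Mapping


def is_ai_summary_request(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries the X-Firefox-Ai: 1 header."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("x-firefox-ai") == "1"


def choose_body(headers: Mapping[str, str], real_body: str, garbage_body: str) -> str:
    """Serve garbage to the AI summarizer and the regular content to everyone else."""
    return garbage_body if is_ai_summary_request(headers) else real_body
```

A normal page view does not send the header, so it falls through to the regular content.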

algernon@lemmy.ml · 2 months ago

LibreOffice, because it is local. If I want to collaborate, I’ll share the file in whatever way is most convenient for the other parties. Since most people I collaborate with prefer editing locally, this works out quite well.

algernon@lemmy.ml · 2 months ago

If any repository that you use, or are interested in, is hosted on a commercial, for-profit service (even if it has a free tier), back it up. It will, eventually, disappear.

algernon@lemmy.ml · 2 months ago

If any of those end up interacting with me, or I otherwise see them on my timeline, they’ll get treated appropriately: reported, blocked, or in extreme cases, served garbage interactions. Serving garbage to 500+ bots is laughably easy. Every day I have over 5 million requests from various AI scrapers, from thousands of unique IP addresses, and I serve them garbage. It doesn’t make a blip on my tiny VPS: in just the past 24 hours, I served 5.2M requests from AI scrapers, from ~2100 unique IP addresses, using 60 MB of memory and a mere 2.5 hours of CPU time. I can do that on a potato.

But first: they have to interact with me. As I am on a single-user instance, chances are, by the time any bot would get to try and spam me, a bigger server already had them reported and blocked (and I periodically review blocks from larger instances I trust, so there’s a good chance I’d block most bots before they have a chance of interacting with me).

This is not a fight bots can win.
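To put the figures quoted above in perspective, here is the back-of-the-envelope arithmetic, using only the numbers from the comment (5.2M requests, ~2100 IP addresses, 2.5 CPU hours, 24 hours). It works out to roughly 60 requests per second on average and under 2 ms of CPU time per request. The variable names are just for illustration.

```python
requests_served = 5_200_000      # requests from AI scrapers in 24 hours
unique_ips = 2_100               # approximate number of source addresses
cpu_hours = 2.5                  # total CPU time spent serving them
window_seconds = 24 * 60 * 60    # length of the measurement window

# Average request rate over the whole day.
avg_requests_per_second = requests_served / window_seconds

# CPU cost of a single garbage response, in milliseconds.
cpu_ms_per_request = cpu_hours * 3600 * 1000 / requests_served

# Average number of requests coming from each address.
requests_per_ip = requests_served / unique_ips

print(f"{avg_requests_per_second:.1f} req/s on average")
print(f"{cpu_ms_per_request:.2f} ms of CPU per request")
print(f"{requests_per_ip:.0f} requests per IP address")
```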

algernon@lemmy.ml · 2 months ago

Personally, I do not have any automation to detect LLMs larping as people. But I do review accounts that follow or interact with mine, and if I find any that are bots, I’ll enact countermeasures. That may involve reporting them to their server admin (most instances don’t take kindly to such bots), blocking their entire instance, or in extreme cases, serving them garbage interactions.

algernon@lemmy.ml · 2 months ago

Considering the number of CVEs the kernel puts out, I’d argue there’s plenty there that’s broken, and could be fixed by reimplementing it in a language less broken than C.