In recent days I’ve been stumbling upon weird new so-called “AI” Mathy-math-slop sites, like linuxv*x.com[1]. Another one was called something like “tutorialsipedia”, or whatever.

Have you noticed these? Is that some weird new startup that wants to leverage SEO and “AI”? I’d use them, but my eyes slide right off the page. It’s like a drop on a lotus leaf, and I can’t really read that garbage. What’s up with those?


  1. Don’t want to give them the traffic.

  • doodoo_wizard@lemmy.ml
    2 days ago

    Welcome to the year of the Linux desktop. Now solving Linux problems is big business!

    What you’re saying about drops on a lotus leaf hits, though. There’s something weird about the prose on those sites that’s significantly different from even the AI text I’ve made at home on my own hardware.

    Sometimes it feels like the opposite of meditation where I can feel something tugging “up” in the top center of my skull when “reading” one of those pages but don’t remember what the page was about.

    • CCRhode@lemmy.ml
      22 hours ago

      drops on a lotus leaf

      Here’s a strategy for scoring your own search results.

      “Keywords” are the seven words most commonly occurring on the page. If these seven words are seen to be repeated on the page to an unusual degree, then it is a good assumption that the page was designed by the author to appear high on search results.

      Keyword density is a measure of “gloss.” Most people will read pages with high keyword density as unusually glossy. Keyword density is not necessarily related to how genuine the page content appears to be otherwise, but most people will look askance at a page that is too glossy.
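
      One way to put a number on this, as a rough sketch only: count how often the seven most common words occur and divide by the total word count. The function names, the small stop-word list, and the exact formula below are assumptions made for illustration; they are not taken from this post or from any particular package.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list so that "the", "and", etc. do not
# crowd out the content words. The post does not say how (or whether)
# common words are filtered.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "on",
    "for", "that", "this", "with", "as", "are", "be", "or",
})

def top_keywords(text, n=7):
    """Return ([(word, count), ...] for the n most common words, total word count)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    return Counter(words).most_common(n), len(words)

def keyword_density(text, n=7):
    """Fraction of all counted words taken up by the n most common words."""
    top, total = top_keywords(text, n)
    if total == 0:
        return 0.0
    return sum(count for _, count in top) / total

# Hypothetical sample texts, made up for this example.
glossy = "best linux tutorial best linux guide best linux tips " * 5
plain = ("Check dmesg after a failed mount, compare the UUID in fstab "
         "against blkid output, then remount and watch the journal.")

print("glossy:", round(keyword_density(glossy), 2))
print("plain: ", round(keyword_density(plain), 2))
```

      A page that keeps repeating the same few terms scores close to 1.0, while ordinary prose tends to score lower. What counts as “unusual” is a judgment call; no threshold is implied here.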

      It should come as no big surprise that the pages that appear high on search results have been designed that way. They are deliberately glossy with high keyword density. You may consider whether to skip reading them or even loading them in your browser. Chances are good that the glossy pages are mostly advertising.

      Generally you will find interspersed in your results a handful of sites with low keyword density. These are likely from universities, government sites, and research institutions that have sources of revenue beyond advertising. You may consider whether to load these up and skim through them. Probably they will show a publication date, author, and list of references, which will move your research forward.

      It can be noted that AI-generated sites often exhibit high keyword density. This is probably deliberate so that they garner advertising revenue. However, it may also be due to “bot 'splaining,” which is polly-paraphrasing a series of several (perhaps contradictory) articles.

      Keyword density is not the only measure of gloss. There are others that have been developed to measure ratios between parts of speech. Unfortunately NONE OF THESE — including keyword density — distinguish sharply between pages that naturally convey genuine information and pages that have been designed to convey fluff for ulterior purposes. It is unlikely that combining measures of gloss will result in a tool that discriminates much better than keyword density by itself.

      • Piskorski, Jakub, Marcin Sydow, and Dawid Weiss. “Exploring Linguistic Features for Web Spam Detection: A Preliminary Study.” AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web. Ed. Carlos Castillo, Kumar Chellapilla, and Dennis Fetterly. New York: ACM, Apr. 2008. 25-28. ISBN:9781605581590. DOI:10.1145/1451983. 09 Nov. 2025 https://users.pja.edu.pl/~msyd/lingFeat08draft.pdf.

      Nevertheless, you may wish to explore keyword density as a means to rank search results.
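
      As a hedged illustration of what such a ranking could look like: the sketch below scores page texts that have already been downloaded and sorts them from least to most glossy. The helper names, the example URLs, and the unfiltered density formula are all hypothetical and are not the behavior of any published tool.

```python
import re
from collections import Counter

def keyword_density(text, n=7):
    """Share of all words taken up by the n most frequent words.
    Assumption: no stop-word filtering, to keep the sketch short."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    top = Counter(words).most_common(n)
    return sum(count for _, count in top) / len(words)

def rank_by_density(pages):
    """pages: mapping of URL -> plain text already fetched by other means.
    Returns (url, density) pairs, lowest density (least glossy) first."""
    scored = [(url, keyword_density(text)) for url, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical search results; the URLs and texts are made up.
pages = {
    "https://example.com/glossy": "linux fix linux fix best linux fix " * 10,
    "https://example.org/plain": (
        "the kernel logs each failed mount attempt so check dmesg "
        "before editing fstab or rebooting the machine"
    ),
}

for url, density in rank_by_density(pages):
    print(f"{density:.2f}  {url}")
```

      Sorting ascending puts the low-density pages at the top, which matches the advice above to look at those first. Fetching the pages and stripping their HTML is deliberately left out.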

      When I try to include a direct link to my Python scripts, which do that, my responses and in fact the whole posted discussion are taken down … something to do with self-promotion of untested software, I suppose. But you can find them in the Cheese Shop (see Wikipedia, “Python Package Index”) under clanker_score.

      We don’t want to make this too easy for just anyone to censor all their search results. Rather, these scripts are meant as a learning tool. They demonstrate generally how rotten search results can be on one particular and not very compelling dimension. It should not be necessary to download and scan each and every page. You should be able to train yourself to ignore a priori results that include handfuls of pages from unauthoritative sites.

    • definitemaybe@lemmy.ca
      1 day ago

      “reading” one of those pages but don’t remember what the page was about.

      That’s one of the biggest tells of AI-written text. It uses a lot of words to say very little, but does so in a very authoritative-sounding (or needlessly flowery) way.

    • Prunebutt@slrpnk.net (OP)
      2 days ago

      Sometimes it feels like the opposite of meditation where I can feel something tugging “up” in the top center of my skull when “reading” one of those pages but don’t remember what the page was about.

      This is your brain on slop.