Maybe LLM isn't the full problem?

Shin@piefed.social · 8 days ago

Maybe LLM isn't the full problem?

dumnezero@piefed.social · 6 days ago

The creation is also the problem. Pillaged training data - problem.

The older society-destroying AI models – the ad and recommendation algorithms – are also bad, as is their training data.

Banana@sh.itjust.works · 8 days ago

The problem is capitalism and what it’s using AI for. LLMs and their pattern recognition could be very useful if we used them for what they are good at, but capitalists want to replace people with them to avoid paying wages.

Hotzmon@fedinsfw.app · 8 days ago

There are use cases that were previously impossible to code but are now relatively simple to solve with LLMs. Like text categorization and summarization, which were previously near impossible to code. Nothing “AI” about that, it just uses the statistical nature of LLMs.

very_well_lost@lemmy.world · 8 days ago

text categorization and summarization, which were previously near impossible to code

Not saying that there aren’t any coding challenges that were impossible before AI, but these are bad examples. 10+ years ago Reddit was already infested with “summary” bots that summarized articles with high accuracy, often better than the AI generated summaries I see nowadays… especially when you consider the computational cost. AI summarization requires a $1000+ GPU. “Dumb” summarization algorithms can run on a phone from 2015.

Hotzmon@fedinsfw.app · 7 days ago

Gonna need some references on that. To my knowledge there are no other algorithms or technologies that can do that kind of summarization besides this kind of token prediction. Yes, Reddit might have been doing that kind of thing, but under the hood it would be 100% the same base tech as GPT, just an earlier version of it, with similar GPU consumption.

very_well_lost@lemmy.world · 7 days ago

There’s “LexRank”, which uses a graph-based approach, similar to the PageRank algorithm that originally made Google so successful. Both it and the closely related TextRank have been around since 2004. Here’s an implementation: https://github.com/crabcamp/lexrank

SMMRY has been around since 2009 and I believe was behind most of the tl;dr style bots that were common on Reddit in the 2010s. The original implementation was rules-based instead of a transformer architecture, but it appears the company has pivoted into AI in recent years. Here’s an article about it from before they made the switch: https://medium.com/@mplaut929/smmry-the-algorithm-behind-reddits-tldr-bot-c268722a4c27

Neither of these use(d) the GPT architecture or needed a GPU to run.
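For anyone curious what “graph-based” means in practice, here is a minimal sketch of the idea in plain Python. It is not the code from the linked repo and not what SMMRY ran. The function names, the naive sentence splitter, the bag-of-words cosine similarity and the damping value of 0.85 are all my own assumptions for illustration. Sentences are nodes, similarity is the edge weight, and a PageRank-style iteration scores each sentence by how strongly the others “vote” for it.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Lowercased word counts; no stemming or stop-word removal in this sketch.
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    bags = [bag_of_words(s) for s in sentences]
    # Similarity graph: edge weight is cosine similarity, with no self-loops.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    # PageRank-style power iteration over the weighted graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j])
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = rank_sentences(sentences)
    # Pick the k best-scoring sentences, then restore their original order.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text, k=3)` returns three sentences copied verbatim from the input. It is all counting and a few dozen multiplications per sentence pair, which is why this family of algorithms runs fine on a CPU.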

Hotzmon@fedinsfw.app · 6 days ago

Sure, LexRank works, but it cannot reword the text; it can only reorder or remove sentences or words. It has use cases, and it cannot hallucinate, because it can only reuse parts of the input. Unfortunately a good summarization requires often rewording.

But I stand corrected, I did not know Reddit used it.

very_well_lost@lemmy.world · 6 days ago

Unfortunately a good summarization requires often rewording.

Agree to disagree, I guess.

Summarization is lossy no matter what, but I’d much rather the lost data be deterministic, and the preserved data be guaranteed to represent the original text. AI summarization is like a bad game of telephone, and it’s hard to be sure whether it’s given you a genuine summary or injected its own bias, missed key details, etc. And that’s assuming it doesn’t just completely hallucinate.
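To make the “guaranteed to represent the original text” part concrete, here is a rough sketch of a check you can run on any summary. The function names and the naive sentence splitter are my own assumptions, not part of any tool mentioned above. It lists every summary sentence that does not appear verbatim in the source.

```python
import re


def normalize(text):
    # Collapse whitespace and ignore case so line wrapping does not cause false alarms.
    return re.sub(r"\s+", " ", text).strip().lower()


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def non_verbatim_sentences(source, summary):
    # Return the summary sentences that cannot be found word for word in the source.
    haystack = normalize(source)
    return [s for s in split_sentences(summary) if normalize(s) not in haystack]


source = "The server crashed at noon. Engineers restarted it within an hour. No data was lost."
extractive = "The server crashed at noon. No data was lost."
reworded = "The server crashed at noon. Nothing important was affected."

print(non_verbatim_sentences(source, extractive))  # []
print(non_verbatim_sentences(source, reworded))    # ['Nothing important was affected.']
```

An extractive summary always passes this check by construction. A reworded one cannot be verified this way, so you have to read the source yourself to know whether a sentence is faithful. Passing only proves the sentences are real, not that the selection was a good one.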

Eheran@lemmy.world · 7 days ago

Can you link such an example summary?
