Maybe LLM isn't the full problem?

Shin@piefed.social · 11 days ago

Maybe LLM isn't the full problem?

Hotzmon@fedinsfw.app · 9 days ago

Gonna need some references on that. To my knowledge there are no other algorithms/tech for that kind of summarization than this kind of token prediction. Yes, Reddit might have been doing that kind of thing, but under the hood that is 100% the same base tech as in GPT, just an earlier version of it, with a similar kind of GPU consumption.

very_well_lost@lemmy.world · 9 days ago

There’s “LexRank”, which uses a graph-based approach, similar to the PageRank algorithm that originally made Google so successful. It and the closely related TextRank have both been around since 2004; here’s a Python implementation: https://github.com/crabcamp/lexrank

SMMRY has been around since 2009 and I believe was behind most of the tl;dr style bots that were common on Reddit in the 2010s. The original implementation was rules-based instead of a transformer architecture, but it appears the company has pivoted into AI in recent years. Here’s an article about it from before they made the switch: https://medium.com/@mplaut929/smmry-the-algorithm-behind-reddits-tldr-bot-c268722a4c27

Neither of these use(d) the GPT architecture or needed a GPU to run.
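To make the graph-based idea concrete, here is a rough sketch of LexRank-style extractive summarization in plain Python. It is not the code from the linked repo and not what SMMRY ran; the function names (`split_sentences`, `cosine`, `summarize`) and the parameters are made up for illustration, and it uses raw word counts where the real LexRank uses TF-IDF weighting.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def summarize(text, n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    if len(sentences) <= n:
        return sentences
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    size = len(sentences)
    # Similarity graph: edge weight is cosine similarity, no self-loops.
    sim = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / size] * size
    # PageRank-style power iteration over the sentence graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) / size
            + damping
            * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(size)
                if out_weight[j]
            )
            for i in range(size)
        ]
    top = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    # Return the chosen sentences verbatim, in their original order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(some_text, n=2)` returns two sentences copied unchanged from the input, and the same input always gives the same output. It is just counting words and doing a few small loops, so it runs fine on a CPU.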

Hotzmon@fedinsfw.app · 9 days ago

Sure, LexRank works, but it cannot reword the text; it can only reorder or remove sentences or words. It has its use cases, and it cannot hallucinate, because it can only reuse parts of the input. Unfortunately, a good summarization often requires rewording.

But I stand corrected; I did not know Reddit used it.

very_well_lost@lemmy.world · 8 days ago

Unfortunately, a good summarization often requires rewording.

Agree to disagree, I guess.

Summarization is lossy no matter what, but I’d much rather the lost data be deterministic, and the preserved data be guaranteed to represent the original text. AI summarization is like a bad game of telephone, and it’s hard to be sure whether it’s given you a genuine summary or injected its own bias, missed key details, etc. And that’s assuming it doesn’t just completely hallucinate.
