not_IO@lemmy.blahaj.zone to

Science Memes@mander.xyzEnglish · 3 days ago

how things become science

lemmy.blahaj.zone

973

https://www.nature.com/articles/d41586-026-01100-y

https://bsky.brid.gy/r/https://bsky.app/profile/did:plc:s6yp6jam5og3tftozaw7pjth/post/3mj34sn6kyk25

percent@infosec.pub
English · 5 upvotes · 3 days ago
There are huge public datasets that are often used for pretraining. Common Crawl and C4 are probably the most prominent, but there are others.

There are also big public datasets available for fine-tuning and instruction tuning.

The open-weight models are getting pretty powerful, thanks to some Chinese labs.