Saving/downloading the internet (theory and practice)?
Hey, I realize that you can NEVER save the entire internet. Simply because it changes from moment to moment, and much of the data isn't freely accessible (e.g., password-protected areas).
However, I'm still wondering whether it's possible to save and display the "freely accessible" internet at a size of just a few GB. I'm asking because it's already possible to install a model such as GPT4All that holds a lot of knowledge but is only a few GB in size.
I think we should limit the theory to just websites for now, so it stays easier to understand and avoids a lot of problems for the time being. At least, that's how I see it.
Thank you in advance, and please bear with my dyslexia.
Birds – Wikipedia: a simple website, from Wikipedia. Let's take the HTML code. Only the HTML code; CSS and JS stay out of it.
According to the UTF-8 string length & byte counter (mothereff.in), we're at 202,490 characters and 205,803 bytes. And that's just one page. But Wikipedia has millions of them. If we add other hosts, the numbers become impossible. It simply gets too big, and that's only the HTML part.
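For anyone who wants to reproduce those numbers: a minimal sketch in Python (assuming a plain HTTP fetch of an example article URL) that downloads only the HTML and counts characters and bytes, roughly what the mothereff.in counter does.

```python
# Rough sketch: fetch one Wikipedia article's raw HTML and measure its size.
# The URL is just an example; any page works the same way.
from urllib.request import urlopen

url = "https://en.wikipedia.org/wiki/Bird"
with urlopen(url) as response:
    raw = response.read()           # bytes as delivered (HTML only, no separate CSS/JS files)

html = raw.decode("utf-8")          # decode to count characters instead of bytes
print(f"{len(html):,} characters")
print(f"{len(raw):,} bytes")        # larger, because some characters need several bytes in UTF-8
```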
No, it's not even possible, if only for lack of space. It would take forever to do that. AIs like ChatGPT only use APIs to existing search engines. The offline versions know a lot, but that is not even 1% of the data on the internet.
Okay, so we don't even get to 1%. Would it be possible with that 1%?
negative.
Every page on the internet is already stored on its respective server. However, a private user alone doesn't have the means to store everything again redundantly.
And what if that's all I want as output? I'm only talking about retrieving static pages. I don't need any functionality or anything, just the possibility to view all the pages, as I'd put it, without the backend technology.
(I hope that's understandable.)
Even then, no. That's inconceivable. YouTube, Wikipedia, the Web Archive, etc. are immensely large.
accepted
No. The information remains the same, and you can’t compress it any more.
That's true, but wouldn't it still be possible to compress it with enough server resources, like with the AI models?
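As a middle ground between the two views: lossless compression (the kind a server could do) does buy something, but only a constant factor. A quick sketch with Python's built-in zlib, using a hypothetical saved page as input, shows the order of magnitude; HTML typically shrinks by a factor of about three to five, which is nowhere near turning petabytes into a few GB the way a lossy AI model does.

```python
# Sketch: how much does lossless compression save on one HTML page?
# "page.html" is a placeholder for any page you have saved locally.
import zlib

with open("page.html", "rb") as f:
    raw = f.read()

packed = zlib.compress(raw, level=9)    # highest (slowest) compression level
ratio = len(raw) / len(packed)

print(f"original:   {len(raw):,} bytes")
print(f"compressed: {len(packed):,} bytes")
print(f"ratio:      {ratio:.1f}x")      # usually around 3-5x for HTML, never 1000x
```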
70% of the internet is the deep web, i.e. server data that isn't accessible in the first place.
There are already websites for that. It's called the Wayback Machine.
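Side note: the Wayback Machine has a public availability endpoint, so you can at least check whether a page is already archived instead of saving it yourself. A small sketch (the archive.org availability API, queried for an example URL):

```python
# Sketch: ask the Wayback Machine whether a URL already has an archived snapshot.
# Uses the public availability API described at https://archive.org/help/wayback_api.php
import json
from urllib.parse import quote
from urllib.request import urlopen

target = "https://en.wikipedia.org/wiki/Bird"    # example URL
api = "https://archive.org/wayback/available?url=" + quote(target, safe="")

with urlopen(api) as response:
    data = json.load(response)

closest = data.get("archived_snapshots", {}).get("closest")
if closest:
    print("archived copy:", closest["url"], "from", closest["timestamp"])
else:
    print("no snapshot found")
```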
And is it possible to compress just those down to a few GB? Like the AI models? (I don't know much about this yet.)
You cannot compress the sites down to smaller sizes; if something is 6 GB, it is 6 GB. You can index them, but then you have Google.
And AIs were fed with terabytes, if not petabytes, of data.
Google itself doesn't show me the site, it only links to it. Maybe I phrased that a bit clumsily.
Then you're just calling up Google.
Would it be possible to train a kind of AI that isn't meant for chatting, but serves as a search function and then shows the pages?
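What that describes is closer to an index over stored copies than to a chat model. A toy sketch in pure Python, with a couple of made-up saved pages, of an inverted index: every word points to the pages containing it, and the "search function" returns the stored page itself rather than a link.

```python
# Toy sketch: an inverted index over locally saved pages.
# The pages dict stands in for whatever HTML/text you have actually downloaded.
from collections import defaultdict

pages = {
    "birds.html": "Birds are a group of warm-blooded vertebrates with feathers.",
    "bats.html":  "Bats are the only mammals capable of true and sustained flight.",
}

# Build the index: every word maps to the set of pages that contain it.
index = defaultdict(set)
for name, text in pages.items():
    for word in text.lower().split():
        index[word.strip(".,")].add(name)

def search(word: str) -> list[str]:
    """Return the stored pages themselves (not just links) that contain the word."""
    return [pages[name] for name in index.get(word.lower(), set())]

print(search("feathers"))   # -> the saved birds.html text, shown directly
```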