Internet Archive’s legal fights are over, but its founder mourns what was lost - Ars Technica

arstechnica.com/tech-policy/2025/11/the-interne…

Last month, the Internet Archive’s Wayback Machine archived its trillionth webpage, and the nonprofit invited its more than 1,200 library partners and 800,000 daily users to join a celebration of the moment. To honor “three decades of safeguarding the world’s online heritage,” the city of San Francisco declared October 22 to be “Internet Archive Day.” The Archive was also recently designated a federal depository library by Sen. Alex Padilla (D-Calif.), who proclaimed the organization a “perfect fit” to expand “access to federal government publications amid an increasingly digital landscape.”

The Internet Archive might sound like a thriving organization, but it only recently emerged from years of bruising copyright battles that threatened to bankrupt the beloved library project. In the end, the fight led to more than 500,000 books being removed from the Archive’s “Open Library.”

“We survived,” Internet Archive founder Brewster Kahle told Ars. “But it wiped out the Library.”

An Internet Archive spokesperson confirmed to Ars that the archive currently faces no major lawsuits and no active threats to its collections. Kahle thinks “the world became stupider” when the Open Library was gutted—but he’s moving forward with new ideas.


16 Comments

I’m glad the lawsuits didn’t kill them but what Kahle tried to do with “Open Library” and “Project 78” was truly insane. Admirable but insane. They absolutely had to know right from the outset that the Media Companies weren’t going to allow it to continue post-COVID.

I mean, large corps like Meta get away with straight up piracy these days.

Laws only matter if you’re not part of the ruling class.

They didn’t even want it during COVID.

Calling Project 78 insane is a bit much.

If Amazon can download books, I can also against what’s missing from the archive

Books are less valuable than a record of the state of the digital space, through time.

Indeed, I consider this to be an okay outcome. It’s the Internet Archive, not the All Information Ever Archive. It archives the Internet. There are other projects archiving books.

And it’s the Internet Archive, not the Internet Barely Disguised Pirate Bay. I’m okay if the data they’re archiving isn’t super easy to access by everyone all the time, as long as it’s being preserved. Someday eventually copyright law might become sane again, at which point these archives can come out of their bunkers. Until then those bunkers are important for keeping them safe.

I really think the Internet Archive did a downright stupid thing poking this bear with a stick. I’m relieved they survived and I hope they learned from the experience.

Copyright should not apply to a historic record, but things don’t become historic record upon creation, but sometime after. However, you can’t have a historic record if it isn’t recording history as it happens so…once enough time passes to make a historic record case versus copyright how do you add back the stuff that wasn’t recorded at the time?

The removal of this content is itself now historic record so tag the missing information and why there is a black hole where the record should be. Digital history, and thus history, as swiss cheese because the value of copyright matters more than accuracy of the history of the digital age itself. It is a tragedy to the future that we can’t record reality because someone claims they own it…

We are a stupid, stupid species.

You archive it but don’t publish it.

That might be a viable option.

It’s the approach I’ve been advocating for for years now, throughout this whole lawsuit circus. I got a lot of downvotes for it over the years too, people couldn’t separate my position from capitulation.

Really, it’s just a matter of fighting the battles you can win and not fighting the battles that will annihilate you simply on the basis of principle. The analogy I kept using was a man carrying a precious and fragile treasure going up to a bear and whacking it with a stick, and then acting like we should be sympathetic to them as they desperately scream about how the precious treasure was at risk now that the bear was eating their leg.

They should be focusing on protecting that treasure. Let the EFF take the bear on, that’s what they are for.

It does call into question the motive of the archive and its financial viability to pivot to doing that.

Yes, the archiving and republishing would be illegal in most countries, but not in the US. Fair Use

They didn’t face trouble over archiving the net, but over digitally lending e-books and audio.

Had a similar conversation over in Mastodon recently and yeah, this is a very fair point. The indiscriminate scanning and publication of copyrighted books shouldn’t have happened in the first place, especially when there are existing ecosystems for ethical lending/leasing/borrowing of books already in place, which benefit and are working with authors/publishers already.

I’ve been donating monthly for about a year now, and y’all should too!

Good.

Stick to CC media, next pandemic.

Comments from other communities

The Internet Archive is a good project.

However, like most tech obsessives, they still don’t understand, or perhaps care about, consent.

Does an author need to consent to have their book added to a physical library?

Once you publish and widely sell a work should an author really get to decide that their book cannot be borrowed or lent?

That isn’t what we are talking about, apologies for not being clear with that.

We are talking about them not taking down things like tweets etc without what they consider a ‘good reason’.

The article is about the internet archive’s book library, however I’m very open to discussing tweets.

Maybe this is because I was taught “once it’s on the Internet it’s out there forever” growing up, but I have no problem with everything posted to the internet being archived forever. Why should someone have the ability to scrub their past and pretend they are perfect? Sure I don’t agree with every opinion I’ve ever posted, but that doesn’t mean I should be able to pretend I never said those things.

I see no difference between a newspaper recording a public speech and the Internet Archive recording a public tweet.

Because things can be posted about other people without their consent. Things like revenge porn, the location of air defenses, etc.

I think it’s something that is currently being figured out in various legal systems around the world.

I agree, those are examples of things that should be able to be deleted from an archive. In my mind those are different issues than just the archive though, and we should probably have laws in place ensuring that content like that can be removed from anywhere on the Internet.

But it really seems like you’re arguing that “some content shouldn’t be archived forever”, not that “an archive must receive consent before recording public Internet data.”

Why should someone have the ability to scrub their past and pretend they are perfect?

That’s not what it does. People are capable of changing on their own and thus should have the right for their past mistakes or opinions to be forgotten, or just to be forgotten anyway.

Having their past or just current opinions dragged up can actually have the opposite effect and force them to double down. So having some way of getting rid of it all could actually help them work through their terrible opinions.

If someone goes out in public spouting terrible opinions, I think they should be responsible for explaining why they were wrong and why they no longer believe what they used to. That is hard and requires actually growing as a person, instead of just deleting their past and pretending they never said what they did.

We think it’s okay to grow, learn from it and not do it again without necessarily explaining it etc. But we do understand the value of doing it the way you think it should be done too.

Yes. There’s an agreement and a system in place for that. While I support the Internet Archive’s efforts in general, consent is a very important right. Not everyone cares or wants everything about them preserved. People also want the right to be forgotten.

A balance definitely needs to be struck, even if it is opt-out.

I’m very curious what you mean by “there’s an agreement and a system in place for that.” Can an author tell a library that they are not allowed to lend specific books? I’ve never heard of anything like that.

An author can tell anyone that. And as long as they still control the rights to their work, they can enforce it. Should you sign your rights over to a publisher, then that becomes the publisher’s prerogative, not yours.

The consensus and case law surrounding traditional libraries is that libraries generally still buy physical copies. Even in the age of the e-book, they still play by publisher rules of artificial scarcity and limited lending: they lend for a limited period, and only as many copies as the licenses they purchased from the publisher.

The Internet Archive, however, allows everyone to infinitely duplicate items in the archive, which is great for retention. But as a business model, it sucks. I support the Archive’s mission, but is the Archive supporting any of those it archives? And while it is generally invaluable in a good way, it doesn’t offer a way to be forgotten for those who do want to be forgotten. Then again, neither do most major internet-focused entities; Reddit undeleting comments their authors deleted, for instance.

An author can tell anyone that.

I really don’t understand how this would work. They can’t do the normal legal nonsense of claiming they are only selling me a license to the book, if they sell a physical copy of the book to me. After they sell me a physical book they cannot prevent me from lending or reselling it.

If authors/publishers could find some way to legally do this they would, I just don’t think they can.

I understand why a library can’t make copies of a book (as far as I understood it, the Internet Archive was “limiting” how many copies of a book could be viewed at a time); the copyright protections are clear. But copyright does not cover resale or lending.

Oh, then I misunderstood you. Yes, if an author self-publishes and sells copies, their control over each copy largely ends when it leaves their possession. In fact, they maintain only one right to it after that point: the copyright. Unfortunately, IA is a bit fast and loose with that.

Yeah, the consent of a megacorporation that owns the rights to a bunch of books shouldn’t matter when freedom of information is on the line.

That isn’t what we are talking about, apologies for not being clear with that.

We are talking about them not taking down things like tweets etc without what they consider a ‘good reason’.

These corporations absolutely are immortal thieves. But it is important to respect consent as well. The solution isn’t to ignore consent, but to change the system to make the thieves obsolete or irrelevant. To find better ways to reward those who culturally enrich society outside of capitalist cycles of false scarcity.

But Kahle said the lawsuits against IA showed that “massive multibillion-dollar media conglomerates” have their own interests in controlling the flow of information. “That’s what they really succeeded at—to make sure that Wikipedia readers don’t get access to books,” Kahle said.

And that’s why I hate copyright.

Shareholders would rather have history destroyed than free.

There is nothing more evil than actively fighting to ensure people have less access to knowledge in the future.

IP defenders in shambles.

I have mixed feelings. I’m glad they survived the lawsuits, and now they can spend their funding on their actual goals rather than it going towards lawyers.

On the other hand, it’s really sad that they had to delete so much of their archive - over half a million books, and a bunch of recordings from their Great 78 Project (which was archiving 300k+ music albums released between ~1900 and 1950). A lot of the things that can’t be archived are eventually going to become lost media.

I really hope that they didn’t actually delete anything, and only just removed public access.

And open themselves up to massive penalties? That would be beyond stupid.

I wouldn’t think a library/archive retaining data in an offline form would incur penalties, and I feel like preserving books for the future is the opposite of stupid.

Preserving is important, sure. But if the settlement required them to delete it and they keep an offline backup and this ever gets out, the settlement is voided and it opens up a world of hurt for them.

This is not a debate about the merits of preservation but about legal repercussions for the Internet Archive.

I didn’t know if it did or didn’t. But since you say that’s the case, that sucks and I hate the publishers even more.

I’m 95% sure the settlement with the publishers would have included a clause requiring the Internet Archive to delete all “infringing” material in their possession.

what’s your methodology for that 95% figure? because Internet Archive themselves mention no such clause:

The lawsuit only concerns our book lending program. The injunction clarifies that the Publisher Plaintiffs will notify us of their commercially available books, and the Internet Archive will expeditiously remove them from lending. Additionally, Judge Koeltl also signed an order in favor of the Internet Archive, agreeing with our request that the injunction should only cover books available in electronic format, and not the publishers’ full catalog of books in print.

Because this case was limited to our book lending program, the injunction does not significantly impact our other library services.  The Internet Archive may still digitize books for preservation purposes, and may still provide access to our digital collections in a number of ways, including through interlibrary loan and by making accessible formats available to people with qualified print disabilities. We may continue to display “short portions” of books as is consistent with fair use—for example, Wikipedia references (as shown in the image above). The injunction does not affect lending of out-of-print books. And of course, the Internet Archive will still make millions of public domain texts available to the public without restriction.

the judgement did not require they delete the books from their archives, only that they stop lending out digital copies of books fitting specific criteria. which should be obvious because possession is not copyright infringement, reproduction/distribution is.

in fact, the judgement specifically allows Internet Archive to continue to use those books “for the purpose of accessibility for ‘eligible persons’”

Distributed archives seem to be the way forward. It’s much harder to take something down if it’s spread across the globe and not controlled by a single entity

It’s also much harder to guarantee preservation with distributed archive. Example: torrents with 0 seeders.

That’s why you need more people and to spread the word. The more people and devices are dedicated to the archival process, the safer it is.

So that’s 5 times more overhead to guarantee the safety of the data, which means 5x the cost, because it’s not like regular people have servers with lots of memory just sitting at their homes.

That’s the price you pay to ensure archival in the face of adversity

In the end, the fight led to more than 500,000 books being removed from the Archive’s “Open Library.”

In case you wanted to know what was lost.

I’m kind of amazed that only that and the 78s archive were lost. At least there’s Project Gutenberg et al. to help with the books, and meanwhile the IA does still archive a vast load of video material, software, and no doubt other stuff.

What criteria decided those books? It must be a relatively small number of all books

How many of them are not backed up somewhere else?

Most of them are still “on” the Internet Archive.

Reminds me of a certain emulation site that was hit by Nintendo’s lawyers and removed the download links for all of their games.

Except that’s all they did. The files are still there, the game pages are still up, all that’s missing is the big shiny download button. A simple userscript can add them back and let you download the “removed” games.

Got any info on that user script? Y’know, just for educational purposes

I’m sure Anna has an archive or something

Someone should check to see how much of their library is accessible

Is that Steve Wozniak holding one of those zeroes?

Not unless he became Asian. You can click on the picture in the article to make it full screen. Not Woz.

Alright, but even full-size it does look a bit like him.

A bit, but he’s quite a bit older and heavier at this point in his life. Looks closer to what he did 25 years ago.

https://annas-archive.org/

Copyright and patent laws need to die.

They don’t need to die, they need to go back to what they used to be. The first copyright law was called the Statute of Anne and it covered a work for 14 years.

That’s a totally reasonable amount of time for an author/publisher to make their money. And it’s reasonable for creators to want to get paid for their work.

And then it should be public domain.

No, they need to die.

If you can’t protect your own ideas, you shouldn’t get to rely on the government to do it for you.

How are you proposing that people protect their own ideas?

Say you write a book. You self-publish. A big publisher CTRL-C/CTRL-Vs your book and publishes it themselves with their access to distribution networks and advertising budgets. Now you sell 0 copies of your book while the publishing house makes millions.

What should you have done differently?

Copyright laws were invented to protect creative people against publishing monopolies.

How is the publisher making money if everyone can copy and redistribute it for free themselves?

Edit: Loving the downvotes from useful idiots. Keep getting taken for a ride 👍

I can’t believe people disagree with this point I am unable to explain. My utter lack of self awareness and critical thinking skills inform me that they’re all idiots, not I!

Yuuup.

You didn’t answer my questions

To answer yours, beyond what I already laid out in the question itself, the original Night Of The Living Dead has been out of copyright for decades, and yet corporations still make money off it.

How do they make money off of it?

Do they create their own derivative work and then make money off of that because copyright laws prevent people from copying and redistributing it for free?

Edit: They didn’t have an answer because they know I’m right. They respond with insults rather than admitting they’re wrong.

This is why businesses that profit off of copyright and patent laws make so much profit, because they have no shortage of suckers and saps who don’t know any better proud to throw money at them.

But hey, at least they fit in with each other, right? 😉

People can literally do that right now and yet the music, book, etc. industries still exist

People can and will do that. Big publishing houses cannot, because of the threat of litigation.

While I don’t uncritically support one side or the other, there are provisions for protecting the small and large alike, and I think there’s no easy answer.

Everyone thinks the problem is easy to solve until a specific instance lands in their lap.

On today’s bozo braindead takes.

Good read.

It’s a bummer to see AI gobbling up entire books and often misrepresenting them but then IA can’t provide the source material.

The AI sees but you can’t see IA.


I’m glad their very ill-advised move didn’t kill the entire Internet Archive project.

If they are going to do something so reckless again, they should make it a totally new project not attached to the Internet Archive at all.

They might not get so lucky again in the future.
