
Chapter 10. Our Archive


Published on Feb 23, 2021

Who controls our access to the libraries of content developed and produced and archived over the last hundred-plus years? Who controls, or has tried to control, our search across these screens and servers for the moving pictures and sounds we are looking for? The forces of the Monsterverse are arrayed on every battlefield. We have to recognize these forces for what they are.

And we have to gird for war.

As we digitize all of our cultural heritage materials for access, we link our institutions and ourselves together online, and are in fact building one big supercomputer—futurologist Kevin Kelly has called it a “planetary electric membrane”—comparable to the individual human brain. It is an organism of collective human intelligence in the business now of processing the hundreds of thousands of full-length feature films we have made, the millions of television shows, the tens of millions of recorded songs, tens of millions of books, and billions of web pages—and looking at the world every day through camera lenses and microphones, including 3 billion phones and counting—all recording our own sounds and visions. It is a supercomputer so large that if we think of it as one connected thing, it processes some 3 million emails every second and generates so many exabytes of data each year that it consumes 5 percent of the world’s electrical energy. And what it wants is . . . more knowledge. Increasing sentience, or intelligence.

Who is writing the software that makes this contraption useful and productive? We are. When we post and tag photos, for example, we are teaching the machine to give names to images, and the thickening links between caption and picture form a “neural net” that can continue learning! The 100 billion times per day humans click on one page or another is a way of teaching the web what we think is important. Each time we forge a link between words, we teach it an idea. We may think we are merely wasting time when we surf mindlessly or blog an item, but each time we click a link we strengthen a node somewhere in the supercomputer’s mind. “Google is learning. . . . We teach it while we think it is teaching us. . . . Every search for information is itself a piece of information Google can learn from.”1

An understanding of our rights and our power needs to course through the knowledge sector. The Rebel Alliance—should we found it—must have this understanding, and a deep awareness of the pedigree of our integrity, the duration of our centuries-old struggle, at its core. While Wikipedia doesn’t formally recognize the Encyclopédie anywhere as its exquisite forerunner, the Internet Archive roots its name and founding principles—even core graphics—in the Library of Alexandria, from the Ptolemaic period, some 2,300 years ago, in Egypt, 7,400 miles away from where the Internet Archive is based in current-day San Francisco. The Internet Archive was funded initially, and still, in the main, by its founder, Brewster Kahle, from the sale of a type of search engine he created called, not incidentally, Alexa.

Today the Archive, an independent nonprofit, runs on an operating budget of approximately $10 million per year. Its goal is to provide “universal access to all knowledge.” Its primary operation is to archive the World Wide Web—not a bad ambition, given that everyone seems to be ignoring the essential mandate of collecting, keeping, and learning from the past. Today the Archive holds 330 billion web pages, 20 million books and texts, 4.5 million audio recordings (including 180,000 live concerts), 4 million videos (including 1.6 million television news programs), 3 million images, and 200,000 software programs.2 Any user anywhere can create a free account and archive her own media—it’s as close to a public resource as we can get.

Is this important? It is no accident that the etymology of the word “archive” comes from the Greek ἄρχω—“to begin, rule, govern”—and thus no accident that “archive” shares the same root as the words “monarch” and “hierarchy.”3 Archives started in the arkheion, the residence of the archons who governed, and the centrality of the power of the archive is likely to be the story of the twenty-first century.4 If this book is in part about the power of the moving image and the web, then it is at least worth noting that this—the archive as pure power—is the critical message of one of the highest-grossing moving images of all time. Set in the future (but how far?), the Na’vi people in Avatar plug into and connect with the sounds of the past—they make tsaheylu, the bond—as the primordial way for them to heal, regenerate their powers, and enlighten themselves. It seems to work. We too have to imagine a collective of knowledge institutions forging alliances to make all of their holdings available to similar seekers, and to the weak and the injured, at our own Tree of Knowledge.

The Internet Archive mirrors the Library of Alexandria’s ambitions. “Starting as early as 300 BCE,” we have been told, “the Ptolemaic kings who ruled Alexandria had the inspired idea of luring leading scholars, scientists, and poets to their city by offering them life appointments at the Museum”—located right in the center of the city—featuring a library where “most of the intellectual inheritance of Greek, Latin, Babylonian, Egyptian, and Jewish cultures had been assembled at enormous cost and carefully archived for research.” Ptolemy III (246–221 BCE), according to one historian, “is said to have sent messages to all the rulers of the known world, asking for books to copy.” As a result, Euclid developed his geometry in Alexandria; Archimedes approximated pi and laid the foundation for calculus; Eratosthenes posited that the earth was round and calculated its circumference to within 1 percent; Galen revolutionized medicine. Alexandrian astronomers postulated a heliocentric universe; geometers deduced that the length of a year was 365¼ days and proposed adding a “leap day” every fourth year; geographers speculated that it would be possible to reach India by sailing west from Spain; engineers developed hydraulics and pneumatics; anatomists first understood clearly that the brain and the nervous system were a unit, studied the function of the heart and the digestive system, and conducted experiments in nutrition. The level of achievement was staggering. And:

The Alexandrian library was not associated with a particular doctrine or philosophical school; its scope was the entire range of intellectual inquiry. It represented a global cosmopolitanism, a determination to assemble the accumulated knowledge of the whole world and to perfect and add to this knowledge.5

The concatenating links of past and present abound. The Encyclopédie launched in 1750 with a prospectus that included a map of human knowledge. It was built upon the knowledge maps of Francis Bacon, to whom Diderot and d’Alembert quite frequently alluded.6 Bacon sought to organize knowledge—memory, reason, imagination, that sort of thing—and here we go again. Google’s mission is one they put out there simply for us:

Organize the world’s information and make it universally accessible and useful.

And it’s terrifying.7 Should one private, profit-seeking company, one state, one religion, or one church have a role like this ceded to it by the people? (The answer, again, is still no.) The company’s work on its taxonomy of knowledge objects, its “knowledge graph,” is in many senses founded upon the knowledge base of structured data that Metaweb developed and ran publicly as Freebase from 2007 to 2010. Google bought that out from Danny Hillis and his colleagues, who today, at MIT and elsewhere, are trying to develop a not-for-profit alternative that may take billions of dollars to underwrite.8 At the time of that purchase, that free knowledge base was said to be able to sort and classify 70 billion facts. Today, the challenge is orders of magnitude greater.9 We have to understand how to make knowledge as easily searchable and discoverable as other products are in our society, from music and sports videos to pornography and shoes.

What we have that the Carnegie Commission did not have is the new network, some woke alliance members, and extraordinary computing power. (What they had that we do not is a sense of deep, catalyzing fear—but we will be getting that again.) Where artificial intelligence may take us is for other work in the future, but for now, what if we were to steal a page from the Monsterverse? What if, as Rebel Alliance members, we were to acknowledge our own lack of progress to date, as a field? While we noodle over the potential rights challenges involved in making our assets searchable, our lunch is being eaten. Our breakfast. Our dinner. The commercial sector is actively exploiting the growth potential for such advanced products and services. Pandora, Netflix, iTunes, and IMDb (the Internet Movie Database), among others, enable customers to experience moving images, sounds, texts, and images, and they provide thriving recommendation engines. Google uses more than two hundred signals in its search-ranking algorithm, of which PageRank is only one. Amazon’s engine resembles in many ways that of its competitor Netflix, which has been studied and even opened to the public to improve upon. But in many ways the Holy Grail for us in the knowledge and public education business would be to develop the equivalent of the music genome at the heart of a company like Pandora. Pandora’s Music Genome Project, the patent application for which is available online, assembles and searches through four hundred separate characteristics of each song and music file to determine relationships between that file and the rest of the sound corpus. These attributes are called “genes” for songs. And each of the songs in Pandora’s database—more than four hundred thousand songs from more than twenty thousand artists—has been assessed manually, requiring a minimum of twenty to thirty minutes of assessment per four minutes of music. Even the characteristics of books—that old medium—are being analyzed this way now.10

All of which leads one to ask, where is the knowledge genome? And where are we as educators and knowledge experts in developing it? To know a knowledge genome! To search for “black hole,” “tax reform,” “greenhouse gas”—and receive facts! Perhaps via . . . Wikidata!11 Facts! Truth! Knowledge! And to see the archive owned by . . . the people!

If this is the Republic of Images now, a video age, and an age also propelled by a need to expand the Commons, we will have to ensure that the moving image not be knotted up by six hundred years of the same mistakes in the contracts and agreements regulating the use and ultimately ownership of the value in these media (as book and journal contracts have been). Book contracts we discussed earlier in this volume. But zoom in on a particular piece of moving-image media—we can focus, for example, on the monumental American public broadcasting documentary about the civil rights movement, Eyes on the Prize—and the very real complexity of video’s copyright and contracts anatomy becomes apparent pretty quickly. To a civilian viewer, the documentary might be entertaining, informative, and educational. To the people involved in producing it, the film also represents myriad relationships of talent, materials, imagination, and technical experience, behind which lies a matrix of rights and responsibilities often governed by dozens of contracts and agreements involving talent, agents, lawyers, guilds, and unions, representing thousands, sometimes millions, of dollars of underwriting or investment. Rightsholders and other financial stakeholders can include producers, directors, cinematographers, cameramen, film and video editors, writers of scripts, writers of songs, writers of music, actors, singers, musicians, dancers, choreographers, narrators, and animators, as well as whole cohorts of content from music and book publishing and the film business who may have sold or otherwise licensed rights to the production—to say nothing of the dozens, sometimes hundreds, of artists, designers, engineers, consultants, and staff who are often rewarded when they help the production to complete its journey from idea to finished work. F. Scott Fitzgerald wrote of the “savage tensity” that often would be present when Hollywood studio bosses would first screen the movies they were producing: these screenings, he wrote, were “the net result of months of buying, planning, writing and rewriting, casting, constructing, lighting, rehearsing and shooting—the fruit alike of brilliant hunches or of counsels of despair, of lethargy, conspiracy and sweat.”12

The anatomy of rich media is now getting the attention it deserves. One will find, to bear out Fitzgerald, that a typical two-hour feature film can have as many as five thousand different shots, all told, edited together—and a typical feature-length documentary, which will present much more licensed content, as many as two thousand.13 Perhaps that complexity itself can best be visualized in a moving-image illustration that explores the anatomy of a media production and visualizes the sources and online uses for those sources together. The existing and potential stakeholders in the creative and economic property of a professional media production are numerous and varied; licensing experts in public media have calculated that there can be almost eighty different rightsholders for a single minute of a finished public-television documentary.14

These rightsholders include talented individuals, companies, music bands, and other groups whose work is audible and visible on the screen, and who often have business contracts with producers and distributors describing the compensation and credits they receive and the rights they have licensed to their work for specific media uses (television, radio, DVD, and online, for example) and, even in this networked world, certain delineated territories (such as North America or Japan) in which they have granted those rights. And, in the United States anyway, unions and guilds that engage in collective bargaining with networks and producers often represent them to determine the appropriate pay scales and more general equity participation on behalf of their members. Video stakeholders subject to engagement agreements include actors, singers, dancers, and producers, via the American Federation of Television and Radio Artists (AFTRA); scriptwriters, via the Writers Guild of America (WGA); directors, via the Directors Guild of America (DGA); and songwriters and lyricists, composers and arrangers, musicians and music publishers, via the American Federation of Musicians (AFM). Possibly subject to various additional collective bargaining agreements are producers, cinematographers and cameramen, film and video editors, animators, voice narrators, choreographers, artists, designers, engineers, consultants, and other staff. The collective bargaining agreements these unions and guilds have negotiated on behalf of their members, and the roles they have played and still play in protecting the rights of those members (and their own interests), profoundly affect the ways in which media has been and is being put online.

Vendors and suppliers of images, sounds, photographs, and artwork form another circle of stakeholders. These licensors often have receivables tied to the number of end users the licensee is likely to reach or the number of uses (television, home video/DVD, educational video/DVD, mobile platforms, etc.) through which the licensee’s work will be made available. These footage suppliers and archives include Getty Images and AP Images, for example. Most are commercial businesses. Some represent unique collections of classic media that can be used—under current law and standard practice—only through a license obtained from their company. A licensing director at a US public media station once sought to help a producer use a clip of the film Rebel Without a Cause in his television show, and Metro-Goldwyn-Mayer billed his company over $100,000 for the educational/public television broadcast rights to seventy seconds. Such licenses, too, often lie at the core of public-media productions. And all of these stakeholders, licensors, and beneficiaries involved in producing audiovisual media have interests that are affected when their productions enter the digital universe online—where, once posted, they can be replicated ad infinitum almost for free, anywhere. The original economic model on which most every one of these contracts was predicated goes right out the window. New publishing and production regimes that make explicit reference to the Commons may help investors and underwriters appreciate the new future they face—and the power of Wikipedia in particular as a portal to the world of ideas.

The anatomy of a video clip. Produced by Intelligent Television, Inc., 2016. Watch at:

One day, the powerful algorithms at work toward commercial objectives may be turned on education and the exploration of culture. Netflix, as much an archive as YouTube, is said to have “the ability to ‘personalize’ its interactions” with its tens of millions of customers.15

Will education and culture need to stay so far behind?

The challenge of making things free for the vast archive and the Commons is multilayered. There have been, over the past thirty years or so, multiple levels of progress—first, to put content online; next, to put it online for free; next, to put it online for free with a CC license or with another generous license; and lastly, to put it online for free with the most liberal type of license that facilitates that content’s full integration into the Commons. Passing into each of these circles has involved, as it should, some self-congratulation on the part of each licensor making progress.

When MIT OpenCourseWare first started, Creative Commons and Wikipedia and our general knowledge about how to enable sharing were not as advanced as they are today; indeed, as I’ve written elsewhere, that knowledge then was as primitive as a coelacanth.16 This became apparent some years ago when educators and producers endeavored to fit popular MIT OpenCourseWare lecture videos about Isaac Newton’s laws of physics into the appropriate articles in Wikipedia. Wikipedia editors told us we had to renegotiate the standard MIT OCW terms of service and all the relevant agreements with the MIT physics lecturer before Wikipedia would allow MIT’s video into the encyclopedia. Intelligent Television post-produced video of the lectures to fit them into Wikipedia articles, and the terms were reworked with the professor’s approval and blessing—but the rights and permissions statement published in the encyclopedia looks (as it should) more like an exception was made to include these videos and OCW in Wikipedia, rather than, as should obviously be the case, the rule.17

That OCW should exist as an artifact with some imperfection in “openness” or “free-as-in-freedom” today is no travesty. Far from it. The entire universe of digital scholarly and educational resources—from JSTOR and HathiTrust to the Khan Academy and beyond—provides invaluable knowledge and information to millions worldwide, and much of that is free to the public. Yet one cannot but wonder, at a time when so much is wrong with the world, whether a little tweak—a goose, a nudge—in the licensing requirements for the production of this knowledge could not be put into effect, so that the educational materials produced expressly as such could become fully blessed at their birth—with a licensing sacrament—by becoming fully free. Richard Stallman has noted how software called “open-source” is more often than not free, but the educational projects described as “open courseware” and “open access” are, more often than not, not.18 It might do great good to make open courseware—and any educational material licensed for it—completely free at all times, once all the underlying creations have been secured and their creators properly compensated. Without these myriad agreements, or something like them, artists and creators of content at all levels—directors, musicians, composers, actors, archivists, researchers, and publishers, to name just a few—would have no basis on which to be paid, and the work that they do, which informs, entertains, and indeed enlightens us, might all but cease. We don’t need to eradicate these kinds of contracts, just reimagine them, such that they explicitly make reference, each and every one, to the Commons where their work is destined at last to lie. Let us reimagine these agreements such that they protect both the public good and the abilities of creators, and so that they reduce the corporate overreach of our age by companies that make a grab for rights beyond their natural purview—because no one is stopping them from doing so.

Iterating toward openness in this regard would involve applying best practices to the past as well as the present and future.19 Systematically exploring how to extend the license from early and contemporary productions to encompass these fuller freedoms could be an extraordinary task to assign to a production team—and one that would benefit world knowledge forever. Reclearing past productions like this would be relatively straightforward. The Eyes on the Prize documentary series had to be recleared for continued broadcast and DVD distribution in recent years—the Ford Foundation sponsored the process—with funds that amounted to a small fraction of the original production budget but which were substantial nonetheless.20 The larger point here involves regret that the process had to be undertaken at all—and also that a second effort at clearing was more expensive to conduct years later than it would have been at the time, had the knowledge and the sense of a longer future, of our New Enlightenment, been present at the moment of production. It would have been possible, and simpler, to anticipate, and to formally recognize right at the time of the work’s creation, its ultimate entrance into the public domain. Things need to be produced with the confidence that they will enter—they must enter—the public record of civilization. Imagine a library book being pulled from the shelves for clearance reasons: that the rights to peruse the words within had expired! But that is exactly what happens to our film and television programs—and our music and sound—because of the adolescent approach we take to rights in our new Republic of Images.

Looking forward with all the knowledge we can indeed gain from hindsight, we might ask if it is more expensive to produce works from the get-go that are freely licensed and licensable. One doesn’t need extra cameras or lights or software. Only a select few line items in a production budget (legal costs, rights acquisition, insurance policies, accounting, staff) are affected by the pursuit of open licenses, and these are affected only marginally, if at all, provided the right licenses are embraced at the start and production teams and administrators are given appropriate and thorough briefings about research, clearance procedures, citations, and record keeping.21

The early history—the foundation—of cinema and radio, when screen culture was just beginning to take root a century ago, is indicative. In early cinema, media consumers in theaters multitasked endlessly, interacting with the screen, lecturers, musicians, and other audience members throughout the playing time of a picture.22 Early filmmakers treated their media as unfinished and customizable. Historians of film tell us, for example, that pioneering filmmaker D. W. Griffith’s “incessant adding and subtracting of footage implies that he saw these films as essentially open texts, capable of showing one face to Boston and another to New York. . . . By the late silent period, exhibitors could choose alternate endings for a number of major films. Some audiences, viewing Garbo as Anna Karenina in Clarence Brown’s ‘Love’ (1927), saw Anna throw herself under a train. Other theaters showed Anna happily reunited with Count Vronsky.”23 That the bulk of modern Internet usage today involves time-shifting watchers of Netflix and collecting and curating video on file-sharing networks should tell us something.24

To imagine this fairer future for media, the long historical perspective is crucial. While the challenges that Internet technology presents can seem huge to us, in fact there has always been a sense of challenge present with technological innovation. Lawyer Fred von Lohmann has noted that the Internet is one of the biggest disruptive innovations in copyright—but it is certainly not the first or only one. “People forget that broadcast radio, cable television, the VCR, the player piano—every one of those technologies created a panic among copyright owners, incumbents of the era, upon their introduction.”25 Many of the institutions that grew up in those eras—institutions that seem to have always been present—arose as a function of earlier copyright panics, and prove that we are constantly adapting.

Many people and institutions today are incentivized already to put their material online and to make that material more openly available for use and reuse. They might not have their rights houses fully in order, and they might not label what they are doing with a formal “open educational resources” or “OER” brand, but they are seeking to make—and making—their material more available. Rightsholders are all concerned, as a rule, with the same thing—clear definitions, clear rules of the road, ways for people who invest to be compensated, and avoiding surprises. Setting out the obstacles to making educational material more accessible—building a full toolkit for putting content online and then into the Commons—should be a priority for advocates and funders in the years ahead.26 Such a toolkit should include all kinds of sextants, compasses, and telescopes—especially a set of richly annotated production contracts and agreements, for example, in which the language representing barriers to making material more openly available is identified and highlighted as such, and in which boilerplate language about institutional commitments to openness is included so that it can be developed and copied. Cornell law professor James Grimmelmann launched one such model effort to annotate the Google Book Search agreement, for example.27 This is the key—a library of foundational documents, annotated for promoting free/libre access to online education.

The development and production of all of our work takes place in a crucible of experimentation. In many ways today we are traveling up and down the Rhine River valley in the earliest years of print. Princeton historian and media scholar Anthony Grafton recounts how printers would experiment with printing all kinds of things during these years—sometimes becoming so competitive as to assault each other (while printing Bibles, no less, which emerged as a cutthroat business!).28 We are still in the early years of movable type here—notwithstanding the achievements we’ve chronicled above. And the key to more achievements in this area is, predictably perhaps, more experimentation.
