This is my response to the White House RFI on Public Access.
January 10, 2012
Response to Request for Information: Public Access to Peer-Reviewed Scholarly Publications Resulting From Federally Funded Research
There are two kinds of markets for the access and analysis of peer reviewed publications emerging from federally funded research.
One is the “mental” market, or the size of the readership base. This current market for the results of scientific research is limited, artificially, to those researchers who sit inside wealthy institutions whose libraries can afford to subscribe to the majority of scientific journals. This excludes researchers at many state educational systems, community colleges, middle and high schools, state and local employees, the American taxpayer, and the American entrepreneur. By implementing a robust public access policy for federally funded research outputs, each of these groups will have access to the literature and, if the policy is crafted correctly, the right to begin creating new knowledge and experimenting with new businesses atop it.
This leads to the second kind of market – the economic one. At the moment, there is at best a sputtering startup culture built atop the scholarly literature, with a few text-mining companies here and there, mainly in the life sciences. A small number of publishing houses exploit their gatekeeper function to impose prices on elemental services like abstracting that in the consumer world would cause revolt, and the American venture capital industry invests instead in social media. The lack of robust public access to the literature – and the relentless focus on asserting and controlling copyright – means that economically it remains a content industry and not a knowledge industry. We will not see meaningful job creation in secondary markets as long as the primary secondary use of digital literature is informal file transfer via Twitter (using the #icanhaspdf hashtag).
The scientific enterprise would clearly be better served through some creative destruction. We have replicated the analog production and distribution system digitally, realizing few of the cost benefits, few of the speed benefits, and none of the innovation benefits of the transition. iTunes came out more than a decade ago. Netflix, more than 15 years ago. Content industries are disrupted by technology, and should respond with innovation, creating new jobs that are durable against outsourcing. Yet we have seen none of this in the scholarly publishing industry, which given its enviable almost-monopoly on the outputs, has little incentive in the absence of policy to make the admittedly difficult transition to a knowledge industry.
The intellectual property interests of the stakeholders must be aligned with the scientific goals of the government and taxpayers, which is easily done through the use of open copyright licenses such as those provided by Creative Commons. Open copyright licenses protect the rights of the author or legal copyright owner while providing for conditional access to the public – for example, copying and republishing may be allowed, even for commercial purposes, but attribution back to the author and original journal, including a link to a free copy of the paper, would be required and if not present the full power of copyright remedy could be brought to bear on the violator.
Open copyright licenses can also be phased in alongside an embargo in a way that both protects the economic interests of publishers and the long term public interest in access to research literature. For example, during an embargo period, no open license might be used, switching to a license like Creative Commons Attribution-Non-Commercial for a second intermediate embargo period, and then eventually decaying to a Creative Commons Attribution license that is fully compliant with community definitions of Open Access. One could easily imagine using real data about economic usage of the literature to set these times in a noncontroversial fashion, creating a truly open corpus of literature both in terms of technical access and legal rights, without an emotional argument unfounded in data or the reality of modern web-based copyright licensing.
The pros of a centralized approach to managing the public access are fairly straightforward. First, a single point of access to the research, with stable and common identifiers, radically decreases the cognitive burden to find and download the research. Second, the centralized approach raises the odds of common standards being applied to link the research to data (as we see in the vastly popular PubMed links to both internal and external data sources). And third, the centralized approach relieves the publishers of the need to perform these infrastructural functions, which should lower economic demands on the industry. However, it is important that a centralized repository be accompanied by open copyright licenses, so that additional copies of the open corpus can be maintained in libraries and research institutions, providing additional security to the preservation of the scholarly record. This mixture of a centralized resource with open licensing and standard technologies mirrors that of the internet itself, which runs on a small set of centralized resources (the domain name system, for example).
Centralization of resources also radically lowers the burdens on the researchers and their host institutions. A single interface to upload to learn, a single interface for libraries to manage, and the comfort of a persistent repository rather than the funding of local repositories at library after library, reduces the burden of compliance not just on the publisher but on the other key stakeholders in the process.
The cons of a centralized approach are also straightforward. It must be funded (and thus can be defunded in a crisis) and it takes a certain amount of control out of the hands of the publisher – but since the goal is to remove access controls, removal of control may in fact be a pro rather than a con.
To encourage interoperable search, discovery, and analysis capability (and the small business, venture-backed job creation that innovation in each of those spaces will bring) the federal government should make a commitment to clear standards in document format, metadata, structured vocabulary and taxonomy, and commit to using its procurement power to only pay for articles that carry the designated metadata. Standards building is a long and cumbersome process, and any standard that doesn’t have adoption may be worth less than the (digital) paper on which it is printed. Having a stable customer for metadata in the person of the government creates a defined and clear market for startup business to serve, and creates potential for top-line economic growth at more established publishers as well.
It is vital as well to ensure that the metadata associated with the research is itself public. While the copyright status of metadata has not been extensively tested in court, there is reason to believe (from cases involving medical procedure codes among others) that at least some metadata, especially vocabularies and ontologies, may carry copyright obligations. The federal government should authorize the use of open copyright licenses such as the Creative Commons licenses on metadata, and preferentially select vendors who use the most open of copyright licenses and tools.
While scholarly articles are the traditional focus, and should be the first order of business in a federal open access policy, book chapters and conference proceedings (and even perhaps more novel forms of communication, like blogs and wikis and social media) should be evaluated for inclusion in the policy. However, careful attention should be paid to the level of effort required to create the work, and different rules might be applied to works that require a bit less effort (a conference poster might be required to be open immediately, no embargo) compared to those that require significant effort (a book chapter might receive a longer embargo than an article).
I am a Senior Fellow at the Ewing Marion Kauffman Foundation, and a Fellow at Lybba. I’ve worked at Harvard Law School, MIT’s Computer Science and Artificial Intelligence Laboratory, the World Wide Web Consortium, the US House of Representatives, and Creative Commons. I also started a bioinformatics company called Incellico, which is now part of Selventa. I sit on the Board of Directors for Sage Bionetworks, iCommons, and 1DegreeBio, as well as the Advisory Board for Boundless Learning and Genomera. I have been creating and funding jobs since 1999.