del-fi

June 2013

2 posts

SCOTUS Plays Solomon on Gene Patents

Big news today: the Supreme Court has ruled that naturally occurring genes are not patentable.

This is a Big Fucking Deal, as Joe Biden might say. But it’s not as complete a victory as it may seem on the surface. The ruling explicitly excludes cDNA, and notes that the case didn’t address methods, applications of knowledge, synthetic DNA, or alterations of gene sequences that do occur in nature*. Lots of room to negotiate privatization there - indeed, that sentence could be the next “by means of a computer program” in patent law, though it doesn’t have to be so.

This is a ruling that resets the default to unpatentable, as Mike Eisen rightly pointed out on Twitter. That’s why it’s big. But it doesn’t close the door, at all, on lots of patents in and around DNA*. That’s why the biotech stock index is up today*.

I’m actually not nearly as interested in those parts of the ruling though. What this means for me is more that the biggest barrier to building a commons of mutations with diagnostic potential is gone: the inability of DTC sequencing companies like 23andMe to reveal their customers’ status to those customers because of patents. The companies that rely on these patents now have to move to trade secret approaches, as Myriad already has done, and the thing about trade secrets is…we can compete with them.

What the patents did was make it impossible, illegal, for us to build commons-based competition. They were enforcers on trade secrecy. Those enforcers are gone. We can now go straight to the citizen and say, get yourself genotyped, and donate your data to science. 

At Sage Bionetworks, my non-profit employer, we’ve built a system that allows precisely that. It’s called Portable Consent, and we’ve got a study called the Self-Contributed Cohort for Common Genomics Research. You can enroll and donate your data in less than ten minutes, start to finish.

So go get yourself genotyped. Download your data. And donate it to science. Let’s stop fighting companies that privatize, and start competing with them.


Note: edited post at 12:15PM EST for clarification of a few points. Those sentences carry asterisks at the end.

Jun 13, 2013 (6 notes)

Speaking Notes - Health Datapalooza

My notes for the panel today at Health Datapalooza. I’ll come back later and add links, fix typos, and so forth…

Why consumer access to data is important

- because data is a digital representation of us. and it’s increasingly being used to affect our real-world services, their costs, their benefits.
- we’re increasingly able to generate data that used to be the exclusive space of the clinical system. genomes are just part of it. health data is in everything. Facebook cups, keyboards, iPhones, google’s next phone.
- but it’s faulty at worst, and incomplete at best. if we can’t access it, we can’t tell how and where it’s incomplete or wrong.


Use cases and potential applications for consumer access to data

- send to app provider to interpret data and help me make better choices about my care.
- send to app provider to interpret data and help me make better lifestyle choices (ideally not a panopticon of health provider, ATT, and government - but a decentralized, competitive market)

that’s what this event is mainly concerned with. but there’s more.

- extract the “real” clinical data (compared to the “dry” data from phones etc) from the record. needs a lot of normalization etc but it starts to paint a longitudinal phenotype of me and my life. mapped to my genome, in large sample sizes, we can start to correlate lifestyle and medical treatment outcomes to individual genomic variation.

Current state of consumer access to data

- raw in every sense of the word. most folks aren’t exactly aware of the things we here are aware of.
- most of the data collection is happening in zones like mobile and social where, unlike health, we have no positive rights to privacy, and where the entire system depends on designing awareness of the data (and its ownership) out of the hands of us as citizens.
- in health, access to data is hamstrung by a combination of rapidly advancing technology and slowly (that’s a nice way of putting it) adapting law.
- we just moved here from Oakland. when i wanted my son’s immunization records to get him into his new child care center here in DC, the provider couldn’t fax them to me because they were afraid it violated HIPAA. but it would have been legal to hire a TaskRabbit - total stranger - to go pick them up, take them to Kinkos, and fax them to me.
- that’s insane. we’re not being protected. we’re not getting access at the rate or in the form that we need yet. it’s getting better, but it’s still slow. and we’re less willing to tolerate it, because we’ve been trained to expect more from our institutions by good technology.

Where we have to go next / role of Blue Button+

- the hard part is that most of us, when handed a file of data, don’t know what to do with it. i downloaded my genotype, my fitbit data. No idea what to do with it. compared to my “medical record” in PDF, which is computationally useless but human readable.
- the investment in BB will pay off best when I have the right to direct my file to someone who can do something with it, for whatever reasons i choose. whether to run an app that makes sense of the data or to donate the data to research.
- BB+ is a great example of this. It allows for both market solutions (banks!) to emerge and for pre-competitive or public-private solutions to emerge where we can donate data, or share it conditionally.
- i’m looking forward to pushing Sage Bionetworks, the non-profit where I work, to be one of the first certified recipients of BB+ precisely to enable the non-market reuse of health records data.

Jun 3, 2013

May 2013

1 post

Officially Entering "And Then They Fight" Phase of Open Access

And, my friends, in this story you have a history of this entire movement. First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you. 

Although the substance of the above quote is usually attributed to Gandhi, there’s no record that he actually said it. The quote above is by Nicholas Klein, a labor activist, from 1918.

I don’t include it as an example of attribution decay. I use it as a frame for where we are in the open access world right now.

We’ve had a good run. We got the NIH public access mandate. We got the petition to 25,000 signatures. We got the presidential directive extending the NIH policy across the entire federal government. We got multiple examples of open access publishers into sustainable revenue models.

But changing the default from closed to open was always going to involve a phase where those whose revenue models depend on closed really brought the guns out against us. And we’re there now.

There’s Wiley, wallowing in the mud and smearing Public Library of Science’s peer review credentials under the charade of a survey of authors.

There’s Elsevier, proposing a novel license for STM publishers and somehow magically being part of a Netaction “bad legislation” coalition that attacks all open access bills, while denying any knowledge of it (no story coverage, but some conversations on Alicia Wise’s twitter feed). (UPDATE May 24 2013: Times Higher Education UK has a story in which Elsevier sort of distances themselves from Netaction)

There’s crocodile tears covering the emergence of scammy open access journals, none of which mentions the long-time existence of scammy closed access journals. This is not surprising, as so many of the large, “authoritative and important” publishers make money by publishing scammy journals - anyone remember the Merck-Elsevier scammy bone journal? Still waiting to see someone mention that in the same breath.

Then there’s the systemic disadvantage we have as advocates once policies move into implementation phase. The meetings last week at the National Academies are a great example of why it’s so hard to change the system. I had to travel 3 of the 4 days for work, and the fourth day I was in meetings all day that made it impossible for me to attend, or to speak.

We have day jobs, us advocates. But the publishing industry we’re fighting against has no other job. They can hire people who have only the responsibility of making sure the open policies are implemented in the least open way. They can saturate every meeting in DC with hired guns, and claim it as evidence that the public supports them.

But it’s not about being depressed, or complaining. It’s a sign that we’re finally getting close to the bone. We’re enough of a threat not to be ignored, or ridiculed. We’re gonna get hit, and we’re gonna get hit hard.

We have to keep reminding the world that this isn’t about protecting a dinosaur business model, this fight. It’s not about scammy journals, which exist no matter how they get paid for. It’s not about who has the most lobbying money in DC. It’s not about new licenses, or sleazy survey language.

It’s about letting entrepreneurs build businesses on top of open content. It’s about kids building cancer tests on open content. It’s about you and me being able to read what our tax dollars paid for. Don’t let the FUD and mudslinging get in the way of that message, ever.

We have to keep getting up. We have to keep fighting back. Because in the end, we’re on the right side of history. And once we get through this phase we get to the good part, where they build a monument to Heather Joseph and Peter Suber and Mike Eisen and all the heroes of open access.

May 20, 2013 (4 notes)

April 2013

2 posts

Business Strategy and Openness

I got a lot of good responses to my post yesterday about Mendeley’s acquisition by Elsevier.

But a theme emerged that I didn’t intend, which was the idea that by pegging Mendeley’s investment in open access as a customer acquisition strategy, I was calling it insincere.

That was most definitely not the point.

Remember that half the time I’m a senior fellow in entrepreneurship at the Kauffman Foundation. We study startups and support entrepreneurs. I can tell you that using openness as a customer acquisition strategy is something I think is a smart move, especially the way that Mendeley did it.

As a quick reminder for those who’ve never started a company: companies need to acquire customers or they go out of business (as mine did, because mine didn’t). If you don’t have a strategy to do that, you’ll fail.

By selecting open, and fully committing to it, as a strategy, Mendeley helped the Open Access world enormously over the past four years. What they’ve done creates a solid track record in OA, a serious and provable one. They didn’t do this for evil reasons, or in anything close to an insincere way. And by doing it they advanced the movement. We can, and should, thank them. That’s why the anger stunned me, as I mentioned yesterday.

They hired a community manager (William Gunn) personally and professionally dedicated to OA, and let him run. I know, respect, and believe deeply in William and his commitment. That’s not an insincere company move.

Their data’s under CC-BY. That’s not insincere. I made a crack about allowing the database to be downloaded and reposted, and immediately heard back from William that I could go for it. That’s not insincere.

They ran the binary battle with PLoS (I was a judge!). That’s not insincere.

They backed the Access2Research petition, immediately. That’s not insincere.

My point wasn’t to gleefully give Mendeley the finger and say, you were closed all along, or to say, you were just using us to get customers. Every startup needs customers, and if you don’t realize that you’re kind of being a dick to the people in the startup. Mendeley’s making a meaningful commitment to open as a customer acquisition strategy was innovative, and important, and advanced the cause of open access to the scholarly literature.

My point was deeper. It was that because the openness was not tied to revenue, it could be removed now that the product didn’t need an innovative customer acquisition strategy. It’s tied to a massive customer base now. Open can be discarded. When it’s part of the revenue model, it can’t be discarded nearly as easily.

And in my experience, if open *can* be discarded, it usually *is* discarded when monetization becomes the priority. I hope that I’m wrong.

Apr 13, 2013 (2 notes)

Lessons from Mendeley: Where's The Open In The Model?

So Mendeley got bought by Elsevier. And there was much teeth-gnashing. I won’t link to it but it spawned two solid hashtags: #mendelsevier and #mendelete.

I have a Mendeley account, but have never used it other than to test the system against Zotero, which is what I use to track my own work. So I am not affected by this, but I’ve been a bit stunned by the depth of the anger against Mendeley. I’ve waited to write this to try and understand it.

Part of it, I assume, is just Elsevier rage. Danah Boyd has summarized why Elsevier is rage-worthy nicely in her post on the acquisition.

But the greater part feels like anger over what many seem to think is a broken promise made by Mendeley to be an “open” company.

I don’t feel that way. I never thought Mendeley was an open company. I thought they were deploying a strategic approach to openness by exposing their data under CC-BY, but I always thought that openness wasn’t the point of the company. It’s why I didn’t use the product and why I wasn’t surprised, shocked, or saddened by the acquisition.

It’s got me thinking though about companies and “open” - and what matters in deciding whether or not to use a product from a company claiming that mantle. For me it boils down to where the openness lives in a company.

There’s a lot of ways to slice this, but a simple one would be: is the “open” part of the revenue model or is it part of the market acquisition strategy? If the former, like BioMed Central, I have a lot more faith that an acquisition will not destroy the openness, because “open” is part of the way that the company makes money.

But the open access part of Mendeley to me always appeared to be a customer acquisition strategy. It appealed to the OA folks, it appealed to developers, and it never affected any monetization or revenues. There were always visible choices by management to hedge their open bets, as Jason Hoyt has laid out. And that makes it a risky bet to think they’ll stick with it now that they have access to a massively larger customer base while inside a company with traditional antipathy towards openness.

Again, I’m not mad. I either avoid, from a professional basis, companies built on closure, or I mitigate my expectations of them and do a lot of backing up. Because at some point unless the revenues come from open, the customer acquisition strategy of openness will be deprecated. If it isn’t, then the management of the company will be replaced with managers who are willing to shut things down to make money.

Always, always, always examine claims made by companies about openness. I’m not the world’s biggest Evgeny Morozov supporter, but he’s right to examine the way that the words “open” and “sharing” and “free” get co-opted. Facebook lets you share! It’s free, and always will be!

Look the gift horse in the mouth. And if the revenue model of a startup isn’t built on open, then feel free to use the tool. But don’t get emotionally invested, or dependent, no matter how seductive the rhetoric may be. Because at some point your use and attention and content will be monetized, probably in a way that bothers you.

Apr 12, 2013 (9 notes)

March 2013

4 posts

A Fool's Errand, Annotated

So, I have a commentary published in Nature this month about the importance of using a CC-BY license to achieve full open access. I requested the article be made freely available as part of my agreement with Nature, but they paywalled it anyway. It’s freely available now but not until after some embarrassing email and twitter hassling.

I am not particularly mad at any of the parties involved. It just points out the power of the default switch being closed, and how hard it is, even when you’ve negotiated an agreement, to flip it to open. It points out the weakness of the author in negotiation with the journal. Maybe I’m the fool in the fool’s errand.

Also, in the search for brevity that print journals enforce, I didn’t get to be as granular as I wanted. My quarrel is with the publishing industry’s attempt to write a new license. I have no wish to lump OA advocates who sincerely dislike CC BY, like Heather Morrison or many in the humanities, and with whom I have only a philosophical disagreement, into the same pool.

Anyway. Below are references for key points I make in the commentary.

1. Re: definitions of OA, see “Budapest Open Access Initiative” at http://www.opensocietyfoundations.org/openaccess, accessed 03/13/13

2. Re: restrictions in licensing, see Elsevier’s published contract with the California Digital Library: “Schedule 1.2(a) General Terms and Conditions ‘RESTRICTIONS ON USAGE OF THE LICENSED PRODUCTS/INTELLECTUAL PROPERTY RIGHTS’ GTC1] ‘Subscriber shall not use spider or web-crawling or other software programs, routines, robots or other mechanized devices to continuously and automatically search and index any content accessed online under this Agreement.’” online at http://www.google.com/url?q=http://orpheus-1.ucsd.edu/acq/license/cdlelsevier2004.pdf&usd=2&usg=ALhdy2_FmzOtI3JkKs-fJwirgig4WLA5fA, accessed 03/13/13

3. Re: “CC Plus,” see various comments in lectures at “FACT Seminar No. 1:  Licensing in an Open Access Environment: legal niceties, funder mandates and publishing challenges“ at http://www.stm-assoc.org/events/fact-seminar-no-1/?presentations, accessed 3/13/13

4. Re: CC BY, See “Creative Commons Attribution 3.0 Unported” (the “commons deed” with links to complete underlying license) at http://creativecommons.org/licenses/by/3.0/, accessed 3/13/13

5. Re: 70+ requirements, see “Public Policy Requirements, Objectives and Appropriation Mandates” at  http://grants.nih.gov/grants/policy/nihgps_2010/nihgps_ch4.htm, accessed 3/13/13

6. Re: community definitions of open things, see “Open Knowledge Definition” at http://opendefinition.org, accessed 03/13/13, and “Open Source Definition (Annotated)” at http://opensource.org/osd-annotated, accessed 3/13/13

7. Re: license incompatibility, see “GPL-Incompatible Free Software Licenses” at   http://www.gnu.org/licenses/license-list.html#GPLIncompatibleLicenses

8. Re: decomposition of licensed elements and the CC-licensed Time Photo of the Year, see “Trapped Underground,” a CC-BY photograph of the London Bombing aftermath, available at http://en.wikipedia.org/wiki/File:Trapped_underground.jpg, accessed 3/13/13

9. Re: technical solutions to provenance, see “Source Attribution in RDF,” http://www.w3.org/2001/12/attributions/ accessed 3/13/13

Mar 27, 2013 (5 notes)

A Natural Study of Openly Licensed Books

One of my favorite users of CC-BY is Pratham Books in India.

Pratham Books is a non-profit trust that publishes high quality books for children at affordable prices and in multiple Indian languages. They’ve shipped more than 7,000,000 books.

They’ve got a beautiful post about their move from the Attribution-NonCommercial-ShareAlike (BY-NC-SA) CC license to the Attribution (BY) only license and the internal back story of the decision. What’s really fascinating isn’t just that they relicensed 400 books under BY, but that they only managed to post 173 of those books online at Scribd. The other 227 books were not posted. So we have a nice, natural study to analyze of the differences between openly licensed content that is in a stable, well used platform and content that isn’t.

Some brief takeaways. Read their whole post to see the graphs.

  1. The sales of individual books available on Scribd don’t differ greatly from those of books that aren’t - and they’re definitely not significantly lower.
  2. Cumulatively, the Pratham books on Scribd appear to outsell those that aren’t, though not totally outside the error margins. But again, their sales are not lower.
  3. For cumulative sales data for CC books that were available on Scribd vs. CC books that were not available on Scribd, the former outsold the latter in almost a 3:1 ratio.

Let’s unpack point #3, because it’s fascinating. These are books that are available online, under the most liberal license offered by Creative Commons. You can one-click download and print them, or even send them off to be reprinted and sell them yourself. Yet the ones that are there radically outsell those (also liberally licensed books) that are in a more controlled technical environment.

There’s a lot of money spent looking for ways to sell books and content on the web, to protect authors, to protect old revenue sources. Sometimes though, the best advertising for the content (and implicitly the author) is the content itself. And there is mounting evidence that at least some people will pay for the authentic version, whether it’s to get the physical artifact, to help the author, or simply because they want to.

This is a tiny data point. What’d be great is if as these experiments happen, more publishers started to release the data as to the outcomes. We live in a world where we can’t even get a publisher to tell us the breakdowns in their revenues, how much they make off subscriptions and new content versus access to the back catalog. Data about where the money and the sales actually happen will drive far better policy than we have. If only the publishers would get that and start opening up some channels.

Mar 14, 2013 (1 note)

Solutionism and Sensorism

Evgeny Morozov’s been on a tear lately, with articles about “solutionism” in Slate and the New York Times, and an absolute destruction of Gavin Newsom’s new book on networked politics.

I’m torn by Evgeny’s work. I tend to agree with what I perceive is one of his basic ideas: our culture of deifying the application has a nasty side effect of reducing the perceived importance of political change: don’t bother actually risking anything to protest, just like the Facebook page about the protest. I really liked the concept about the rise of the choosing algorithm and its impact on creativity.  I have a soft spot for a truly well written negative book review. And I think it’s vital that those of us who self-identify as “open” advocates engage with his work, because I think he’s onto something in more cases than I’d like to admit.

On a related note, I’ve been reading The Theory That Would Not Die recently. It’s a nice look at the waxing and waning of probability and inference over hundreds of years. When we’re in periods where we don’t know what we’re looking for - and it’s important to be as right as possible as fast as possible - then Bayes’ table becomes a vital tool in the kit. In World War II, that was cracking Enigma.

Unfortunately now, most of its usage is social, mobile, commercial. It offers me rehab in Napa Valley when I buy wine online.

But I worry that in burying the algorithm and the application and the game, to kill its overestimation in culture and in politics, he fails to appreciate the places where those tools are yet to be applied, but might bring real change. By real change, I mean epistemic change, a change in the way that we know we know something.

Science is one of those places. Science has a problem related to, but different than, solutionism. I’d call it “sensorism” perhaps - the belief that because we can make a machine that senses things more finely, more completely, and in massive parallel, we’ll somehow come to a greater understanding of the underlying thing being studied.

Sensorism is rife in the sciences. Pick a data generation task that used to be human centric and odds are someone is trying to automate and parallelize it (often via solutionism, oddly - there’s an app to generate that data). What’s missing is the epistemic transformation that makes the data emerging from sensors actually useful to make a scientific conclusion - or a policy decision supposedly based on a scientific consensus.

One of the reasons I do “open” work is that I think, in the sciences, it’s a philosophical approach that is more likely to lead to that epistemic transformation. If we have more data available about a scientific problem like climate change, or cancer, then the odds of the algorithms figuring something out that is “true” but incomprehensible to us humans go up. Sam Arbesman has written about this nicely both in his book The Half-Life of Facts and in another recent Slate article.

I work for “open” not because “open” solves a specific scientific problem, but because it increases the overall probability of success in sensorism-driven science. Even if the odds of success themselves don’t change, increasing the sample size of attempts will increase the net number of successes. I have philosophical reasons for liking open as well, and those clearly cause me cognitive bias on the topic, but I deeply believe that the greatest value in open science is precisely the increased sample size of those looking.
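To make that sample-size point concrete, here is a minimal sketch of the arithmetic. It assumes every attempt is independent and has the same fixed odds of success, which real science obviously doesn’t guarantee, and the numbers (10 versus 1,000 people looking, 1% odds each) are made up purely for illustration.

```python
def expected_successes(attempts: int, p_success: float) -> float:
    """Expected number of successes when each attempt has the same odds."""
    return attempts * p_success


def prob_at_least_one(attempts: int, p_success: float) -> float:
    """Chance that at least one of the independent attempts succeeds."""
    return 1.0 - (1.0 - p_success) ** attempts


# Hypothetical numbers: the per-attempt odds stay the same in both worlds,
# only the number of people able to look at the data changes.
P_SUCCESS = 0.01
CLOSED_ATTEMPTS = 10     # assumed: only a few groups can access the data
OPEN_ATTEMPTS = 1000     # assumed: the data is open, so many more can try

for label, attempts in (("closed", CLOSED_ATTEMPTS), ("open", OPEN_ATTEMPTS)):
    print(
        label,
        "expected successes:", expected_successes(attempts, P_SUCCESS),
        "chance of at least one:", round(prob_at_least_one(attempts, P_SUCCESS), 4),
    )
```

The per-attempt odds never change in this sketch. The only thing that moves is how many attempts get made, and that alone raises the net number of successes.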

I also tend to think there’s a truly, deeply political element to enabling access to knowledge and science. I don’t think it’s openwashing (and you should read this paper recommended by Morozov on the topic) to say that letting individuals read science can have a real political impact. 

I look forward to reading Morozov’s new book this weekend at SXSW. It’ll be a bracing antidote to what I expect will be rampant solutionism. My panel is political - it’s about who owns the data these solutions create, and what rights and risks those solutions entail. It’ll get a lot fewer attendees than a product launch.

And I hope Morozov engages with sensorism one of these days. It’d be fun to see what he does with it.

Mar 7, 2013 (26 notes)

Nature Haz Open Access

Wednesday featured a double-barreled shot by Nature Publishing Group across the bow of the Open Access conversation. First, an editorial unambiguously endorsing “gold” open access as the proper form of scholarly publishing. Second, a majority investment in Frontiers, a for-profit CC-BY scholarly publisher.

I’m personally just as happy to see this as I was to see the OSTP policy expanding public access across all federal research. Let’s just say some of my friends were surprised at that.

image

Sorry, but I liked it.

I have enjoyed watching concern trolls wading into the debate warning that Nature’s only doing this for their corporate masters. Well, duh. That’s what for-profit companies do.

And for-profit companies choosing to back open systems has a history of working. The open source software movement really exploded when IBM made a full-throated choice to switch to open source. They didn’t do it because they suddenly became devotees of Saint IGNUcius.

image

Image, ironically, copyright 2002 by Julian Cash

They did it because the corporate masters figured out they were losing in the changing marketplace and they could make more money on services by embracing open, and that it could be a weapon against their competitors.

Turns out they were right, by the way. IBM made $1B on its portfolio of patents in 2012, but it made $20B on global business services in 2011. That 20:1 ratio feels about right to me in terms of value.

If Nature did this because they think they can make money, bully for them. What I care about is that a second major for-profit publisher has made an unambiguous business choice to back open copyrights on scholarly literature (Springer being the first when they acquired BioMed Central).

Publishers realizing that making scholarly research truly open is a good business decision further tilts the field towards making scholarly research truly open. It’s depressing but not surprising that I have to point out this tautology. And if it means we have to fight them in the halls of government on price, fine.

My personal belief is that if these publishers misprice the cost of libre open access in article processing charges, the market will fix that, and startups like PeerJ will win the day. Scientists are price sensitive.

Rather than insist on one open access regime to rule them all, I think we instead need to cultivate our garden, as Candide learned.

We don’t live in the best of all possible worlds. We live in a world of diverse scholarly disciplines with diverse needs, with competing interests in publishing markets and societies and universities. I think we can advance the whole movement fastest by embracing the “yes, and” approach (HT @DCDave).

Open publishing and self archiving will each solve certain problems at certain times. And each will suffer either real or perceived exploitation by companies new and old.

And that’s ok. That’s part of growth. That’s part of the garden growing. 

Mar 1, 2013 (1 note)

February 2013

2 posts

Reflecting on Public Access

Now that I’ve had some time to process the new White House public access policy, here’s a few thoughts.

First and foremost, I think the policy is worth celebrating. This is something on which there is a difference of opinion. Mike Eisen is the most vocal and eloquent critic of the policy, and his post on the topic is essential reading whether you like his position or not.

I disagree with Mike on the conclusion, but it’s worth examining why. And it’s not worth attacking him - indeed, attacking each other is a feature of the OA movement that sickens me.

But I think his position comes more from a worldview where there is indeed real movement towards true OA (i.e., no embargo, open copyright licenses, and a total change of the publication industry via startups both non-profit and for-profit). Compared to that progress, the policy is like going back in time to when the NIH had no policy.

But here’s the thing. That progress simply doesn’t exist in most of the other spaces where the US Government invests in research. It doesn’t exist in agriculture (100,000 papers per year which will be key to food supply and debunking bullshit claims about GMO food, for example), or defense, or trade, or patents, or space science, or energy.

I wrote a paper with my dad on open access and energy policy (my dad is a climate change adaptation scientist with a piece of the IPCC Nobel Prize) that opened my eyes to just how little of the energy literature is OA. If you look at the links in an IPCC report you’ll be shocked at how few you can read. None of the green, or gold, OA conversation has had a significant impact there at all.

Compared to the world that exists there, a world that is not being wrenched open by change and entrepreneurs and repositories, this policy is indeed enormous progress.

I don’t like the embargo entrenchment at 12 months. I don’t like the lack of reuse rights. I don’t like allowing linking into journal archives as opposed to centralized repositories. I don’t like the praise of the dying traditional publishing industry, but that’s pablum. 

So let’s be frank. This isn’t a strong open access policy. But that’s ok by me.

Because it’s an enormous expansion of public access into spheres of research where it was essentially absent from the conversation. It creates a policy environment that tilts the field towards change, towards startups, towards publishers whose embrace of real Open Access becomes a competitive advantage over time - and it does so across an enormous swath of science.

It re-creates in other sciences the conditions built in the life sciences years ago. I believe that’s what will advance open access fastest there. But let’s also not attack Mike, or others, who disagree. We need to be constantly reminded to strive for the most, and attacking one’s own critics is not a healthy sign of an open movement.

This policy is not the end of our work in open access, or even the beginning of the end. But I do believe it’s at last the end of the beginning of the enormous change from closed publication of science to open publication of science. From now on it’s about how we implement access, not if.

Hats off to everyone, and now get back to work, because we’re a long way from done. Time to focus our energy on Congress to finish the job.

Feb 27, 2013 · 3 notes
White House Public Access Policy Is Out

The White House has published a response to our We the People petition, commonly known as the OA Monday petition.

I’ll have a longer post up once I’ve had time to thoroughly review the policy, but the multi-year campaign to turn the NIH public access policy into something that applied to the entire federal research system is over.

I’d like to thank the White House, especially the Office of Science and Technology Policy, for listening to us through the various RFIs and then finally to the petition. This is good policy. And it’s nice to see something substantive emerge from the We the People platform, too.

To the squeals of the publishers, I say only this. Jobs will be created by open content. I personally built a company years ago on the open data published by the NIH. I’m looking forward to watching others build companies on the open literature that will emerge under this policy. And the fight’s not over - it won’t be until we have reuse rights, not just free downloads.

But for now, let’s dance. We won.

Feb 22, 2013 · 1 note

January 2013

4 posts

The Strange State Politics Of Arabic OER

Yesterday, U.S. Secretary of State Hillary Clinton announced the Open Book Project, a major initiative to make Open Educational Resources available in Arabic. There was a big launch program at the State Department, and some remarkable words from Mrs. Clinton:

You can look around the world and see young adults in remote villages and towns huddling around a computer watching videotaped physics lessons by MIT professors. Top universities like Rice University are creating free online textbooks and saving students money in their studies. Science education websites like Khan Academy go viral. There are other examples, and these are all fruits of technological progress, but also of a commitment to make more learning materials open – free, open licensing for anyone to use, adapt, and share.

(from the text of Mrs. Clinton’s remarks - emphasis at the end added by me)

I was one of many people excited to see these kinds of words. I’m always glad to see this kind of phrasing about copyright in the halls of government, because it indicates at least some understanding of the potential of open copyright licenses to create innovation.

But then I started thinking more about it, and the uncritical acceptance of the project without a conversation about the politics of it started to bother me. I still think it’s a worthy and wonderful project.

But let’s unpack what we’re doing here, as a government.

It seems to me that we are applying state power to create digital knowledge resources that (probably, maybe, definitely) subvert the attempts of both state and non-state (social, religious) actors to control the kind of educational materials available in the Arabic speaking world. That we are practicing, as statecraft, what Nils Gilman has called deviant globalization, using an open source approach.

And that’s worth contemplating. I do not think that this is an embrace of open copyright because of a philosophical shift in favor of openness. I think it’s a pragmatic piece of realpolitik - getting textbooks into a language and format where they can evade the controls imposed by actors whose interests are felt to be contrary to the US.

The theory that more knowledge in more hands is better for the world is one I happen to subscribe to, and agree with, but let’s not pretend this isn’t an exercise of state power. It is far more like the Voice of America (especially at its founding, when it was not required to provide “unbiased” news) than it is like Free Software.


Note: I edited this post at the end with the bold text in the final sentence at the suggestion of Dave Clifford after it was first posted.

Jan 29, 2013 · 1 note
Emperor's New Short Tandem Repeats

Yaniv Erlich’s lab at MIT has a new paper out in Science today, with a companion policy piece from the National Human Genome Research Institute at NIH. Apologies for icky paywalls but these are important papers.

The gist is that a savvy computational scientist can find enough breadcrumbs in a genome to figure out the surnames for participants in supposedly de-identified studies. The methods for the paper are reminiscent of the re-identification approaches against the Netflix database, the AOL search database, and the Massachusetts health records database: cross-referencing de-identified information with other public information, and then using that against other records associated with surnames.

And it gets bigger - 135,000 or so records may potentially identify millions more through inheritance of short tandem repeats. I’ve probably mangled the approach. The short version is perhaps better summed up by a picture:

The dude in red is mathematics. The dude in white is your anonymity.

These two papers force us to be honest in talking about genomic information and identifiability: the basics to re-identify significant portions of people from their genomes alone are already in place. And they’re already strongly able to lead to surnames, long before we hit the mythical $100 genome.

So what does honesty in this space mean? It means we shouldn’t promise people that we can both de-identify data and make it useful. It means that we should also celebrate the benefits that de-identification brings, and think of it in a risk-reward context for those joining studies that involve genome publication online. It doesn’t mean stop sharing, stop sequencing. It means stop pretending the methods for de-identification work very well. A lot of people will go away anyway, but a lot will share.

Both papers reject the idea of ceasing data sharing as a result of the research, which is heartening. We are in a world where we are simply less anonymous than we used to be, than we’ve ever been. There are enough unique things about us all, and enough devices capturing them, and powerful enough algorithms, that this stuff is simply doable now.

We need to develop a whole spectrum of ways to manage privacy. My own work on consent is just a piece of a tapestry, for those who really want to donate, to share, to be exposed. Hopefully this opens up more space at the table for the new approaches that are bubbling in health privacy management. We need data markets. Data banks (you know, like old-school community banks). Data conservation trusts (like land trusts - I’m going to publish something soon on this topic). We need entrepreneurs to fill the gaps between all open and all closed, to provide products that make someone’s data alive to her, not just to a gearhead with a taste for naive Bayesian inference.

Pandora’s box didn’t open with this paper. It’s been sitting open for quite a while now, just waiting for the right eyes to see it.

Jan 17, 2013 · 2 notes
Inspired by Aaron?

If you’re new to the idea of open access and public access, brought here by the sad energy of Aaron Swartz’s suicide, I urge you to try to be inspired by him and learn about the space. He was a voracious reader, an insatiable scholar.

Follow his example. Do your research, read the overviews, follow the links. Learn is the first part of his four-part method. This is a well developed field, and though we need your energy, we need it very much in a learned and focused manner.

There is already legislation. There are already organizations, advocates, resources. Get involved.

Learn. Then Try. Then Gab. Then Build.

But always, always, start by learning.

Jan 14, 2013 · 2 notes
It's Time For An Answer to #OAMonday

On May 20, 2012, I joined Mike Carroll, Mike Rossner, and Heather Joseph to start a campaign for a federal policy to require free access to scholarly articles emerging from taxpayer funded research. We decided to launch a petition on the We the People website.

We hit the 25,000 signature number pretty quickly, and I got a few calls from DC people asking what the hell was going on. This was before every maniac who wanted to secede started a petition - it was, and still is, a backwater, and we figured we could make some noise there. It worked.

Since then, petitions on beer recipes, gun control, and the death star have received official responses from the White House.

We have not.

I am told there is a conversation going on. I am asked to be patient. I am tired of being patient. I’m tired of the power of publisher money carrying the day, delaying the policy, blocking the flow of knowledge.

It’s time to make taxpayer funds turn into taxpayer goods. It’s up to the businesses to figure out how to make money in the new world - if you can’t figure out how to adapt your business to the network, you will merely be the latest in a long line of dinosaurs dating back to Digital Equipment Corp and running straight through to Newsweek. I do not cry for you.

It’s time to bring an end to the barriers to new businesses. Time to enable new entrepreneurs, who make money from increasing the free flow of information rather than by restricting it. Time for search engines that make it easy to find and integrate the literature. Time to enable people outside the institutional system who might take scholarly knowledge and turn it into products, into policies, into something of use.

It’s time for an answer. And if we don’t get the answer that says, “you paid for it, it’s yours,” it’s going to be time to escalate this from something on a sleepy backwater petition site to a protest that the whole world will notice.

Jan 14, 2013 · 2 notes

December 2012

1 post

Gone Fishin

Have a lot of work to do. Then a little bit of vacation to do. Be back here in the new year.

Dec 7, 2012
Patient Control - Good for Open Data

Olin Hyde noted a nifty survey on patient control of electronic health record information the other day. But he interpreted it in a dour way, at least for my own worldview of openness. Here’s his tweet:

Bad news for #opendata advocated by @wilbanks “@berniemd31: Study: Patients Want To Control #EHR Information - bit.ly/ToOglB”

— Olin Hyde (@olinhyde) November 29, 2012

The article talks about how patients want control over the privacy of their data, so I can see why this is an easy interpretation to make. You’re a patient, someone tells you how much crap is happening with your data, and your gut reaction is to protect it. Makes sense.

But actually I believe it’s the opposite conclusion. I think this study virtually guarantees an open data commons built of health records. 

Because, you see, the very thing that we need to build a health data commons is patient control of health data.

Right now, patients don’t control. Large institutions do. Large companies do. Large hospitals do. So we can’t control our privacy, whether to keep something private forever, or to donate it to science.

But if we get the control that this study says we want, then at least some of us will make that donation. And it doesn’t take very many people making the choice to contribute to create a glorious resource.

Take Wikipedia. It’s remarkable how few people, as a function of total Wikipedia users, actually make Wikipedia. From the Quora page on Wikipedia contributions:

According to Wikimedia’s estimates, the larger Wikipedias (e.g. English, German, French) have 0.02-0.03% of visitors actively contribute. 

Wikipedia has roughly 100,000 active monthly contributors, with about 40,000 of those on the English version. It looks like about 10% of users (~4000 for English) are very active (defined as 100+ edits per month).

If you don’t want to log into Quora, you can see the graphs on which these conclusions are based (Core Editors Are Small and WM Articles V Contributions). But .03%? That’s tiny. I bet we can get .03% of all patients to donate their health records to research. Hell, Buzzfeed is built on the bet that we could get .03% of the population to watch a GIF of a cat chasing a laser.

At the current population of the United States (311,591,917), .03% nets us 93,477 people. That’s a massive clinical research cohort - more than 10x bigger than Framingham. My instinct is that we’ll do better than that, but even if we don’t, that’s a big enough number to change the game of mathematical health modeling.
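The arithmetic is easy to check. Here’s a rough sketch in Python, using the population and the .03% rate quoted above. The Framingham figure (5,209 participants in the original cohort) is my assumption and isn’t from the sources quoted in this post, and the function and variable names are just illustrative.

```python
# Back-of-the-envelope check of the cohort size discussed above.
US_POPULATION = 311_591_917         # population figure quoted in the post
DONOR_RATE_PER_10K = 3              # 0.03% expressed as 3 people per 10,000
FRAMINGHAM_ORIGINAL_COHORT = 5_209  # assumption: original Framingham cohort size, not from the post


def cohort_size(population: int, rate_per_10k: int) -> int:
    """Whole number of people who donate at the given rate (integer math, rounds down)."""
    return population * rate_per_10k // 10_000


donors = cohort_size(US_POPULATION, DONOR_RATE_PER_10K)
ratio = donors / FRAMINGHAM_ORIGINAL_COHORT

print(f"{donors:,} potential donors")
print(f"roughly {ratio:.0f}x the assumed Framingham cohort")
```

Change the rate to 2 per 10,000 (the low end of the Wikimedia estimate) and the cohort is still tens of thousands of people.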

You see, the secret sauce of the commons is asymmetry. A small number of people making an unreasonable choice, a choice to share, to be digitally naked - that’s all it takes. But if we don’t control the privacy over our own health records, that small number will never get the chance to make that choice.

So I view this study as unmitigated Good News. It means that people are going to get more and more pissed that they don’t have control, and that makes it more and more likely that they get control. Patient empowerment is the first step on the road to open data. It doesn’t take all of us. It just takes all of some of us.

Nov 30, 2012

November 2012

4 posts

Data Versus Pundits (Science Edition)

I’m more than 20 years away from my last math classes. And though I spent my startup experience around some very smart mathematical types, I didn’t learn enough at any point to actually use data in models, make models, or otherwise become one of those “big data” types you read about…well, fucking everywhere.

But I did learn a healthy dose of respect for what data, and math, when put together by the right people, are capable of. And since signing up with Sage Bionetworks, first as a Board member, and more recently as a member of the management team, I’ve learned a lot about what data can do now that processors, storage, bandwidth, and sequencing are all brutally cheap.

The key is, as per @DCDave, to reset your expectations. Cheap, plentiful data changes the epistemology of fields. It did so in baseball, which went from trusting a scout’s “gut” instinct to an intensely data driven science. It did so in weather. This year, it did so in politics.

But it’s very hard to reset your expectations when you’re at the top of a traditional industry. The punditocracy’s dismissive handwaving towards Nate Silver is all I can think of when I go to traditional events on science and health these days.

There is a total unwillingness to reset expectations in the sciences, to go from the certainty of the expert class to the probabilistic worldview of a world overrun by data and new entrants. The idea that non-credentialed experts can generate hypotheses, that roles will be fluid between citizen and researcher, that “normal” people (as if a data scientist is normal - she just might not have a SCOPUS identifier or a PhD) can generate useful science insight, that publishing on the web might be a complement to publishing in peer reviewed journals? Those are all easy to accept with expectations reset by data.

They’re very hard to accept if you attempt to retain the world as it was before sequencing costs were crashing, cloud storage and processing were cheap, and publishing business models were broken to hell by the internet. Those scientists (and many pundits of science) at best ignore and at worst demonstrate bitter anger towards the relentless march of data and networks that “sail blithely on regardless of the carefully worded communiqués that emerge from a parade of meetings and consultations,” as Jonathan Zittrain once wrote of the internet.

I understand why a “normal” scientist might not want to see this. Imagine: you’re 40, you went from college in 1994 straight to grad school, graduated in 2000 with your PhD, postdoc til 2005, just got an assistant professorship and a piece of a lab to call your own, and your first NIH R01 grant last year. Now suddenly you have to deal with big data everywhere?

But it’s the pundits that are the worst. Like retail politicians and baseball scouts, regular scientists will hop on board fast with using data, as soon as it’s shown to work better than ignoring data. It’s the elite that will change the slowest. They’re the ones with the most to lose with a change to the status quo because they are already in charge.

The journal editors. The old guard. Those who treasure their hypotheses as precious snowflakes never to be shared, unaware that their ideas often exist in a matrix of data that can be generated as a commodity, findable via algorithm, and often disprovable through algorithm. They have the most to lose in the transition. And they clearly see themselves as the shoulders in the Newtonian equation when we transfer to data.

But that’s not what this is about. In the end, it’s about increasing the probability that we get things right over the way that we try to get things right today. And the philosophies of science that served us so well in our history of expensive, unavailable data will not serve us well in our current history and its inheritors, who have to figure out what data to delete, not what data to save.

Models will help. Open data will help. But a more supple, more finely tuned epistemology of science, one that understands how data can drive the creation of hypothesis, not simply emerge as part of its proof, is the real key. Because the data, and the quantitative research that can emerge from it, is often going to be better at making predictions than the gut instinct that’s driven many of the narrative sciences (and I’m including both biology and health in there, though not chemistry or physics).

But it will take the establishment class either getting with the program, or getting swept away by the tide. Because in science and health, the funding is controlled by the elites. And they’re still deep in the grips of a pre-data way of knowing what they know.

Nov 26, 2012 · 3 notes
YODA yada yada

Yale University published a draft “Open Data Access policy” on November 15th to make available rhBMP-2 clinical trial data that they created in conjunction with Medtronic.

Let me get this out of the way before I go negative: the fact that this policy exists, is on the web, and is open to comment is an unalloyed Good Thing. It’s progress. It’s to be commended.

For those who aren’t initiates in the arcane world of open data or pharmaceutical testing, this is data related to a clinical trial of a kind of biological chemical found in our bodies that plays a role in bone development. Most of the time, this kind of data is collected once, then locked away, never to be seen or used again.

One of the reasons drugs cost so much is that we never re-use the data from clinical trials, so we never learn anything from failures, or from secondary uses of data. It’s an incredibly inefficient system. This project at Yale is an attempt to address that inefficiency by making the data available.

But that’s the extent of my nice words. What follows is a point by point review of the policy. The short version: this is not an “open data” policy. It’s a data access policy, and if they’re not going to fix it, they need to rename it. Because those of us who care about the definition of “open data” actually meaning something are going to criticize, persistently and loudly, if there’s any attempt to claim the title for data under this policy.

I apologize in advance for the length of this rant, but it’s a long policy. Also, they’re using the name of Yoda in vain.

I. Decision: There Will Be a Data Registration Process For Data Access

OK, I can live with this. We do this at Sage Bionetworks with Synapse, for example. Whether or not it’s a good thing depends on what terms and conditions researchers are forced to accept as part of registration, and who is allowed to register. Which brings us to the next point.

II. Decision: Registration For Data Access Will Include Investigator Disclosure And Submission Of A Study Proposal

This is not great. You’ll need all of the below to even apply for registration:

· Principal investigator’s: name, degree(s), SCOPUS ID, primary employer, and contact information, including phone, mailing address, and email address.
· Name, degree, and SCOPUS ID of other key investigative team members.
· Funding source and conflict of interest statement (using a modified version of the ICMJE disclosure form) for all team members.
· Certification of IRB approval (or waiver) from academic/university partner [see section III].
· Project specific aims, main and secondary outcome of interest and analysis plan [an example proposal will be available on the YODA project web site], and timeline.

Huh. I guess this would basically rule out…well, every data scientist in the world that doesn’t have a PhD and a SCOPUS ID. Which is pretty much the vast majority of them. It’s a lockout policy intended to limit liability and contain research to the stuff Yale and Medtronic think is ok.

The only acceptable part of this is that they will publish the registration applications – so we’ll know what they’re turning down, I suppose. But that can also create a disincentive to even apply, as it means that you’ll have told the world the questions you wish to ask before you’ve even had a chance to ask them.

This decision also basically guarantees the data will never be integrated with other data, because then these requirements would have to be syndicated over to all the other data. So trials about similar diseases, genetic networks including rhBMP-2, computational networks? Segregated by this decision, forever.

This decision further creates the possibility of catastrophic success. Should this data actually become essential, the transaction costs of reviewing applications will skyrocket. I’m doubtful that the faculty reviewers involved will enjoy spending their time looking at incrementally different applications to access the data rather than doing, yaknow, novel research that helps them get more funding.

III. Decision: Registration For Data Access Will Require At Least One Key Investigative Team Member From An Academic/University-Based Partner

The justification for this is that “This requirement strengthens the likelihood that the data requester (and eventual user) will have the scientific capability to conduct the proposed analysis” and securely store the data.

Based on my experience, the odds that someone can use data are not strongly correlated with academic credentials. For every whipsmart data scientist in academia there are a dozen more with strong chops, secure Amazon web systems, and killer Bayesian models outside academia.

Imagine if Nate Silver had been required to go through this kind of process to access the polls? Only mathematicians with a political partner need apply! I’m sure all the pundits who called him a wizard would have loved to sponsor his application.

IV. Decision: Requests For Data Access Will Be Reviewed For ‘Completeness’ and to Ensure that Data Use Limitations are Met

This is an unnecessary bit of overhead, but if you’re going to make people file applications, and make faculty members review them, then this makes sense. Might as well make the review process as complete (not to mention time-consuming) for your own faculty as possible.

V. Decision: Limitations Will Be Placed On Data Use

“Data requestors will be required to certify that they will meet the following expectations:

· The data will be used to create generalizable scientific knowledge.
· The findings derived from analysis of the data will only be publicly disseminated through the peer-reviewed biomedical literature or a scientific meeting.
· The data will explicitly not be used for commercial purposes or pursuant of litigation.”

What the hell is generalizable scientific knowledge? Where does it stop and start? Who gets to enforce the definition? Are we talking Karl Popper, or Kuhn, or Feyerabend – or Arbesman – here? This is so vague it gives those who would deny a data access request total power to say no under its rubric. An exploratory data scientist simply looking to test a model? Door, locked.

The findings will only be disseminated through the literature and meetings? No social media? No blogging? They won’t be published as new algorithms directly into R clients? Seriously? Utterly myopic view of how knowledge is now communicated.

The litigation thing I’m actually willing to give them. There’s too many law firms that would descend on this with fangs bared to start class action lawsuits, at least now. Until we have social norms and judicial precedent to deal intelligently with clinical data this kind of constraint might be part of the deal.

“Data will not be released to applicants whose intent is clearly based on commercial or legal purposes.”

OK, so if I keep a blog with Google ads, is that commercial? If I work on cancer as my 20% project at Google? Who decides what “clearly” means? What if I’m an academic sponsored by a Medtronic competitor (this appears, ironically, to be okey-dokey)? Again, broad reasons to say no to those wishing to do exploratory computational research.

VI. Decision: There Will Not Be A Data Use Fee

Yay! Of course that’s just for a year.

VII. Decision: Medtronic Will Be A Party To The Data Use Agreement, With Authority To Enforce It

Oh, awesome.

Just go read all the terms and conditions and then don’t even bother applying. Anyone who actually goes through this is either going to be someone they already know or a total masochist.

Worse, the data use agreement gives Medtronic the right to snoop into your research. It’s a party to the deal. It has the right to enforce the deal. Sign with care!

“DATA DISTRIBUTION”

Data will be distributed via an “encrypted USB flash drive via FedEx (or similarly secure shipping company)”.

Glad they specified a “secure” shipping company.

Seriously? A USB stick? The thing that gets left everywhere, falls out of pockets, carries viruses into Iranian nuclear facilities, and is used for corporate espionage? After all this detail to secure the data the actual delivery method is very likely to result in the leak of at least an encrypted copy of the data into the wild (when there is a Wikipedia heading on “USB drive data leakage” you may have a problem with it).

If you’re going to go to all this effort to lock it down, just keep it simple and use a secure cloud service and encryption. At least then you can monitor the download and there’s not a physical copy floating around for an exhausted postdoc to drop on the floor and get picked up by a janitor.

CONCLUSIONS.

Well, I doubt seriously that this is going to change significantly. This reads as if it were written by Medtronic and then a well-meaning committee of scientists attempted to ameliorate its worst excesses. And I’m glad for the fact that it was created, placed online, and comment was requested.

We have definitions for openness precisely so the words are not used in ways that mislead us. To make sure that when something claims the mantle of “open” that it is indeed open. To create a certain level of quality control in the open world.

There’s nothing wrong with not being open. What’s wrong is claiming to be open, when one is in fact not open. It is important that we hold all to task against the open definition, and that we call something what it is. Because this is not an open data policy. It’s a Yale Access to Data Agreement – a YADA, not a YODA.

And there’s nothing inherently wrong with that. It’s not bad to make data available under kind of insane terms, because at least it’s better than not making it available at all. But this isn’t open. Not even close.

Nov 16, 2012 · 1 note
Thanks, Pete

Pete Stark lost his House seat last night. He’d held it for 40 years.

He is a divisive figure to some. He’s a hero for his legislative work to others. And I’m not going to address any of that, though I believe history will judge him very kindly on the merits of his record.

I just want to thank a man who did more for me than I did for him.

Pete hired me in 1996, when I was so wet behind the ears I could water plants just by sitting next to them. I was a legislative assistant in his Washington office, which is a fancy way to say I read through the legislation that the new Republican majority was ramming down our throats and tried to find some of the nastier bits and get them taken out.

One little anecdote from those days. It was my first few weeks on the job and I was overwhelmed. A previous staffer dropped a massive set of files on my desk and said, “this is the Headwaters file - you’ll want to start reading it soon.”

That night I got a call at home, after midnight, that a clause had been inserted into an appropriations bill that would allow the logging of Headwaters. I had to taxi into the office, start writing Dear Colleague letters, and help whip a campaign to get the clause taken out when the bill came to the floor. I had no idea what I was doing. I was 23.

When the floor debate came, I was watching it on CSPAN, which ran 24 hours in the office. We were just across the street. Pete grabbed his 2-minute comments that I’d drafted off the printer and walked over.

I saw him stride up on the TV, and to my horror, he started reading the press release, not the speech for the chambers. The press release was completely over the top. It called Charles Hurwitz, the then-owner of much of Headwaters, a “blood sucking corporate vampire” (not my words, but the product of a gifted, and awesome, press secretary).

You can’t attack an individual like that on the floor of the House. Pete was immediately called for censure. I nearly wet my pants. I was surely going to be fired for this mistake.

The time it took for him to walk back to the office was epic torture.

He came back, clapped me on the back and said, “Aw hell, I decided to read the release because I thought it would be more fun - relax, kid.”

He was a great boss.

Pete gave me my first big professional break. Pete’s office is where I learned how to run a database. His Mac is where I wrote my first webpage, in plain text in 1996. He gave me my first bottle of great wine. His chief of staff got me backstage at a Jerry Jeff Walker show at the Birchmere, and took me to the National Democratic Club on the night Clinton was re-elected.

Pete, I left before I ever got good at working for you. But you were good to me, and you served your country well. I hope somehow this gets to you. Thanks for your service, and thanks for giving a kid a lot of breaks a long time ago.

ps - we won the debate on Headwaters, and I look forward to taking my son to see those trees, when he’s old enough, and telling him this story.

Nov 7, 2012

October 2012

6 posts

Open Access Week, 2012 Version

There’s a ton of activity around the web for Open Access Week 2012. There’s eleventymillion events in person and online around the world for you to engage in.

I’m not doing any, for the first time in a long time. The OA movement is running downhill now, for real. Despite the squeals of those who would defend a broken business model tied to a broken access model, this one’s pretty much over. The question isn’t “if” open access, it’s “how” open access.

OA isn’t going to be pretty, just as the old system wasn’t pretty. Using the flaws of the publishing industry itself as a rationale to ration access is a weak strawman - conflicts of interest and hucksters flacking bad journals aren’t exactly an invention of open access. They’re part of publishing.

But Open Access means vastly more readers, and wider distribution of knowledge driven by science. And in the end, that point trumps all - especially given the public investment made by taxpayers in science. There is no constitutional protection for industries disrupted by network effects. Just ask Borders.

Anyhow. This year’s been a weird year for me in OA. I’m not working in the heart of the movement, having moved over to try to make it easy to donate your data to science instead. But I did help poke a stick in the White House’s eye with the Access2Research petition, for which we are still awaiting a reply.

Honestly, once August came and went I gave up hope of hearing anything before the election. The news cycles in a presidential election, especially one as taut as this one, magnify everything. An OA policy would’ve been jumped on by the Romney campaign, quoting those Elsevier stock hits as American job losses on the way. I remain very hopeful that the answer when it comes is a policy extending the NIH public access policy across all federal agencies but I’ve always been willing to believe.

The extension of the policy is the natural end game for OA in the US from a taxpayer perspective. It’s not the end of the movement though.

Now we need to avoid fragmentation. We need ORCID to actually not just mint identifiers, but itself act like an open organization (give us the freaking source code already, ok?). We need more entrepreneurs starting new publishing businesses, and testing new business models. We need big publishers to recognize the opportunity - to be the IBM/open source of the space.

And we need some acknowledgment that licensing, in the end, is really important. If it ain’t CC-BY, it ain’t compliant with the community definitions of OA. It’s better than no access at all, but we should never, ever trade away our rights in return for free stuff, even if that free stuff is knowledge. 

Congrats all on a great year, and here’s hoping I get a good excuse to bust out the Snoopy Dance sometime in November.

Oct 24, 2012 (2 notes)
23andme and Me

I got a lot of reaction to my slides today at Strata Rx, but there was one part that worried me a bit, which was the reaction to my comments about filing for developer credentials for 23andme’s API.

I applied during the first day of the conference. I’m asking to develop an application that sits just after the Portable Legal Consent process, so that people who have gone through the consent and are 23andme customers can just zap their data over into the open commons.

I immediately got testing credentials. It’s so far a very professional setup and I commend them on the ease of the process.

I’ve written about 23andme’s getting patents. And I really hope they’ll make it easy for people to export their data, starting with the genotype but ideally at least including some basic phenotypic information. If we could even get just the equivalent of the form that you fill out when you sign up - like when you go to the doctors’ office…

But I always want to come back to this. I tweak 23andme but they deserve enormous credit. They show up, time and again, where other companies in the space don’t. They talk to their critics. They fight the fight at the FDA. Were there any other DTC genomics companies even in the room at Strata today?

The vision is one of real change, and one that has room for a commons in it, even if it’s not all we want yet.

I’m going to keep tweaking, but that’s because I think 23andme can set the tone for an entire industry by facilitating data portability through their API, not just through a lengthy manual download process. I don’t think anyone who wants to export wants to leave 23andme - I sure don’t. I just want to send a copy to a place where there’s more eyeballs because there’s lower barriers.

Most people won’t. The secret of the commons is that it takes a shockingly small number of people to pull one off…though it does certainly help to have seeds.

And the success of the more eyeballs space will only increase the value of the coherent, internal data set at a place like 23andme. It’s a long look kind of investment. But it would cost nothing other than a small number of people’s data sets becoming non-rivalrous, and could potentially bring enormous returns.

Oct 18, 2012 (2 notes)
My TED Excellent Adventure

My TED talk went up yesterday.

And whatever snark you hear about TED, or the TED brand, know this. The conferences are amazing in person. I go to conferences. All the time. There is nothing like TED. Part of it is the seriousness of it - no laptops, no tablets, no phones, except in the very back of the auditorium. So if you want to be close to the speaker, you have to pay attention. And people do. It’s the most attentive audience I’ve ever seen.

I spoke at the end of the week, on the last day, at the end of the first morning session. The crowd was tired, hungover from the grand party, and it was probably the only session where there were seats available from beginning to end. Imogen Heap gave a surreal performance just before I went up. Suffice to say I’m not used to that as a warm up.

I am in the end mostly happy with my talk. A few stumbles, one choke in the throat as I talked about my grandfather. That’s because I took their advice, ditched my script, and just spoke. I riffed a little more in places than I meant, and I left out a few points I wanted to make. And I was unclear - you have to be 14 to access the data in the commons, but you have to be 21 to donate. But in the end I think it was the better for having winged it than having stuck to a script.

It wasn’t the talk I expected to give when they asked me to speak. The more I worked on it, the more it became clear that it needed to be a story about consent and technology, even if that’s not what I usually talk about specifically. I’d heard of novelists and writers talking about books writing themselves, or revealing themselves through craft. Since I think of giving talks as far more akin to jazz, that never made sense to me, but it does now.

TED is what it is. The brand, the ubiquity, the criticism by the Evgeny Morozovs. It should not be viewed uncritically (nothing should be, for that matter). But part of what it is, is two extraordinary conferences where people actually want to hear about things that matter. That’s all the conference is. That’s where the talks come from.

That space, that simple physical concentration of human attention, creates enormous power. I wish more conferences could create that place.

Thank you, Bruno and the entire TED krewe, for giving me a spot on the red dot. It was fun.

Now, some thanks and attribution.

First, Lesa Mitchell at the Kauffman Foundation is the reason I was there. She is a tireless champion of difficult people and visionary change, and she is one of the best mentors I have ever had. Thank you Lesa.

Second, my family. My wife Carolina, who tolerated endless drafting and A/B testing. My son, who threw spaghetti at me when I was feeling self-inflated. My mom and dad, who taught me to say my mind (maybe too much!). My sisters, whose health inspires me. This one was for you.

Third, those who donated ideas or content to the slides themselves.

Dave Clifford made an offhand (and totally genius) comment about Avicenna in a business meeting that unlocked the entire introduction for me. Molly Crosby’s book The American Plague about yellow fever led me down the path of researching Camp Lazear to the National Library of Medicine photo archives, and to the Philip Hench Walter Reed Yellow Fever Collection.

I got the large, high resolution photos from the Flickr Creative Commons pool and will be posting the full slideset with attributions and individual licenses in the comment fields of the appropriate slides. The images of my family and my data are from my private collection. The screenshots of the Eatery are courtesy of Massive Health.

The clip art is all from the Open Clipart Library, the world’s largest collection of public domain clip art (you should not only use it, but join the campaign to free one of its creators, Bassel, imprisoned in Syria since April).

Last, I was searching for a close while Amanda Palmer was finishing her famous Kickstarter campaign. And her thoughts on nakedness as strength hit me as the only way to think about the true risk of data sharing. It’s a form of being naked, one that we protect through privacy. But if it’s your choice, being naked can be a form of power, not a source of fear.

Talks come from everywhere. Everything is a remix. Thanks to all for the ideas and inspirations.

Oct 17, 2012
Ada Lovelace Day - Guest Post

Guest post by Carolina Rossini (@carolinarossini)


My Ada Lovelace Day Heroine: Marilia Rossini

Many of my posts at EFF are focused on the details of international policy around copyrights, or policy laundering, or general expansion of intellectual property rights. But today is a different day. It’s Ada Lovelace Day. And Ada Lovelace Day is about celebrating women in science, technology, engineering or maths whose achievements you admire.

And for me, that’s my mother.  So I hope you will forgive me a personal post.

My mother’s name is Marilia Rossini. She lives in Sao Paulo, Brazil, where she works at the Instituto Adolfo Lutz. She works on infectious diseases as a biologist with a focus on HIV both in the lab and in the clinic. She often travels to Angola to work on the front lines of infection and treatment in the clinics of southern Africa, where she sets up clinical laboratories and trains local clinicians to detect infectious disease outbreaks as they happen.  She helped coordinate links between Angola and Brazil to provide access and training to the technology that local clinicians needed to have a better understanding of the spread and impact of HIV locally.

Her work has led to multiple papers with high impact factors, most in Portuguese and thus not well indexed by American search engines. But one key paper from her work that was published in English is not only very, very heavily cited, but even available as a Green Open Access download (as you see I cannot entirely stay away from copyright and access issues).

I admire my mother’s work in science. Becoming a viral biologist in Brazil in the 1980s was a hard choice, especially with three kids and a public servant’s rewards. Her dad – my grandfather – was a taxi driver who immigrated to Sao Paulo from Portugal in 1952, with less than $5 in his pocket. She was the first in the family to graduate from university. She fought for everything she got. And she introduced me, indirectly, to the IP world. Because of her, I volunteered – back in high school – in an NGO that taught HIV prevention, where I first learned of problems with access to medicine and knowledge. 

And now she fights something in her own genes, and probably mine. She is a cancer survivor. That means things most of us don’t know, pain and surgery and chemo, and she still flies to Angola to train biologists. And she still flies here to see her grandson.

Lady Ada would be proud of my mom. And so am I.

 [disclosure by John Wilbanks - I’m lucky enough to be married to Carolina Rossini, and I thoroughly approve of this post!]

Oct 16, 2012 (2 notes)
GSK pivot to "open" data - reading tea leaves

(after much complaining, I’ve moved my meme photo to the bottom of the page)

I woke up this morning and found that my twitter feed had exploded with the news that GSK is promising to start making individual level data from clinical trials “open” to independent research.

There’s not a lot of detail to go on. But there is enough to do some inferential analysis, and a little bit of historical context as well.

I’m not surprised to see this. Perhaps that it happened now - I’d have bet on 2013 - but not that it’s happening. Sanofi is doing DataSphere (toll access, sorry for sucking), Eli Lilly is running a Clinical Open Innovation portal, etc. Pharma has already figured out that the creation of precompetitive space will be part of whatever new business models wind up supporting drug discovery.

As for the swipes against academia…well, I’ll only speak from my own experience: I have had pretty much all of the top 20 global pharmas call me at some point over the last decade to explore data sharing. I have had maybe three academics call me. There are some good - indeed great - shared academic projects, but the relentless funding requirements and sharp elbows in grantmaking mean that data sharing is hard. There is a business case against sharing, from an economic perspective, in academia - someone else finds a golden ticket in your data, and you lose.

That business case does not exist in biology data or in clinical data if you’re a pharmaceutical company. It’s about getting compounds into the patient and getting reimbursed.

OK, on to the open part of all this.

First, based on what the press release and news reports are saying, we don’t know if the data will be “open” or not - because the press reports don’t mention intellectual property status of the data.

So, to the first point - is it open? We don’t know. Honestly, it doesn’t matter very much for a company like GSK, so I’d expect that they will indeed use a CC0 public domain waiver - they did so previously with their Tres Cantos malaria compound set, which is now underpinning open source drug discovery work. I helped, just a little bit, with that project while I was still at CC and I can tell you the hard part was getting GSK to agree to make the data public, not getting a good license onto it.

Second, frankly, open from a copyrights or patents perspective probably doesn’t matter very much compared to the other mechanisms of control that are explicitly referenced in the reports.

The release intimates that a select panel of judges will review a select panel of applicants and grant access. This points out the failure of “open” definitions to adequately grapple with data in my opinion. It’s easy to meet an open definition with this kind of data but only allow an elite group of scientists in to touch it, using strong norms that say “if you share it, you’re out of the pool forever” which don’t violate the various definitions - because of the obsession with intellectual property as the source of openness.

Who will be the judges? Under what terms will they allow access? Will it be a liberal policy - one that says, by default, any reasonable request will be granted, and here’s what reasonable means? Or one that says, by default the answer is no?

If GSK wants this to work, they need to go all the way and make it not just legally open, but accessible and usable.

I’d like to see the data deposited in Synapse at Sage Bionetworks, as well as at a federal repository like NCBI or EBI. I’d like to see it under a set of terms that specify that any researcher willing to comply with certain conditions gets access: not attempt to reidentify, not attempt to harm, perhaps agree not to use the data to bring class action lawsuits (hey, for me, it’d be worth it if we could actually build a map of tox and ADME that worked - sort of a truth and reconciliation commission approach). And I’d like to see the world take a whack at it, not have to apply to have their research ideas judged.

Anything less than this will be “half” open. And being half open is like half learning how to break a board with your hand in karate. You don’t break the board. You break your hand. It’d be a shame to have the first big pharma that tries this fail because they didn’t have the guts to go the distance.

Oct 11, 2012 (2 notes)
Inputs

I’m at the tail end of a wicked burn of travel and talks, which means that I’ve been pummeled with input (in a good way). The more time spent in airplanes and hotels, the more reading I get done. And the more I give talks, the more I get introduced to people I don’t know, hear others speak on topics I don’t know well, and learn.

Because it’s politics season, and I can’t stand televised political coverage, I sought refuge in Future Perfect by Steven Johnson and Ill Fares the Land by Tony Judt, which each in their own way address the gap from which the current political system in the US suffers. Because I’m addicted to William Gibson in all forms - right down to staying at the Mondrian in LA, at least in part because it was the start of Spook Country - Distrust That Particular Flavor. It’s also easy to consume in small parts.

Because travel is lonely and makes me melancholy, but also helps me learn and is my job, Dark Star Safari by Paul Theroux. I also just turned 40 and this is his book about turning 60.

Because sometimes you need a f*cking drink, Every Night’s a Saturday Night by Bobby Keys. He played sax on everything that was awesome for a long time, and the back story of Exile on Main Street alone is worth the price of the book.

Because David Byrne is David Byrne, How Music Works. Bonus points for making me see connections between the re-entry of music into culture and the re-entry of science into culture.

Because I’m curious about the new Wachowski film, Cloud Atlas by David Mitchell.

Because I like re-reading things, Reamde by Neal Stephenson. Also, because I was going to a conference with a keynote on games, and although I’m not a gamer myself, I’m fascinated with what power they hold.

Those are all on my Kindle. In real life, I have been lugging Symbol by Steven Bateman or Thinking With Type by Ellen Lupton in my rolling bag. I’m experimenting with web redesigns for Consent to Research and these are great books for when I’m worn out, jetlagged, sleepy, and can’t focus on narrative. Symbol is a collection of symbols, just like it sounds, hundreds and hundreds of pages. I tag ones that move me with Post-its and come back to them later. Thinking With Type is like an ethnography of fonts and typefaces. Both books are visually gorgeous and feel good in the hand.

Anything I’m missing or should be reading?

Oct 3, 2012

September 2012

2 posts

Half Life of Facts

Sam Arbesman’s new book, The Half Life of Facts, is coming out soon. I’ve had the pleasure of reading both a galley proof and an advance print, and I can tell you it’s a great book.

Sam demonstrates, quantitatively, something we all know implicitly but probably don’t think about very much. Everything we know, or think we know, has an expiration date. For a long time, we “knew” that the sun revolved around the earth. For a long time, we “knew” that atoms were the smallest part of matter.

But we learn. And we learn more. And what we “know” decays, over time.

The Half Life of Facts is a great read for those who are interested in science, or philosophy of science, or who just need a reminder to have a little less hubris about how smart we are. Because to the future, we’re all phrenologists.

Disclosure: Sam works at the Kauffman Foundation, where I am a non-resident Senior Fellow. But I like his book anyway.

Sep 19, 2012
23andme's API...

So, big news today in the DTC genomics world was that 23andme has decided to publish an API. The blog announcement optimistically announces that there is “no monopoly on great ideas” and encourages developers to apply for access to write applications to the API.

I’ve gotten quite a few requests for my thoughts on the announcement.

I’m torn between wanting to praise 23andme and wanting to scream DANGER DANGER DANGER at the top of my lungs. So, I’m going to do both.

First, bravo. Publishing an API is a smart business move for 23andme, and holds the potential to create a lot of genomics apps on the data they hold.

Second, DANGER. Publishing an API is a smart business move. Let me repeat that: it’s a business move. It’s not because they’re nice.

There is a very serious difference between “we’ve opened our API to you” and freedom. Developers must apply for permission to make applications. Developers must tell 23andme, before developing, what their apps will do. And there is nothing that prevents the strategic shrinking of an API, or the subtle or not so subtle pressure to turn off applications that compete with core business functions or revenues of 23andme.

The fundamental problem is monetization. If you write an awesome app to a closed API, it makes good business sense to give you the Sopranos treatment. You make the app, it gets popular, you get the bust out.

Twitter started with a wide open API and is basically phasing it out. Sucks if you invested your time there. Or if you’re like Dalton Caldwell, being strongarmed by Facebook when his app competed with FB’s App Center.

So that’s the fundamental thing. Develop to a corporate API knowing full well that if you are successful, there’s a better chance that you wind up Scatino and not Zuckerberg. And despite all the press that App.net is getting now, remember that social networks achieve network effects before they shut down functions in the API. And once a social network gets to that network effect, it’s very hard to build a competitor from the open.

Anyone out there use Diaspora? Didn’t think so.

Code carefully, my friends. And choose your monopolies wisely.

Sep 18, 2012

August 2012

2 posts

Remembering Lee Dirks

Lee Dirks died yesterday with his wife in a car accident in Peru. They leave two daughters behind.

Lee worked in Microsoft Research, where he was a large force in the open access world. 

He was also a very good friend. Someone I trusted to talk to in confidence. Someone I called the first day I changed jobs. When his name was on the agenda, flying twelve hours in a cramped plane to a mediocre hotel in an impossible time zone was a lot more fun. We’d have 30 minute work calls that ran 80 minutes to talk about balancing fatherhood and the travel schedules we shared. We drank together, cussed out our favorite sports teams together, complained about mediocre Mexican food together.

We sent each other pictures of our kids.

And we’ve spent more than one night talking about some awesome place one of us heard about. Nowhere near the mediocre hotel, just a place you can reach in a poorly maintained cab, on a road with crumbling features, that great local version of the overrun tourist attraction with no crowds and the best food stall. Nearly every time that ride is ok. Sometimes something goes wrong but it always ends as a minor thing, a sideswipe, a funny story to tell at the next bar night.

But I guess it doesn’t always. This ending sucks. 

Lee deserved, more than most I know, to go out fat and happy with a smile on his face. He was a mensch. I’m going to miss him fiercely.

Aug 30, 2012 (1 note)
Planting Trees

The best time to plant a tree was 20 years ago. The second best time is now.

- Chinese proverb (supposedly - but lots of difficult-to-source good quotes seem to be attributed to a Chinese proverb)

Building the digital commons is a tricky thing. 

It’s not all that different from doing a startup. If you’re at the right time, people are telling you you’re crazy. People have failed at your task. Big companies have yanked projects. The market isn’t ready.

But that’s when you have to build.

Because once a market gels, inserting competition from the open is very difficult. See: Diaspora. It is far better to ensure from the get-go that credible competition exists from the open source side, so that the entire market is affected by the credible threat that customers can bolt to open source, open data, open content, open systems. 

I would argue we’re at this point in the personal data revolution. We’re awash in devices that collect data, which will look paleolithic in five years. We’re on the edge of ubiquitous electronic health records for everyone, which will actually start to be useful in five years. We’re already at the $1000 genome, which will be $100 in five years. 

It’s easy to say, just wait. After all, Google exited the health records business. None of the big players in personal genomics or health has more than 150,000 total users (I don’t have a link, but this is what I’ve been told by management at a variety of companies) and a tiny fraction of those actually log in each month. Don’t plant the tree yet. Wait til the ground’s ready.

Now is the time to build a commons out of our personal data. Not in five years. In five years, the walled gardens will be so big most people won’t notice the walls. But walled gardens don’t lead to the kind of innovation we need. They lead to monopolistic, bullying behavior when the walled garden needs to hit quarterly results.

It’s not about shutting down the companies. It’s about changing the overall composition of the market’s ecosystem through openness. 

Openness means that the non open players have to treat us better, because then we have an option to simply leave. Openness in web browsers and web servers changes the closed browsers and closed servers, in ways that we don’t have in social and mobile (despite the promise of Android, it’s faux-pen, not open).

Now’s the time to plant the trees in personal health data. 

Aug 6, 2012 (1 note)

July 2012

1 post

Re-reading Bruce Sterling

Re-reading Holy Fire by Bruce Sterling. It’s probably been a decade since I first read it, and it is remarkably fresh and relevant - a fate few futurist books can claim fifteen years after publication.

It’s got great bits on all sorts of things. Gerontocracy. The crudity and just plain boneheadedness of what we think of as “modern” medicine. Love. Art. Transformation. I teared up reading the description of the emotion of holy fire as something Mia, the main character, sees as she looks at a young girl, and how it was an emotion she felt on looking at her daughter. It’s what I feel when I see my son raise his arms on our little patio, water coursing over him from a gardening bucket. It’s a fantastic book. 

Since I work on privacy these days I’m completely obsessed by the novel’s approach towards privacy. Everyone’s medical information is online to all. Everything, to everybody. Sterling posits a generation burned by a decade of plagues, preferring total transparency to the alternatives. And it’s very plausible.

However, it’s not an all or nothing world. It gives us perhaps the idea that what we need is not a monolithic approach to privacy. Maybe we need a new epistemics of privacy - where not all data is the same, and not all things need be shared. 

There’s a little exchange in the front of the book that sums it up perfectly.

“I’m not asking you what networks they’re accessing,” Mia explained. “I just want to know a little about their personal lives.”

“Oh, okay, no problem,” Stuart said, relieved.

Heh.

I’m going back and re-reading Distraction next. As I recall there’s some lovely near-future biohacking in it, as well as great Louisiana dystopia-fiction. 

Pick up a copy at Amazon.

Jul 30, 2012

June 2012

4 posts

Unintended Consequences of Informed Consent

My TED Global talk from yesterday has generated attention, as I’d hoped. 

I want to clear up one point from an otherwise excellent summary of the talk from the TED blog.

The title of the talk could well have been the “unintended consequences” of informed consent. I’m not criticizing the idea that we should be informed before we join medical studies - indeed, in the talk, I note that this is a beautiful invention of society, something we should be proud of. 

My problem is that we’ve joined the idea of being informed with an idea of privacy that is creating a drag on innovation. That prevents reuse of data without extensive permissions. That makes it hard to donate your data to science.

Consent to Research is about untangling the two. It’s built on informed consent, not an attack on it. Because the reality is that privacy and consent have become entwined, and are used by researchers who are competitive, or just plain lazy or arrogant about our ability to understand data about our own bodies, to keep their data secret. 

Just as free software is a hack built on copyright, open consent is a hack built on informed consent. Simple as that.

Jun 30, 2012 (2 notes)
TED Global

I’m at TED Global this week. I speak on Friday morning, closing the first session of the morning, just after Imogen Heap.

It seems trendy in some of the circles that pop through my Twitter feeds to be cynical about TED. I never really bought into the hype in the first place, so I didn’t have anything to be disappointed about.

But now that I’m here, and through a few days of it…there’s something to the hype. It’s a truly unique and immersive experience. You dance at 9:30AM to an Indian rock band. You see a guy who turns sound into paintings, because he’s colorblind and has a device that converts color to sounds for him and vice versa. You see *zero* laptops out during sessions. You get exposed to a diversity of ideas that simply doesn’t happen at other conferences. Everyone is open, everyone turns around and talks to you, no one turns away from an introduction. It’s different.

I don’t have the money to go to TED unless I’m speaking. But I’m grateful to have the chance to be here, now, to be part of making this conference, to understand why it’s become a big deal. 

Yes, there’s precious stuff here. And I won’t lie - I’ve heard the word paradigm too many times. But I’m not going to pretend that being able to walk up and get an iced tea brewed from coffee cherries and dosed with elderflower syrup isn’t awesome. Haters, feel free to hate. I’m going to keep drinking the cascara.

Jun 27, 2012
Happy Fathers Day

I’m a relatively new dad. This is my second fathers’ day, but my son was so new last year that June is lost somewhere in the fog that comes along with four hours of sleep a night for weeks on end, 18 diapers a day, and feedings every two hours. It’s the first one I can really enjoy.

Being a dad is a funny thing. I didn’t expect it to be this much fun. The diapers and the feeding and the repetitious activities are the things that so many people focused on before I had a kid, but very few people talked to me about how having a kid can drag you into the moment as well as, or better than, anything else. Watching the boy play on the patio in the warm afternoon sun. Giving him a bath. Helping him climb. Smelling his head in the morning while he takes his first bottle.

I spend a lot of time - a LOT - plugged in and online and digitally present. My son tethers me to the real world.

And he makes me appreciate my own dad more and more. I find myself unconsciously doing things my dad did. I make a silly face and catch myself in a mirror and it’s my dad’s face. I see his face in my son. I hear his voice in my voice when I talk, feel his structure in my structure.

Happy fathers day, pops. Thanks for teaching me.

Jun 17, 2012
Designing Geopolitics - Talk

Giving a talk at Designing Geopolitics 2. Based in large part on ideas in my post on designing for emergence. Since I’m speaking in front of the optiportal, I have no powerpoint, but instead a folder of files that I’m using as a mosaic. I can’t redistribute all of them, so I’m not able to show you everything. Will add a link to the HD video.

These are not my comments as they will be delivered. Think of them as chord changes. We’ll see if it works.

0 (displaying a giant photo of Copernicus)

knowledge from data

Copernicus. Photo. cultural object. Creative work. But since it’s digital, it’s also data. It can now be effortlessly copied, distributed, at zero loss of quality or resolution. 

And Copernican theory led to Brahe, who made painstaking observations, which allowed Kepler and Galileo to figure out planetary motion.

1 (displaying a mosaic of pictures of old data)

they did this using what they called observations, what we call data. this is what data used to be. it reformed geopolitics (heliocentricism) and transformed knowledge distribution.

2 (displaying a mosaic of paleofuture slides)

Remember that with the data we have at the moment, we usually get the future hilariously wrong. Future shock images. Futurism tells us more about now than the future a lot of the time. Thus any geopolitics needs to understand that we are at least mostly wrong.

3 (display images of analog data storage and copying tech)

these are all ways that we wound up storing data, from the old to the last of the pre digital. quality degraded with copying. all analog. all kind of human: liable to age in ways we understood, intuitively (get images)

4 (display a large picture of the @ sign key on an old typewriter)

Transition to digital and what it means. 

5 (display pictures of data capture technologies and shots of pigpen from peanuts)

capacity to capture data exploding. faster, better, smaller, ubiquitous. pigpen secreting dust as we secrete data.

6 (display pictures of the Dust Bowl era)

dustbowl of data. dust bowl emerged from a national politics of exploitation of the land, coupled with an unusually rainy period on the great plains. “Rain follows the plow” was the idea. Devastating impact on humans, on economy, on land.

7 (black screen)

not to mention that digital data, though of the human, is distinctly digitally not-necessarily-human stuff.

8 (wipe through a set of pictures of a greenhouse monoculture, the earth at night from space, black box.)

monoculture emerging. that is the geopolitics we are getting.

digital divide emerging. that is the geopolitics we are getting.

the black box transaction. that is the geopolitics we are getting.

9 (black screen)

but, the trick. we can build our own geopolitics without asking anyone else’s permission.

10 (display a set of soviet propaganda posters)

it is commonly built, but it is not sovietism.

11 (display photo of pic saying no public access)

it is a reaction to the increasing privatization and exploitation of digital assets. no public access is the default rule. access might be free of cost, but you trade your inner pigpen for access.

12 (display photo of public footpath)

common public access. free software. creative commons. science commons. where commons based systems work, where they don’t. property rights and their limitations, privacy rights and their implications.

13 (display photo of one green shoot in a furrow)

the consented commons as the green shoot growing. it does not take all of us, even very many of us, working together to create something amazing. that’s the real lesson. there is already an asymmetry between the number of people who make things and the number of people who consume them. when we invert the system through a shared open core we can benefit from that same asymmetry to make open systems deeply competitive with closed ones. that’s why there is resistance to the open.

14 (display pics of wright brothers at kitty hawk.)

where we are now: first flight at kitty hawk, first experiments. the older i get the earlier it seems. we’re learning, trying, hacking, failing. that’s fine. that is indeed the point.

15 (display pics from NASA explorations)

the blurring of the lines: jupiter photo (get more) - are they art? are they data? does it matter?

16 (display the following words across entire screen, white text on black background)

geopolitics of data as the potential for emergence.

Jun 2, 2012

May 2012

6 posts

23andme: It's In Their Nature

The Scorpion and the Frog is a fable about a scorpion asking a frog to carry him across a river. The frog is afraid of being stung during the trip, but the scorpion argues that if it stung the frog, the frog would sink and the scorpion would drown. The frog agrees and begins carrying the scorpion, but midway across the river the scorpion does indeed sting the frog, dooming them both. When asked why, the scorpion points out that this is its nature. The fable is used to illustrate the position that the behaviour of some creatures is irrepressible, no matter how they are treated and no matter what the consequences. (from Wikipedia)

Lots of folks are getting hot and bothered over the 23andme announcement that it has been granted its first gene patent. It’s caused a lot of chatter on Twitter, and a lot of handwringing.

Let me ask you all something. Why on earth did you expect anything else?

Genotyping is a commodity service. That’s not 23andme’s business. Their business is in selling the anonymized data to those who wish to use it for research purposes, and in doing their own research on the data. They tell you this up front, as they note in the comments on the blog post:

We make reference to our intent to pursue intellectual property rights for discoveries made from our research in both Terms of Service (in section 13) and in our Consent document (sections 3 and 5).

Companies exist not just to provide you with neat services, but to make money. And patenting genes is part of how companies in the drug and health space make money. Whether or not it leads to drugs is another question, one we might ask of shareholders of Millennium and Human Genome Sciences, two leaders in gene patenting yet not two leaders in actually curing very much at all in humans.

Open systems matter because the core of the system is a shared set of norms. Sage Bionetworks is part of an open personalized medicine revolution. It’s why I’ve downloaded my own copy of my own data, and am uploading it into Sage’s Synapse compute environment, so that it can be part of fully open research. 

23andme wants to be at the center - the czar - of a closed personalized medicine revolution. They want to build a walled garden so big that no one person notices the walls - unless that person wants to do something without getting permission from 23andme, like new research, or starting a company.

That doesn’t make them bad, or their service bad. They’re the scorpion, we’re the frog. Hopefully the sting isn’t bad enough that we all drown. I want 23andme to make money. I’m a paying customer (indeed, I pay for my whole family’s accounts). I’ve filled out every survey, answered every snippet. But I don’t trust them to operate openly, or with my interests at heart. They have $50M in investment to recoup, indeed, to add a zero to. 

They’re going to make choices as they monetize that I don’t like. This is one of them - it’s both not likely to lead to a drug, and it pisses off their fan base. But I’m not remotely mad about it. It’s in their nature. 

May 31, 2012
Why Care?

An old friend of mine, after seeing my torrent of posts about the Access2Research petition (go sign it PLEASE), asked me this:

What the heck is this all about? I hate the fact that the Internet people can use cookies to spy on me, track my computer use and invade my privacy. Will your open Internet protect their ability to continue to violate my privacy?

This is a really old friend. We met 22 years ago, in school. We took introductory philosophy together and we were jackals in that class. We lived across the hall from each other the next year. He’s one of the smartest people I met in school, and a genuinely good guy. He actually wrote letters to his friends, including to me, before there was Facebook. He introduced me to online networks - I first got on the internet via his Mac, onto AOL, in 1991.

The fact that we have to explain open internet to my friend is part of the problem we face. Indeed, it may be the most important part of the problem.

So, old friend, here’s why I think this is important.

The internet itself, and the web itself, are “open” at their core. That is to say they are built on open technical standards that define how computers send bits across telecommunications networks (the internet) and how computers send and reassemble and display documents across networks (the web). No one can “own” these standards. No one can lock out anyone who wants to add a computer, or add a web page. And anyone can *add* stuff to both the internet and the web - including security, like https (on which Amazon transactions thrive securely) or social networks like Facebook.

You need never ask permission to add things to open systems. The market decides if they work or not.

The openness of the systems is at their core. The thing is, building closed systems on top of the open internet is awesome business. Thus, companies build browsers that track all your behavior (sometimes they even offer you small cash prizes!). Facebook tracks you while you’re logged out. 

The only reason browsers are even contemplating “do not track” is that the competition from the open - Firefox - is so strong. In the absence of competition from the open there is just exploitation. That’s what Facebook’s all about.

I use Facebook. The question was posted on Facebook. It’s a great way to communicate with my family and old friends. But the lack of competition from the open allows Facebook to make choices in a lopsided market, ones that we don’t have to worry about when it comes to the internet. Because Facebook can delete your page at will. Facebook can shut down your app company at will (this applies to Apple as well).

There’s a czar in a closed system, whereas in an open system, the center is shared openness, what William Gibson called a consensual hallucination. 

So in open systems, it’s actually both more likely that someone will build tools to exploit your privacy (because it’s good business and has already happened) and that you’ll be able to resist those tools (because there is competition from the open and you’ll be allowed to install resistance software). In a closed world, only the former takes place.

I see open access to the scholarly literature as part of that continuum. Freeing knowledge from behind artificial walls makes knowledge an open system like the internet or the web. That’s going to mean a lot of the scientific and scholarly equivalent of cat pages and blink tags and all the bad stuff that went along with the early web. That’s going to mean a lot of pseudonyms and trolling like we have with today’s web. But it also means that good businesses can be built on top of the knowledge web, and that as those businesses inevitably look to monetize, we have the freedom to resist the less savory choices they might make.

Open systems are about freedom. Not just of cost, but in the constitutional sense.

Sign the Access2Research petition. Please. 

May 29, 2012
Making More Scientists

I’ve been working up to a rant on twitter lately about Open Access, as the Access2Research petition’s first week drew to a close (sign it).

The usual suspects are making the usual arguments - OA will dictate where scientists have to publish, OA will kill peer review, and most offensively to me, science is too complex for us unwashed liberal arts heathens to possibly understand, so no good will come of access.

But science is a place where we keep out the unwashed masses. We no longer credential computer scientists (well, universities churn them out, but your credentials on github matter a lot more to savvy programmers than a CS degree from a state university - you’d be better served majoring in something fun and checking in code to open source projects). 

Has innovation in computer science been a problem? [crickets chirping]

Basically every knowledge based discipline that runs on digital content has been transformed. Software. Journalism. Music. Video. And you can track the innovation patterns of each one based on the level of control that institutions maintain.

Note: not all innovation is useful - most of it is shit - so part of my argument is that radically increasing the rate of *all* innovation is the best mathematically certain way to increase the rate of *useful* innovation. It’s like art. Most art sucks. But if enough people make art, then even if the rate of awesome artists doesn’t improve, making more people overall be artists means more awesome art.

That’s what’s happened in software. More people make it. That means more shit software. We just don’t use it (ever browse the Android app store’s dregs? Sheesh). It’s happening in journalism, whose business model turned out to be based on classified advertising and got eaten by Craigslist, the ugliest website on earth. It’s happening in music, where Apple ate the music industry’s lunch, where artists can raise a million dollars on Kickstarter just as their old labels go bankrupt. 

But science isn’t like that. Science is a lot more like the cable industry. Comcast and a few behemoths control the last mile of the internet to most houses, and so we don’t even realize the world we live in is radically limited. Internet in the US is so bad compared to so much of the world and we don’t even see it. Toll access publishers of science are just like Comcast. They want to control the last mile. 

And scientists who buy into the argument that those of us in our houses lack the credentials to understand their science are perpetuating a knowledge lockup. They’re on the wrong side of history.

You see, it does not matter if 999 of the 1000 people who read an open access article, who might not otherwise have been allowed to read it without paying $50, fail to understand it, believe they have disproved the second law of thermodynamics, etc. It matters that the one person who does read and understand it is provided access.

Because then, in that moment, we’ve created a scientist - or at least the makings of one. And the only people that threatens are those counting on their credentials to keep them competitive, or profitable, or employed. Since I’m none of those three it’s pretty easy to support open access.

May 28, 2012 (4 notes)
Time For A Beer

The Access2Research We The People petition has been a far greater success in its first week than I at least dared hope, when we hatched this plan and took this picture:

We’ve crossed 17,000 signatories from late Sunday night to mid-day Friday of the first week, measuring in US Pacific time. Signing is tailing off significantly, as we expected - we’re heading into the weekend, which is a long holiday weekend here in the United States for Memorial Day. I would have been thrilled with 10,000 this week, honestly. So we’re set up well for the next big push to the summit. Everyone who signed, who recruited, who tweeted, who blogged, who shared - thank you. I am going to celebrate with a nice beer tonight and take a break from constant monitoring of the stream for a day or two. There’s a toddler I’d like to spend some time with, and a spouse coming home from a long business trip who I can’t wait to see.

Before that though…some color commentary on the campaign that came out of nowhere.

We have received a plausible batch of criticism, from not going far enough in the petition (asking for liberal copyright licensing on articles, or specifying a maximum embargo) to not having enough detail about the petition on the website. These are good points. We looked at them, and chose not to go after them. Here’s my view on why we chose that - the others may have different views of course.

The petition is simple for two reasons. First, you only get 800 characters to work with. That’s not something conducive to nuance. Second, it’s simple because we want a positive response from the Administration, and by staying simple we allow a little bit of flexibility to them as they respond. Sometimes detail doesn’t help; we believe this to be one of those cases. That belief may or may not be true, or best, but it’s what we went with, and we did a lot of behind-the-scenes canvassing and draft review of the petition before we posted it.

The website is simple for similar reasons. We’re not creating an effort to educate the public about open access, or public access, or taxpayer access. We’re trying to influence executive policy by getting a certain number of people to sign a short petition. Those people often have to suffer a miserable user experience on the petition website (horror stories of failed registration and browser crashes are commonplace enough to make me think we’d easily have passed 25K if the White House knew about OAuth). They have to fill out email addresses, solve captchas, wait for an email confirmation, and then sign. 

Again, our belief was that simplicity makes that action easier than detail. There is an enormous amount of information on the web about OA. We could copy and repost it to teach signers more, or we could be polemic. Polemic was the choice.

None of this matters much in the end. We’ll get our 25,000 by June 19 even if we have to drag the twitterverse screaming across the finish line. Hopefully long before then. 

What matters now is what the Administration does in response. The total number of people who care about this issue has radically expanded in the past week. Wikimedia’s endorsement means we’re only starting to see the impact of that expansion. 

If the White House wants us to take We the People seriously, this is a great chance to make us believe. This is a proposition we know is under consideration, that is in the power of the executive office to achieve, and that has demonstrated broad public support. 

As an #OAMonday wit said early on, Mister President, tear down this paywall. 

May 25, 2012
Access2Research

In my spare time when not working on Portable Legal Consent, I continue to work on Open Access issues.

About ten days ago, I got home from vacation to news that I could go to Washington DC and meet with John Holdren, the Science Advisor to President Obama. He’s a nice guy, and it was a great meeting. He seemed to understand the issues and our points. Like any good political appointee he was noncommittal, of course.

But it was a brutal trip. Redeye on a Tuesday night from the West Coast after a full day of meetings, change into a suit (and shave) in a public bathroom because Dulles Airport has no showers, meeting, back onto the plane home. 

And it hit me - us, because I was with Mike Carroll, Mike Rossner, and Heather Joseph - that the redeyes and the meetings and the arguing were not carrying the day. We needed to do something else. 

So we started the Access2Research campaign to engage the public in open access. Please go, read the context, and if you agree, sign the petition. The only thing missing from the open access debate is the public. You can remedy that - but you’ve only got 30 days to sign, and we need 25,000 signatures to carry the day.

May 21, 2012 (7 notes)
Oh Please No

I come back from vacation and nearly have an aneurysm.

Seriously, someone thinks we should extend the HIPAA model to consumer privacy.

Because it works so well. Encourages so much innovation. Protects consumers and customers so thoroughly. Let’s use one of the worst-working data systems we have as a model!

It is depressing how often we in the “open” world have to say this, again and again. More control via rights is not the answer to every problem. Not in copyright. Not in patents. And not in privacy for data. 

May 8, 2012

April 2012

2 posts

Taxonomy of Privacy

Just a quick hit to link to Daniel Solove’s epic paper “A Taxonomy of Privacy” which has rapidly become a major influence on my work at Consent to Research. It’s a free download. 

But the first sentences of the abstract are a great teaser:

Privacy is a concept in disarray. Nobody can articulate what it means. As one commentator has observed, privacy suffers from an embarrassment of meanings. Privacy is far too vague a concept to guide adjudication and lawmaking, as abstract incantations of the importance of privacy do not fare well when pitted against more concretely-stated countervailing interests. 

Agreed. I love the taxonomy’s distinction between information collection, processing, and dissemination. I do wish there was a discussion of harm in here because economic harm is a real consequence of information dissemination in health.

Apr 20, 2012 (1 note)
Atlantic Health Forum - Comments

I’m speaking today on a panel at the Atlantic Live healthcare forum. Below are the themes I’m going to hit in my talk - but it felt like it was worth posting them here as well.

We have engines to generate correlations out of massive data that are unreasonably effective. We’re not using them in health, because the machines don’t work very well on health data. It isn’t just that the data be massive, but that it be available (i.e. open) and at least somewhat standardized. Health data’s almost never either one.

Here’s a key sentence from the Google paper linked above:

“With a corpus of thousands of photos, the results were poor. But once they accumulated millions of photos, the same algorithm performed quite well.”

Have we ever had millions of data records about health for algorithm training? Nope.

That’s a tragedy. It’s not that these engines - machine learning - are the magic answer. But they’re an incredibly powerful toolkit for finding correlations, which helps us decide what experiments to run to test for causation. It’s like we’re carpenters and we’re sitting around building a house and saying, nah, we don’t want to use power tools. Hand saws for everyone! 

The Amish approach won’t scale. We have no idea how many people it’s going to take, or how much data about each person it’s going to take, to discover exactly how valuable machine learning is going to be in the health space. But that’s a terrible excuse to do nothing. In fact, it should be the spur that gets us started on the experiment itself. 

That’s what I am doing with the Portable Legal Consent project. It’s an experiment to test three things. 1. can we use a standardized approach to informed consent that disintermediates the traditional study systems, which are crusty and have insanely high transaction costs? 2. will people enroll and upload data, and of what type and quality? 3. what kind of results will emerge from the computational research?

Until we know these three things we’re basically arguing about philosophy and not reality. We know that these systems are tremendously powerful and predictive in many areas. We know that we aren’t using them in health in a meaningful way. And we know that there’s nothing stopping us from trying except ourselves.

Apr 19, 2012 (1 note)

March 2012

4 posts

The Magic Number

I have been obsessing lately with how many people we’ll need to recruit into the PLC-CGR study to make a real difference. On a good day I think it’s 100,000. On other days I think it’s well into the millions. 

But it’s gotten me thinking about magic numbers. Every “open” community has a magic number - the number of active contributors that make a project into something that lives and breathes. As Nathan Yergler has pointed out beautifully, “open source” is not a verb. It’s about communities of people who do (often boring) work, willingly, for a multitude of reasons, to create open stuff. And there’s not a lot of evidence that people actually want to control their own health information in a way that gets PLC-CGR to its magic number. Indeed, some of the evidence points the other way.

Wikipedia’s magic number seems to be 300,000 - that’s how many editors make edits every month. Of those editors, 50,000 make more than five edits per month, and 5,000 make more than one hundred edits per month. That puts the magic ratios at 60:1 for serious editors and 6:1 for occasional editors. And that is all out of about 16.5 million Wikipedia users who have bothered to sign up for an account, which is a lot less than the actual total users who’ve ever looked at Wikipedia.

One of the reasons I’m so skeptical of “open source science” as an approach is the magic number problem. The total number of people who know enough stuff about any given topic to edit Wikipedia is so vast that even a tiny fraction gets you to the magic number. But when we’re talking about medicinal chemistry, those same percentages probably don’t even get you a single postdoc. 
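The arithmetic behind those ratios, and behind the small-field worry, can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the Wikipedia figures are the ones quoted above, while the field size of 2,000 is a made-up placeholder (not a real headcount of medicinal chemists), and using registered accounts as the denominator is my assumption about what "those same percentages" means.

```python
# Figures quoted in the post (Wikipedia, circa March 2012).
MONTHLY_EDITORS = 300_000        # editors who make edits every month
OCCASIONAL_EDITORS = 50_000      # more than five edits per month
SERIOUS_EDITORS = 5_000          # more than one hundred edits per month
REGISTERED_ACCOUNTS = 16_500_000

# The "magic ratios": monthly editors per occasional / serious editor.
occasional_ratio = MONTHLY_EDITORS / OCCASIONAL_EDITORS
serious_ratio = MONTHLY_EDITORS / SERIOUS_EDITORS

# Share of registered accounts that end up as serious editors.
serious_share = SERIOUS_EDITORS / REGISTERED_ACCOUNTS


def expected_serious_contributors(pool_size: int) -> float:
    """Apply Wikipedia's serious-editor share to a pool of a given size."""
    return pool_size * serious_share


# ASSUMPTION: placeholder pool size for a narrow specialty, chosen only
# to illustrate the scale problem. It is not a measured number.
HYPOTHETICAL_FIELD_SIZE = 2_000

if __name__ == "__main__":
    print(f"occasional editors: {occasional_ratio:.0f}:1")
    print(f"serious editors:    {serious_ratio:.0f}:1")
    print(f"serious share of accounts: {serious_share:.4%}")
    expected = expected_serious_contributors(HYPOTHETICAL_FIELD_SIZE)
    print(f"expected serious contributors in a field of "
          f"{HYPOTHETICAL_FIELD_SIZE}: {expected:.2f}")
```

With any pool in the low thousands, the expected number of serious contributors comes out below one, which is the magic number problem in miniature.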

We often (I did, for a long time!) assume implicitly that scientists are somehow more open than society at large to the ideas of knowledge sharing, and thus the math will be better. I’m not sure. 

But the PLC-CGR study routes around this by recruiting normal people, not scientists or doctors. And it doesn’t require Wikipedia-level editing commitment. But in an ideal world, we’ll have a hard core of people who don’t just create an account and never come back. Who don’t just upload their genotype from 23andme once and never come back. Who don’t just upload a crappy EMR in Blue Button and never come back.

We need a group that decides their free time is well spent donating their data to science, who are willing to RTFM enough to get not just today’s data, but tomorrow’s data, into the system. To serve as community moderators and admins to help the newbies along. 

That’s hard enough. But on top of that, we need that group to be large enough that the science that emerges is replicable, statistically sound, and not horribly overfitted to the sample we provide.

I’m more optimistic than the Health Data Management article. I think one of the reasons for the small numbers of genotypes bought, or the small numbers of people controlling their own health data, is that there isn’t a route for that data to actually turn into results. I downloaded my 23andme file and it’s useless to me. But if we know that we can funnel our data into a system that gets it to researchers, then we increase the motivation - at least by a little - to take some control ourselves. 

If we knew the magic number, this would all be easier. But then again, if we knew the magic number, it’d probably already be done.

Mar 30, 2012 (1 note)
US Government and CC0

So this happened yesterday, and it’s a good thing. But note Mike Eisen’s question below Ed Summers’ tweet.

The question of federal entities and CC0 is a thorny one, depressingly. 

US Government works are by default in the public domain from a copyright perspective, but that status only holds in the United States. This is kind of insane in an internet world, obviously. Is the work outside the United States when a citizen of Japan hosts a copy of the work on an Amazon server (US company) which is co-located in a third country for technical reasons? 

The USG has made it very clear to me and others on various occasions that not only do they hold copyright to US Government works outside the United States, they expect to oftentimes exercise those rights to make money through licensing. The canonical position on this is at the USA.gov page on copyright.

Using CC0 effectively waives the rights internationally, harmonizing the position of the copyright status of the work inside and outside the country. That interoperability functionality was a key design factor for CC0. It’s why the waiver devolves to a license in jurisdictions where the public domain doesn’t exist the same way - it is functionally interoperable with the public domain as we know it in the US even if that public domain isn’t the legal norm in another country.

I tried several times to get the USG to formally endorse CC0 while I worked at Creative Commons, most notably in testimony to the National Science Board’s working group on data. But we never got there. By the end I simply was asking to have the USG state that using CC0 was allowable if the relevant data owner inside the government was willing to do so, and we never got that explicitly either. It was one of the hardest parts of the Polar Information Commons negotiations.

So, for me, the fact that the Smithsonian has gone to CC0 is actually a great step. It means that data owners inside the USG have the latitude to use tools that put USG works into a legal status outside the US that is interoperable with their public domain status inside the US, and that’s an unalloyed Good Thing in my view. 

Mar 23, 2012 (1 note)
Designing For Emergence

I spend a lot of time ranting on about openness, and a lot of that time I am preaching to a choir of people who believe in openness at a deep enough level that they don’t think about the why of it. It’s just accepted that open = good in some epistemic sense.

And that’s nice in a lot of ways. It’s nice to be able to just talk about good stuff and not make basic arguments. But I arrived at openness from a different place than a lot of my compatriots.

I didn’t arrive at it from a political freedom perspective like the founders of the free software movement did, although I now embrace the political. I didn’t arrive at it from a horror at copyright expansionism like the founders of Creative Commons did, although I now share that horror. Nor did I arrive at openness after experiencing the misery of scholarly publishing models like the founders of the Public Library of Science did, although I now know that same misery.

I arrived at it after getting my nose bashed in by biology. I started a company about thirteen years ago, based on some work I did in university epistemology and semantics, whose entire premise was that we knew enough about biology that if we simply converted that knowledge to formal ontologies and then used common identifiers to connect those ontologies into networks, we’d discover some basic laws of biology that were implicit in the knowledge but invisible to us because of format problems. 

Was I ever wrong. But I was wrong in a way that is a leitmotif for people entering biology from other fields, whether technologists or philosophers, whether naive twentysomethings like me or successful Silicon Valley executives: I bought into reductionism.

Reductionism is the great legacy of twentieth-century physics, but while it worked spectacularly well for particle physics it doesn’t quite work for drug design.

- The Curious Waveform (via In the Pipeline)

I would go even further, and say that reductionism is the great legacy of the twentieth-century effort to industrialize science. Physics sat at the core of the scientific enterprise that ended the second world war, that guided Vannevar Bush in the design of a modern national investment in science. 

But Wikipedia’s entry on reductionism has a clue as to why it’s a doomed way to think about biology: 

“Reductionism does not preclude the existence of what might be called emergent phenomena, but it does imply the ability to understand those phenomena completely in terms of the processes from which they are composed. This reductionist understanding is very different from that usually implied by the term ‘emergence’, which typically intends that what emerges is more than the sum of the processes from which it emerges.” 

So far at least biology appears to be a pretty nice example of “more than the sum of the processes” - or at least we don’t have the ability to understand the phenomena of life completely in terms of the processes from which it’s composed. Otherwise we’d be able to find drugs at a cost lower than [insert whichever billion dollar number your preferred analyst assigns to new drugs] and we wouldn’t have to price them at points that require political action.

It is precisely this face to face contact with emergence that led me to openness. Open systems are inherently capable of emergence on their own - thus the internet begets email and the web, the web begets e-commerce, the digital commons, and on and on. That’s a design principle that we’d do well to remember as we design scientific systems, whether technical or legal or policy - we need a cultural context for science that is just as emergent as the science itself.

And design is the right way to think about this. It’s not just a matter of law and policy and tech. It’s why I have signed on with Lybba as part of my consent work, because I’m convinced that design is fundamental to thinking about emergence, about unintended consequences, about openness. 

Simple, but not simplistic. Weak, but not weakness. Open, but not free of cost. These are design principles for emergence, for generativity, for whatever is the opposite of reductionism. Because we live in an emergent world, and are emergent beings of biological nature. Openness just makes more sense in that context.

Mar 6, 2012 (2 notes)
The Only Meaningful Reaction To Anti-FRPAA Letter

Seriously, I wish I could get cranked about yet another piece of “please protect my content industry from teh internetz” lobbying crap, but the only reaction I can really summon at this point is LOLCats and animated gifs of Bill Cosby. 

Mar 5, 2012 (1 note)

February 2012

2 posts

Apply for Panton Fellowship - Deadline Feb 24

The Panton Fellowships applications will close in 9 days, on 24 February. You should probably apply if you’re reading this blog and you’re a grad student or a budding scientist.

Adapted from the OKFN blog: The fellowships are for scientists who actively promote open data in science - awardees receive £8,000 over one year, plus a small discretionary budget for travel and related expenses. They are designed to be flexible, and there is scope for Fellows to carry out a wide variety of activities. Applicants are encouraged to propose their own work plan. Panton Fellows may wish to initiate discussion about the role and value of openness, explore practical solutions for making data open, and push for change in scientific practices. Panton Fellowships are open to all applicants, and are particularly suited to graduate students and early-career scientists.

I was an original author of the Panton Principles around which the Fellowships are centered, so I have a bias towards getting awesome people into the application pool. But the reality is that a Fellowship like this has rewards that go well beyond the monetary ones, which are honestly and obviously limited. Panton Fellows will have wide recognition in the community, and a chance to work with some of the leading minds in open science and open data. Give it a shot.

Feb 15, 2012 · 1 note
Bartleby The Scientist?

I’ve been watching the growth of The Cost Of Knowledge with fascination since it launched last week. If you’re following the kerfuffle around the Research Works Act, and the uncanny similarities between Elsevier press releases and the phrasing of Congressional responses to input on the Act, let me explain a little bit.

Scientists are the labor on which the scientific publishing industry is built. They do the science, they do the writing of papers, and they decide where to submit their papers for publication. Then the publisher turns right back around and asks other scientists to do more work: read the submissions, review them for accuracy, review them for how important the science is, and decide if the paper is worth publishing or not. Then the publishers format the paper and sell it right back to the scientist via punishingly high subscription costs. 

With this labor system, traditional subscription-based publishers can (unsurprisingly) clear profit margins that would make Bill Gates jealous, upwards of 30%. They’ve increased prices, fought open access policies, and paid for false-front lobbying groups to maintain the status quo.

But there was a fundamental fault line in scientific publishing, one those of us in the open science world have always watched, waiting for the first earthquake to strike: the willingness of scientists to be the volunteer labor in the equation of publication. 

Seeing 2600 (and increasing as of 1 February 2012) scientists state they won’t review, and won’t publish, at Elsevier journals in response to the RWA, is that earthquake. It’s a gorgeous example of nonviolent resistance. 

But it’s not enough. Scientists who won’t publish or review with Elsevier need to make a second commitment: to take the same amount of labor they used to give to the Dutch giant and give it to a true Open Access publisher. We cannot make the change we want through telling Elsevier “I Prefer Not To” - we must make a separate commitment to devote time and sweat to open journals.

Remember the parable of Bartleby. His passive nature did create change, but eventually he wound up too passive to eat, and he died. There are limits to the power of saying you won’t do the wrong thing - there are few limits when you commit to doing the right thing as well.

And, because this was a way too thinky post, here’s a nice pop culture reference to Bartleby.
 

Feb 1, 2012 · 2 notes

January 2012

3 posts

Response to RFI on Digital Data

My response to the White House RFI on Digital Data.

»»»»»»»»»»»»»

While the advent of data sharing plan submission requirements at the NIH and the NSF is a welcome development, encouraging the reuse of scientific data needs far more policy intervention.

First, standards should be developed that can be used to grade data sharing plans, so that grant review panels know whether a specific data sharing plan is satisfactory and, for any given call for submissions, how important data sharing is relative to the scientific goals of the project. Second, data sharing plans should be made public alongside the notices of awards and contact information for the principal investigators, so that both taxpayers and scientists know what promises were made and how to contact a scientist and ask for data under the plan approved.

Third, tracking should be put in place to begin estimating compliance: annual grant review forms should contain fields where the researcher is obliged to place URLs to data shared under the plan (or if left blank, explain why), for example. It should also be easy to create a data request system in which those asking for data send a copy of their request to the grants database, which can then be cross-referenced against the review forms to provide at least a rough estimate of compliance. And fourth, scientists with a record of subpar execution against data sharing plans should be downgraded in their applications for new funding. Taken together, these four elements create an incentive structure that would make scientists significantly more likely to provide public access to the digital data resulting from federally funded research.
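To make the tracking idea concrete, here is a minimal sketch of how review forms and data requests might be cross-referenced. Everything in it is hypothetical: the `AnnualReview` and `DataRequest` records, their field names, and especially the `fulfilled` flag (which assumes the requester reports back on whether the data arrived) are illustrations, not a description of any existing grants database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnualReview:
    """One annual grant review form (hypothetical structure)."""
    grant_id: str
    data_urls: List[str] = field(default_factory=list)
    explanation: str = ""  # expected when data_urls is left blank


@dataclass
class DataRequest:
    """Copy of a data request sent to the grants database (hypothetical)."""
    grant_id: str
    fulfilled: bool  # assumption: the requester reports whether data arrived


def estimate_compliance(reviews, requests):
    """Cross-reference review forms and requests into a rough per-grant report."""
    report = {}
    for review in reviews:
        has_urls = len(review.data_urls) > 0
        explained = (not has_urls) and bool(review.explanation.strip())
        report[review.grant_id] = {
            "shared_urls": len(review.data_urls),
            # A form counts as "reported" if it lists URLs or explains the blank
            "reported": has_urls or explained,
            "requests": 0,
            "fulfilled": 0,
        }
    for req in requests:
        entry = report.get(req.grant_id)
        if entry is None:
            continue  # request refers to a grant with no review form on file
        entry["requests"] += 1
        entry["fulfilled"] += int(req.fulfilled)
    for entry in report.values():
        # None means no requests were logged, so no rate can be estimated
        entry["fulfillment_rate"] = (
            entry["fulfilled"] / entry["requests"] if entry["requests"] else None
        )
    return report
```

Even a crude report like this would give a review panel something to look at: which grants listed shared data, which explained a blank, and what fraction of logged requests were answered.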

In tandem, the funding agencies might develop financial models for the preservation of these digital data in much the same way that models exist for estimating overhead and other baseline costs as a percentage of the grant. This could not only fund new library services and jobs in the research enterprise but also serve as a non-dilutive funding source for a new breed of data science startup companies focused on preservation, governance, querying, integration, and access to digital data.

However, we should be careful not to treat data as property by default. Intellectual property is a useful frame through which to view creative works and inventions in science, as well as to protect valuable “marks” and secrets. But in the United States at least, data is typically in the public domain already, and therefore the extension of intellectual property rights to it would represent a vast expansion of rights in a space where there is zero empirical evidence that it is needed.

Typically data is treated more as a secret, which is at odds with the public nature of the idea of data access, and the obstacles to data sharing are less legal than they are professional and economic. The ugly reality is that sharing data represents a net economic loss in the eyes of many researchers: it takes time and effort to make the data useful to third parties (through annotation and metadata) and that is time that could be spent exploiting the data to make new discoveries. On top of this, there is a twin incentive problem. Scientists see no benefit to sharing data and are not punished if they fail to share data, while there is a pervasive fear that other scientists will “scoop” them if their data are available before being fully explored. This creates a collective action problem that can be overcome most easily by clear funder policy as enumerated above: data sharing plan mandates with transparency, accountability, tracking, and impact on future funding.

One policy action that would be very welcome would be an unambiguous signal that publicly funded science data is in the public domain worldwide, not just in the United States. This could be accomplished either through the use of a copyright waiver, such as the Creative Commons Zero tool, or through other means. But it is vital to make it unambiguous and clear when and where data are free to reuse, because applying conditions imported from creative works and inventions to a class of information that is fundamentally far less like “property” can have serious unintended consequences. Easily imaginable consequences include vast cascades of attribution requirements, so that a query to 40,000 data sets requires 40,000 attributions – every time – or worse, the poisoning of data for use in job creation by small companies who wish to build atop data as a platform or infrastructure.  

The intellectual property status of data does differ across the scholarly disciplines, and with how far the data has been processed. Some sciences rely on inherently copyrightable “containers” for data, from field books to recordings to photographs. And raw data converted to beautiful information by visualizations will touch on copyright. Policy should be flexible enough to account for this, but start with a default bias that public domain data is the most reusable, while providing “opt-out” capacity for data and disciplines where the public domain is simply not the best solution.

There is an obvious problem with this set of policy recommendations. They rely on money to work. We do not yet know the true costs of storing digital data over the same time frames that we store the scholarly literature. As our capacity to generate data explodes, we must invest at the same time in our capacity to steward it. Research projects into large data information science should be a priority, with specific attention paid to when and where it is possible to compress data, move data to secure “cold storage”, jettison data (either because it is duplicative, or because it can be regenerated later), and more. We do not have the sociotechnical infrastructure required to answer questions of data stewardship with any authority, and we must create it on the fly at the same moment that the data creation burden is hitting exponential heights.

Solving these stewardship problems might be best achieved through a coalition of research institutions, the library community, publishers, and funders. Taken together these groups already heavily regulate the daily life of a federally funded scientist. It is a small extension to imagine leveraging that regulatory power to provide new services to the scientist – a university and its library might keep an archive of standard data sharing plans and standard budget items to implement them, which together would take the guesswork out of filing and operating a data sharing plan. Even better would be a federal program to certify a small number of such plans for each discipline.

Missing from the set of stakeholders mentioned in the RFI is, notably, the business community, both the large scientific companies and the vast potential of startup firms. In an ideal world, the stewardship conversation will bring in actors from those industries, from pharma to venture capital, as we are missing an entire professional class of data stewards and data engineers (not just data scientists) who could serve the needs of the research enterprise while creating stable jobs. Even better, because the data stewards must be close to the researchers to serve them, these jobs are less likely to move offshore. An investment in small business grants, job training (and retraining) vouchers, and the creation of community college pedagogy for data stewardship functions could go a long way towards stimulating the emergence of this professional class.

In order to stimulate the interaction among these stakeholders and the emergence of a new class of data stewardship jobs, agencies could take additional steps to stimulate use of data. Contests are one obvious route, where a prize is posted in return for solving a problem (or simply for coming up with innovative ideas and/or applications that run on government data). Another route is the expansion of SBIR grants to create a track focused specifically on data startups, which lower the risk of company formation and job creation as well as creating non-dilutive funding sources for entrepreneurs.

A route that is vital, but less obvious, is investment in and commitment to the emergence of standards that enable interoperability of, and thus reuse of, digital data. Standards lie at the heart of the Internet and the World Wide Web, and together lower the cost of failure to such a low point that companies built on the web and the internet can begin in garages. Such is not the case in the sciences. And it will not spontaneously emerge, even if data flow onto the web. As long as those data sit in a Tower of Babel of formats, carry incoherent names, and might move about every day, they will be a slippery surface on which to build value and create jobs. Federal policy could call for a standard method for providing names and descriptions both for digital data and for the entities represented in digital data, like the proposed standard of the Shared Names project at http://sharedname.org.

Standards also make it far easier to provide credit back to scientists who make data available, as well as increasing the odds that a user gets enough value from data to decide to give credit back. Embracing a standard identifier system for data posters will make it easier to link back unambiguously to a researcher, and easier for grant review committees and universities to receive a full picture of a scientist’s impact, not just their publication list.

Standards for Interoperability, Re-Use and Re-Purposing

About me:

I am a Senior Fellow at the Kauffman Foundation, the Group D Commons Leader at Sage Bionetworks, and a Research Fellow at Lybba. I’ve worked at Harvard Law School, MIT’s Computer Science and Artificial Intelligence Laboratory, the World Wide Web Consortium, the US House of Representatives, and Creative Commons. I also started a bioinformatics company called Incellico, which is now part of Selventa. I sit on the Board of Directors for Sage Bionetworks, iCommons, and 1DegreeBio, as well as the Advisory Board for Boundless Learning and Genomera. I have been creating and funding jobs since 1999.

Jan 11, 2012 · 5 notes