May 09

I’ve been quiet on the Portable Legal Consent front for a bit, but there’s enough going on that it feels like time to make a public update on where we’ve been, where we are, and where we’re going.

First, the study we’ve been running at Sage Bionetworks - the Self-Contributed Cohort - has yielded a lot of data. I’ve got a chapter in the book Privacy, Big Data, and the Public Good, coming out later this year, that goes into some detail, and we’ll also be preparing a paper for peer review. The short version is:

  • Deploying interaction design is incredibly effective at increasing informedness in an e-consent context. We can get levels of informedness, even on complex concepts, that can outstrip traditional methods of consent. That’s the good news.
  • Most people do not yet have enough data about themselves, or good enough data, to feed computational modeling processes that generate insights. Despite all the advances of patient-centered research, direct-to-consumer genomics, electronic medical records, and more, the data simply aren’t yet rich and clean enough to make the big data analytics tools really sing in a citizen-contributed cohort. That’s the bad news. 
  • If we want the kind of prediction engines that Amazon, Facebook, and Google use, the kind we use to figure out why drugs work in arthritis, generate insights for personalized cancer therapy, predict Alzheimer’s progression, and more…we need the kind of rich, clean data that comes out of structured studies.
  • At the moment, clinical studies are the best structured setting for getting rich, clean data about individuals to feed computational reuse.

Given these results, we have turned the revision of PLC towards the clinical study itself. We have been working in partnership with the Electronic Data Methods (EDM) Forum as a Collaborative Methods Project. The EDM Forum is funded by the Agency for Healthcare Research and Quality and is a multi-year project to build capacity for networks of actors working in health care.

We conducted a series of interviews under the EDM Forum’s guidance earlier this year as part of an in-depth interaction design process. We talked to giant clinical systems, hospital surgical centers, patient-powered networks, clinical data networks, patient advocates, and more. And to get to the million patients I’ve always wanted under PLC, and to make their data actionable for computation, we need to infuse PLC into clinical studies. This creates a requirement for decentralized approaches, which fits nicely with most of my underlying beliefs about networks (they are either dominated by a dictator at the center, or they’re built at the edges by lots of people using standardized tools - take a guess at my preference!).

Clinical studies happen locally, and their protocols and consent forms are approved locally. Think of it as thousands of little pieces of software - legal code - that don’t interoperate. Thus, PLC 2.0 is not “one consent form to rule them all.” It is instead designed to make it easy for those writing the thousands of little pieces of legal code to add on functionality: getting the data from the study into the cloud for secondary reuse, and empowering patients to receive and transmit a copy of the data about them to other parties. It’s akin to a software development kit, but for legal code.

The beta of PLC 2.0 (which we’re calling the PLC toolkit) will be released in the fall of 2014. The toolkit won’t contain any technology in this release. Instead, it’s a set of text documents containing consent language for various classes of data, wireframes for those classes of data, implementation guidelines for investigators running studies, and educational materials for institutional review boards. Supported data modules include mobile devices, wearable sensors, electronic medical records, lab data, and genetic sequences. All the files will be available under the CC-BY 4.0 license, so they’ll be infinitely reusable by all parties for all uses as long as we get attribution.

I expect a few mobile applications using the toolkit to come out this fall as part of Sage’s BRIDGE project and partnerships. We are also conducting private alpha testing now and welcome anyone who wants to get involved and review - email me for details.

In the interim, here’s a teaser that I showed publicly at the last EDM Forum meeting. It’s the dashboard that sits inside mobile applications and manages a participant’s ongoing consent inside a clinical study. This is the kind of wireframe we’re designing.


Apr 11

I started devouring The Remedy by Thomas Goetz when it came out last week and finished it on a series of long flights this week.

It’s a lucid, accessible popular science book. It’s primarily about two men - Robert Koch and Arthur Conan Doyle - engaged in a debate over whether a tuberculosis cure was indeed a cure. If you’re even a little interested in the late 1800s, popular science, the origins of Sherlock Holmes, or the emergence of medicine as a science, you’ll probably enjoy it.

For my part, though, I was most struck by the first few chapters and their remarkably clear account of how certain moments in scientific time can be moments of massive and rapid change. Goetz traces how Koch, a country doctor in Germany with big dreams, integrated theoretical, methodological, and technological breakthroughs to become one of the most famous scientists in the world.

This integration of breakthroughs fascinated me. The theoretical breakthrough was germ theory, and Koch didn’t invent it. It had spent decades burbling from the fringes of science towards the mainstream but was still well on the outside. It threatened the theories of famous scientists in an eminence-based system. And it hadn’t been demonstrably proven. But it was a powerful enough theory that, despite the shutout, it continued to develop quietly on the edges.

The methodological breakthrough was Koch’s. He figured out how to use pure cultures (the basis of his four postulates) to determine the cause of infectious disease. In so doing, he helped cement germ theory as a cornerstone of modern science and medicine.

The technological breakthroughs laid out in the book may be my favorite part. I knew about the emergence of germ theory, I knew vaguely of the four postulates, but I had no idea of the kind of rapid, on-the-fly invention that the emergence of the culture-based methodology spurred. It is straight out of Eric von Hippel.

The example that sticks with me from the book is such a simple one on the surface: the petri dish full of agar. But it arrived only after Koch began culturing anthrax in the aqueous humor of a cow’s eye between glass, then moved to gelatin on plates, then to agar on the advice of a jam-savvy scientist’s wife, and finally to round plates with upraised edges. This kind of evolution of technology to support methodology to support theory is packed onto almost every page of the first few chapters and just blew me away.

I’m fairly convinced that we’re in the middle of this pattern right now. What struck me while reading The Remedy was that we can identify the methods and the technologies - sequencing, causal statistical analysis, self-tracking, all the stuff on the bingo card of a “Big Data” conference attendee.

But I wonder what the theory is. Goetz clearly makes the point that scientific progress is only obvious in retrospect. In the moment, it’s messy, competitive, sometimes downright personally nasty (the Pasteur-Koch animosity is epic!). The methods can seem so much more obvious in the moment than the theory. 

I do what I do in the belief, naive though it may be, that the breakthrough methods and technologies of the last 20 years are on the edge of allowing us to prove or disprove new theories about the causation of chronic disease, as the breakthroughs of Koch’s time did for infectious disease. And it seems obvious in retrospect that germs would emerge as the causative theory. But at the time, it wasn’t. Just as it wasn’t obvious that ulcers were caused by infection.

So what are the theories of chronic disease that are going to be embarrassingly obvious? What are even the candidates? I wonder. 

Apr 08

(Disclosure: This post is about Jane McGonigal. I’ve met Jane in person twice, and we follow each other on Twitter. We have spent about 10 minutes total in each other’s company - we are friendly, though we don’t know each other well.)

Jane McGonigal, a well-known gamer and advocate for the good that games can bring to people’s health, put up a webpage recently. It’s titled “Play, don’t replay!” and it’s intended to broadcast the existence of a study that established a small but statistically significant connection between playing games like Tetris and easing post-traumatic stress disorder.

It’s a neat theory. I spent some time in treatment for traumatic stress disorder and looked into eye-movement desensitization and reprocessing (EMDR) as a therapeutic intervention, and there is some real evidence that EMDR works. It makes intuitive sense to me that games, especially ones that inspire a visual twitch like Tetris, could trigger some of the same effects.

Jane came under some withering criticism for putting up the page. Much of it is gaslighting, and I won’t link to it. The criticism that interests me comes from Brendan Keogh, who lists himself as a PhD candidate in Game Studies at RMIT University in Australia, and who called the page “shockingly unethical and irresponsible.”

Here’s the thing. What’s ethical or responsible depends on where you live, where you work, and what your goals are. What’s ethical is changing on us, in real time, thanks to social media. And charging that someone is shockingly unethical and irresponsible, as Brendan did, is serious stuff. It’s about the worst thing you can say about someone in academia (perhaps only plagiarism is worse).

And it’s not clear to me that the page constitutes research under U.S. law. I can’t see anything on the page saying that its point is “a systematic investigation … designed to develop or contribute to generalizable knowledge” - which is how our laws define research. It’s not systematic. It doesn’t promise to publish results. So the law seems ambiguous to me here.

And this definition really matters, because the whole point of the criticism seems to be about research ethics (as opposed to, say, Aristotle’s Ethics). So whether or not this is research is directly relevant to its ethics.

Beyond the governing law, institutions control for their own liability, which means that Game Studies researchers would probably have to get institutional review for something like this even if it’s not research under the law. But Jane doesn’t work at a research institution, so she isn’t subject to institutional review. If this had been a Huffington Post piece promoting the study and asking people to share their experiences in the comments, it wouldn’t be much different.

Now it’s entirely fair to ask whether Jane should have taken more time to think about whether she’s covered: whether this is human subjects research, whether she should get independent review, whether her internet stature imposes an obligation. I think that would have been smart, and I’ll come back to that later in this post. But that’s the thing. It’s arguable.

And arguable is a long way from "shockingly unethical." 

Reading the piece, it feels like a pre-existing allergic reaction to the “games evangelism industry” colored the response to the page in question. The first version of the piece even added an Upworthy twist to the page’s description of “one simple technique” by converting it to “one simple trick” (this may be an example of priming).

I have run into this allergic reaction for years in the “harder” sciences (biology especially). There is a real distaste for connecting directly to people via social media, a distaste that I believe has at least some origins in ethics training. I’d imagine Brendan has had ethics drummed into him by his university (likely the Australian version of research ethics, which does seem to define research more broadly than US law does).

Research ethics require us to obtain informed consent, assess risks and benefits, and select subjects fairly - none of which are explicit in Play, don’t replay. And as someone who works nearly full time on informed consent, that does nag at me. I’d like to see more of those elements drawn in, more of a sense of responsibility incorporated.

But I can’t get past the idea that this isn’t clearly research. It’s talking to people. And the internet has changed the way we talk to people. If you’re Jane, talking to people over Twitter reaches more people than a clinical trial does. When Amanda Palmer has a Twitter chat about sexual violence, it reaches several orders of magnitude more people than a sexual violence research study.

That reach itself doesn’t make it a study. 

I also can’t get past the idea that this isn’t clearly not-research either. There’s enough dancing near the creation of knowledge that, with the right eyes, one could say this is a page that should have been reviewed by an ethics committee. 

I would love to have seen both parties do something different here.

I think Brendan’s accusation of shockingly unethical and irresponsible behavior ignores local context about what is research and where research ethics kick in. If you’re going to criticize someone’s ethics, you must first attempt to understand their context. I see no evidence of that in the criticism, and that bothers me.

I also think Jane’s page brushes close enough to research that she should have run it past someone (not me, someone who does social science and social media) to get an ethical review. I do not think it’s unethical, though. 

The real reason I think she should run it through ethical review is her reach. I think that reach imposes an obligation, an obligation that has never existed the way it now exists.

There is a real possibility for abuse in this space by those who have social reach. Indeed, I think this possibility is part of the criticism leveled by Brendan, as he repeatedly notes that he believes in her good intentions. The shockingness here is not attributed to intention, which is an interesting point of intersection. Jane could be a leader in how to use social reach ethically. I would love to see her do it - there aren’t many candidates who could do it better than she could.

But in the general context…the line between “just talking to people” and “doing research” is dissolving.

We never even had to deal with that line. It was there because only credentialed researchers could reach scale in talking to people. They could raise money, and they had structures to recruit. Now Jane’s got the structures to recruit, and contacting people costs nothing. Now talking-to-people can brush right up against the edge of doing-research, with all the attendant ethical questions swept up into the engine, with none of the systems functioning and none of the people talking to each other about the real problem.

We need to have a serious conversation about what the dissolution of that line between research and conversation means. Research has much to teach conversation. But - and this is essential - conversations at scale have much to teach research. I would submit that conversations at scale are simultaneously the most powerful form of research that we have yet invented and a form of research that is totally outside our ethics, because it is so new. 

This needs to be a two-way street if traditional, university-oriented research wants to survive. Because conversations at scale are going to eat it alive if the academy tries to pick the wrong fight. 

Mar 28

The Synapse software we run at Sage Bionetworks is open source software.

That statement carries a certain set of expectations: that we give you, the user, some serious powers. You can download the code from GitHub under the Apache 2.0 License, and you can grab our bug tracker. We’ve invested in developer documentation.

That’s what “open source” means. You can get our code. You can change our code. You can redistribute those changes. You can sell our software and never tell us or send us a check. There is an Open Source Definition that spells this all out clearly, arrived at via community consensus, and long used to adjudicate claims of open source-ness. 

There’s a rub, though. The definition is based on the distribution terms of the software - the legal tools that wrap around the code and govern its use and reuse. And we comply with all those terms.

But Synapse has been built from the bottom up as a cloud-based service. Our users love that aspect. Our developers develop for that. And that creates a very interesting conundrum. We run a product that is clearly, legally, technically open source. But the definitions that govern our ability to make that claim about a piece of software don’t reach anywhere into the operations of that software - the way that we run it.

This has big long-term implications. It means it’s possible to market an organization as an open source software organization, yet architect a radically closed service built on that software. It’s not what we do at Sage, or where we’re going, but it’s something we have learned, almost by accident, is entirely doable. The Free Software Foundation has written about some elements of the relation between services and freedoms. But there’s no roadmap for an organization that wants to run an open service for free.

We need to watch out for Fauxpen Source here. Not everyone can stand up an Amazon instance and run a cloud service, even if they’ve got the license rights to do so. Not everyone has steady internet. Not everyone will pay attention to the details when they see an open source stamp. Not everyone will pay for open services.

And costs are a big part of the rub. In the old technology model, open source scales inherently at fairly low cost beyond the costs of development, which can be either paid or volunteer. Usage of the software doesn’t add much cost: if everyone is downloading and running code locally, you don’t need a big budget for users and their usage. In a service context, our costs scale with our users - maybe linearly, but given that we store genomes and genome sequencing gets cheaper every day, maybe exponentially.

Open as a service is where we’re going. But no one has a map yet. If anyone’s got ideas, please get in touch. 

Mar 20

I’m sitting in the Vancouver airport, about to wing home from TED 2014. It’s still going on, but I only had a day pass for yesterday, when we gathered DNA samples for the soft launch of the Resilience Project (a collaboration between Sage Bionetworks and Mount Sinai).


I don’t often take DNA swabs, but when I do, I kneel on the floor and furiously barcode TED attendees’ DNA swabs. Pictured: Elissa Levin, Stephen Friend, Lesa Mitchell, Diana Friend. Not pictured: Linda Avey, who was on the floor with me.

Thanks, TED (and especially Priscilla, the ever-gracious speaker concierge) for letting us launch our project!

As we were decompressing, Charlie Rose started interviewing Larry Page. It was a very wide-ranging interview, but the part that stuck out to me was Page’s desire to have a massive pool of medical records available for research purposes. 

This is obviously near and dear to my heart, and to many others. TechCrunch picked up on it and was gracious enough to quote me in their coverage.

But it was the phrasing that stuck like a splinter in my brain. At no point was this mass of records about the people referenced by those records. It was as if the data had no relation to the people other than as grist for a vast mill, something to be turned into insights that only then would help people. There was a desire for total disconnect between the medical records and the very real, very human beings whose cholesterol and hemorrhoids were described therein. 

This is often done in the name of protecting the privacy of those people. But as we’ve seen, anonymization of records doesn’t mean the records are anonymous in the hands of skilled attackers.

I think the real reason is that it’s just easier. It solves for the privacy laws, if not the privacy problems, and it lets those analyzing the data treat it as an economic and computational resource without moral dimensions. I don’t blame a company for taking that stance. It’s easier.

Easier isn’t always the answer. My own TED talk, from 2012, was about looking into the mirror and making informed consent the centerpiece of pooling medical data. 

We can do this. It’s what underpinned the first version of the Portable Legal Consent study. It’s underpinning the Resilience Project. It’s underpinning the Bridge platform at Sage. 

But it won’t happen - we won’t do it - if we fall into the trap of thinking about “50,000,000 anonymized records.”

One medical record is a person. 50 million is a statistic. It’s a lot easier to toss moral and ethical concerns away when the numbers get big. But no matter how many records we have, each one came from a person. And she deserves a voice, a consent, an engagement, with the research she empowers.