This document discusses the opportunities and challenges of open science and open data. It argues that openly sharing scientific data and findings has significant benefits, including enabling faster scientific progress, deterring fraud, and supporting citizen science. However, for data to be truly open and useful to others, it needs to be accessible, intelligible, assessable, and reusable. The document also examines the roles and responsibilities of different stakeholders in working towards more open and reproducible science. This includes changing incentives for scientists, strategic funding for technical solutions from funders, and exploring how institutions like libraries and learned societies can help address the challenges of managing and making sense of the growing volume of research data.
4. The challenge to Oldenburg’s principle:
- a crisis of replicability and credibility?
A fundamental principle: the data providing the evidence
for a published concept MUST be concurrently
published, together with the metadata
But what about the vast data volumes that are not used to
support publication as well as those that are?
5. The opportunity:
new knowledge from data
(some technical opportunities & priorities)
Exploiting the potential
of linked data requires:
• data integration
• dynamic data
Solutions/agreements are
needed for:
• provenance
• persistent identifiers
• standards
• data citation formats
• algorithm integration
• file-format translation
• software-archiving
• automated data reading
• metadata generation
• timing of data release
6. BUT - it’s not just sharing and
integrating data – it’s also what we do with it!
Jim Gray - “When you go and look at what scientists are
doing, day in and day out, in terms of data analysis, it is
truly dreadful. We are embarrassed by our data!”
….. and we need a new breed of informatics-trained
data scientist as the new librarians of the post-
Gutenberg world
but will they be in the Library?
So what are the priorities?
1. Ensuring valid reasoning
2. Innovative manipulation to create new information
3. Effective management of the data ecology
4. Education & training in data informatics & statistics
7. Benefits of open communication of data
• Greater benefit to the individual than hugging their own data (e.g.
bioinformatics)
• Permits faster responses to emergencies (e.g. Hamburg 2011 E. coli outbreak)
• Deterrent to fraud (system integrity as well as personal integrity)
• Stimulates novel and creative collaborations (e.g. Tim Gowers & “crowd-sourcing”
in mathematics)
• Support for citizen science movement (e.g. Galaxy Zoo, Foldit, AshTag, etc.)
• Demands by citizens for access (e.g. “Climategate”)
• Efficient responses to global challenges
8. Benefits 1 : data sharing in ethos & practice
– the example of bio-informatics
ELIXIR Hub (European Bioinformatics Institute) and ELIXIR Nodes provide
infrastructure for data, computing, tools, standards and training.
9. Benefits 2: Response to gastro-intestinal infection in Hamburg
• E. coli outbreak spread through several countries, affecting 4000 people
• Strain analysed and genome released under an open data license
• Two dozen reports in a week, with interest from 4 continents
• Crucial information about strain’s virulence and resistance
10. Benefits 3: Response to fraud
“Scientific fraud is rife: it's time to stand up for good science”
“Science is broken”
Examples:
psychology academics making up data,
anaesthesiologist Yoshitaka Fujii with 172 faked articles
Nature - rise in biomedical retraction rates overtakes rise in published papers
Cause:
Rewards and pressures promote extreme behaviours, and normalise malpractice
(e.g. selective publication of positive novel findings)
Cures:
Open data for replication
Transparent peer review
Not just personal integrity – but system integrity
11. Benefits 4: Opening-up science: e.g. crowd-sourcing
Mathematics related discussions
Tim Gowers - crowd-sourced mathematics
An unsolved problem posed on his blog.
32 days – 27 people – 800 substantive contributions
Emerging contributions rapidly developed or discarded
Problem solved!
“It’s like driving a car whilst normal research is like pushing it”
What inhibits such processes?
- The criteria for credit and promotion.
12. Benefits 5: Citizen Science
• Openly collected science is already helping policy makers.
• AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).
Chalara spread: 1992-2012
16. Boundaries of openness?
Openness should be the default position, with
proportional exceptions for:
• Legitimate commercial interests (sectoral
variation)
• Privacy (completely anonymised data is
impossible)
• Safety, security & dual use (impacts
contentious)
All these boundaries are fuzzy
17. Openness of data per se has little value.
Open science is more than disclosure
For effective communication, replication and re-purposing we
need intelligent openness. Data and meta-data must be:
• Accessible
• Intelligible
• Assessable
• Re-usable
Only when these four criteria are fulfilled are data
properly open.
But, intelligent openness must be audience sensitive.
Intelligent openness to fellow citizens normally makes far
greater demands than openness to peers – we should
prioritise “PUBLIC INTEREST SCIENCE” for the former.
18. Responsibilities & actions
• Scientists: - changing the mindset
• Learned Societies: - influencing their communities
• Universities/Insts: - responsibility for the knowledge they produce?
- incentives & promotion criteria
- proactive, not just compliant
- strategies (e.g. the library)
- management processes
• Funders of research: - mandate intelligent openness
- accept diverse outputs
- cost of open data is a cost of science
- strategic funding for technical solutions
(a priority for international collaboration)
• Publishers: - mandate concurrent open deposition
19. A data management ecology?
The role of the top-down
and the bottom-up?
Massive data loss
Sum (little science data) >
Sum (big science data)?
20. Can libraries rise to the challenges of a post-
Gutenberg world?
“Libraries do the wrong things, employ the wrong people”
People
• Funders mandate novel customers – the public
• Can they attract data scientists?
• Support for researchers & students
Things
• Reversing centralisation
• A data repository – directory - metadata – background
• Dynamic data
• Selection problem
• Compliant or proactive?
21. A taxonomy of openness
Inputs / Outputs
• Administrative data (held by public authorities, e.g. prescription data)
• Public Sector Research data (e.g. Met Office weather data)
• Research Data (e.g. CERN, generated in universities)
• Research publications (i.e. papers in journals)
Open access / Open data / Open science
Collecting the data / Doing research / Doing science openly
Researchers - Citizens - Citizen scientists – Businesses – Govt & Public sector
Science as a public enterprise
22. A realisable aspiration: all scientific
literature open & online,
all data open & online, and for them to
interoperate
… but, this is a process, not an event!
You have just been discussing global challenges, the major targets for modern science: science for policy. I will talk about some of the processes of science that will be vital if those challenges are to be effectively met: policy for science. I restrict my talk to science itself; Chas Bountra will talk about one of the main domains of application, the commercial.
But first a little helpful history. This is Henry Oldenburg, the first secretary of the newly formed Royal Society in the early 1660s. Henry was an inveterate correspondent with those we would now call scientists, both in Europe and beyond. Rather than keep this correspondence private, he thought it would be a good idea to publish it, and persuaded the new Society to do so by creating the Philosophical Transactions, which remains a top-flight journal to the present day. But he demanded two things of his correspondents: that they should submit in the vernacular and not Latin; and that evidence (data) that supported a concept must be published together with the concept. This permitted others to scrutinize the logic of the concept and the extent to which it was supported by the data, and permitted replication and re-use. Open publication of concept and evidence is the basis of “scientific self-correction”, which historians of science argue was the crucial building block on which the scientific revolution of the 18th and 19th centuries was built. It remains fundamental to the progress of science, and to its value to humanity as the most reliable means of acquiring knowledge. Openness to scrutiny by scientific peers is the most powerful form of peer review.
But Oldenburg’s world has changed. The last 20 years have seen a revolution in the rate at which data can be acquired, in the volume and complexity that can be stored and in the immediacy of ubiquitous communication. We have an unprecedented data storm. This poses challenges and opportunities in the way science is done.
The fundamental challenge is to scientific self-correction. Journals can no longer contain the data, and neither scientists nor journals have taken the obvious step of having data relevant to a publication concurrently available in an electronic database. (Example of last year’s Nature paper revealing that only 11% of results in 50 benchmark papers in pre-clinical oncology were replicable – a crisis of credibility?) But enormous volumes of data created by publicly funded research are never used as the basis for publication. There is a powerful argument that open sharing of such data has enormous potential.
The opportunity is, as some have argued, another scientific revolution of a scale and significance equivalent to that of the 18th and 19th centuries. Integrating data from a large number of cognate databases, ensuring dynamic data that are automatically updated, and using powerful techniques for database interrogation and data and text mining permit us to ask questions in new ways. They offer the potential to gain a more profound understanding of deep data relationships, and of the information and knowledge they contain, as promised by the vision of a semantic web.
Henry Oldenburg: the scientific journal and the process of peer review
Henry Oldenburg (1619-1677) was a German theologian who became the first Secretary of the Royal Society. He corresponded with the leading scientists of Europe, and believed that rather than waiting for entire books to be published, letters were much better suited to quick communication of facts or new discoveries. He invited people to write to him, even laymen who were not involved with science but had discovered some item of knowledge. He no longer required that science be conveyed in Latin, but in any vernacular language. From these letters the idea of printing scientific papers or articles in a scientific journal was born. In creating the Philosophical Transactions of the Royal Society in 1665, he wrote: "It is therefore thought fit to employ the [printing] press, as the most proper way to gratify those [who] . . . delight in the advancement of Learning and profitable Discoveries [and who are] invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand Design of improving Natural Knowledge . . . for the Glory of God . . . and the Universal Good of Mankind." Oldenburg also initiated the process of peer review of submissions by asking three of the Society’s Fellows who had more knowledge of the matters in question than he, to comment on submissions prior to making the decision about whether to publish.
References:
Aaron Klug (2000), “Address of the President, Sir Aaron Klug, O.M., P.R.S., Given at the Anniversary Meeting on 30 November 1999”, Notes Rec. R. Soc. Lond. 54, 99-108.
Marie Boas Hall (2002), Henry Oldenburg: Shaping the Royal Society (Oxford: Oxford University Press).
Open communication of data offers many benefits:
• Data sharing in specific scientific communities offers greater benefits to the individual than does hugging their own data (e.g. bio-informatics).
• Data sharing permits faster, more efficient response to emergencies (e.g. the 2011 Hamburg-centred E. coli infection).
• Mandating open data concurrent with publication has the potential to deter fraud (examples of invented data) and malpractice (mention clinical trials) – stress system integrity as well as personal integrity.
• Openness stimulates novel, highly creative and efficient modes of scientific collaboration (e.g. crowd sourcing and Tim Gowers).
• It offers a stimulus to the “citizen science movement”, which has the potential in the next decade or so to fundamentally change the social dynamics of science.
• It is a response to the increasing demand from many citizens to interrogate for themselves the evidence for a particular policy that may have major impacts on the lives of individuals and society; after all, they pay through their taxes for publicly funded science.
• And finally, and crucially, it offers more efficient and speedier means of addressing many modern science-related challenges (e.g. climate change, energy, infectious pandemics, etc.).
Openness of itself has no value unless it is “intelligent openness”, where data are:
• Accessible – they can be found
• Intelligible – they can be understood
• Assessable – e.g. does the creator have an interest in a particular outcome?
• Re-usable – sufficient meta-data to permit re-use and re-purposing.
These should be standard criteria for an open data regime.
But we must recognise that the amount of meta and background data required for intelligent openness to fellow citizens is usually far greater than that required for openness to scientific peers. If all data were to be intelligently open to fellow citizens on the basis that they have ultimately paid for it, science would stop tomorrow. A way forward would be to make a much greater effort to make data intelligently open in what we could call “public interest science”, including those issues that frequently arise in public debate or concern.
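One way a repository might make the four criteria concrete is as a simple checklist over a dataset's metadata. The sketch below is only an illustration under stated assumptions: the `DatasetRecord` class, its field names, and the mapping from fields to criteria are all hypothetical, not drawn from the talk or from any existing metadata standard.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; every field name here is an assumption."""
    identifier: str = ""   # persistent identifier, so the data can be found
    description: str = ""  # plain-language account, so the data can be understood
    provenance: str = ""   # who produced the data, how, and any declared interests
    licence: str = ""      # terms under which others may re-use the data
    file_format: str = ""  # format a re-user needs to be able to read


def openness_report(record: DatasetRecord) -> dict:
    """Map each of the four criteria to whether its assumed metadata is present."""
    return {
        "accessible": bool(record.identifier),
        "intelligible": bool(record.description),
        "assessable": bool(record.provenance),
        "reusable": bool(record.licence and record.file_format),
    }


def is_intelligently_open(record: DatasetRecord) -> bool:
    """True only when all four criteria are met, as the slide requires."""
    return all(openness_report(record).values())


if __name__ == "__main__":
    partial = DatasetRecord(identifier="example-id-001", description="Survey counts")
    print(openness_report(partial))
    print(is_intelligently_open(partial))
```

A check of this kind only tests that metadata is present, not that it is adequate for a given audience; the point made above about openness to fellow citizens would need richer description and background than a presence check can capture.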